
Karpathy provided additional context on the removal of LiDAR during his Lex Fridman Podcast appearance. This article condenses what he said:

https://archive.is/PPiVG

And here's one of Elon's mentions (he also has talked about it quite a bit in various spots).

https://xcancel.com/elonmusk/status/1959831831668228450?s=20

Edit: My personal view is that LiDAR and other sensors are extremely useful, but I worked on aircraft, not cars.



Based on that list, it seems to boil down to two things:

- cost (no longer a problem)

- too much code needed, and it bloats the data pipelines. Does anyone have actual evidence of this being the case? Yes, more code would be needed, but why is that innately a bad thing? "Bloated data pipelines" feels like another hand-wave; if you do it right, it's fine, as Waymo has proven.

Really curious whether any Tesla engineers feel this is still the best way forward, or if it's just a matter of having to listen to the big guy, Musk.

I’ve always felt that relying on vision only would be a detriment, because even humans with good vision get into circumstances where they get hurt due to temporary vision hindrances. Think heavy snow, heavy rain, heavy fog, or even just cresting a hill at a certain time of day when the sun flashes you.


Just for the record though, Musk isn't blindly anti-LIDAR. He has said (and I think this is an objective fact) that all existing roads and driving are based on vision (which is what all humans do). So that should technically be sufficient. SpaceX uses LIDAR for their docking systems.

I would argue that yes, we do use vision but we get that "lidar depth" from our stereo vision. And that used to be why I thought cameras weren't enough.

But then look at all the work with gaussian splatting (where you can take multiple 2d samples and build a 3d world out of it). So you could probably get 80% there with just that.

The ethos of many Musk companies (you'll hear this from many engineers that work there) is simplify, simplify, simplify. If something isn't needed, take it out. Question everything that might be needed.

To me, LIDAR is just one of those things in that general pattern of "if it isn't absolutely needed, take it out" – and the fact that FSD works so well without it proves that it isn't required. It's probably a nice to have, but maybe not required.


Humans aren't using only fixed vision for driving. This is such a tiresome thing to see repeated in every discussion about self driving.

You're listening to the road and car sounds around you. You're feeling vibration on the road. You're feeling feedback on the steering wheel. You're using a combination of monocular and binocular depth perception - plus, your eyes are not fixed-focal-length "cameras". You're moving your head to change the perspective you see the road at. Your inner ear is telling you about your acceleration and orientation.


And also, even with the suite of sensors that humans have, their vision perception is frequently inadequate and leads to crashes. If vision was good enough, "SMIDSY" wouldn't be such an infamous acronym in vehicle injury cases.


For those of us not aware of Australian cycling jargon, "SMIDSY" means "Sorry, Mate, I Didn't See You".


the issue is clearly attention not vision when it comes to humans. if we could actually process 100% of the visual information in our field of view, then accidents would probably go down a shit load.


Humans have both issues. There are many human failures which are distinctly a vision issue and not attention related, e.g. misestimation of depth/speed, obscured or obstructed vision, optical focus issues, insufficient contrast or exposure, etc.


But how many of those crashes not caused by inattention could have been avoided with less idiocy and more defensive driving? I mean, yes, we can’t see as well in fog, but that’s why you should slow down


Again, I'm still not saying that humans don't make bad decisions. I'm saying that, unequivocally, they also get into accidents while paying attention and being careful, as a result of misinterpretation or failure of their senses. These accidents are also common, for example:

* someone parking carefully, misjudges depth perception, bumps an object

* person driving at night, their eyes failed to perceive a poorly lit feature of the road/markings/obstacles

* person driving and suddenly blinded by bright object (the sun, bright lights at night)

* person pulling out in traffic who misinterprets their depth perception and therefore misjudges the speed of approaching traffic

* people can only focus their eyes at one distance at a time, and it takes time to focus at a different distance. It is neither unsafe nor unexpected for humans to check their instruments while driving -- but it can take the human eye hundreds of milliseconds to focus under normal circumstances -- If you look down, focus, look back up, and focus, as quick as you can at highway speeds, you will have travelled quite a long distance.

These type of failures can happen not as a result of poor decision making, but of poor perception.


> But how many of those crashes not caused by inattention could have been avoided with less idiocy and more defensive driving?

Most of them.

We can lump together "inattention" and "idiocy" for the purposes of this conversation, because both could be massively alleviated by a good self-driving car without lidar.

If you look at the parallel comments, you'll see that the majority of accidents and fatalities indeed come from these two factors combined (two-thirds coming from distraction, speeding, and impaired driving), and that kube-system is having to resort to ridiculous fallacies to try to dispute the empirical data that is available.


I didn’t claim vision was responsible for the majority of accidents anywhere in this thread.


> There are many human failures which are distinctly a vision issue and not attention related

Which are a tiny minority. The largest causes of crashes in the US are attention/cognition problems, not vision problems. Most traffic systems in western countries (probably in others, too, but I don't have personal experience), and in particular the US, are designed to limit visibility problems and do so very effectively.


That sounds more like a personal opinion, because I don’t think that data is particularly easy to objectively collect.

Regardless it is irrelevant to the point. Whatever the number may be, lapses in human visual perception are responsible for some crashes


> That sounds more like a personal opinion, because I don’t think that data is particularly easy to objectively collect.

That sounds like a personal opinion?

Maybe do the bare minimum of research before spouting yours.

DOT says that only 5% of crashes are caused by low visibility during weather events.[1]

In 2023, the combined causes of alcohol, speeding, and distracted driving (all cognitive/attention issues) caused 67% of highway deaths. [2]

I was able to find these in 30 seconds. You did zero research to confirm whether your belief was correct before asserting that my claim was opinion. That's pathetic.

> Regardless it is irrelevant to the point.

And your point is therefore irrelevant to the discussion at hand, because the person you were replying to did not claim that vision had no safety impact, but that it had little safety impact:

> the issue is clearly attention not vision when it comes to humans. if we could actually process 100% of the visual information in our field of view, then accidents would probably go down a shit load.

...and, as we can clearly see, the issue is attention (and some bad decision making), not vision.

[1] https://ops.fhwa.dot.gov/weather/roadimpact.htm

[2] https://www.adirondackdailyenterprise.com/opinion/columns/sa...


None of those things you cited is “human vision or perception”

“Low visibility during weather events” is a small subset of this.

A ridiculously common example of the limitations of human vision is when people hit curbs parallel parking because of the inherent limitations of relying on depth perception to estimate the exact location of the vehicle when it cannot otherwise be directly seen. Go look in a parking lot and see how common curbed wheels are.

Also, NHTSA estimates that they don’t have any information for 60% of incidents, because they go unreported.


> None of those things you cited is “human vision or perception”

> “Low visibility during weather events” is a small subset of this.

You're still refusing to do the most basic research or even read my comment:

> In 2023, the combined causes of alcohol, speeding, and distracted driving (all cognitive/attention issues) caused 67% of highway deaths.

Do the math. 100% - 67% is 33%. Even literally not opening Google, you can already deduce that the maximum fraction of fatalities caused by vision is 33%.

Given that you aren't interested in reading or researching and instead just want to push your opinion as fact, I think your claims can be safely discarded.

Edit: Because you're editing your comment because you realize that you're making an absolute fool of yourself:

> A ridiculously common example of the limitations of human vision is when people hit curbs parallel parking

A completely irrelevant distraction - this causes virtually zero accidents and even fewer fatalities, and you know it.

> Also, NHTSA estimates that they don’t have any information for 60% of incidents, because they go unreported.

Aha, so now you actually did research, and found that all of the available data supports my claims, so you're attempting to undermine it. Nice try. "Estimates" vs. actual numbers isn't really a contest.

Come back when you have actual data - until then, you're just continuing to undermine your own point with your ridiculous fallacies and misdirections - because if you actually had a defensible claim, you'd be able to instantly pull out supporting evidence.


Dude, you're arguing with a straw man.

I'm not arguing about fatalities or relative percentages of contributing factors, nor am I arguing that alcohol/speeding/attention are not all also issues. They are, you're right.

The only thing I argued is that "lapses in human visual perception are responsible for some crashes", which is a fact.


Attention is perhaps the limiting factor, but being able to look in two directions at once would help, and would help greatly if we had more attention capacity. E.g. anytime you change lanes you have to alternate between looking behind, beside, and in front and that greatly reduces reaction time should something unexpected happen in the direction you aren't currently looking...


In theory, a computer should be able to do the same. It could do sensor fusion with even more sense modalities than we have. It could have an array of cameras and potentially out-do our stereo vision, or perhaps even use some lightfield magic to (virtually) analyze the same scene with multiple optical paths.

However, there is also a lot of interaction between our perceptual system and cognition. Just for depth perception, we're doing a lot of temporal analysis. We track moving objects and infer distance from assumptions about scale and object permanence. We don't just repeatedly make depth maps from 2D imagery.

The brute-force approach is something like training visual language models (VLMs). E.g. you could train on lots of movies and be able to predict "what happens next" in the imaging world.

But, compared to LLMs, there is a bigger gap between the model and the application domain with VLMs. It may seem like LLMs are being applied to lots of domains, but most are just tiny variations on the same task of "writing what comes next", which is exactly what they were trained on. Unfortunately, driving is not "painting what comes next" in the same way as all these LLM writing hacks. There is still a big gap between that predictive layer, planning, and executing. Our giant corpus of movies does not really provide the ready-made training data to go after those bigger problems.


Putting your point another way, in order to replicate an average human driver’s competence you would need to make several strong advancements in the state of the art in computer vision _and_ digital optics.


In India (among other places), honking is essential to reducing crashes.

We often greatly underestimate and undervalue the role of our ears relative to vision. As my film director friend says, 80% of the impact of a movie is in the sound.


The day a Waymo can functionally navigate the streets of Mumbai is when we really have achieved l5


I'm positive that Teslas have gyroscopes and accelerometers in them. Our eyes actually have a fairly small focal length range due to the fixed nature of our cornea and only being able to change focal length by flexing the crystalline lens.

Beyond about 20 meters, motion-based depth perception is more accurate than stereoscopic vision. What is lidar helping to solve here?


Waymo claims its system, which uses a combination of LIDAR & vision, resolves objects up to 500 meters away

https://waymo.com/blog/2024/08/meet-the-6th-generation-waymo...

This company claims their LIDAR works conservatively at 250m, and up to 750m depending on reflectivity

https://www.cepton.com/driving-lidar/reading-lidar-specs-par...


Most of what you said has nothing to do with lidar vs camera


What I said has to do with comparing "vision only systems" (what Musk has claimed will be enough to do FSD) with sensor fusion systems (what everybody else having success in this space uses).

Mentioning gaussian splatting as a reason we don't need lidar depth is a great example of Musk-esque technobabble: seemingly correct at surface level, but nonsense to any practitioner. One of the biggest problems with all SfM techniques is that the results are scale-ambiguous, so they do not in fact recover the crucial real-world depth measurement you get from lidar.

Now you might say "use a depth model to estimate metric depth", but spend five minutes thinking about why a magic math box that pretends to recover real depth from a single 2D image is a very sketchy proposition when it needs to be correct for emergency braking rather than for some TikTok bokeh filter, and you will see that doesn't get you far either.
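The scale ambiguity mentioned above can be sketched in a few lines: under a pinhole model, scaling all 3D points and the camera translation by the same factor leaves every image projection unchanged, so monocular geometry alone cannot pin down metric depth. The points and translation below are purely illustrative.

```python
import numpy as np

def project(points_3d, t):
    """Pinhole projection (unit focal length) after translating by t."""
    p = points_3d + t
    return p[:, :2] / p[:, 2:3]

pts = np.array([[1.0, 2.0, 5.0], [-1.0, 0.5, 8.0]])
t = np.array([0.2, -0.1, 1.0])

k = 3.7  # any positive scale factor
# Scaling the scene and the translation together changes nothing in the image.
print(np.allclose(project(pts, t), project(k * pts, k * t)))  # True
```

Every SfM reconstruction from a single moving camera is only defined up to this factor k, which is exactly the measurement lidar (or a known stereo baseline) supplies.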


This is not really true if you have multiple cameras with a known baseline, or well known motion characteristics like you get with an accelerometer+ wheel speed.
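A known baseline does indeed resolve the scale: with a calibrated stereo pair, metric depth follows directly from disparity via Z = f·B/d. A minimal sketch, using made-up focal length and baseline values for illustration:

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Metric depth (meters) for one matched pixel pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Example: 1000 px focal length, 0.3 m baseline between cameras.
# A 10 px disparity then corresponds to a point 30 m away.
print(depth_from_disparity(1000.0, 0.3, 10.0))  # 30.0
```

Note the catch this formula also exposes: at highway distances the disparity shrinks toward sub-pixel values, so small matching errors translate into large depth errors, which is one reason stereo degrades at range where lidar does not.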


> So that should technically be sufficient

Sufficient to build something close to human performance. But self driving cars will be held to a much higher standard by society. A standard only achievable by having sensors like LiDAR.


If a self-driving car had the exact vision of humans, it would still be better because it has better reaction times. Never mind that humans can't actually process all the visual information in our field of view, because we don't have the broad attention to do that. It's very obvious that you can get superhuman performance with just cameras.

Whether that's worth completely throwing away LiDAR is a different question, but your argument is just obviously false.


This reminds me of the time I was distantly following a Waymo car at speed on 101 in Mountain View during rush hour. The Waymo brake lights came on first followed a second or two later by the rest of the traffic.


Better reaction times only matter if the decisions are the same / better in every case. Clearly we are not there on that aspect of it yet.

Deciding to crash faster, or "tell human to take over" really fast is NOT better.


Even if they weren’t going to be held to a higher standard for widespread acceptance, tens of thousands of people a year in the US die due to humans driving badly. Why would we not try to do better than that?


Because that's an acceptable loss and better costs more!


Teslas have at least three forward-facing cameras, giving them plenty of depth data.

They also have several cameras all around providing constant 360° vision.


Sufficient if all else were equal. But the human brain and artificial neural networks are clearly not equal. This is setting aside the whole question of whether we hope to equal human performance or exceed it.


That doesn't matter. It's not like we use 100% of our brain capacity for driving.

In fact, that's why radio/music/podcasts thrive. Because we're bored when we drive. We have conversations, etc. We daydream.

As long as the skills relevant to actually driving are on parity with humans, the rest doesn't matter.

In fact, in a recent podcast, Musk mused that you actually may have a limit of how smart you want a vehicle model to be, because what if IT starts to get bored? What will it do? I found that to be an interesting (and amusing) thought exercise.


To do gaussian splatting anywhere near in real time, you need good depth data to initialize the gaussian positions. This can of course come from monocular depth but then you are back to monocular depth vs lidar.


LIDAR also struggles in heavy rain, snow, fog, and dust. Check how Waymo handles such conditions.

It's not only failing, it's causing false positives.


Why is this getting downvoted? It's good faith and probably more accurate than not.


> and the fact that FSD works so well without it proves that it isn't required

The reports that Tesla submits on Austin Robotaxis include several of them hitting fixed objects. This is the same behavior that has been reported on for prior versions of their software of Teslas not seeing objects, including for the incident for which they had a $250M verdict against them reaffirmed this past week. That this is occurring in an extensively mapped environment and with a safety driver on board leads me to the opposite conclusion that you have reached.


If Waymo has proven their model works, why is the silly automaker doing several orders of magnitude more autonomous miles?


They aren't. Tesla has logged some 800k total miles with their robotaxi vehicles, including miles with safety drivers. Waymo has logged 200M driverless miles. That's 0.4% of the mileage, with the most generous possible framing.


My understanding is that there's more data processing required with cameras because you need to estimate distance from stereoscopic vision. And as it happens, the required chips for that have shot up in price because of the AI boom.

But I think costs were just part of the reason why Elon decided against Lidar. Apparently, they interfere with each other once the market saturates and you have many such cars on the same streets at the same time. Haven't heard yet how the Lidar proponents are planning to address that.


How does Waymo handle it now? There are many videos of Waymo depots with dozens of cars not running into each other.



Lidar critics like to pretend that anti-collision is not a well-studied branch of computer science and telecoms. Wi-Fi, Ethernet, and cellphones all work well simultaneously, despite all participants sharing the same physical medium.
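The classic mechanism behind that coexistence is randomized exponential backoff, as used in Ethernet and 802.11: after each collision, a transmitter waits a random number of slots drawn from a doubling window, so simultaneous senders quickly desynchronize. A toy sketch (the window constants are illustrative, not any standard's exact values):

```python
import random

def backoff_slots(attempt: int, max_exponent: int = 10) -> int:
    """Pick a random wait from a window that doubles with each failed attempt."""
    window = 2 ** min(attempt, max_exponent)  # attempt 0 -> 1 slot, 3 -> 8 slots, ...
    return random.randrange(window)

# Two colliding senders pick independent delays; the growing window makes
# a repeat collision increasingly unlikely.
print(backoff_slots(3), backoff_slots(3))
```

Lidar units can lean on analogous tricks in the optical domain, such as per-unit pulse timing jitter or coded pulse signatures, so returns from a neighboring car's laser don't correlate with your own.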


I'm not a Lidar critic. I'm really just curious how they're addressing it, or plan to.


The points linked repeatedly focus on cost and complexity as justification, even explicitly stating Musk's desire to minimize components in Karpathy's list.

They don’t focus on safety or effectiveness except to say that vision should be ‘sufficient’. Which is damning with faint praise imho.

If that link was meant to argue that the removal of sensors makes perfect sense, I have to point out that anyone who reads it would likely have their negative viewpoint hardened. It was done to reduce cost (back when the sensors cost thousands of dollars) and out of a ridiculous desire by Musk for minimalism. It's the same desire that removed the indicator stalk, I might add.


To be clear, from a personal standpoint, I am pro-more sensors and sensor fusion.

I assume Musk et al. are acting in good faith in trying to find the right compromises.


Why would you assume Musk is acting in good faith? That’s very much not his thing.


Oh, you sweet summer child..


Instead of betting on RADAR and LIDAR hardware getting better and costs going down, they went with a vision-only approach. Everybody in this field knows the strengths and weaknesses of each system. Multi-modal sensor fusion is the way to go for L4 autonomy; there is no other way to reduce the risk. Vision alone will never achieve L4 in all weather conditions. Tesla may try to demonstrate L4 in limited geographies and in good weather, but it won't scale.
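The textbook argument for fusion is statistical: combining independent measurements always reduces variance. A minimal inverse-variance fusion of two range estimates, say a noisy camera depth and a more precise lidar return (the variances below are illustrative, not real sensor specs):

```python
def fuse(z_cam: float, var_cam: float, z_lidar: float, var_lidar: float):
    """Inverse-variance weighted fusion of two independent range estimates."""
    w_cam = 1.0 / var_cam
    w_lidar = 1.0 / var_lidar
    z = (w_cam * z_cam + w_lidar * z_lidar) / (w_cam + w_lidar)
    var = 1.0 / (w_cam + w_lidar)  # always smaller than either input variance
    return z, var

# Camera says 52 m with 4 m^2 variance; lidar says 50 m with 0.04 m^2.
z, var = fuse(52.0, 4.0, 50.0, 0.04)
print(round(z, 2), round(var, 4))  # fused estimate hugs the lidar reading
```

This is the one-dimensional core of a Kalman filter update, and it cuts both ways: the fused estimate is never worse than the best single sensor, which is why dropping a sensor modality is a pure loss in accuracy (if not in cost or complexity).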


From the article:

Karpathy’s main points:

- Extra sensors add cost to the system and, more importantly, complexity. They make the software task harder and increase the cost of all the data pipelines. They add risk and complexity to the supply chain and manufacturing.

- Elon Musk pushes a philosophy of “the best part is no part,” which can be seen throughout the car in things like doing everything through the touchscreen. This is an expression of that philosophy.

- Vision is necessary to the task (which almost all agree on) and it should also be sufficient. If it is sufficient, the cost of extra sensors and tools outweighs their benefit.

- Sensors change as parts change or become available and unavailable. They must be maintained, and the software adapted to these changes. They must also be calibrated to make fusion work properly.

- Having a fleet gathering more data is more important than having more sensors.

- Having to process LIDAR and radar produces a lot of bloat in the code and data pipelines. He predicts other companies will also drop these sensors in time.

- Mapping the world and keeping it up to date is much too expensive. You won’t change the world with this limitation; you need to focus on vision, which is the most important. The roads are designed to be interpreted with vision.


So the argument is pretty much: it should be sufficient to use vision only, and that it is too difficult / expensive to do otherwise.



