> Touch gives pretty cool skills, but language, video and audio are all that are needed for all online interactions. We use touch for typing and pointing, but that is only because we don't have a more efficient and effective interface.
It may be that we are not using touch for anything important as adults. But babies rely on touch to explore their surroundings. They stick anything into their mouths. Why? Because the tongue is the most touch-sensitive organ. They explore things by touching them with their tongues.
I can only guess what people get from that, but my guess is that they gain an understanding of geometry and of the surface properties of objects, which is hard to acquire by processing photos or text.
> your comment was specifically about intelligence.
Talking about intelligence, I do not believe that LLMs can match humans without a deep understanding of 3D space and material science^W intuition. That needs touch and temperature sensitivity at least. Perhaps you could replace them with billions of words of text describing these things, but I doubt it.
It is trivial to train AI on 3D representations. In fact, that already happens in cases where robot algorithms are trained in simulations.
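As a toy illustration of what "training on 3D representations" can mean, here is a minimal sketch (the shapes, resolution, and model are all my own illustrative choices, not any real robotics pipeline): a classifier learns to tell two synthetic 3D shapes apart directly from their voxel occupancy grids.

```python
# Toy sketch: training a classifier directly on 3D voxel representations.
# Everything here (shapes, grid resolution, model) is illustrative only.
import numpy as np

rng = np.random.default_rng(0)
N = 8  # voxel grid resolution: N x N x N

def voxel_sphere(radius):
    # Occupancy grid of a sphere centered in the cube.
    ax = np.arange(N) - (N - 1) / 2
    x, y, z = np.meshgrid(ax, ax, ax, indexing="ij")
    return (x**2 + y**2 + z**2 <= radius**2).astype(float).ravel()

def voxel_box(half):
    # Occupancy grid of an axis-aligned box of half-width `half`.
    ax = np.abs(np.arange(N) - (N - 1) / 2)
    x, y, z = np.meshgrid(ax, ax, ax, indexing="ij")
    return ((x <= half) & (y <= half) & (z <= half)).astype(float).ravel()

# Synthetic dataset: spheres (label 0) vs boxes (label 1) of varying size.
sizes = rng.uniform(1.5, 3.5, size=40)
X = np.stack([voxel_sphere(s) for s in sizes] + [voxel_box(s) for s in sizes])
y = np.array([0.0] * 40 + [1.0] * 40)

# Logistic regression trained by plain gradient descent on the raw voxels.
w = np.zeros(X.shape[1])
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
    grad = p - y                            # gradient of the loss w.r.t. logits
    w -= 0.1 * (X.T @ grad) / len(y)
    b -= 0.1 * grad.mean()

pred = 1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5
acc = (pred == (y > 0.5)).mean()
print(f"training accuracy: {acc:.2f}")
```

The point is only that a 3D occupancy grid is just another input tensor; real robot-learning setups do the same thing at much larger scale, with point clouds or simulated depth sensors instead of hand-built voxels.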
Another thing to remember is that the senses we have aren't the only ones in biology, and far from the only ones possible. In fact, anything that gives you another type of information about the world you're modeling is a different sense. In that sense (ha), AI has access to an incredibly vast and varied array of senses that is inaccessible to humans. Lidar is a very simple example of that.
I don't think touch and temperature sensitivity are needed to achieve it, but I do agree that training with senses specifically for understanding 3D space is very important. At the very least binocular video.
> It is trivial to train AI on 3D representations.
So AI developers understand the limitations and are trying to remove them. It will help, but it will not put AI vision on par with a human's.
> In that sense (ha), AI has access to an incredibly vast and varied array of senses that is inaccessible to humans. Lidar is a very simple example of that.
I don't think that current uses of lidar have anything to do with intelligence. Not every neural net is about intelligence.
> I don't think touch and temperature sensitivity are needed to achieve it,
I'm sure they are. To understand forms, you need to explore them with touch; the ability to understand forms just by looking at them is an acquired skill. Maybe it is possible to train these abilities without touch, but how? I believe it would take a shitload of training data, and I'm not sure it would be good enough.
Temperature sensitivity is a big thing, because it allows you to guess the thermal conductivity of a thing just by looking at it. It allows you to guess whether a thing is wet. It lets us guess the temperature of things by looking at them: you see the sun shining, fire burning, people touching things and yanking their hands away from hot ones. Or take a person cautiously trying to learn the temperature of a thing: first sensing the infrared radiation, then a quick touch, then a longer touch, and finally sustained contact. How could you understand that whole procedure without your own experience of grasping a hot thing, crying from the pain, and dropping it on your foot?
These are just obvious ideas off the top of my head. What else comes from temperature sensitivity I don't know, and no one does, because no one really knows how people learn to use their senses and to think. There are theories about it, but they are mostly descriptive: they describe what is known without having much predictive power. Because of this, the optimism of the AI crowd seems overinflated. They don't know what they are trying to do, yet they still believe in their eventual success.
Probably you could learn it by thinking, but can LLMs think while training? You could learn it as a behavioural pattern, without understanding its meaning, but then you'll hallucinate that pattern all the time, just because some of the movements were close enough.
> At the very least binocular video.
I'm not sure that people can learn 3D just by looking. At least they do not rely on binocular vision alone to learn it. They touch, they lick. They measure things in different ways (by sticking them in their mouths, by grasping them, by climbing on top of them or falling off them, by hugging them), and they measure distances by crawling or walking along them. They find a spot from which they can see what happens behind a clump of trees, or behind something else. People don't just use more senses; they also act, which lets them learn causal relationships. Watching binocular video is not acting, so you get correlation only, with no hope of learning to distinguish correlation from causation, and at the same time it is much more limited in the data available.
Science says that 80 or 90% of the information people get comes from vision? I'm skeptical about this, because I don't know how they measure "information", but in any case human vision was trained with support from the other senses. I wouldn't be surprised if, at certain stages of a baby's development, other senses are more advanced and are used to get labelled data to train vision.