Of course, the details of how to actually implement something like this are way more complex than "just throw everything into a big neural net with images as the inputs and actuators as the output". You need to provide the right kind of guidance in order to learn a usable policy in any reasonable amount of time.
A very recent development (which this work builds on) is the idea of "online adaptation". It essentially involves doing the training in two stages:
1. You add a variety of dynamically varying environmental effects to your simulator, by randomly altering parameters such as ground friction, payload weight distribution, motor effectiveness, and so on. You give the motion controller perfect knowledge of these parameters at all times, and let it learn how to move in response to them.
2. Then, you remove the oracle that tells the controller about the current environmental parameters, and replace it with another neural network that is trained to estimate (a latent representation of) those parameters, based on a very short window of data about the robot's own motor commands and the actual motion that resulted from them (see the sketch after this list).
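To make the shape of this concrete, here's a minimal sketch of the two-stage setup in PyTorch. All of the names, dimensions, and network sizes are my own illustrative choices, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

# Illustrative dimensions (not from the paper).
N_PARAMS, N_OBS, N_ACT, LATENT, HIST = 12, 48, 12, 8, 50

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

# Stage 1: an "oracle" encoder compresses the true environment parameters
# (friction, payload, motor strength, ...) into a latent z, and the motion
# policy conditions on z while being trained with RL in simulation.
env_encoder = mlp(N_PARAMS, LATENT)
policy = mlp(N_OBS + LATENT, N_ACT)

# Stage 2: an adaptation module learns to estimate z from a short history
# of (observation, action) pairs, with the encoder and policy frozen.
adapt = mlp(HIST * (N_OBS + N_ACT), LATENT)
opt = torch.optim.Adam(adapt.parameters(), lr=1e-3)

def stage2_step(env_params, history):
    """history: (batch, HIST, N_OBS + N_ACT) of recent commands + motion."""
    with torch.no_grad():
        z_true = env_encoder(env_params)   # what the oracle would say
    z_hat = adapt(history.flatten(1))      # estimate from history alone
    loss = nn.functional.mse_loss(z_hat, z_true)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```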
All of this can be done in simulation, many times faster than real-time. But when you transfer the system to a real robot, it adapts to its environment using the estimated parameters, without any of the networks needing to be re-trained. This ends up making it pretty robust to difficult terrain and perturbations. It also has the benefit of papering over subtle differences that arise between the simulated and real-world dynamics.
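At deployment time that looks something like the loop below (again my own sketch, not the paper's code): the oracle is gone, the frozen adaptation module supplies its estimate from a rolling window of recent motion, and nothing is retrained on the real robot.

```python
from collections import deque

# Rolling window of recent (observation, action) pairs, zero-padded at start.
history = deque([torch.zeros(N_OBS + N_ACT)] * HIST, maxlen=HIST)

def control_step(obs, last_action):
    history.append(torch.cat([obs, last_action]))
    h = torch.stack(tuple(history)).flatten().unsqueeze(0)
    with torch.no_grad():                      # no learning on the robot
        z_hat = adapt(h)                       # estimated env latent
        action = policy(torch.cat([obs.unsqueeze(0), z_hat], dim=1))
    return action.squeeze(0)
```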
This paper adds a number of refinements to the same basic idea. In the first stage, the system is given perfect knowledge of the surrounding terrain and the locations of some preselected waypoints, and learns to follow them. The second stage replaces those privileged inputs with estimates derived from an RGB+depth camera.
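That second stage is the same distillation trick as before, just with richer privileged inputs. Roughly (a hypothetical sketch, not the paper's network): a convolutional encoder over the RGB+depth frames is trained to regress onto the latent that the perfect-terrain oracle produced in simulation, and is then deployed frozen.

```python
# Hypothetical vision-based estimator for the terrain/waypoint latent.
vision_encoder = nn.Sequential(
    nn.Conv2d(4, 32, 5, stride=2), nn.ReLU(),   # 4 channels = RGB + depth
    nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, LATENT),
)
# Trained exactly like `adapt` above: minimize the MSE between its output
# and the stage-1 oracle's latent, with everything else frozen.
```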