This is the first OpenAI video I've ever seen, and the people in it all seem incompetent for some reason, like a grotesque Temu version of Apple employees or something.
Any old laptop (especially a ThinkPad) can run Linux well. If you want to use it, it's not "trouble" per se, because once you really know what you're doing there is no trouble (and you can't get to knowing what you're doing without finding out what it was you did that caused the trouble).
If you just want to use Linux so you can tell someone about it, don't bother; stick to what works for you.
It's just one of those consequences of "I don't care about the specifics, just put it in production" that ends in "why didn't you tell me that I completely misunderstood".
The article makes it seem like finding diamonds is some kind of super complicated logical puzzle. In reality the hardest part is knowing where to look for them and what tool you need to mine them without losing them once you find them. This was given to the AI by having it watch a video that explains it.
If you watch a guide on how to find diamonds it's really just a matter of getting an iron pickaxe, digging to the right depth and strip mining until you find some.
Hi, author here! Dreamer learns to find diamonds from scratch by interacting with the environment, without access to external data. So there are no explainer videos or internet text here.
It gets a sparse reward of +1 for each of the 12 items that lead to the diamond, so there is a lot it needs to discover by itself. Fig. 5 in the paper shows the progression: https://www.nature.com/articles/s41586-025-08744-2
Since diamonds are surrounded by danger, and dying means losing its items and such, why would it not be satisfied after discovering the iron pickaxe or some such? Is it in a mode where it doesn't lose its items when it dies? Does it die a lot? Does it ever try digging straight down? Does it ever discover other items/tools you didn't expect it to? An open world with sparse rewards seems like such a hard problem. Also, once it gets an item, does it stop getting reward for it? I assume so. I'm surprised that it can work with this level of sparse rewards.
In all reinforcement learning there is some impetus for exploration, either explicitly as part of the reward function or implicitly as part of the algorithm. It might be a tiny reward per square walked, a small reward for each block broken and a larger one for each new block type broken. Or it could just be forcing a random move every N steps so the agent encounters new situations through "clumsiness".
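As a toy illustration (this is not what Dreamer does; the env/agent interface and the "broken_block_type" field are made up), both flavours can be bolted onto a gym-style loop like this:

    import random

    EPSILON = 0.05        # chance of forcing a random "clumsy" action
    NOVELTY_BONUS = 0.1   # small extra reward for breaking a new block type

    def run_episode(env, agent):
        seen_block_types = set()
        obs = env.reset()
        done = False
        while not done:
            # implicit exploration: occasionally override the policy with a random move
            if random.random() < EPSILON:
                action = env.action_space.sample()
            else:
                action = agent.act(obs)
            obs, reward, done, info = env.step(action)
            # explicit exploration: tiny shaped bonus for breaking a new block type
            block = info.get("broken_block_type")
            if block is not None and block not in seen_block_types:
                seen_block_types.add(block)
                reward += NOVELTY_BONUS
            agent.observe(obs, reward, done)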
When it dies it loses all items and the world resets to a new random seed. It learns to stay alive quite well but sometimes falls into lava or gets killed by monsters.
It only gets a +1 for the first iron pickaxe it makes in each world (same for all other items), so it can't hack rewards by repeating a milestone.
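Put together, the reward scheme described above amounts to something like this sketch (the milestone names and the inventory interface are illustrative, not the actual benchmark code):

    # The 12 milestone items on the way to a diamond (illustrative list).
    MILESTONES = [
        "log", "planks", "stick", "crafting_table", "wooden_pickaxe",
        "cobblestone", "stone_pickaxe", "furnace", "iron_ore",
        "iron_ingot", "iron_pickaxe", "diamond",
    ]

    class MilestoneReward:
        """+1 the first time each milestone item is obtained in an episode."""

        def __init__(self):
            self.collected = set()

        def reset(self):
            # New world, new random seed: every milestone can be rewarded again.
            self.collected.clear()

        def __call__(self, inventory):
            reward = 0.0
            for item in MILESTONES:
                if item not in self.collected and inventory.get(item, 0) > 0:
                    self.collected.add(item)
                    reward += 1.0  # sparse reward, only for the first occurrence
            return reward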
Yeah it's surprising that it works from such sparse rewards. I think imagining a lot of scenarios in parallel using the world model does some of the heavy lifting here.
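Very schematically (this is not how Dreamer is actually implemented, and every name here is a placeholder), "imagining in parallel" means rolling the learned world model forward in latent space for a whole batch of states at once, without touching the real environment:

    import numpy as np

    def imagine_returns(world_model, policy, start_latents, horizon=15, gamma=0.99):
        """Estimate returns of imagined rollouts from a batch of latent states.

        Placeholder interfaces:
          policy.act(latents)                -> actions
          world_model.step(latents, actions) -> (next_latents, predicted_rewards)
        """
        latents = start_latents                    # shape: (batch, latent_dim)
        returns = np.zeros(len(start_latents))
        discount = 1.0
        for _ in range(horizon):
            actions = policy.act(latents)
            latents, rewards = world_model.step(latents, actions)
            returns += discount * rewards
            discount *= gamma
        return returns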
> Yeah it's surprising that it works from such sparse rewards. I think imagining a lot of scenarios in parallel using the world model does some of the heavy lifting here.
This is such gold. Thanks for sharing. Immediately added to my notes.
I just want to express my condolences for how difficult it must be to correct basic misunderstandings that could be immediately cleared up by reading the fourth paragraph under the section "Diamonds are forever".
It didn't watch 'a video'; it watched many, many hours of video of Minecraft being played (with another specialised model feeding in predicted keyboard and mouse inputs from the video). It's still a neat trick, but it's far from the implied one-shot learning.
I don't think it was videos. Almost certainly it was replay files, with a bunch of work to transform them into something that could be compared to the model's outputs. (AlphaStar never 'sees' the game's interface, only a transformed version of the information available via an API.)
StarCraft provides replay files that start with the initial game state and then list every action in the game: not the raw user inputs, but the actions bound to them.
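In other words, a replay is just an initial state plus an ordered action list, and turning it into training data means re-simulating it; roughly like this (all types and the simulate() hook are hypothetical, not any real StarCraft API):

    from dataclasses import dataclass
    from typing import Any, List

    @dataclass
    class Replay:
        initial_state: Any   # full starting state of the match
        actions: List[Any]   # every in-game action in order (not raw key/mouse input)

    def replay_to_training_pairs(replay, simulate):
        """Re-simulate a replay to recover (state, action) pairs for imitation learning.

        `simulate(state, action) -> next_state` stands in for the game engine.
        """
        pairs = []
        state = replay.initial_state
        for action in replay.actions:
            pairs.append((state, action))
            state = simulate(state, action)
        return pairs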
The other replies have pointed out that the AI didn't get any "videos to watch", but I'd also note that this is being used as an English colloquialism. The AIs aren't "watching videos"; they're receiving videos as their training data. That's quite different from what "watching a video" brings to mind, as if the AI watched a single YouTube tutorial once and got the concept.
I feel like you are jumping to conclusions here. I wasn't talking about the achievement or the AI; I was talking about the article and the way it explains finding diamonds in Minecraft to people who don't know how to find them.
I am not American, nor do I live in America, so I don't really have a horse in this race, but the DOGE approach seems to be the classic "move fast and break things" approach. The reactions to it are the classic reactions to that approach: competent people speak out to get broken things fixed, and others are confused about what is happening.
> All of that social activity with zero ROI.
Pick one