reportgunner's comments

> I’ve benefited incredibly from commenting.

> All of that social activity with zero ROI.

Pick one


This is the first OpenAI video I've ever seen, and the people in it all seem incompetent for some reason, like a grotesque Temu version of Apple employees or something.


Any old laptop (especially a ThinkPad) can run Linux well. If you want to actually use it, it's not "trouble" per se, because once you really know what you are doing there is no trouble (and you can't get to knowing what you're doing without finding out what it was you did that caused the trouble).

If you just want to use Linux so you can tell someone about it, don't bother with Linux and stick to what works for you.


It's just one of those consequences of "I don't care about the specifics, just put it in production" that ends in "why didn't you tell me that I completely misunderstood?"


> cyber activists, and in the text of the article, they were called cybercriminals

Depends whose side you are on.


Their target demographic was already born into the matrix and they don't even know it's there; it will hardly be a problem for them.


People who don't use DDG believe DDG = Bing; there is no point in debating that.


The article makes it seem like finding diamonds is some kind of super complicated logical puzzle. In reality, the hardest part is knowing where to look for them and what tool you need to mine them without losing them once you find them. This was given to the AI by having it watch a video that explains it.

If you watch a guide on how to find diamonds, it's really just a matter of getting an iron pickaxe, digging to the right depth, and strip mining until you find some.


Hi, author here! Dreamer learns to find diamonds from scratch by interacting with the environment, without access to external data. So there are no explainer videos or internet text here.

It gets a sparse reward of +1 for each of the 12 items that lead to the diamond, so there is a lot it needs to discover by itself. Fig. 5 in the paper shows the progression: https://www.nature.com/articles/s41586-025-08744-2


Since diamonds are surrounded by danger, and if it dies it loses its items and such, why would it not be satisfied after discovering the iron pickaxe or some such? Is it in a mode where it doesn't lose its items when it dies? Does it die a lot? Does it ever try digging vertically down? Does it ever discover other items/tools you didn't expect it to? An open world with sparse rewards seems like such a hard problem. Also, once it gets an item, does it stop getting reward for it? I assume so. Surprised that it can work with this level of reward sparsity.


In all reinforcement learning there is (explicitly as part of a fitness function, or implicitly as part of the algorithm) some impetus for exploration. It might be adding a tiny reward per square walked, a small reward for each block broken and a larger one for each new block type broken. Or it could be just forcing a random move every N steps so the agent encounters new situations through “clumsiness”.
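
To make the reward-shaping variant concrete, here is a minimal Python sketch. The function, its arguments, and the bonus sizes are invented for illustration; they are not taken from any particular agent:

    def shaped_reward(env_reward: float, block_type: str | None,
                      seen_types: set[str]) -> float:
        """Add exploration bonuses on top of the environment's own
        (possibly zero) reward. Bonus sizes are arbitrary placeholders."""
        bonus = 0.0
        if block_type is not None:          # the agent broke a block this step
            bonus += 0.01                   # small reward for any broken block
            if block_type not in seen_types:
                seen_types.add(block_type)
                bonus += 0.1                # larger reward for a new block type
        return env_reward + bonus

The shaping terms get the agent moving and experimenting long before it ever sees the sparse "real" reward.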


That is right; there is usually a parameter on the action selection function: the exploitation vs. exploration balance.
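
The classic epsilon-greedy rule exposes that balance as a single parameter. A toy sketch, for illustration only (this is not how Dreamer selects actions):

    import random

    def epsilon_greedy(q_values: dict[str, float], epsilon: float = 0.1) -> str:
        """With probability epsilon take a random action (explore),
        otherwise take the highest-valued action (exploit)."""
        if random.random() < epsilon:
            return random.choice(list(q_values))   # explore: uniform random action
        return max(q_values, key=q_values.get)     # exploit: best-known action

    # e.g. epsilon_greedy({"mine": 1.2, "craft": 0.7, "flee": 0.3}, epsilon=0.05)

Raising epsilon makes the agent more "clumsy" and exploratory; lowering it makes it stick to what already works.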


When it dies it loses all items and the world resets to a new random seed. It learns to stay alive quite well but sometimes falls into lava or gets killed by monsters.

It only gets a +1 for the first iron pickaxe it makes in each world (same for all other items), so it can't hack rewards by repeating a milestone.

Yeah it's surprising that it works from such sparse rewards. I think imagining a lot of scenarios in parallel using the world model does some of the heavy lifting here.
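
For readers wondering what that means mechanically: roughly, the learned world model lets the agent roll out candidate futures without spending real environment steps. A toy sketch, where `world_model.step` and `policy` are invented stand-ins (Dreamer actually performs batched rollouts in a learned latent space):

    def imagine_returns(world_model, policy, start_state,
                        horizon: int = 15, n_rollouts: int = 64) -> list[float]:
        """Estimate returns by rolling out imagined trajectories only."""
        returns = []
        for _ in range(n_rollouts):              # many imagined futures
            state, total = start_state, 0.0
            for _ in range(horizon):
                action = policy(state)
                # the model predicts the next state and reward, so no real
                # environment steps (or in-game deaths) are consumed here
                state, reward = world_model.step(state, action)
                total += reward
            returns.append(total)
        return returns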


> Yeah it's surprising that it works from such sparse rewards. I think imagining a lot of scenarios in parallel using the world model does some of the heavy lifting here.

This is such gold. Thanks for sharing. Immediately added to my notes.


I just want to express my condolences for how difficult it must be to correct basic misunderstandings that could be immediately cleared up by reading the fourth paragraph under the section "Diamonds are forever".

Thanks for your hard work.


Haha thanks!


For the curious, from the link above:

> log, plank, stick, crafting table, wooden pickaxe, cobblestone, stone pickaxe, iron ore, furnace, iron ingot, iron pickaxe and diamond
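
Combining that list with the one-time payout the author described above, a reward tracker could look roughly like the sketch below (the class name and event format are invented for illustration; this is not the paper's code):

    MILESTONES = ["log", "plank", "stick", "crafting table", "wooden pickaxe",
                  "cobblestone", "stone pickaxe", "iron ore", "furnace",
                  "iron ingot", "iron pickaxe", "diamond"]

    class MilestoneReward:
        """Pays +1 the first time each of the 12 items is obtained."""

        def __init__(self):
            self.achieved: set[str] = set()      # cleared when the world resets

        def reward(self, items_obtained_this_step: list[str]) -> float:
            r = 0.0
            for item in items_obtained_this_step:
                if item in MILESTONES and item not in self.achieved:
                    self.achieved.add(item)      # each milestone pays only once
                    r += 1.0
            return r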


While I agree with your comment, this sentence:

"This was given to the AI by having it watch a video that explains it."

What it describes was not as trivial as it may seem, even just a few months ago...


EDIT: Incorrect, see below

It didn't watch 'a video'; it watched many, many hours of video of Minecraft being played (with another specialised model inferring the keyboard and mouse inputs from the video). It's still a neat trick, but it's far from the implied one-shot learning.


The author replied in this thread and says the opposite.


Ah, I was incorrect. I got that impression from one of the papers linked at the end of the article, but I suspect that's actually some previous work.


I applaud you for acknowledging your mistake. So many people double down, especially in this pernicious and polarized age.


AlphaStar was also trained initially on YouTube videos of pros playing StarCraft. I would argue that it was pretty trivial a few years ago.


I don't think it was videos. Almost certainly it was replay files, with a bunch of work to transform them into something that could be compared to the model's outputs. (AlphaStar never 'sees' the game's interface, only a transformed version of the information available via an API.)


This was my understanding as well, as the replay files are all available anyway.

The YouTube documentary is actually very detailed about how they implemented everything.


Which documentary? Is it this one?

https://www.youtube.com/watch?v=UuhECwm31dM


It was a ~1h documentary


Do you know if it was actual videos or some simpler inputs like game state and user inputs? I’d be impressed if it was the former at that time.


StarCraft provides replay files that start with the initial game state and then record every action in the game. Not raw user inputs, but the in-game actions bound to them.
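
To illustrate the distinction: a replay is conceptually an initial state plus a log of bound game actions that the engine can re-simulate deterministically. The field names below are invented for illustration and are not the actual StarCraft replay format:

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class ReplayAction:
        frame: int                       # game tick at which the action resolved
        player_id: int
        action: str                      # a bound game action, e.g. "train_unit"
        target: Optional[tuple] = None   # map coordinates or unit id, if any

    @dataclass
    class Replay:
        initial_state: dict              # map, starting units, resources
        actions: list[ReplayAction] = field(default_factory=list)

Because the engine is deterministic given this log, training data can be reconstructed from replays without any video at all.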


> This was given to the AI by having it watch a video that explains it.

That is not what the article says. It says that was separate, previous research.


I don't get it. How can you reduce this achievement down to this?

Have you gotten so used to some AI watching a video and 'getting it' that this is boring? Unimpressive?


The other replies have observed that the AI didn't get any "videos to watch", but I'd also note that "watching" is being used as an English colloquialism here. The AIs aren't "watching videos"; they're receiving videos as their training data. That's quite different from what "watching a video" brings to mind, as if the AI watched a single YouTube tutorial video once and got the concept.


I feel like you are jumping to conclusions here. I wasn't talking about the achievement or the AI; I was talking about the article and the way it explains finding diamonds in Minecraft to people who don't know how to find them.


The AI is able to learn from video and you don't find that even a little bit impressive? Well, I disagree.



Financial collapse? Surely we can just roll out AI-powered money printers and make them go BRRR /s


I am not American, nor do I live in America, so I don't really have a horse in this race, but the DOGE approach seems to be the classic "move fast and break things" approach. The reactions to it are the classic reactions to that approach: competent people speak out to get broken things fixed, and others are confused about what is happening.

