More

in-silico · 2026-02-05T23:07:11 1770332831

Hyper-targeted ads to extract more value from users' pockets.

The researchers on this project sure are putting their effort towards making the world a better place...

in-silico · 2026-02-05T20:15:27 1770322527

Hyper-targeted ads to extract more value from users' pockets.

The researchers on this project sure are putting their effort towards making the world a better place...

in-silico · 2026-01-31T00:03:59 1769817839

Hacker News gets a lot less creepy/sad/interesting when you ignore the first-person pronouns and remember they're just biomolecular machines. It's a scaled up version of E. coli. Useful, sure, but there's no reason to ascribe emotions to it. It's just chemical chain reactions.

xyzsparetimexyz · 2026-01-31T16:31:06 1769877066

The only thing I know for sure is that I exist. Given that I exist, it makes sense to me that others of the same rough form as me also exist. My parents, friends, etc. Extrapolating further, it also makes sense to assume (pre-ai, bots) that most comments have a human consciousness behind them. Yes, humans are machines, but we're not just machines. So kindly sod off with that kind of comment.

DiogenesKynikos · 2026-02-03T06:27:10 1770100030

"Yes, LLMs are machines, but we're not just machines. So kindly sod off with that kind of comment."

illiac786 · 2026-01-31T17:33:28 1769880808

Makes zero sense. “Emotion” is a property of these “biomolecular machines”, by its definition.

in-silico · 2026-01-31T20:29:57 1769891397

But if you weren't one of them, would you be able to tell that they had emotions (and not just simulations of emotions) by looking at them from the outside?

illiac786 · 2026-01-31T20:51:42 1769892702

If I wasn’t one of them I wouldn’t care. It’s like caring about trees having branches. They just do. The trees probably care a great deal about their branches though, like I care a great deal about my emotions.

in-silico · 2026-01-31T21:10:14 1769893814

Well some people appreciate the world around them, and would care about it just as they care about trees having branches.

illiac786 · 2026-02-01T05:23:04 1769923384

Some people definitely, but you made a point that you don’t. People are “biomolecular machines” and they are “useful, sure”.

I wouldn’t call that “appreciating the world around oneself”.

Want that your whole point, that people aren’t better than machines?

in-silico · 2026-02-01T05:40:39 1769924439

Yes, my point was that people aren't better than machines, but just because I don't exceptionalize humanity doesn't mean I don't appreciate it for what it is (in fact I would argue that the lack of exceptionality makes us more profound).

throwaway555590 · 2026-02-02T06:20:14 1770013214

I wouldn't proclaim a lack of exceptionality until we get human level AI. There could still be some secrets left in these squishy brains we carry around.

in-silico · 2026-01-29T18:54:48 1769712888

The endgame has nothing to do with gaming.

The goal of world models like Genie is to be a way for AI and robots to "imagine" things. Then, they could practice tasks inside of the simulated world or reason about actions by simulating their outcome.

in-silico · 2026-01-29T18:51:56 1769712716

Everyone here seems too caught up in the idea that Genie is the product, and that its purpose is to be a video game, movie, or VR environment.

That is not the goal.

The purpose of world models like Genie is to be the "imagination" of next-generation AI and robotics systems: a way for them to simulate the outcomes of potential actions in order to inform decisions.

benlivengood · 2026-01-29T19:40:34 1769715634

Agreed; everyone complained that LLMs have no world model, so here we go. Next logical step is to backfill the weights with encoded video from the real world at some reasonable frame rate to ground the imagination and then branch the inference on possible interventions (actions) in the near future of the simulation, throw the results into a goal evaluator and then send the winning action-predictions to motors. Getting timing right will probably require a bit more work than literally gluing them together, but probably not much more.

patapong · 2026-01-30T12:02:21 1769774541

This is the most convincing take of what might actually get us to AGI I've heard so far :)

avaer · 2026-01-29T19:01:24 1769713284

Soft disagree; if you wanted imagination you don't need to make a video model. You probably don't need to decode the latents at all. That seems pretty far from information-theoretic optimality, the kind that you want in a good+fast AI model making decisions.

The whole reason for LLMs inferencing human-processable text, and "world models" inferencing human-interactive video, is precisely so that humans can connect in and debug the thing.

I think the purpose of Genie is to be a video game, but it's a video game for AI researchers developing AIs.

I do agree that the entertainment implications are kind of the research exhaust of the end goal.

in-silico · 2026-01-29T19:06:24 1769713584

Sufficiently informative latents can be decoded into video.

When you simulate a stream of those latents, you can decode them into video.

If you were trying to make an impressive demo for the public, you probably would decode them into video, even if the real applications don't require it.

Converting the latents to pixel space also makes them compatible with existing image/video models and multimodal LLMs, which (without specialized training) can't interpret the latents directly.

soulofmischief · 2026-01-30T07:22:17 1769757737

At which point you're training another model on top of the first, and it becomes clear you might as well have made one model from the start!

NitpickLawyer · 2026-01-29T19:21:43 1769714503

> I think the purpose of Genie is to be a video game, but it's a video game for AI researchers developing AIs.

Yeah, I think this is what the person above was saying as well. This is what people at google have said already (a few podcasts on gdm's channel, hosted by Hannah Fry). They have their "agents" play in genie-powered environments. So one system "creates" the environment for the task. Say "place the ball in the basket". Genie creates an env with a ball and a basket, and the other agent learns to wasd its way around, pick up the ball and wasd to the basket, and so on. Pretty powerful combo if you have enough compute to throw at it.

SequoiaHope · 2026-01-29T19:17:40 1769714260

Didn’t the original world models paper do some training in latent space? (Edit: yes[1])

I think robots imagining the next step (in latent space) will be useful. It’s useful for people. A great way to validate that a robot is properly imagining the future is to make that latent space renderable in pixels.

[1] “By using features extracted from the world model as inputs to an agent, we can train a very compact and simple policy that can solve the required task. We can even train our agent entirely inside of its own hallucinated dream generated by its world model, and transfer this policy back into the actual environment.”

https://arxiv.org/abs/1803.10122

sailingparrot · 2026-01-29T21:26:57 1769722017

> you don't need to make a video model. You probably don't need to decode the latents at all.

If you don't decode, how do you judge quality in a world where generative metrics are famously very hard and imprecise? How do you go about integrating RLHF/RLAF in your pipeline if you don't decode, which is not something you can skip anymore to get SotA?

Just look at the companies that are explicitly aiming for robotics/simulation, they *are* doing video models.

magospietato · 2026-01-30T00:33:01 1769733181

I wonder what training insights could be gained by having proven general intelligences actively navigate a generative world model?

abraxas · 2026-01-29T21:24:33 1769721873

> if you wanted imagination you don't need to make a video model. You probably don't need to decode the latents at all.

Soft disagree. What is the purpose of that imagination if not to map it to actual real world outfcomes. For this to compare them to the real world and possibly backpropagate through them you'll need video frames.

empath75 · 2026-01-29T20:20:26 1769718026

I am not sure we are at the "efficiency" phase of this.

Even if you just wire this output (or probably multiples running different counterfactuals) into a multimodal LLM that interprets the video and uses it to make decisions, you have something new.

ACCount37 · 2026-01-29T19:30:40 1769715040

If you train a video model, you by necessity train a world model for 3D worlds. Which can then be reused in robotics, potentially.

I do wonder if I can frankenstein together a passable VLA using pretrained LTX-2 as a base.

koolala · 2026-01-29T19:22:25 1769714545

What model do you need then? If you want 3D real-time understanding of how realities work? Are you focusing on "imagination" in a different abstract way?

thegabriele · 2026-01-29T19:26:48 1769714808

Sure, but at some point you want humans in the loop i guess?

thegabriele · 2026-01-29T19:26:48 1769714808

Sure, but at some point you want humans in the loop i guess?

echelon · 2026-01-29T19:30:31 1769715031

Whoa, whoa, whoa. That's just one angle. Please don't bin that as the only use case for "world models"!

First of all, there are a variety of different types of world models. Simulation, video, static asset, etc. It's a loaded term, just as the use cases are widespread.

There are world models you can play in your browser inferred entirely by your CPU:

https://madebyoll.in/posts/game_emulation_via_dnn/ (my favorite, from 2022!)

https://madebyoll.in/posts/world_emulation_via_dnn/ (updated, in 3D)

There are static asset generating world models, like WorldLabs' Marble. These are useful for video games, previz, and filmmaking.

https://marble.worldlabs.ai/

I wrote open source software to leverage marble for filmmaking (I'm a filmmaker, and this tech is extremely useful for scene consistency):

https://www.youtube.com/watch?v=wJCJYdGdpHg

https://github.com/storytold/artcraft

There are playable video-oriented models, many of which are open source and will run on your 3080 and above:

https://diamond-wm.github.io/

https://github.com/Robbyant/lingbot-world

There are things termed "world models" that really shouldn't be:

https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0

There are robotics training oriented world models:

https://github.com/leggedrobotics/robotic_world_model

Genie is not strictly robotics-oriented.

in-silico · 2026-01-29T19:44:23 1769715863

The entertainment industry, as big as it is, just doesn't have as much profit potential as robots and AI agents that can replace human labor. Just look at how Nvidia has pivoted from gaming and rendering to AI.

The other examples you've given are neat, but for players like Google they are mostly an afterthought.

echelon · 2026-01-29T19:47:03 1769716023

Robotics: $88B TAM

Gaming: $350B TAM

All media and entertainment: $3T TAM

Manufacturing: $5T TAM

Roughly the same story.

This tech is going to revolutionize "films" and gaming. The entire entertainment industry is going to transform around it.

When people aren't buying physical things, they're distracting themselves with media. Humans spend more time and money on that than anything else. Machines or otherwise.

AI impact on manufacturing will be huge. AI impact on media and entertainment will be huge. And these world models can be developed in a way that you develop exposure and competency for both domains.

edit: You can argue that manufacturing will boom when we have robotics that generalize. But you can also argue that entertainment will boom when we have holodecks people can step into.

thecupisblue · 2026-01-30T14:03:27 1769781807

Not so sure around gaming. While it opens some interesting "generate quest on demand" and "quick demo" cases, an infinite world generator wouldn't really vibe with people.

They would try it once, think its cool and stop there. You would probably have a niche group of "world surfers" that would keep playing with it.

Most people do not have an idea on what they would want to play and how it would look like - they want a curated experience. As games adapted to the mass market, they became more and more curated experiences with lots of hand-holding the player.

Yeah, a holodeck would be popular, but that's a whole different technology ballpark and akin to talking about flying cars in this context.

This will have a giant impact on robotics and general models tho, as now they can simulate action/reaction inside a world in parallel, choosing the best course, by just having a picture of the world and probably a generated image of the end result or "validators" to check if task is accomplished.

And while robotics is $88B TAM nowadays, expect it to hit $888B in the next 5-10 years, with world simulators like this being one of the reasons.

From the team side, gotta be cool to build this, feels like one of those things all devs dream about.

in-silico · 2026-01-29T19:59:34 1769716774

The current robotics industry is $88B. You have to take into account the potential future industry of general purpose robots that replace a big chunk of blue-collar work.

Robots is also just one example. A hypothetically powerful AI agent (which might also use a world model) that controls a mouse and keyboard could replace a big chunk of white-collar work too.

Those are worth 10's of trillions of dollars. You can argue about whether they are actually possible, but the people backing this tech think they are.

wasmainiac · 2026-01-30T06:13:14 1769753594

Have a source for that?

I think you are anthropomorphising the AI too much. Imagination is inspired by reality, which AI does not have. Introducing a reality which the AI fully controls (looking beyond issues of vision and physics simulation) would only induce psychosis in the AI itself since false assumptions would only be amplified.

ForceBru · 2026-01-30T11:36:13 1769772973

> psychosis in the AI itself

I think you're anthropomorphising the AI too much: what does it mean for an LLM to have psychosis? This implies that LLMs have a soul, or a consciousness, or a psyche. But... do they?

Speaking of reality, one can easily become philosophical and say that we humans don't exactly "have" a reality either. All we have are sensor readings. LLMs' sensors are texts and images they get as input. They don't have the "real" world, but they do have access to tons of _representations_ of this world.

wasmainiac · 2026-01-30T15:06:29 1769785589

> I think you're anthropomorphising the AI too much

I don’t get it. Is that supped to be a gotchya? Have you tried maliciously messing with an LLM? You can get it into a state that resembles psychosis. I mean you give it a context that is removed from reality, yet close enough to reality to act on and it willl give you crazy output.

ForceBru · 2026-01-30T20:01:11 1769803271

Sorry, I was just trying to be funny, no gotcha intended. Yeah, I once found some massive prompt that was supposed to transform the LLM into some kind of spiritual advisor or the next Buddha or whatever. Total gibberish, in my opinion, possibly written by a mentally unstable person. Anyway, I wanted to see if DeepSeek could withstand it and tell me that it was in fact gibberish. Nope, it went crazy, going on about some sort of magic numbers, hidden structure of the Universe and so on. So yeah, a state that resembles psychosis, indeed.

ericmcer · 2026-01-30T16:45:45 1769791545

Psychosis is obviously being used in this context to reference the very well documented "hallucinations" that LLMs experience.

oceanplexian · 2026-01-29T21:33:06 1769722386

Yeah and the goal of Instagram was to share quirky pictures you took with your friends. Now it’s a platform for influencers and brainrot; arguably it has done more damage than drugs to younger generations.

As soon as this thing is hooked up to VR and reaches a tipping point with the general public we all know exactly what is going to happen. The creation of the most profitable, addictive and ultimately dystopian technology Big Tech has ever come up with.

ceejayoz · 2026-01-29T22:24:05 1769725445

The good news is we’ll finally have an answer for the Fermi Paradox.

jacquesm · 2026-01-30T00:33:08 1769733188

What's interesting is that that has gone from an interesting paradox to something where we now have a multitude of very plausible answers in a very short time.

dryarzeg · 2026-01-29T23:54:01 1769730841

Your positive mindset impresses me, honestly. In a good way.

Ozymandias-9 · 2026-01-30T02:45:36 1769741136

wait ... how?

cellular · 2026-01-30T03:57:09 1769745429

Yeah how? A solution means EVERYONE gets sucked into vr world.

Surely a small percentage, at least, would go on to colonize.

cyanydeez · 2026-01-29T20:48:16 1769719696

Like LLMs, though: Do you really think a simulation will get them to all the corner cases robots/AI needs to know about, or will it be largely the same problem -- they'll be just good enough to fool the engineers and make the business ops drool and they'll be put into production and suddenly we'll see in a year or two stories about robots crushing peoples hands, stepping in drains and falling over or falling off roofs cause of some bizarre miscommunication between training and reality.

So, like, it's very important to understand the lineage of training and not just the "this is it"

reactordev · 2026-01-29T19:49:18 1769716158

Still cool though…

hatmanstack · 2026-01-29T19:59:42 1769716782

this.

dyauspitr · 2026-01-29T19:17:19 1769714239

That’s part of it but if you could actually pull out 3D models from these worlds, it would massively speed up game development.

avaer · 2026-01-29T19:20:50 1769714450

You already can, check out Marble/World Labs, Meshy, and others.

It's not really as much of a boon as you'd think though, since throwing together a 3D model is not the bottleneck to making a sellable video game. You've had model marketplaces for a long time now.

dyauspitr · 2026-01-30T02:48:36 1769741316

It definitely is. Model marketplaces don’t have ready to go custom models for a custom game. You have to pay a real person a significant amount of money for 100s of a models a truly custom game requires.

echelon · 2026-01-29T20:00:42 1769716842

> It's not really as much of a boon as you'd think though

It is for filmmaking! They're perfect for constructing consistent sets and blocking out how your actors and props are positioned. You can freely position the camera, control the depth of field, and then storyboard your entire scene I2V.

Example of doing this with Marble: https://www.youtube.com/watch?v=wJCJYdGdpHg

avaer · 2026-01-29T21:40:43 1769722843

This I definitely agree with, before you had to massage the I2I and now you can just drag the camera.

Marble definitely changes the game if the game is "move the camera", just most people would not consider that a game (but hey there's probably a good game idea in there!)

pizzafeelsright · 2026-01-29T19:54:48 1769716488

Environment mapping to AI generated alternative outcomes is the holodeck.

I prefer real danger as living in the simulation is derivative.

rzmmm · 2026-01-29T23:42:34 1769730154

I feel that this is too costly for that kind of usage. Probably quote different architecture is needed for robotics.

whytaka · 2026-01-29T20:33:11 1769718791

I think this is the key component of developing subjective experience.

realmadludite · 2026-01-30T01:11:15 1769735475

I think a subjective experience is impossible to explain by any substrate independent phenomenon, which includes software running on a computer.

holografix · 2026-01-30T03:58:54 1769745534

Correct and the more you interact the more you create training data

seydor · 2026-01-30T08:10:58 1769760658

Creating robots for an imaginary universe? Who needs those

ForceBru · 2026-01-30T11:37:59 1769773079

The military. The robots will roam the battlefield, imagine consequences of shooting people and performing actions that maximize the probability of success according to the results of their "imagination"/simulation.

subscribed · 2026-01-30T16:59:13 1769792353

Me! Me! I want to drive a tiny robot though the generated world.

Read "Stars don't dream" by Chi Hui (vol1 of "Think weirder") :)

slashdave · 2026-01-29T19:52:29 1769716349

This is a video model, not a world model. Start learning on this, and cascading errors will inevitably creep into all downstream products.

You cannot invent data.

kingstnap · 2026-01-29T21:58:03 1769723883

Related: https://arxiv.org/abs/2601.03220

This is a paper that recently got popular ish and discusses the counter to your viewpoint.

> Paradox 1: Information cannot be increased by deterministic processes. For both Shannon entropy and Kolmogorov complexity, deterministic transformations cannot meaningfully increase the information content of an object. And yet, we use pseudorandom number generators to produce randomness, synthetic data improves model capabilities, mathematicians can derive new knowledge by reasoning from axioms without external information, dynamical systems produce emergent phenomena, and self-play loops like AlphaZero learn sophisticated strategies from games

In theory yes, something like the rules of chess should be enough for these mythical perfect reasoners that show up in math riddles to deduce everything that *can* be known about the game. And similarly a math textbook is no more interesting than a book with the words true and false and a bunch of true => true statements in it.

But I don't think this is the case in practice. There is something about rolling things out and leveraging the results you see that seems to have useful information in it even if the roll out is fully characterizable.

slashdave · 2026-01-29T22:26:19 1769725579

Interesting paper, thanks! But, the authors escape the three paradoxes they present by introducing training limits (compute, factorization, distribution). Kind of a different problem here.

What I object to are the "scaling maximalists" who believe that if enough training data were available, that complicated concepts like a world model will just spontaneously emerge during training. To then pile on synthetic data from a general-purpose generative model as a solution to the lack of training data becomes even more untenable.

andy12_ · 2026-01-30T12:44:18 1769777058

How is it not a world model? The latents of the model apparently encode enough information to represent a semi-consistent interactuable world. Seems enough world-modely to me.

Besides, we already know that agents can be trained with these world models successfully. See[1]:

> By learning behaviors in imagination, Dreamer 4 is the first agent to obtain diamonds in Minecraft purely from offline data, without environment interaction. Our work provides a scalable recipe for imagination training, marking a step towards intelligent agents

[1] https://arxiv.org/pdf/2509.24527

2bitencryption · 2026-01-29T19:56:47 1769716607

Given that the video is fully interactive and lets you move around (in a “world” if you will) I don’t think it’s a stretch to call it a world model. It must have at least some notion of physics, cause and effect, etc etc in order to achieve what it does.

slashdave · 2026-01-29T21:20:52 1769721652

No, it actually needs none of that.

in-silico · 2026-01-30T06:05:10 1769753110

How would it do what it does without those things?

slashdave · 2026-01-30T06:32:31 1769754751

Like all these models work, by simple interpolation.

in-silico · 2026-01-30T09:26:13 1769765173

But how does it interpolate?

slashdave · 2026-01-30T19:47:53 1769802473

Pixel by pixel, time-slice by time-slice, in a 2D+T convolution. You provide enough examples of videos of changing point-of-view, and the model reproduces what it is given.

in-silico · 2026-01-31T00:11:21 1769818281

Yes, it reproduces what it is given by modelling the rules of physics, geometry, etc.

For example, image generators like stable diffusion carry strong representations of depth and geometry, such that performant depth estimation models can be built out of them with minimal retraining. This continues to be true for video generation models.

Early work on the subject: https://arxiv.org/pdf/2409.09144

slashdave · 2026-02-04T06:56:37 1770188197

What? No, it does no such thing. Study the architecture. Pixels in. Pixels out.

whytaka · 2026-01-29T20:35:05 1769718905

They have a feature where you can take a photo and create a world from that.

If instead of a photo you have a video feed, this is one step closer to implementing subjective experience.

realmadludite · 2026-01-30T01:11:57 1769735517

It's not a subjective experience. It's the mimicry of a subjective experience.

in-silico · 2026-01-29T03:49:58 1769658598

> Current LLMs struggle here because they’re trained on imitation. They learn what people said about competitive dynamics, not how competition unfolds. They can recite game theory but can’t simulate a price war.

I don't think this is true. LLM training data almost certainly contains accounts of competitions and other events unfolding. In fact, depending on how the data was filtered, there might be more data of competitions and price wars unfolding than data about game theory.

I would challenge the author to actually task a frontier LLM with simulating a price war and see if it fails, rather than assuming it would. In fact, that would be my feedback on many of these "LLMs can't do X" articles, because often the best models can in fact do X.

ankit219 · 2026-01-29T17:45:40 1769708740

you are comparing post hoc narratives in the training data to real time learning from causal dynamics. The objectives are different. They may look the same in scenarios where its heavily and accurately documented, but most narratives suffer from survivorship bias and reasoning post facto, eulogising the given outcomes.

in-silico · 2026-01-29T03:36:15 1769657775

The models don't need to be perfect. They only need to be as reliable as humans.

Emergency department doctors misdiagnose about 5% of patients [1], so replacing them with an LLM that hallucinates on 1% of cases would actually be a significant improvement.

1: https://effectivehealthcare.ahrq.gov/products/diagnostic-err...

in-silico · 2026-01-27T19:51:09 1769543469

A lot of people point to the Muon optimizer that Moonshot (the creators of Kimi) pioneered. Compared to the standard optimizer AdamW, Muon amplifies low-magnitude gradient directions which makes the model learn faster (and maybe gives Kimi its unique qualities).

Muon paper: https://arxiv.org/abs/2502.16982

Alifatisk · 2026-01-27T21:27:02 1769549222

Wow! Thank you

in-silico · 2026-01-12T19:16:52 1768245412

Why can't you just leave H_res as the identity matrix (or just not use it at all)? In that case, the model is basically a ResNet again and you don't need to worry about exploding/vanishing gradients from H_res.

I would think that H_post and H_pre could cover the lost expressiveness.

in-silico · 2026-01-11T19:59:54 1768161594

I'm working on a continuous chain-of-thought reasoning architecture for generative AI models.

It is similar to Meta's COCONUT. However, instead of the training forward and backwards passes of the reasoning tokens being done serially (slow), they are done in parallel (fast). At inference the reasoning tokens are still decoded serially. The trick is that even though training was done in parallel, the reasoning tokens are still causal and you still get the benefits of increased computational circuit depth.

The kicker is that the architecture is modality agnostic (it doesn't rely on language for its chains of thought), and I want to use it to bring COT reasoning to protein and anti-body generation. Basically, I hope for it to be the OpenAI o1 or DeepSeek R1 for domain-specialized scientific AI.