What's mind blowing is that you can extrapolate where this is going to go. Event...

treis · on Sept 29, 2022

IMHO this particular avenue is a dead end. It's an extraordinarily impressive dead end but it's clear that there's no real understanding here. Look at this video of the knight riding a horse:

>https://makeavideo.studio/assets/A_knight_riding_on_a_horse_...

The horse's face is all wrong

The gait is wrong

The interface with the ground & hooves is wrong

The knight's upper body doesn't match with the lower and they're not moving correctly

I think ultimately the right path is something like AI automated Blender. AI creates the models & actions while Blender renders it according to a rules based physics engine.

yreg · on Sept 29, 2022

Of course there "is no understanding here", but yet it's not all wrong. Somehow it did move the horse's legs roughly correctly (using the proper joints and all), somehow the cape is moving roughly as it should through the air and the knight's body absorbs the force of stomping on the ground…

It doesn't seem that the fundamental inability to understand what is going on in the scene is stopping models of this kind to eventually lead to realistic results.

Same applies to DALL-E and GPT.

treis · on Sept 30, 2022

I guess it comes down to how much wiggle room is in "roughly". If you watch the video closely the horse briefly gets a 5th leg when the front left one moves the first time. And yes the legs and joints are sort of right but they don't match up with the direction the horse is moving and wouldn't work in the real world.

It's superficially close but when you look at details they're all slightly off. To wit:

> the knight's body absorbs the force of stomping on the ground…

But the Knight doesn't have any way to see through the helmet.

yreg · on Sept 30, 2022

I would say that the wiggle room is smaller then I would expect it to be. Don't you agree?

And I was surprised about how small the wiggle room was the first time I interacted with GPT in text or saw the first images from DALL-E, since I too expected them to be (severly) limited by not understanding what's actually respresented in the input/output.

With new versions the wiggle room shrinks further. So I guess the question is, whether it will be able to shrink enough to be satisfactory. We will see…

numtel · on Sept 29, 2022

"Don't look where we are right now but imagine where we'll be two more papers down the line" - Two minute papers

giarc · on Sept 29, 2022

Would be interesting to input some existing screenplays into a future tool like this and see what comes out.

SuperCuber · on Sept 29, 2022

Related: https://pub.towardsai.net/stable-diffusion-based-image-compr...

bdickason · on Sept 29, 2022

Or full 3D scenes that are interactive?