The light reflection on the shoulder doesn't make much sense. I think this is less noticeable when soft light is involved. I wonder if sharp light and reflections will take longer to be handled properly. The shoulder looks ok in every individual frame, but when animated it looks like it was lit in a different environment from the rest of the scene.
I suspect depth maps (SD2 already supports them) could be used to make the lighting more consistent in the future.
I wonder if a diffusion model could accept "onion skin" noise, so the transitions between frames would be less jarring. Can someone with more knowledge than me explain what the most promising approach is here?
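To make the "onion skin" idea a bit more concrete, here's a rough sketch of one way to read it: each frame's starting noise reuses part of the previous frame's noise instead of being drawn independently. This is just my assumption of what such noise could look like, not how SD2 or any existing video pipeline actually does it; `onion_skin_noise` and the `corr` parameter are hypothetical names. Plain Python lists stand in for latent tensors.

```python
import math
import random

def onion_skin_noise(num_frames, size, corr, seed=None):
    """Hypothetical sketch: per-frame Gaussian noise correlated with the previous frame.

    corr in [0, 1]: 0 gives independent noise per frame, 1 gives identical noise
    for every frame. Mixing with weights corr and sqrt(1 - corr**2) keeps each
    frame at unit variance, and adjacent frames have correlation corr.
    """
    if not 0.0 <= corr <= 1.0:
        raise ValueError("corr must be in [0, 1]")
    rng = random.Random(seed)
    a, b = corr, math.sqrt(1.0 - corr * corr)
    # First frame: plain standard-normal noise.
    frames = [[rng.gauss(0.0, 1.0) for _ in range(size)]]
    for _ in range(1, num_frames):
        prev = frames[-1]
        fresh = [rng.gauss(0.0, 1.0) for _ in range(size)]
        # Blend part of the previous frame's noise with fresh noise.
        frames.append([a * p + b * f for p, f in zip(prev, fresh)])
    return frames

# Example: 8 frames of 64x64 flattened noise, adjacent frames ~90% correlated.
frames = onion_skin_noise(8, 64 * 64, corr=0.9, seed=0)
```

The variance-preserving blend matters because diffusion samplers generally expect the starting noise to be unit-variance Gaussian. Note that this only ties the starting noise together; whether that alone is enough to get coherent frames (rather than also needing something like motion-aware warping of the noise) is exactly the open question.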
I think @bondarchuk took my comment out of context, quoting just the first line. No big deal, it's just hard to respond to that.
I don't even see that particular artefact as an issue; I think it's an interesting problem from a technical pov, hence literally every other part of my comment.
It just seemed like a bit of a nitpick in the context of the huge temporal consistency issues/fixes. Of course you're all free to discuss whatever you want, and I agree it's somewhat interesting.