Yes, this was completely image-based. It's not quite at the point where I'd use it in production, since I agree it can be flaky at times. I do think there are viable workarounds, though, like sending the same prompt multiple times and checking whether the returned results overlap.
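For what it's worth, here's a rough sketch of that retry-and-compare idea. It assumes each call returns a list of extracted items (strings); `ask` is a hypothetical stand-in for whatever function sends the image and prompt to the model, and `runs` and `min_votes` are made-up knobs, not anything from a real API.

```python
from collections import Counter
from typing import Callable, Iterable


def consensus(ask: Callable[[], Iterable[str]], runs: int = 5, min_votes: int = 3) -> set[str]:
    """Call `ask` `runs` times and keep only items seen in at least `min_votes` runs.

    `ask` is a hypothetical zero-argument callable wrapping the model request.
    """
    if runs < 1 or min_votes < 1:
        raise ValueError("runs and min_votes must be positive")
    votes: Counter[str] = Counter()
    for _ in range(runs):
        # Deduplicate within a single response so one run counts at most once per item.
        votes.update(set(ask()))
    return {item for item, n in votes.items() if n >= min_votes}
```

Anything that only shows up in one or two of the runs gets dropped, so a one-off hallucinated result is less likely to make it through. It costs several times as many requests, of course.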
It really feels like we're maybe half a model generation away from this being a solved problem.