The problem with the current crop of projectors such as LLaVA is that, as far as I know, they do not take the previous conversation into account. You only really get zero-shot responses. This means you cannot steer the model towards paying attention to specific instruction-related details. The projector simply creates a token representation of the visuals (not necessarily human-language tokens) and the LLM just processes that as usual.
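To make the point concrete, here's a minimal sketch of a LLaVA-style projector. All the dimensions, names, and the two-layer MLP shape are illustrative assumptions, not the actual LLaVA code; the thing to notice is that the projection is a fixed function of the image features alone, with no input from the conversation history.

```python
import numpy as np

# Hypothetical dimensions, for illustration only: a ViT-style encoder
# emits one embedding per image patch, and a small MLP maps each patch
# embedding into the LLM's token-embedding space.
VISION_DIM = 1024   # per-patch embedding size from the vision encoder
LLM_DIM = 4096      # the LLM's hidden/embedding size
NUM_PATCHES = 576   # e.g. a 24x24 patch grid

rng = np.random.default_rng(0)

# Two-layer MLP projector (LLaVA-1.5 uses a similar mlp2x_gelu design).
W1 = rng.standard_normal((VISION_DIM, LLM_DIM)) * 0.02
W2 = rng.standard_normal((LLM_DIM, LLM_DIM)) * 0.02

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def project(patch_embeddings):
    """Map vision-encoder patch embeddings into LLM embedding space.

    Note: no conversation state is passed in -- the visual tokens come
    out the same regardless of what instructions preceded the image.
    """
    return gelu(patch_embeddings @ W1) @ W2

patch_embeddings = rng.standard_normal((NUM_PATCHES, VISION_DIM))
visual_tokens = project(patch_embeddings)
print(visual_tokens.shape)  # one "soft token" per patch
```

The resulting `visual_tokens` are spliced into the text-token sequence and handled by the LLM like any other embeddings, which is why the steering you'd want has to happen downstream rather than in the projection itself.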
The original GPT-4 did this too: it had almost no memory of the conversation before or after the image provided. I haven't tested GPT-4o on this directly, but from casual usage my sense is that it's better.
I do think some of these thin line drawings are likely extra hard to tokenize, depending on how the image is scaled for tokenization. I'd wager thicker lines would help, although obviously not all of the failure comes down to 'poor tokenization'.