yup, this is a pretty common occurrence in using LLMs for data extraction. For p...

llm_trw · on Feb 7, 2025

It's not possible with current-gen models.

To even have a chance at doing it, you'd need to start training from scratch with _huge_ penalties for filling in missing information and a _much_ larger vision component in the model.

See an old post I made on what you need to get above-SOTA OCR that works today: https://news.ycombinator.com/item?id=42952605#42955414

amelius · on Feb 7, 2025

Maybe ask it to return the bounding box of every glyph.

thegeomaster · on Feb 7, 2025

This universally fails, on anything from frontier models to Gemini 2.0 Flash in its custom fine-tuned bounding box extraction mode.
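
For anyone who wants to try the per-glyph idea anyway, here is a rough sketch of the sanity checks you could run on whatever boxes come back. The `GlyphBox` schema and the `check_glyph_boxes` helper are made up for illustration; no model or API is assumed to return this format. Passing these checks only rules out obviously broken output; it does not show the text was actually read from the image rather than filled in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlyphBox:
    """One glyph as a model might report it (hypothetical schema)."""
    char: str
    x0: float
    y0: float
    x1: float
    y1: float


def check_glyph_boxes(text, boxes, width, height):
    """Return a list of human-readable problems; an empty list means no red flags."""
    problems = []
    # Whitespace has no ink, so only non-whitespace characters need a box.
    glyphs = [c for c in text if not c.isspace()]
    if len(glyphs) != len(boxes):
        problems.append(f"{len(glyphs)} glyphs in text but {len(boxes)} boxes")
    for i, (c, b) in enumerate(zip(glyphs, boxes)):
        if b.char != c:
            problems.append(f"box {i}: labelled {b.char!r}, text has {c!r}")
        # A usable box has positive area and lies inside the image.
        if not (0 <= b.x0 < b.x1 <= width and 0 <= b.y0 < b.y1 <= height):
            problems.append(f"box {i}: degenerate or outside the image")
    return problems
```

If the extracted text has more glyphs than boxes, or the boxes land outside the page, that is at least a cheap signal that the output should not be trusted as-is.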