Call me when it can do Russian Cursive.

decimalenough · 2025-12-03T09:58:24 1764755904

Seems to do an OK job:

This is a random image from Twitter with no transcript or English translation provided, so it's not going to be in the training data.

GaggiX · 2025-12-03T10:42:24 1764758544

That's Gemini 2.5 Flash btw

The result from Gemini 3 Pro using the default media resolution (the medium one): "(Заголовок / Header): Арсеньев (Фамилия / Surname - likely "Arsenyev")

    Состояние удовл-

    t N, кожные

    покровы чистые,

    [л/у не увел.]

    В зеве умерен. [умеренная]

    гипер. [гиперемия]

    В легких дыха-

    ние жесткое, хрипов

    нет. Тоны серд-

    [ца] [ритм]ичные.

    Живот мяг-

    кий, б/б [безболезненный].

    мочеисп. [мочеиспускание] своб. [свободное]

    Ds: ОРЗ [или ОРВИ]" and with the translation: "Arsenyev

Condition satisfactory. Temp normal, skin coverings [skin] are clean, lymph nodes not enlarged. In the throat [pharynx], moderate hyperemia [redness]. In the lungs, breathing is rigid [hard], no rales [crackles/wheezing]. Heart tones are rhythmic. Abdomen is soft, painless. Urination is free [unhindered]. Diagnosis: ARD (Acute Respiratory Disease)."

red75prime · 2025-12-03T18:41:37 1764787297

My first language is Russian. I can't fully understand this dreaded "doctor's cursive", but I can see that some parts of Gemini's text are probably wrong.

It's most likely "но кашель сохр-ся лающий" ("but barking cough is still present"), not "кожные покровы чистые" ("the skin is clean"). The diagnosis is probably wrong too. Judging by the symptoms it should be "ОРЗ", but I have no idea what's actually written there.

Still, it's very, very impressive.

__alexs · 2025-12-03T18:27:42 1764786462

Ok fine I'm impressed

shatsky · 2025-12-03T10:20:08 1764757208

No, the transcription has nothing to do with the written text. It guessed a few words here and there, but not even the general topic. That's a doctor's note about a patient visit, beginning with "Прием: состояние удовл., t*, но кашель / patient visit: condition is OK, t(temperature normal?) but coughing". But unreadable doctors' handwriting is a meme...

myth_drannon · 2025-12-03T15:37:04 1764776224

This is a historical church document from the 19th century. Gemini got the common words right but completely hallucinated the names of the village and the people.

https://gemini.google.com/share/f98de1d5ac55

myth_drannon · 2025-12-03T13:51:12 1764769872

Right, it can do modern writing, but give it anything older than a century (church records and census) and it produces garbage. Yandex Archives figured that out and have a CER (character error rate) in the single digits, but they have the resources to collect immense amounts of data for training. I'm slowly building a dataset for fine-tuning a TrOCR model, and the best it can do is a CER of 18%... which is sort of readable.
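For anyone unfamiliar with the metric: CER is usually defined as the character-level edit distance between the model output and the ground-truth transcript, divided by the length of the ground truth. Here is a minimal sketch of that definition in plain Python. The function names `levenshtein` and `cer` are my own, and this is not necessarily how Yandex or any particular toolkit computes it (normalization of whitespace, case and punctuation varies between evaluations).

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `hyp` into `ref`."""
    # prev[j] = distance between the first i-1 chars of ref and first j chars of hyp
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a character of ref
                curr[j - 1] + 1,     # insert a character of hyp
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


# One wrong character in a seven-character word
print(f"{cer('покровы', 'покрова'):.3f}")
```

By this definition a CER of 18% means roughly one character in every five or six needs correcting, which matches the "sort of readable" impression.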

coredog64 · 2025-12-03T15:29:27 1764775767

How do you do, fellow TrOCR fine-tuner?

I'm using TrOCR because it's a smaller model that I can fine-tune on a consumer card, but the age of the model and of its resources certainly makes it a challenge. The official fine-tuning notebook hasn't been updated in years and has several errors due to the march of progress in the primary packages.

myth_drannon · 2025-12-03T22:40:37 1764801637

I think I based my notebook on the official example, but yes, at some point new versions of the libraries completely broke it. I had to pin the versions for it to work again.

This one works; you can check the versions: https://pastebin.com/QPjGHN8j