
Call me when it can do Russian Cursive.


Seems to do an OK job:

https://g.co/gemini/share/e173d18d1d80

This is a random image from Twitter with no transcript or English translation provided, so it's not going to be in the training data.


That's Gemini 2.5 Flash btw

The result from Gemini 3 Pro using the default media resolution (the medium one): "(Заголовок / Header): Арсеньев (Фамилия / Surname - likely "Arsenyev")

    Состояние удовл-

    t N, кожные

    покровы чистые,

    [л/у не увел.]

    В зеве умерен. [умеренная]

    гипер. [гиперемия]

    В легких дыха-

    ние жесткое, хрипов

    нет. Тоны серд-

    [ца] [ритм]ичные.

    Живот мяг-

    кий, б/б [безболезненный].

    мочеисп. [мочеиспускание] своб. [свободное]

    Ds: ОРЗ [или ОРВИ]" and with the translation: "Arsenyev
Condition satisfactory. Temp normal, skin coverings [skin] are clean, lymph nodes not enlarged. In the throat [pharynx], moderate hyperemia [redness]. In the lungs, breathing is rigid [hard], no rales [crackles/wheezing]. Heart tones are rhythmic. Abdomen is soft, painless. Urination is free [unhindered]. Diagnosis: ARD (Acute Respiratory Disease)."


My first language is Russian. I can't fully understand this dreaded "doctor's cursive", but I can see that some parts of Gemini's text are probably wrong.

It's most likely "но кашель сохр-ся лающий" ("but barking cough is still present"), not "кожные покровы чистые" ("the skin is clean"). The diagnosis is probably wrong too. Judging by the symptoms it should be "ОРЗ", but I have no idea what's actually written there.

Still, it's very, very impressive.


Ok fine I'm impressed


No, the transcription has nothing to do with the written text. It guessed a few words here and there, but not even the general topic. That's a doctor's note about a patient visit, beginning with "Прием: состояние удовл., t*, но кашель" ("patient visit: condition is OK, t (temperature normal?), but coughing"). But unreadable doctors' handwriting is a meme...


This is a historical church document from the 19th century. Gemini got the common words right but completely hallucinated the names of the village and the people.

https://gemini.google.com/share/f98de1d5ac55


Right, it can do modern writing, but on anything older than a century (church records and censuses) it produces garbage. Yandex Archives figured that out and has a single-digit CER (character error rate), but they have the resources to collect immense amounts of training data. I'm slowly building a dataset for fine-tuning a TrOCR model, and the best it can do is a CER of 18%, which is sort of readable.
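
For anyone unfamiliar with the metric: CER is usually computed as the character-level edit distance between the model output and the ground-truth transcript, divided by the length of the ground truth. A minimal sketch in plain Python follows; the function names `levenshtein` and `cer` are mine for illustration, not from any particular library, and real evaluations often normalize whitespace or case first.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance over reference length."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One wrong character out of four gives a CER of 0.25 (25%).
    print(cer("abcd", "abxd"))
```

By this definition an 18% CER means roughly one character in five or six needs correcting, which matches the "sort of readable" description. Note that CER can exceed 1.0 when the output is much longer than the reference.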


How do you do, fellow TrOCR fine-tuner?

I'm using TrOCR because it's a smaller model that I can fine-tune on a consumer card, but the age of the model and the limited resources certainly make it a challenge. The official fine-tuning notebook hasn't been updated in years and has several errors due to the march of progress in the primary packages.


I think I based my notebook on the official example, but yes, at some point new versions of the libraries completely broke it. I had to pin the versions for it to work again.

This one works; you can check the versions: https://pastebin.com/QPjGHN8j



