
ULMFiT and OpenAI's Transformer* are quite different. Both are pretrained language models, but ULMFiT is a standard stack of LSTMs with a particular recipe for fine-tuning, whereas OpenAI's model uses the much newer Transformer architecture with no especially fancy tricks in the fine-tuning itself. I suspect the difficulty lies with the Transformer model itself; this is not the first time I've heard that it is hard to train.

* = To be clear, this refers to OpenAI's pretrained Transformer model. The Transformer architecture itself originated in work at Google.
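
For anyone who wants to see the difference concretely, here's a minimal PyTorch sketch of the two kinds of backbone. This is illustrative only, with made-up layer sizes, not either paper's actual code: ULMFiT pretrains a stack of LSTMs, while OpenAI's model stacks Transformer blocks with a causal attention mask.

    # Illustrative sketch of the two pretraining backbones.
    # Hyperparameters are placeholders, not the papers' settings.
    import torch
    import torch.nn as nn

    class LSTMLanguageModel(nn.Module):
        """ULMFiT-style backbone: embedding -> stacked LSTMs -> decoder."""
        def __init__(self, vocab_size, emb_dim=400, hidden_dim=1150, num_layers=3):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
            self.proj = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens):                  # tokens: (batch, seq_len)
            out, _ = self.lstm(self.embed(tokens))  # (batch, seq_len, hidden_dim)
            return self.proj(out)                   # next-token logits

    class TransformerLanguageModel(nn.Module):
        """GPT-style backbone: embeddings + positions -> Transformer blocks."""
        def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=6, max_len=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            self.pos = nn.Embedding(max_len, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.blocks = nn.TransformerEncoder(layer, num_layers)
            self.proj = nn.Linear(d_model, vocab_size)

        def forward(self, tokens):
            seq_len = tokens.size(1)
            positions = torch.arange(seq_len, device=tokens.device)
            x = self.embed(tokens) + self.pos(positions)
            # Causal mask so each position attends only to earlier tokens.
            mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(tokens.device)
            return self.proj(self.blocks(x, mask=mask))

Both are trained the same way (predict the next token), which is why the fine-tuning recipe transfers; it's the attention-based backbone, with its sensitivity to warmup and learning-rate schedules, that tends to be the finicky part.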



