
The post mentions not getting great results with the OpenAI Transformer. I haven't tried that, but using a similar framework, ULM-FiT, I narrowly beat the fastText benchmark on a 250-class dataset we use internally. I will follow up with how it does on this dataset.
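For context, the fastText side of that comparison is just supervised fastText classification. Here is a minimal sketch using the official Python bindings; the file names and hyperparameters are placeholders, not the actual internal setup:

  import fasttext

  # train.txt / valid.txt in fastText's supervised format: one example per line,
  # prefixed with its label, e.g. "__label__billing I was charged twice ..."
  model = fasttext.train_supervised(
      input="train.txt",
      lr=0.5,
      epoch=25,
      wordNgrams=2,
      loss="softmax",  # plain softmax over the (here, 250) classes
  )

  # test() returns (number of examples, precision@1, recall@1)
  n, p_at_1, r_at_1 = model.test("valid.txt")
  print(n, p_at_1, r_at_1)

  # predict the top label for a new example
  labels, probs = model.predict("text of a new example", k=1)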


ULM-FiT and OpenAI's Transformer* are quite different. Both are pretrained language models, but ULM-FiT is a standard stack of LSTMs with a particular fine-tuning recipe, whereas OpenAI's Transformer uses the much newer Transformer architecture with no especially fancy tricks in the actual fine-tuning. I suspect the difficulty is with the Transformer model itself - this is not the first time I've heard that it is difficult to train.

* = To be clear, this refers to OpenAI's pretrained Transformer model; the Transformer architecture itself comes from work at Google.
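For anyone curious what that fine-tuning recipe looks like in practice, here is a rough sketch using fastai v1's text API (csv name, column names, and learning rates are placeholders; the actual recipe - gradual unfreezing plus discriminative learning rates - is described in the ULMFiT paper):

  from fastai.text import *

  # 1) Fine-tune the pretrained AWD-LSTM language model on the target corpus
  data_lm = TextLMDataBunch.from_csv('.', 'texts.csv', text_cols='text', label_cols='label')
  lm_learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
  lm_learn.fit_one_cycle(1, 1e-2)
  lm_learn.save_encoder('ft_enc')

  # 2) Train a classifier on top of the fine-tuned encoder,
  #    unfreezing layer groups gradually with discriminative learning rates
  data_clas = TextClasDataBunch.from_csv('.', 'texts.csv', vocab=data_lm.train_ds.vocab,
                                         text_cols='text', label_cols='label', bs=32)
  learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
  learn.load_encoder('ft_enc')

  learn.fit_one_cycle(1, 2e-2)                            # only the new classifier head
  learn.freeze_to(-2)
  learn.fit_one_cycle(1, slice(1e-2 / (2.6 ** 4), 1e-2))  # last two layer groups
  learn.unfreeze()
  learn.fit_one_cycle(2, slice(1e-3 / (2.6 ** 4), 1e-3))  # all layers

The 2.6 factor between layer-group learning rates comes from the paper's suggested defaults; in practice you'd tune it along with everything else.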


Do any freelance work? We have a small fastText project. Email in profile if you're interested.


I'd be very interested to know, thanks!



