ULM-FiT and OpenAI's Transformer* are quite different. Both are pretrained language models, but ULM-FiT is a standard stack of LSTMs with a particular recipe for fine-tuning, whereas OpenAI's Transformer uses the much newer Transformer architecture, with no particularly fancy tricks in the actual fine-tuning. I suspect the difficulty lies with the Transformer model itself; this is not the first time I've heard that it is difficult to train.
* = To be clear, this refers to OpenAI's pretrained Transformer model. The Transformer architecture was from work at Google.
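For concreteness, here is a minimal sketch of the kind of fine-tuning recipe ULM-FiT popularized (discriminative per-layer learning rates plus gradual unfreezing), written against a toy PyTorch model. This is my own illustration, not the fastai/ULM-FiT implementation: the layer sizes, the 2.6 learning-rate divisor, and the stage loop are all illustrative assumptions, and in a real run you would load pretrained language-model weights into the encoder first.

```python
# Illustrative sketch (not ULM-FiT's actual code) of its fine-tuning recipe:
# discriminative learning rates per layer group plus gradual unfreezing.
# The 3-layer LSTM encoder here stands in for the real AWD-LSTM; all
# hyperparameters are placeholders.
import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    def __init__(self, vocab_size=10000, emb=400, hidden=1150, n_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb)
        self.lstm1 = nn.LSTM(emb, hidden, batch_first=True)
        self.lstm2 = nn.LSTM(hidden, hidden, batch_first=True)
        self.lstm3 = nn.LSTM(hidden, emb, batch_first=True)
        self.head = nn.Linear(emb, n_classes)

    def forward(self, x):
        out = self.embedding(x)
        out, _ = self.lstm1(out)
        out, _ = self.lstm2(out)
        out, _ = self.lstm3(out)
        return self.head(out[:, -1])  # classify from the last time step

model = TextClassifier()
# (In practice, load pretrained LM weights into the encoder layers here.)

# Discriminative learning rates: smaller LRs for layers closer to the input.
layer_groups = [model.embedding, model.lstm1, model.lstm2, model.lstm3, model.head]
base_lr = 1e-3
optimizer = torch.optim.Adam(
    [{"params": g.parameters(),
      "lr": base_lr / (2.6 ** (len(layer_groups) - 1 - i))}
     for i, g in enumerate(layer_groups)]
)

# Gradual unfreezing: start with only the head trainable, then unfreeze one
# layer group per fine-tuning stage, working back toward the input.
for group in layer_groups[:-1]:
    for p in group.parameters():
        p.requires_grad = False

for stage in range(len(layer_groups)):
    if stage > 0:
        for p in layer_groups[-1 - stage].parameters():
            p.requires_grad = True
    # ... run one or more epochs of fine-tuning at this stage ...
```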