
Don’t forget machine-translated texts, where until ~2017 the translation was likely done by something much dumber and more semantically lossy than an LLM, and after 2017 was basically done by an early form of LLM (the Transformer architecture was introduced by Google researchers in 2017, originally for machine translation).

Many English-language news reports published on the English-language websites of news media in non-English-speaking countries, from 1998 (the Babel Fish era) until a few months ago, may be unreliable training data for this reason: they may well have been machine-translated.


