
> Was this not sort of the clear implication of the fact that most LLMs are currently only being trained with one epoch?

Slight nit: many public LLMs are trained for slightly more than one epoch overall, and often for several epochs on particular subsets of the data (like Wikipedia).



Source? Maybe several epochs on some very small subsets, but my strong impression was that it was 1 epoch in the pre-training run for pretty much all of the top LLMs.


LLaMA, off the top of my head: https://arxiv.org/pdf/2302.13971.pdf
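
For context, Table 1 of that paper lists per-dataset sampling proportions alongside effective epoch counts: Wikipedia and Books are each seen for roughly 2.2-2.5 epochs even though the run as a whole is "about one epoch" over the mixture. A minimal Python sketch of how the epoch counts fall out of the proportions; the sampling proportions follow the paper, but the raw subset sizes in tokens are my rough assumptions, picked to land near the figures the paper reports:

  # Back-of-envelope: how sampling proportions turn into per-subset
  # epoch counts. Proportions are from Table 1 of the LLaMA paper;
  # the subset sizes (in tokens) are my own rough assumptions.

  TOTAL_TOKENS = 1.4e12  # LLaMA's stated pre-training budget, ~1.4T tokens

  # dataset -> (sampling proportion, assumed subset size in tokens)
  datasets = {
      "CommonCrawl": (0.670, 850e9),
      "C4":          (0.150, 198e9),
      "GitHub":      (0.045, 100e9),
      "Wikipedia":   (0.045, 26e9),
      "Books":       (0.045, 28e9),
  }

  for name, (proportion, size) in datasets.items():
      tokens_drawn = proportion * TOTAL_TOKENS  # tokens sampled over the run
      epochs = tokens_drawn / size              # times the subset is cycled
      print(f"{name:12s} ~{epochs:.2f} epochs")

The takeaway: a nominal one-epoch run over the full mixture can still cycle small, high-quality subsets multiple times.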


Good one; I'd managed to forget this and had "1" firmly etched in my brain.



