> Was this not sort of the clear implication of the fact that most LLMs are currently only being trained with one epoch?
Slight nit: Many public LLMs are trained for at least slightly over one epoch, and usually several epochs on particular subsets of the data (like wikipedia).
Source? Maybe several epochs on some very small subsets, but my strong impression was that it was 1 epoch in the pre-training run for pretty much all of the top LLMs.