It's definitely not the case for me. I have models trained on the same 14 MB dataset (though I needed to tweak more for the 1.5B).
The 1.5B outperforms it here if trained long enough - in this case 1-2 months, since I was doing it all for free on Colab.
One of the big things was batching - it seems like nobody really tries to do larger batches with the biggest models, and without batching, given how little data I had, the model was getting stuck.
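Roughly what I mean, as a minimal sketch of a gradient-accumulation loop in PyTorch - the model call assumes a HuggingFace-style model that returns a loss, and get_batches, accum_steps, and the learning rate are placeholders, not my exact setup:

    import torch

    accum_steps = 32  # effective batch = accum_steps * micro-batch size
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # low LR for late stages

    model.train()
    optimizer.zero_grad()
    for step, (inputs, labels) in enumerate(get_batches()):
        loss = model(inputs, labels=labels).loss
        (loss / accum_steps).backward()  # scale so gradients average over the big effective batch
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()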
I trained for maybe ~12 hours a day; some days, especially around Christmas, I didn't train at all. I also lost a lot of days trying out different stuff, or when the weights didn't save to Drive before the Colab session timed out.
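The timeout losses are avoidable with periodic saves to a mounted Drive folder; a minimal sketch, assuming PyTorch, with a made-up path and save interval:

    import torch
    from google.colab import drive

    drive.mount('/content/drive')
    ckpt_path = '/content/drive/MyDrive/gpt2-ckpt.pt'  # illustrative path

    def save_checkpoint(step):
        # keep the optimizer state too, so training resumes cleanly
        torch.save({'step': step,
                    'model': model.state_dict(),
                    'optimizer': optimizer.state_dict()}, ckpt_path)

    # inside the training loop, after optimizer.step():
    if (step + 1) % 500 == 0:  # save often enough to survive a timeout
        save_checkpoint(step)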
Having said that, I was training the full model with an accumulated batch size for a while, so it was taking > 10 min per step. I've also been using pretty low learning rates for most of the latter stages.
Overall the model is currently at ~11k steps, and the loss could actually go down further, but after playing with different checkpoints last week, the best one didn't seem to be the newest one, so I left it at that.
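For picking between checkpoints, something like this - sweep the saved files, compare validation loss, then eyeball samples from the best few (the path and val_batches are placeholders for a held-out set, not my exact setup):

    import glob
    import torch

    best = None
    for path in sorted(glob.glob('/content/drive/MyDrive/ckpts/*.pt')):
        state = torch.load(path, map_location='cpu')
        model.load_state_dict(state['model'])
        model.eval()
        with torch.no_grad():
            # val_batches: a small list of held-out (inputs, labels) pairs
            val_loss = sum(model(x, labels=y).loss.item()
                           for x, y in val_batches) / len(val_batches)
        print(path, val_loss)
        if best is None or val_loss < best[1]:
            best = (path, val_loss)
    print('best checkpoint:', best)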