Hi everyone,
I am looking to fine-tune Llama 2 (both the 7B and 70B variants, to see if there is a big difference), and I am comparing the different cloud options for GPUs.
There are, of course, the big cloud providers like AWS, and then smaller ones like Paperspace and co.
I am trying to benchmark each in terms of price, ease of use, quick availability of GPUs, and feature-richness.
Could you share your insights on big vs. small cloud providers for training an LLM? If you have other criteria for making the decision, I would be interested in those too!
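One thing that can narrow the provider/GPU choice quickly is a rough VRAM estimate per model size. As a sketch (the 12-bytes-per-parameter rule of thumb for fp16 weights plus gradients plus Adam optimizer states is an assumption, and it ignores activations and any memory-saving techniques like LoRA or quantization):

```python
# Back-of-envelope VRAM estimate for full fine-tuning in fp16/bf16.
# Assumed rule of thumb: weights (2 bytes) + gradients (2 bytes)
# + Adam optimizer states (8 bytes) = 12 bytes per parameter,
# not counting activations.

def vram_gb(params_billion: float, bytes_per_param: int = 12) -> float:
    """Rough VRAM in GiB needed to fully fine-tune a model of the given size."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

for size in (7, 70):
    print(f"Llama 2 {size}B: ~{vram_gb(size):.0f} GiB")
```

Numbers like these make it obvious why 7B fits on one or two 80 GB cards while 70B needs a multi-node setup (or parameter-efficient methods), which in turn changes which providers are even in the running.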
Start small: when you cannot get any improvements over several days or weeks of different experiments and have more or less converged, move up to the next model size and train that to convergence. Then repeat this loop as you scale.
Same concept as mipmapping, just with compute. How you use your resources matters more than which ones you have. I've made the vast majority of my own big discoveries with a T4 or a single A100, generally speaking, and I've done this for years.
In terms of providers, I personally like Lambda the best, but I do a shocking amount of work in Colab Pro because of its iterative nature. I believe I've had GPU availability issues with both of them, however.