Hacker News
Big vs. small GPU clouds for fine-tuning LLMs
12 points by DanyWin on Aug 12, 2023 | hide | past | favorite | 16 comments
Hi everyone,

I am looking to fine-tune Llama 2 (both the 7B and the 70B, to see if there is a big difference), and I am comparing the different cloud options for GPUs.

There are of course the big cloud providers like AWS, and the smaller ones like Paperspace and co.

I am trying to benchmark each in terms of price, ease of use, quick availability of GPUs, and feature-richness.

Could you share your insights on big vs. small cloud providers when training an LLM? If you have other criteria for making the decision, I would be interested in those too!



Start with 7B and refine as much as you possibly can there. Your value is going to be determined by the variable iteration_time, where your final equation involves 1/iteration_time. Any lessons learned there will effectively scale.

When you cannot get any improvements over several days/weeks of different experiments and have converged somewhat, then move up to the next model size, and do that to convergence. Then repeat this loop as you go.
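The refine-then-scale loop described above can be sketched in code. Everything here is illustrative: train_and_eval is a stand-in that just simulates diminishing returns, and the patience/min_delta values are arbitrary.

```python
# Sketch of "refine the small model to convergence, then move up a size".
# train_and_eval is hypothetical; it simulates a run whose validation loss
# improves for a while and then plateaus, so the control flow is runnable.

def train_and_eval(model_size, attempt):
    base = {"7B": 2.0, "13B": 1.8, "70B": 1.6}[model_size]
    return base - min(attempt, 5) * 0.05  # plateaus after 5 attempts

def refine_to_convergence(model_size, patience=3, min_delta=1e-3):
    best, stalls, attempt = float("inf"), 0, 0
    while stalls < patience:
        loss = train_and_eval(model_size, attempt)
        attempt += 1
        if best - loss > min_delta:
            best, stalls = loss, 0  # still improving at this scale
        else:
            stalls += 1             # no meaningful gain this attempt
    return best

# Only move to the next model size once the smaller one has converged.
for size in ["7B", "13B", "70B"]:
    print(size, refine_to_convergence(size))
```

The point is the outer loop: lessons (and hyperparameters) found cheaply at 7B carry forward, so each larger size starts from a better place.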

Same concept as mipmapping, just with resources. How you use your resources is more important than the ones you have. I've made the vast majority of my own big discoveries with a T4 or a single A100, generally speaking, and I've done this for years.

In terms of providers, I like Lambda the best, personally, but I do a shocking amount of work in Colab Pro due to its iterative nature. I believe I've had GPU availability issues with both of them, however.


I have definitely run into GPU scarcity lately on Colab Pro, and I have VERY limited use compared to many people here or researchers and enthusiasts at large.


I've been wondering what's driving much of it, to be honest. Before, I didn't have many problems securing A100s; now shortages seem a lot more frequent (I had taken a short break from things).

I made a training speedrunning repository that sort of assumes anyone can grab an A100 on Colab, but I guess that's not true as much now. Which honestly is a bit of a surprise for me! DDDD::::


I've had good experiences renting 4x 4090 and 6x A5000/A6000 machines from Vast AI [0]. Surprisingly good, but some clients do have issues with multi-week uptime.

0 - https://cloud.vast.ai/?ref_id=74601


Not affiliated with the URL: https://www.unite.ai/best-gpu-hosting-providers/

also this: https://vast.ai/

Maybe also Vultr.


I’ve written quite a bit about this. See https://gpus.llm-utils.org/cloud-gpu-guide/#which-gpu-cloud-... as well as some of the other posts linked in that “which gpu cloud should I use” section.

Edit: And I realize you emailed me the other day - hi again!

See also - https://hn.algolia.com/?dateRange=pastMonth&page=0&prefix=fa...


You can fine-tune Llama 2 7B/13B/70B on Cerebrium (https://www.cerebrium.ai). Disclosure: I am the founder.

We allow you to change all the hyperparameters, such as num_epochs, learning_rate, tokenizer, etc., and you can even submit your own prompt templates. If you want to use the recommended settings, you can. It lets you focus on your data and your hyperparameters rather than worrying about setup and infrastructure.
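To make that concrete, a config along these lines is what's meant; the exact payload shape and any field names beyond the ones mentioned in the comment (num_epochs, learning_rate, tokenizer, prompt template) are my assumptions, not Cerebrium's actual API.

```python
# Illustrative fine-tuning config; field names beyond those named in the
# comment are hypothetical, and the prompt template is a common
# instruction-tuning layout, not a Cerebrium-specific one.
finetune_config = {
    "model": "llama-2-7b",
    "num_epochs": 3,
    "learning_rate": 2e-4,
    "tokenizer": "meta-llama/Llama-2-7b-hf",
    "prompt_template": (
        "### Instruction:\n{instruction}\n\n### Response:\n{response}"
    ),
}
print(finetune_config["num_epochs"])
```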

If you want to write your own code to train, you can do that as well - the interface is as if you were developing locally :)


I use vast.ai to train 110M models - much smaller, but it is good value for money and I am sure you can scale it up. I just use a single RTX 3090.

You might also consider Lambda Labs. Or GCP.


We have built Shadeform (https://shadeform.ai), which has all the small cloud providers in a single, unified API and platform.


I guess you are early in your journey, but there is no pricing.


You can access our platform with live pricing and availability here: https://platform.shadeform.ai


What claud said.

And QLoRA makes 70B training relatively affordable.
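A rough back-of-envelope on why QLoRA helps: the frozen base weights are held in 4-bit NF4 (about 0.5 bytes per parameter) and only the small LoRA adapters are trained in higher precision. A sketch of the weights-only arithmetic (real runs add activations, adapter optimizer state, and framework overhead, so treat these as lower bounds):

```python
# Memory for just the base-model weights at a given bit width.
# 70B at fp16 is ~140 GB (multi-GPU territory); at 4-bit it is ~35 GB,
# which fits on a single 40/80 GB card with room for LoRA adapters.

def qlora_base_memory_gb(n_params, bits=4):
    return n_params * bits / 8 / 1e9  # params * bytes-per-param, in GB

print(qlora_base_memory_gb(70e9, bits=16))  # full-precision-ish baseline
print(qlora_base_memory_gb(70e9, bits=4))   # 4-bit quantized base
```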

But as a random aside, consider starting with an existing finetune instead of base llama 70B, and match the formatting in your dataset.


Is there a particular fine tune you would suggest starting from?

I really wish there were (maybe there is?) a 7B or 13B (or 70B) version with extended context (at least 16k) and function-calling support a la OpenAI.

Both exist on their own, I don't know of a combination.


TBH I am out of the loop on finetunes, but you can search for "70B" on huggingface and sort by date.


I tried Paperspace, but their pipelines feature bugged out, and the support was entirely unhelpful.

I would say it's not production-ready.


I like RunPod, although I've found that I typically have to set NCCL_P2P_DISABLE=1.
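For anyone hitting the same thing: NCCL reads that flag from the environment when it initializes, so it just needs to be set before the distributed run starts. A minimal sketch (the training launch command is a placeholder):

```python
import os

# Disable NCCL peer-to-peer transport; NCCL picks this up from the
# environment at init time, so set it before any distributed setup runs.
os.environ["NCCL_P2P_DISABLE"] = "1"

# Then launch training as usual, e.g. (placeholder, not run here):
# subprocess.run(["torchrun", "--nproc_per_node=2", "train.py"])

print(os.environ["NCCL_P2P_DISABLE"])
```

Setting it in a wrapper script or shell (`export NCCL_P2P_DISABLE=1`) before `torchrun` works just as well.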



