And QLoRA makes 70B finetuning relatively affordable.
But as an aside, consider starting from an existing finetune instead of base Llama 70B, and match your dataset's formatting to that finetune's prompt template.
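To illustrate the "match the formatting" point, here's a minimal sketch of rendering dataset rows into a finetune's expected prompt template. The Alpaca-style template below is just an example assumption; use whatever template the finetune you start from was actually trained with (check its model card).

```python
# Sketch: format dataset rows to match an existing finetune's prompt
# template. The Alpaca-style template here is an illustrative assumption,
# not the template of any specific model.

ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{output}"
)

def format_example(row: dict) -> str:
    """Render one row into the finetune's expected prompt format."""
    return ALPACA_TEMPLATE.format(
        instruction=row["instruction"],
        output=row["output"],
    )

rows = [
    {
        "instruction": "Summarize: QLoRA lowers finetuning memory cost.",
        "output": "QLoRA quantizes the base model to 4-bit and trains LoRA adapters on top.",
    },
]
formatted = [format_example(r) for r in rows]
print(formatted[0])
```

If the finetune was trained with a different template (ChatML, Vicuna, etc.), mismatched formatting can noticeably hurt quality, so it's worth getting this right before spending GPU hours.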
I really wish there was (maybe there is?) a 7B or 13B (or 70B) version with extended context (at least 16k) and function-calling support à la OpenAI.
Both exist on their own, I don't know of a combination.