Hacker News

> Which part of the pricing seems high, platform or token pricing? Both?

You said that you do only LoRA finetuning, and your pricing for Llama 3.1 8B is $2/1M tokens. To me this does seem high. I can do full finetuning (not just a LoRA!) of Llama 3.1 8B for something like ~$0.2/M tokens if I rent a 4090 on RunPod, and ~$0.1/M tokens if I just get the cheapest 4090 I can find online.
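For reference, a figure like ~$0.2/M falls out of simple division of the hourly rental rate by hourly token throughput. A minimal sketch, where both the hourly rate and the training throughput are illustrative assumptions rather than measured benchmarks:

```python
# Back-of-the-envelope finetuning cost per million training tokens.
# Both inputs are assumptions for illustration, not measured numbers.
gpu_price_per_hour = 0.35   # assumed 4090 rental rate, $/hour
tokens_per_second = 500.0   # assumed full-finetune throughput for an 8B model

tokens_per_hour = tokens_per_second * 3600
cost_per_million_tokens = gpu_price_per_hour / (tokens_per_hour / 1e6)

print(f"${cost_per_million_tokens:.2f} per 1M training tokens")
```

With these assumed inputs the sketch lands at $0.19/M; a cheaper rental or higher throughput pushes it toward the ~$0.1/M figure.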



That's true when looking solely at fine-tuning costs. In theory, you could fine-tune a model locally and only cover electricity expenses. However, we provide a complete end-to-end workflow that simplifies the entire process.

Once a model is fine-tuned, you can run inference on Llama 3.2 3B for as low as $0.12 per million tokens. This includes access to logging, evaluation, and continuous dataset improvement through collaboration, all without needing to set up GPUs or manage the surrounding infrastructure yourself.

Our primary goal is to provide the best dataset for your specific use case. If you decide to deploy elsewhere to reduce costs, you always have the option to download the model weights.


Sure, I'm just comparing the baseline costs of finetuning. Assuming you own the hardware and optimize the training, I'd guess you could easily get costs significantly below $0.1/M tokens (I can get $0.1/M right now using publicly rented GPUs, and whoever rents me the GPU is still making money on me). If you're only doing LoRA, that cost should drop even further; I don't have the numbers on hand because I never do LoRA finetuning, so I have no idea how much faster it is per token compared to full finetuning.

So your $2/M tokens for LoRA finetuning tells me that either you have a very cost-inefficient finetuning pipeline (e.g. renting expensive GPUs from AWS) and need such a high price to make any money, or you're charging roughly 20x-30x more than it costs you. If it's the latter, fair enough: some people will pay a premium for all of the extra features! If it's the former, you might want to consider optimizing your pipeline to bring those costs down. (:
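The 20x-30x range is just the quoted prices divided out. A small sketch: the lower bound uses the $0.1/M rented-4090 figure from above, and the upper bound assumes (hypothetically, since the thread gives no LoRA numbers) that LoRA lowers the per-token baseline somewhat further:

```python
platform_price = 2.00        # $/M tokens, quoted LoRA finetuning price
full_ft_rented = 0.10        # $/M tokens, full finetune on a cheap rented 4090
lora_baseline_guess = 0.067  # $/M tokens, hypothetical LoRA cost (~1.5x cheaper, assumed)

print(f"{platform_price / full_ft_rented:.0f}x")       # markup vs. full-finetune baseline
print(f"{platform_price / lora_baseline_guess:.0f}x")  # markup vs. assumed LoRA baseline
```

This prints 20x and 30x; the upper bound moves with whatever LoRA speedup you actually assume.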




