
This is a dumb question I know, but how expensive is model distillation? How much training hardware do you need to take something like this and create a 7B and 12B version for consumer hardware?


The process involves running the original model. You can rent these big GPUs for ~$10 per GPU per hour, so at 16 GPUs that is ~$160 per hour for as long as it takes.


You can rent H100s for $1.50/gpu/hr these days.
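
To put rough numbers on the two prices quoted above, here is a back-of-the-envelope sketch. The 16-GPU count is only what the ~$160/hr figure implies at ~$10 per GPU, and the 24-hour duration is a made-up placeholder, since nobody in the thread says how long a distillation run actually takes.

```python
def rental_cost(gpus: int, usd_per_gpu_hour: float, hours: float) -> float:
    """Total rental cost in USD for `gpus` GPUs rented for `hours` hours."""
    return gpus * usd_per_gpu_hour * hours


GPUS = 16    # assumption: implied by ~$160/hr at ~$10 per GPU per hour
HOURS = 24   # hypothetical duration, not a figure from the thread

for label, rate in [("~$10/GPU/hr", 10.0), ("$1.50/GPU/hr", 1.50)]:
    per_hour = rental_cost(GPUS, rate, 1)
    total = rental_cost(GPUS, rate, HOURS)
    print(f"{label}: ${per_hour:.2f}/hr, ${total:.2f} for {HOURS} h")
```

Swap in your own GPU count and run time; the total scales linearly with both, so the per-GPU price is the only thing that differs between the two comments.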



