
I know what MoE is. Maybe read my comments more carefully and give me the benefit of the doubt.


My comment would've done an astoundingly bad job of introducing you to what mixture of experts is, had that been its goal. It's really about why MoE-style enhancements, which optimize the model to be as economical as possible to host, don't target keeping parts of it on disk. There isn't really any doubt involved in that; it's just an observation as to why they optimize the way they do.

If you were put off by my defining terms on first use: that's just good form, not something directed at you.



