
Interesting that OpenBLAS and MPS are reportedly nearly the same speed, even though the README makes it sound like only MPS uses the GPU.




I think this is because the current code does a terrible job of keeping the activations on the GPU and fusing the kernels. That is the next thing to fix in this implementation.
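
To make "fusing the kernels" concrete, here is a toy sketch in plain Python. It is not the project's code: the function names and the bias-plus-GELU example are assumptions chosen purely for illustration. The unfused version writes a full intermediate buffer and reads it back, which on a GPU would mean an extra kernel launch and an extra round trip through memory; the fused version does the same math in a single pass.

```python
import math

def gelu(x):
    # tanh approximation of GELU (illustrative choice of activation)
    return 0.5 * x * (1.0 + math.tanh(0.7978845608028654 * (x + 0.044715 * x ** 3)))

def unfused(x, bias):
    # Pass 1 ("kernel" 1): add the bias and materialize a full intermediate buffer.
    tmp = [xi + bi for xi, bi in zip(x, bias)]
    # Pass 2 ("kernel" 2): read the intermediate back and apply the activation.
    return [gelu(t) for t in tmp]

def fused(x, bias):
    # Single pass: the intermediate lives only in a local value, never in a buffer.
    return [gelu(xi + bi) for xi, bi in zip(x, bias)]

x = [0.5, -1.0, 2.0]
bias = [0.1, 0.2, -0.3]
# Both variants compute the same result; only the memory traffic differs.
same = unfused(x, bias) == fused(x, bias)
```

If the activations also bounce between CPU and GPU between steps, the transfer cost can easily dominate, which would be consistent with the GPU path ending up no faster than OpenBLAS.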


