
The most relevant reported benchmark results are at https://arxiv.org/abs/2103.03874

They compare it against PaLM and Minerva; Minerva is PaLM fine-tuned on math-heavy datasets.

It outperforms both at roughly the same parameter count (e.g., LLaMA 65B vs. PaLM and Minerva 62B), but it's unclear how much of this is due to the number encoding (see the sketch below) vs. the many other differences.

It is useful to see that the performance increase is clearly not due to fine-tuning, though: LLaMA itself was not fine-tuned on math data.
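
For concreteness, here's a minimal Python sketch of what digit-level number encoding means. The regex pre-split and the toy whitespace "tokenizer" are my own illustration, not LLaMA's actual SentencePiece setup; the LLaMA paper just states that all numbers are split into individual digits.

    import re

    def split_digits(text):
        # Insert a space before every digit so the downstream tokenizer
        # (here, a plain whitespace split standing in for BPE/SentencePiece)
        # can never merge a multi-digit number into a single token.
        return re.sub(r"(\d)", r" \1", text)

    print(split_digits("12345 + 678").split())
    # -> ['1', '2', '3', '4', '5', '+', '6', '7', '8']

The upshot is that the model sees every number as a sequence of digits, rather than as arbitrary multi-digit chunks that depend on what happened to be frequent in the tokenizer's training corpus.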



Where do I look? That paper is from 2021, before LLaMA. Did they update it (or do you mean the LLaMA paper reports that benchmark)?


I posted the wrong link, sorry!

https://arxiv.org/pdf/2302.13971.pdf (the LLaMA paper), Table 7.



