They compare it against PaLM and Minerva; Minerva is PaLM fine-tuned on math-heavy datasets.
It outperforms both at roughly the same parameter count (e.g. LLaMA 65B vs. PaLM and Minerva 62B), but it is unclear how much of this is due to the encoding vs. the many other differences.
Still, it is useful to see that the performance gain is clearly not due to fine-tuning.