
Why would you ever want an LLM that is a perfect calculator? Humans invented calculators for a reason. A good LLM should respond to arithmetic questions by executing a cheap and efficient calculator program instead of wasting cycles on it.
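The "execute a cheap calculator program" idea can be sketched as a tool the model calls instead of reasoning through the digits itself. This is a minimal sketch, not any particular vendor's tool-calling API: a safe arithmetic evaluator built on Python's `ast` module, which an LLM harness could expose as a function.

```python
import ast
import operator

# Safe arithmetic evaluator an LLM harness could expose as a "calculator" tool,
# instead of having the model reason through the digits itself.
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calc(expr: str) -> float:
    """Evaluate a pure-arithmetic expression; reject anything else."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)
```

Unlike `eval`, this only accepts literal numbers and the listed operators, so the model can't smuggle arbitrary code through the tool.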


While your engineering perspective emphasizes efficiency, it's worth noting that, akin to the human brain, we aim to develop powerful LLMs capable of performing complex cognitive tasks. Although they may operate more slowly, such models can reason through intricate problems without external tools, much as Einstein conceptualized relativity through thought experiments or Andrew Wiles proved Fermat's Last Theorem through deep mathematical insight.


Solving FLT is not like using a calculator. You don't use the same skills. It is not mechanical.



It is a question of capabilities. People use LLMs to prove theorems, so it is a relevant question whether LLMs can work as generic calculators. And if they can't, IMO that shows something is missing.


It depends what you mean by LLM, perfect, etc. You can train up a neural net pretty quickly to do basic addition perfectly. It just needs two inputs for the digits, plus one bit for carryover, and an output 0-19 (if base 10). Your code would do the iteration on digits. So once your NN is trained to map inputs to sums exactly, you've got your algorithm, and it's provably correct.
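The setup described above can be sketched concretely. The "NN" is stubbed here as the exact (digit, digit, carry) → 0–19 mapping a trained net would learn; the hand-written loop does the digit iteration, as the comment says.

```python
# Stand-in for the trained network: maps (digit, digit, carry) to 0-19.
# A net trained to reproduce this mapping exactly gives the same algorithm.
def digit_sum(a: int, b: int, carry: int) -> int:
    return a + b + carry

def add(x: str, y: str) -> str:
    """Hand-written loop that iterates digits, calling the 'NN' per position."""
    dx = [int(c) for c in reversed(x)]
    dy = [int(c) for c in reversed(y)]
    out, carry = [], 0
    for i in range(max(len(dx), len(dy))):
        s = digit_sum(dx[i] if i < len(dx) else 0,
                      dy[i] if i < len(dy) else 0,
                      carry)
        out.append(s % 10)
        carry = s // 10
    if carry:
        out.append(carry)
    return "".join(str(d) for d in reversed(out))
```

Since the per-digit mapping has only 200 possible inputs, training the net to match it exactly is feasible, and correctness of the whole adder then follows from the loop.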

"That's cheating. You have custom code in the loop.": but that's what an LLM does; it feeds input tokens and feeds back output tokens through the LLM one by one. So.

Now, as far as a realistic LLM goes, no, there's no way to prove that it will always get even 1+1=2 correct. There's always a chance that something in the context will throw it off. Generally LLMs are better at interpreting questions, finding some code that maps to the answer, executing that code, and spitting out the answer. As a case in point, try asking one to solve a sudoku. It will grab some code off github, run it, and give you the answer. Now ask it to solve one by pure reasoning, step by step. It'll get hopelessly lost, tell you numbers are in the wrong places, tell you that eliminating 7 from {2,7} leaves only {3,8}, etc. (And then finally give you the correct answer, now _that's_ cheating!)

So, if not LLMs, and not handwritten loops, the only other option is single-shot. Can a NN be trained to do math in a single run? And the answer is not really. At least, not efficiently. If you think about it, a single run through a NN only has a limited number of steps. So it's going to be limited in what it can do. If your computation requires more steps than that, all your NN can do is guess.

So no, there's really no perfect "pure" AI for math. AI tools for math are generally a combination of NNs that make guesses, and hand-written code that checks or uses those guesses to generate some feedback and ask for next steps. Which, isn't too different from how humans do it either. Make a guess, try it out, look up references, look for tools, create a tool or modify an existing one, and so on until you get it right.


Then you need a Large Arithmetic Model (LAM). We have that; it's called a calculator.

The LLM could invoke several command line programs, including calculators or anything else where a deterministic answer is desirable. Structured outputs, for example: people usually mean JSON output, but any schema like XML or HTML could be enforced by command line tools, and when validation fails, the model should double-check its own output and hopefully fix it.
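The validate-then-retry loop described above can be sketched as follows. This is a minimal sketch: `model` stands for any callable that takes a prompt and returns text (a hypothetical LLM call, not a real API), and the "schema check" is simply JSON parsing.

```python
import json

def get_valid_json(model, prompt: str, retries: int = 3):
    """Ask the model for JSON; on validation failure, feed the error back
    and ask it to fix its own output. `model` is any prompt -> text callable."""
    reply = model(prompt)
    for _ in range(retries):
        try:
            return json.loads(reply)  # the "validation" step
        except json.JSONDecodeError as err:
            # Double-check: show the model its own broken output plus the error.
            reply = model(
                f"Your previous output was not valid JSON ({err}). "
                f"Fix it and return only JSON:\n{reply}"
            )
    raise ValueError("model never produced valid JSON")
```

The same shape works for XML or HTML: swap `json.loads` for a stricter validator and keep the feedback loop.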


>And if they can't it shows IMO something is missing

I don't think this follows, since they are trying to replace humans, who are also not perfect at arithmetic.



