Reimagining mathematics in a world of reasoning machines [video] (youtube.com)
74 points by kentricon on Dec 22, 2024 | 42 comments


Terence Tao wrote his comments on this talk here:

https://mathstodon.xyz/@tao/113601397729489852

> Akshay Venkatesh gave a thoughtful and accessible talk recently entitled "(Re)imagining mathematics in a world of reasoning machines". I particularly liked his highlighting of Davis and Hersh's pithy definition of mathematics (which I had not been previously aware of) as "the study of mental objects with reproducible properties"; it is a nice way to abstract out the most essential features of the diverse universe of mathematical objects (numbers, shapes, functions, etc.) that mathematicians actually study.

> (My own personal definition of mathematics is more descriptive than prescriptive: mathematics is what mathematicians do, and mathematicians are the people who study mathematics. This sounds like a tautologically circular definition, but I think of it more as describing the equations of motion of a stochastic dynamical system, in which breakthroughs in mathematics attract more attention by mathematicians, which in turn redefines the relative weighting of different fields of mathematics.)


Just waiting on what Terence Tao thinks of the new o3 intern


We can infer somewhat from this section in the FrontierMath paper [1]:

"Assessment of FrontierMath difficulty

All four mathematicians characterized the research problems in the FrontierMath benchmark as exceptionally challenging, noting that the most difficult questions require deep domain expertise and significant time investment. For example, referring to a selection of several questions from the dataset, Tao remarked, "These are extremely challenging. I think that in the near term basically the only way to solve them, short of having a real domain expert in the area, is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages...”

However, some mathematicians pointed out that the numerical format of the questions feels somewhat contrived. Borcherds, in particular, mentioned that the benchmark problems “aren’t quite the same as coming up with original proofs.”

It sounds like it will be able to crack some hard math problems, but not actually do mathematics. Which makes sense to me, given how these oN models are being trained. Synthetic data is bound to be in some (more or less obvious) way contrived, since it's not like we can just generate tons of new/natural/original mathematical results to train the model on.

[1] https://arxiv.org/pdf/2411.04872


> It sounds like it will be able to crack some hard math problems, but not actually do mathematics.

What, do you say that because the format of the FrontierMath problems is a bit contrived? I think I must be misunderstanding you; a benchmark can't rule out something it doesn't test. And saying that solving problems which require "deep domain expertise" isn't real mathematics sure sounds like a No True Scotsman argument. Why shouldn't o3 be able to produce novel mathematical proofs? It's got plenty of maths textbooks and papers to train on, the same things mathematicians train on.


Look at the sample problems and it will be obvious. For example, look at all the arbitrary constants here:

"Let an for n ∈ Z be the sequence of integers satisfying the recurrence formula1 an = (1.981 × 1011)an−1 + (3.549 × 1011)an−2 − (4.277 × 1011)an−3 + (3.706 × 108 )an−4 with initial conditions ai = i for 0 ≤ i ≤ 3. Find the smallest prime p ≡ 4 mod 7 for which the function Z → Z given by n 7→ an can be extended to a continuous function on Zp."


Those proofs are by far not enough - which is why the raw frontier models don't do that well on this test. The oN models are trained on synthetic data: generated mathematical theorems and associated proofs. If we could generate legitimately new math we would already have accomplished the hard task, so the generated theorems are going to feel contrived, i.e. they won't be theorems that ever make it into a future math book.

Doesn't mean they won't be useful to mathematicians, they will, just like computers in general are useful in doing maths today.


Think about what happens when a student solves some math exercises using some maths they don't fully understand, or some example they contrived to test what they read. They might make some false starts, work through some steps, make reference to other ideas that they do understand, and finally find a path to the solution. This is actually incredibly analogous to LLM-generated synthetic training data. And nobody's going to print those crossed-out scrawls in a textbook either! But then the student learns from the chain of reasoning which arrived at the result. They learn what steps they should have followed one after the other so they can apply them effortlessly next time, what applying mathematical tool A to problem type B results in, and so forth. They distill some insights from it. Well, LLMs can also learn many of these same things. They certainly don't learn as well or as much as humans (just look at their appalling data requirements), but my point is that synthetic training data really does work. So expect training compute used on frontier models to keep scaling.


> and finally find a path to the solution.

But how does the student, or in your case the LLM, know that it actually has the solution? For students, this is done by a grader grading the homework, asking the professor at office hours, or working on problems with peers who cross-check as you go. I see no reason why LLM-produced synthetic data, without this correction factor, would not devolve into a mess of incorrect, maybe even not-even-wrong style "proofs". And then how can training on this yield anything?


When you get good enough at mathematics, you can tell if your proofs are correct or not without asking a TA to grade them for you. Most mathematicians reach this level before they finish undergrad (a rare few reach it before they finish high school). While AI hasn't reached this level yet, there is no fundamental barrier stopping it from happening - and for now, researchers can use formal proof-checking software like Lean, Coq, or Isabelle to act as a grader.
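For what it's worth, here is a toy illustration (my own trivial example, plain Lean 4 with no extra libraries assumed) of what "the proof checker as grader" looks like in practice: if the proof term is right the file compiles, and if it's wrong the checker rejects it, which is exactly the kind of automatic pass/fail signal a synthetic-data pipeline could key on.

  -- Toy example (mine): the checker accepts this because the proof term is valid.
  example (n : Nat) : n + 0 = n := rfl

  -- A false claim, e.g. `example (n : Nat) : n + 1 = n := rfl`, fails to elaborate;
  -- that rejection is the "grade".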

(In principle, it should also be possible to get good enough at philosophy to avoid devolving into a mess of incoherence while reasoning about concepts like "knowledge", "consciousness", and "morality". I suspect some humans have achieved that, but it seems rather difficult to tell...)


> When you get good enough at mathematics, you can tell if your proofs are correct or not without asking a TA to grade them for you.

This is simply not true - you can get a very good sense of when your argument is correct, yes. But having graded for (graduate, even!) courses, I can tell you that even advanced students make mistakes. It's not limited to students, either; tons of textbooks have significant errata, and it's not as if no retraction in math has ever been issued.

These get corrected by talking with other people - if you have an LLM spew out this synthetic chain-of-reasoning data, you probably get at least some wrong proofs, and if you blindly try to scale with this I would expect it to collapse.

Even tying into a proof-checker seems non-trivial to me. If you work purely in the proof-checker, you never say anything wrong - but the presentation in a proof-checking language is very different from a textual one, so I would anticipate issues with the LLM leveraging knowledge from, say, textbooks in its proofs. You might also run into issues of the AI playing a game against the compiler rather than building understanding (you see elements of this in the proofs produced by AlphaProof). And if you start mixing natural language and proof checkers, you've just kicked the verification can up the road a bit, since you need some way of ensuring the natural language actually matches the statements being shown by the proof checker.

I don't think these are insurmountable challenges, but I also don't think it's as simple as the "generate synthetic data and scale harder" approach the parent comment suggests. Perhaps I'm wrong - time will tell.


The error rate of human mathematical work is not zero, but it does go down exponentially with the amount of time that the mathematician spends carefully thinking about the problem. Mistakes tend to be the result of typos, time pressure, or laziness - showing your work to others and having them check it over does help (it's one of the reasons we have peer review!), but is absolutely not necessary.

If the error rate is low enough - and by simply spending a constant factor more time finding and correcting errors, you can get it below one in a million - then you do get a virtuous feedback loop even without tying in a proof-checker. That's how humans have progressed, after all. While you are right to say that the proof-checker approach certainly is not trivial, it is currently much easier than you would expect - modern LLMs are surprisingly good at converting math written in English directly to math formalized in Lean.
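To make the "constant factor more time" arithmetic concrete (the numbers here are purely illustrative assumptions of mine, not measurements): if each independent careful pass over an argument misses, say, 10% of the remaining errors, then k passes leave about 0.1^k of them, so roughly six passes already land below one in a million.

  # Purely illustrative arithmetic for the claim above; the 10% miss rate per pass
  # is an assumption, not a measured number.
  miss_rate = 0.1  # fraction of remaining errors that survive one careful review pass
  for k in range(1, 7):
      print(f"{k} passes -> residual error fraction ~ {miss_rate ** k:g}")
  # 6 passes (a ~6x time factor) leave about 1e-06 of the original errors.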

I do think that LLMs will struggle to learn to catch their mistakes for a while. This is mostly because the art of catching mistakes on your own is not taught well (often it is not taught at all), and the data sets that modern LLMs train on probably contain very, very few examples of people applying this art.

A tangent: how do human mathematicians reliably manage to catch mistakes in proofs? Going line-by-line through a proof and checking that each line logically follows from the previous lines is what many people believe we do, but it is actually a method of last resort - something we only do if we are completely lost and have given up on concretely understanding what is going on. What we really do is build up a collection of concrete examples and counterexamples within a given subject, and watch how the steps of the proof play out in each of these test cases. This is why humans tend to become much less reliable at catching mistakes when they leave their field of expertise - they haven't yet built up the necessary library of examples to allow them to properly interact with the proofs, and must resort to reading line by line.


It does work, for in-distribution detection/generalization. Which is a very useful tool for a mathematician (basically, a calculator on steroids), but it's not at all what a good mathematician ultimately does. Discovering new maths is by definition stepping out of distribution and into the unknown. There, mathematical intuition is much more important, and that's not something we understand today, much less can model.


Why does it matter what Terence Tao thinks? Genuinely I find this obsession with what some specific math researcher thinks about x topic bizarre.

What do you think of it?


Tao is a Fields Medal winner and widely regarded as one of the most brilliant mathematicians of his generation. He also often tests these models in good faith.

Why wouldn't someone wish to know what he thought of this new advance?


Because who cares? There are any number of mathematicians out there, why put so much value on one "thought leader"? I just find HNs obsession with knowing what one particular person thinks odd. It's a consistent reoccurrence.


There are not many mathematicians on Tao's level to put it mildly. And evidently, lots of people care lol. Yes people are interested in what an expert has to say about a potential advancement in their field. This shouldn't be hard to grasp.


Ok. I'm fine agreeing to disagree.


nothing very interesting was said


My wish is for math to remove all the gatekeeping layers so that one day everyone can do maths, just like with programming.


Everyone can do maths. All you need is pencil and paper. Or download Lean, if that's your preference.

Of course it's possible that instead of "no gatekeeping" what you mean is "free tutoring".


I'm curious, what do you see as "gatekeeping layers" in math? Books? Material? Pedagogy? Nomenclature? Something else?


Not OP, but I'd like to add the term “learning complexity reduction” which relates closely to “pedagogy”.

Reflecting on my experience as a CS graduate with a strong interest in math (though not very advanced knowledge), I realize I would have benefited from a more intuitive, black-box approach when first engaging with complex topics, rather than diving straight into their intricacies.


I think this is kinda like the C++ vs Python as your first language question. C++ is better if you want to really learn programming but it’s harder to get to the magic of your first real program. Python gets people making programs sooner but it’s harder to build good habits and get that deep understanding. It depends what your goals are. There’s stuff like this for math but it’s kinda like trying to learn math from 3Blue1Brown videos. They sure help, but you eventually need the rigor


I'm not suggesting that you can fully learn math in this way. My point is that the approach you take can boost your confidence and motivate you to progress further. Of course, some people don’t need that kind of push or entry point.

I've seen other great examples of this from good professors. For instance, in probability and statistics courses, we would validate our results by combining theoretical concepts with simple Monte Carlo simulations to ensure they matched.
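As a concrete example of that kind of check (my own toy, not one from those courses): compare the exact birthday-collision probability for 23 people against a quick Monte Carlo estimate in Python and confirm the two agree to a couple of digits.

  import random

  def exact_collision_prob(k, days=365):
      """P(at least two of k people share a birthday), computed from the complement."""
      p_distinct = 1.0
      for i in range(k):
          p_distinct *= (days - i) / days
      return 1 - p_distinct

  def simulated_collision_prob(k, days=365, trials=100_000, seed=0):
      """Monte Carlo estimate of the same probability."""
      rng = random.Random(seed)
      hits = 0
      for _ in range(trials):
          birthdays = [rng.randrange(days) for _ in range(k)]
          if len(set(birthdays)) < k:
              hits += 1
      return hits / trials

  print(f"theory:     {exact_collision_prob(23):.4f}")      # ~ 0.5073
  print(f"simulation: {simulated_collision_prob(23):.4f}")  # ~ 0.50-0.51

Seeing the simulated number land on top of the theoretical one is exactly the kind of confidence boost I mean.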


Sure, I don't disagree. But there's a lot of trash and not a lot of gold. It is hard to find the gold and to make the problem even worse, what's gold is trash to others. Worse than that, what may be trash to you at one point in time may be gold at another, and vise versa. It's unfortunate, but I think this is true in any learning process. There's lots of great programming articles on Medium, but you're more likely to learn garbage, and even more likely to not know you are.


the only potential gatekeeping ive noticed in mathematics is (some) teachers that arent really teaching but more just showing off what they know. its a waste of everyone's time. i say potential because im sure its not always the teachers' fault, its likely just me being dumb.


There are definitely some people that either don't understand that their explanations are completely confusing to beginners or that don't even care. Sadly, teaching isn't always valued in academia.


Unfortunately lots of things are obvious or easy after you’ve learned them but seemingly impenetrable before then. Think about calculus. I failed my first time but now it’s easy. Though I think this makes me a better teacher because it’s harder to forget that it was hard at one point.


Not really gatekeeping, but as someone who likes to self-study, not having solutions is very annoying.


I work as a private tutor for proof-based math, and I have a lot of students who've spent some time self-studying before coming to me. The comment by godelski matches my experience: the biggest obstacle seems to be the fact that it's hard to learn how to check your own proofs if no one has ever taught you how. I see a lot of variation in how well people have managed to develop that skill on their own.

Having more textbooks with solutions to the exercises would probably help a lot with this, especially if you used the solutions judiciously. I think the fact that this isn't more common sadly has a lot to do with their role in undergraduate teaching: every exercise that has a solution in the back of the book is one that college students can very easily cheat on. I definitely agree that it's frustrating that the product has to be made worse for everyone else just because some people would misuse the better version. Far from the only such case in the world!


Yeah it’s a catch-22. I’m in grad school and occasionally I get to be the instructor. When I am, I focus far less on tests and more on homeworks and projects (I do ML, so it’s well suited to that style). The homeworks are made to be “play around” and the project is meant to be very self-driven (with plenty of help, but they are juniors or seniors so they should be fairly self-reliant) and to find passion.

The reason I do this is because grades matter so much to students that even if they care to learn the material they are incentivized to cheat (and subsequently cheat themselves). I think a lot of academics still don’t get this and are resistant to change (it is a lot of work to create a class, but not too much once you’ve worked everything out).

I think this confidence thing is also something that needs to be learned in every subject. Even in CS the compiler, type checking, and even unit tests aren’t enough (though they are extremely useful).

I should also say, one unfortunate thing I find in academic teaching of coding is we often don’t look at code. There’s not enough time. But to me this feels like trying to grade a math proof by looking only at the first and last lines. I think this builds lots of bad habits and over confidence


I totally get that. I was fortunate to get properly educated up through essentially the level of an undergraduate math degree (minus maybe topology), but then continued learning a lot on my own. It’s common to hear that the struggle is part of the learning process. The more I’ve advanced the more I find this to be true. It’s that struggle that makes you pay attention to the small details that are so critical. I was also fortunate to have a professor who would pester me and he later told me he wanted me to be confident in my results, because eventually I would have no one to double check (and he was right). The struggle really helps with this.

I think the main difference between learning programming and math is a compiler. Either one you can only learn by doing; reading and lectures aren’t enough. What is hard to learn in math is to be the compiler yourself - to be able to verify “programs” (do I even need quotes here?). This is a very powerful tool to add to your tool belt, and one I think even helps in programming.

I hope others can add advice here and words of encouragement. The struggle is real, but it is part of the process, for better or for worse.


Do you have any favourite books? like maybe 2 or 3 that you'd grab in the event of ww3 or a zombie apocalypse? I'm always on the look out for good self-learning maths books.


These books are what I would consider the best books for self-study. (I have gone through many textbooks after I went back to school to study statistics.)

• Real Analysis: A Long-Form Mathematics Textbook

• Proofs: A Long-Form Mathematics Textbook

• Mathematical Logic Through Python

• A Course in Calculus and Real Analysis

• A Course in Multivariable Calculus and Analysis

• Differential Equations With Applications and Historical Notes

• Probability and Random Processes (Grimmett)

• Probability Through Problems

• Fifty Challenging Problems in Probability with Solutions

• The Simple and Infinite Joy of Mathematical Statistics

• An Introduction to Error Analysis


Thank you for those, the Logic through Python one looks interesting at first glance. I've discovered a peculiarity about my own learning style with maths. If I can implement a concept in code, I can remember and understand it better!
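In that spirit, a tiny made-up example of what I mean (mine, not from the book): brute-force a truth table in Python to convince yourself that Peirce's law ((p → q) → p) → p is a tautology. Even a check this small makes the definition stick for me.

  from itertools import product

  def implies(a: bool, b: bool) -> bool:
      return (not a) or b

  def is_tautology(formula, num_vars: int) -> bool:
      """Evaluate `formula` on every truth assignment of its variables."""
      return all(formula(*values) for values in product([False, True], repeat=num_vars))

  peirce = lambda p, q: implies(implies(implies(p, q), p), p)
  print(is_tautology(peirce, 2))  # True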


What gatekeeping? Anyone is free to do math to the extent of their abilities. Math is known for accepting contributions from other fields. You don't need special credentials in math to publish a book or paper about math. It helps but it is not essential. Some academic positions are very difficult to get without a proper degree in the subject but that applies to every field, and for good reason.


Not OP, but I read it like they meant gatekeeping == lack of skill, because OP seems to think that everybody knows programming now thanks to AI.

I hope I am misreading this.


> one day, everyone can do maths, just like with programming.

Everyone can absolutely not do programming with an AI.

What is true, though, is that everyone who cannot write program code also cannot evaluate whether the code the AI produces is correct.

And we who do write a lot of code are not all that impressed.


Programming should remove all gatekeeping one day so everyone can do programming


And gatekeeping, so many of us want to gatekeep but there too, other gatekeepers stand in our way


Just build a gate around their gate.


Yes, like the Fields Medal. Everyone should get a Fields Medal, that way we're all winners.

Or the Riemann Hypothesis! Why not call it the American Hypothesis? That way everyone could think it.



