
>> We can (for programming, at least): run the output through a theorem prover and ensure that the proof is constructive; the Curry-Howard correspondence guarantees that you can turn the output into a correct program. It doesn't guarantee that the formal properties of the program correspond to the informal problem statement. But even people occasionally make such errors (a provably correct program doesn't do what we wanted it to do).
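To illustrate that last caveat with a hypothetical sketch (the names and the property are mine, not from any real system): a program can provably satisfy a formal property that fails to capture the informal intent.

```python
# Hypothetical sketch: a formal property that is provably satisfied
# but doesn't capture the informal intent ("sort the list").
def is_sorted(xs):
    # The formal property we "prove": output is in non-decreasing order.
    return all(a <= b for a, b in zip(xs, xs[1:]))

def bogus_sort(xs):
    # Trivially satisfies the property -- an empty list is vacuously
    # sorted -- yet it clearly isn't the program we wanted.
    return []

assert is_sorted(bogus_sort([3, 1, 2]))  # the "verified" property holds
```

The missing clause (that the output must be a permutation of the input) is exactly the kind of gap between a formal property and the informal intent.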

That sounds very ambitious. Automated theorem provers are real sticklers for complete specifications in a formal language and can't parse natural language at all, but when you generate code with an LLM, all you have in terms of a specification is a natural language prompt (that's your "informal problem statement"). In that case, what exactly is the prover going to prove? Not the natural language prompt, which it can't parse!

The best you can do if you start with a natural language specification, like an LLM prompt, is to verify that the generated program compiles, i.e. that it is syntactically correct. As to semantic correctness, there you're on your own.

Edit: I'm not really sure whether you're talking about syntactic or semantic correctness after all. Which one do you mean?

>> You just can't make a system that guarantees correct transformation from an informal problem statement into a formally correct implementation. "Informal" implies that there's wiggle room for interpretation.

Note that in program synthesis we usually make a distinction between complete and incomplete specifications ("problem statements"), not between formal and informal ones. An incomplete specification may still be given in a formal language. And, for the record, yes, you can make a system that guarantees that an output program is formally consistent with an incomplete specification. Systems like that already exist. You can find a bit about this online if you search for "inductive program synthesis", but the subject is spread over a wide literature spanning many fields, so it's not easy to get a clear picture of it. In general, though, it works, and there are approaches that give you strong theoretical guarantees of semantic correctness.



> Which one do you mean?

Ah, I said "theorem prover"; I should have said "proof verifier". What I meant is something like DeepMind's AlphaProof with an additional step of generating a formal specification from a natural language description of the problem. In this way we get a program that is semantically correct with respect to the formal specification. But with the current generation of LLMs we probably won't get anything for non-trivial problems (the LLM won't be able to generate a valid proof).
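For concreteness, the kind of formal specification such a step would have to generate might look like this Lean-style statement (a hypothetical sketch, assuming Mathlib's `List.Sorted` and `List.Perm`):

```lean
-- Hypothetical sketch of a formal specification for "sort a list":
-- any constructive proof that some function meets it yields,
-- via Curry-Howard, a verified sorting program.
def SortSpec (sort : List Nat → List Nat) : Prop :=
  ∀ xs : List Nat,
    (sort xs).Sorted (· ≤ ·) ∧  -- output is ordered
    (sort xs).Perm xs           -- and is a permutation of the input
```

Generating a statement like this from an English problem description is the step that has to be trusted, since nothing downstream can check it against the author's intent.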

> Note that in program synthesis we usually make a distinction between complete and incomplete specifications

Program synthesis begins after you can coherently express an idea of what you want to do. And getting to that point might involve a ton of reasoning that will not go into a program synthesis pipeline. That's what I mean by an "informal problem statement": a brain dump of half-baked ideas that doesn't even constitute an incomplete specification, because the ideas are self-contradictory (but you haven't noticed it yet).

LLMs can help here by trying to generate some specification based on a brain dump.


>> What I meant is something like DeepMind's AlphaProof with an additional step of generating a formal specification from a natural language description of the problem.

That's even more ambitious. From DeepMind's post on AlphaProof:

First, the problems were manually translated into formal mathematical language for our systems to understand.

https://deepmind.google/discover/blog/ai-solves-imo-problems...

DeepMind had to resort to this manual translation because LLMs are not reliable enough, and natural language is not precise enough, to produce a complete specification of a formal statement, like a program or a mathematical problem (as in AlphaProof); at least not easily.

I think you point that out in the rest of your comment but you say "the LLM won't be able to generate a valid proof" where I think you meant to say "a valid specification". Did I misunderstand?

>> Program synthesis begins after you can coherently express an idea of what you want to do.

That's not exactly right. There are two kinds of program synthesis. Deductive program synthesis is when you have a complete specification in a formal language and you basically translate it into another language, just like a compiler does. That's when you "coherently express an idea of what to do". Inductive program synthesis is when you have an incomplete specification consisting of examples of program behaviour, usually in the form of pairs of inputs and outputs of the target program, but sometimes program traces (like debug logs), abstract syntax trees, program schemas (a kind of rich program template), etc.

Input-output examples are the simplest case. Today, if you can express your problem in terms of input-output examples there are approaches that can synthesize a program that is consistent with the examples. You don't even need to know how to write that program yourself.
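As a toy illustration of the idea (my own sketch, not any particular system): an enumerative synthesizer searches a small space of candidate programs and returns the first one consistent with all the examples.

```python
# Toy enumerative inductive synthesis over a tiny DSL: f(x) = a*x + b.
# (A hypothetical sketch; real systems search far richer program spaces.)
def synthesize(examples, bound=5):
    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            # Keep the first candidate consistent with every example.
            if all(a * i + b == o for i, o in examples):
                return f"{a}*x + {b}"
    return None  # no program in the DSL fits the examples

print(synthesize([(1, 3), (2, 5), (3, 7)]))  # -> 2*x + 1
```

The returned program is guaranteed consistent with the examples by construction; that's the (weak but real) formal guarantee an inductive synthesizer gives you.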


> where I think you meant to say "a valid specification". Did I misunderstand?

What do you mean by "a valid specification"? There are known algorithms to check the validity of a proof. How do you check that a specification is valid? People inspect it and agree that "yes, it seems to express what was intended to be expressed in the natural language", or "no, this turn of phrase needs to be understood in a different way", or some such. Today there's no system that can handle this kind of task besides humans (who are fallible) and LLMs (which are much more fallible).

That is, deciding that a specification is valid cannot be done without human involvement. I left that part out and focused on what we can mechanically check (that is, the validity of a proof).

So, no, I didn't mean "a valid specification". And, yes, I don't think that today's LLMs would be good at producing specifications that would be deemed valid by a consensus of experts.

> Today, if you can express your problem in terms of input-output examples there are approaches that can synthesize a program that is consistent with the examples

In a limited domain with agreed-upon rules of generalization? Sure. In general? No way. The problem of generalizing from a limited number of examples with no additional restrictions is ill-defined.
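A quick illustration of that ill-definedness (a hypothetical sketch): two different programs can agree on every given example and still diverge elsewhere, and the examples alone give no grounds to prefer one over the other.

```python
# Two hypotheses consistent with the same input-output examples.
examples = [(1, 1), (2, 4)]

f = lambda x: x ** 2     # one program consistent with the examples
g = lambda x: 3 * x - 2  # another, equally consistent

assert all(f(i) == o and g(i) == o for i, o in examples)
print(f(3), g(3))  # -> 9 7: they diverge on unseen inputs
```

Without extra restrictions (a language bias, a preferred generalization rule), "which one is right?" has no answer the examples can settle.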

And the problem "generalize as an expert would do" is in the domain of AI.



