
>> We can (for programming, at least): run the output through a theorem prover and ensure that the proof is constructive; the Curry-Howard correspondence guarantees that you can turn the output into a correct program. It doesn't guarantee that the formal properties of the program correspond to the informal problem statement. But even people occasionally make such errors (a provably correct program doesn't do what we wanted it to do).
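To illustrate that last caveat with a hypothetical sketch (the names and the property are mine, not from any real system): a program can provably satisfy a formal property that fails to capture the informal intent.

```python
# Hypothetical sketch: a formal property that is provably satisfied
# but doesn't capture the informal intent ("sort the list").
def is_sorted(xs):
    # The formal property we "prove": output is in non-decreasing order.
    return all(a <= b for a, b in zip(xs, xs[1:]))

def bogus_sort(xs):
    # Trivially satisfies the property -- an empty list is vacuously
    # sorted -- yet it clearly isn't the program we wanted.
    return []

assert is_sorted(bogus_sort([3, 1, 2]))  # the "verified" property holds
```

The missing clause (that the output must be a permutation of the input) is exactly the kind of gap between a formal property and the informal intent.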

That sounds very ambitious. Automated theorem provers are real sticklers for complete specifications in a formal language and can't parse natural language at all, but when you generate code with an LLM, all you have in terms of a specification is a natural language prompt (that's your "informal problem statement"). In that case, what exactly is the prover going to prove? Not the natural language prompt, which it can't parse!

The best you can do if you start with a natural language specification, like an LLM prompt, is to verify that the generated program compiles, i.e. that it is syntactically correct. As to semantic correctness, there you're on your own.

Edit: I'm not really sure whether you're talking about syntactic or semantic correctness after all. Which one do you mean?

>> You just can't make a system that guarantees correct transformation from an informal problem statement into a formally correct implementation. "Informal" implies that there's wiggle room for interpretation.

Note that in program synthesis we usually make a distinction between complete and incomplete specifications ("problem statements"), not between formal and informal ones. An incomplete specification may still be given in a formal language. And, for the record, yes, you can make a system that guarantees that an output program is formally consistent with an incomplete specification. Systems like that already exist. You can find a bit about this online if you search for "inductive program synthesis", but the subject is spread over a wide literature spanning many fields, so it's not easy to get a clear picture of it. In general, though, it works, and there are approaches that give you strong theoretical guarantees of semantic correctness.



> Which one do you mean?

Ah, I said "theorem prover"; I should have said "proof verifier". What I meant is something like DeepMind's AlphaProof with an additional step of generating a formal specification from a natural language description of the problem. In this way we get a program that is semantically correct with respect to the formal specification. But with the current generation of LLMs we probably won't get anything for non-trivial problems (the LLM won't be able to generate a valid proof).
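For concreteness, the kind of formal specification such a step would have to generate might look like this Lean-style statement (a hypothetical sketch, assuming Mathlib's `List.Sorted` and `List.Perm`):

```lean
-- Hypothetical sketch of a formal specification for "sort a list":
-- any constructive proof that some function meets it yields,
-- via Curry-Howard, a verified sorting program.
def SortSpec (sort : List Nat → List Nat) : Prop :=
  ∀ xs : List Nat,
    (sort xs).Sorted (· ≤ ·) ∧  -- output is ordered
    (sort xs).Perm xs           -- and is a permutation of the input
```

Generating a statement like this from an English problem description is the step that has to be trusted, since nothing downstream can check it against the author's intent.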

> Note that in program synthesis we usually make a distinction between complete and incomplete specifications

Program synthesis begins after you can coherently express an idea of what you want to do. And getting to that point might involve a ton of reasoning that will not go into a program synthesis pipeline. That's what I mean by an "informal problem statement": a brain dump of half-baked ideas that doesn't even constitute an incomplete specification, because the ideas are self-contradictory (but you haven't noticed it yet).

LLMs can help here by trying to generate some specification based on a brain dump.


>> What I meant is something like DeepMind's AlphaProof with an additional step of generating a formal specification from a natural language description of the problem.

That's even more ambitious. From DeepMind's post on AlphaProof:

First, the problems were manually translated into formal mathematical language for our systems to understand.

https://deepmind.google/discover/blog/ai-solves-imo-problems...

DeepMind had to resort to this manual translation because LLMs are not reliable enough, and natural language is not precise enough, to produce a complete specification of a formal statement, like a program or a mathematical problem (as in AlphaProof); at least not easily.

I think you point that out in the rest of your comment but you say "the LLM won't be able to generate a valid proof" where I think you meant to say "a valid specification". Did I misunderstand?

>> Program synthesis begins after you can coherently express an idea of what you want to do.

That's not exactly right. There are two kinds of program synthesis. Deductive program synthesis is when you have a complete specification in a formal language and you basically translate it into another language, just like a compiler does. That's when you "coherently express an idea of what to do". Inductive program synthesis is when you have an incomplete specification consisting of examples of program behaviour, usually in the form of pairs of inputs and outputs of the target program, but sometimes program traces (like debug logs), abstract syntax trees, program schemas (a kind of rich program template), etc.

Input-output examples are the simplest case. Today, if you can express your problem in terms of input-output examples there are approaches that can synthesize a program that is consistent with the examples. You don't even need to know how to write that program yourself.
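As a toy illustration of the idea (my own sketch, not any particular system): an enumerative synthesizer searches a small space of candidate programs and returns the first one consistent with all the examples.

```python
# Toy enumerative inductive synthesis over a tiny DSL: f(x) = a*x + b.
# (A hypothetical sketch; real systems search far richer program spaces.)
def synthesize(examples, bound=5):
    for a in range(-bound, bound + 1):
        for b in range(-bound, bound + 1):
            # Keep the first candidate consistent with every example.
            if all(a * i + b == o for i, o in examples):
                return f"{a}*x + {b}"
    return None  # no program in the DSL fits the examples

print(synthesize([(1, 3), (2, 5), (3, 7)]))  # -> 2*x + 1
```

The returned program is guaranteed consistent with the examples by construction; that's the (weak but real) formal guarantee an inductive synthesizer gives you.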


> where I think you meant to say "a valid specification". Did I misunderstand?

What do you mean by "a valid specification"? There are known algorithms to check the validity of a proof. How do you check that a specification is valid? People inspect it and agree that "yes, it seems to express what was intended to be expressed in the natural language", or "no, this turn of phrase needs to be understood in a different way", or some such. Today there's no system that can handle this kind of task besides humans (who are fallible) and LLMs (which are much more fallible).

That is, deciding that a specification is valid cannot be done without human involvement. I left that part out and focused on what we can mechanically check (that is, the validity of a proof).

So, no, I didn't mean "a valid specification". And, yes, I don't think that today's LLMs would be good at producing specifications that would be deemed valid by a consensus of experts.

> Today, if you can express your problem in terms of input-output examples there are approaches that can synthesize a program that is consistent with the examples

In a limited domain with agreed-upon rules of generalization? Sure. In general? No way. The problem of generalizing from a limited number of examples with no additional restrictions is ill-defined.
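A quick illustration of that ill-definedness (a hypothetical sketch): two different programs can agree on every given example and still diverge elsewhere, and the examples alone give no grounds to prefer one over the other.

```python
# Two hypotheses consistent with the same input-output examples.
examples = [(1, 1), (2, 4)]

f = lambda x: x ** 2     # one program consistent with the examples
g = lambda x: 3 * x - 2  # another, equally consistent

assert all(f(i) == o and g(i) == o for i, o in examples)
print(f(3), g(3))  # -> 9 7: they diverge on unseen inputs
```

Without extra restrictions (a language bias, a preferred generalization rule), "which one is right?" has no answer the examples can settle.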

And the problem "generalize as an expert would do" is in the domain of AI.



