Claude has certainly been getting better with TLA+. It's not perfect yet but for laughs I got it to model the rules of Monopoly last night [1]. I haven't done any exhaustive checking on it yet, but it certainly looks passable.
It is pretty impressive at how good it's gotten at this, in a relatively short amount of time no less. I still usually write my specs by hand, but who knows how much longer I'll be doing that.
It looks quite complicated and I have no idea what it is doing. Obviously, since I don't know about TLA+. But what about someone who knows TLA+? It still seems hard to make sure it is valid. And it's just for a relatively simple game.
> Decline to buy: property stays with bank (auction abstracted out)
Ignoring an entire game mechanic is really stretching the definition of “abstracted out”…
Also, at the bottom it defines a “Liveness: someone eventually wins” property which I believe cannot be proven. Monopoly doesn’t have any rules forcing the game to end eventually. There is only a probabilistic guarantee, and even that only applies if the players are trying to win; if the players are conspiring to prevent the game from ending then they’re unlikely to fail.
From the link above, I also found Leslie Lamport's Home page[1], The original developer of TLA+. Leslie's home page contains an incredible amount of info about TLA. There's a published Book with pdf available. There are video courses and talks.
There are also git repos of examples[2] and tools[3]
I've played a bit with having a Claude generate a TLA+ model, I review it, Claude reviews it, and then Claude checks for alignment with the actual system. The code in the system can be annotated with comments to link back to the model, and vice versa, to make human and AI review easier.
I think it's highly promising. I might even call it a success. For example, I was able to express a bug in the system and Claude was able to find me a property that would reveal that in the model; and then revise the model to eliminate that bug, and then revise the code accordinly.
Additionally (with OpenAI) I have a small guest management system I had the LLM generate a TLA+ model about the API which endpoints a client can request from it, and which tokens it learns from it (there are session ids, event ids, guest ids and invitation ids that interact in interesting ways, as guests are not required to have an invitation to the system, and they can have only magic-link protected account, resulting in different visibility on the contact details of others), and what data it can or cannot access with those tokens.
I still haven't manually verified, but it seems rather promising as well.
What's the advantage of provable correctness if it's apparently not easy to prove even for people who understand TLA+? I'm not trying to be a party pooper, just curious.
Isn't logical incorrectness less of a problem in software than failures of imagination or conscientiousness in modeling the domain?
My thought (as someone interested in formal verification but unable to grok the math) is it exists as a canary in any sufficiently-complex codebase I let AI create. Even if it's wrong, knowing that something "we" changed breaks an agreement things currently assume is valuable.
It is pretty impressive at how good it's gotten at this, in a relatively short amount of time no less. I still usually write my specs by hand, but who knows how much longer I'll be doing that.
[1] https://pdfhost.io/v/KU2j37YKrP_Monopoly