AdieuToLogic's comments | Hacker News

Both can be true if each group selectively provides LLM output supporting their position. Essentially, this situation can be thought of as a form of the Infinite Monkey Theorem[0] where the result space is drastically reduced from "purely random" to "likely to be statistically relevant."

For an interesting overview of the above theorem, see here[1].

0 - https://en.wikipedia.org/wiki/Infinite_monkey_theorem

1 - https://www.yalescientific.org/2025/04/sorry-shakespeare-why...


> This is artisans vs industry. ... But it's close enough often enough.

If I had a nickel for every time I heard the equivalent of "close enough often enough" on root-cause analysis bridge calls during prod outages...


> I love this, it resonates so deeply with me. Code is, for me, joy.

A credo I have held for some time is:

  When making software, remember that it is a snapshot of
  your understanding of the problem.  It states to all,
  including your future-self, your approach, clarity, and
  appropriateness of the solution for the problem at hand.
  Choose your statements wisely.
HTH

Cool article. It makes me think about an "old school Unix" approach which might work for some use-cases.

Essentially, the untested brainstorming-only idea is:

  1. Make $HOME have 0751 permissions
  2. Assume the dev project exists in $HOME/foo and has
     0715 permissions
  3. Assume $HOME/foo/src is where all source code resides
     and has 0755 permissions (recursively)
  4. Install the agent tools with a uid:gid of something
     like llm:agent
  5. Turn on the setuid/setgid bits for executable(s) in
     the agent tools or make wrapper script(s) having
     same which delegate to agent tools
This would ensure agent tooling could neither read nor modify $HOME, could only read $HOME/foo (the top-level project directory) and its files (assuming `o+r` is the default), and could only modify files in $HOME/foo/src that have `o+w` permission. If agent directory creation in $HOME/foo/src is desired, enable `o+w` on it and the directories within it.
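
A minimal sketch of the above in shell (untested, like the rest of the idea; the writable file, wrapper, and tool paths are placeholders, and sudo is used here instead of setuid bits):

  # 1-3: lock down $HOME, expose the project read-only, and pick
  #      which files under src the agent may write
  chmod 0751 "$HOME"                 # others may traverse, not list/read
  chmod 0715 "$HOME/foo"             # others may list and read, not write
  chmod -R 0755 "$HOME/foo/src"      # world-readable source tree
  chmod o+w "$HOME/foo/src/main.sh"  # grant write per file as needed

  # 4-5: a wrapper that runs the agent tooling under its own uid:gid
  sudo -u llm -g agent /usr/local/bin/agent-tool "$@"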

There is probably some "post agent use" processing that would be needed as well.


>> Please run at least a dev-container or a VM for the tools.

> I would like to know how to do this. Could you share your favorite how-to?

See: https://www.docker.com/get-started/

EDIT:

Perhaps you are more interested in various sandboxing options. If so, the following may be of interest:

https://news.ycombinator.com/item?id=46595393


Note that while containers can be leveraged to run processes at lower privilege levels, they are not secure by default, and actually run at elevated privileges compared to normal processes.

Make sure the agent cannot launch containers and that you are switching users and dropping privileges.

On a Mac you are running a VM, which helps, but on Linux it is the user who is responsible for constraints, and by default the isolation is trivial to bypass.

Containers have been fairly successful for security because the most popular images leverage traditional co-hosting methods, like nginx dropping root, etc.

By themselves, without actively doing the same, they are not a security feature.

While there are some reactive defaults, Docker places the responsibility for dropping privileges on the user and image. Just launching a container is security through obscurity.

It can be a powerful tool to improve security posture, but don’t expect it by default.
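
For example, one way to actually drop privileges when launching a container (a sketch only; the image name, UID, and mount path are placeholders, and none of this happens by default):

  docker run --rm \
    --user 10001:10001 \                # unprivileged uid:gid, not root
    --cap-drop ALL \                    # drop all Linux capabilities
    --security-opt no-new-privileges \  # block privilege escalation
    --read-only \                       # read-only root filesystem
    -v "$PWD/src:/work/src" \           # mount only what the agent needs
    some-agent-image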


The "barrier to entry for building software" has not collapsed, as it was never about "where engineering shifts from writing code to shaping systems". It has always been about understanding the problem to solve and doing so in a provably correct manner.

Another way to reify this is:

  When making software, remember that it is a snapshot of 
  your understanding of the problem.  It states to all, 
  including your future-self, your approach, clarity, and 
  appropriateness of the solution for the problem at hand.  
  Choose your statements wisely.

> The file system as an abstraction is actually not that good at all beyond the basic use-cases. Imagine you need to find an email.

Unrelated to FUSE and MCP[1] agents, this scenario reminded me of using nmh[0] as an email client. One of the biggest reasons nmh is appealing is the ability to script email handling with awk/find/grep/sed and friends.
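
For instance (a rough sketch, assuming nmh's default one-file-per-message layout under ~/Mail):

  # every inbox message mentioning an invoice
  grep -l -i 'invoice' "$HOME/Mail/inbox"/*

  # subjects of everything received in the last day
  find "$HOME/Mail/inbox" -type f -mtime -1 -exec grep -h '^Subject:' {} +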

0 - https://www.nongnu.org/nmh/

1 - https://en.wikipedia.org/wiki/Model_Context_Protocol


The author presents a false dichotomy when discussing "Why Not AI".

  ... there are some serious costs and reasonable 
  reservations to AI development. Let's start by listing 
  those concerns

  These are super-valid concerns. They're also concerns that 
  I suspect came around when we developed compilers and 
  people stopped writing assembly by hand, instead trusting 
  programs like gcc ...
Compilers are deterministic, making their generated assembly code verifiable (for those compilers which produce assembly code). "AI", such as the "Claude Code (or Cursor)" referenced in the article, is nondeterministic in its output and therefore not comparable to a compiler.

One might as well equate the predictability of a Fibonacci sequence[0] to that of a PRNG[1] since both involve numbers.

0 - https://en.wikipedia.org/wiki/Fibonacci_sequence

1 - https://en.wikipedia.org/wiki/Pseudorandom_number_generator


If LLMs were like compilers, you could put src/ into .gitignore and only upload the prompt.

Even the earliest compilers didn't work by the programmer writing code, copying the assembly output into their source tree, and throwing away the code.

This is not a value judgement; they simply aren't the same thing at all.


here you go, a prompt only library: https://github.com/dbreunig/whenwords

That's great. Here's "me" implementing a JS version of that library in one shot using GitHub Copilot and a one-sentence prompt:

> Implement when.js as a simple, zero-dependency js library following SPEC.md exactly.

https://github.com/jncraton/whenwords/pulls


>> Compilers are deterministic, making their generated assembly code verifiable

This is true (to an extent), but the generated LLM code is also verifiable. We use automated tests to do it.


automated tests are not verification. The "llm as a compiler" provides zero guarantees about the code.

A compiler offers absolute guarantees that what you write is semantically preserved, barring bugs in the compiler itself. An llm provides zero guarantees even with zero bugs in the llm's code.


>> A compiler offers absolute guarantees

I think one of the sibling comments addresses this myth rather neatly: https://news.ycombinator.com/item?id=46563383

tl;dr compilers are not fully deterministic either.


Please point out where I said "deterministic".

I said guarantees that semantics are preserved.


I don't know what you are arguing, or why. Please follow the thread in its full context. Specifically, the argument the article author is making is that moving to a higher level of abstraction also cost developers the benefit of understanding the internals. Ultimately, that ended up not mattering very much.

The OP pushed back on this, saying compilers are deterministic and LLMs are not, and that lack of determinism makes LLM output unverifiable. I said the latter is not true because you can perform verification using tests. You claimed tests are not verification because LLMs don't preserve the semantics.

I'm not sure why semantics matter. LLMs providing no guarantees regarding the preservation of semantics is not important because you can guarantee the behavior of the generated code using tests. In most domains, this is sufficient. You tell the LLM to write code that does X, Y and Z, and then verify X, Y and Z using a test. That's it.


no, writing tests to verify that the "compiled" code semantically matches the code in the source language is not a good thing. The guarantees that I'm talking about are different.

You write tests for your own logic, not to do the compiler's job.

I have no idea why you are so stuck on determinism. That has nothing to do with what I'm saying. Sure, compilers can be nondeterministic with things such as register allocation, but that is totally transparent to the programmer. The compiled code will do exactly what the source code describes. The nondeterminism in llms does not apply just to those things. An llm's nondeterminism might mean it decides to encode different logic, instead of a different implementation that is logically equivalent.

We don't usually write steps to verify that the compiler decided to ignore our code and do its own thing. You have to do that with llms.


Nobody suggested using LLMs as a compiler.

I suspect the argument is that both AI and a compiler enables building software at a higher level of abstraction.

Abstraction is only useful when it involves a consistent mapping between A and B; LLMs don't provide that.

In most contexts you can abstract the earth as a sphere and it works fine, e.g. aligning solar panels, until you enter the realm of precision where treating the earth as a sphere utterly fails. There's no realistic set of tests you can write where an unsupervised LLM's output can be trusted to generate a complex system that works if it's constantly being recreated. Actual compilers don't have that issue.


> Compilers are deterministic, making their generated assembly code verifiable

People keep saying this like it is an absolute fact, whereas in reality it is a scale.

Compilers are more deterministic than LLMs in general, but no, they are not completely deterministic. That's why making reproducible builds is hard!

https://stackoverflow.com/questions/52974259/what-are-some-e... and https://github.com/mgrang/non-determinism give some good examples of this.

This leads to the point: in general do we care about this non-determinism?

Most of the time, no we don't.

Once you accept that, the next stage is accepting that most of the time the non-deterministic output of an LLM is good enough!

This leads to the question "how do I verify it is good enough", which leads to testing, and then suddenly you have a working agentic loop...


>> Compilers are deterministic, making their generated assembly code verifiable

> People keep saying this like it is an absolute fact, whereas in reality it is a scale.

My statement is of course a generalization due to its terseness and focuses on the expectation of repeatable results given constant input, excluding pathological definitions of nondeterminism such as compiler-defined macro values or implementation defects. Modern compilers are complex systems and not really my point.

> This leads to the point: in general do we care about this non-determinism?

> Most of the time, no we don't.

Not generally the type of nondeterminism I described, no. Nor the nondeterministic value of the `__DATE__` macro referenced in the StackOverflow link you provided.

> Once you accept that the next stage is accepting that most of the time the non-deterministic output of an LLM is good enough!

This is where the wheels fall off.

First, "most of the time" only makes sense when there is another disjoint group of "other times." Second, the preferred group defined is "non-deterministic [sic] output of an LLM is good enough", which means the "other times" are when LLM use is not good enough. Third, and finally, when use of an approach (or a tool) is unpredictable (again, excluding pathological cases) given the same input, it requires an open set of tests to verify correctness over time.

That last point may not be obvious, so I will extrapolate as to why it holds.

Assuming the LLM in use has, or is reasonably expected to have, model evolution, documents generated by same will diverge unpredictably given a constant prompt. This implies prompt evolution will also be required at a frequency almost certainly different than unpredictable document generation intrinsic to LLMs. This in turn implies test expectations and/or definitions having to evolve over time with nothing changing other than undetectable model evolution. Which means any testing which exists at one point in time cannot be relied upon to provide the same verifications at a later point in time. Thus the requirement of an open set of tests to verify correctness over time.

Finally, to answer your question of:

  how do I verify it is good enough
You can't, because what you describe is a multi-story brick house built on a sand dune.

> Assuming the LLM in use has, or is reasonably expected to have, model evolution, documents generated by same will diverge unpredictably given a constant prompt.

So what?

You tell it once. It writes code.

You test that code, not the prompt.


> This leads to the point: in general do we care about this non-determinism?

> Most of the time, no we don't.

well that’s a sweeping generalisation. i think this is a better generalised answer to your question.

> It depends on the problem we're trying to solve and the surrounding conditions and constraints.

software engineering is primarily about understanding the problem space.

are 99% of us building a pacemaker? no. but that doesn’t mean we can automatically make the leap to assuming a set of tools known for being non-deterministic are good enough for our use case.

it depends.

> Once you accept that the next stage is accepting that most of the time the non-deterministic output of an LLM is good enough!

the next stage is working with whatever tool(s) is/are best suited to solve the problem.

and that depends on the problem you are solving.


> are 99% of us building a pacemaker? no. but that doesn’t mean we can automatically make the leap to assuming a set of tools known for being non-deterministic are good enough for our use case.

This seems irrelevant?

Either way hopefully you test the pacemaker code comprehensively!

That's pretty much the best case for llm generated code: comprehensive tests of the desired behaviour.


From the article:

  I work as an Engineering Manager ...

  If you try to end at 1:55pm, you will likely talk until 
  2:00pm anyway, which then runs into the next meeting.
This is more a statement about the lack of respect for others' time than anything else, as evidenced by the presumption: "you will likely talk until 2:00pm anyway."

Engineering Managers who see value in giving coworkers a five-minute break between meetings ensure the breaks exist. Those who do not, and only pay lip service to the concept, will burn through predefined breaks no matter where they exist on a clock face.


Those papers are really interesting, thanks for sharing them!

Do you happen to know of any research papers which explore constraint programming techniques wrt LLM prompts?

For example:

  Create a chicken noodle soup recipe.

  The recipe must satisfy all of the following:

    - must not use more than 10 ingredients
    - must take less than 30 minutes to prepare
    - ...

I suspect LLM-like technologies will only rarely back out of contradictory or otherwise unsatisfiable constraints, so it might require intermediate steps where LLMs formalise the problem in some SAT, SMT or Prolog tool and report back about it.

This is an area I'm very interested in. Do you have a particular application in mind? (I'm guessing the recipe example is just to illustrate the general principle.)

> This is an area I'm very interested in. Do you have a particular application in mind? (I'm guessing the recipe example is just to illustrate the general principle.)

You are right in identifying the recipe example as being illustrative and intentionally simple. A more realistic example of using constraint programming techniques with LLMs is:

  # Role
  You are an expert Unix shell programmer who comments their code and organizes their code using shell programming best practices.

  # Task
  Create a bash shell script which reads from standard input text in Markdown format and prints all embedded hyperlink URL's.

  The script requirements are:

    - MUST exclude all inline code elements
    - MUST exclude all fenced code blocks
    - MUST print all hyperlink URL's
    - MUST NOT print hyperlink label
    - MUST NOT use Perl compatible regular expressions
    - MUST NOT use double quotes within comments
    - MUST NOT use single quotes within comments
  
In this exploration, the list of "MUST/MUST NOT" constraints was iteratively discovered (4 iterations), and at least the last three are reusable when the task involves generating shell scripts.

Where this approach originates is in attempting to limit LLM token generation variance by minimizing use of English vocabulary and sentence structure expressivity such that document generation has a higher probability of being repeatable. The epiphany I experienced was that by interacting with LLMs as a "black box" whose results can only be influenced, and not anthropomorphizing them, the natural way to do so is to leverage their NLP capabilities to produce restrictions (search tree pruning) for a declarative query (initial search space).
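
For reference, a hand-written sketch of the kind of script that prompt is aiming at (not actual LLM output, and only loosely checked against the constraints):

  #!/bin/bash
  # Print hyperlink URLs from Markdown read on stdin, skipping inline
  # code spans and fenced code blocks. No Perl-compatible regexes.

  awk '
    /^```/ { in_fence = !in_fence; next }  # toggle on fence delimiters
    in_fence { next }                      # skip fenced block contents
    { print }
  ' |
  sed 's/`[^`]*`//g' |                     # drop inline code spans
  grep -o '\[[^]]*\]([^)]*)' |             # isolate [label](url) links
  sed 's/^\[[^]]*\](//; s/)$//'            # keep only the URL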


If one goal is to reduce the variance of output, couldn't this be done by controlling the decoding temperature?

Another related technique is constrained decoding, where the LLM sampler only considers tokens allowed by a certain formal grammar. This could be applicable for your "quotes within comments" requirements.

Both techniques clearly require code or hyperparameter changes to the machinery that drives the LLM. What's missing is the ability to express these, in natural language, directly to the LLM and have it comply.

The angle I was coming from was whether one could use a constraint satisfaction solver, but I don't see how that would help for your example.


Anything involving numbers, or conditions like ‘less than 30 minutes’, is going to be really hard.

I've seen some interesting work going the other way, having LLMs generate constraint solvers (or whatever the term is) in Prolog and then feeding input to that. I can't remember the link, but it could be worthwhile searching for that.
