The thing is that I don't use AI to replace things I can do deterministically with code. I use it to replace things I cannot do deterministically with code - often something I would have a person do. People are also fallible and can't be completely trusted to do the thing exactly right. I think it works very well for things that have a human in the loop, like coding agents where someone needs to review changes. For instance, I put an agent in a tool for generating AWS access policies from English descriptions or answering questions about current access (where the agent has access to tools to see current users, bucket policies, etc.). I don't trust the agent to get it exactly right, so it just proposes the policies and I have to accept or modify them before they are applied, but it's still better than writing them myself. And it's better than having a web interface do it, because that lacks context.
I think it's a good example of the kind of internal tools the article is talking about. I would not have spent the time to build this without Claude making it much faster to build stand-alone projects, and without LLMs I would not have the English -> policy step at all.
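For a rough idea of what that propose-then-approve loop looks like, here is a minimal sketch. The `llm_propose_policy` helper is hypothetical (standing in for whatever model call you use); the only real API used is boto3's `iam.create_policy`.

```python
import json
import boto3

def llm_propose_policy(description: str) -> dict:
    """Hypothetical helper: ask an LLM to draft an IAM policy document
    (as JSON) from an English description. Swap in your own model call."""
    raise NotImplementedError

def propose_and_apply(description: str, policy_name: str) -> None:
    draft = llm_propose_policy(description)

    # Human in the loop: show the draft and require explicit approval
    # (or a manual edit) before anything touches the account.
    print(json.dumps(draft, indent=2))
    answer = input("Apply this policy? [y/N/edit] ").strip().lower()
    if answer == "edit":
        draft = json.loads(input("Paste corrected policy JSON: "))
    elif answer != "y":
        print("Aborted; nothing applied.")
        return

    iam = boto3.client("iam")
    iam.create_policy(PolicyName=policy_name,
                      PolicyDocument=json.dumps(draft))
    print(f"Created policy {policy_name}")
```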
>> The thing is that I don't use AI to replace things I can do deterministically with code. I use it to replace things I cannot do deterministically with code - often something I would have a person do.
Nailed it. And the thing is, you can (and should) still have deterministic guard rails around AI! Things like normalization, data mapping, validations etc. protect against hallucinations and help ensure AI’s output follows your business rules.
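As a sketch of what such a guard rail might look like, here is a validation pass over an LLM-drafted IAM policy document. The allowed-action list and the specific rules are illustrative assumptions, not anyone's actual business rules.

```python
# Illustrative deterministic guard rail for an LLM-drafted IAM policy.
ALLOWED_ACTIONS = {"s3:GetObject", "s3:ListBucket", "s3:PutObject"}

def validate_policy(policy: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the draft passes."""
    errors = []
    statements = policy.get("Statement", [])
    if not statements:
        errors.append("policy has no statements")
    for stmt in statements:
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            if action == "*" or action not in ALLOWED_ACTIONS:
                errors.append(f"action not allowed: {action}")
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        if any(r == "*" for r in resources):
            errors.append("wildcard resources are not allowed")
    return errors
```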
> Things like normalization, data mapping, validations etc. protect against hallucinations
And further downstream: audit trails, human sign-offs, and operations that are reversible or have a compensating workflow to fix things up.
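A rough sketch of that downstream layer, assuming a simple append-only log and a made-up mapping from each operation to its compensating action (not any particular tool's API):

```python
import json
import time

# Map each operation to a compensating action so an applied change can
# be reversed or fixed up later. Purely illustrative.
COMPENSATIONS = {"create_policy": "delete_policy"}

def record_action(log_path: str, op: str, args: dict, approved_by: str) -> None:
    """Append an audit record: what was done, with what inputs, who signed
    off, and which compensating operation would undo it."""
    entry = {
        "ts": time.time(),
        "op": op,
        "args": args,
        "approved_by": approved_by,
        "compensating_op": COMPENSATIONS.get(op),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")
```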
Or you could build a tool that generates this stuff deterministically, the exact same way every time. At least then you can audit the tool and see whether it is correct. In your situation the point of failure is still the user, arguably even more so, because they could get complacent with the LLM output and mistakenly assume it is correct.
In my mind you are trading a function that always evaluates to the same f(x) for a given x for one that might not, and that therefore requires oversight.
So how would you implement "generating AWS access policies from English descriptions" using deterministic code in a way that doesn't require human oversight?
> I think it works very well for things that have a human in the loop, like coding agents where someone needs to review changes
This is the best case for AI. It's not very different from a level 3 autonomous car with the driver in the loop, as opposed to a fully autonomous level 5 vehicle, which probably requires AGI-level AI.
The same applies to medicine, where a limited number of specialists (radiologists/cardiologists/oncologists/etc.) stay in the loop while AI assists with work that would take experts too long to review manually, especially non-obvious early symptom detection (X-ray/ECG/MRI) in the modern practice of evidence-based medicine.