
Very cool and nicely executed! I definitely see a lot of value in this.

I was actually building a version of this using NonBioS.ai, but this is already pretty well done, so I will just use this instead.


So I can attest that everything proposed in this article actually works. And you can try it out yourself on any arbitrary code base within a few minutes.

This is how: I work for a company called NonBioS.ai - we already implement most of what is mentioned in this article. We actually implemented this about 6 months back, and what we have now is an advanced version of the same flow. Every user in NonBioS gets a full Linux VM with root access. You can ask NonBioS to pull in your source code and implement any feature. The context is all managed automatically through a process we call "Strategic Forgetting", which is in some ways an advanced version of the logic in this article.

Strategic Forgetting handles the context automatically - think of it like automatic compaction. It evaluates information retention based on several key factors:

1. Relevance Scoring: We assess how directly information contributes to the current objective vs. being tangential noise

2. Temporal Decay: Information gets weighted by recency and frequency of use - rarely accessed context naturally fades

3. Retrievability: If data can be easily reconstructed from system state or documentation, it's a candidate for pruning

4. Source Priority: User-provided context gets higher retention weight than inferred or generated content

The algorithm runs continuously during coding sessions, creating a dynamic "working memory" that stays lean and focused. Think of it like how you naturally filter out background conversations to focus on what matters.
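
Here's a rough sketch of the idea in Python (illustrative only - the names, weights, and half-life are made up for this comment, not our production values):

    import math
    import time
    from dataclasses import dataclass, field

    @dataclass
    class ContextItem:
        text: str
        relevance: float      # 0..1: how directly this serves the current objective
        retrievable: bool     # can it be rebuilt from repo/docs/system state?
        user_provided: bool   # user context outranks inferred/generated content
        last_used: float = field(default_factory=time.time)
        uses: int = 1

    def retention_score(item, now, half_life_s=1800.0):
        # Temporal decay: weight by recency (exponential half-life) and frequency.
        decay = math.exp(-(now - item.last_used) * math.log(2) / half_life_s)
        score = item.relevance * decay * math.log1p(item.uses)
        if item.retrievable:
            score *= 0.5      # cheap to reconstruct, so cheap to forget
        if item.user_provided:
            score *= 2.0      # source priority: user-supplied context sticks around
        return score

    def prune(context, budget):
        # Keep the `budget` highest-scoring items; the rest fade from working memory.
        now = time.time()
        return sorted(context, key=lambda i: retention_score(i, now), reverse=True)[:budget]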

And we have tried it out in very complex code bases and it works pretty well. Once you see how well it works, you will not have a hard time believing that the days of using IDEs to edit code are probably numbered.

Also - you can try it out for yourself very quickly at NonBioS.ai. We have a very generous free tier that will be enough for the biggest code base you can throw at NonBioS. However, big feature implementations or larger refactorings might take longer than what the free tier affords.


How about Halarax... hallucinate and parallax?


I think if you use Cursor, using Claude Code is a huge upgrade. The problem is that Cursor was a huge upgrade from the IDE, so we are still getting used to it.

The company I work for builds a similar tool - NonBioS.ai. It is in some ways similar to what the author does above, but packaged as a service. The NonBioS agent has a root VM and can write/build all the software you want. You access and control it through a web chat interface - we take care of all the orchestration behind the scenes.

It's also in free beta right now, and signup takes a minute if you want to give it a shot. You can find out quickly whether the Claude Code/NonBioS experience is better than Cursor.


I think the path forward there is slack/teams/discord/etc integration of agents, so you can monitor and control whatever agent software you like via a chat interface just like you would interact with any other teammate.


So we tried that route - but the problem is that these interfaces aren't suited for asynchronous updates. If the agent is working for the next hour or so, how do you communicate that in mediums like these? An agent, unlike a human, is only invoked when you give it a task.

If you use the interface at nonbios.ai, you will quickly realize that it is hard to reproduce on Slack/Discord, even though it's still technically 'chat'.


On Slack I think threads are fine for this. Have an agent work channel, and the agents can just create a thread for each task and dump updates there. If an agent is really noisy about its thinking you might need a loglevel toggle, but in my experience with Claude Code/Cursor you could dump almost everything they're currently emitting to the UI into a thread.

It's still nice to have a direct web interface to agents, but in general most orgs are dealing with service/information overload and chat is a good single source of truth, which is why integrations are so hot.
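
A minimal sketch of the thread-per-task pattern with the slack_sdk client (the channel name, token placeholder, and messages here are made up):

    from slack_sdk import WebClient

    client = WebClient(token="xoxb-...")  # bot token

    def start_task_thread(channel, task):
        # One top-level message per task; its ts anchors the thread.
        resp = client.chat_postMessage(channel=channel, text=f"Agent started: {task}")
        return resp["ts"]

    def post_update(channel, thread_ts, update):
        # Dump progress updates into the task's thread, not the channel.
        client.chat_postMessage(channel=channel, text=update, thread_ts=thread_ts)

    ts = start_task_thread("#agent-work", "Refactor auth module")
    post_update("#agent-work", ts, "Tests passing; opening a PR.")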


He is NOT talking about multi-agent systems, which is exactly why he is calling it an Agency. The author goes to great lengths to explain why this is NOT a multi-agent system, because it can easily be misunderstood as one.


This isn't multi-agents at all. In fact, if you read the article closely, you will see that the author explains in detail how this system differs from multi-agents. That is exactly why the author calls it an "Agency": it is fundamentally different from multi-agents.

I agree that multi-agent doesn't work in practice. But this isn't that.


How is this different from multiple agents? Are you saying using different models for different parts of the task is a fundamental difference from using one model for different parts of the task?

Using different models for different things isn’t new at all. The article seems like an excuse to get some marketing out there (and it’s poor at that - they got me looking at what was built with their product but I can’t see the actual code. Feels scammy.)


1. Multi-agent means dividing a task into parts and handing each part off to a different agent. This is different in the sense that the task is not divided into parts a priori. When the agent hits a roadblock - let's say it is unable to fix a software issue - it rolls up to a deep-think model to get unblocked. But you might be right that the difference is too subtle to notice.

2. "they got me looking at what was built with their product but I can’t see the actual code. Feels scammy" - What do you mean by you can't see the actual code ? You can just signup and use NonBioS to build software. And you can see the code written by NonBioS in multiple ways - ask it give you a downloadable zip, ask it to checkin the code to github, ask it to show you the code on the screen. Infact that the black boxes which scroll up, you can just expand them and see the code it is writing directly.


So what we do at NonBioS.ai is use a cheaper model for routine tasks, but switch to a higher-thinking model seamlessly if the agent gets stuck. It's the most cost-efficient approach, and we take that switching cost away from the engineer.

But I broadly agree with the argument of the post - just spending more might still be worth it.
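
The routing loop is roughly this shape (an illustrative sketch, not our production code - the model wrappers and the stuck heuristic are stand-ins):

    def escalating_step(task, history, cheap_model, deep_model):
        # Route to the cheap model by default; hand off to the
        # deep-think model only when the agent looks stuck.
        result = cheap_model(task, history)
        if looks_stuck(result, history):
            result = deep_model(task, history)  # the unblock step
        return result

    def looks_stuck(result, history, window=3):
        # Placeholder heuristic: the agent keeps producing the same output.
        recent = history[-window:]
        return len(recent) == window and all(r == result for r in recent)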


This will not end well.


As someone who works for a company with a real agent in production (not a workflow), I could not disagree more with the very first statement here: use agent frameworks like LangGraph. We did exactly that, and had to throw everything away just a month down the line. Then we built everything from scratch, and now our system scales pretty well.

To be fair, I think there might be a space for agent frameworks, but the agent space is too early for a good enough framework to emerge. The semi-contrarian thought, which I hold to a certain extent, is that the agent space is moving so fast that a good enough framework might NEVER emerge.


It sounds like you're agreeing with the article? From TFA:

> Over the past year, we've worked with dozens of teams building large language model (LLM) agents across industries. Consistently, the most successful implementations weren't using complex frameworks or specialized libraries. Instead, they were building with simple, composable patterns.

> ...There are many frameworks that make agentic systems easier to implement. ...These frameworks make it easy to get started by simplifying standard low-level tasks like calling LLMs, defining and parsing tools, and chaining calls together. However, they often create extra layers of abstraction that can obscure the underlying prompts and responses, making them harder to debug. They can also make it tempting to add complexity when a simpler setup would suffice. We suggest that developers start by using LLM APIs directly: many patterns can be implemented in a few lines of code.


I'm just in the process of moving from a prototype in N8N's agent tools to an actual system that could be self-hosted.

I've read a lot of comments saying that most pragmatic shops have dumped LangChain/LangGraph, Haystack, Crew, etc. for their own internal code that does everything more simply, but I can't currently conceptualize how tooling is actually done in the real world.

Do you have any links or docs that you've used as a basis for the work you could share? Thanks.


Most of our stuff is built in-house, actually, simply because everything else is still kind of catching up. You can find a bunch of information on the blog (https://www.nonbios.ai/blog).

The only third-party software we use is Langfuse for observability, and even that was breaking down for us. But they launched a new version - v3 - which might still work out for us.

I would suggest just using standard, non-AI-specific Python libraries and building your own systems. If you are migrating from N8N to a self-hosted system, you can actually use NonBioS to build it out for you directly. If you join our Discord channels, we can also get an engineer to help you out.


The event horizon of the current AI space has been quite a thing to observe.


What job is the agent performing?


It is an AI software dev called NonBioS.ai.


Yes, it works really well. We do something like that at NonBioS.ai - longer post below. The agent self-reflects on whether it is stuck or confused and calls out to the human for help.
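
The core of it is simple (an illustrative sketch - the prompt and the hooks are stand-ins, not our production code):

    STUCK_PROMPT = ("Review your recent attempts. Are you stuck or confused? "
                    "Answer YES or NO, then explain briefly.")

    def maybe_call_human(ask_model, notify_human, attempts):
        # Periodic self-reflection: the model judges its own progress,
        # and we page the human only when it admits it is stuck.
        verdict = ask_model(STUCK_PROMPT + "\n\nAttempts:\n" + "\n".join(attempts))
        if verdict.strip().upper().startswith("YES"):
            notify_human("Agent needs help: " + verdict)
            return True
        return False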

