Friend, you are putting too much effort into debating a topic that is implicitly banned on this website. This post has already been hidden from the front page. Hacker News is openly hostile to anything that even mildly paints a handful of billionaires in a poor light. But let's continue to deify Dang as the country descends openly into madness.
I see it back now too, despite it being removed earlier. Do you have faith in the HN algo? Position 22 despite having more votes and comments and being more recent than all of the posts above it?
lol. Always fun to watch HN remove highly relevant topics from the top of the front page. To their credit they usually give us about an hour to discuss before doing so. How kind of them.
It is heavy crude but it's what our refineries are set up to use. There was a very informative news report on this recently posted to youtube: https://www.youtube.com/watch?v=Pgwny1BiCYk
Apparently shale oil mostly comes out as light, so our own production doesn't feed our refineries and we've increasingly taken to importing heavy crude.
I literally linked my source. It's less than 7 minutes long. It's not that refineries are idle, it's that we have to import the heavy crude to keep them producing. Heavy crude went from 12% of imports to 70% as of 2025. The most relevant graphs, if you really can't stomach 7 minutes of video, are shown from 3:30 - 4:30ish in the video. And again, record levels of shale oil production are yielding LIGHT crude. The information you're seeking has already been provided to you.
Your best bet would be to look deeply into performance on the ARC-AGI fully-private test set (e.g. https://arcprize.org/blog/arc-prize-2025-results-analysis) and think carefully about the discrepancies there, or just to broadly read any academic research on classic benchmarks and note the plateaus on classic datasets.
It is very clear when you look at academic papers actually targeting problems specific to reasoning / intelligence (e.g. rotation invariance in images, adversarial robustness) that all the big companies are doing is fitting more data / spending more resources on human raters and other things to boost performance on (open) metrics, rather than making clear gains in genuine intelligence; the advances come from milking what we know very well to be a limited approach. I.e. there are trivially basic problems that cannot be solved by curve-fitting models, which makes it clear most current advances are indeed coming from curve (manifold) fitting. It just isn't clear how far we can exploit these current approaches, and in what domains this kind of exploitation is more than good enough.
EDIT: Are people unaware Google Scholar is a thing? It is trivial to find modern AI papers that can be read without requiring access to a research institution. HuggingFace, for example, collects trending papers (https://huggingface.co/papers/trending), and so on.
At present it's only SWEs that are benefiting from a productivity standpoint. I know a lot of people in finance (from accounting to portfolio management) and they scoff at the outputs of LLMs in their day-to-day jobs.
But the bizarre thing is, even though the productivity of SWEs is increasing, I don't believe there will be many layoffs, because there isn't complete trust in LLMs; I don't see this changing either. In which case the LLM producers will need to figure out a way to increase the value of LLMs and get users to pay more.
Are SWEs really experiencing a productivity uplift? When studies attempt to measure the productivity impact of AI in software, the results I have seen are underwhelming compared to the frontier labs' marketing.
And, again, this is ignoring all the technical debt of produced code that is poorly understood, weakly-reviewed, and of questionable quality overall.
I still think this all has serious potential for net benefit, and does now in certain cases. But we need to be clearer about spelling out where that is (webshit, boilerplate, language-to-language translation, etc) and where it maybe isn't (research code, legacy code, large codebases, niche/expert domains).
This Stanford study on developer productivity found zero correlation between developers' assessments of their own productivity and independent measures of it. Any anecdotal evidence from developers on how AI has made them more or less productive is worthless.
Yup, most progress is also confined to SWEs doing webshit / writing boilerplate code. For anything specialized, LLMs are rarely useful, and this is all ignoring the future technical debt of debugging LLM code.
I am hopeful about LLMs for SWE, but the progress is currently contextual.
Even if LLMs could write great code with no human oversight, the world would not change overnight. Human creativity is necessary to figure out what to produce that will yield incremental benefits over what already exists.
The humans who possess such capability stand to win long-term; said humans tend to be those from the humanities and liberal arts.
I'm not sure how many HN users frequent other places related to agentic coding, like the subreddits of particular providers, but this has got to be the 1000th "ultimate memory system"/break-free-of-the-context-limit-tyranny! project I've seen, and like all the other similar projects, there's never any evidence or even an attempt at measuring any metric of performance improved by it. Of course it's hard to measure such a thing, but that's part of exactly why it's hard to build something like this. Here's user #1001 who's been told by Claude "What a fascinating idea! You've identified a real gap in the market for a simple database-based memory system to extend agent memory."
I feel like so many of these memory solutions are incredibly over-engineered too.
You can work around a lot of the memory issues for large and complex tasks just by making the agent keep work logs. Critical context to keep throughout large pieces of work includes decisions, conversations, investigations, plans and implementations - a normal developer should be tracking these, and it's sensible to have the agent track them too, in a way that survives compaction.
- `FEATURE_IMPL_PLAN.md` (master plan; or `NEXT_FEATURES_LIST.md` or somesuch)
- `FEATURE_IMPL_PROMPT_TEMPLATE.md` (where I replace placeholders with next feature to be implemented; prompt includes various points about being thorough, making sure to validate and loop until full test pipeline works, to git version tag upon user confirmation, etc.)
- `feature-impl-plans/` directory where Claude is to keep per-feature detailed docs (with current status) up to date - this is esp. useful for complex features which may require multiple sessions for example
- also instruct it to keep the main impl plan doc up to date, but that one is limited in size/depth/scope on purpose, so as not to overwhelm it
- CLAUDE.md has a summary of important code references (paths / modules / classes etc.) for lookup, but is also restricted in size. It does, however, include a full (up-to-date) inventory of all doc files, for its own reference
- If I end up expanding CLAUDE.md for some reason or temporarily (before I offload some content to separate docs), I will say as part of prompt template to "make sure to read in the whole @CLAUDE.md without skipping any content"
Great advice. For large plans I tell the agent to write to an “implementation_log.md” and make note of it during compaction. Additionally, the agent can also just reference the original session logs.
Some with a coding background love prompt engineering, contrived supporting systems, json prompting and any other superstition that makes it feel like they're really doing something.
They refuse to believe that it's possible to instruct these tools in terse plain English and get useful results.
Which of the 1000 is your favorite? There does seem to be a shallow race to optimizing xyz benchmark for some narrow sliver of the context problem, but you're right, context problem space is big, so I don't think we'll hurry to join that narrow race.
None, that's what I'm trying to say. My favorite is just storing project context locally in docs that agents can discover on their own or I can point to if needed. This doesn't require me to upload sensitive code or information to anonymous people's side projects and has an equivalent amount of hard evidence for efficacy (zero), but at least has my own anecdotal evidence of helping and doesn't invite additional security risk.
People go way overboard with MCPs and armies of subagents built on wishes and unproven memory systems because no one really knows for sure how to get past the spot we all hit where the agentic project that was progressing perfectly hits a sharp downtrend in progress. Doesn't mean it's time to send our data to strangers.
> no one really knows for sure how to get past the spot we all hit where the agentic project that was progressing perfectly hits a sharp downtrend in progress.
FWIW, I find this eventual degradation point comes much later and with fewer consequences when there are strict guardrails inside and outside of the LLM itself.
From what I've seen, most people try to fix only the "inside" part - tweaking the prompts, installing 500 MCPs (that ironically pollute the context and make the problem worse), yelling in uppercase in the hope that it will remember, etc. - and ignore that automated compliance checks existed way before LLMs.
Throw the strictest and most masochistic linting rules at it in a language that is masochistic itself (e.g. Rust), add tons of integration tests that encode intent, add a stop hook in CC that runs all these checks, and you've got a system that is simply not allowed to silently drift and can put itself back on track with the feedback it gets from them.
Basically, rather than trying to hypnotize an agent to remember everything by writing a 5000 line agents.md, just let the code itself scream at it and feed the context.
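For concreteness, here's a rough sketch of the kind of gate script such a stop hook could invoke; the filename and the specific commands are placeholders, swap in whatever linter/test pipeline your project actually uses:

```python
#!/usr/bin/env python3
# checks.py - hypothetical gate script a stop hook could invoke.
# If any check fails, print its output and exit non-zero so the agent
# sees the failure and keeps iterating instead of silently drifting.
import subprocess
import sys

# Placeholder commands; substitute your project's own lint/test pipeline.
CHECKS = [
    ["cargo", "clippy", "--all-targets", "--", "-D", "warnings"],  # strict lint
    ["cargo", "test"],                                             # tests that encode intent
]

def main() -> int:
    for cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Surface the failure so it lands back in the agent's context.
            print(f"CHECK FAILED: {' '.join(cmd)}")
            print(result.stdout)
            print(result.stderr)
            return 1
    print("All checks passed.")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

The exact hook wiring differs per tool, but the point is the same: the toolchain enforces the feedback loop, not prose in a prompt.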
This is fair, many memory projects out there boil down to better summaries or prompt glue without any clear way to measure impact.
One thing I’d clarify about what we’re building is that it’s not meant to be “the best memory for a single agent.”
The core idea is portability and sharing, not just persistence.
Concretely:
- you can give Codex access to memory created while working in Claude
- Claude Code can retrieve context from work done in other tools
- multiple agents can read/write the same memory instead of each carrying their own partial copy
- specific parts of context can be shared with teammates or collaborators
That’s the part that’s hard (or impossible) to do with markdown files or tool-local memory, and it’s also why we don’t frame this as “breaking the context limit.”
Measuring impact here is tricky, but the problem we’re solving shows up as fragmentation rather than forgetting: duplicated explanations, divergent state between agents, and lost context when switching tools or models.
If someone only uses a single agent in a single tool and is already using their customized CLAUDE.md, they probably don’t need this. The value shows up once you treat agents as interchangeable workers rather than a single long-running conversation.
> That’s the part that’s hard (or impossible) to do with markdown files or tool-local memory.
I'm confused because every single thing in that list is trivial? Why would Codex have trouble reading a markdown file Claude wrote or vice versa? Why would multiple agents need their own copy of the markdown file instead of just referring to it as needed? Why would it be hard to share specific files with teammates or collaborators?
Edit - I realize I could be more helpful if I actually shared how I manage project context:
CLAUDE.md or Agents.md is not the only place to store context for agents in a project; you can store docs at any layer of granularity you want. What's worked best for me is to:
1. Have a standards doc(s) (you can point the agents to the same standards doc in their respective claude.md/agents.md)
2. Before coding, have the agent create implementation plans that get stored into tickets (markdown files) for each chunk of work that would take about a context window's length (estimated).
3. Work through the tickets and update them as completed. Easy to refer back to when needed.
4. If you want, you can ask the agent to contribute to an overall dev log as well, but this gets long fast. It's useful for agents to refer to the last 50 lines or so to immediately get up to speed on "what just happened?", but so could git history.
5. Ultimately the code is going to be the real "memory" of the true state, so try to organize it in a way that's easy for agents to comb through (no 5000-line files that agents have trouble carefully jumping around in to find what they need without eating up their entire context window immediately).
You’re right that reading the same markdown file is trivial; that’s not the hard part.
Where it stopped being trivial for us was once multiple agents were working at the same time. For example, one agent is deciding on an architecture while another is already generating code. A constraint changes mid-way. With a flat file, both agents can read it, but you’re relying on humans as the coordination layer: deciding which docs are authoritative, when plans are superseded, which tickets are still valid, and how context should be scoped for a given agent.
This gets harder once context is shared across tools or collaborators’ agents. You start running into questions like who can read vs. update which parts of context, how to share only relevant decisions, how agents discover what matters without scanning a growing pile of files, and how updates propagate without state drifting apart.
You can build conventions around this with files, and for many workflows that works well. But once multiple agents are updating state asynchronously, the complexity shifts from storage to coordination. That boundary, sharing and coordinating evolving context across many agents and tools, is what we’re focused on and what an external memory network can solve.
If you’ve found ways to push that boundary further with files alone, I’d genuinely be curious - this still feels like an open design space.
You're still not closing the gap between the problems you're naming and how your solution solves them.
> With a flat file, both agents can read it, but you’re relying on humans as the coordination layer: deciding which docs are authoritative, when plans are superseded, which tickets are still valid, and how context should be scoped for a given agent.
So the memory system also automates project management by removing "humans as the coordination layer"? From the OP the only details we got were
"What it does: (1) persists context between sessions (2) semantic & temportal search (not just string grep)"
Which are fine, but neither it nor you explain how it can solve any of these broader problems you bring up:
"deciding which docs are authoritative, when plans are superseded, which tickets are still valid, and how context should be scoped for a given agent, questions like who can read vs. update which parts of context, how to share only relevant decisions, how agents discover what matters without scanning a growing pile of files, and how updates propagate without state drifting apart."
You're claiming that semantic and temporal search has solved all of this for free?
This project was presented as a memory solution and now it seems like you're saying it's actually an agent orchestration framework, but the gap between what you're claiming your system can achieve and how you claim it works seems vast.
imho, if it’s not based on RAG, it’s not a real memory system. The agent often doesn’t know what it doesn’t know, and as such relevant memories must be pushed into the context window by embedding distance, not actively looked up.
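Roughly, that push looks like this; `embed()` is a stand-in for whatever embedding model you use, and all names here are illustrative rather than any particular product's API:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real embedding model (e.g. a sentence-transformer)."""
    raise NotImplementedError

def retrieve_memories(query: str, memories: list[str], k: int = 5) -> list[str]:
    """Return the k stored memories closest to the query by cosine similarity.

    The agent never asks for these; they get injected into its context
    precisely because it doesn't know what it doesn't know.
    """
    q = embed(query)
    scored = []
    for m in memories:
        v = embed(m)
        sim = float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        scored.append((sim, m))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:k]]

# Injected ahead of the user's turn, not looked up on demand:
# context = "\n".join(retrieve_memories(user_message, memory_store))
```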
It sounds like you've used quite a few. What programs are you expecting? Assuming you're talking about doing some inference on the data? Or optimizing for some RAG or something?
Skills are md files, but they are not just that. They are also scripts - that's what they add. You can make a skill that is just a prompt, but that misses where the value is.
You're packaging the tool with the skill, or multiple tools to do a single thing.
In the end it's still an .md file pointing to a script that ends up being just a prompt for the agent, which the agent may or may not pick up, may or may not discover, may or may not forget after context compaction, etc.
There's no inherent magic to skills, or any fundamental difference between them and "just feeding in different prompts and steps". It literally is just feeding in different prompts and steps.
I find in my experience that it's trivial to have the skill systematically call the script and perform the action correctly. This has not been a challenge for me.
Also, the pick up or not pick up, or discover or may not discover is solved as well. It's handled by my router, which I wrote about here - https://vexjoy.com/posts/the-do-router/
So these are solved problems to me. There are many more problems which are not solved, which are the interesting space to continue with.
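To give a flavor of what routing means here, a deliberately toy sketch (keyword matching only; this is not the actual do-router, which the post describes):

```python
# Toy skill router: deterministic matching decides which skill's doc and
# scripts get loaded into context, instead of hoping the agent discovers
# the right .md file on its own. Skill names and paths are made up.
SKILLS = {
    "release": {"keywords": ["release", "tag", "changelog"], "doc": "skills/release.md"},
    "db-migrate": {"keywords": ["migration", "schema", "alembic"], "doc": "skills/db_migrate.md"},
}

def route(request: str) -> str | None:
    """Return the path of the skill doc whose keywords best match the request."""
    words = request.lower().split()
    best, best_hits = None, 0
    for name, skill in SKILLS.items():
        hits = sum(1 for kw in skill["keywords"] if kw in words)
        if hits > best_hits:
            best, best_hits = skill["doc"], hits
    return best

print(route("cut a release and update the changelog"))  # -> skills/release.md
```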
Why should he put effort into measuring a tool that the author has not? The point is there are so many of these tools that an objective measure the creators could use to compare them against each other would be better.
So a better question to ask is: do you have any ideas for an objective way to measure the performance of agentic coding tools, so we can truly determine what improves performance and what doesn't?
I would hope that internally, OpenAI and Anthropic use something similar to the harness/test cases they use for training their full models to determine whether changes to Claude Code result in better performance.
Well, if I were Microsoft and training Copilot, I would log all the <restore checkpoint> user actions and grade the agents on that. At scale across all users, "resets per agent command" should be useful. But then again, publishing the true numbers might be embarrassing...
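Sketching what that metric could look like over a hypothetical event log (the field names here are invented for illustration, not how Copilot actually logs anything):

```python
from collections import defaultdict

# Hypothetical event log: one record per user action.
events = [
    {"agent": "agent-a", "action": "agent_command"},
    {"agent": "agent-a", "action": "restore_checkpoint"},
    {"agent": "agent-b", "action": "agent_command"},
    {"agent": "agent-b", "action": "agent_command"},
]

commands = defaultdict(int)
resets = defaultdict(int)
for e in events:
    if e["action"] == "agent_command":
        commands[e["agent"]] += 1
    elif e["action"] == "restore_checkpoint":
        resets[e["agent"]] += 1

# "Resets per agent command": lower is better.
for agent in commands:
    print(agent, resets[agent] / commands[agent])
```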
Nah, just another one of those spam bots on all the small-business, finance and tradie subreddits: "Hey fellow users, have you ever suffered from <use case>? What is the problem you want solved? Tell me your honest opinions below!"
It does nothing but send a bunch of data to an "alpha, use at your own risk" third-party site that may or may not run some LLM on your data: https://ensue-network.ai/login
I imagine HN, despite being full of experts and vet devs, also might have a prevalent attitude of looking down on using tools like MCP servers or agentic AI libraries for coding, which might be why something like this seems novel rather than redundant when it's advertised.
> I imagine HN, despite being full of experts and vet devs, also might have a prevalent attitude of looking down on using tools like MCP servers or agentic AI libraries for coding, which might be why something like this seems novel rather than redundant when it's advertised.
I’m not sure where the ‘despite’ comes in. Experts and vets have opinions and this is probably the best online forum to express them. Lots of experts and vets also dislike extremely popular unrelated tools like VB, Windows, “no-code” systems, and Google web search… it’s not a personality flaw. It doesn’t automatically mean they’re right, either, but ‘expert’ and ‘vet’ are earned statuses, and that means something. We’ve seen trends come and go and empires rise and fall, and been repeatedly showered in the related hype/PR/FUD. Not reflexively embracing everything that some critical mass of other people like is totally fine.
I think maybe the point they were trying to make is that despite people on HN being very technically experienced, skepticism and distrust of LLM-assisted coding tools may have prevented many of them from exploring the space too deeply yet. So a project like this may seem novel to many readers here, when the reality for users who've been using and following tools like Claude Code (and similar) closely for a while now is that claims like the ones this project is making come out multiple times per week.
I’ve got friends that I’ve known for decades and would fly across the world if they needed. I’ve also got friends I see for drinks occasionally. Other people I might call friends that I don’t even have in my phone. It’s a big range and there are a lot of things my closest friends could reasonably say to me that more casual friends couldn’t.
To be clear, I’m also not saying anyone would be ostracized for this, nor that anyone would ostracize me if I said this. But if one of my more casual friends randomly commented that their hairdresser is hot, I’d give them a bit of a sideways look, yeah.
FTA: “The team showed that indeed they express “D1” receptors for the neuromodulator. Commensurate with the degree of dopamine connectivity“
There are receptors specifically for dopamine on the amygdala neurons. Dopamine molecules are released by the pre-synaptic neurons, travel across the synapse, and bind to these receptors.
Dopamine’s role in the nervous system is not simply an intermediate on the pathway to produce epinephrine or norepinephrine. If you thought like this you’d reach the conclusion that testosterone is simply a precursor to estrogen because the pathway to convert it exists in some tissues of the body.
You’re not dumb, it’s an incredibly complex topic with convoluted and contradictory messaging everywhere.
The way that I’ve learned to think about it is that the brain is made up of neurons, and they perform specific functions, technically individually but more usefully understood in regional groupings (primarily figured out via fMRI/blood flow studies and lesion experiments).
Each neuron’s activity is regulated by specific neurotransmitters, and the type of receptors expressed in neurons also correlates with these functional areas (figured out through PET/radio-tagged molecule scans and biopsies). Regarding dopamine specifically, the areas responsible for effortful attention (the prefrontal cortex) and for reward processing, in a general sense of broader “good” and not only simple treats (the nucleus accumbens), have high concentrations of dopamine receptors.
Therefore, drugs that interact with dopamine receptors or with chemical chains that involve dopamine can affect these functions.
Neurotransmitters are just chemicals and they go through many complex and interrelated metabolic chains, and at baseline (in a typical individual, barring specific genetic differences) it is often most useful to assume they’re all there and instead understand where they’re used.
This comment might not be the most succinct, and I’ve just started my education on the subject so I’m sure there are inaccuracies, which I’d be happy to have pointed out, but I do hope that it helps you get a somewhat clearer picture and realize that you’re not dumb for being confused about this.