djoldman's comments | Hacker News

> The calling agent then decides how to use those snippets in its own prompt.

To be reductionist, it seems the claimed product value is "better RAG for code."

The difficulties with RAG are at least:

1. Chunking: how large should a chunk be, and how are its beginning and end determined?

2. Selection: given the above quote, how many RAG results are put into the context? It seems that the API caller makes this decision, but how?

I'm curious about your approach and how you evaluated it.


Not quite “better RAG for code”. The core idea is agentic discovery plus semantic search. Instead of static chunks pushed into context, the agent can dynamically traverse docs, follow links, grep for exact identifiers, and request only the relevant pieces on demand.

No manual chunking. We index with multiple strategies (hierarchical docs structure, symbol boundaries, semantic splitting) so the agent can jump into the right part without guessing chunk edges.

Context is selective. The agent retrieves minimal snippets and can fetch more iteratively as it reasons, rather than preloading large chunks. We benchmark this using exact match evaluations on real agent tasks: correctness, reduced hallucination, and fewer round trips.
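
Roughly, the iterative loop looks like this (a minimal sketch only; the tool names and the planner/answer callables are illustrative stand-ins, not our actual API):

    # Minimal sketch of the iterative retrieval loop described above.
    # Everything here (tool names, planner, answer function) is illustrative.
    from typing import Callable, Optional

    def agentic_answer(
        question: str,
        plan_next_lookup: Callable[[str, list[str]], Optional[tuple[str, str]]],
        tools: dict[str, Callable[[str], list[str]]],
        answer: Callable[[str, list[str]], str],
        max_rounds: int = 5,
    ) -> str:
        context: list[str] = []
        for _ in range(max_rounds):
            step = plan_next_lookup(question, context)  # the agent decides what it still needs
            if step is None:                            # enough context gathered
                break
            tool_name, query = step                     # e.g. ("semantic_search", "retry policy")
            context.extend(tools[tool_name](query))     # only the requested snippets enter the prompt
        return answer(question, context)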


As a consequence of my profession, I understand how LLMs work under the hood.

I also know that we data and tech folks will probably never win the battle over anthropomorphization.

The average user of AI, never mind folks who should know better, is so easily convinced that AI "knows," "thinks," "lies," "wants," "understands," etc. Add to this that all AI hosts push this perspective (and why not: it's the easiest white lie to get the user to act so that they get a lot of value), and there's really too much to fight against.

We're just gonna keep on running into this, and it'll be like when you take chemistry and physics and the teachers say, "it's not actually like this, but we'll get to how it really works some years down the line; just pretend this is true for the time being."


These discussions often end up resembling religious arguments. "We don't know how any of this works, but we can fathom an intelligent god doing it, therefore an intelligent god did it."

"We don't really know how human consciousness works, but the LLM resembles things we associate with thought, therefore it is thought."

I think most people would agree that the functioning of an LLM resembles human thought, but I think most people, even the ones who think that LLMs can think, would agree that LLMs don't think in the exact same way that a human brain does. At best, you can argue that whatever they are doing could be classified as "thought" because we barely have a good definition for the word in the first place.


I don't think I've heard anyone (beyond the most inane Twitterati) confidently state "therefore it is thought."

I hear a lot of people saying "it's certainly not and cannot be thought" and then "it's not exactly clear how to delineate these things or how to detect any delineations we might want."


You may know the mechanics, but you don't know how LLMs "work" because no one really understands (yet, hopefully).

I'm a neurologist, and as a consequence of my profession, I understand how humans work under the hood.

The average human is so easily convinced that humans "know", "think", "lie", "want", "understand", etc.

But really it's all just a probabilistic chain reaction of electrochemical and thermal interactions. There is literally nowhere in the brain's internals for anything like "knowing" or "thinking" or "lying" to happen!

Strange that we have to pretend otherwise


>I'm a neurologist, and as a consequence of my profession, I understand how humans work under the hood.

There you go again, auto-morphizing the meat-bags. Vroom vroom.


I upvoted you.

This is a fundamentally interesting point. Taking your comment in its strongest plausible interpretation, as the HN guidelines advise, I totally agree.

I think genAI freaks a lot of people out because it makes them doubt what they thought made them special.

And to your comment: humans have always reserved certain words for humanity, words that indicate we're special: that we think, feel, etc. That we're human. Maybe we're not so special. Maybe that's scary to a lot of people.


And I upvoted you! Because that's an upstanding thing to do.

(And I was about to react with

"In 2025 , ironically, a lot of anti-anthropomorphization is actually anthropocentrism with a moustache."

I'll have to save it for the next debate)


It doesn't strike you as a bit...illogical to state in your first sentence that you "understand how humans work under the hood" and then go on to say that humans don't actually "understand" anything? Clearly everything at its basis is a chemical reaction, but the right reactions chained together create understanding, knowing, etc. I do believe that the human brain can be modeled by machines, but I don't believe LLMs are anywhere close to being on the right track.


>everything at its basis is a chemical reaction, but the right reactions chained together create understanding, knowing, etc

That was their point. Or rather, that the analogous argument about the underpinnings of LLMs is similarly unconvincing regarding the issue of thought or understanding.


Correct^ Thank you. I knew I was going out on a bit of a limb there :)


There are no properties of matter or energy that can have a sense of self or experience qualia. Yet we all do. Denying the hard problem of consciousness just slows down our progress in discovering what it is.


We need a difference to discover what it is. How can we know that all LLMs don't?


If you tediously work out the LLM math by hand, is the pen and paper conscious too?

Consciousness is not computation. You need something else.


This comment here is pure gold. I love it.

On the flip side: If you do that, YOU are conscious and intelligent.

Would it mean that the machine that did the computation became conscious when it did it?

What is consciousness?


The pen and paper are not the actual substrate of entropy reduction, so not really.

Consciousness is what it "feels like" when a part of the universe is engaged in local entropy reduction. You heard it here first, folks!


Even if they do, it can only be transient, during the inference process. Unlike a brain that is constantly undergoing dynamic electrochemical processes, an LLM is just an inert pile of data except when the model is being executed.


(Hint: I am not denying the hard problem of consciousness ;) )


In my experience "just put 40 hours in Salesforce with the project I’m assigned to" matches folks expectations.

However.

If you're ever on a project that doesn't turn out so well, it may suddenly become critical to account for all work done during every billed hour in detail.

I would advise all consultants to track their time diligently and completely.


That’s part of the project management tracking but that’s not strictly hours.

Those traceability artifacts are, in order:

1. The signed statement of work - this is the legally binding contract.

2. The project kick-off meeting, where we agree on the mechanics of the project and a high-level understanding of the expectations.

3. Deep-dive discovery sessions - recorded, transcribed, and these days summarized using Gong.

4. Video-recorded approvals of the design proposals as I walk through them.

5. A shared Jira backlog that I create and walk through with them throughout the project.

6. A shared decision log recording what decisions were made and who on the client side made them.

7. A handoff, also video recorded, where the client says they are good going forward.

I lead 2-7 or do it all myself depending on the size of the project.

At no point am I going to say or expect anyone on my project to say they spent 4 hours on Tuesday writing Terraform.

But then again, my number one rule about consulting that I refuse to break is that I don’t do staff augmentation. I want to work on a contract with requirements and a “definition of done”. I control the execution of the project and the “how” within limits.

I want to be judged on outcomes not how many jira tickets I closed.

When I was at AWS I worked with a client that directly hired a former laid off ProServe L6 consultant. He was very much forced into staff augmentation where he did have to track everything he did by the hour.

You could tell he thought that was the fifth level of hell, going from strategy consulting to staff augmentation. It paid decently. But he was definitely looking, and I recommended him for a staff consultant position at my current company (full-time direct hire).

FWIW: I specialize in cloud + app dev - “application modernization”


This has been illegal since 2018. I think you'll find that pharmacies have a book to show you every price if you ask.

See S. 2554, the "Patient Right to Know Drug Prices Act"


> When you trade CME bitcoin futures, your settlement is guaranteed by the clearing entities of Chicago Mercantile Exchange which are bulge bracket firms of TradFi.

The CME clearinghouse itself is the guarantor, and below it are the clearing firms. The trading firms don't guarantee trades; the clearing firms do.

In fact, for many products, the CME is the counterparty for both sides of a trade.


Interesting "ScreenSpot Pro" results:

    72.7% Gemini 3 Pro
    11.4% Gemini 2.5 Pro
    49.9% Claude Opus 4.5
    3.50% GPT-5.1
ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use

https://arxiv.org/abs/2504.07981


I was surprised at how poorly GPT-5 did in comparison to Opus 4.1 and Gemini 2.5 on a pretty simple OCR task a few months ago - I should run that again against the latest models and see how they do. https://simonwillison.net/2025/Aug/29/the-perils-of-vibe-cod...


Agreed, GPT-5 and even 5.1 are noticeably bad at OCR. OCRArena backs this up: https://www.ocrarena.ai/leaderboard (I personally would rank 5.1 as even worse than it is there).

According to the calculator on the pricing page (it's inside a toggle at the bottom of the FAQs), GPT-5 is resizing images so that the shorter side is at most 768 pixels: https://openai.com/api/pricing/ That's ~half the resolution I would normally use for OCR, so if that's happening even via the API, then I guess it makes sense that it performs so poorly.
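
For a sense of scale, assuming the shorter side really is capped at 768, a rough sketch (the input dimensions are illustrative, not measured API behavior):

    # Back-of-the-envelope check of the resize described above.
    def resized_dims(w: int, h: int, max_short_side: int = 768) -> tuple[int, int]:
        scale = min(1.0, max_short_side / min(w, h))
        return round(w * scale), round(h * scale)

    print(resized_dims(1536, 2048))  # (768, 1024): half the linear resolution, a quarter of the pixels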


And GPT-4 was pretty decent at OCR, so that's weird?


That is... astronomically different. Is GPT-5.1 downscaling and losing critical information or something? How could it be so different?


This is my default explanation for visual impairments in LLMs: they're trying to compress the image into about 3,000 tokens, so you're going to lose a lot in the name of efficiency.


I found much better results with smallish UI elements in large screenshots on GPT by slicing the screenshot up manually and feeding the pieces one at a time. I think it does severely lossy downscaling.
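
Something like this, roughly (a sketch with Pillow; the tile size and overlap are guesses, not tuned values):

    # Cut a large screenshot into overlapping tiles so small UI text
    # survives whatever downscaling happens server-side.
    from PIL import Image

    def slice_screenshot(path: str, tile: int = 1024, overlap: int = 64) -> list[Image.Image]:
        img = Image.open(path)
        w, h = img.size
        step = tile - overlap
        tiles = []
        for top in range(0, h, step):
            for left in range(0, w, step):
                box = (left, top, min(left + tile, w), min(top + tile, h))
                tiles.append(img.crop(box))
        return tiles  # send these to the model one at a time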


It has a rather poor max resolution. Higher-resolution images get tiled, up to a point: 512 x 512 is the max tile size, I think, and 2048 x 2048 the max canvas.


impressive.....most impressive

It's going to reach the low 90s very soon if trends continue.


Oh man, this link is worth it just for the "Reflections from the Selection Committee."

These days, abstracts are so marketing/advertising forward that it's hard to even understand the claim.


I have JS off by default and click one button to turn it on per website. You might be surprised how much faster the web is and how often you don't need JS.


Yes, NoScript is great and I'm surprised how often HN users seem unfamiliar with the concept or need it justified to them.


A couple interesting things I've come across over the years:

1. Western politics seems tragically reactionary and concerned with short-term issues. "Boring" stuff like infrastructure maintenance gets set aside. Deferred maintenance results in a superlinear increase in expense: deferring $1 of maintenance today will cost you >$1 in the future (in real terms, accounting for inflation).

2. Some nations spend massively on some kinds of infrastructure, with results little better than others'.


Please, please try using weight whenever possible, i.e., for all amounts >= 2 grams.

1. People are bad at measuring volume. This has been tested: there is much more variance in amounts measured by volume than by weight. See Science and Cooking (Ferran Adrià).

2. Using a scale means doing a lot fewer dishes! (measuring cups, spoons, etc.)

3. It's faster, try it!

