Hacker News | nuky's comments

Oh yeah, tree-sitter is a great foundation for semantic structure.

What I'm exploring is more about what we do with that structure once someone (or something) starts generating thousands of changed lines: how to compress the change into signals we can actually reason about.

Thank you for sharing. I'm actually trying your tool right now - it looks really interesting. Happy to exchange thoughts.


Feel free to shoot me an email (your email is not visible on your profile).


Agreed - that trap is very real. The open question for me is what we do when atomic, 5-minute-readable diffs are the right goal but not always realistically achievable. My gut says we need better deterministic signals to reduce noise before human review. Not to replace it.


This is exactly the gap I'm worried about. Human review still matters, but linear reading breaks down once the diff is mostly machine-generated noise. Summarizing what actually changed before reading feels like the only way to keep reviews sustainable.


This came out of reviewing a lot of large refactors and AI-assisted changes.

The goal isn't to replace code review (ofc) or relax standards, but to make the consequences of changes visible early — so reviewers can decide where to focus or when to push back.


It was precisely because this was going too far that I thought the consequences of the active adoption of LLM tools could be made visible. I'm not saying LLMs are completely bad - after all, not all tools, even non-LLM ones, are 100% deterministic. At the same time, reckless and uncontrolled use of LLMs is increasingly gaining ground not only in coding but even in code analysis/review.


fair — that’s what I do as well)


Yeah, difftastic and similar tools really do help a lot with formatting noise.

My question is slightly orthogonal though: even with a cleaner diff, I still find it hard to quickly tell whether public API or behavior changed, or whether logic just moved around.

Not really about LLMs as reviewers — more about whether there are useful deterministic signals above line-level diff.


The tools exist, they're just rarely used in web dev. Look into ApiDiff or tools using Tree-sitter to compare function signatures. In the Rust/Go ecosystems, there are tools that scream in CI if the public contract changes. We need to bring that rigor into everyday AI-assisted dev. A diff should say "Function X now accepts null", not "line 42 changed".
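
To make that concrete, here's a minimal sketch of the idea using Python's built-in ast module instead of Tree-sitter or ApiDiff; the "public = no leading underscore" rule and the two file paths are just assumptions for the example:

    # sigdiff.py - report public function signature changes between two versions of a file
    import ast
    import sys

    def public_signatures(path):
        tree = ast.parse(open(path).read())
        sigs = {}
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and not node.name.startswith("_"):
                sigs[node.name] = tuple(a.arg for a in node.args.args)
        return sigs

    old, new = public_signatures(sys.argv[1]), public_signatures(sys.argv[2])
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            print(f"removed: {name}{old[name]}")
        elif name not in old:
            print(f"added:   {name}{new[name]}")
        elif old[name] != new[name]:
            print(f"changed: {name} {old[name]} -> {new[name]}")

Run it as "python sigdiff.py old_version.py new_version.py". Real tools obviously go much further (types, defaults, return contracts), but even this level of signal answers "did the public surface change?" faster than reading line hunks.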


That matches my experience too - tests and plans are still the backbone.

What I keep running into is the step before reading tests or code: when a change is large or mechanical, I’m mostly trying to answer "did behavior or API actually change, or is this mostly reshaping?" so I know how deep to go etc.

Agree we’re all still experimenting here.


Sounds interesting. What's the stack under the hood?


Mostly usearch, since pgvector had some perf problems, though I use postgres during my ingestion stage since postgis is so good. I segment my hnsw indexes by multiple h3 levels.
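
For anyone trying to picture the segmentation part: this is not the actual setup described above, just a minimal sketch of one-usearch-index-per-H3-cell, assuming the usearch Python bindings and the h3 v4 API; the resolution, dimensionality, and metric are made up for illustration:

    # sketch: one in-memory usearch HNSW index per coarse H3 cell
    import h3                      # h3 v4 Python API assumed
    import numpy as np
    from usearch.index import Index

    RES = 5        # assumed coarse H3 resolution
    DIM = 384      # assumed embedding dimension

    indexes = {}   # h3 cell -> usearch Index
    records = {}   # h3 cell -> list of payloads; list position doubles as the vector key

    def add(lat, lng, vector, payload):
        cell = h3.latlng_to_cell(lat, lng, RES)
        idx = indexes.setdefault(cell, Index(ndim=DIM, metric="cos"))
        bucket = records.setdefault(cell, [])
        idx.add(len(bucket), np.asarray(vector, dtype=np.float32))
        bucket.append(payload)

    def search(lat, lng, vector, k=10):
        cell = h3.latlng_to_cell(lat, lng, RES)
        if cell not in indexes:
            return []
        matches = indexes[cell].search(np.asarray(vector, dtype=np.float32), k)
        return [(records[cell][int(key)], float(dist))
                for key, dist in zip(matches.keys, matches.distances)]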


Nice. And how do you handle candidates near H3 boundaries?


So I don't support queries with a radius larger than 50km (if an AI agent doesn't know where it's looking within 50km, there's usually a context issue upstream), but I have a larger h3 index and a tighter h3 index. Then I have a router that tries to find the correct h3 indexes for each query. For some queries I'll need up to 3 searches, but most map to a single search. (Sorry, I probably won't be able to reply below here since the max HN comment depth is 4.)
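
Not the actual router described above, but one way that kind of routing could be wired up (again assuming the h3 v4 Python API; the two resolutions and the 10km cutoff are invented for the example):

    # sketch: route a (lat, lng, radius) query to one or a few H3 cells
    import h3    # h3 v4 Python API assumed

    COARSE_RES = 4       # the "larger" index
    FINE_RES = 6         # the "tighter" index
    MAX_RADIUS_KM = 50

    def route(lat, lng, radius_km):
        if radius_km > MAX_RADIUS_KM:
            raise ValueError("radius over 50km - fix the context upstream")
        # small query disks go to the tighter index, bigger ones to the larger one
        res = FINE_RES if radius_km <= 10 else COARSE_RES
        center = h3.latlng_to_cell(lat, lng, res)
        edge_km = h3.average_hexagon_edge_length(res, unit="km")
        # if the disk fits comfortably inside the centre cell, one search is enough;
        # otherwise include the ring of neighbours and search each of those indexes
        cells = [center] if radius_km <= edge_km else list(h3.grid_disk(center, 1))
        return res, cells

Each returned cell then maps to its own index, and results get merged by distance.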


Reply to your comment below this (since hn limits comment depth to 4). The 40ms latency is an average, but 90% of queries get routed to a single index; latency is worse when the routing goes to 3 searches. Since I already batch the embedding generation, I should be able to get hard queries down to around 50ms.


Makes sense. What about latency for typical and hard queries?


Just to clarify - this isn't about replacing diffs or selling a tool.

I ran into this problem while reviewing AI-gen refactors and started thinking about whether we’re still reviewing the right things. Mostly curious how others approach this.

