Hacker News | nuky's comments

Oh yeah, tree-sitter is a great foundation for semantic structure.

What I'm exploring is more about what we do with that structure once someone (or something) starts generating thousands of changed lines: how to compress the change into signals we can actually reason about.

Thank you for sharing. I'm actually trying your tool right now - it looks really interesting. Happy to exchange thoughts.


Feel free to shoot me an email (your email is not visible on your profile).


Agreed - that trap is very real. The open question for me is what we do when atomic, 5-minute-readable diffs are the right goal but not always realistically achievable. My gut says we need better deterministic signals to reduce noise before human review. Not to replace it.


This is exactly the gap I'm worried about. Human review still matters, but linear reading breaks down once the diff is mostly machine-generated noise. Summarizing what actually changed before reading feels like the only way to keep reviews sustainable.


This came out of reviewing a lot of large refactors and AI-assisted changes.

The goal isn't to replace code review (ofc) or relax standards, but to make the consequences of changes visible early — so reviewers can decide where to focus or when to push back.


It was precisely because this was going too far that I thought the consequences of the active adoption of LLM tools could be made visible. I'm not saying LLMs are completely bad - after all, not all tools, even non-LLM ones, are 100% deterministic. At the same time, reckless and uncontrolled use of LLMs is increasingly gaining ground not only in coding but even in code analysis/review.


fair — that’s what I do as well)


Yeah, difftastic and similar tools really do help a lot with formatting noise.

My question is slightly orthogonal though: even with a cleaner diff, I still find it hard to quickly tell whether public API or behavior changed, or whether logic just moved around.

Not really about LLMs as reviewers — more about whether there are useful deterministic signals above line-level diff.


The tools exist, they're just rarely used in web dev. Look into ApiDiff or tools using Tree-sitter to compare function signatures. In the Rust/Go ecosystems, there are tools that scream in CI if the public contract changes. We need to bring that rigor into everyday AI-assisted dev. A diff should say "Function X now accepts null", not "line 42 changed".
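
To make that concrete, here's a minimal sketch of the idea using Python's built-in ast module instead of Tree-sitter or ApiDiff; the "public = no leading underscore" rule and the two file paths are just assumptions for the example:

    # sigdiff.py - report public function signature changes between two versions of a file
    import ast
    import sys

    def public_signatures(path):
        tree = ast.parse(open(path).read())
        sigs = {}
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and not node.name.startswith("_"):
                sigs[node.name] = tuple(a.arg for a in node.args.args)
        return sigs

    old, new = public_signatures(sys.argv[1]), public_signatures(sys.argv[2])
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            print(f"removed: {name}{old[name]}")
        elif name not in old:
            print(f"added:   {name}{new[name]}")
        elif old[name] != new[name]:
            print(f"changed: {name} {old[name]} -> {new[name]}")

Run it as "python sigdiff.py old_version.py new_version.py". Real tools obviously go much further (types, defaults, return contracts), but even this level of signal answers "did the public surface change?" faster than reading line hunks.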


That matches my experience too - tests and plans are still the backbone.

What I keep running into is the step before reading tests or code: when a change is large or mechanical, I’m mostly trying to answer "did behavior or API actually change, or is this mostly reshaping?" so I know how deep to go etc.

Agree we’re all still experimenting here.


Sounds interesting. What's the stack under the hood?


Mostly usearch, since pgvector had some perf problems, though I use postgres during my ingestion stage since postgis is so good. I segment my hnsw indexes by multiple h3 levels.
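
For anyone trying to picture the segmentation part: this is not the actual setup described above, just a minimal sketch of one-usearch-index-per-H3-cell, assuming the usearch Python bindings and the h3 v4 API; the resolution, dimensionality, and metric are made up for illustration:

    # sketch: one in-memory usearch HNSW index per coarse H3 cell
    import h3                      # h3 v4 Python API assumed
    import numpy as np
    from usearch.index import Index

    RES = 5        # assumed coarse H3 resolution
    DIM = 384      # assumed embedding dimension

    indexes = {}   # h3 cell -> usearch Index
    records = {}   # h3 cell -> list of payloads; list position doubles as the vector key

    def add(lat, lng, vector, payload):
        cell = h3.latlng_to_cell(lat, lng, RES)
        idx = indexes.setdefault(cell, Index(ndim=DIM, metric="cos"))
        bucket = records.setdefault(cell, [])
        idx.add(len(bucket), np.asarray(vector, dtype=np.float32))
        bucket.append(payload)

    def search(lat, lng, vector, k=10):
        cell = h3.latlng_to_cell(lat, lng, RES)
        if cell not in indexes:
            return []
        matches = indexes[cell].search(np.asarray(vector, dtype=np.float32), k)
        return [(records[cell][int(key)], float(dist))
                for key, dist in zip(matches.keys, matches.distances)]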


Nice. And how do you handle candidates near H3 boundaries?


So I don't support queries with a radius larger than 50km (if an AI agent doesn't know where it's looking within 50km, there's usually a context issue upstream), but I have a larger h3 index and a tighter h3 index. Then I have a router that tries to find the correct h3 indexes for each query. For some queries I'll need up to 3 searches, but most map to a single search. (Sorry, I probably won't be able to reply below here since the max HN comment depth is 4.)
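
Not the actual router described above, but one way that kind of routing could be wired up (again assuming the h3 v4 Python API; the two resolutions and the 10km cutoff are invented for the example):

    # sketch: route a (lat, lng, radius) query to one or a few H3 cells
    import h3    # h3 v4 Python API assumed

    COARSE_RES = 4       # the "larger" index
    FINE_RES = 6         # the "tighter" index
    MAX_RADIUS_KM = 50

    def route(lat, lng, radius_km):
        if radius_km > MAX_RADIUS_KM:
            raise ValueError("radius over 50km - fix the context upstream")
        # small query disks go to the tighter index, bigger ones to the larger one
        res = FINE_RES if radius_km <= 10 else COARSE_RES
        center = h3.latlng_to_cell(lat, lng, res)
        edge_km = h3.average_hexagon_edge_length(res, unit="km")
        # if the disk fits comfortably inside the centre cell, one search is enough;
        # otherwise include the ring of neighbours and search each of those indexes
        cells = [center] if radius_km <= edge_km else list(h3.grid_disk(center, 1))
        return res, cells

Each returned cell then maps to its own index, and results get merged by distance.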


Reply to your comment below this (since hn limits comment depth to 4). The 40ms latency is an average, but 90% of queries get routed to a single index; latency is worse when the routing goes to 3 searches. Since I already batch the embedding generation, I should be able to get hard queries down to around 50ms.


Makes sense. What about latency for typical and hard queries?


Just to clarify - this isn't about replacing diffs or selling a tool.

I ran into this problem while reviewing AI-gen refactors and started thinking about whether we’re still reviewing the right things. Mostly curious how others approach this.

