zomglings's comments

Does anyone else find the use of different shades of green for the graph comparing Gemini 2.5 Pro and Sonnet just a little insane?


What matters is whether a point is above or below the diagonal; the colors just display the same information redundantly.


You can ask it to store its current context to a file, review the file, ask it to emphasize or de-emphasize things based on your review, and then use `/clear`.

Then, you can edit the file at your leisure if you want to.

And when you want to load that context back in, ask it to read the file.

Works better than `/compact`, and is a lot cheaper.
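
For example, the round trip can look something like this (the file name and the wording of the prompts are just my illustration, not special commands, apart from `/clear` itself):

    > Save a summary of your current context to CONTEXT.md: the feature we
    > built, key decisions, and our workflow conventions.
    > /clear
    ... edit CONTEXT.md by hand if anything needs more or less emphasis ...
    > Read CONTEXT.md before we continue.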


Neat, thanks, I had no idea!

Edit: It so happens I had a Claude Code session open in my Terminal, so I asked it:

    Save your current context to a file.
Claude produced a 91-line md file... surely that's not the whole of its context? This was a reasonably lengthy conversation in which the AI implemented a new feature.


What is in the file?


An overview of the project and the features implemented.

Edit: Here's the actual file if you want to see it. https://gist.github.com/Wowfunhappy/e7e178136c47c2589cfa7e5a...


Apologies for the late reply. My kids demanded my attention yesterday.

It doesn't seem to have included any points on style or workflow in the context. Most of my context documents end up including the following information:

1. I want the agent to treat git commits as checkpoints so that we can revert really silly changes it makes.

2. I want it to keep on running build/tests on the code to be sure it isn't just going completely off the rails.

3. I want it to refrain from adding low-signal comments to the code, and not to use emojis.

4. I want it to be honest in its dealings with me.

It goes on a bit from there. I suspect the reason that the models end up including that information in the context documents they dump in our sessions is that I give them such strong (and strongly worded) feedback on these topics.

As an alternative, I wonder what would happen if you just told it what was missing from the context and asked it to re-dump the context to file.


But none of this is really Claude Code's internal context, right? It's a summary. I could see using it as an alternative to /compact but not to undo a /clear.

Whatever the internal state of Claude Code is, it's lost as soon as you /clear or close the Terminal window. You can't even experiment with a different prompt and then--if you don't like the prompt--go back to the original conversation, because pressing esc to branch the conversation loses the original branch.


Yes, this is true. It's a summary, and cannot really undo a /clear. It is just a directed, cheaper /compact.


Compared to my experience with the free GitHub Copilot in VS Code, it sounds like you guys are in a horse and buggy.


I'm excited for the improvements they've had recently, but I have better luck with Cline in regular VS Code, as well as Cursor.

I've tried Claude Code this week and I really didn't like it - Claude did an okay job but was insistent on deleting some shit and hard-coding a check instead of an actual conditional. It got the feature done for about $3, but I didn't really like the user experience and it didn't feel any better than using 3.7 in Cursor.


If anyone from Anthropic is reading this, your billing for Claude Code is hostile to your users.

Why doesn’t Claude Code usage count against the same plan that usage of Claude.ai and Claude Desktop are billed against?

I upgraded to the $200/month plan because I really like Claude Code but then was so annoyed to find that this upgrade didn’t even apply to my usage of Claude Code.

So now I’m not using Claude Code so much.


This would put Anthropic in the business of minimizing the context to increase profits, same as Cursor and others who cheap out on context, try to RAG, etc. That would quickly make it worse, so I hope they stay on API pricing.

Some base usage included in the plan might be a good balance


You know, I wouldn't mind if they just applied the API pricing after Claude Code ran through the plan limits.

It would definitely get me to use it more.


But the Claude Pro plan is almost certainly priced under the assumption that some users will use it below the usage limit.

If everyone used the plan to the limit, the plan would cost the same as the API with usage equal to the limit.


Claude Code and Claude.ai are separate products.


I’ve been using codemcp (https://github.com/ezyang/codemcp) to get “most” of the functionality of Claude Code (I believe it uses prompts extracted from Claude Code), but using my existing Pro plan.

It’s less autonomous, since it’s based on the Claude chat interface, and you need to write “continue” every so often, but it’s nice to save the $$


Thanks, makes sense that an MCP server that edits files is a workaround to the problem.


Just tried it and it's indeed very good, thanks for mentioning it! :-)


Claude.ai/Desktop is priced based on average user usage. If you have 1 power user sending 1000 requests per day, and 99 sending 5 (many even none), you can afford a single $10/month plan for everyone to keep things simple.

But every Claude Code user is a 1000 requests per day user, so the economics don't work anymore.
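
To make the averaging concrete (using the made-up numbers above):

    average load = (1 × 1000 + 99 × 5) / 100 users ≈ 15 requests/user/day

A plan priced around ~15 requests/user/day stops working when every subscriber behaves like the 1000-request power user.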


I would accept a higher-priced plan (which covered both my use of Claude.ai/Claude Desktop AND my use of Claude Code).

Anthropic make it seem like Claude Code is a product categorized like Claude Desktop (usage of which gets billed against your Claude.ai plan). This is how it signs off all its commits:

     Generated with [Claude Code](https://claude.ai/code)
At the very least, this is misleading. It misled me.

Once I had purchased the $200/month plan, I did some reading and quickly realized that I had been too quick to jump to conclusions. It still left me feeling like they had pulled a fast one on me.


Maybe you can cancel your subscription or charge back?

I think it's just an oversight on their part. They have nothing to gain by making people believe they would get Claude Code access through their regular plans, only bad word of mouth.


To be fair to them, they make it pretty easy to manage the subscription, downgrade it, etc.

This is definitely not malicious on their part. Just bears pointing out.


Well, take that into consideration then. Just make it an option. Instead of getting 1000 requests per day with Code, you get 100 on the $10/month plan, and then let users decide whether they want to migrate to a higher tier or continue using the API model.

I am not saying Claude should stop making money; I'm just advocating for giving users the value of getting some Code coverage when they migrate from the basic plan to Pro or Max.

Does that make sense?


Their API billing in general is hostile to users. I switched completely to Gemini for this reason and haven’t looked back.


I totally agree with this; I would rather have some kind of predictability than play Claude Code roulette. I would definitely upgrade my plan if I got Claude Code usage included.


I don't know what you guys are on about, but I have been using the free GitHub Copilot in VS Code chats to absolutely crank out new UI features in Vue. All that stuff that makes you groan at the thought of it: more divs, bindings, form validation, a whole new widget...churned out in 30 seconds. Try it live. Works? Keep.

I'm surprised at the complexity and correctness of what it infers from very simple, almost inadequate, prompts.


They did it!


Claude Pro and other website/desktop subscription plans are subject to usage limits that would make them very difficult to use for Claude Code.

Claude Code uses the API interface and API pricing, and it writes and edits code directly on your machine; this is a level past simply interacting with a separate chat bot. It seems a little disingenuous to say it's "hostile" to users, when the reality is that, yeah, you pay a bit more for a more reliable usage tier, for a task that requires it. It also shows you exactly how much it's spent at any point.


> ... usage limits that would make them very difficult to use for Claude Code.

Genuinely interested: how so?


Well, I think it'd be pretty irritating to see the message "3 messages remaining until 6PM" while you are in the middle of a complex coding task.


Conversely, I have to do this manually and monitor the billing instead.


No, that's the whole point: predictability. It's definitely a trade-off, but if we could save the work as-is, we could have the option to continue the iteration elsewhere or, even better, from that point on offer the option to fall back to the current API model.

A nice addition would be having something like /cost, but for checking where you are relative to the limits.


The writing of edits and code directly on my machine is something that happens on the client side. I don't see why that usage would be subject to anything but one-time billing or how it puts any strain on Anthropic's infrastructure.


Yeah, tried it for a couple of minutes, $0.31, quickly stopped and moved away.


$200/month isn't that much. Folks I'm hanging around with are spending $100 USD to $500 USD daily as the new norm, as a cost of doing business and remaining competitive. That might seem expensive, but it's cheap... https://ghuntley.com/redlining


When should we expect to see the amazing products these super-competitive businesses are developing?


$100/day seems reasonable as an upper-percentile spend per programmer. $500/day sounds insane.

A 2.5 hour session with Claude Code costs me somewhere between $15 and $20. Taking $20/2.5 hours as the estimate, $100 would buy me 12.5 hours of programming.


Asking very specific questions to Sonnet 3.7 costs a couple of tenths of a cent every time, and even if you're doing that all day it will never amount to more than maybe a dollar at the end of the day.

On average, one line of, say, JavaScript represents around 7 tokens, which means there are around 140k lines of JS per million tokens.

On Openrouter, Sonnet 3.7 costs are currently:

- $3 / one million input tokens => $100 = 33.3 million input tokens = 420k lines of JS code

- $15 / one million output tokens => $100 = 6.7 million output tokens = 4.6 million lines of JS code

For one developer? In one day? It seems that one can only reach such amounts if the whole codebase is sent again as context with each and every interaction (maybe even with every keystroke for type completion?) -- and that seems incredibly wasteful?


I can't edit the above comment, but there's obviously an error in the math! ;-) Doesn't change the point I was trying to make, but putting this here for the record.

33.3 million input tokens / 7 tokens per loc = 4.8 million locs

6.7 million output tokens / 7 tokens per loc = 960k locs


That's how it works: everything is recomputed again on every additional prompt. But the provider can cache the state of things and restore it for a lower fee, and re-ingesting what was formerly output is cheaper than generating new output (a serial bottleneck), so sometimes there is a discount there.
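
Roughly, with prompt caching on Sonnet 3.7 (the rates below are my recollection of Anthropic's published pricing; treat the exact figures as an assumption):

    fresh input:  $3.00 / M tokens
    cache write:  $3.75 / M tokens   (one-time 25% premium)
    cache read:   $0.30 / M tokens   (every later turn, ~10x cheaper)

So a conversation that keeps re-sending a 100k-token prefix pays about $0.03 per turn for it instead of about $0.30.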


I'm waiting for the day this AI bubble bursts, since as far as we can tell almost all these AI "providers" are operating at a loss. I wonder if this billing model actually makes a profit, or if it's still just burning cash in hopes of AGI being around the corner. We have yet to see a product that is useful and affordable enough to justify the cost.



Great article, thanks. Mirrors exactly what the JP Morgan/Goldman report claimed, but that was quite dated.


It sounds insane until you drive full agentic loops/evals. I'm currently making a self-compiling compiler; no doubt you'll hear/see about it soon. The other night, I fell asleep and woke up with interface dynamic dispatch using vtables with runtime type information and generic interface support implemented...


Do you actually understand the code Claude wrote?


Do you understand all of the code in the libraries that your applications depend on? Or your coworker for that matter?

All of the gatekeeping around LLM code tools is amusing. But whatever, I’m shipping 10x and making money doing it.


Up until recently I could be sure they were written by a human.

But if you are making money by using LLMs to write code then all power to you. I just despair at the idea of trillions of lines of LLM generated code.


Well, you can’t just vibe code something useful into existence despite all the marketing. You have to be very intentional about which libraries it can use, code style etc. Make sure it has the proper specifications and context. And review the code, of course.


Fair enough. That's pretty cool, I haven't gone that far in my own work with AI yet, but now I am inspired to try.

The point is to get a pipeline working; cost can be optimized down afterwards.


Seriously? That’s wild. What kind of CS field could even handle that kind of daily spend for a bunch of people?


Consider an L5 at Google: outgoings of $377,797 USD per year just on salary/stock, before fixed overheads such as insurance and leave, and issues like ramp-up time and the cost of their manager. In the hands of a Staff+ engineer, these tools replicate the output of Staff+ engineers, and they don't sleep. My 2c: the funding for the new norm will come from compressing the manager layer, the engineering layer, or both.


LLMs absolutely don't replicate staff+ engineers.

If your staff engineers are mostly doing things AI can do, then you don't need staff. Probably don't even need senior


That's my point.

- L3 SWE II - $193,712 USD (before overheads)

- L4 SWE III - $297,124 USD (before overheads)

- L5 Senior SWE - $377,797 USD (before overheads)

These tools and foundational models get better every day, and right now, they enable Staff+ engineers and businesses to have less need for juniors. I suspect there will be [short-to-medium-term] compression. See extended thoughts at https://ghuntley.com/screwed


I wonder what will happen first - will companies move to LLMs, or to programmers from abroad? (Because ultimately, it will be cheaper than using LLMs - you've said ~$500 per day; in Poland ~$1500 will be a good monthly wage, and that still will make us expensive! How about moving to India, then? Nigeria? LATAM countries?)


> in Poland ~$1500 will be a good monthly wage

The minimum wage in Poland is around USD 1240/month. The median wage in Poland is approximately USD 1648/month. Tech salaries are considerably higher than the median.

Idk, maybe for an intern software developer it's a good salary...


The minimum is ~$930 after taxes, though; I rarely see people here talk about pre-tax salaries, tbh.

~$1200 is what I'd get paid here after a few years of experience; I have never seen an internship offer in my city that paid more than minimum wage (most commonly, it's unpaid).


The industry has tried that, and the problems are well known (timezones, unpredictable outcomes in terms of quality and delivery dates)...

Delivery via LLMs is predictable and fast, and any concerns about outcome [quality] can be programmed away by rejecting bad outcomes. This form of programming the LLMs has a one-time cost...


> These […] get better every day.

They do, but I’ve seen a huge slowdown in “getting better” in the last year. I wonder if it’s my perception, or reality. Each model does better on benchmarks, but I’m still experiencing at least a 50% failure rate on _basic_ task completion, and that number hasn’t budged in many months.


Oh but they absolutely do. Have you not used any of this LLM tooling? It’s insanely good once you learn how to employ it. I no longer need a front-end team, for example. It's that good at TypeScript and React. And the design is even better.


The kind of field where AI builds more in a day than a team or even contract dev does.


Correct; utilised correctly, these tools ship teams of output in a single day.


Do you have a link to some of this output? A repo on Github of something you’ve done for fun?

I get a lot of value out of LLMs but when I see people make claims like this I know they aren’t “in the trenches” of software development, or care so little about quality that I can’t relate to their experience.

Usually they’re investors in some bullshit agentic coding tool though.


I will shortly; I am building a serious self-compiling compiler rn out of a brand-new esoteric language. Meaning the LLM is able to program itself without training data about the programming language...


I would hold on on making grand claims until you have something grand to show for it.


Honestly, I don't know what to make of it. Stage 2 is almost complete, and I'm (right now) conducting per-language benchmarks to compare it to the Titans.

Using the proper techniques, Sonnet 3.7 can generate code in the custom lexicon/stdlib. So, in my eyes, the path to Stage 3 is unlocked, but it will chew lots and lots of tokens.


> a serious self-compiling compiler

Well, virtually every production-grade compiler is self-compiling. Since you bring it up explicitly, I'm wondering what implications of being self-compiling you have in mind?

> Meaning the LLM is able to program itself without training data about the programming language...

Could you clarify this sentence a bit? Does it mean the LLM will code in this new language without training on it beforehand? Or is it going to enable the LLM to program itself to gain some new capabilities?

Frankly, with the advent of coding agents, building a new compiler sounds about as relevant as introducing a new flavor of assembly language; and even then, a new assembly may at least be justified by a new CPU architecture...


All can be true depending on the business/person:

1. My company cannot justify this cost at all.

2. My company can justify this cost but I don't find it useful.

3. My company can justify this cost, and I find it useful.

4. I find it useful, and I can justify the cost for personal use.

5. I find it useful, and I cannot justify the cost for personal use.

That aside -- $200/day/dev for a "nice to have service that sometimes makes my work slightly faster" is a lot of money in the majority of the world.


Reinforcement Learning.

I hate acronyms with a fierce passion.


While I do agree that acronyms can be a PITA, AFAIK RL seems to truly lead to AGI. ICBA to provide more detail.


My blood pressure just tripled.


This is the quiz Claude created about Hacker News:

Question: What's interesting about Hacker News?

1. Hacker News was created by Paul Graham in February 2007, initially called "Startup News" or "News.YC" before receiving its current name on August 14, 2007.

2. Hacker News users need to accumulate 501 "karma" points before they're allowed to downvote content, as part of measures to prevent the "Eternal September" phenomenon.

3. Hacker News was designed as a collaborative project between Y Combinator and Reddit, with Reddit co-founders helping develop the initial moderation algorithms.

Which statement do you think is the twist?


I'm constantly waiting for other players.

Classic cold-start problem for multiplayer games like this. Traditionally, game developers add bots to combat this problem.

In your case, because the players are supposed to spot the bot, you can't even add bots without compromising your game.


You feel comfortable uploading your family images to OpenAI servers?


What exactly do you think they are going to do with it?


Use it as training data for their models.


It will be one picture among tens/hundreds of millions of pictures, maybe.


I received a 500 response when I attempted to create an MCP server for an API.

I was using this URL: https://engineapi.moonstream.to/metatx/openapi.json

The response body:

    {success: false, error: "Server URL must start with https:// or http://"}


Thanks, I just added support for relative URLs; try again. Your OpenAPI spec defines a relative base URL for the server (/metatx) but no domain. You can now specify the full base URL in your MCP client's environment variables:

`OPEN_MCP_BASE_URL="https://engineapi.moonstream.to/metatx"`


Can you explain how you produce the opcodes from a text/image?


Looks SDF-ish. My guess: convert the letters to curves; cut them into pieces without holes, so each piece is the set of points between certain curves; write down SDFs for each curve (negative on the "inside", positive on the "outside"); combine all SDFs in a piece with intersection (max); then combine all pieces with union (min).
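
If that guess is right, the combining step is tiny. A minimal sketch in TypeScript, with circles standing in for the real per-curve SDFs (which would come from the font outlines):

    // an SDF returns a signed distance: negative inside, positive outside
    type SDF = (x: number, y: number) => number;

    // intersection: inside only if inside every curve
    const intersect = (...fs: SDF[]): SDF =>
      (x, y) => Math.max(...fs.map(f => f(x, y)));

    // union: inside if inside any piece
    const union = (...fs: SDF[]): SDF =>
      (x, y) => Math.min(...fs.map(f => f(x, y)));

    // stand-in "curve": a circle of radius r centered at (cx, cy)
    const circle = (cx: number, cy: number, r: number): SDF =>
      (x, y) => Math.hypot(x - cx, y - cy) - r;

    // one piece bounded by two curves, then a glyph from two pieces
    const piece = intersect(circle(0, 0, 2), circle(1, 0, 2));
    const glyph = union(piece, circle(4, 0, 1));
    console.log(glyph(0.5, 0) < 0); // true: the point is inside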


The text contains a math expression. The math expression can be parsed into an abstract syntax tree. The AST can then be translated into bytecode and interpreted by a stack-based virtual machine.

The opcodes are where you do that translation from an AST into bytecode. The entire second half of the Crafting Interpreters book will guide you through how it's done. https://craftinginterpreters.com/a-bytecode-virtual-machine....
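
As a toy illustration of that pipeline (just the shape of it, not the book's clox VM), here's (1 + 2) * 3 compiled from an AST to opcodes and run on a stack machine, in TypeScript:

    type Expr =
      | { kind: "num"; value: number }
      | { kind: "add" | "mul"; left: Expr; right: Expr };

    type Op = { code: "PUSH"; value: number } | { code: "ADD" | "MUL" };

    // post-order walk: emit operands before their operator
    function compile(e: Expr, out: Op[] = []): Op[] {
      if (e.kind === "num") {
        out.push({ code: "PUSH", value: e.value });
      } else {
        compile(e.left, out);
        compile(e.right, out);
        out.push({ code: e.kind === "add" ? "ADD" : "MUL" });
      }
      return out;
    }

    // the stack-based VM: PUSH puts a value on the stack,
    // ADD/MUL pop two values and push the result
    function run(ops: Op[]): number {
      const stack: number[] = [];
      for (const op of ops) {
        if (op.code === "PUSH") {
          stack.push(op.value);
        } else {
          const b = stack.pop()!, a = stack.pop()!;
          stack.push(op.code === "ADD" ? a + b : a * b);
        }
      }
      return stack.pop()!;
    }

    // (1 + 2) * 3 compiles to: PUSH 1, PUSH 2, ADD, PUSH 3, MUL
    const ast: Expr = {
      kind: "mul",
      left: { kind: "add", left: { kind: "num", value: 1 }, right: { kind: "num", value: 2 } },
      right: { kind: "num", value: 3 },
    };
    console.log(run(compile(ast))); // 9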


I'm not sure everyone in a country "knowing" to drive on the same side of the road is an example of a Schelling point -- drivers are trained to do this.

Also not sure if fads, like "everyone orders a flat white", are instances of Schelling points, but that seems more reasonable as a Schelling point than driving on the same side of the road.

More generally, I didn't really understand the point of this article. I guess the author is trying to say that as technology improves, people are gaining the ability to customize their experiences. Framing this as "anti-Schelling points" doesn't make sense to me - what shared game is being played? At its most game-like, you could say that people are just trying to maximize their own utility without worrying (or having to worry) about shared economies of scale.


The point is you can talk about game theory if you make everything about game theory.


> I'm not sure everyone in a country "knowing" to drive on the same side of the road is an example of a Schelling point -- drivers are trained to do this.

The driving side is usually also the walking side. When I travel to a country with the other orientation, I bump into people on sidewalks and in corridors a few times before adjusting. Same on the way back, all without driving a car myself.


Although the driving side isn't a Schelling point, the walking direction matching the driving direction is.


> the walking direction matching the driving direction is.

Not really, though, since that's also dictated and not something that occurs naturally. In fact, you usually have to teach children to walk with vehicle traffic, because the natural inclination is to walk against it so that you can see the cars coming and move out of the way, vs walking with traffic and hoping drivers maintain a correct distance from you.


In the case of drink orders, there's a slight benefit to ordering something unique (at least unique within the queue you're standing in): you don't have to remember your place in line, or negotiate with someone else about who was there first.

The space of possible drink orders isn't so large that you'll be collision-free by default (like UUIDs), so there's some incentive to guess what other people will order and adjust your order to avoid collisions.


> In the case of drink orders, there's a slight benefit to ordering something unique (at least unique within the queue you're standing in): you don't have to remember your place in line, or negotiate with someone else about who was there first.

A normally-functioning vendor would call out the completed order by order number, so this problem just can't arise. You can't take someone else's order identical to yours any more than you can take someone else's order for ten times as much food as you purchased.


I don't think I've ever seen a cafe do that (unless you count McDonald's as a cafe). Even the Starbucks thing of using your name is rare, though others have picked it up.

In most cafes I've been in, the queue is usually short enough that the barista knows who ordered what. At peak times, they can't keep track of it, though, so the customers keep track themselves, and it usually works well enough that nobody is going to optimize it.


> at peak times, they can't keep track of it

They usually have a FIFO system anyway, so even if everyone were ordering the same thing, it wouldn't matter. I suppose a situation where you have multiple baristas and some work faster than others, or some people have simpler orders, might force customers to keep track, though.


“I have a flat white for Steve” is comedy gold at a crowded Starbucks.

