> It worked for nine and a half hours. > Again, it wasn’t perfect. As an expert,...

matneyx · 2026-06-09T18:00:58 1781028058

In Claude's defense (and I cannot believe I'm defending it), I know no single dev who could create what it did (Concord), from a 19-page design document, in 9.5 working hours.

We're gonna go back to the days where our bosses ask why we're just sitting around, but instead of saying "compiling," we'll just say, "waiting for Claude."

petesergeant · 2026-06-09T19:39:50 1781033990

Sadly I didn't get very many answers to my Ask HN, "What are you doing during inference?": https://news.ycombinator.com/item?id=47944917

ModernMech · 2026-06-09T19:44:03 1781034243

I alt-tab to an MMO and farm XP.

mattbettinson · 2026-06-09T20:24:00 1781036640

Which one?

ModernMech · 2026-06-10T12:32:29 1781094749

I've been playing Monsters and Memories, basically an Everquest clone.

https://monstersandmemories.com

It's in private beta but sometimes they have a public beta, like just last week. They were supposed to have released this month but they pushed back to October.

Also check out Adrullan Online, it's also an EQ clone but Minecraft voxel style. More like alpha status, they don't seem as far along.

magarnicle · 2026-06-10T04:12:36 1781064756

Drawing.

giancarlostoro · 2026-06-09T19:22:46 1781032966

This. I get told things like "you can't build all that on your own?" I've had Claude poop out full feature web apps in under 30 minutes, to a spec. Was it perfect? No, but sometimes even in a simple setup phase you can burn 15 minutes to some obscure setup step that's failing. I cannot just code nonstop at 900WPM or whatever ridiculous speed, and poop out an entire full feature web app, with maybe a few bugs here or there. If you can, come show me, I'll gladly have you race against my Claude prompting capabilities.

Will Claude's code be perfect in one shot? Probably not. Will it get you 80 to 90% of the way there with your chosen design patterns in under a few hours? Absolutely.

toss1 · 2026-06-09T21:42:36 1781041356

>>If you can, come show me, I'll gladly have you race against my Claude prompting capabilities.

Sounds like we've nearly reached in coding the point where Paul Bunyan [0] has his epic competition with the chainsaw... and loses by 1/4" and history forever changes...

[0]https://www.britannica.com/topic/Paul-Bunyan

dyauspitr · 2026-06-10T05:45:26 1781070326

And honestly, it will get you the rest of the 10-20% with a little bit of yelling at it once it’s done

torginus · 2026-06-09T22:30:01 1781044201

I tried to read the 'design doc' - it's slop full of vague platitudes and impressive-sounding but impossible to pin down management speak - in short, it's slop, and I still don't really get what it's supposed to do exactly.

It's some prompt-engineered AI harness that guides the AI to create stats after it researches a subject and ingests the data, but I'm not sure what it is that the tool actually does on top of this.

neogodless · 2026-06-09T18:05:05 1781028305

For the rare uninitiated:

https://xkcd.com/303/

giancarlostoro · 2026-06-09T19:21:14 1781032874

> At the same time, it is very dissonant to see the industry heading towards hour+ long workflows with an agent.

At this point, pay me significantly more, and I'll do it.

warkdarrior · 2026-06-09T20:38:28 1781037508

> pay me significantly more

Ha ha, that's how you negotiate yourself out of a job!

giancarlostoro · 2026-06-09T22:40:01 1781044801

Fire me then, I can bring someone else drastically more value with AI tooling.

swader999 · 2026-06-10T05:00:04 1781067604

"I can bring your competitors drastically more value with AI tooling"

PeterStuer · 2026-06-09T18:02:49 1781028169

My Opus 4.8 regularly works for 10+minutes on a single non-trivial coding request.

ASalazarMX · 2026-06-09T18:52:31 1781031151

Your Opus 4.8? Is it now usual to refer to LLMs like that?

wongarsu · 2026-06-09T19:21:45 1781032905

Isn't it common to refer to all software like that? "Let me look at my JIRA", "I can't find anything using my Outlook's search function", "My Powerpoint is acting up today", "My browser just crashed" are all sentences I might say during a normal work day.

calvinmorrison · 2026-06-09T19:32:57 1781033577

better than "The JIRA", or "The Google" or "The Spotify"

DonHopkins · 2026-06-10T13:47:37 1781099257

"The Facebook"

hypfer · 2026-06-09T19:27:03 1781033223

Depends on the demographic I think. And also tells you surprisingly much about how the brain of the person uttering it works.

There are people that almost feel physical pain if something is unnecessarily incorrect.

+ That if the mental model of something is accurate, it is actually _more_ work to say something that is incorrect than just saying the correct thing.

wongarsu · 2026-06-09T19:41:00 1781034060

In my mental model, "my Outlook" is the outlook instance running on my computer, on my data. My outlook crashed today. Yours might not have crashed. Similarly, my Jira contains tickets about my work, your Jira does not contain those same tickets. That might be technically the same instance on the same SaaS server, but the server I'm routed to accessing my data with my credentials turns it into "my Jira". My Jira is slow. Maybe you are lucky and get routed to a faster server, or your company is self-hosting. Then your Jira might be reasonably fast

ASalazarMX · 2026-06-09T20:01:44 1781035304

This is completely fine, as those are your own installs, but LLMs can't be owned by the users. Your Opus is the same Opus as everyone else's; your only difference is the subscription tier to their API.

If you had your own on-premises LLM, that would indeed be your LLM, and it would make sense to compare it to the on-premises LLMs of other people, as your setup particulars would affect the result.

dasyatidprime · 2026-06-09T20:28:14 1781036894

The copyright to the Outlook binary isn't owned by the users either, even if they're running it on local hardware. The Opus 4.8 weights are (we assume) the same between users, but the conversation/tooling state is not shared between them by default. I prefer to route around this construction myself, since I do think there's some ontological slippery-slope potential, but from a lexical perspective I think “my” is a perfectly defensible abbreviation in context.

hypfer · 2026-06-09T20:43:03 1781037783

> The copyright to the Outlook binary isn't owned by the users either, even if they're running it on local hardware

There was a time where one actually bought software to own it.

This time is.. actually it is right now. Please leave at once.

hypfer · 2026-06-09T19:44:21 1781034261

Hmm, good point. "My outlook" might actually be correct. Depending on if it is a webapp or the real one running on your device that is.

Similar to "My game just crashed".

Jira otoh is not yours, because it's in the cloud. It might be "my internet connection", "my browser" or "my account" that is having trouble.

___

Hm. "My train got delayed" is interesting in this context. I don't find that offensive. But that also might be because trains don't seek rent the way SaaS does? Not sure.

I guess trains do not hold me hostage. They might just be a container in which someone does that.

Jira, cloud LLM inference or similar otoh..

ASalazarMX · 2026-06-09T21:35:48 1781040948

The "my train" convention is an interesting argument. It's not actually yours, you're buying a train-as-a-service single-use license, and there are tiers to that too.

I guess the main difference is that TAAS has many different trains where the experience varies wildly, so it helps to be specific on which train you're licensing; but LLMs are the same product for everyone, and you can't stay with say, ChatGPT 1.0, you get the same choices as everyone else.

RugnirViking · 2026-06-09T20:56:57 1781038617

> tells you surprisingly much about how the brain of the person uttering it works

That's ridiculous. You wouldn't respond to "I went to visit my doctor yesterday" with "but slavery has been illegal since forever!" Similarly it would be foolish to respond to "where should we meet? my place or yours" with "but we both rent!"

w4yai · 2026-06-09T19:17:20 1781032640

You don't have your Opus 4.8? I got mine yesterday!

ASalazarMX · 2026-06-09T21:24:23 1781040263

I didn't get mine, but I suspect I might be using yours when I use it.

PeterStuer · 2026-06-10T07:30:35 1781076635

I probably should have used 'Opus 4.8 in my Claude Code configuration'. The model and harness might be the same for everyone, but the .md's, hooks, skills, agents, MCP ... configurations make everyone's setup fairly unique.

giancarlostoro · 2026-06-09T19:31:37 1781033497

That's pretty tame, if you want to be disturbed check out r/MyBoyfriendIsAI

2026-06-09T21:08:53 1781039333

[dead]

giancarlostoro · 2026-06-09T22:39:28 1781044768

You what now? lol

hedgehog · 2026-06-09T18:11:42 1781028702

Work duration is also not that valuable of a measure, you're usually better off defining the process yourself in code and having that delegate chunks of work to the models. The only real issue there is that it's harder to take advantage of the providers' subscription discounts, but on the other hand it's easier to do your own model routing, and there's no way I've seen for the normal chatbots to maintain coherence on streams of work measured in days and weeks.
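A minimal sketch of what "defining the process yourself in code" could look like, assuming a model is just a callable from prompt to text. Every name here (`Task`, `route`, `run_pipeline`, the two stub models, the "easy"/"hard" labels) is hypothetical and made up for illustration; no real provider API is used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: a "model" is any callable that maps a prompt string to a reply string.
ModelFn = Callable[[str], str]


@dataclass
class Task:
    name: str
    prompt: str
    difficulty: str  # hypothetical labels: "easy" or "hard"


def route(task: Task, models: Dict[str, ModelFn]) -> ModelFn:
    """Pick a model by task difficulty, falling back to the 'hard' model."""
    return models.get(task.difficulty, models["hard"])


def run_pipeline(tasks: List[Task], models: Dict[str, ModelFn]) -> Dict[str, str]:
    """The process lives in code: each chunk goes to one model, results are kept by name."""
    results: Dict[str, str] = {}
    for task in tasks:
        model = route(task, models)
        results[task.name] = model(task.prompt)
    return results


# Stub models standing in for real inference calls.
def small_model(prompt: str) -> str:
    return f"[small] {prompt}"


def large_model(prompt: str) -> str:
    return f"[large] {prompt}"


if __name__ == "__main__":
    models = {"easy": small_model, "hard": large_model}
    tasks = [
        Task("rename", "Rename foo to bar", "easy"),
        Task("design", "Design the cache layer", "hard"),
    ]
    for name, reply in run_pipeline(tasks, models).items():
        print(name, "->", reply)
```

The point of the shape is that state (the `results` dict) and control flow stay in ordinary code, so a stream of work can be persisted and resumed without relying on one long chat session staying coherent.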

cyanydeez · 2026-06-09T19:28:54 1781033334

I think we hit the sigmoid back when the QWEN models were released. By properly structuring my project, I can point it at any extension I want and get it going for 30 minutes to extend whatever. It can't effectively do 'god mode' on all the code, but being a mindful observer and code "professional" I don't need more than what a 128GB VRAM needs.

I'm amazed we're so far into SOTA bloat that the Chinese will kill once they start etching silicon with these models.