1. Take every single function, even private ones.
2. Mock every argument and collaborator.
3. Call the function.
4. Assert the mocks were called in the expected way.
These tests help you find inadvertent changes, yes, but they also create constant noise about changes you intend.
Juniors on one of the teams I work with only write this kind of test. It's tiring, and I have to tell them to test the behaviour, not the implementation. And yet every time they do the same thing. Or rather, their AI IDE spits these out.
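To make the contrast concrete, here's a minimal Python sketch (the `apply_discount` function and its collaborator are hypothetical): the first test pins down *how* the function talks to its mock and breaks on any refactor, while the second asserts only the observable result.

```python
import unittest
from unittest import mock


def apply_discount(price, discount_service):
    """Hypothetical function under test: returns the price minus a discount."""
    rate = discount_service.get_rate(price)
    return price * (1 - rate)


class ImplementationCoupledTest(unittest.TestCase):
    def test_calls_collaborator(self):
        # The mock-heavy pattern: assert on how the work was done.
        service = mock.Mock()
        service.get_rate.return_value = 0.1
        apply_discount(100, service)
        service.get_rate.assert_called_once_with(100)  # breaks on any refactor


class BehaviourTest(unittest.TestCase):
    def test_discount_is_applied(self):
        # The behaviour: a 10% rate on 100 gives 90, however it's computed.
        service = mock.Mock()
        service.get_rate.return_value = 0.1
        self.assertAlmostEqual(apply_discount(100, service), 90.0)


if __name__ == "__main__":
    unittest.main()
```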
> Everything it does can be done reasonably well with list comprehensions and objects that support type annotations and runtime type checking (if needed).
I see this take fairly often, and usually with a similar lack of nuance. How do you come to this? In other cases where I've seen it, it's come from people who haven't worked in any context where performance or interoperability with the scientific computing ecosystem matters, which misses a massive part of the picture. I've struggled to get through to them before. Genuine question.
It does largely avoid the issue if you configure the secret to be available only to specific environments AND require reviews before pushing/merging to the branches that can deploy to that environment.
Yes, and anyone who knows anything about software dev knows that the first thing you should do with an important repo is set up branch protections to disallow that, require reviews, etc. Basic CI/CD.
This incident reflects extremely poorly on PostHog because it demonstrates a lack of thought about security beyond the surface level. It tells us that any dev at PostHog can publish packages at any time, without review (because we know the secret to do this is stored as a plain GHA secret, which can be read from any GHA run, and those runs presumably happen on any internal dev's PR). The most charitable interpretation is that they consciously accept this because it reduces friction, in which case I'd say that demonstrates poor judgement and a bad balance.
A casual audit would have revealed this and suggested something like restricting the secret to a specific GHA environment and requiring reviews to push to that environment.
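For illustration, here's a minimal sketch of that setup, assuming an npm package published from GitHub Actions. The workflow name, the "release" environment, and the NPM_TOKEN secret name are all hypothetical, and the required-reviewers rule itself lives in the repo's Settings → Environments, not in the YAML:

```yaml
name: publish-package
on:
  push:
    tags: ["v*"]

jobs:
  publish:
    runs-on: ubuntu-latest
    # "release" is a hypothetical environment configured with required
    # reviewers; the job (and the environment-scoped secret) is only
    # released after someone approves the deployment.
    environment: release
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          registry-url: "https://registry.npmjs.org"
      - run: npm publish
        env:
          # Stored as an environment secret rather than a repo-wide secret,
          # so arbitrary workflow runs on dev branches can't read it.
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
```

With the token scoped to that environment, a random run on a dev's PR never sees it, and the publish job pauses until a designated reviewer approves.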
I've found structured output APIs to be a pain across various LLMs. Now I just ask for JSON output and pull it out between the first and last curly brace. If validation fails, I retry with details about why it was invalid. This works very reliably for complex schemas, and it works across all LLMs without having to think about each provider's limitations.
And then you can add complex pydantic validators (or whatever, I use pydantic) with super helpful error messages that get fed back to the model on retry. Powerful pattern.
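As a rough sketch of the pattern (the `Invoice` schema and the `call_llm` placeholder are hypothetical, not any particular provider's API):

```python
from pydantic import BaseModel, Field, ValidationError


class Invoice(BaseModel):
    """Hypothetical target schema."""
    vendor: str
    total_usd: float = Field(gt=0)


def call_llm(prompt: str) -> str:
    raise NotImplementedError  # your LLM client of choice goes here


def extract(prompt: str, max_retries: int = 3) -> Invoice:
    for _ in range(max_retries):
        raw = call_llm(prompt)
        # Ignore any preamble/markdown: keep only first..last curly brace.
        candidate = raw[raw.find("{"): raw.rfind("}") + 1]
        try:
            return Invoice.model_validate_json(candidate)
        except (ValidationError, ValueError) as err:
            # Feed the validation errors back so the model can correct itself.
            prompt += f"\n\nYour last reply was invalid:\n{err}\nReturn corrected JSON only."
    raise RuntimeError("LLM never produced valid JSON")
```

The nice part is that pydantic's error messages name the exact field and constraint that failed, which is usually enough for the model to fix it on the next attempt.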
Significant; check any Claude-related thread here over the last month, or the Claude Code subreddit. Anecdotally, the degradation has been so bad that I had to downgrade to a month-old version, which has helped a lot. I think part of the problem is there as well (Claude Code itself).
We operate a SaaS where a common step is inputting rates of widgets in $/widget, $/widget/day, $/1kwidgets, etc. These are incredibly tedious and error-prone to enter. And usually the source of these rates is an invoice that presents them in ambiguous ways, e.g. rows with "quantity" and "charge" from which you have to back-calculate the rate. And these invoices are formatted in all sorts of different ways.
We offer a feature where you upload the invoice and we pull out all the rates for you. It uses LLMs under the hood. Fundamentally it's a "ChatGPT wrapper", but there's a massive amount of work in tweaking the prompts based on evals, splitting things up into multiple calls, etc.
And it works great! Niche software, but for power users we're saving tens of minutes of monotonous work per day and, in all likelihood, entering things more accurately. It complements the manual entry process, with full ability to review the results. Accuracy is around 98-99 percent.
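As an illustration of where those validators earn their keep in a case like this, here's a hypothetical line-item schema that cross-checks the extracted rate against charge / quantity, so unit mix-ups (e.g. $/1kwidgets) get bounced back to the model on retry instead of landing in front of the user:

```python
from pydantic import BaseModel, Field, model_validator


class RateLine(BaseModel):
    """Hypothetical shape of one extracted invoice line."""
    description: str
    quantity: float = Field(gt=0)
    charge_usd: float = Field(ge=0)
    rate_usd_per_widget: float = Field(ge=0)

    @model_validator(mode="after")
    def rate_matches_charge(self) -> "RateLine":
        # A mismatch between the extracted rate and charge / quantity usually
        # means the model grabbed the wrong column or missed a unit scale;
        # the error message is fed back to the model on the retry.
        implied = self.charge_usd / self.quantity
        if abs(implied - self.rate_usd_per_widget) > 0.01 * max(implied, 1e-9):
            raise ValueError(
                f"rate {self.rate_usd_per_widget} disagrees with "
                f"charge/quantity = {implied:.4f}; re-check units (e.g. $/1kwidgets)"
            )
        return self
```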
I gave it a shot just now with a fairly simple refactor: +19 lines, -9 lines, across two files. Totally ballsed it up. It defined only one of the two variables it was meant to, then referred to the one it hadn't implemented. I told it "hey, you forgot the second variable" and it went and added it in twice. The comments it added (after I prompted it to) were half-baked and ambiguous when read in context.
Never had anything like this with Claude Code.
I've used Gemini 2.5 Pro quite a lot, and like most people I find it very intelligent. I've bent over backwards to use it in another piece of work because it's so good. I can only assume it's the Gemini CLI itself that's using the model poorly. Keen to try again in a month or two and see whether this poor first impression is just a teething issue.
I told it that it did a pretty poor job and asked it why it thinks that is, adding that I know it's pretty smart. It gave me a wall of text, so I asked for the short summary:
> My tools operate on raw text, not the code's structure, making my edits brittle and prone to error if the text patterns aren't perfect. I lack a persistent, holistic view of the code like an IDE provides, so I can lose track of changes during multi-step tasks. This led me to make simple mistakes like forgetting a calculation and duplicating code.
I noticed a significant degradation in Gemini's coding abilities over the last couple of checkpoints of 2.5. The benchmarks say it should be better, but that doesn't jibe with my personal experience.