I dismissed the earlier non-technical blog post as shameless product boosterism for Anthropic. The linked hacks blog (which is a better source than this article) is a welcome release. It's hard to deny there's something real to this now, I think. Mozilla's internal definition of a "vulnerability" is also probably broader than many would intuit, but it is good that these issues are being taken seriously and fixed.
I used to work with a guy who would always say "if you're looking for trouble, you are going to find it"
When I hear that "we found X bugs using some new tool", where the standard for bugs is low and doesn't necessarily require user impact in realistic scenarios, I think to myself: duh! You went looking for bugs, of course you found them.
For a sufficiently complicated product, in my experience, you don't have to look far.
What I’ve read from various open source orgs with access to it is that Anthropic is giving them unmetered access for now as part of Glasswing. I’d bet that the corporate partners have to pay though.
> if you're looking for trouble, you are going to find it
That's the "'No Way to Prevent This,' Says Only Nation Where This Regularly Happens" of unsafe languages.
There are huge swathes of problems we know how to categorically prevent, but some people won't do it because they're more comfortable believing it was never preventable than accepting any culpability for not preventing it previously.
As the Hacks.Mozilla article notes: "We began with small-scale experiments prompting the harness to look for sandbox escapes with Claude Opus 4.6. Even with this model, we identified an impressive amount of previously-unknown vulnerabilities which required complex reasoning over multiprocess browser engine code."
Agreed. The earlier blog post did not explicitly claim this, but I think casual readers were led to believe that the Magic of Mythos (TM) went and found (and fixed??) a bunch of vulnerabilities with minimal human guidance. It even contrasted this with their fuzzing infrastructure, which made it sound (to me) like it was casting shade on it.
This new post makes it pretty clear that this was all bolted on top of their existing fuzzing infrastructure, and really just used to get more and better initial hits that a very skilled team is looking at. I assume Anthropic was giving them a very good deal on inference for the positive PR, but I believe these other reports and suspect Mozilla did not really need them.
Wasn't AISLE only able to find the same bugs when it was shown only the known faulty code? The worrying part about Mythos isn't the fact that it can find bugs. The worrying part is Mythos being able to find them on its own across an entire codebase as vast as Firefox, then write exploits for what it's found with a very basic prompt.
The skill required to find then create zero days is quickly approaching the floor.
I think they split the codebase into smaller files or modules and then tell the AI there's a bug in this particular file and to go find it.
Then they loop over a codebase like this. This way you always point a model at a 'known' bug. And I assume a smaller context window helps with quality.
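For what it's worth, here is a rough sketch of the loop I'm imagining. This is my guess at the shape of it, not anything Mozilla or Anthropic has described. The names (`iter_chunks`, `build_prompt`, `scan`), the prompt wording, the file suffixes and the 200-line chunk size are all made up for illustration, and `ask_model` is a placeholder for whatever model call you'd actually plug in.

```python
from pathlib import Path
from typing import Callable, Iterator


def iter_chunks(root: Path, suffixes=(".cpp", ".h"), max_lines: int = 200) -> Iterator[tuple[str, str]]:
    """Yield (label, source_text) pieces small enough to fit in one prompt."""
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        lines = path.read_text(errors="replace").splitlines()
        # Split each file into fixed-size line windows (an assumption; a real
        # harness might split on functions or modules instead).
        for start in range(0, len(lines), max_lines):
            piece = lines[start:start + max_lines]
            label = f"{path.relative_to(root)}:{start + 1}-{start + len(piece)}"
            yield label, "\n".join(piece)


def build_prompt(label: str, source: str) -> str:
    """Assert that a bug exists so the model always hunts for a 'known' bug."""
    return f"There is a security bug in {label}. Find it and explain it.\n\n{source}"


def scan(root: Path, ask_model: Callable[[str], str], **chunk_options) -> dict[str, str]:
    """Loop over the whole codebase, one small chunk per model call."""
    return {
        label: ask_model(build_prompt(label, source))
        for label, source in iter_chunks(root, **chunk_options)
    }
```

You'd call it as `scan(Path("src"), ask_model=my_client_call)`. Every chunk gets the "there is a bug here" framing whether or not one exists, so the answers would still need triage by people who know the code.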
I don't know what to call this - a "freelancer launch"? It is the best executed one I've seen, though. Maybe even a black mark on OSS if it does not go well.
> Maybe even a black mark on OSS if it does not go well.
No, because realistically, this is the opposite of what corporations want. If a project is only being maintained by one or two people, that’s a risk, pure and simple. So you look somewhere else for something that matches your needs, with a more sustainable story.
Nothing against the author, but what he’s describing is a business model - just one that’s likely to bring in a negligible amount of money. This is less about open source and more about what kinds of projects society is willing to pay people to work on.
Corporations seem to rely on key software that just a few people maintain all the time already, but you're right and the bus factor does not look great. Mise is also currently MITMing my shell, along with presumably many other dev machines, so the threat of compromise is pretty scary.
The Mise website makes way more sense to me now. I suppose some artistic license is justified when you're at the cutting edge of the CLI aesthetic and whatnot.
Anti-trust. They're selling part of the problem (inference via Gemini) and now they're selling a solution. They also dominate web standards by developing the dominant browser. And they control one of two dominant phone platforms that will collaborate to enable this solution.
If this were some smaller company that just did cloud then it'd never even make it to PoC. This can only happen because it's Google Cloud, and they can leverage everything they own all at once. Those not buying into their ecosystem can take a hike.
Such a bizarre boondoggle for a company that otherwise seems to have smart and focused offerings. They may as well announce a Slack or Jira replacement next.
I wasn't aware, and I even paid for their search for a time.
I still do not understand the market for a proprietary browser aimed at privacy-conscious power-users. There is a non-proprietary option that is many years ahead, along with another proprietary browser marketing to the same niche demographic. Good luck to them.
I hope the only reason people are pretending these markdown suggestions are a "workflow" is fear that a more structured approach will be obsolete by the time it's polished. I can't imagine the pace of innovation with the underlying models will stay like this forever.
I hope to see harnesses that will demand instead of ask. Kill an agent that was asked to be in plan mode but did not play the prescribed planning game. Even if it's not perfect, it'd have to be better than the current regime when combined with a human in the loop.
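To make "demand instead of ask" concrete, here is a toy sketch of what I mean. It is not how any existing harness works as far as I know; the tool names, `PlanModeHarness` and `AgentKilled` are all hypothetical. The idea is just that plan mode is an allowlist enforced by the harness, and the first out-of-bounds tool call ends the run.

```python
from dataclasses import dataclass, field

# Hypothetical tool names; a real harness would have its own set.
READ_ONLY_TOOLS = frozenset({"read_file", "search", "list_dir"})


class AgentKilled(Exception):
    """Raised when the agent breaks the rules of the mode it was put in."""


@dataclass
class PlanModeHarness:
    allowed: frozenset = READ_ONLY_TOOLS
    log: list = field(default_factory=list)

    def run(self, actions):
        """Replay (tool, argument) pairs and stop hard at the first violation."""
        for tool, arg in actions:
            if tool not in self.allowed:
                # No warning and no retry: the run ends here.
                raise AgentKilled(f"plan mode violated by {tool!r}")
            self.log.append((tool, arg))
        return self.log
```

A human in the loop would then see the killed run and the log of what happened before it, instead of finding out later that the "plan" already edited files.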
All three of the sectors you've mentioned are not in a good place right now. Probably much less stressful to be an unemployed programmer than trying to make a hobby-scale farm profitable with soaring fuel and fertilizer prices, along with a labor force that is fleeing.
E: Farm automation probably has some juice though, regardless of how close the androids I keep seeing in demos actually are.