That's pretty good. I've achieved pretty much the same thing using Vercel's agent-browser, but I've tried Playwright and it worked just as well. It's good for scraping and automating stuff in the browser.
agent-browser uses Playwright, so it struggles with things like cross-origin iframes; the browser harness, on the other hand, uses raw CDP, which has no such restrictions. It's discussed in this blog post! https://browser-use.com/posts/bitter-lesson-agent-harnesses
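For anyone curious what "raw CDP" means in practice: it's just JSON messages over Chrome's DevTools websocket. A minimal sketch (the method and parameter names are from the real Chrome DevTools Protocol; the websocket wiring itself is omitted) of the auto-attach command that lets a harness reach out-of-process iframes:

```python
import json

# Raw CDP is JSON-RPC-style messages over Chrome's DevTools websocket.
# Target.setAutoAttach with flatten=True makes the browser hand the client
# a session for every out-of-process iframe (OOPIF), which is how a raw-CDP
# harness can drive cross-origin frames that higher-level APIs wall off.

def cdp_message(msg_id, method, params=None, session_id=None):
    """Build one CDP command as the JSON string sent over the websocket."""
    msg = {"id": msg_id, "method": method, "params": params or {}}
    if session_id:
        msg["sessionId"] = session_id
    return json.dumps(msg)

# Ask the browser to auto-attach to every related target, OOPIFs included.
auto_attach = cdp_message(1, "Target.setAutoAttach", {
    "autoAttach": True,
    "waitForDebuggerOnStart": False,
    "flatten": True,  # deliver child sessions on this same connection
})
print(auto_attach)
```

Once attached, each frame gets its own session id, and you issue `Runtime`/`Input` commands against it directly, cross-origin or not.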
Yeah, I created a Playwright CLI skill in about 30 minutes and I've been using it for months. It is a bit slow, but I occasionally try other things like this and they are slow too, so maybe that's just inherent.
I am building a platform for helping stray dogs/cats proactively. The idea is that whenever you see an animal in need in an unfamiliar area, you can use it to help the animal get into the right hands: you share your GPS location and it automatically sends you the nearest clinics, the relevant government institution, phone numbers of people who monitor for such things, and local Facebook groups you can post in. I am also exploring ways to post automatically to a Facebook group whenever a signal is received. There are a lot of challenges with the platform, however, as it needs to be mobile-first and easy enough to use that people actually adopt it.
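The nearest-clinic lookup at the core of that flow can be sketched in a few lines (the clinic names and coordinates below are made up for illustration; assuming plain haversine distance is accurate enough at city scale):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two GPS points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical local registry; a real deployment would load this per region.
clinics = [
    {"name": "Clinic A", "lat": 42.6977, "lon": 23.3219},  # central Sofia
    {"name": "Clinic B", "lat": 42.7105, "lon": 23.3240},
    {"name": "Clinic C", "lat": 43.2141, "lon": 27.9147},  # Varna
]

def nearest_clinics(lat, lon, k=2):
    """Return the k clinics closest to the reporter's GPS fix."""
    return sorted(clinics,
                  key=lambda c: haversine_km(lat, lon, c["lat"], c["lon"]))[:k]
```

The hard part, as the comment says, isn't the distance math but keeping the per-region registry of clinics, institutions, and contacts up to date.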
Yeah, the thing is that currently this system works only for my country (Bulgaria), and expanding it further and making it effective deeply requires local integrations; but then again, there are so many problems to be dealt with.
I am actively using the ohmypi harness, which is based on pi-mono, which I believe is part of OpenClaw. I don't personally use OpenClaw, but I suspect I will be affected. The reason I use ohmypi is that I can extend it and add guardrails specific to our company and myself (these are different from Skills and more sophisticated than the hooks), and I like the ability to start "tasks" with faster models like gpt5.4-mini and to have multi-model capabilities overall; now all of this seems impossible. I have the $20 sub from OpenAI and the usage seems similar to Anthropic's $100 plan. I am extensively using GPT5.4 to review and sometimes to code alongside Opus, and right now it seems to me that OpenAI is winning; I could just go with OpenAI's $200 unlimited usage and use 5.4/5.4-mini for everything. On top of that, the Chinese models are really capable at the moment; I've tried StepFun and it's really good. It seems to me that Anthropic is sabotaging themselves with these moves. But it is what it is, the cycle of model switching has begun again; I strongly believe that in 2-3 months they will revert this and we will switch models again. :D
I built a harness where my plans and code are reviewed with 'claude -p', but most work is interactive; now it has been wrecked. I relied on and integrated with Anthropic only to get burned. I'm not even maxing out my plan; I've never surpassed 60%. But now I have to pay API pricing on top? This tells me how much Anthropic can be trusted. If you depend on any specific feature, you are at their mercy.
Before Anthropic I had bad experiences with Windsurf and Cursor, same shit: I pay for the plan, and they shrink my usage quota after a short time, a couple of weeks or months. I never returned to Windsurf after they abused me, and never used Cursor after I got my Claude sub. I have no idea where I'll end up next. Too bad Anthropic is pushing my $200/mo away.
Yes, but as a producer I would like simpler generations, such as "Generate 15 variations of a kick that sounds like X". I think stuff like this would be much more useful.
I've tested it just now: a very Opus-like experience. The speed is also there. So far I think I even like GPT5.4's responses better than Opus's (although it's very close); I might not be able to distinguish them just yet.
I tried several use cases:
- Code Explanation: Did far better than Opus; it weighed and judged its decisions against a previous spec I made, all valid points, so I am impressed. TBF, if I had spawned another Opus as a reviewer I might have gotten similar results.
- Workflow Running: Really similar to Opus again; no objections, it read and followed Skills/Tools as it should (although mine are optimized for Claude).
- Coding: I gave it a straightforward task to wrap API calls in an SDK, and to my surprise it did an 'identical' job to Opus, literally the same code. I don't know what the odds of that are, but again, a very good solution, and it adhered to our rules for implementing such code.
Overall I am impressed and excited to see a rival to Opus. All of this pushes everyone to build better and better models, which is always good for us.
The Fourier transform audio examples fooled me. The example sounds and the slider for them seemed consistent as far as I could tell... but then again, I don't know much about Fourier transforms.
Maybe I'm out of the loop, but I have to say this is the first time I have seen an LLM generate a webpage with working audio widgets.
No, not really. I rewrote that part since it gives the reader the wrong vibe. The RCE is quite unlikely (although possible); I believe, however, that people at OpenAI should care about such "P5 vulnerabilities", since something as minor as this could be chained into something else later on.