Hacker News | blixt's comments

Since AI became capable of long-running sessions with tool calls, offering one VM per agent as a service has become very lucrative. But I do think a large number of these workloads can indeed run in the browser, especially the ones that essentially just want to live-update and execute code, or run shells on top of a mounted file system. You can do all of this in the user's browser very efficiently. There are two things you lose, though: collaboration (you can still do it, but it becomes a distributed-systems problem if you don't have a central server) and working in the background (all work has to pause while the user's tab is suspended or closed).
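
As a concrete sketch of the "execute code on a mounted file system" part: a module Web Worker can persist generated code to OPFS and import it as an ES module, all client-side. The message shape and file layout here are made up for illustration:

    // worker.ts — assumes a module Web Worker in a browser with OPFS support.
    self.onmessage = async (e: MessageEvent<{ path: string; source: string }>) => {
      // Persist the generated code to the origin-private file system so it
      // survives reloads (this is the "mounted file system" part).
      const root = await navigator.storage.getDirectory();
      const file = await root.getFileHandle(e.data.path, { create: true });
      const writable = await file.createWritable();
      await writable.write(e.data.source);
      await writable.close();

      // Execute it as an ES module without any server round trip.
      const url = URL.createObjectURL(
        new Blob([e.data.source], { type: "text/javascript" })
      );
      try {
        const mod = await import(url);
        self.postMessage({ ok: true, exports: Object.keys(mod) });
      } catch (err) {
        self.postMessage({ ok: false, error: String(err) });
      } finally {
        URL.revokeObjectURL(url);
      }
    };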

So if you can work within those constraints, there are a lot of benefits you get as a platform: latency goes down a lot, performance may go up depending on user hardware (usually more powerful than the type of VM you'd use for this), bandwidth can go down significantly if you design this right, and your uptime and costs as a platform will improve if you don't need to make sure you can run thousands of VMs at once (or pay a premium for a platform that does it for you).[1]

All that said, I'm not sure trying to put an entire OS or something like WebContainers in the user's browser is the way to go; I think you need to build a slightly custom runtime for this type of local agentic environment. But I'm convinced it's the best way to get the smoothest user experience and the smoothest platform growth. We did this at Framer to be able to recompile any part of a website into React code at 60+ frames per second, which meant fewer tricks were necessary to make the platform both feel snappy and be able to publish in a second.

[1] For big model providers like OpenAI and Anthropic there's an interesting edge they have in that they run a tremendous amount of GPU-heavy loads and have a lot of CPUs available for this purpose.


I’ve been building an opinionated, provider-agnostic library in Go[1] for a year now, and it’s nice to see standardization around the format given how much variety there is between providers. Hopefully it won’t just be the OpenAI logo on this, though.

[1] https://github.com/flitsinc/go-llms


I've gotten to a point where my workflow YAML files are mostly `mise` tool calls (because it handles versioning of all tooling and has cache support) and webhooks, and it's still a pain. Their concurrency and matrix strategies just don't work well, and sometimes you end up having to use a REST API endpoint to force-cancel a job because the normal cancel functionality simply doesn't take effect.

There was a time I wanted our GH actions to be more capable, but now I just want them to do as little as possible. I've got a Cloudflare worker receiving the GitHub webhook firehose, storing metadata about each push and each run so I don't have to pass variables between workflows (which is somehow a horrible experience), and any long-running task that should run in parallel (like evaluations) happens on a Hetzner machine instead.
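
The worker is roughly this shape (the KV binding name and stored fields here are simplified stand-ins, not my exact setup):

    // Cloudflare Worker: receive GitHub webhooks, store push metadata in KV.
    export default {
      async fetch(req: Request, env: { PUSHES: KVNamespace }): Promise<Response> {
        if (req.method !== "POST") return new Response("ok");
        const event = req.headers.get("x-github-event") ?? "unknown";
        const body: any = await req.json();
        if (event === "push") {
          // Keyed by commit SHA, so later workflow runs can look up metadata
          // instead of passing variables between workflows.
          await env.PUSHES.put(
            `push:${body.after}`,
            JSON.stringify({
              ref: body.ref,
              pusher: body.pusher?.name,
              receivedAt: Date.now(),
            })
          );
        }
        return new Response("ok");
      },
    };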

I'm very open to hearing about nice alternatives that integrate well with GitHub but are more fun to configure.


If the immediate next-token probabilities are flat, that would mean the LLM is not able to predict the next token with any certainty. This might happen if an LLM is thrown off by out-of-distribution data, though I haven't personally seen it happen with modern models, so it was mostly a sanity check. Examples from the past that would cause this have been simple things like not normalizing token boundaries in your input, trailing whitespace, etc., and sometimes very rare tokens, AKA "glitch tokens" (https://en.wikipedia.org/wiki/Glitch_token).
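
As a sketch, a flatness check over the top-k log-probabilities some APIs return could look like this (the threshold is arbitrary and would need tuning):

    // Returns true if the next-token distribution is suspiciously close to
    // uniform, i.e. the model has almost no preference for any token.
    function isSuspiciouslyFlat(topKLogprobs: number[], threshold = 0.9): boolean {
      const probs = topKLogprobs.map(Math.exp);
      const total = probs.reduce((a, b) => a + b, 0);
      const normalized = probs.map((p) => p / total);
      // Shannon entropy, normalized by the maximum (uniform) entropy.
      const entropy = -normalized.reduce(
        (h, p) => h + (p > 0 ? p * Math.log(p) : 0),
        0
      );
      return entropy / Math.log(normalized.length) > threshold;
    }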


One thing I tend to do myself is use https://generator.jspm.io/ to produce an import map once for all base dependencies I need (there's also a CLI), then I can easily copy/paste this template and get a self-contained single-file app that still supports JSX, React, and everything else. Some people may think it's overkill, but for me it's much more productive than document.getElementById("...") everywhere.
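
As a sketch of what that looks like (the jspm.io URLs below are illustrative of what the generator emits; exact versions and paths will differ):

    // In index.html:
    //
    //   <script type="importmap">
    //     { "imports": {
    //         "react": "https://ga.jspm.io/npm:react@18.3.1/index.js",
    //         "htm": "https://ga.jspm.io/npm:htm@3.1.1/dist/htm.module.js" } }
    //   </script>
    //   <script type="module" src="main.js"></script>
    //
    // main.js can then use bare specifiers with no build step at all:
    import { createElement } from "react";
    import htm from "htm";

    // htm gives JSX-like templates at runtime, so no compiler is needed.
    const html = htm.bind(createElement);
    export const App = () => html`<h1>Hello from a single static file</h1>`;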

I don't have a lot of public examples of this, but here's a relatively large app where I used this strategy: it has TypeScript annotations for easy VSCode use, uses Tailwind for design, and even loads in huge libraries like the Monaco code editor, and it all works quite well 100% statically:

HTML file: https://github.com/blixt/go-gittyup/blob/main/static/index.h...

Main entrypoint file: https://github.com/blixt/go-gittyup/blob/main/static/main.js


I wondered how you got JSX, but it seems you don't quite: the true magic here is the htm library.


Yeah, I’ve found that letting AI build any larger amount of useful code and data for a user who does not review all of it requires a lot of “gutter rails”. Not just adding more prompting, because that is an after-the-fact solution. Not just verifying and erroring a turn, because that adds latency and lets the model start spinning out of control. Isolating tasks and autofixing output also keep the model on track.

Models definitely need less and less of this with each version that comes out, but it’s still what you need to do today if you want to be able to trust the output. And even in a future where models approach perfection, I think this approach will be the way to reduce latency and keep tabs on whether your prompts are producing the output you expect at larger scale. You will also be building good evaluation data for testing alternative approaches, or even fine-tuning.
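
A minimal sketch of the verify-and-autofix loop (callModel and the validator are stand-ins; the key point is that cheap deterministic fixes run before any retry turn):

    type Result = { ok: true; value: unknown } | { ok: false; error: string };

    // Deterministic repairs that don't need another model turn: strip code
    // fences, trim whitespace, etc.
    function autofix(raw: string): string {
      return raw.replace(/^```(?:json)?\n?|```$/g, "").trim();
    }

    async function generateChecked(
      prompt: string,
      validate: (raw: string) => Result,
      callModel: (p: string) => Promise<string>,
      maxRetries = 2
    ): Promise<unknown> {
      let attempt = prompt;
      for (let i = 0; i <= maxRetries; i++) {
        const raw = autofix(await callModel(attempt));
        const result = validate(raw);
        if (result.ok) return result.value;
        // Only now pay the latency cost of an error turn.
        attempt = `${prompt}\n\nPrevious output was invalid: ${result.error}`;
      }
      throw new Error("model failed to produce valid output");
    }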


Extrapolating and wildly guessing: we could end up using all that mostly idle CPU/RAM (the non-VRAM) on the beefy GPU machines doing inference to run agentic loops where the AI executes small JS scripts in a sandbox. Bun is the best fit for this, with its faster startup times and lower RAM use, not to mention its extensive native bindings that Node.js/V8 does not have. This would essentially allow multiple turns to happen before yielding to the API caller, and it would go well with the advanced tool use Anthropic recently announced. It would be a big competitive advantage in the age of agents.
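
A minimal sketch of that per-turn script runner using Bun's subprocess API (real sandboxing, resource limits, and the tool-call plumbing are all elided here):

    import { mkdtempSync, writeFileSync } from "node:fs";
    import { tmpdir } from "node:os";
    import { join } from "node:path";

    // Each turn: write the model-generated script to a temp file, run it in
    // a fresh Bun subprocess, and return captured stdout as the tool result.
    async function runToolScript(source: string): Promise<string> {
      const dir = mkdtempSync(join(tmpdir(), "agent-"));
      const file = join(dir, "tool.ts");
      writeFileSync(file, source);
      const proc = Bun.spawn({ cmd: ["bun", file], stdout: "pipe", stderr: "pipe" });
      const out = await new Response(proc.stdout).text();
      await proc.exited;
      return out;
    }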


I almost read this as Anthropic using our idle CPU/GPU resources for their own training tasks ;)


It's a bit odd to come from the outside to judge the internal process of an organization with many very complex moving parts, only a fraction of which we have been given context for, especially so soon after the incident and the post-mortem explaining it.

I think the ultimate judgement must come from whether we will stay with Cloudflare now that we have seen how bad it can get. One could also say that this level of outage hasn't happened in many years, and they are now freshly frightened of it happening again, so expect things to get tightened up (probably using different questions than this blog post proposes).

As for what this blog post could have been: maybe a page out of how these ideas were actively used by the author at e.g. Tradera or Loop54.


> how these ideas were actively used by the author at e.g. Tradera or Loop54.

This would be preferable, of course. Unfortunately both organisations were rather secretive about their technical and social deficiencies and I don't want to be the one to air them out like that.


The quirks of field values not matching expectations remind me of a rabbit hole I went down while reverse engineering the Starbound engine[1]: I eventually figured out the game was using a flawed implementation of SHA-256 hashing and had to create a replica of it[2]. Originally I used Python[3], which is a really nice language for reverse engineering data formats thanks to its flexibility.

[1] Starbounded was supposed to become an editor: https://github.com/blixt/starbounded

[2] https://github.com/blixt/starbound-sha256

[3] https://github.com/blixt/py-starbound


Wow, blast from the past. There's a fairly recent game on Steam called "Stick it to the Stickman" which practically puts you in control of the character in these animations. In fact, I think the game was directly inspired by them (there's a Devolver interview where someone working on the game mentions it[1]).

[1] https://www.youtube.com/watch?v=LF2kGSAIljU

