I could have sworn that sickly sweet smell was the smell of various phosphine reagents? Just my vague recollection of my time in lab from 15 years ago.
Phosphines, for me, were either odorless (for heavy, ligand-like things such as triphenylphosphine) or an absolutely rancid fishiness mixed with a burnt chemical note. Thankfully I never inhaled too much of the light organophosphines; they aren't too healthy...
Does this handle, e.g., water of hydration (CaSO4·2H2O)? States of matter (H2O(g))? Does it preserve subunit information, as in (C6H5)CH2COOH? Writing a parser for basic formulae is such a tiny part of the actual problem... deciding the scope of what you want to handle, and how, is the real problem.
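Even the tiny part forces a few of those scope decisions up front. Here's a minimal sketch (mine, not the article's code) that handles hydrates, parenthesised subunits, and a trailing state-of-matter tag, but deliberately flattens the subunit structure; keeping it would need a richer return type, which is exactly the kind of decision that's the real work:

```python
import re
from collections import Counter

STATE_RE = re.compile(r"\((s|l|g|aq)\)\s*$")  # trailing state-of-matter tag, e.g. H2O(g)

def parse_formula(formula: str):
    """Return (element_counts, state) for formulas like 'CaSO4.2H2O',
    '(C6H5)CH2COOH' or 'H2O(g)'. Subunits and hydrates are flattened."""
    formula = formula.replace(" ", "")
    state = None
    m = STATE_RE.search(formula)
    if m:
        state, formula = m.group(1), formula[:m.start()]

    def parse(s):
        counts, stack, i = Counter(), [], 0
        while i < len(s):
            if s[i] == "(":                        # open a parenthesised subunit
                stack.append(counts)
                counts, i = Counter(), i + 1
            elif s[i] == ")":                      # close it and apply its multiplier
                i += 1
                num = ""
                while i < len(s) and s[i].isdigit():
                    num, i = num + s[i], i + 1
                group, counts = counts, stack.pop()
                for el, n in group.items():
                    counts[el] += n * (int(num) if num else 1)
            elif s[i] in ".\u00b7":                # hydrate separator, e.g. CaSO4.2H2O
                i += 1
                num = ""
                while i < len(s) and s[i].isdigit():
                    num, i = num + s[i], i + 1
                for el, n in parse(s[i:]).items():
                    counts[el] += n * (int(num) if num else 1)
                break
            else:                                  # element symbol plus optional count
                sym = re.match(r"([A-Z][a-z]?)(\d*)", s[i:])
                if not sym:
                    raise ValueError(f"unexpected character {s[i]!r}")
                counts[sym.group(1)] += int(sym.group(2)) if sym.group(2) else 1
                i += sym.end()
        return counts

    return parse(formula), state

print(parse_formula("CaSO4.2H2O"))     # (Counter({'O': 6, 'H': 4, 'Ca': 1, 'S': 1}), None)
print(parse_formula("(C6H5)CH2COOH"))  # (Counter({'C': 8, 'H': 8, 'O': 2}), None)
print(parse_formula("H2O(g)"))         # (Counter({'H': 2, 'O': 1}), 'g')
```

Even this toy already has to pick answers to all three questions above, and it punts on isotopes, charges, and nested hydrates entirely.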
Great write-up, but I admit that I found the interweaving of human- and AI-written content/headlines/summaries pretty distracting. I kept wanting to scroll past, but had to keep backtracking to find the human thread again.
I think if you want to give your reader a quick intro to, e.g., what is the Adam optimizer, a simple link to Wikipedia is fine. No need to copy-paste an AI tutorial on Adam into the blog post.
To be fair, you can easily click to hide those expanded sections. I found it a neat compromise between linking to (usually) obtuse Wikipedia articles, which aren't written for laypersons, and forcing me to read through stuff I already know about. I just hid the sections I already understood and found value in the others.
This throughput assumes 100% utilization. A bunch of things raise the cost at scale:
- There are no on-demand GPUs at this scale. You have to rent them on multi-year contracts. So you have to lock in some number of GPUs for your maximum throughput (or some sufficiently high percentile), not your average throughput. Your peak throughput during West Coast business hours is probably 2-3x higher than the throughput at tail hours (East Coast morning, West Coast evening).
- GPUs are often regionally locked due to data-residency and latency issues. Thus, it's difficult to utilize these GPUs overnight, because Asia doesn't want its data sent to the US and the US doesn't want its data sent to Asia.
These two factors mean that GPU utilization comes in at 10-20%. Now, if you're a massive company that spends a lot of money on training new models, you could conceivably slot in RL inference or model training to happen in these off-peak hours, maximizing utilization.
But for those companies purely specializing in inference, I would _not_ assume that these 90% margins are real. I would guess that even when it seems "10x cheaper", you're only seeing margins of 50%.
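To put rough numbers on this (everything below is an illustrative assumption, not anyone's real pricing or throughput):

```python
# Back-of-envelope: how low utilization inflates per-token cost on reserved GPUs.
# Both inputs are illustrative assumptions, not measured or quoted figures.
gpu_hour_cost = 3.00           # $/GPU-hour on a long-term reservation (assumed)
rated_tokens_per_hour = 2.0e6  # tokens/hour per GPU at 100% utilization (assumed)

def cost_per_million_tokens(utilization: float) -> float:
    """Effective serving cost when only `utilization` of rated throughput is sold."""
    return gpu_hour_cost / (rated_tokens_per_hour * utilization) * 1e6

for u in (1.0, 0.5, 0.2, 0.1):
    print(f"{u:4.0%} utilization -> ${cost_per_million_tokens(u):5.2f} per 1M tokens")
```

At 10-20% utilization the effective cost is 5-10x the rated figure, which is exactly where a 90% paper margin stops being real.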
Do we know how big the "batch processing" market is? I know the major providers offer 50%+ off for off-peak processing.
I assumed it was meant to partially correct this problem, and on the surface it seems like it'd be useful for big-data shops where process-eventually is good enough, i.e. it could be a relatively big market. Is it?
A major issue we have right now is that we want the coding process to be more "Agentic", but we don't have an easy way for LLMs to determine what to pull into context to solve a problem. This is a problem I am working on with my personal AI search assistant, which I talk about below:
Analyzers are the "Brains" of my search, but generating the analysis is both tedious and potentially costly. I'm working on the tedious part, and with batch processing you can probably process thousands of files for under 5 dollars with Gemini 2.5 Flash.
With batch processing and the ability to continuously analyze tens of thousands of files, I can see companies wanting to make "Agentic" coding smarter, which should help with GPU utilization and drive down the cost of software development.
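The under-5-dollars figure is a back-of-envelope estimate; the token counts and batch prices below are placeholders I made up, so check current Gemini 2.5 Flash batch pricing before relying on them:

```python
# Sanity check of "thousands of files for under 5 dollars" via a batch API.
# Prices and token counts are placeholders, not actual Gemini 2.5 Flash pricing.
files = 5_000
avg_input_tokens = 2_000    # source file plus the analysis prompt (assumed)
avg_output_tokens = 500     # generated analysis per file (assumed)
price_in_per_m = 0.15       # $/1M input tokens at a ~50%-off batch tier (assumed)
price_out_per_m = 1.25      # $/1M output tokens at the batch tier (assumed)

cost = files * (avg_input_tokens * price_in_per_m +
                avg_output_tokens * price_out_per_m) / 1e6
print(f"~${cost:.2f} to analyze {files} files")  # ~$4.62 with these assumptions
```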
No, what I'm saying is that there are more applications for batch processing that will help with utilization. I can see developers and companies using off-hours processing to prep their data for agentic coding.
However, I don't think these companies provision capacity for peak usage and let it idle during off-peak hours. I think they provision at something a bit above average demand and aim for 100% utilization for as many hours of the day as possible. When there is not enough capacity to meet demand, they use various service-degradation methods and/or load shedding.
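A toy version of that policy, with a made-up demand curve, just to show the shape of the trade-off:

```python
import numpy as np

# Toy diurnal demand curve (made up): provision at the 90th percentile instead
# of the peak, shed whatever exceeds capacity, and look at the utilization.
hours = np.arange(24)
demand = 60 + 40 * np.sin((hours - 8) / 24 * 2 * np.pi)  # arbitrary units, peaks mid-afternoon
capacity = np.percentile(demand, 90)
served = np.minimum(demand, capacity)

print(f"capacity (p90 of demand): {capacity:.1f}")
print(f"mean utilization:         {served.mean() / capacity:.0%}")
print(f"demand shed:              {1 - served.sum() / demand.sum():.2%}")
```

You give up a percent or two of peak demand and get much healthier average utilization in return.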
Is this why I've gotten Anthropic/Claude emails every single day since I signed up for their status updates? I just assumed they were working hard on production bugs, but in light of this comment: if you don't hit capacity constraints every day, you're wasting money?
If you are willing to spread your workload out over a few regions, getting that many GPUs on demand can be doable. You can use something like compute classes on GCP to fall back to different machine types if you do hit stockouts. That doesn't make you impervious to stockouts, but it makes you a lot more resilient.
You can also use duty-cycle metrics to scale down your GPU workloads and get rid of some of the slack.
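A minimal sketch of what duty-cycle-driven scale-down could look like; the metric samples and the apply_replicas() hook are stand-ins for whatever monitoring and orchestration layer you actually run:

```python
# Minimal sketch of duty-cycle-driven scale-down. The metric samples and the
# apply_replicas() hook are stand-ins for your real monitoring/orchestration.

def target_replicas(current: int, duty_cycle: float,
                    target: float = 0.7, min_replicas: int = 1) -> int:
    """Pick a replica count that moves observed GPU duty cycle toward `target`."""
    if duty_cycle <= 0:
        return min_replicas
    desired = round(current * duty_cycle / target)
    return max(min_replicas, min(current, desired))  # only ever scale down here

def apply_replicas(n: int) -> None:
    print(f"would scale the GPU workload to {n} replicas")  # stand-in for the real call

samples = [0.32, 0.36, 0.38, 0.34]            # recent duty-cycle readings (made up)
avg_duty = sum(samples) / len(samples)        # ~0.35
apply_replicas(target_replicas(current=8, duty_cycle=avg_duty))  # -> 4 replicas
```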
Re the overnight point: that's why some providers offer batch-tier jobs at 50% off that return results within 12 or 24 hours, for non-interactive use cases.
> These two factors mean that GPU utilization comes in at 10-20%.
Why don't these two factors cancel out? Why wouldn't a company building a private GPU cluster for their own use, also sit a workload scheduler (e.g. Slurm) in front of it, enable credit accounting + usage-based-billing on it, and then let validated customer partners of theirs push batch jobs to their cluster — where each such job will receive huge spot resource allocations in what would otherwise be the cluster's low-duty point, to run to completion as quickly as possible?
Just a few such companies (and universities) deciding to rent their excess inference capacity out to local SMEs, would mean that there would then be "on-demand GPUs at this scale." (You'd have to go through a few meetings to get access to it, but no more than is required to e.g. get a mortgage on a house. Certainly nothing as bad as getting VC investment.)
This has always been precisely how the commercial market for HPC compute works: the validated customers of an HPC cluster sending off their flights of independent "wide but short" jobs, that get resource-packed + fair-scheduled between other clients' jobs into a 2D (nodes, time) matrix, with everything getting executed overnight, just a few wide jobs at a time.
So why don't we see a similar commercial "GPU HPC" market?
I can only assume that the companies building such clusters are either:
- investor-funded, and therefore not concerned with dedicating effort to invent ways to minimize the TCO of their GPUs, when they could instead put all their engineering+operational labor into grabbing market share
- bigcorps so big that they have contracts with one big overriding "customer" that can suck up 100% of their spare GPU-hours: their state's military / intelligence apparatus
...or, if not, then it must turn out that these clusters are being 100% utilized by their owners themselves — however unlikely that may seem.
Because if none of these statements are true, then there's just a proverbial $20 bill sitting on the ground here. (And the best kind of $20 bill, too, from a company's perspective: rent extraction.)
That is what I'm doing with my excess compute, fabrication, CNC, laser, 3D-printing, reflow-oven, etc. capacity in between hardware revs for my main product. I also bill out my trusted subcontractors.
I validate the compute renters because of ITAR. Lots of hostile foreign powers are trying to access compute.
My main business is ITAR-related, so I have incredibly high security in place already.
We are multi-tenant from day zero and have Slurm etc. in place for accounting reasons for federal contracts. We are actually spinning up federal-contracting-as-a-service and will do a Show HN when that launches.
Riches in the niches and the business of business :)
> Why wouldn't a company ... let validated customer partners of theirs push batch jobs
A company standing up this infrastructure is presumably not in the business of selling time-shares of infrastructure; they're busy doing AI B2B pet-food marketing or whatever. In order to make that sale, someone has to connect their underutilized assets with interested customers, which is outside of their core competency. Who's going to do that?
There's obviously an opportunity here for another company to be a market maker, but that's hard, and is its own speciality.
Yes, "HPC workload-scheduling software with multi-tenant customer usage accounting" does cost a hundred million dollars to develop and takes 5–10 years to build.
But some research labs (Lawrence Livermore National Laboratory, the research arm of HP, and a few others) got together to build it ~2002, and decided to make the results open source.
And that's what SLURM is. No, really.
> Slurm is the workload manager on about 60% of the TOP500 supercomputers.
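And the tenant-facing side really can be thin. A hypothetical wrapper (the partition, QOS, and account names are made up; the sbatch/sacct options themselves are standard Slurm):

```python
# Hypothetical sketch: submitting an external tenant's batch job to a Slurm
# cluster with per-account usage tracking. The partition, QOS, and account
# names are made up; the sbatch/sacct options themselves are standard Slurm.
import subprocess

def submit_tenant_job(script: str, account: str) -> str:
    """Submit `script` under a tenant's Slurm account and return the job ID."""
    result = subprocess.run(
        [
            "sbatch",
            "--parsable",               # print only the job ID
            f"--account={account}",     # charged against the tenant in slurmdbd accounting
            "--partition=spare-gpu",    # hypothetical low-priority, preemptible partition
            "--qos=external",           # hypothetical QOS capping what outside tenants can grab
            "--gres=gpu:8",
            "--time=04:00:00",
            script,
        ],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()

# job_id = submit_tenant_job("train_batch.sh", account="acme-sme")
# Billing data comes out of the accounting DB later, e.g.:
#   sacct --accounts=acme-sme --format=JobID,Elapsed,AllocTRES
```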
Desalinated water is also less dense than normal seawater, so the water column inside the output pipe would create a pressure imbalance with the water column outside the pipe, assisting the outflow? I'm having trouble figuring out how to resolve this seeming perpetual-motion machine.
The minimum pressure differential needed to perform reverse osmosis is bigger than the pressure differential between a column of fresh water and a column of salt water of equal height.
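Rough numbers (standard textbook values, rounded): seawater's osmotic pressure is about 27 bar, while the fresh-vs-salt density difference only buys you about 0.25 bar per 100 m of column, so:

```latex
\Pi_{\text{seawater}} \approx 27\ \text{bar} \approx 2.7\times10^{6}\ \text{Pa},
\qquad
\Delta P_{\text{columns}}(h) = (\rho_{\text{sw}}-\rho_{\text{fw}})\,g\,h
  \approx 25\ \tfrac{\text{kg}}{\text{m}^3}\cdot 9.81\ \tfrac{\text{m}}{\text{s}^2}\cdot h

\Delta P_{\text{columns}}(h) \ge \Pi_{\text{seawater}}
\quad\Longrightarrow\quad
h \gtrsim \frac{2.7\times10^{6}}{25\cdot 9.81}\ \text{m} \approx 1.1\times10^{4}\ \text{m}
```

For the fresh water to reach the surface on its own, the membrane would have to sit around 11 km down, roughly Mariana Trench depth; at realistic depths the fresh column stabilizes a couple of hundred metres below the surface instead of spilling out.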
Not in a static system, but the ocean isn't static: there are currents.
If you sank a system like this to the bottom, fresh water would naturally spill out at the surface while brine built up around it, at least until the membrane fouled.
If the brine doesn't flow away (brine is weird like this), then eventually the system hits equilibrium and stops. But if ocean currents (powered by the sun, tectonics, etc.) keep removing brine at the bottom... then it can in fact run indefinitely, because there is an energy input.
A steady supply of salty water doesn't help enough, in the same way that a steady supply of warm air can't cool a house.
The problem with your system is that you could power an engine with the flow of salt ions, and that really isn't the kind of thing you're supposed to be able to do with something that happens spontaneously.
And really, water spontaneously desalinating is about as clear a violation of the second law of thermodynamics as you can get. With the scale of the energies involved, it would be like water flowing up a wall roughly 270 meters tall.
Look, maybe I'm missing something somewhere that secretly compensates for the apparent decrease in entropy, but I'm not seeing it. The brine will flow away eventually, the fresh water returns to the ocean, and in the meantime you can power your power plant off re-salinating that water, indefinitely.
You're functionally drawing solar energy off the system very inefficiently (if you wanted kinetic motion).
A different way to look at the problem is that you can't have water spontaneously move uphill, but if you dam a river you can absolutely extract useful energy from it.
A turbine underneath the ocean could extract energy from ocean currents and this is the same problem.
So according to you, you could simply lower a membrane and a pipe into a still body of salt water and it would spontaneously separate into fresh water and brine until the build-up of brine prevented this from continuing?
Yeah I am still not seeing it. If that were the thermal equilibrium I don't see how it wouldn't separate spontaneously, or why you can mix salt and water with no input of energy whatsoever.
It goes against anything I know about entropy and osmotic pressure.
At this point you are simply arguing reverse osmosis is impossible. There is no functional difference between mechanically creating a pressure differential across the membrane with a pump, and lowering a membrane deep enough that the pressure differential can drive the process.
Back when Claude Code had per-token pricing, almost nobody used it because it was clearly much more expensive than Cursor's pricing - $20 a month flat for Cursor vs. $5-10 a day for per-token Claude. The incentives manifested in how both products used tokens: Claude Code had no particular qualms about submitting a gigantic number of tokens and letting Sonnet figure it all out, whereas Cursor put in a lot of traditional software engineering to figure out the correct minimal context. Strangely, now that Claude Code is on a fixed-price plan, Anthropic doesn't seem to be doing anything to optimize the number of tokens it consumes.
I think it's quite plausible that Anthropic is bleeding ~$100/month in token costs per $20/month user, and even at an 80% margin that's merely breakeven. Their limited capacity also means they are _losing_ the opportunity to sell the same capacity at a per-token marginal profit. I think the only plausible endgame here is that Anthropic uses the usage data to RL-finetune Claude Code to the point where it is actually worth a $200/month subscription.
Enjoy the $20/month Claude Pro plan while it lasts; I don't really see it sticking around for more than a year at best.
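Spelling out the breakeven arithmetic with the numbers I assumed above:

```python
# Using the figures assumed above: ~$100/month of list-price token consumption
# per $20/month subscriber, and an ~80% gross margin baked into the list price.
list_price_consumed = 100.0  # $/month of tokens at API list price (assumed)
gross_margin = 0.80          # margin on the list price (assumed)
subscription = 20.0          # $/month

serving_cost = list_price_consumed * (1 - gross_margin)
print(f"serving cost ~${serving_cost:.0f}/month vs ${subscription:.0f}/month subscription")
```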
The Claude Code privacy policy[0] is pretty explicit that by default they train on none of it: not the prompts, not the usage data, not even explicitly provided feedback (presumably /bug?), which can otherwise be used for other product improvements.
> By default, Anthropic does not train generative models using code or prompts that are sent to Claude Code.
> We aim to be fully transparent about how we use your data. We may use feedback to improve our products and services, but we will not train generative models using your feedback from Claude Code.
[...]
> If you choose to send us feedback about Claude Code, such as transcripts of your usage, Anthropic may use that feedback to debug related issues and improve Claude Code’s functionality (e.g., to reduce the risk of similar bugs occurring in the future). We will not train generative models using this feedback. Given their potentially sensitive nature, we store user feedback transcripts for only 30 days.
As for what value they place on that data: they do have a program where you can opt in to having your data used for training[1] in exchange for a discount on API rates.
As a former big tech engineer, I can't help but come up with a gazillion ways to work around these sorts of seemingly straightforward policies.
Here's one way they could get around their own privacy policy: keep track of what % of Claude-generated code is retained in the codebase over time (as an indicator of how high-quality / bug-free the code was); A/B test variations of Claude Code to see which variations have higher retention percentages.
No usage data is retained, no code is retained, no data is used (other than a single floating point number) and yet they get to improve their product atop your usage patterns.
Here's another idea: use a summarization model to transform your session transcript into a set of bits saying "user was satisfied/dissatisfied with this conversation", "user indicated that claude was doing something dangerous", "user indicated that claude was doing something overly complicated / too simple", "user interrupted claude", "user indicated claude should remember something in CLAUDE.md", etc. etc. and then train on these auxiliary signals, without ever seeing the original code or usage data.
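A sketch of what the retention signal could look like, purely my illustration, not anything Anthropic has described:

```python
# Hypothetical sketch of the "retention" signal: what fraction of the lines the
# assistant generated still appear in the file later. Only this one scalar would
# ever need to leave the machine -- no code, no transcript.
import difflib

def retention_fraction(generated: str, file_later: str) -> float:
    """Share of generated lines still present, in order, in the later file."""
    gen_lines = generated.splitlines()
    if not gen_lines:
        return 0.0
    matcher = difflib.SequenceMatcher(a=gen_lines, b=file_later.splitlines())
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(gen_lines)

generated = "def add(a, b):\n    return a + b\n"
file_after_a_week = "def add(a, b):\n    # now with a comment\n    return a + b\n"
print(f"retention: {retention_fraction(generated, file_after_a_week):.0%}")  # 100%
```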
I always get a kick out of the sheer number of HNers with deep concern about "training on their data" while hacking a CRUD boot service with a Next.js front-end :)
Compared to when Claude Code was originally released in late February, its token use is greatly reduced now. Since the late-May Claude 4 releases, though, I agree with you: it hasn't decreased much.
$20? I'd be excited if the $200-a-month plan is still around after a year. I went with Max reluctantly, cheapskate that I am. There is no way I'm giving that up now. I'm really worried that if they raise the prices I'll find it extremely hard not to fall for it! Just hoping the open-source models catch up by then. Even if they only get to CC's abilities of today, that is good enough for me!
I'm a second-gen Korean-American; my Korean is weak but conversational. I'm intrigued by the reasoning model that analyzes my speech and points out the various mistakes I'm making. It's a good first attempt at separating the two tracks of actual conversation vs. mistake-correcting.
I think showing the raw reasoning text is not quite the right UI; maybe highlighting the specific text in red and showing a suggested correction would work better?
It's also a little awkward that the conversation is live; I don't really have any breathing room to read the reasoning traces about what mistakes I made or what I could have done better. I hung up the first time while trying to figure out how to pause.