
In particular, the API pricing for GPT-5.2 Pro has me wondering what on earth the possible market for that model is beyond getting to claim a couple of percent higher benchmark performance in press releases.

>Input: $21.00 / 1M tokens

>Output: $168.00 / 1M tokens

That's the most "don't use this" pricing I've seen on a model.

https://openai.com/api/pricing/


Last year, o3 high scored 88% on ARC-AGI-1 at more than $4,000/task. This model, in its X high configuration, scores 90.5% at just $11.64 per task.

General intelligence has gotten ridiculously less expensive. I don't know if it's because of compute and energy abundance, attention mechanisms getting more efficient, or both, but we have to acknowledge the bigger picture and the relative prices.


Sure, but the reason I'm confused by the pricing is that the pricing doesn't exist in a vacuum.

Pro barely performs better than Thinking in OpenAI's published numbers, but it comes at ~10x the price, with an explicit disclaimer that it's slow, on the order of minutes.

If the published performance numbers are accurate, it seems like it'd be incredibly difficult to justify the premium.

At least on the surface level, it looks like it exists mostly to juice benchmark claims.


It could be using the same trick Grok used early on (at least in earlier versions): spin up 10 agents that work on the problem in parallel, then take a consensus on the answer. That would explain both the price and the latency.

Essentially a newbie trick that works really well but isn't efficient, while still looking like an amazing breakthrough.

(if someone knows the actual implementation I'm curious)
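For anyone unfamiliar with the trick, here's a minimal sketch of the fan-out-and-vote idea in Python. It's purely illustrative: ask_model() is a made-up stand-in for a real completion call, and nothing here claims to reflect OpenAI's actual implementation.

    # Best-of-N with consensus voting, sketched with a dummy model call.
    import random
    from collections import Counter
    from concurrent.futures import ThreadPoolExecutor

    def ask_model(prompt: str) -> str:
        # Hypothetical stand-in: a real version would hit an LLM API with
        # temperature > 0 so the N samples can actually disagree.
        return random.choice(["42", "42", "41"])

    def consensus_answer(prompt: str, n: int = 10) -> str:
        # Fan out N independent attempts in parallel...
        with ThreadPoolExecutor(max_workers=n) as pool:
            answers = list(pool.map(ask_model, [prompt] * n))
        # ...then keep whichever final answer the majority agreed on.
        return Counter(answers).most_common(1)[0][0]

    print(consensus_answer("What is 6 * 7?"))

You pay for N full generations to get one answer back, which would line up with both the cost per task and the multi-minute latency.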


The magic number appears to be 12 in the case of GPT-5.2 Pro.

Those prices seem geared toward people who are completely price insensitive, who just want "the best" at any cost. If the margins on that premium model are as high as they should be, it's a smart business move to give them what they want.

gpt-4-32k pricing was originally $60.00 / $120.00.

Pro solves many problems for me on the first try that the other 5.1 models can't after many iterations. I don't pay API pricing, but if I could afford it, I would in some cases for the much higher context window it affords when a problem calls for it. I'd rather spend some tens of dollars to solve a problem than grind at it for hours.

Less an issue if your company is paying

Even less an issue when OpenAI provides you free credits

Someone on Reddit reported that they were charged $17 for one prompt on 5-pro, which suggests around 125,000 reasoning tokens.

Makes me feel guilty for spamming pro with any random question I have multiple times a day.


Spinning rust still typically holds advantages for archival storage, as well.

There was literally a headline on the front page here a few days ago re: data degradation of SSDs during cold storage, as one example.


The article in question: "Unpowered SSDs slowly lose data" https://news.ycombinator.com/item?id=46038099


I missed that earlier post to ask a question that always bugs me... do SSDs, when powered on, actually "patrol" their storage and rewrite cells that are fading even when quiescent from the host perspective?

Or does the data decay there as well, just as a function of time since cells were written?

In other words, is this whole focus on "powered off" just a proxy for "written once" versus "live data with presumed turnover"? Or do the cells really age more rapidly without power?


My understanding, based on my reading of the previous post, is that there are no hardware-level checks. SSDs need to be power cycled every so often, and the integrity of the filesystem needs to be checked via something akin to a zfs scrub. This should be done on a monthly basis at minimum.

If you are paranoid about your data and not relying on filesystem-level checks from ZFS or Btrfs, you should probably avoid SSDs for long-term storage.


>My understanding based on my readings of the previous post is there are no hardware level checks

There are "hardware level checks", it's just that they might assume regular usage. If your SSD is turned on regularly (eg. a few hours a day at least), your files are probably fine, even if you never read/scrub your rarely read files. If it is infrequently used, you're right that you probably have to do an end-to-end scan to make sure everything gets checked and possibly re-written.
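For the infrequently-used case, the end-to-end scan can be as simple as reading every byte back so the controller's ECC path has a chance to flag (and, firmware permitting, relocate) weak blocks. Here's a rough sketch of such a userspace pass in Python; the approach and names are mine rather than anything from vendor documentation, and note that it surfaces read errors rather than guaranteeing a rewrite:

    # Naive "scrub" for a rarely powered-on SSD: read every file end to end so
    # each block goes back through the drive's ECC path. Hashes are computed so
    # you can also keep a manifest and catch silent corruption across runs,
    # loosely in the spirit of a zfs scrub.
    import hashlib
    import sys
    from pathlib import Path

    def scrub(root: str) -> None:
        errors = 0
        for path in Path(root).rglob("*"):
            if not path.is_file():
                continue
            digest = hashlib.sha256()
            try:
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        digest.update(chunk)
                # Persist digest.hexdigest() somewhere if you want to compare
                # against the previous run.
            except OSError as e:
                errors += 1
                print(f"read error: {path}: {e}", file=sys.stderr)
        print(f"done, {errors} unreadable file(s)")

    if __name__ == "__main__":
        scrub(sys.argv[1] if len(sys.argv) > 1 else ".")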


Is the same true for USB flash drives? Do they rely on the OS to scrub/refresh them?


"probably" does the heavy lifting here.


I mean, obviously? SSDs and HDDs randomly fail for all sorts of reasons beyond random bitflips, so properly working ECC isn't enough to guarantee your files are "fine". Even if you're using something like ZFS, it's possible for one of the underlying drives to experience ECC errors and for another drive to fail before that can be caught. If your parity factor is 1 or less (e.g. RAIDz1), you'll also experience data loss.


I was researching that topic a little bit a while ago but with no usable outcome. The aim was to find out how to cope with SSDs as backups. Is it enough to plug them into a power connection once in a while so the firmware starts the refresh cycle? Do I need to do something else? How often does it need to be plugged in? Thankful for any pointers ...


> Data degradation of SSDs during cold storage

Why is that? I'd have expected solid-state electronics to last longer at low temperatures.

Or is it precisely that, some near/superconductivity effects causing naughty electrons to escape and wander about?


They mean powered off, not physically cold. Electrons escape the NAND flash over time and if the device is not active it's not refreshing them.


It's not superconductivity, but quantum mechanics.

High capacity hard drives nowadays use heat and strong magnetic fields to write patterns into the platter. It's pretty stable just sitting around doing nothing.

High-density, multi-level NAND involves using a strong electric field to quantum tunnel a few electrons into an electrically insulated bit of semiconductor. Over time the electrons tunnel their way back out, but usually this only ends up happening if too much writing has damaged the insulation.


Oh, so would actual cold (low temperatures) prevent/reduce that phenomenon?


Yes


By cold, I think they mean "powered off", not "low temperature".


>It also eats vertical space with those huge icons, which is precious given the current popular screen dimensions.

This is one Windows gets wrong too, in fairness.

W11 killed the ability to move the taskbar to the left/right side of the screen. That's a real pain on ultrawide displays where you have absolutely miles of horizontal real estate and relatively limited vertical space.

Fortunately, ExplorerPatcher exists and can restore the functionality.

https://github.com/valinet/ExplorerPatcher/


>I think people who do this think that people who use Windows perceive that the Mac experience is smoother, and may have some sort of Mac envy.

There's an irony in this due to this:

>b) WSL-based, VSCode-using devs who are one step away from just using Linux. These are the folk who fifteen years ago would have been using what was then still OSX. But these folk don't use Windows as Windows: they use it as a semi-Unix.

The people still doing the "hurr durr wind0ze suxx" routine are the ones stuck 15 years in the past. Modern Windows is an entirely different and vastly more capable beast and it still runs huge swathes of the enterprise world.

The best technologists I know don't really care all that much which desktop platform you stick them on anymore since most of what they really need is either available everywhere or running on a backend that isn't their desktop anyway.


> Modern Windows is an entirely different and vastly more capable beast and it still runs huge swathes of the enterprise world.

Okay, but what does it actually do that Linux doesn't? What's the selling point, why should I make the switch from Linux to Windows?


For one, it can run Raycast. There are launchers on Linux that implement a small fraction of Raycast's functionality, but entire categories of abilities are only possible in Raycast, like CRUD operations on Jira tickets, using AI to interact with your Notion workspace without having to pay Notion 20 USD per month, and directly interacting with other remote APIs with just a few keystrokes.


Okay, but why would I want that?

I don't want any AI at all, and I have turned down incredibly well-paid jobs because it would involve using Jira.


I'm firmly of the opinion that, as a general rule, if you're directly embedding the output of a model into a workflow and you're not one of a handful of very big players, you're probably doing it wrong.[1]

If we overlook that non-determinism isn't really compatible with a lot of business processes and assume you can make the model spit out exactly what you need, you can't get around the fact that an LLM is going to be a slower and more expensive way of getting the data you need in most cases.

LLMs are fantastic for building things. Use them to build quickly and pivot where needed and then deploy traditional architecture for actually running the workloads. If your production pipeline includes an LLM somewhere in the flow, you need to really, seriously slow down and consider whether that's actually the move that makes sense.

[1] - There are exceptions. There are always exceptions. It's a general rule not a law of physics.
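To make the "build with it, don't run on it" distinction concrete, here's a toy contrast in Python. The task and names are invented purely for illustration; the point is where the model sits in the pipeline, not the parsing itself.

    import re

    # What you might be tempted to ship: a model call per record. Slower, costs
    # money every time, and can return something different tomorrow.
    def extract_total_with_llm(invoice_text: str) -> float:
        raise NotImplementedError("imagine a chat-completion call here")

    # What you ship instead: a boring deterministic function. An LLM is great
    # at helping you write and test this during development, and then it
    # leaves the loop entirely.
    TOTAL_RE = re.compile(r"total\s*due[:\s]*\$?([0-9][0-9,]*\.\d{2})", re.I)

    def extract_total(invoice_text: str) -> float:
        m = TOTAL_RE.search(invoice_text)
        if not m:
            raise ValueError("no total found")
        return float(m.group(1).replace(",", ""))

    print(extract_total("Invoice #1138\nTotal due: $1,204.50"))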


>(Healthcare in the US is nearly entirely Windows based).

This wasn't my experience in over a decade in the industry.

It's Windows dominant, but our environment was typically around a 70/30 split of Windows/Linux servers.

Cerner shops in particular are going to have a larger Linux footprint. Radiology, biomed, interface engines, and med records also tended to have quite a bit of *nix infrastructure.

One thing that can be said is that containerization has basically zero penetration with any vendors in the space. Pretty much everyone is still doing a pets over cattle model in the industry.


>Then compare this to something like a Kei truck and it's really quite pathetic.

I will forever be sad that Canoo was wildly (possibly fraudulently) mismanaged and went bust before they ever built any of their planned pickup trucks:

https://cars.usnews.com/cars-trucks/features/canoo-pickup-tr...

They were going to be built on the same platform as their vans and the best way to describe them is "Kei truck upsized and uppowered enough to be safe on US roads." They had neat party tricks like a compact bed for daily driving that could expand out to fit full size ply and fold out workbenches on all four sides of the truck.

I'm not even a truck guy and I desperately wanted one of these things. Just such a cool concept.


> They had neat party tricks like a compact bed for daily driving that could expand out to fit full size ply...

Unless I'm missing something, this sounds like the bed extenders I've seen on lots of trucks, which allow the tailgate to be used as part of the bed when folded down. I initially thought they might be allowing the passenger compartment to be opened up to temporarily get the full bed size, but I didn't see anything like that when browsing the page. The closest thing I ever saw to that was on the Subaru Baja (which was far more a sedan than a truck), and given how short the bed was and the fact that the back window was immobile, it seemed to have less hauling utility than a standard hatchback.


It was a built-in expander. The bed was 4x6 in standard configuration and could pull out another 2 feet to get 4x8 when you needed it.


That is really sad - I'd seen a review of an early model on YouTube and it seemed like a brilliant idea. I really hope someone else can make something similar work.


Anthropic, frankly, needs to in ways the other big names don't.

It gets lost on people in techcentric fields because Claude's at the forefront of things we care about, but Anthropic is basically unknown among the wider populace.

Last I'd looked, a few months ago, Anthropic's brand awareness was in the middle single digits; OpenAI/ChatGPT was somewhere around 80% for comparison. MS/Copilot and Gemini were somewhere between the two, but closer to OpenAI than Anthropic.

tl;dr - Anthropic has a lot more to gain from awareness campaigns than the other major model providers do.


Anthropic feels like a one-trick pony, as most users don't need or want Anthropic's products.

However, I speak with a small subset of our most experienced engineers and they all love Claude Sonnet 4.5. Who knows if this lead will last.


Claude is ChatGPT done right. It's just better by any metric.

Of course, OpenAI has tons of money and can branch off in all kinds of directions (image, video, n8n clone, now RAG as a service).

In the end I think they will all be good enough, and both Anthropic's and OpenAI's leads will evaporate.

Google will be left to win because they already have all the customers with GSuite, and OpenAI will be absorbed into Microsoft at a massive loss, and Microsoft is already selling to all the Azure customers.


Anthropic are mostly selling to, and having the most success with, business customers (incl. selling API access for Claude Code).

This is the reason they haven't bothered to provide an image generator yet - because Chat users are not their focus.


Lately ClaudeAI switched over to ASCII art when doing explanations....


>Anthropic feels like a one trick pony as most users dont need or want anthropic products.

I don't see a basis for this that wouldn't be equally true of OpenAI.

Anthropic's edge is that they arguably have some of the best technology available right now, despite operating at a fraction of the scale of their direct competitors. They have to start building mindshare and market share if they're going to hold that position, though, which is the point of advertising.


Tangentially related, one of the more interesting projects I've seen in the 3D printing space recently is Prunt. It's a printer control board and firmware, with the latter being developed in Ada.

https://prunt3d.com/

https://github.com/Prunt3D/prunt

It's kind of an esoteric choice, but struck me as "ya know, that's really not a bad fit in concept."


I wrote about some of the reason for choosing it here: https://news.ycombinator.com/item?id=42319962


>Even with procedural and parametric modeling in Blender, you will always encounter issues with approximation and floating point precision, which are inherent to the data representation.

A common problem people run into with CAD models is importing a STEP file and modeling directly off of geometry in it. They later find out that some face they used as a reference was read by the CAD package as being at 89.99999994 degrees to another, which has subtly thrown off the geometry of everything else in their model, and now things aren't lining up the way they should.

And that's with a file that has solid body representation! It's an entire new level of nightmare when you throw meshes into the mix.
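A toy illustration of that failure mode in Python, with made-up numbers: two imported face normals that should be exactly perpendicular come back a hair off, an exact comparison lies to you, and anything referenced off that face quietly inherits the error.

    import math

    a = (1.0, 0.0, 0.0)        # face normal you expect
    b = (1.05e-9, 1.0, 0.0)    # the "perpendicular" neighbor, as imported

    dot = sum(x * y for x, y in zip(a, b))
    angle = math.degrees(math.acos(dot))    # both vectors are effectively unit length

    print(f"{angle:.8f}")      # 89.99999994, not 90
    print(angle == 90.0)       # False: exact comparisons on imported geometry fail
    print(math.isclose(angle, 90.0, abs_tol=1e-6))  # True: kernels work to tolerances,
                                                    # but the tiny error still rides
                                                    # along into downstream geometry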

The heart of any real CAD package is a geometry kernel[1]. There are really only a handful of them out there; Parasolid is used by a ton of 'big name' packages, for example. This is what takes a series of descriptions of geometry and turns it into clear, repeatable geometry. The power of this isn't just where geometry and dimensions are known. It's when the geometry and dimensions are critical to the function of whatever's being modeled. It's the very core of what these things do. Mesh modeling is fantastic for a lot of things, but it's a very different approach to creating geometry and just isn't a great fit for things like mechanical engineering.

1 - https://en.wikipedia.org/wiki/Geometric_modeling_kernel


> The power of this isn't just where geometry and dimensions are known. It's when the geometry and dimensions are critical to the function of whatever's being modeled.

Yes, but I meant making a case for workflow differences.

CAD is bad at aiding visual thinking and exploration, since you kinda have to be precise and constrain everything. You can pump out a rough idea of an object, and edit it, so much faster in Blender.

Sketching on paper, or visualizing in one's mind, is pretty hard for most people when it comes to 3D. CAD is not at all inviting for creative impulses and flow. People who can do this in CAD are probably trained engineers who learned a very disciplined, analytical way to approach problems, people who think in technical drawings.

So, CAD is good at getting a precise and workable digital representation of a "pre-designed" object for further (digital) processing, analysis, assembly and production. I think Blender is better at the early design process, figuring out shapes and relations.


I don't entirely agree there.

In a vacuum for a standalone object, a 3D mesh app like Blender can be useful for brainstorming.

Most of my CAD usage is designing parts that have to fit together with other things. The fixed elements drive the rest of the design. A lot of the work is figuring out "how do I make these two things fit together and be able to move in the ways they need to."

There is still a lot of room for creativity. My workflow is basically "get the basic functionality down as big square blocks, then keep cutting away and refining until you have something that looks like a real product." My designs very rarely end up looking like what they started out as. But the process of getting them down in CAD is exactly what lets me figure out what's actually going to work.

It's a very different workflow, and it's definitely not freeform in the same way as a traditional mesh modeling app, but CAD is for when you have to have those constraints. You can always (and it's not an uncommon pattern) go back and use a mesh modeler to build the industrial design side of things on top once the mechanical modeling is done.

ETA:

I'd also add: I'm not sure "thinking in CAD" comes naturally to anyone; it's a skillset that has to be built.

