Yes, Sora hallucinates significantly more than Claude.
I find that Codex generally requires me to remove code to get to what I want, whereas with Claude I tend to use what it gives me and add to it. Whether through additional prompting or manual typing, I just find that Codex requires removal to reach the desired state, and Claude requires adding to reach the desired state. I prefer adding incrementally rather than removing.
The last time I used them both side by side was a month ago, so unless it's significantly improved in the past month, I am genuinely surprised that someone is making the argument that Codex is competitive with ClaudeCode, let alone somehow superior.
I use ClaudeCode almost daily, and it continues to blow me away. I don't use Codex often because every time I have, the output is next to worthless and generally invalid. Even when it eventually gets me what I want, it takes much more prompting to reach a functioning result. ClaudeCode, on the other hand, gets me good code from the initial prompt. I'm continually surprised at exactly how little prompting it requires. I have given it challenges with very vague prompts where it really exceeded my expectations.
I think the enthusiasm for Codex coincided with the extended period of degraded quality CC was experiencing a couple of months ago? During that time I cancelled my Claude sub and tried out Codex, which by comparison felt significantly better. I haven't tried them side by side since Claude was de-borked, but even if Codex is objectively poorer, I could believe that flattering comparison has stuck for people who switched.
Yeah, I think the argument is tooling vs. agent. Maybe the OpenAI agent is performing better now, but the tooling from Anthropic is significantly better.
The Anthropic (ClaudeCode) tooling is best-in-class to me. You listed many features that I have become so reliant on that I consider them the ante other competitors need just to be considered.
I have been very impressed with the Anthropic agent for code generation and review. I have found the OpenAI agent to be significantly lacking by comparison. To be fair, the last time I used OpenAI's agent for code was about a month ago, so maybe it has improved recently (not at all unreasonable in this space). But at least a month ago, when using them side by side, the Codex CLI was VERY basic compared to the wealth of features and UI in the ClaudeCode CLI. The agents for Claude were also so much better than OpenAI's that it wasn't even close. OpenAI has always delivered improper code (non-working or invalid) at a very high rate, whereas Claude generally delivers valid code; the debate is just whether it is the desired way to build something.
This is very true. ChatGPT has a very generous free tier. I used to pay for it, but realized I was never really hitting the limits that would justify paying for it.
However, at the same time, I was using Claude much more, really preferring its answers most of the time, and constantly being hit with limits. So guess what I did: I cancelled my OpenAI subscription and moved to Anthropic. On top of that, I get Claude Code, which OpenAI really has no serious competitor for.
I still use both models, but I never run into limits with OpenAI's free tier, so I see no reason to pay for it.
In a recent episode of the Hard Fork podcast, the hosts discussed an on-the-record conversation they had with Sam Altman of OpenAI. They asked him about profitability, and he claimed that they are losing money mostly because of the cost of training, but that as the models advance, they will train less and less. Once you take training out of the equation, he claimed, they are profitable based on the cost of serving the trained foundation models to users at current prices.
Now, when he said that, his CFO corrected him: they aren't profitable, but "it's close."
Take that with a grain of salt, but that's a conversation from one of the big AI companies that is only a few weeks old. I suspect it is pretty accurate that pricing is currently reasonable if you ignore training. But training is very expensive, and it is the reason most AI companies are losing money right now.
Unfortunately for those companies, their APIs are a commodity, and are very fungible. So they'll need to keep training or be replaced with whichever competitor will. This is an exercise in attrition.
I wonder if we're reaching a point of diminishing returns with training, at least just by scaling the data set. There's a finite amount of information (that can be obtained reasonably) to train on, and I think we're already at a sizable chunk of it, not to mention the cost of naively scaling up. My guess is that the ultimate winner will be the one that figures out how to improve without massive training costs, through better algorithms, or maybe even just better hardware (e.g. neuristors). We know that, in the worst case, we should be able to build something with human-level intelligence that takes about 20 watts to run and is about the size of a human head, and you only need to ingest a small slice of all available information to do that. And training should only use about 3.5 MWh total, and can be done with the same hardware that runs the model.
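For what it's worth, the 3.5 MWh figure follows from simple arithmetic if you assume the brain's "training run" is roughly 20 years at 20 watts:

```python
# Back-of-envelope check of the brain-as-hardware numbers above
# (assuming ~20 years counts as the brain's "training run").
BRAIN_WATTS = 20
TRAINING_YEARS = 20

hours = TRAINING_YEARS * 365 * 24        # ~175,200 hours
energy_mwh = BRAIN_WATTS * hours / 1e6   # watt-hours -> megawatt-hours
# energy_mwh comes out to about 3.5
```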
> But as the model advances, they will train less and less.
They sure have a lot of training to do between now and whenever that happens. Rolling back from 5 to whatever was before it is their own admission of this fact.
I think that actually proves the opposite. People wanted an old model, not a new one, indicating that for that user base they could have just... not trained a new model.
That is for a very specific class of use cases. If they turned up the sycophancy on the new model, those people would not call for the old one.
The reasoning here is off. It is like saying new game development is nearly over because some people keep playing old games.
My feeling: we've barely scratched the surface of the mileage we can get out of even today's frontier models, and we are just at the beginning of a huge runway of improved models and architectures. Watch this space.
which is completely "normal" at this point, """right"""? if you have billions of VC money chasing returns there's no time to sit around; it's all in, and the hype train doesn't wait for bootstrapping profitability. and of course, with these gargantuan valuations and mandatory YoY growth numbers, there is no way they are not fucking with the unit economics too. (biases are hard to beat, especially if there's not much conscious effort to do so.)
Does the cost of goods come down 10x or not? For Uber it didn't, so we went from a great $6 VC-funded product to the mediocre $24 ride product we have today. I'm not sure I'm going to use Copilot at $1 per request, or even $0.25. That starts to approach an overseas consultant in price and ability.
well, Uber always faced the obvious problem of scaling (even after level 42 self-driving, because it's not possible to serve local demand with global supply, plus all the regulatory compliance issues - which they initially "conveniently" sidestepped by being bold/criminal, but cities are not going to play dumb forever)
of course these chat-AIs also started by "well maybe it's fair use", but at least the scaling problem seems easier than for taxi services
I think this is the world we are going to. I'm not going to get mired in the details of how it would happen, but I see this end result as inevitable (and we are already moving that way).
I expect a lot more paywalls for valuable content. General information is commoditized and offered in aggregated form through models. But when an AI is fetching information for you from a website, the publisher is still paying the cost of producing and hosting that content. The AI models increase the cost of hosting the content while also removing the value of producing it, since you are essentially just feeding value to the AI model. The user never sees your site.
I know ads are unpopular here, but the truth is that ads are how publishers were compensated for your attention. When an AI model views the information a publisher produces, modifies it from its published form, and strips all ad content, you end up with increased costs for producers, reduced compensation for producing content (since they no longer get ad traffic), and content that isn't even delivered in its original form.
The end result is that publishers now have to paywall their content.
Maybe an interesting middle ground is for AI model companies to compensate publishers for the content they access, similar to how Spotify compensates for plays of music. If an AI model uses information from your site, it pays that publisher a fraction of a cent. People pay the AI models, and the AI models distribute that to the producers of the content that feeds and adds value to the models.
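A Spotify-style scheme like this boils down to a pro-rata split of a revenue pool. A minimal sketch, with all publisher names, counts, and pool sizes invented purely for illustration:

```python
# Hypothetical pro-rata payout: split a subscriber revenue pool across
# publishers in proportion to how often the model retrieved their content.
def payouts(retrievals: dict, pool_cents: int) -> dict:
    """Return each publisher's share of pool_cents, weighted by retrievals."""
    total = sum(retrievals.values())
    return {pub: pool_cents * n / total for pub, n in retrievals.items()}

# Example: siteA was retrieved 3x as often as siteB, so it gets 3/4 of the pool.
shares = payouts({"siteA": 300, "siteB": 100}, pool_cents=1_000_000)
```

The hard part in practice is not the arithmetic but attribution: deciding what counts as one "retrieval" of a publisher's content.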
There is a video of it floating around for the morbidly curious. I won't link it here; it is very NSFL. I was accidentally shown it while scrolling Instagram and wish I hadn't seen it.
He is able to talk, and you can make out his words, but he is clearly choking or being strangled. He was fully sucked into the machine, and a very strong guy was trying with everything he had to pull him out. He said some pretty sad and harrowing things when he realized he wasn't going to make it. Again, the video is out there if you really want to see it, but I do NOT recommend it.
Apparently, oxygenated hemoglobin and blood plasma are diamagnetic, while deoxygenated hemoglobin is paramagnetic. That means magnetic properties are determined by the molecules, not their elements. I assume that whatever attraction or repulsion the MRI magnets cause is weak compared to the forces involved in Brownian motion, so don't expect anything substantial.
This reminds me of something I've always thought (spoiler for Avatar: The Last Airbender): Toph should also be able to bloodbend, since she invented metalbending and blood is full of iron.
There is a scene in one of the X-Men movies where Magneto escapes a completely non-metallic prison by extracting iron from a guard's body. I initially thought Raven had injected him with something the previous day to increase his iron content, but realized later that she had injected metallic iron as a suspension.
That aside, you don't need ferromagnetic substances for something to be manipulated by magnetic fields. Anything conductive can be moved around by fluctuating magnetic fields. Even non-conducting paramagnetic or diamagnetic substances will respond to very high-strength magnetic fields, just not at the 'feeble' strengths of an MRI machine's superconducting magnets. Here is something I collected previously on the same fun topic: https://phanpy.social/#/fosstodon.org/s/111504060685437481?v...
Why were you downvoted? I was going to read more on it later. So far, I know that iron(III) oxide is magnetic; I don't know anything about the other oxides, ions in other oxidation states, or iron in other compounds. I wish people would explain why they downvoted.
FYI: MRI magnetic fields are also incredibly predictable/uniform. Very interesting tech: dumping 100-200 watts of RF energy into somebody, listening to the results, and then somehow turning that into a 3D spatial image. It truly makes CT scanning look like easy mode.
I've seen a lot of gruesome stuff, so I'm not bothered by that, but I'm curious how someone got a camera, presumably with ferrous parts, in there without it also getting pulled into the magnet.
Phones nowadays don't have a lot of ferrous material in them; they are pretty much all battery, copper, silicon, glass, plastic, and maybe aluminum. Your keys probably have more steel in them than your phone.
People have gone into MRIs with phones with no adverse effects, except maybe damaged speakers. The MRI is more likely to damage the electronics than to physically rip the phone off you.
It's all about the amount of ferrous material involved. An MRI can pull your keys out of your pocket, but I doubt you couldn't peel them back off it.
The "bailout" for consumers is lowering interest rates to ~0%. That's what we did after 2007. If people can refinance their homes from 7% to ~2%, they save a fortune, which spurs buyers back into the market and gets current homeowners to move around and shuffle inventory.
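The size of that saving falls out of the standard fixed-rate amortization formula. A quick sketch with illustrative numbers (the $400k / 30-year loan is my assumption, not from the comment):

```python
# Standard fixed-rate mortgage payment: P * r / (1 - (1 + r) ** -n),
# where r is the monthly rate and n the number of monthly payments.
def monthly_payment(principal: float, annual_rate: float, years: int = 30) -> float:
    r = annual_rate / 12
    n = years * 12
    return principal * r / (1 - (1 + r) ** -n)

at_7 = monthly_payment(400_000, 0.07)  # roughly $2,661/month
at_2 = monthly_payment(400_000, 0.02)  # roughly $1,479/month
savings = at_7 - at_2                  # on the order of $1,200/month
```

On those assumed numbers, the refinance cuts the payment nearly in half, which is the "save a fortune" effect described above.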
Of course, the parent comment is also correct that banks get bailed out by low interest rates, but the government also bailed out several banks directly. Corporate bailouts are always a debatable topic. On one hand, we should let bad businesses fail: they failed because of the risks and choices they made, and bailing them out just invites those mistakes to happen again. But on the flip side, consumers do need banks (as much as we refuse or hate to admit it). Yes, banks make money off of us, but we as consumers also need them, which is why bailouts get approved.
We have seen this movie before. I'm not sure why everyone is debating the ending. We watched and lived the ending. It wasn't pretty in the middle there, but the market eventually recovered. Here we are getting ready to rewind and watch the movie again.
I'm strongly opposed to the idea of citizens using government-run bank accounts. The Soviets did this, and it proved how easy it is for the government to control what people buy, down to the individual level, a la social credit scores.
Even if the government today were to wield that power safely, it's simply too much risk to load that gun and hand it to all future administrations.
It is for testing Python projects that connect to Postgres.
Often when unit testing, you need to test CRUD-type operations between the application and the database, so you spin up a temporary database just to delete it at the end of the tests. Commonly this is done with SQLite during testing, even if you built the application for Postgres in production, because SQLite is fast and easy: you can interact with it as a file, with no connection configuration or added bloat.
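The temporary-SQLite pattern described above can be sketched as a small context manager (a generic illustration, not this project's API):

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager

@contextmanager
def temp_db():
    """Create a throwaway SQLite database file, deleted when the test ends."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)  # sqlite3 opens the file itself; release the raw descriptor
    conn = sqlite3.connect(path)
    try:
        yield conn
    finally:
        conn.close()
        os.remove(path)  # the "delete it at the end of the tests" step

# A CRUD round-trip against the temporary database:
with temp_db() as conn:
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
    rows = conn.execute("SELECT name FROM users").fetchall()
```

In a test suite this would typically live as a pytest fixture, but the lifecycle (create, use, destroy) is the same.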
But sometimes your app gets complicated enough that you are using Postgres features SQLite has no equivalents for. Now you need a PG instance just for testing, which is a headache.
So this project bridges the gap, giving you the feature-completeness and cross-environment consistency of Postgres with the conveniences of SQLite.