I’m all-in on agents but this is a “you’re holding it wrong” situation.
If you want to give your agents a DB of their own as a scratchpad or something, that’s great. They can not only go to town, but also analyze their own work and iterate on it.
If you are talking about a production database, agents should not be hitting it directly under any circumstances. There needs to be an API layer with defined usage patterns, rate limits, etc.
This is basically the same as saying “databases weren’t designed for interns to run live inline migrations in prod”. Yeah of course they aren’t.
> This is basically the same as saying “databases weren’t designed for interns to run live inline migrations in prod”. Yeah of course they aren’t.
And the same as saying "databases weren't designed for non-technical people to connect with report-building tools like Power BI and Excel and run reports in the middle of peak customer checkouts."
As a DBA, I'm constantly surprised by what people think will be completely harmless to hook up to the database server - and then how much havoc it causes. Gonna be a rough decade.
At this point we should make a GitHub repo with a huge list of unsolved “dry lab” problems and spin up a harness to try and solve them all every new release.
Except that Erdős problems are solved all the time, so many of them are already solved. Quite sure the last time I saw an article about an LLM solving an Erdős problem someone even tracked down a solution published by Erdős himself.
I’m not on either side of the argument, but one popular definition is missing: “can automate most knowledge work”.
Not that this is my definition or anything, just pointing out that this is the one people actually care about, even if the acronym doesn’t say anything about economics or social change.
Interesting, could you explain further? I'd like to know what that means. I thought I covered this in definition 8, with the very blatant asterisk in the post: we have it and it works, but not consistently enough, or with acceptable enough quality, in long runs or open-ended tasks. I believe we will get closer and closer to this target with improvements in models and scaffolding.
I agree that it seems most likely that things are going to rapidly improve as they have been.
But the reality is that global unemployment levels are, as yet, unaffected.
This is clearly the hardest bar to meet, but it’s also the most important.
If AI fully automates 5% of global jobs (or pick your number), I think it would be fair to say that this specific definition of AGI is achieved.
As a SWE, I feel immensely augmented by AI. But you can’t yet fully deploy an AI to do a job end to end without any human involvement.
I use something like 1B input tokens per week on Codex, and while it does an insane amount of work, it needs a skilled hand to guide it.
This might not be the case for long, but it’s not here yet (in that narrow definition at least).
Strange, I had the same thought about doing this exact exercise this weekend.
I think the overall percentage is the wrong approach here.
It’s easy to say a lot of things that are factually true or predictions that are inevitably true.
However, the more salient point with Gary Marcus is the one unforgivable thing he was wrong about and continues to double down on: that deep learning is hitting a wall.
From early 2022 through today, there has been, and still is, plenty of low-hanging fruit in deep learning.
Today’s LLM progress is mostly being made in RL. But world models are also still so early and they’re deep learning all the way down.
It would be nice if he would just admit he was wrong.
Depends on how you look at it. In terms of overcoming fundamental limitations, I would argue it has indeed hit a wall. ChatGPT is how old, but LLMs still can't actually count?
But then, to your point, what does it matter, if they're still as useful as they are? Even at this stage, Claude Code makes Jira halfway bearable.
Of course, we have to play devil's advocate as well. Most CEOs don't seem to be reporting great ROI on their "AI" investments.
I'll add one more point. If you scroll through his Substack, a lot of his posts are incredibly negative and unproductive. I was (and continue to be) someone who cares deeply about responsible AI... But there's a difference between working on AI responsibly or pushing the debate, versus simply criticizing everything that is done as folly, useless, crap, etc.