Politics is accruing and deploying political capital within an organisation - or less abstractly, building relationships and using them.
What you’re describing is a particular form of manipulative and divisive politics which is performed by insecure, desperate or selfish people.
Many engineers are not good at building relationships (the job of coding isn't optimal for it, after all), so painting the people who are good at it as narcissistic may be comforting, but it isn't correct.
If LLMs stopped improving today I'm sure you would be correct. As it is, I think it's very hard to predict what the future holds and where the advancements take us.
I don’t see a particularly good reason why LLMs wouldn’t be able to do most programming tasks, with the limitation being our ability to specify the problem sufficiently well.
I feel like we’ve been hearing this for 4 years now. The improvements to programming (IME) haven’t come from improved models, they’ve come from agents, tooling, and environment integrations.
> I feel like we’ve been hearing this for 4 years now.
I feel we were hearing very similar claims 40 years ago, about how the next version of "Fourth Generation Languages" were going to enable business people and managers to write their own software without needing pesky programmers to do it for them. They'll "just" need to learn how to specify the problem sufficiently well.
(Where "just" is used in its "I don't understand the problem well enough to know how complicated or difficult what I'm about to say next is" sense. "Just stop buying cigarettes, smoker!", "Just eat less and exercise more, fat person!", "Just get a better paying job, poor person!", "Just cheer up, depressed person!")
Both are true. Models have also improved significantly in the last year alone, let's not even talk about 4 years ago. Agents, tooling and other sugar on top are just that, enabling more efficient and creative usage, but let's not downplay how much better today's models are compared to what was available in the past.
The code that's generated when given a long leash is still crap. But damned if I didn't use a JIRA mcp and a gitlab mcp, and just have the corporate AI just "do" a couple of well defined and well scoped tickets, including interacting with JIRA to get the ticket contents, update its progress, push to gitlab, and open an MR. Then, the corporate CodeRabbit does a first pass code review against the code so any glaring errors are stomped out before a human can review it. What's more scary though is that the JIRA tickets were created by a design doc that was half AI generated in the first place. The human proposed something, the AI asked clarifying questions, then broke the project down into milestones and then tickets, and then created the epic and issues on JIRA. One of my tradie friends taking an HVAC class tells me that there are a couple of programmers in his class looking to switch careers. I don't know what the future brings, but those programmers (sorry, "software developers") may have the right idea.
Yes we get it, there is a ton of "work" being done in corporate environments, in which the slop that generative AI churns out is similar to the slop that humans churn out. Congrats.
How do you judge model improvements vs tooling improvements?
If not working at one of the big players or running your own, it appears that even the APIs these days are wrapped in layers of tooling and abstracting raw model access more than ever.
> even the APIs these days are wrapped in layers of tooling and abstracting raw model access more than ever.
No, the APIs for these models haven't really changed all that much since 2023. The de facto standard for the field is still the chat completions API that was released in early 2023. It is almost entirely model improvements, not tooling improvements that are driving things forward. Tooling improvements are basically entirely dependent on model improvements (if you were to stick GPT-4, Sonnet 3.5, or any other pre-2025 model in today's tooling, things would suck horribly).
Improved tooling/agent scaffolds, whatever, are symptoms of improved model capabilities, not the cause of better capabilities. You put a 2023-era model such as GPT-4 or even e.g. a 2024-era model such as Sonnet 3.5 in today's tooling and they would crash and burn.
The scaffolding and tooling for these models have been tried ever since GPT-3 came out in 2020 in different forms and prototypes. The only reason they're taking off in 2025 is that models are finally capable enough to use them.
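To make the "APIs haven't changed" point concrete, here is a minimal sketch of the chat-completions request shape that has been the de facto standard since early 2023. The model name and message contents are placeholders for illustration, not any vendor's actual defaults:

```python
import json

# Sketch of the standard "chat completions" request payload shape.
# Model name and messages are illustrative placeholders.
def build_chat_request(user_message: str, model: str = "some-model") -> dict:
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.7,
    }

# The payload is what gets POSTed to the completions endpoint;
# tooling layers on top still ultimately reduce to this shape.
payload = json.dumps(build_chat_request("Summarize this diff."))
```

Everything agents do (tool calls, retries, context assembly) still bottoms out in requests of roughly this shape, which is why tooling gains track model gains so closely.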
Yet when you compare the same model in 2 different agents you can easily see capability differences. But cross (same tier) model in the same agent is much less stark.
My personal opinion is that there was a threshold earlier this year where the models got basically competent enough to be used for serious programming work. But all the major on-the-ground improvements since then have come from the agents, and not all agents are equal, while all SOTA models effectively are.
> Yet when you compare the same model in 2 different agents you can easily see capability differences.
Yes definitely. But this is to be expected. Heck take the same person and put them in two different environments and they'll have very different performance!
> But cross (same tier) model in the same agent is much less stark.
Unclear what you mean by this. I do agree that the big three companies (OpenAI, Anthropic, Google DeepMind) are all more or less neck and neck in SOTA models, but every new generation has been a leap. They just keep leaping over each other.
If you compare e.g. Opus 4.1 and Opus 4.5 in the same agent harness, Opus 4.5 is way better. If you compare Gemini 3 Pro and Gemini 2.5 Pro in the same agent harness, Gemini 3 is way better. I don't do much coding or benchmarking with OpenAI's family of models, but anecdotally have heard the same thing going from GPT-5 to GPT-5.2.
The on the ground improvements have been coming primarily from model improvements, not harness improvements (the latter is unlocked by the former). Again, it's not that there were breakthroughs in agent frameworks that happened; all the ideas we're seeing now have all been tried before. Models simply weren't capable enough to actually use them. It's just that more and more (pre-tried!) frameworks are starting to make sense now. Indeed, there are certain frameworks and workflows that simply did not make sense with Q2-Q3 2025 models that now make sense with Q4 2025 models.
I actually have spent a lot of time doing comparisons between the 4.1 and 4.5 Claude models (and lately the GPT-5.1 -> 5.2 models) and for many, many tasks there is no significant improvement.
All things being equal I agree that the models are improving, but for many of the tasks I’m testing what has the most improvement is the agent. The agents choosing the appropriate model for the task for instance has been huge.
I do believe there is beneficial symbiosis, but in my results the agents provide much bigger variance than the model.
LLM capability improvement is hitting a plateau, with recent advancements mostly relying on accessing context locally (RAG) or remotely (MCP), with a lot of extra tokens (read: drinking water and energy) being spent prompting models for "reasoning". Foundation-wise, observed improvements are incremental, not exponential.
> able to do most programming tasks, with the limitation being our ability to specify the problem sufficiently well
We've spent 80 years trying to figure that out. I'm not sure why anyone would think we're going to crack this one anytime in the next few years.
> Foundation-wise, observed improvements are incremental, not exponential.
Incremental gains are fine. I suspect capability of models scales roughly as the logarithm of their training effort.
> (read: drinking water and energy)
Water is not much of a concern in most of the world. And you can cool without using water, if you need to. (And it doesn't have to be drinking water anyway.)
Yes, energy is a limiting factor. But the big sink is in training. And we are still getting more energy efficient. At least to reach any given capability level; of course in total we will be spending more and more energy to reach ever higher levels.
Incremental gains in output seem to - so far - require exponential gains in input. This is not fine.
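A toy model of that claim, assuming (as the parent comment suggests) that capability scales roughly as the logarithm of training compute. Under that pure assumption, every fixed increment of output capability requires multiplying the input:

```python
import math

# Toy assumption for illustration only: capability = K * log(compute).
K = 10.0

def capability(compute: float) -> float:
    return K * math.log(compute)

def compute_needed(target_capability: float) -> float:
    # Inverting the log: required input grows exponentially with target output.
    return math.exp(target_capability / K)

# Each +10 "points" of capability multiplies required compute by e (~2.72x),
# so linear capability targets imply exponential spend.
ratio = compute_needed(30.0) / compute_needed(20.0)
```

Whether that trade is "fine" then depends entirely on whether the cost per unit of compute falls faster than the required compute grows.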
Water is a concern in huge parts of the World, as is energy consumption.
And if the big sink is “just” in training, why is there so much money being thrown at inference capacity?
I thought it was mad when I read that Bitcoin uses more energy than the country of Austria, but knowing AI inference using more energy than all the homes in the USA is so, so, so much worse given the quality of the outputs are so mediocre.
It seems to me that MCP and Skills are solving 2 different problems and provide solutions that complement each other quite nicely.
MCP is about integration of external systems and services. Skills are about context management - providing context on demand.
As Simon mentions, one issue with MCP is token use. Skills seem like a straightforward way to manage that problem: just put the MCP tools list inside a skill where they use no tokens until required.
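One way to picture that idea. This is a hypothetical sketch, not the actual Skills mechanism, and all names and structures here are invented: keep the MCP tool schemas out of the base context entirely, and only splice a server's tool list in when the relevant skill is triggered:

```python
# Hypothetical sketch of deferring MCP tool definitions until a skill fires.
# Server names and schema fields are invented for illustration.
MCP_TOOL_SCHEMAS = {
    "jira": [
        {"name": "jira_get_issue", "description": "Fetch a ticket", "parameters": {}},
    ],
    "gitlab": [
        {"name": "gitlab_open_mr", "description": "Open a merge request", "parameters": {}},
    ],
}

def base_prompt_tools() -> list:
    # The default context carries no tool schemas at all: zero tokens spent.
    return []

def tools_for_skill(skill: str) -> list:
    # Only when a skill is invoked do its tool definitions enter the context.
    return MCP_TOOL_SCHEMAS.get(skill, [])
```

The token cost of the tool list then scales with what the task actually touches, rather than with how many MCP servers happen to be configured.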
Just consider what it fundamentally is: a company at the leading edge of a product category that has found absurdly strong technology/use-case fit, and is growing insanely fast.
Looking for a moat in the technology is always a bit of a trap - it’s in the traction, the brand awareness, the user data etc.
> Looking for a moat in the technology is always a bit of a trap - it’s in the traction, the brand awareness, the user data etc.
Traction, brand awareness, and user data do not favor Windsurf over GitHub Copilot. The few of us who follow all the new developments are aware that Windsurf has been roughly leading the pack in terms of capabilities, but do not underestimate the power of being bundled into both VS Code and GitHub by default. Everyone else is an upstart by comparison and needs some form of edge to make up for it, and without a moat it will be very hard for them to maintain their edge long enough to beat GitHub's dominance.
Definitely take that point. But this valuation is perhaps more about how much that traction, brand and data is worth to OpenAI, who cannot buy Copilot. $3bn doesn’t seem so disproportionate in that context especially given the amount of money being attracted to the space.
Define losing? My company pays for Copilot but not for Cursor, and it's not at all clear to me that we're the exception rather than the norm. What numbers and data are you working with?
That's not actually how unseating an incumbent works. The incumbent can adapt to the threat for quite a while if they act on it, they just have to not be Blockbuster. Copilot is showing every sign of making up ground feature-wise, which is bad news for the runners up.
Incumbent advantage of being in VS Code already? Thing is, Cursor is basically just VS Code, there's hardly any barrier to switching, so it's quite a weak advantage.
In brand velocity maybe, but Copilot is rapidly reaching feature parity with Cursor and will inevitably overtake it, while costing users less.
Same with Google vs OpenAI. I tend to agree with the sentiment that I most frequently hear which is that OpenAI is the currently popular brand, but that can only carry them so far against what will eventually be a better offering for cheaper.
Yeah it's very interesting... It appears to lead itself astray: the way it looks at several situational characteristics, gives each a "throw-away" example, only to then mushing all those examples together to make a joke seems to be it's downfall in this particular case.
Also I can't help but think that if it had written out a few example jokes about animals rather than simply "thinking" about jokes, it might have come up with something better
It’s about demand isn’t it? TSMC have red hot demand, it’s not hard to understand their urgency in setting up new fabs, wherever they may be. Intel don’t have the same incentive - their incentive is to take the money (because, why wouldn’t you), build newer fabs and hope for some breakthrough in demand. The urgency is not there: being complete before there is demand could be detrimental
Yes. There used to be a saying: the most expensive fab (or factory) isn't the most advanced fab, but an empty fab.
You can't build without first ensuring you can fill it, and you can't fill it without first ensuring you can deliver. And Intel has failed to deliver twice with their custom foundry, both times with Nokia and Ericsson. How the two fell for it twice is completely beyond me, but then Intel are known to have very good sales teams.
Intel will need another Apple moment: huge demand, little margin, but a customer willing to pay up front, on the assumption that Intel is even price competitive. The Apple modem may be it. But given the current situation, with Intel wanting to lower capital spending, I'm not even sure betting on Intel is a risk Apple is willing to take, compared to a stable, consistent relationship with TSMC.
Then Intel is going to have to wait for a very long time. At best, China is currently in a scenario similar to Japan's lost decade of 30 years or US's Great Depression. At worst, China's current deflation + massive debt seems eerily similar to Weimar Germany's early internal devaluation. China is pretty fucked.
The US fully recovered from the Great Depression by 1939, two years before entering WW2. Weimar Germany began in 1918 and ended in 1933 with the start of Nazi Germany, 15 years later.
You can't start a war when you are truly broke, much like China is today. And China is aging super fast, unlike Germany or US during the 30s.
Being in spiraling deflation while the rest of the world suffers from inflation is a big sign of being broke.
Having debt to GDP ratio of 310% and local governments being unable to pay out salaries for many months is a big sign of being broke. (google or chatgpt the salary news, they are everywhere)
Consumer spending dropping 20% y/y in November in Beijing and Shanghai is a sign of being broke.
52,000 EV-related companies shutting down in China in 2023 (an increase of 90% over the year before), where most EV companies were targets of government subsidies, is a sign of being broke.
30% drop in revenues from land sales in 2024, which the local government derive most of its revenue on, is a sign of being broke.
China is not self-sufficient; it imports 80% of consumed soybeans and other food products, and 90% of semiconductor equipment. Nor is it even remotely at the same level as Japan when Japan entered the lost decades. 600M Chinese citizens earned less than $100/month as of 2020. Recently, a scholar reported 900M Chinese citizens earned less than $400/month.
> Being in spiraling deflation while the rest of the world suffers from inflation is a big sign of being broke.
How would you handle the eloquent counterargument that spiraling deflation is not a sign of being broke? Deflation doesn't, in and of itself, signal anything except that the real value of a currency is going up.
China is one of the world's largest creditors [0]. They may have a lot of organisational problems - I'd go as far as saying they are guaranteed to, given they are quite authoritarian. But they aren't broke.
None of those metrics signal problems in and of themselves, and when put together ... they still don't. The consumer spending drop is the closest to something that might be a problem but it needs some supporting data to make a case.
Deflation by itself, sure. But deflation coupled with a huge and increasing debt to service is a crippling problem. It means your ability to pay off your debt gets harder and harder as time goes on, and most of your income goes to servicing debt principal and interest, not to actual income growth. China plans a record $411 billion special treasury bond issuance next year, for example, but most if not all of that is just helping local governments pay off debts.
China being the largest creditor doesn't mean much when a lot of their debt is issued to belt and road countries that can never pay it back, and it will be written off in the future. It does have large US debt holdings, but those have shrunk from 1.27T (2013) to 772B (2024), with a large part of that being used for cross-border transactions.
> Deflation when coupled with huge and increasing debt to service, then you have a crippling problem.
Individuals have a problem. Corporations have a problem. China may or may not have a problem. It depends on how reasonable their bankruptcy laws are. Cleaning out the system of people who aren't using capital effectively is a healthy thing to do.
And I have to say, this idea that we should focus on China's debts and dismiss their credits is suspect. I mean sure, if we ignore all the assets and income streams then they do have a problem. But that isn't reasonable. You can't ignore the strengths to make an argument they are weak.
Let me put it in another way; it's similar to the US banks during 2008, when they appeared to be healthy, holding lots of subprime loans on their books.
If we are talking about China's credit, China has a lot of subprime loans to belt and road countries that have very little income, and lot of subprime loans to their citizens, which recently a scholar reported that 900M of them make less than $400/month.
Possibly. But if the US system was a wealth-producing engine like China's has been in recent history 2008 wouldn't have been all that big a deal. They'd have bounced back in a year or two. Instead in 2008 the US made decisive moves to preserve a system that isn't generating much wealth for the US, and over the course of around 20 years they've arguably managed to give up their position as #1 global economy and are packing stadiums full of people chanting "We love Trump. We love Trump". Looks to me like it is going down in history as a major turning point for the worse.
If China has to take decisive steps to preserve whatever craziness is going on in the mainland, they're going to be preserving a system that has at least 10x-ed their wealth over the last 30 years while producing vast amounts of real capital that has catapulted their living standards up to a much more reasonable standard.
I wouldn't necessarily gamble on China because the system doing well looks unstable and could veer to disaster at any moment the central bureaucracy does something stupid. But we don't have strong evidence of a problem yet. We've got strong evidence they aren't acting like the US, but the US hasn't been setting an inspiring example in decades. As with a lot of economic problems, most of the damage from 2007 was doubling down on failing strategy rather than taking the hint that something needed to change.
And I'm not seeing evidence here that China is broke. They might muck this up, always an option, but they have all the tools they need to succeed in principle.
Tiresome take that's been repeated time and time again. China has problems like any other country larger than Luxembourg. But the conclusion that "china is fucked" sounds more like a wish than anything else to my ears. The Chinese economy is growing ~5% per year. It's got one of the world's most well-educated workforces. It's manufacturing everything from basics to high tech, and very little indicates that's about to change anytime soon.
The chip technology sanctions might slow development in that area in China, but I wouldn't count on it.
It's pretty tiring responding to folks who just parrot the Chinese government's official 5% numbers and never bothered to look into the actual details. Like its well-educated workforce being laid off at age 35, and 80% of recent graduates being unemployed or driving Didi or delivering food. Or China's low-end manufacturing shutting down or moving to Southeast Asia, and its high-end manufacturing being tariffed/sanctioned.
> On the assumption that Intel is even price competitive. The Apple modem may be it.
Which is super interesting/ironic with the entire reason for an “apple modem” is due to Intels failure there a decade ago. Bonus irony for the subsequent acquisition.
Intel wasn't able to ship a modem competitive with Qualcomm's, and the whole point of the acquisition was to get rid of Qualcomm. Even Apple hasn't gotten a shipping version of a 5G modem in the six years since the first Intel modem effort started in 2018. This was really about vertically integrating the modem into all of the relevant Apple Silicon devices, and it keeps going on...
No, you have to read more of the thread to understand why I asked it.
> TSMC have red hot demand, it’s not hard to understand their urgency in setting up new fabs, wherever they may be. Intel don’t have the same incentive (...)
There was some discussion awhile back about Intel potentially fabbing ARM chips (or any other custom non-x86 chip) as a viable business in the future. I don’t know how serious they were but it sounded plausible when you think about how important it is to have an American leading edge fab, independent of the market future of the x86 ISA.
Basically what would it take for Intel to still have Apple as a customer even if Apple made their own ARM designs…
They feed into each other especially for anything that isn't a vanilla gate. Got a deeply ported SRAM with bypasses? That might fail synthesis if it is too choked by wire rules for the size of the cells so now it's banking time.
I think realistically you wouldn't port the exact same design between manufacturers. That would be a waste of money unless one manufacturer is really rinsing you.
More likely you'd switch manufacturers when you planned to switch process nodes anyway, in which case the increase in workload probably wouldn't be too bad.
There’ll always be an advantage for those who understand the problem they’re solving for sure.
The balance of traditional software components and LLM driven components in a system is an interesting topic - I wonder how the capabilities of future generations of foundation model will change that?
I'm certain the end state is "one model to rule them all", hence the "transitional."
Just that the pragmatic approach, today, given current LLM capabilities, is to minimize the surface area / state space that the LLM is actuating. And then gradually expand that until the whole system is just a passthrough. But starting with a passthrough kinda doesn't lead to great products in December 2024.
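A crude sketch of "minimizing the surface area the LLM is actuating" (all names here are invented for illustration): rather than letting the model emit arbitrary commands, validate its proposal against a small whitelist that traditional code enforces, and widen the set as capabilities improve:

```python
# Hypothetical sketch: the LLM only "actuates" a small, explicitly
# whitelisted set of actions; everything else falls back to deterministic code.
ALLOWED_ACTIONS = {"summarize", "label", "route"}  # expand gradually over time

def dispatch(llm_proposed_action: str) -> str:
    action = llm_proposed_action.strip().lower()
    if action not in ALLOWED_ACTIONS:
        return "rejected"  # fall back to traditional code / human review
    return f"executed:{action}"
```

Growing `ALLOWED_ACTIONS` over successive releases is the "gradual expansion" described above; the passthrough end state is the degenerate case where nothing is rejected.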
Development, like construction, is horribly ineffective at accurate projections.
So if it's "your turn" to have the (singular) long lived feature branch, you will run over your deadline and into someone else's start date. There will be someone you're blocking who starts pressuring you to finish or for the org to make an exception and allow 2 feature branches.
And six months later, what was good for the goose will be good for the gander and then you have 2 feature branches part of the official process. Which of course means you'll occasionally have 3. After that it's like Texas Highways. They just keep getting more lanes and traffic gets worse.
If you follow the author's advice to its logical conclusion, then all changes to the code base are narrowly focused tweaks - where does the longer-term thinking come into this?
If I’m implementing a new feature, should I also disregard the need for refactoring?
A more nuanced approach is needed. You need to learn when to make changes additively and when to reshape the code to fit your new use case (and how much reshaping is required).
As an aside: I think tech debt sprints (if needed regularly) are often a sign that you aren’t developing software sustainably day to day.
I disagree, if you realize: "we need to refactor this file/module/class/etc." then it becomes a new task. Or more general, "we need to refactor the organically grown architecture": it's a new task. Working on a task should not prohibit your thinking about additional tasks, just add them to the backlog.