(Hi, Tom!) Reread the article and look for “CPU”. The whole article is about doing deep learning on CPUs not GPUs. Moonshine, the open source project and startup he talks about, shows speech recognition and realtime translation on the device rather than on a server. My understanding is that doing The Math in parallel is itself a performance hack, but Doing Less Math is also a performance hack.
I hate its acknowledgement of its personality prompt. Try having a series of back-and-forth exchanges where each response is like "got it, keeping it short and professional. Yes, there are only seven deadly sins." You get more prompt performance than answer.
I like the term prompt performance; I am definitely going to use it:
> prompt performance (n.)
> the behaviour of a language model in which it conspicuously showcases or exaggerates how well it is following a given instruction or persona, drawing attention to its own effort rather than simply producing the requested output.
It's like writing an essay for a standardized test, as opposed to one for a college course or for a general audience. When taking a test, you only care about the evaluation of a single grader hurrying to get through a pile of essays, so you should usually attempt to structure your essay to match the format of the scoring rubric. Doing this on an essay for a general audience would make it boring, and doing it in your college course might annoy your professor. Hopefully instruction-following evaluations don't look too much like test grading, but this kind of behavior would make some sense if they do.
Pay people $1 an hour and ask them to choose which of A or B is more short and professional:
A) Keeping it short and professional. Yes, there are only seven deadly sins
B) Yes, there are only seven deadly sins
Also, have all the workers know they are being evaluated against each other, and that if they diverge from the majority choice their reliability score may go down and they may get fired. You end up with some evaluations answered as a Keynesian beauty contest / Family Feud "survey says" style guess instead of their true evaluation.
Tl;dr: The ThoughtWorks founder is spending his millions portraying Chinese government policies, including those on Xinjiang/Uighurs, in a positive light. His spending is heavily laundered, but he's now based in China and working in the same offices as a propaganda company.
Calendar was brilliant. I think it was the first time I fully appreciated the misery of the human mind in the face of various orbit periods that aren't simple integer ratios of one another. https://www.bbc.co.uk/programmes/p00548m9
Politeness. Social barriers were coming down, you were interacting with people of different rank, how do you not get into a swordfight? Also, the letter from the wife complaining about her husband! https://www.bbc.co.uk/programmes/p004y29m
I think they did all the big interesting things in history and then struggled with a lot of minor events that were hard to find interesting angles on.
Are you two talking at cross-purposes because you don't have a shared understanding of control and data flow?
The pieces here are:
* Claude Code, a Node (Javascript) application that talks to MCP server(s) and the Claude API
* The MCP server, which exposes some tools over stdio or HTTP
* The Claude API, which is more structured than "text in, text out".
* The Claude LLM behind the API, which generates a response to a given prompt
Claude Code is a Node application. CC is configured in JSON with a list of MCP servers. When CC starts up, its Javascript initialises each server and, as part of that, gets a list of callable functions.
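For concreteness, that JSON config is a map of server names to launch commands, something like the sketch below (the file name and exact keys vary by version, and the server shown is a made-up placeholder):

```json
{
  "mcpServers": {
    "example-server": {
      "command": "node",
      "args": ["path/to/mcp-server.js"],
      "env": { "EXAMPLE_API_KEY": "..." }
    }
  }
}
```

Each entry tells CC how to spawn (or connect to) a server; the list of callable functions comes from asking each server, not from the config itself.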
When CC calls the LLM API with a user's request, it's not just "here are the user's words, do it". There are multiple slots in the request object, one of which is a "tools" block, a list of the tools that can be called. Inside the API, I imagine this is packaged into a prefix context string like "you have access to the following tools: tool(args) ...". The LLM API probably has a bunch of prompts it runs through (figure out what type of request the user has made, maybe using different prompts to make different types of plan, etc.) and somewhere along the way the LLM might respond with a request to call a tool.
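As a sketch of that request shape, using the Anthropic TypeScript SDK (the model id and the tool definition are placeholders I've made up, not anything from this thread):

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// One of the "slots" in the request object is a tools block:
// each tool has a name, a description, and a JSON Schema for its arguments.
const response = await client.messages.create({
  model: "claude-sonnet-4-20250514", // placeholder model id
  max_tokens: 1024,
  tools: [
    {
      name: "get_car_make_model", // hypothetical tool
      description: "Look up the make and model of a car by VIN",
      input_schema: {
        type: "object",
        properties: { vin: { type: "string" } },
        required: ["vin"],
      },
    },
  ],
  messages: [{ role: "user", content: "What car is VIN 123 registered to?" }],
});
```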
The LLM API call then returns the tool call request to CC, in a structured "tool_use" block separate from the freetext "hey good news, you asked a question and got this response". The structured block means "the LLM wants to call this tool."
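Roughly, that structured block in the API response looks like this (values are illustrative):

```json
{
  "stop_reason": "tool_use",
  "content": [
    { "type": "text", "text": "I'll look that up." },
    {
      "type": "tool_use",
      "id": "toolu_01A...",
      "name": "get_car_make_model",
      "input": { "vin": "123" }
    }
  ]
}
```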
CC's JS then calls the server with the tool request and gets the response. It validates the response (e.g., JSON schemas) and then calls the LLM API again bundling up the success/failure of the tool call into a structured "tool_result" block. If it validated and was successful, the LLM gets to see the MCP server's response. If it failed to validate, the LLM gets to see that it failed and what the error message was (so the LLM can try again in a different way).
The idea is that if a tool call is supposed to return a CarMakeModel string ("Toyota Tercel") and instead returns an int (42), JSON Schemas can catch this. The client validates the server's response against the schema, and calls the LLM API with either the validated result or an error describing the failure.
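A minimal sketch of that client-side check, using the Ajv JSON Schema validator (the schema, tool result, and tool_use id here are the hypothetical CarMakeModel example, not anything Claude Code actually ships):

```typescript
import Ajv from "ajv";

const ajv = new Ajv();

// Hypothetical schema the MCP server advertised for this tool's result.
const carMakeModelSchema = {
  type: "object",
  properties: { carMakeModel: { type: "string" } },
  required: ["carMakeModel"],
};
const validate = ajv.compile(carMakeModelSchema);

// Pretend the server returned an int instead of a string.
const serverResult = { carMakeModel: 42 };

// Deterministic code decides what the LLM gets to see next.
const toolResultBlock = validate(serverResult)
  ? {
      type: "tool_result",
      tool_use_id: "toolu_01A...",
      content: JSON.stringify(serverResult),
    }
  : {
      type: "tool_result",
      tool_use_id: "toolu_01A...",
      is_error: true,
      content: ajv.errorsText(validate.errors), // e.g. "data/carMakeModel must be string"
    };
```

Either way, what goes back to the LLM API is a tool_result block; the only question is whether it carries the data or an error.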
So the LLM isn't choosing to call the validator; it's the deterministic Javascript that is Claude Code that chooses to call the validator.
There are plenty of ways for this to go wrong: the client (Claude Code) has to validate; int vs string isn't the same as "is a valid timestamp/CarMakeModel/etc"; if you helpfully put the thing that failed into the error message ("Expect string, got integer (42)") then the LLM gets 42 and might choose to interpret that as a CarMakeModel if it's having a particularly bad day; the LLM might say "well, that didn't work, but let's assume the answer was Toyota Tercel, a common car make and model", ... We're reaching here, yet these are possible.
But the basic flow has validation done in deterministic code and hides the MCP server's invalid responses from the LLM. The LLM can't choose not to validate. You seemed to be saying that the LLM could choose not to validate, and your interlocutor was saying that was not the case.
>Are you two talking at cross-purposes because you don't have a shared understanding of control and data flow?
No, they're literally just skipping an entire step in how LLMs actually "use" MCP.
MCP is just a standard, largely for humans. LLMs do not give a singular fuck about it. Some might be fine tuned for it to decrease erroneous output, but at the end of the day it's just system prompts.
And respectfully, your example misunderstands what is going on:
>* The Claude API, which is more structured than "text in, text out".
>* The Claude LLM behind the API, which generates a response to a given prompt
No. That's not what "this" is. LLMs use MCP to discover tools they can call, aka function/tool calling. MCP is just an agreed-upon format; it doesn't do anything magical, it's just a way of aligning the structure across companies, teams, and people.
There is not an "LLM behind the API"; while a specific tool might implement its overall feature set using LLMs, that's totally irrelevant to what's being discussed and to the principal point of contention.
Which is this: an LLM interacting with other tools via MCP still needs system prompts or fine tuning to do so. Both of those things are not predictable or deterministic. They will fail at some point in the future. That is indisputable. It is a matter of statistical certainty.
It's not up for debate. And an agreed upon standard between humans that ultimately just acts as convention is not going to change that.
It is GRAVELY concerning that so many people are trying to use technical jargon they are clearly ill-equipped to use. The magic rules all.
> No, they're literally just skipping an entire step in how LLMs actually "use" MCP.
No, you are literally misunderstanding the entire control flow of how an LLM toolchain uses both the model and any external tools (whether specified via MCP or not, but the focus of the conversation is MCP).
> MCP is just a standard, largely for humans.
The standard is for humans implementing both tools and the toolchains that call them.
> LLMs do not give a singular fuck about it.
Correct. LLM toolchains, which (if they can connect to tools via MCP) are also MCP clients, care about it. LLMs don't care about it because the toolchain is the thing that actually calls both the LLM and the tools. And that's true whether the toolchain is a desktop frontend with a local, in-process llama.cpp backend for running the LLM, or the Claude Desktop app with a remote connection to the Anthropic API for calling the LLM, or whatever.
> Some might be fine tuned for it to decrease erroneous output,
No, they aren't. Most models that are used to call tools now are specially trained for tool calling, with a well-defined format for requesting tool calls from the toolchain and receiving results back from it (though this isn't necessary for tool calling to work: people were using the ReAct pattern in toolchains to do it with regular chat models, without any training or prespecified prompt/response format for tool calls, just by having the toolchain inject tool-related instructions into the prompt and read LLM responses to see if the model was asking for tool calls). None of the models that exist now are fine tuned for MCP, nor do they need to be, because they literally never see it. The toolchain reads LLM responses, identifies tool call requests, takes any that map to tools defined via MCP, and routes them down the channel (HTTP or subprocess stdio) specified by the MCP config; it does the reverse with responses from the MCP server, validating them and then mapping them into a prompt template that specifies where tool responses go and how they are formatted. It does the same thing (minus the MCP parts) for tools that aren't specified by MCP (frontends might have their own built-in tools, or other mechanisms for custom tools that predate MCP support). The LLM doesn't see any difference between MCP tools, other tools, or a human reading the message with the tool request and manually writing a response that goes directly back.
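To make that control flow concrete, here's a stripped-down sketch of the loop a toolchain runs; every name in it is invented for illustration, and real toolchains do a lot more:

```typescript
// Hypothetical toolchain loop. The point: the deterministic program decides
// when to call the model, when to call a tool, and when to validate;
// the model only ever sees prompts and prompt-formatted tool results.
type ToolCallRequest = { name: string; args: unknown };

interface Toolchain {
  buildPrompt(userPrompt: string, toolSpecs: unknown[]): string;
  callModel(prompt: string): Promise<string>;              // LLM API call
  parseToolCall(llmText: string): ToolCallRequest | null;  // find a tool request in the model's output
  listTools(): Promise<unknown[]>;                         // MCP tools/list (the model never sees this)
  callTool(req: ToolCallRequest): Promise<unknown>;        // MCP tools/call over stdio or HTTP
  validate(toolName: string, result: unknown): { ok: boolean; error?: string };
  appendToolResult(prompt: string, req: ToolCallRequest, outcome: unknown): string;
}

async function agentLoop(tc: Toolchain, userPrompt: string): Promise<string> {
  let prompt = tc.buildPrompt(userPrompt, await tc.listTools());
  while (true) {
    const llmText = await tc.callModel(prompt);
    const toolCall = tc.parseToolCall(llmText);
    if (!toolCall) return llmText;                    // no tool request: that's the answer
    const raw = await tc.callTool(toolCall);          // the toolchain, not the model, makes the call
    const checked = tc.validate(toolCall.name, raw);  // deterministic validation happens here
    prompt = tc.appendToolResult(
      prompt,
      toolCall,
      checked.ok ? raw : { error: checked.error }     // the model only sees the post-validation view
    );
  }
}
```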
> LLMs use MCP to discover tools they can call,
No, they don't. LLM frontends, which are traditional deterministic programs, use MCP to do that, and to find schemas for what should be sent to and expected from the tools. LLMs don’t see the MCP specs, and get information from the toolchain in prompts in formats that are model-specific and unrelated to MCP that tell them what tools they can request calls be made to and what they can expect back.
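Concretely, the discovery step the frontend does is MCP's `tools/list` request; the JSON-RPC response looks roughly like this (the tool and its schema are illustrative), and the model never sees it:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {
        "name": "get_car_make_model",
        "description": "Look up the make and model of a car by VIN",
        "inputSchema": {
          "type": "object",
          "properties": { "vin": { "type": "string" } },
          "required": ["vin"]
        }
      }
    ]
  }
}
```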
> an LLM interacting with other tools via MCP still needs system prompts or fine tuning to do so. Both of those things are not predictable or deterministic. They will fail at some point in the future. That is indisputable.
That's not, contrary to your description, a point of contention.
The point of contention is the claim that the validation of data returned by an MCP server against the schema provided by the server is not predictable or deterministic. Confusing these two issues can only happen if you think the model does something with each response that controls whether or not the toolchain validates it, which is impossible, because the toolchain does whatever validation it is programmed to do before the model sees the data. The model has no way to know there is a response until that happens.
Now, can the model make requests that don't fit the toolchain's expectations due to unpredictable model behavior? Sure. Can the model do dumb things with the post-validation response data, after the toolchain has validated it, mapped it into the model's prompt template, and called the model with that prompt, for the same reason? Abso-fucking-lutely.
Can the model do anything to tell the toolchain not to validate response data for a tool call the toolchain decided to make on behalf of the model, if the toolchain is programmed to validate the response data against the schema provided by the tool server? No, it can't. It can't even know that the tool was provided by an MCP server and that that might be an issue, nor can it know that the toolchain made the request, nor can it know that the toolchain received a response until the toolchain has done what it is programmed to do with the response, through the point of populating the prompt template and calling the model with the resulting prompt, by which point any validation it was programmed to do has been done and is an immutable part of history.
>No, they don't. LLM frontends, which are traditional deterministic programs, use MCP to do that, and to find schemas for what should be sent to and expected from the tools.
You are REALLY, REALLY misunderstanding how this works. Like severely.
You think MCP is being used for some other purpose despite the one it was explicitly designed for... which is just weird and silly.
>Confusing these two issues can only happen if you think the model does something with each response that controls whether or not the toolchain validates it
No, you're still just arguing against something no one is arguing, for the sake of pretending that MCP does something it literally cannot do, or can fundamentally fix something about how LLMs operate.
I promise you if you read this a month from now with a fresh pair of eyes you will see your mistake.
What do you think the `tools/call` MCP flow is between the LLM and an MCP server? For example, if I had the GitHub MCP server configured on Claude Code and prompted "Show me the most recent pull requests on the torvalds/linux repository".
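For what it's worth, my understanding of that flow: the model emits a structured tool request, and Claude Code (acting as the MCP client) turns it into a JSON-RPC `tools/call` roughly like the sketch below; the tool name and argument names are guesses at what the GitHub server exposes, not verified:

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "list_pull_requests",
    "arguments": { "owner": "torvalds", "repo": "linux" }
  }
}
```

The server replies with a result.content array (typically text blocks), which the client can validate and then hand back to the model as a tool result.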
Hmm, I'm not sure if everyone, including me, is simply unable to understand what you are saying, but if the MCP client validates the MCP server's response against the schema before passing the response to the LLM, the model doesn't even matter: your MCP client could choose to report an error and interrupt the agentic flow.
That will depend on what MCP client you are using and how they've handled it.
The newsletter this is from is full of very clear writing about SQL, practically applying theory without getting lost in a tangle of database theory jargon. If you need to read or write SQL then I think you’ll find it as interesting as I have.
From the excellent "Why Nations Fail" by Daron Acemoglu and James A. Robinson:
> An example of what could happen if you took your job too seriously, rather than successfully second-guessing what the Communist Party wanted, is provided by the Soviet census of 1937. As the returns came in, it became clear that they would show a population of about 162 million, far less than the 180 million Stalin had anticipated and indeed below the figure of 168 million that Stalin himself announced in 1934. The 1937 census was the first conducted since 1926, and therefore the first one that followed the mass famines and purges of the early 1930s. The accurate population numbers reflected this. Stalin's response was to have those who organized the census arrested and sent to Siberia or shot. He ordered another census, which took place in 1939. This time the organizers got it right; they found that the population was actually 171 million.
The jobs data comes from surveys of businesses and consumers. Fewer in each category are responding, continuing a long-term trend of declining response rates. Cuts affect their ability to collect data, with about 15% of the sample "suspended" -- i.e. not collected -- "to align survey workload with resource levels", in the words of the announcement linked from the Bloomberg article.
> "The more data you’re missing and comes in later, the higher the odds the revisions will be much larger," said Omair Sharif, president of Inflation Insights LLC. "Fifty percent is just not enough."