
I appreciate the goal of demystifying agents by writing one yourself, but for me the key part is still a little obscured by using OpenAI APIs in the examples. A lot of the magic has to do with tool calls, which the API helpfully wraps for you, with a format for defining tools and parsed responses telling you which tools the model wants to call.
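
For concreteness, the kind of wrapping I mean looks roughly like this (sketched from memory with a made-up get_weather tool, so treat the details as approximate):

    from openai import OpenAI

    client = OpenAI()
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",                      # made-up example tool
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
        tools=tools,
    )
    # the API hands back a parsed-out tool call rather than raw text
    call = response.choices[0].message.tool_calls[0]
    print(call.function.name, call.function.arguments)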

I'm kind of missing the bridge between that and the fundamental fact that everything is tokens in and tokens out.

Is it fair to say that the tool abstraction the library provides you is essentially some niceties around a prompt like: "Defined below are certain 'tools' you can use to gather data or perform actions. If you want to use one, please return the tool call you want and its arguments, delimited before and after with '###', and stop. I will invoke the tool call and then reply with the output delimited by '==='".

Basically, telling the model how to use tools, earlier in the context window. I already don't totally understand how a model knows when to stop generating tokens, but presumably those instructions will get it to output the request for a tool call in a certain way and stop. Then the agent harness knows to look for those delimiters, extract the tool call to execute, and append the tool's output to the context so the LLM keeps going.
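
In other words, I'm imagining the harness loop looks something like this (a made-up sketch; the delimiters, run_tool dispatcher, and JSON format are all invented):

    import json

    def run_tool(name, args):                     # stub dispatcher, just for the sketch
        return f"(pretend output of {name} with {args})"

    def run_agent(llm, messages):                 # `llm` is any completion function
        while True:
            text = llm(messages)                  # generate until the model stops
            if "###" not in text:
                return text                       # no tool call requested, we're done
            payload = text.split("###")[1]        # pull out the delimited tool call
            call = json.loads(payload)            # e.g. {"tool": "search", "args": {...}}
            result = run_tool(call["tool"], call["args"])
            messages.append({"role": "assistant", "content": text})
            messages.append({"role": "user", "content": "===" + result + "==="})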

Is that basically it? Or is there more magic there? Are the tool call instructions in some sort of permanent context, or could the interaction be demonstrated in a fine-tuning step, so the model infers it and it lives just in its weights?



Yeah, that's basically it. Many models these days are specifically trained for tool calling though so the system prompt doesn't need to spend much effort reminding them how to do it.

You can see the prompts that make this work for gpt-oss in the chat template in their Hugging Face repo: https://huggingface.co/openai/gpt-oss-120b/blob/main/chat_te... - including this bit:

    {%- macro render_tool_namespace(namespace_name, tools) -%}
        {{- "## " + namespace_name + "\n\n" }}
        {{- "namespace " + namespace_name + " {\n\n" }}
        {%- for tool in tools %}
            {%- set tool = tool.function %}
            {{- "// " + tool.description + "\n" }}
            {{- "type "+ tool.name + " = " }}
            {%- if tool.parameters and tool.parameters.properties %}
                {{- "(_: {\n" }}
                {%- for param_name, param_spec in tool.parameters.properties.items() %}
                    {%- if param_spec.description %}
                        {{- "// " + param_spec.description + "\n" }}
                    {%- endif %}
                    {{- param_name }}
    ...
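
You can render that template yourself to see exactly what prompt text the model receives - something like this should work with a recent transformers release (the get_weather tool here is made up):

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("openai/gpt-oss-120b")
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",                      # made-up example tool
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    }]
    prompt = tok.apply_chat_template(
        [{"role": "user", "content": "What's the weather in Oslo?"}],
        tools=tools,
        add_generation_prompt=True,
        tokenize=False,          # return the rendered string, not token IDs
    )
    print(prompt)                # the tools appear as a typed namespace, as above
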
As for how LLMs know when to stop... they have special tokens for that. "eos_token_id" stands for End of Sequence - here's the gpt-oss config for that: https://huggingface.co/openai/gpt-oss-120b/blob/main/generat...

    {
      "bos_token_id": 199998,
      "do_sample": true,
      "eos_token_id": [
        200002,
        199999,
        200012
      ],
      "pad_token_id": 199999,
      "transformers_version": "4.55.0.dev0"
    }
The model is trained to output one of those three tokens when it's "done".

https://cookbook.openai.com/articles/openai-harmony#special-... defines some of those tokens:

200002 = <|return|> - you should stop inference

200012 = <|call|> - "Indicates the model wants to call a tool."

I think that 199999 is a legacy EOS token ID that's included for backwards compatibility? Not sure.
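
If you were writing the harness loop by hand, those tokens get used something like this (a simplified sketch - greedy decoding, batch size 1, token IDs taken from the config above):

    import torch

    RETURN_ID, LEGACY_EOS_ID, CALL_ID = 200002, 199999, 200012
    STOP_IDS = {RETURN_ID, LEGACY_EOS_ID, CALL_ID}

    def generate_until_stop(model, input_ids, max_new_tokens=512):
        for _ in range(max_new_tokens):
            logits = model(input_ids).logits[:, -1, :]             # next-token logits
            next_id = torch.argmax(logits, dim=-1, keepdim=True)   # greedy for simplicity
            input_ids = torch.cat([input_ids, next_id], dim=-1)
            if next_id.item() in STOP_IDS:
                # which stop token it emitted tells the harness what to do next:
                # <|call|>   -> parse the text so far as a tool call and run it,
                # <|return|> -> the answer is finished, hand it back to the user.
                return input_ids, next_id.item() == CALL_ID
        return input_ids, False                                    # hit the length cap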


Thank you Simon! This information is invaluable for understanding the tool-calling machinery underneath language models. I'm glad we have gpt-oss as a clear example of how a model understands and performs tool calls.


I think that it's basically fair and I often write simple agents using exactly the technique that you describe. I typically provide a TypeScript interface for the available tools and just ask the model to respond with a JSON block and it works fine.
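
Roughly along these lines (all the names here are made up, just to show the shape of it):

    import json

    TOOLS_TS = """
    interface SearchWeb { tool: "search_web"; query: string; }
    interface ReadFile  { tool: "read_file";  path: string;  }
    type ToolCall = SearchWeb | ReadFile;
    """

    SYSTEM_PROMPT = (
        "You have tools available, described by this TypeScript type:\n"
        + TOOLS_TS
        + "\nTo call a tool, reply with only a JSON object matching ToolCall."
    )

    def parse_tool_call(reply: str):
        try:
            return json.loads(reply)      # e.g. {"tool": "search_web", "query": "..."}
        except json.JSONDecodeError:
            return None                   # not a tool call; treat it as a final answer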

That said, it is worth understanding that the current generation of models is extensively RL-trained on how to make tool calls... so they may in fact be better at issuing tool calls in the specific format that their training has focused on (using specific internal tokens to demarcate when a tool call begins/ends, etc). Intuitively, there's probably a lot of transfer learning between that format and any ad-hoc format you might request inline in your prompt.

There may be recent literature quantifying the performance gap here. And certainly if you're doing anything performance-sensitive you will want to characterize this for your use case, with benchmarks. But conceptually, I think your model is spot on.


The "magic" is done via the JSON schemas that are passed in along with the definition of the tool.

Structured Output APIs (inc. the Tool API) take the schema and build a Context-free Grammar, which is then used during generation to mask which tokens can be output.
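
Conceptually, the masking step looks like this (a toy sketch - allowed_ids would come from a real grammar engine tracking which tokens are legal next, which is the part those links below explain):

    import torch

    def constrained_next_token(logits, allowed_ids):
        # mask out every token the grammar doesn't allow at this position,
        # then pick only among the legal ones (greedy here for simplicity)
        mask = torch.full_like(logits, float("-inf"))
        mask[list(allowed_ids)] = 0.0
        return torch.argmax(logits + mask).item()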

I found https://openai.com/index/introducing-structured-outputs-in-t... (have to scroll down a bit to the "under the hood" section) and https://www.leewayhertz.com/structured-outputs-in-llms/#cons... to be pretty good resources



