As the domain (hopefully) indicates, it's a REST API for FFmpeg. So far it's been a plain API, but now it's adding MCPs and AI endpoints, so you don't have to remember ffmpeg commands.
Amen. I've been seeing these agent SDKs come out left and right for a couple of years and thought it'd be a breeze to build an agent. Now I've been trying to build one for ~3 weeks, and I've tried three different SDKs and a couple of architectures.
Here's what I found:
- Claude Code SDK (now called the Agent SDK) is amazing, but I think they're still in the process of decoupling it from Claude Code, which is why a few things are weird. E.g., you can define a subagent programmatically, but not skills. Skills have to be placed in the filesystem and then referenced in the plugin. Also, only Anthropic models are supported :(
- OpenAI's SDK's tight coupling with their platform is a plus point; i.e., you get agent and tool-use traces by default in your dashboard, which you can later use for evaluation, distillation, or fine-tuning.
But:
1. It's not easy to use a third-party model provider. Their docs provide sample code, but it's not as easy as that.
2. They have agent handoffs (which work in some cases), but not subagents. You can use tools as subagents, though.
- Google's ADK doesn't provide a TypeScript SDK yet, so I didn't try it.
- Mastra, even though it looks pretty sweet, spins up a server for your agent, which you then use via a REST API. Umm... why?
- SmythOS SDK is the one I'm currently testing because it provides flexibility in terms of choosing the model provider and defining your own architecture (handoffs or subagents, etc.). It has its quirks, but I think it'll work for now.
Question: if you don't mind sharing, what is your current architecture? Agent -> SubAgents -> SubSubAgents? Linear? A planner-executor?
I'll write a detailed post about my learnings from architectures (fingers crossed) soon.
Every single SDK I've used became a nightmare once I got past the basics. I ended up just using an OpenRouter client library [1] and writing agents by hand without an abstraction layer. Is it a little more boilerplatey? Yeah. Does it take more LoC to write? Yeah. Is it worth it? 100%. Despite writing more code, the mental model is much easier (personally) to follow and understand.
As for the actual agent, I just do the following:
- Get metadata from the initial query
- Pass the relevant metadata to the agent
- The agent is a reasoning model with tools and an output
- The agent runs in a loop (max of n iterations), reasoning about which tool calls to make
- If there is a tool call, execute it and continue the loop
- Once the agent outputs content, the loop is effectively finished and you have your output
This is effectively a ReAct agent. Thanks to the reasoning being built in, you don't need an additional evaluator step.
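Roughly, the loop looks like this. A minimal sketch: it assumes the stock OpenAI SDK pointed at OpenRouter's OpenAI-compatible endpoint (not necessarily the client from [1]), and the tool and model names are placeholders:

```typescript
import OpenAI from "openai";

// OpenRouter exposes an OpenAI-compatible endpoint, so the stock SDK works.
const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

// A tool is just a named async function; get_metadata is a placeholder.
const toolImpls: Record<string, (args: any) => Promise<string>> = {
  get_metadata: async (args) => JSON.stringify({ duration: 42, codec: "h264" }),
};

const toolDefs: OpenAI.ChatCompletionTool[] = [{
  type: "function",
  function: {
    name: "get_metadata",
    description: "Fetch metadata for the given input id",
    parameters: { type: "object", properties: { id: { type: "string" } } },
  },
}];

async function runAgent(query: string, maxTurns = 8): Promise<string> {
  const messages: OpenAI.ChatCompletionMessageParam[] = [
    { role: "user", content: query },
  ];
  for (let turn = 0; turn < maxTurns; turn++) {
    const res = await client.chat.completions.create({
      model: "some-vendor/some-reasoning-model", // placeholder model id
      messages,
      tools: toolDefs,
    });
    const msg = res.choices[0].message;
    messages.push(msg);
    // No tool calls means the agent produced content: the loop is done.
    if (!msg.tool_calls?.length) return msg.content ?? "";
    for (const call of msg.tool_calls) {
      if (call.type !== "function") continue;
      const output = await toolImpls[call.function.name](JSON.parse(call.function.arguments));
      messages.push({ role: "tool", tool_call_id: call.id, content: output });
    }
  }
  throw new Error("agent hit max turns without producing output");
}
```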
Tools can be anything. It can be a subagent with subagents, a database query, etc.
Need to do an agent handoff? Just feed the output of one agent into a different agent. You don't need an SDK to do a workflow.
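E.g., a handoff in this style is just function composition (reusing the runAgent sketch above; the prompts are illustrative):

```typescript
// Handoff = feed one agent's output into the next agent's prompt.
const facts = await runAgent(`Extract the key facts from: ${inputDoc}`);
const summary = await runAgent(`Write a summary from these facts: ${facts}`);
```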
I've tried some other SDKs/frameworks (Eino and langchaingo), and personally found it quicker to do it manually (as described above) than fight against the framework.
I think the term sub-agent is almost entirely useless. An agent is an LLM loop that has reasoning and access to tools.
A "sub agent" is just a tool. It's implantation should be abstracted away from the main agent loop. Whether the tool call is deterministic, has human input, etc, is meaningless outside of the main tool contract (i.e Params in Params out, SLA, etc)
I agree that, technically, a "subagent" is just another tool. But I think it's important to differentiate tools with deterministic input/output from those with reasoning ability.
A simple tool will take the input and try to execute it, but a subagent might reason that the action is unnecessary and that the required output already exists in the shared context. Or it can ask the main agent a clarifying question before using its tools.
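One way to see the difference is in the tool contract itself. A sketch, with made-up types and names (not from any SDK):

```typescript
// Made-up result type to illustrate the contract difference.
type ToolResult =
  | { kind: "result"; output: string }
  | { kind: "clarify"; question: string }  // bounce a question back to the main agent
  | { kind: "skip"; reason: string };      // output already available, no action needed

// Deterministic tool: input in, output out, no judgment involved.
async function probeFile(path: string): Promise<ToolResult> {
  return { kind: "result", output: `metadata for ${path}` }; // stub
}

// Subagent behind the same contract: it may reason before (or instead of) acting.
async function editSubagent(task: string, sharedContext: string[]): Promise<ToolResult> {
  if (sharedContext.some((note) => note.includes(task))) {
    return { kind: "skip", reason: "requested output already exists in shared context" };
  }
  if (!/input file/i.test(task)) {
    return { kind: "clarify", question: "Which input file should I operate on?" };
  }
  const output = await runInnerAgentLoop(task); // its own LLM loop with its own tools
  return { kind: "result", output };
}

// Placeholder for the subagent's own ReAct-style loop.
declare function runInnerAgentLoop(task: string): Promise<string>;
```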
> Its implementation should be abstracted away from the main agent loop. Whether the tool call is deterministic, has human input, etc., is meaningless outside of the main tool contract (i.e., params in, params out, SLA, etc.)
Up to a point. You're obviously right in principle, but if that task itself has the ability to call into "adjacent" tools then the behavior changes quite a bit. You can see this a bit with how the Oracle in Amp surfaces itself to the user. The oracle as sub-agent has access to the same tools as the main agent, and the state changes (rare!) that it performs are visible to itself as well as the main agent. The tools that it invokes are displayed similarly to the main agent loop, but they are visualized as calls within the tool.
ADK differentiates between tools and subagents based on the ability to escalate or transfer control (subagents), whereas tools are more basic.
I think this is a meaningful distinction, because it impacts control flow, regardless of what they're called. The lexicon varies quite a bit from vendor to vendor.
Are there any examples of implementations of this that actually work and/or are useful? I've seen people write about this, but I haven't seen it anywhere.
I think in ADK, the most likely place to find them actually used is the workflow agent interfaces (sequential, parallel, loop). Perhaps the loop one, where they seem to suggest you have an agent that determines whether the loop is done and escalates with that message to the looper.
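The shape of that loop-plus-escalation pattern, as a rough sketch (this is not the actual ADK API, just the idea):

```typescript
// Rough shape of a loop workflow with an escalation check; illustrative only.
type Agent = (prompt: string) => Promise<string>; // any LLM-backed call

async function loopWorkflow(worker: Agent, checker: Agent, task: string, maxIters = 5) {
  let draft = "";
  for (let i = 0; i < maxIters; i++) {
    draft = await worker(`Task: ${task}\nCurrent draft:\n${draft}`);
    // A second agent decides whether to escalate, i.e. end the loop.
    const verdict = await checker(
      `Is this draft complete for "${task}"? Answer DONE or CONTINUE.\n${draft}`,
    );
    if (verdict.trim().toUpperCase().startsWith("DONE")) break;
  }
  return draft;
}
```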
Nah, when working on anything sufficiently complicated you will have many parallel subagents that need their own context window, ability to mutate shared state, sandboxing differences, durability considerations, etc.
If you want to rewrite the behavior per instance you totally can, but there is a definite concept here that is different than “get_weather”.
I think that existing tools don't work very well here or leave much of this as an exercise for the user. We have tasks that can take a few days to finish (just a huge volume of data and many non-deterministic paths). Most people are doing way too much or way too little. Having subagents with traits that can be vended at runtime feels really nice.
You actually probably don't need a reasoning model, as older non-reasoning models like GPT-4o can do this too.
In the past, agent-type flows would work better if you prompted the LLM to write down a plan, or reasoning steps, for how to accomplish the task with the available tools. These days, the new models are trained to do this without prompting.
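With older models, that planning step was something you asked for explicitly. An illustrative system prompt (wording is made up):

```typescript
// Illustrative plan-first system prompt for non-reasoning models.
const systemPrompt = `Before acting, write a numbered plan:
1. Restate the task in one sentence.
2. List which of the available tools you will call, in what order, and why.
Then execute the plan one tool call at a time.`;
```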
Hello, about Claude Code only supporting Anthropic models: in reality, you can use Claude Code Router (https://github.com/musistudio/claude-code-router) to use other models in Claude Code. I've been using it for a few weeks with open-source models and it works pretty well. You can even use "free" models from OpenRouter.
Google's ADK is pretty nice. I'm using the Go version, which is less mature than the Python one. I've been at it a bit over a week and progress is great. This weekend I'm aiming for tracking file changes in the session history to allow rewinding/forking.
It has a ton of day-2 features and really nice abstractions, and it's positioned itself well in terms of the building blocks for constructing workflows.
ADK supports working with all the vendors as well as local LLMs.
W.r.t. Go, it's probably not that big a lift. I was looking at that yesterday and made a small change to lift the Gorm stuff a bit so the DB connection can be shared between the services.
I thought the same thing about the artifact service, which could have a nice local FS option.
I'm pretty new to ADK, so we'll see how long the honeymoon phase lasts. Generally very optimistic that I found a solid foundation and framework
Have you tried AWS’s Strands Agents SDK? I’ve found it to be a very fluent and ergonomic API. And it doesn’t require you to use Bedrock; most major vendor native APIs are supported.
(Disclaimer: I work for AWS, but not for any team involved. Opinions are my own.)
Dude, it looks great, but I just spent half an hour learning about its 'CodeAgents' feature, which is essentially 'actions written as code'.
This idea has been floating around in my head, but it wasn't refined enough to implement. It's so wild that what you're thinking of may have already been done by someone else on the internet.
For those who are wondering, it's similar to the 'Code Mode' idea implemented by Cloudflare and now being explored by Anthropic: write code to discover and call MCP tools instead of stuffing the context window with their definitions.
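The gist, as a sketch (names are illustrative; real implementations like Cloudflare's run the generated code in a sandbox):

```typescript
// Instead of sending every MCP tool schema to the model, expose a small API
// the model can program against, then execute the code it writes (sandboxed!).
interface McpClient {
  listTools(server: string): Promise<string[]>;    // discovery on demand
  callTool(server: string, tool: string, args: object): Promise<unknown>;
}

// The model emits a script like this instead of a pile of raw tool calls:
async function generatedScript(mcp: McpClient) {
  const tools = await mcp.listTools("ffmpeg-api"); // stays out of the context window
  if (tools.includes("transcode")) {
    return mcp.callTool("ffmpeg-api", "transcode", { input: "in.mp4", format: "webm" });
  }
}
```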
Hey, this is super cool. Congrats on the product and the launch!
I'm building something very similar and couldn't believe my eyes when I saw the HN post. What I'm building (chatoctopus.com) is more like a chat-first agent for video editing, only at a prototype stage. But what you guys have achieved is insane. Wishing you lots of success.
Thank you! chatoctopus looks pretty cool; I'm trying it out right now!
How did you find the chat-first interface to work out for video? What we found is that response times can be so long that the chat UX breaks down a bit. How are you thinking about this?