Hacker Newsnew | past | comments | ask | show | jobs | submit | moinism's commentslogin

https://ffmpeg-api.com

As the domain (hopefully) indicates, A REST API for the FFmpeg service. So far, it's been a plain API, but now it's adding MCPs and AI endpoints, so you don't have to remember ffmpeg commands.


Wait, the CLI command for the package is.. 'tit'?


Amen. Been seeing these agent SDKs coming out left and right for a couple of years and thought it'd be a breeze to build an agent. Now I'm trying to build one for ~3 weeks, and I've tried three different SDKs and a couple of architectures.

Here's what I found:

- Claude Code SDK (now called Agent SDK) is amazing, but I think they are still in the process of decoupling it from the Claude Code, and that's why a few things are weird. e.g, You can define a subagent programmatically, but not skills. Skills have to be placed in the filesystem and then referenced in the plugin. Also, only Anthoripic models are supported :(

- OpenAI's SDK's tight coupling with their platform is a plus point. i.e, you get agents and tool-use traces by default in your dashboard. Which you can later use for evaluation, distillation, or fine-tuning. But: 2. They have agent handoffs (which works in some cases), but not subagents. You can use tools as subagents, though. 1. Not easy to use a third-party model provider. Their docs provide sample codes, but it's not as easy as that.

- Google Agent Kit doesn't provide any Typescript SDK yet. So didn't try.

- Mastra, even though it looks pretty sweet, spins up a server for your agent, which you can then use via REST API. umm.. why?

- SmythOS SDK is the one I'm currently testing because it provides flexibility in terms of choosing the model provider and defining your own architecture (handoffs or subagents, etc.). It has its quirks, but I think it'll work for now.

Question: If you don't mind sharing, what is your current architecture? Agent -> SubAgents -> SubSubAgents? Linear? or a Planner-Executor?

I'll write a detailed post about my learnings from architectures (fingers crossed) soon.


Every single SDK I've used was a nightmare once you get past the basics. I ended up just using an OpenRouter client library [1] and writing agents by hand without an abstraction layer. Is it a little more boilerplatey? Yea. Does it take more LoC to write? Yea. Is it worth it? 100%. Despite writing more code, the mental model is much easier (personally) to follow and understand.

As for the actual agent I just do the following:

- Get metadata from initial query

- Pass relevant metadata to agent

- Agent is a reasoning model with tools and output

- Agent runs in a loop (max of n times). It will reason which tool calls to use

- If there is a tool call, execute it and continue the loop

- Once the agent outputs content, the loop is effectively finished and you have your output

This is effectively a ReAct agent. Thanks to the reasoning being built in, you don't need an additional evaluator step.

Tools can be anything. It can be a subagent with subagents, a database query, etc. Need to do an agent handoff? Just output the result of the agent into a different agent. You don't need an sdk to do a workflow.

I've tried some other SDKs/frameworks (Eino and langchaingo), and personally found it quicker to do it manually (as described above) than fight against the framework.

[1]: https://github.com/reVrost/go-openrouter


I think the term sub-agent is almost entirely useless. An agent is an LLM loop that has reasoning and access to tools.

A "sub agent" is just a tool. It's implantation should be abstracted away from the main agent loop. Whether the tool call is deterministic, has human input, etc, is meaningless outside of the main tool contract (i.e Params in Params out, SLA, etc)


I agree, technically, "sub agent" is also another tool. But I think it's important to differentiate tools with deterministic input/output from those with reasoning ability. A simple 'Tool' will take the input and try to execute, but the 'subagent' might reason that the action is unnecessary and that the required output already exists in the shared context. Or it can ask a clarifying question from the main agent before using its tools.


> It's implantation should be abstracted away from the main agent loop. Whether the tool call is deterministic, has human input, etc, is meaningless outside of the main tool contract (i.e Params in Params out, SLA, etc)

Up to a point. You're obviously right in principle, but if that task itself has the ability to call into "adjacent" tools then the behavior changes quite a bit. You can see this a bit with how the Oracle in Amp surfaces itself to the user. The oracle as sub-agent has access to the same tools as the main agent, and the state changes (rare!) that it performs are visible to itself as well as the main agent. The tools that it invokes are displayed similarly to the main agent loop, but they are visualized as calls within the tool.


ADK differentiates between tools and subagents based on the ability to escalate or transfer control (subagents), where as tools are more basic

I think this is a meaningful distinction, because it impacts control flow, regardless what they are called. The lexicon are quite varied vendor-to-vendor


Are there any examples of implementations of this that actually work, and/or are useful? I've seen people write about this, but I haven't seen it anywhere


I think in ADK, the most likely place to find them actually used is the Workflow agent interfaces (sequential, parallel, loop). Perhaps looping, where it looks like they suggest you have an agent that determines if the loop is done and escalates with that message to the Looper.

https://google.github.io/adk-docs/agents/workflow-agents/

I haven't gotten there yet, still building out the basics like showing diffs instead of blind writing and supporting rewind in a session


I think you kinda proved my point. It's a feature that doesn't solve any problems, it's a feature for the sake of being a cool talking point


Nah, when working on anything sufficiently complicated you will have many parallel subagents that need their own context window, ability to mutate shared state, sandboxing differences, durability considerations, etc.

If you want to rewrite the behavior per instance you totally can, but there is a definite concept here that is different than “get_weather”.

I think that existing tools don’t work very well here or leave much of this as an exercise for the user. We have tasks that can take a few days to finish (just a huge volume of data and many non deterministic paths). Most people are doing way too much or way too little. Having subagents with traits that can be vended at runtime feels really nice.


What does "has reasoning" mean? Isn't that just a system prompt that says something like "make a plan" and includes that in the loop?


You actually probably don't need reasoning, as the old non reasoning models like 4o can do this too.

In the past, the agent type flows would work better if you prompted the LLM to write down a plan, or reasoning steps on how to accomplish the task with the available tools. These days, the new models are trained to do this without promoting


Oh, so _that_ is what a sub-agent is. I have been wondering about that for a while now!


Hello, about Claude Code where only Anthoripic models are supported, in reality you can use Claude Code router (https://github.com/musistudio/claude-code-router) to use other models in Claude Code. I use it since some weeks with opensource models and it works pretty well. You can even use "free" models from openrouter


Thank you. But the main blocker for me right now is their skill definition: https://platform.claude.com/docs/en/agent-sdk/skills#how-ski...


Google's ADK is pretty nice, I'm using the Go version, which is less mature than the python on. Been at it a bit over a week and progress is great. This weekend I'm aiming for tracking file changes in the session history to allow rewinding / forking

It has a ton of day 2 features, really nice abstractions, and positioned itself well in terms of the building blocks and constructing workflows.

ADK supports working with all the vendors and local LLMs


I really wish ADK had a local persistent memory implementation, though.


w.r.t. Go, it's probably not that big a lift. I was looking at that yesterday, made a small change to lift the Gorm stuff a bit so the DB conn can be shared between the services

I thought the same thing about the artifact service, which could have a nice local FS option.

I'm pretty new to ADK, so we'll see how long the honeymoon phase lasts. Generally very optimistic that I found a solid foundation and framework

edit: opened an issue to track it

https://github.com/google/adk-go/issues/339


The frameworks are all pointless, just use AI assist to create agents in python or ideally a language with concurrency.

You will be happy you did


How do you deal with the different APIs/Tooluse schema in a custom build? As other people have mentioned, it's a bigger undertaking than it sounds.


You can just tell the AI which format you want the input in, in natural language.


you're wasting valuable context with approaches like that

save it for more interesting tasks


Are you saying that json schema takes less tokens?


I'm saying that having tools/subagents is less tokens

For example, instead of a JSON schema in your prompt, use an Open API subagent with API tools to keep your primary contexts clean

https://google.github.io/adk-docs/tools-custom/openapi-tools...


You will undoubtedly be recreating what already exists in LangGraph. And you'll probably be doing it worse.


Have you tried AWS’s Strands Agents SDK? I’ve found it to be a very fluent and ergonomic API. And it doesn’t require you to use Bedrock; most major vendor native APIs are supported.

(Disclaimer: I work for AWS, but not for any team involved. Opinions are my own.)


This looks good. Even though it's only in Python, I think its worth a try. Thanks.


If you are still open to trying Codex, I'm working on a containerized version with various features: https://github.com/DeepBlueDynamics/codex-container


This looks good, but a bit overkill for what I'm trying to build tbh.


My favourite is Smolagents from Huggingface. You can easily mix and match their models in your agents.


Dude, it looks great, but I just spent half an hour learning about its 'CodeAgents' feature. Which essentially is 'actions written as code'.

This idea has been floating around in my head, but it wasn't refined enough to implement. It's so wild that what you're thinking of may have already been done by someone else on the internet.

https://huggingface.co/docs/smolagents/conceptual_guides/int...

For those who are wondering, it's kind of similar to the 'Code Mode' idea implemented by Cloudflare and now being explored by Anthropic; Write code to discover and call MCPs instead of stuffing context window with their definations.


Did you try langchain/langgraph? Am I confusing what the OP means aa agents?


What about AI SDK from Vercel?

https://ai-sdk.dev/docs/agents/overview


Haven't tried it yet, but it looks similar to OpenAI's. What is your experience?


Hey, this is super cool. congrats on the product and the launch!

I'm building something exactly similar and couldn't believe my eyes when I saw the HN post. What i'm building (chatoctopus.com) is more like a chat-first agent for video editing, only at a prototype stage. But what you guys have achieved is insane. Wishing you lots of success.

to healthy competition!


thank you! chatoctopus looks pretty cool, I'm trying it out right now!

how did you find the chat-first interface to work out for video? what we found is that the response times can be so long that the chat UX breaks down a bit. how are you thinking about this?


looks like I got a network error


Congrats on the launch! Having support for regional voices is going to open up so many opportunities.


Agreed!


It's a shameless plug, but I have built and use something similar: https://backupdiary.com


Here are a few things you can store:

- Important passwords

- Insurance policies

- Anything owed or lent

- Online subscriptions to cancel

- Bills or loans to pay to avoid late fees

- Any information that your business partners would want to know

- Contact information for other important people in your life, such as close friends

- Any accounts or assets that you have and the login information for them

- Special instructions or wishes


As another Pakistani, I don't understand the need for a protest... Let the names be used.


Brother, you missed the "lol" in op's comment.

Anyway, perhaps the devs are Pakistani ;-)


> perhaps the devs are Pakistani

The company is owned by a Chinese national


Congrats on the launch! Just tried the demo and it looks impressive. Good luck.

Are you by any chance hiring global-remote, full-stack/front-end devs? Would love to work with you guys.


Thanks! We aren't hiring right now, but if you shoot me an email at max@talc.ai I'll follow up in a few months.


can you please share one such example or a package in python, node, etc?


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: