give it a skill that runs a timer in the background and every 4.5 minutes says "...

zoogeny · 2026-04-13T19:57:22 1776110242

Interesting idea. I suppose one could also have response settings (e.g. max response tokens) to ensure the model doesn't waffle on and run up costs. In a best-case scenario "ping" would be one or two input tokens and a "pong" response would be one or two output tokens, so the cost of the operation would be the preserved context size times the cache read cost (one could avoid doing a cache write since I believe the cache read would reset the platforms cache timer).

It would be interesting to graph the cost/savings of this approach based on context length, percent cached, etc.

The UI for this is a bit tricky, I could mark conversations as "active" and then do the ping/pong dance on only active conversations and up to some determined max cached (e.g. 1 hour).