Interesting idea. I suppose one could also have response settings (e.g. max response tokens) to ensure the model doesn't waffle on and run up costs. In a best-case scenario "ping" would be one or two input tokens and a "pong" response would be one or two output tokens, so the cost of the operation would be the preserved context size times the cache read cost (one could avoid doing a cache write since I believe the cache read would reset the platforms cache timer).
It would be interesting to graph the cost/savings of this approach based on context length, percent cached, etc.
The UI for this is a bit tricky, I could mark conversations as "active" and then do the ping/pong dance on only active conversations and up to some determined max cached (e.g. 1 hour).