I suspect that, just like with Search, LLM requests fall into a number of different action types: one being specifically web search, another being product search, and so forth.
Within the web-search and product-search requests there is undoubtedly A LOT of overlap between people's queries. It would not be infeasible to cache one nice, long, good answer generated by e.g. ChatGPT 5.1, throw each incoming user request into some kind of classifier first, and then use a smaller LLM to judge whether a cached answer is close enough to the query to serve as-is.
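A minimal sketch of that idea, assuming the "is it close enough?" judge is approximated by embedding similarity with a threshold (in practice a small LLM or a real sentence embedder would fill that role; the `toy_embed` bag-of-words function, the `SemanticCache` class, and the 0.7 threshold here are all made-up stand-ins, not anyone's actual implementation):

```python
import math
from collections import Counter

def toy_embed(text):
    # Stand-in for a real embedding model: normalized bag-of-words.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {w: v / norm for w, v in counts.items()}

def cosine(a, b):
    return sum(a[w] * b.get(w, 0.0) for w in a)

class SemanticCache:
    """Serve a cached long-form answer when a new query is similar enough."""

    def __init__(self, threshold=0.7):  # threshold chosen for this toy demo
        self.threshold = threshold
        self.entries = []  # list of (query_embedding, cached_answer)

    def store(self, query, answer):
        self.entries.append((toy_embed(query), answer))

    def lookup(self, query):
        # In the comment's scheme, this similarity check is where a
        # smaller, cheaper LLM would judge "close enough".
        q = toy_embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]  # cache hit: skip the expensive model entirely
        return None  # cache miss: fall through to the big model

cache = SemanticCache()
cache.store("best budget wireless headphones", "long cached product answer")

hit = cache.lookup("best cheap wireless headphones")   # near-duplicate query
miss = cache.lookup("how do I file taxes in Germany")  # unrelated query
```

Here `hit` returns the cached answer while `miss` returns `None`, i.e. only the unrelated query would pay for a fresh ChatGPT 5.1 generation.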