Yeah sure, but if you do that you are heavily dropping the token/s for a single user. The only way to recover from that is continuous batching. This could still be interesting if the KV caches of all users fit in SRAM though.
> but if you do that you are heavily dropping the token/s for a single user.
I don’t follow what you are saying or what “that” refers to specifically. Assuming it refers to using HBM rather than just SRAM: this is not optional on a GPU, because SRAM is many orders of magnitude too small. Data is constantly flowing between HBM and SRAM by design, and to get data in or out of the GPU you have to go through HBM first; you can’t skip that.
And while SRAM is quite massive on a Cerebras system, it is still too small to hold very large models.
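A back-of-envelope calculation illustrates the scale gap. The model shape below is an assumption for illustration (a Llama-3-70B-like configuration with grouped-query attention); the SRAM figure is the rough on-chip total of a current datacenter GPU, not anything stated in the thread:

```python
# KV cache size per user, assumed Llama-3-70B-like dimensions (hypothetical example)
layers, kv_heads, head_dim = 80, 8, 128
bytes_per_elem = 2                      # fp16
seq_len = 8192                          # assumed context length per user

# K and V each store (kv_heads * head_dim) values per layer per token
kv_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
kv_per_user = kv_per_token * seq_len

print(f"KV per token: {kv_per_token / 2**10:.0f} KiB")   # 320 KiB
print(f"KV per user:  {kv_per_user / 2**30:.1f} GiB")    # 2.5 GiB
```

So a single user's 8k-token KV cache is ~2.5 GiB, while a GPU's total on-chip SRAM is on the order of tens of MiB: roughly two orders of magnitude short even for one user, before weights or batching enter the picture.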