Decode - the model chooses a new token to append to the end of the current token list (i.e. it generates a token), then computes that new token's KVs.
Decode is basically prefill 1 tok -> add 1 tok -> prefill 1 more tok -> ..., where each step only has to compute KVs for the one new token because the earlier ones are already cached.
But in the initial prefill stage it doesn't need to do generation, since you've provided the toks.
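
To make the loop concrete, here is a minimal pure-Python sketch of that prefill/decode split. Everything in it is a toy assumption and not a real model or library API: `embed`, `kv_for`, `attend`, and `choose_next` are hypothetical helpers, the "projections" are fixed arithmetic rather than learned weights, there is no positional encoding, and token choice is greedy.

```python
import math

VOCAB = 16  # toy vocabulary size (assumption)
DIM = 4     # toy hidden size (assumption)

def embed(tok):
    # Hypothetical deterministic "embedding"; a real model uses learned weights.
    return [math.sin((tok + 1) * (i + 1)) for i in range(DIM)]

def kv_for(tok):
    # Toy key/value projections for a single token.
    e = embed(tok)
    key = [2.0 * x for x in e]
    value = [x + 0.5 for x in e]
    return key, value

def attend(query, cache):
    # Softmax attention of one query over every cached (key, value) pair.
    scores = [sum(q * k for q, k in zip(query, key)) for key, _ in cache]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    out = [0.0] * DIM
    for w, (_, value) in zip(weights, cache):
        for i in range(DIM):
            out[i] += (w / total) * value[i]
    return out

def choose_next(cache, last_tok):
    # Greedy choice: score each vocab token against the attention output.
    ctx = attend(embed(last_tok), cache)
    logits = [sum(c * e for c, e in zip(ctx, embed(t))) for t in range(VOCAB)]
    return max(range(VOCAB), key=lambda t: logits[t])

def prefill(prompt):
    # The tokens are already provided, so only their KVs are computed here.
    return [kv_for(t) for t in prompt]

def decode(prompt, steps):
    cache = prefill(prompt)
    tokens = list(prompt)
    for _ in range(steps):
        new_tok = choose_next(cache, tokens[-1])  # generate one token
        tokens.append(new_tok)                    # add it to the token list
        cache.append(kv_for(new_tok))             # "prefill" just that one token
    return tokens, cache

tokens, cache = decode([3, 1, 4], steps=5)
```

In this toy setup the cache built up one token at a time ends up identical to what `prefill(tokens)` would compute over the whole sequence in one go, which is the sense in which decode is just repeated one-token prefill.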