They can generalise to novel inputs. Granted, they often get it wrong, and they're clearly better at handling inputs they've seen before (who isn't?), but they can still reason about things they've never seen before.
Honestly, if you don't believe me, just go and use them. It's pretty obvious once you get some actual experience with them.
Current LLMs are equivalent to tabular Markov chains: a model with a fixed context window defines a fixed next-token distribution for every possible context, so in principle you could write the whole thing out as one gigantic lookup table (far too huge to ever actually compute or store). So at what table size does a tabular Markov chain become able to generalize to novel inputs?
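To make the comparison concrete, here's a minimal sketch of a tabular Markov chain (the class name and the toy corpus are just illustrative): a literal lookup table mapping seen contexts to next-token counts. The crux is that a genuinely novel context has no row at all, so the table has nothing to say about it.

```python
from collections import Counter, defaultdict

# A tabular Markov chain of fixed order: one count row per context
# actually observed during training, nothing else.
class TabularMarkovChain:
    def __init__(self, order: int):
        self.order = order
        self.table = defaultdict(Counter)  # context tuple -> next-token counts

    def train(self, tokens):
        # Record, for each length-`order` context, which token followed it.
        for i in range(len(tokens) - self.order):
            context = tuple(tokens[i:i + self.order])
            self.table[context][tokens[i + self.order]] += 1

    def next_token_distribution(self, context):
        counts = self.table.get(tuple(context))
        if counts is None:
            return None  # novel context: the table simply has no row for it
        total = sum(counts.values())
        return {tok: c / total for tok, c in counts.items()}

chain = TabularMarkovChain(order=2)
chain.train("the cat sat on the mat".split())

print(chain.next_token_distribution(["the", "cat"]))  # seen:  {'sat': 1.0}
print(chain.next_token_distribution(["the", "dog"]))  # novel: None

# Tabulating an LLM-scale chain would need one row per possible context.
# With (say) a 50,000-token vocabulary and a 2,048-token window, that is
# 50_000 ** 2_048, roughly 10^9624 rows -- the "too huge to realistically
# compute" part of the comment above.
```

That `None` on the novel context is the whole issue the question is probing: a literal table assigns nothing to contexts it never saw, so whatever generalization the equivalent LLM shows has to come from somewhere other than table lookup.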