The legacy workloads are legacy in one sense, but in another the term is misleading: at larger shops with growing capacity needs, there has been no shortage of modernization, with new frontends bolted on and new logic added to the existing programs. The application may have the same name and core business function it always had, but it is still growing in size and under active development. The idea now is that you might as well do the same thing and build AI inference into those existing applications. They started down that path with the last-generation Telum chip, which added some basic tensor units. This time around they are adding vector extensions 3 (think round 3, kinda like how AVX evolved) and tensor processing extension 2 to the instruction set. On top of that, they are now selling a discrete accelerator chip that uses those same instructions and connects over PCIe, which will probably be more than enough given that the goal is never to train on mainframes, only to run inference.
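To make the "inference inside the existing application" idea concrete, here is a minimal C sketch of what that shape looks like: the model call is just one more synchronous step in the transaction path, not a round trip to a separate service. Everything here is hypothetical illustration (`txn_record`, `score_transaction`, `RISK_THRESHOLD` are made-up names); on real hardware the scoring call would dispatch through a vendor inference runtime that lowers to the accelerator's tensor instructions, not this plain-C stand-in.

```c
#include <stdio.h>

typedef struct {
    double amount;      /* transaction amount */
    double features[4]; /* pre-computed model inputs */
} txn_record;

#define RISK_THRESHOLD 0.8

/* Stand-in for an accelerator-backed model call. On a chip with
   on-die tensor units, this is where a compiled model would run,
   synchronously, in the same process as the business logic. */
static double score_transaction(const txn_record *t)
{
    double s = 0.1 * t->amount;
    for (int i = 0; i < 4; i++)
        s += 0.2 * t->features[i];
    return s > 1.0 ? 1.0 : s; /* clamp to [0, 1] */
}

int main(void)
{
    txn_record t = { 3.50, { 0.1, 0.9, 0.3, 0.2 } };

    /* The existing business logic stays put; inference is just
       one extra step before the commit path. */
    double risk = score_transaction(&t);
    if (risk > RISK_THRESHOLD)
        printf("flag for review (risk=%.2f)\n", risk);
    else
        printf("approve (risk=%.2f)\n", risk);
    return 0;
}
```

The point of keeping the call in-process is latency: an inline scoring step can sit on the hot path of every transaction, which is exactly the use case an inference-only accelerator is aimed at.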