Using the NPU numbers grossly overstates the AI performance of the Apple Silicon hardware, so they're actually giving Apple the benefit of the doubt.
Most AI training and inference (including generative AI) is bound by large-scale matrix MACs. That's why nvidia fills their devices with enormous numbers of tensor cores, and why Apple / Qualcomm et al are adding NPUs to fill largely the same gap. Only nvidia's tensor cores are not just an order of magnitude or more performant, they're also massively more flexible (in data types and applications) and usable for both training and inference, while Apple's NPU is only useful for a limited set of inference tasks (due to architecture and data-type limits).
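To make the "bound by matrix MACs" point concrete, here's a rough sketch of how many multiply-accumulates a single dense layer burns; the shapes are my own illustration, not from any benchmark, but this is exactly the work tensor cores and NPUs exist to batch up:

```python
# Rough illustration (shapes are assumptions): the MAC count of one dense
# matmul, the primitive that dominates transformer-style workloads.
import numpy as np

M, K, N = 1024, 4096, 4096                   # tokens x in_features x out_features
x = np.random.rand(M, K).astype(np.float32)
w = np.random.rand(K, N).astype(np.float32)

y = x @ w                                    # every output element costs K MACs
macs = M * N * K                             # ~17.2 billion MACs for one layer
print(f"{macs / 1e9:.1f} GMACs (~{2 * macs / 1e9:.1f} GFLOPs, counting mul+add)")
```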
Apple could put the effort in and make something actually competitive with nvidia, but this isn't it.
Care to share the TOPS numbers for the Apple GPUs and show how this would “grossly overstate” the performance?
Apple won’t compete with NVIDIA; I’m not arguing that. But your opening line only makes sense if you can back up the numbers and show that the GPU performance is lower than the ANE TOPS.
Tensor / neural cores are very easy to benchmark and give a precise number because they do a single well-defined thing at large scale. GPU numbers, by contrast, are less commonly published and much more workload-specific.
However, the M2 Ultra GPU is estimated, with every bit of its compute working together, at about 26 TOPS.
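For what it's worth, that figure is consistent with a simple peak-throughput estimate. A minimal sketch, assuming the commonly cited M2 Ultra specs of 76 GPU cores, 128 FP32 ALUs per core, and a roughly 1.4 GHz clock (my assumptions, not official Apple numbers):

```python
# Back-of-envelope peak FP32 throughput: cores x ALUs x ops/cycle x clock.
# The specs below are commonly cited estimates, not official Apple figures.
cores = 76                      # M2 Ultra GPU core count (assumed)
alus_per_core = 128             # FP32 ALUs per core (assumed)
clock_hz = 1.4e9                # ~1.4 GHz clock (assumed)
ops_per_alu_per_cycle = 2       # a fused multiply-add counts as 2 ops

peak = cores * alus_per_core * ops_per_alu_per_cycle * clock_hz
print(f"~{peak / 1e12:.0f} TFLOPS FP32 peak")   # ~27, in line with the ~26 figure above
```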
Could you provide a link for that TOPS count? (And specifically TOPS with comparable unit sizes, since NVIDIA and Apple didn’t use the same units until recently.)
The only similar numbers I can find compare TFLOPS against TOPS (see the sketch below for why those aren’t interchangeable).
Again, I’m not saying the GPU will be comparable to an NVIDIA one, just that the comparison point in the comments I originally replied to isn’t sensible.
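To illustrate the unit problem: vendors quote the same silicon at different precisions, and NVIDIA's headline TOPS often include the 2:4 structured-sparsity doubling. A hedged sketch with made-up numbers (illustrative only, not datasheet values):

```python
# Why raw "TOPS" aren't comparable across vendors: the same chip yields
# different headline numbers depending on precision and sparsity.
# All values below are illustrative assumptions, not real datasheet specs.
fp16_tflops_dense = 26.0                  # hypothetical dense FP16 figure
int8_tops_dense = fp16_tflops_dense * 2   # INT8 typically doubles throughput
int8_tops_sparse = int8_tops_dense * 2    # 2:4 sparsity doubles the quoted figure again

print(fp16_tflops_dense, int8_tops_dense, int8_tops_sparse)  # 26.0 52.0 104.0
```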