The degree to which they hype UMA (a term of art in the PC industry for decades) on that page is amusing --- if Intel did the same sort of marketing for its integrated GPUs, it would almost certainly be laughed at.
From what I've read, it seems to be different from Intel's UMA. It's a cache-friendly single pool of memory directly accessible from the different processing units on the SoC: CPU, GPU, NPU, etc. Since RAM isn't partitioned between those units, copy operations aren't needed to pass data between them.
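A minimal sketch of what that buys you in practice, assuming a Metal compute workload (the buffer contents and sizes here are illustrative, not from the post): a buffer allocated with .storageModeShared is visible to both CPU and GPU with no staging copy.

    import Metal

    // With unified memory, one allocation is directly visible to both
    // CPU and GPU: .storageModeShared maps the same physical pages into
    // both address spaces, so no blit/staging copy is needed.
    let device = MTLCreateSystemDefaultDevice()!
    let count = 1_000_000
    let buffer = device.makeBuffer(length: count * MemoryLayout<Float>.stride,
                                   options: .storageModeShared)!

    // The CPU writes straight into the buffer a GPU kernel will read.
    let ptr = buffer.contents().bindMemory(to: Float.self, capacity: count)
    for i in 0..<count { ptr[i] = Float(i) }

    // On a discrete GPU you would typically allocate a second,
    // .storageModePrivate buffer and encode an explicit blit from a
    // staging buffer before a kernel could touch the data.

The same idea presumably extends to the NPU and media engines, though those are driven through higher-level frameworks rather than raw buffers.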
I also read somewhere that they made sure the data formats used by the various modules are identical. Without that, you have to copy data to do format conversions. I don’t know how much of a problem that is in general, though.
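To illustrate that point (a toy sketch, not Apple's actual design): if one unit produced RGBA8 pixels while another consumed BGRA8, sharing the memory wouldn't save you anything --- every frame would still need a conversion pass like this:

    // Hypothetical RGBA8 -> BGRA8 swizzle: a full read-modify-write over
    // the data even though source and destination share one memory pool.
    func rgbaToBGRA(_ src: [UInt8]) -> [UInt8] {
        var dst = src
        for i in stride(from: 0, to: src.count, by: 4) {
            dst[i] = src[i + 2]   // B takes R's slot
            dst[i + 2] = src[i]   // R takes B's slot
            // G (i + 1) and A (i + 3) are already in place
        }
        return dst
    }

Agreeing on one format up front makes that pass, and the bandwidth it burns, disappear.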
M1's performance is great, but there isn't much evidence attributing it to the memory subsystem. Also, the hope that 32GB worth of apps will somehow fit into 16GB is just wishful thinking that isn't supported by evidence either.
There's been quite a lot of testing in Final Cut, Logic, etc. showing very good performance and efficient swap, so for most consumer apps it works well. But there's no magic if you need a big block of memory for ML or analytics.
Do we know what benefit they're getting by putting the memory on the package to begin with? I had initially assumed there was some non-trivial performance advantage to it, but now people are saying not so much, so why did they do it?
It seems weird to artificially limit the maximum memory if there were no advantage to it.