Another very simple way to look at this is with the pigeonhole principle. If I have (for simplicity) 4 signed 8-bit numbers, there are only 2^32 possible digital outputs I can represent. However, in the presence of aliasing, there is a theoretically infinite number of possible analog signals that could have produced those numbers, and no way to distinguish them from the numbers alone. Therefore, you need additional information to pick out the original signal. In the analog domain, the most practical choice is to pre-filter the signal so that you know the sampling rate is adequate for the frequency range you are capturing. With that additional constraint you can take those numbers and reconstruct the original signal within the parameters of signal processing theory.
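A quick numerical illustration of the ambiguity (a minimal sketch in Python with numpy; the 1 kHz sampling rate and the 100 Hz / 1100 Hz tones are arbitrary choices of mine): two different analog sinusoids produce identical samples, so the samples alone cannot tell you which one you had.

```python
import numpy as np

fs = 1000.0                # sampling rate in Hz (arbitrary for the demo)
t = np.arange(16) / fs     # 16 sample instants

x1 = np.sin(2 * np.pi * 100 * t)    # 100 Hz tone, safely below Nyquist
x2 = np.sin(2 * np.pi * 1100 * t)   # 1100 Hz tone: 1100 = 1000 + 100

# The two sample sequences are identical -- the pigeonhole in action.
print(np.allclose(x1, x2))  # True
```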
This point of view also has some advantages in that, if you think about it, you can see how you might play some games to work your way around things, many of which are used in the real world. For instance, if you want to cover a 20 kHz range, you could sample the 0-10 kHz band with one signal, and then have something that downshifts 40-50 kHz into the 10-20 kHz range, and get a funky sampling of multiple bands of the spectrum. But no matter what silly buggers you play on the analog or the digital side, you can't escape the pigeonhole principle.
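For the curious, here's roughly what that downshifting looks like (a hedged sketch with numpy/scipy; the 30 kHz local oscillator, the 6th-order Butterworth low-pass, and the 200 kHz "analog stand-in" rate are all illustrative choices of mine, not from any real design):

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 200_000.0                       # fast rate to stand in for "analog"
t = np.arange(int(fs * 0.01)) / fs   # 10 ms of signal
x = np.sin(2 * np.pi * 45_000 * t)   # a tone in the 40-50 kHz band

lo = np.cos(2 * np.pi * 30_000 * t)  # 30 kHz local oscillator
mixed = x * lo                       # products appear at 15 kHz and 75 kHz

b, a = butter(6, 20_000 / (fs / 2))  # low-pass, cutoff 20 kHz
shifted = filtfilt(b, a, mixed)      # keep the difference product only
# 'shifted' now carries the 40-50 kHz content in the 10-20 kHz range,
# ready to be sampled alongside the 0-10 kHz channel.
```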
From here we get the additional mathematical resonance that this sounds an awful lot like the proof that there can be no compression algorithm that compresses all inputs. And there is indeed a similarity: suppose we had a method for telling whether a digital signal was originally 500 Hz or 50500 Hz, even though at a 50 kHz sampling rate both alias to identical samples. We could then use that as a channel for storing bits above and beyond what the raw digital signal contains; if we figure out it's the 500 Hz signal, that's an extra 0, and if it's 50500 Hz, that's a 1. With more alias candidates we could get even more bits. They don't claim anything quite so binary, they've got more of a probabilistic claim, but that just means they're getting fractional extra bits instead of whole extra bits; the fundamental problem is the same. It doesn't matter how many bits you pull from nowhere; anything > 0.0000... is not valid.
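The compression side of the analogy is the same counting argument; here it is in its degenerate form, in a few lines of Python (a toy illustration of mine, not anyone's actual proof code):

```python
# No lossless compressor can shrink every n-bit input: there are 2**n
# possible inputs but only 2**n - 1 bit strings strictly shorter than
# n bits, so by the pigeonhole principle at least two inputs collide.
n = 8
inputs = 2 ** n
shorter_outputs = sum(2 ** k for k in range(n))  # lengths 0 .. n-1
print(inputs, shorter_outputs)  # 256 255
```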
Of course, one of the things we know from the internet is that there is still a set of people who don't accept the pigeonhole principle, despite it being just about the simplest mathematical claim I can imagine (in the most degenerate case: "if you have two things and one box, you cannot put both things in the box without the box holding more than one thing").
When dealing with bits, the situation is different, since algebraic degrees of freedom (dimensions, or coefficients of sinusoids) are not the same as information degrees of freedom (bits). This difference in the context of the sampling theorem is explored in https://arxiv.org/abs/1601.06421, where it is shown that sampling below the Nyquist rate (without additional loss) is possible when the samples must be quantized to satisfy a bit (information) constraint.
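To make the dimensions-vs-bits distinction concrete (a toy illustration of the general point, not the construction from the paper): a single 32-bit "coordinate" can losslessly carry two 16-bit values by interleaving their bits, so counting coordinates by itself tells you nothing about information capacity.

```python
def interleave(a: int, b: int, width: int = 16) -> int:
    """Pack two width-bit integers into one 2*width-bit integer."""
    out = 0
    for i in range(width):
        out |= ((a >> i) & 1) << (2 * i)        # bits of a at even slots
        out |= ((b >> i) & 1) << (2 * i + 1)    # bits of b at odd slots
    return out

def deinterleave(z: int, width: int = 16):
    a = sum(((z >> (2 * i)) & 1) << i for i in range(width))
    b = sum(((z >> (2 * i + 1)) & 1) << i for i in range(width))
    return a, b

# One number, two numbers' worth of information, recovered exactly.
assert deinterleave(interleave(1234, 56789)) == (1234, 56789)
```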