Yeah, that bit about each phoneme sounding exactly the same every time really made a lot of sense. Even if the TTS phoneme sounds nothing like how a human would say it, once you've heard it enough times, you just memorize it.
I guess sounding "natural" really just amounts to adding variation across the sentence, which destroys phoneme-level accuracy.