Abstract: Work on speech representations improves zero-shot Text-to-Speech by mapping text to intermediate representations before generating speech. However, methods that rely on such representations often struggle to ...