1: Ask her to bring these things with her from the store.
2: We also need a small plastic snake and a big toy frog for the kids.
3: The rainbow is a division of white light into many beautiful colors.
4: People look but no one ever finds it.
5: Throughout the centuries people have explained the rainbow in various ways.
Deep Voice 3 + IAF (distilled)
ParaNet + IAF (distilled)
1: Ask her to bring these things with her from the store.
2: We also need a small plastic snake and a big toy frog for the kids.
3: The rainbow is a division of white light into many beautiful colors.
4: People look but no one ever finds it.
5: Throughout the centuries people have explained the rainbow in various ways.
Deep Voice 3 + IAF (WaveVAE)
ParaNet + IAF (WaveVAE)
1: Ask her to bring these things with her from the store.
2: We also need a small plastic snake and a big toy frog for the kids.
3: The rainbow is a division of white light into many beautiful colors.
4: People look but no one ever finds it.
5: Throughout the centuries people have explained the rainbow in various ways.
Deep Voice 3 + WaveGlow
ParaNet + WaveGlow
FastSpeech + WaveGlow
1: Ask her to bring these things with her from the store.
2: We also need a small plastic snake and a big toy frog for the kids.
3: The rainbow is a division of white light into many beautiful colors.
4: People look but no one ever finds it.
5: Throughout the centuries people have explained the rainbow in various ways.
Section II: Controlling the rate of speech
The non-autoregressive ParaNet can synthesize speech at different rates by adjusting the positional encoding rate and the length of the output spectrogram accordingly. The following examples show synthesized speech at slow, normal, and fast rates, respectively. We use WaveNet as the vocoder.
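The rate-control idea above can be illustrated with a minimal sketch. It assumes a Deep Voice 3 style sinusoidal positional encoding in which a rate factor scales the timestep index. The helper names (`positional_encoding`, `rate_controlled_query`), the `frames_per_char` and `speed` parameters, and the choice of key rate 1 with query rate `text_len / num_frames` are hypothetical illustrations, not ParaNet's actual code. Under these assumptions, slowing speech lengthens the output spectrogram and lowers the query rate, so every frame still maps onto a position within the text.

```python
import math


def positional_encoding(num_steps, channels, rate=1.0):
    """Sinusoidal positional encoding; `rate` scales the timestep index.

    Even channels use sine and odd channels use cosine (assumed layout).
    """
    table = []
    for i in range(num_steps):
        row = []
        for k in range(channels):
            angle = rate * i / (10000.0 ** (k / channels))
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def rate_controlled_query(text_len, frames_per_char, speed, channels=64):
    """Hypothetical query encoding for a chosen speech rate.

    speed > 1 gives fewer output frames (faster speech), speed < 1 gives more.
    The query rate is chosen so that frame i carries the same encoding as
    text position rate * i, assuming keys are encoded with rate 1.
    """
    num_frames = int(round(text_len * frames_per_char / speed))
    rate = text_len / num_frames
    return positional_encoding(num_frames, channels, rate)


# Usage: the same 20-character input rendered at three speech rates.
slow = rate_controlled_query(20, 4.0, 0.5)    # 160 frames
normal = rate_controlled_query(20, 4.0, 1.0)  # 80 frames
fast = rate_controlled_query(20, 4.0, 2.0)    # 40 frames
```

In this sketch only two quantities change with the target rate, the number of output frames and the query rate, which mirrors the two knobs named in the paragraph above; the model weights themselves are untouched.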
Slow
Normal
Fast
1: Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob.
2: We also need a small plastic snake and a big toy frog for the kids.