Analytic generation of synthesis units by closed loop training for totally speaker driven text to speech system (TOS drive TTS)

Akamine, Masami; Kagoshima, Takehiko

doi:10.21437/ICSLP.1998-17

Link to original content: https://doi.org/10.21437/ICSLP.1998-17

Masami Akamine, Takehiko Kagoshima

This paper presents a new method for automatically generating speech synthesis units. The algorithm, called Closed-Loop Training (CLT), is based on evaluating and reducing the distortion in synthesized speech. It analytically minimizes the distortion caused by the synthesis process, such as prosodic modification. The distortion is measured as the error between synthesized speech units and natural speech units in a large speech database (corpus). CLT thereby generates the synthesis units that most closely resemble natural speech after the synthesis process. In this paper, CLT is applied to a waveform-concatenation-based synthesizer whose basic unit is a diphone. With CLT, the synthesizer generates clear and smooth synthetic speech even with a relatively small volume of synthesis units.
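
The closed-loop idea in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the synthesis process (for example a prosodic modification) acts on a unit `u` as a known linear operator `A_j` for each natural target segment `r_j`; the abstract does not state this model. Under that assumption, the unit minimizing the total squared error `sum_j ||r_j - A_j u||^2` has a closed-form least-squares solution. The names `closed_loop_unit` and `distortion` are hypothetical and are not taken from the paper.

```python
import numpy as np


def distortion(unit, targets, operators):
    """Total squared error between synthesized segments A_j @ unit and natural targets r_j."""
    return sum(float(np.sum((r - A @ unit) ** 2)) for r, A in zip(targets, operators))


def closed_loop_unit(targets, operators):
    """Return the unit u minimizing sum_j ||r_j - A_j u||^2 (assumed linear synthesis model).

    Setting the gradient to zero gives the normal equations
        (sum_j A_j^T A_j) u = sum_j A_j^T r_j,
    which are solved directly here.
    """
    n = operators[0].shape[1]
    lhs = np.zeros((n, n))
    rhs = np.zeros(n)
    for r, A in zip(targets, operators):
        lhs += A.T @ A
        rhs += A.T @ r
    return np.linalg.solve(lhs, rhs)


if __name__ == "__main__":
    # Toy data only: random matrices stand in for prosodic-modification operators.
    rng = np.random.default_rng(0)
    operators = [rng.standard_normal((m, 8)) for m in (10, 12, 9)]
    reference = rng.standard_normal(8)
    targets = [A @ reference + 0.1 * rng.standard_normal(A.shape[0]) for A in operators]

    unit = closed_loop_unit(targets, operators)
    print("closed-loop distortion:", distortion(unit, targets, operators))
    print("reference distortion:  ", distortion(reference, targets, operators))
```

Because the unit is chosen by measuring the error after the (assumed) synthesis step rather than before it, its distortion on the training targets cannot exceed that of any other candidate unit of the same length, which is the sense in which the training loop is "closed".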