DFKI Technical Memo-07-02
by Sacha Krstulovic
A review of state-of-the-art speech modelling methods for the parameterisation of expressive synthetic speech
This document will review a sample of available voice modelling and transformation techniques, in view of an application in expressive unit-selection based speech synthesis in the framework of the PAVOQUE project. The underlying idea is to introduce some parametric modification capabilities at the level of the synthesis system, in order to compensate for the sparsity and rigidity, in terms of available emotional speaking styles, of the databases used to define speech synthesis voices. For this work, emotion-related parametric modifications will be restricted to the domains of voice quality and prosody, as suggested by several reviews addressing the vocal correlates of emotions (Schröder, 2001; Schröder, 2004; Roehling et al., 2006). The present report will start with a review of some techniques related to voice quality modelling and modification. First, it will explore the techniques related to glottal flow modelling. Then, it will review the domain of cross-speaker voice transformations, in view of a transposition to the domain of cross-emotion voice transformations. This topic will be exposed from the perspective of the parametric spectral modelling of speech and then from the perspective of available spectral transformation techniques. Then, the domain of prosodic parameterisation and modification will be reviewed.
This document is available as PDF-File.
The previous abstract is here.
Note: This page was written to look best with CSS stylesheet support Level 1 or higher. Since you can see this, your browser obviously doesn't support CSS, or you have turned it off. We highly recommend you use a browser that supports and uses CSS, and review this page once you do. However, don't fear, we've tried to write this page to still work and be readable without CSS.