Research work summary

In an embodied virtual agent or animated character, speech sound may be complemented by displaying visual information.
Examples of such extra information include the motion of the mouth, lips, articulators (e.g. the jaw and tongue), eyebrows, eyelids and facial expression, as well as the movement of the head and body.
My research interests focus on estimating this visual information from speech and from the interlocutor's behaviour (i.e. their speech and body language) in order to animate a virtual agent.

Figure 1. Summary of research work

Ph.D work at GIPSA-Lab: Speech inversion (or acoustic-to-articulatory mapping)

In order to produce augmented speech, displaying the shapes of the speech articulators on a computer screen is potentially useful whenever the sound itself is difficult to understand, for physical or perceptual reasons. My research focuses on the development of a visual articulatory feedback system, in which the visible and hidden articulators of a talking head are controlled from the speaker's speech sound. The motivation of this research was to develop a system that could be applied to Computer Aided Pronunciation Training (CAPT) for foreign language learning, or in the domain of speech therapy.

We are working on the acoustic-to-articulatory mapping with statistical learning methods (e.g. hidden Markov models (HMMs) and Gaussian mixture models (GMMs)) trained on parallel, synchronous acoustic and articulatory data recorded from a French speaker by means of an ElectroMagnetic Articulograph (EMA).
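The core idea of the GMM-based mapping can be sketched as follows: a GMM is trained on joint acoustic-articulatory vectors, and at run time each acoustic frame is mapped to the posterior-weighted conditional mean of the articulatory dimensions. This is a minimal illustration with hand-set parameters and 1-D features (the real system uses trained, high-dimensional models); all numbers here are invented for the example.

```python
import numpy as np

# Hypothetical 2-component GMM over joint vectors z = [x, y], where x is an
# acoustic feature and y an articulatory one. The mapping returns the
# minimum mean-square-error estimate E[y | x].

means = np.array([[0.0, 1.0],    # component 1: [mean_x, mean_y]
                  [4.0, -1.0]])  # component 2
covs = np.array([[[1.0, 0.5],
                  [0.5, 1.0]],
                 [[1.0, -0.3],
                  [-0.3, 1.0]]])
weights = np.array([0.5, 0.5])

def gmm_map(x):
    """Map an acoustic value x to an articulatory estimate E[y | x]."""
    # Marginal likelihood of x under each component (univariate Gaussian pdf).
    mx, vx = means[:, 0], covs[:, 0, 0]
    px = weights * np.exp(-0.5 * (x - mx) ** 2 / vx) / np.sqrt(2 * np.pi * vx)
    post = px / px.sum()                  # component posteriors P(m | x)
    # Per-component conditional mean: mu_y + cov_yx / cov_xx * (x - mu_x)
    cond = means[:, 1] + covs[:, 1, 0] / vx * (x - mx)
    return float(post @ cond)             # posterior-weighted average

y_hat = gmm_map(0.0)   # an acoustic frame near component 1's mean
```

Near either component's acoustic mean, the estimate is dominated by that component's conditional mean, so the mapping smoothly interpolates between articulatory configurations as the acoustics change.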

The talking head currently developed in the Speech and Cognition Department of GIPSA-Lab is an assemblage of individual three-dimensional models of the various speech organs of a single speaker. These models are built from MRI, CT and video data acquired from that same speaker.

Figure 2. Three coils attached to the tongue

Figure 3. Positions of the different coils

Figure 4. Talking head displaying the articulation of 4 vowels
/a/ /i/ /y/ /u/

Our HMM approach combines HMM-based acoustic recognition and HMM-based articulatory synthesis techniques to estimate the articulatory trajectories from the acoustic signal. The GMM approach estimates the articulatory features directly from the acoustic ones.
Using an MLLR adaptation procedure, we have developed a complete articulatory feedback demonstrator that can work for any speaker.
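The essence of MLLR adaptation is that the Gaussian means of a speaker-independent model are moved toward a new speaker with a shared affine transform estimated from a small amount of adaptation data. The sketch below shows only the application of such a transform (the EM estimation of the transform is omitted); the means and transform values are invented for illustration.

```python
import numpy as np

# MLLR adapts each Gaussian mean with one shared affine transform:
#   mu_adapted = A @ mu + b  =  W @ [1, mu]   with W = [b | A].

# Hypothetical speaker-independent means for three HMM states (2-D features).
mus = np.array([[0.0, 1.0],
                [2.0, -1.0],
                [1.0, 0.5]])

# A hypothetical transform estimated from adaptation data: scaling plus bias.
A = np.array([[1.1, 0.0],
              [0.0, 0.9]])
b = np.array([0.2, -0.1])
W = np.hstack([b[:, None], A])          # shape (dim, dim + 1)

def adapt_mean(mu):
    """Apply the shared MLLR transform to one Gaussian mean."""
    ext = np.concatenate([[1.0], mu])   # extended mean vector [1, mu]
    return W @ ext

adapted = np.array([adapt_mean(mu) for mu in mus])
```

Because one transform is shared by many Gaussians, only a few minutes of speech from the new speaker suffice to estimate it, which is what makes a speaker-independent demonstrator practical.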

Video 1. Our first visual articulatory feedback prototype

Postdoctoral work at CSTR: Speech-driven head motion synthesis

My work at CSTR focused on the development of a talking head in which the lip and head motions are controlled by articulatory movements estimated from speech.
The advantage of using articulatory features is that they can drive the lip motion directly and are closely linked to head movements.
Different temporal clustering techniques are investigated for HMM-based mapping as well as a GMM-based frame-wise mapping as a baseline system (more details available here).
Video 2 presents animations showing lip and head motions. Note that there were no eyebrow or body movements. Lip movements were identical in all the videos and were estimated from the predicted articulatory features (cf. Ph.D work at GIPSA-Lab).

Video 2. Speech-driven head motion synthesis samples

Postdoctoral work at LIMSI: Adaptation of a virtual agent to the user's behaviour

My research work at LIMSI focused on a computational model for reasoning about the affects of the interlocutor, using a Theory of Mind (ToM) paradigm: the system manipulates representations of beliefs about the interlocutor's affects, preferences and goals. We implemented it using an OCC-based representation of emotions and a PAD model for moods.
Our affective model is designed for the context of job interview simulation.
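The interaction of the two representations can be sketched as follows: discrete OCC-style emotion events push a continuous PAD (pleasure, arousal, dominance) mood vector, which otherwise decays toward neutral. The PAD coordinates per emotion and the update rule below are illustrative assumptions, not the values used in the actual TARDIS model.

```python
import numpy as np

# Illustrative PAD coordinates for a few OCC emotion categories (invented).
EMOTION_PAD = {
    "joy":      np.array([0.8, 0.5, 0.4]),
    "distress": np.array([-0.6, 0.3, -0.4]),
    "pride":    np.array([0.6, 0.3, 0.6]),
    "fear":     np.array([-0.7, 0.6, -0.6]),
}

class Mood:
    def __init__(self, decay=0.9, gain=0.3):
        self.pad = np.zeros(3)   # start neutral
        self.decay = decay       # pull toward neutral each step
        self.gain = gain         # influence of each appraised emotion

    def step(self, emotions=()):
        """Advance one time step given the currently active OCC emotions."""
        self.pad *= self.decay
        for e in emotions:
            self.pad += self.gain * EMOTION_PAD[e]
        self.pad = np.clip(self.pad, -1.0, 1.0)  # keep within PAD cube
        return self.pad

mood = Mood()
mood.step(["joy"])     # pleasure rises after a positive event
p_after_joy = mood.pad[0]
mood.step()            # no event: mood decays toward neutral
p_decayed = mood.pad[0]
```

The decay term is what distinguishes a mood (slow, persistent) from an emotion (brief, event-driven), which matters in a job interview simulation where the virtual recruiter's attitude should evolve gradually over the dialogue.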

The TARDIS Academic Platform is freely available here.

Video 3. Demo video of a virtual recruiter on TARDIS platform

Grenoble Images Parole Signal Automatique laboratoire

UMR 5216 CNRS - Grenoble INP - Université Joseph Fourier - Université Stendhal
