Team managers: Gérard BAILLY, Thomas HUEBER

Research Areas

Interactive systems

We develop interactive systems for speech-handicapped people, speech rehabilitation, computer-assisted language and reading learning, and multimedia (movies, video games, etc.). Many of these systems exploit different modalities of speech communication: not only speech acoustics (i.e. the audio signal), but also speech articulation (movements of the tongue, lips, etc.), hand movements, muscle and brain activity, etc. These multimodal signals are jointly modeled using custom machine learning techniques. Below is a focus on two systems currently developed in the team.


Automatic recognition/synthesis of cued-speech (Langue Parlée Complétée)

Cued Speech is a visual system of communication used with and among deaf or hard-of-hearing people. It is a phoneme-based system that makes traditionally spoken languages accessible by using a small number of handshapes, known as cues (representing consonants), placed in different locations near the mouth (representing vowels), as a supplement to speechreading (source: Wikipedia).
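The principle above can be sketched in a few lines of code: a handshape narrows the consonant candidates, a hand position narrows the vowel candidates, and lipreading disambiguates within each small set. Note that the groupings below are invented for illustration only; they are not the official LPC chart, and this is not the TELMA recognizer.

```python
# Illustrative sketch of the cued-speech decoding principle.
# The handshape/position groupings are HYPOTHETICAL examples,
# not the official LPC (Langue Parlee Completee) assignments.
HANDSHAPE_TO_CONSONANTS = {
    1: {"p", "d", "zh"},   # hypothetical consonant group for handshape 1
    2: {"k", "v", "z"},    # hypothetical consonant group for handshape 2
}
POSITION_TO_VOWELS = {
    "side": {"a", "o", "eu"},   # hypothetical vowel group for this position
    "chin": {"e", "u"},         # hypothetical vowel group for this position
}

def cue_candidates(handshape, position):
    """Return all (consonant, vowel) syllables consistent with one cue."""
    return {(c, v)
            for c in HANDSHAPE_TO_CONSONANTS[handshape]
            for v in POSITION_TO_VOWELS[position]}

def decode_syllable(handshape, position, lip_consonant, lip_vowel):
    """Intersect cue information with (assumed perfect) lipreading."""
    matches = [(c, v) for c, v in cue_candidates(handshape, position)
               if c == lip_consonant and v == lip_vowel]
    return matches[0] if matches else None

# The cue alone is ambiguous (3 consonants x 3 vowels = 9 candidates)...
print(len(cue_candidates(1, "side")))          # 9
# ...but combined with lipreading it resolves to a single syllable.
print(decode_syllable(1, "side", "d", "a"))    # ('d', 'a')
```

This is exactly why Cued Speech complements speechreading: phonemes that look identical on the lips sit in different cue groups, and vice versa, so the combination is unambiguous.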

In the context of the ANR TELMA project (2005-2009), we developed a system able 1) to automatically convert a video of somebody using Cued Speech into a sequence of words (i.e. recognition) and 2) to automatically animate an avatar of a virtual "cued speaker" from the voice (i.e. synthesis).

Click on the picture below to watch a video about this system (in French):




Silent speech interface

A “silent speech interface” (SSI) is a device that allows speech communication without the need to vocalize. An SSI could be used in situations where silence is required (as a silent cell phone), or for communication in very noisy environments. Further applications are possible in the medical field. For example, an SSI could be used by laryngectomized patients as an alternative to the electrolarynx, which produces a very robotic voice; to oesophageal speech, which is difficult to master; or to tracheo-oesophageal speech, which requires additional surgery. The design of an SSI has recently received considerable attention from the speech research community. In our approach (in collaboration with Prof. Bruce Denby at Institut Langevin), articulatory movements are captured by a non-invasive multimodal imaging system composed of an ultrasound transducer placed beneath the chin and a video camera in front of the lips. The articulatory-to-acoustic mapping problem, i.e. the synthesis of an intelligible speech signal from articulatory data only, is addressed using statistical mapping techniques (DNN, GMM, HMM).

Watch the TV report below (in French) on this research project.

GIPSA-lab, 11 rue des Mathématiques, Grenoble Campus BP46, F-38402 SAINT MARTIN D'HERES CEDEX - +33 (0)4 76 82 71 31