Research projects

My research activities deal with multimodal speech processing (recognition, synthesis, conversion), with a special interest in speech articulation (i.e. the movements of the tongue, lips, etc.), statistical machine learning, and assistive technologies for speech-impaired people. Some of my current research projects are described below:

Silent Speech Interfaces

 

A “silent speech interface” (SSI) is a device that allows speech communication without the need to vocalize. An SSI could be used in situations where silence is required (as a silent cell phone) or for communication in very noisy environments. Further applications are possible in the medical field: an SSI could serve laryngectomized patients as an alternative to the electrolarynx, which produces a very robotic voice; to oesophageal speech, which is difficult to master; or to tracheo-oesophageal speech, which requires additional surgery. The design of an SSI has recently received considerable attention from the speech research community. In the approach developed in my PhD, articulatory movements are captured by a non-invasive multimodal imaging system composed of an ultrasound transducer placed beneath the chin and a video camera in front of the lips. The articulatory-to-acoustic mapping problem, i.e. the synthesis of an intelligible speech signal from articulatory data only, is addressed with statistical mapping techniques (neural networks, GMM, HMM).
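For illustration only, here is a minimal sketch of what such a frame-wise articulatory-to-acoustic mapping could look like; it is not the actual Ultraspeech pipeline. The data are random placeholders, the dimensions are arbitrary, and scikit-learn's MLPRegressor stands in for the neural network, GMM or HMM mapping models used in the project.

    # Minimal sketch of frame-wise articulatory-to-acoustic regression.
    # Placeholder assumptions: random data, arbitrary dimensions, and an MLP
    # standing in for the project's actual mapping models.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)

    # Placeholder corpus: 5000 frames of articulatory features (extracted from
    # ultrasound and lip images) paired with 25-dim spectral targets that a
    # vocoder would turn into a waveform.
    n_frames, art_dim, spec_dim = 5000, 60, 25
    X_art = rng.normal(size=(n_frames, art_dim))
    W = rng.normal(size=(art_dim, spec_dim))
    Y_spec = np.tanh(X_art @ W / np.sqrt(art_dim)) + 0.05 * rng.normal(size=(n_frames, spec_dim))

    # Standardize the inputs, then learn the frame-wise mapping.
    scaler = StandardScaler().fit(X_art)
    model = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=300, random_state=0)
    model.fit(scaler.transform(X_art), Y_spec)

    # At run time, each incoming ultrasound/lip frame would be converted to
    # spectral parameters and passed to a vocoder.
    new_frame = rng.normal(size=(1, art_dim))
    spec_params = model.predict(scaler.transform(new_frame))
    print(spec_params.shape)  # (1, 25)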

 

Related projects:

  • "Ultraspeech II" (GIPSA-lab), funded by the Christian Benoît Award
  • "Revoix" (ANR, 2009-2011), in collaboration with SIGMA-lab (ESPCI ParisTech) and LPP Université Sorbonne Nouvelle.
  • "Ouisper", (ANR, 2006-2009, SIGMA-lab ESPCI ParisTech, LTCI Telecom ParisTech, VTVL University of Maryland)
  • "Cassis" (PHC Sakura, 2009-2010, GIPSA-lab, SIGMA-lab, LTCI, NAIST Japan)

 

Ultrasound-based silent speech interface

 

Check out our very first real-time prototype (developed in the Ultraspeech II project, funded by the Christian Benoît Award). These are preliminary results, based on a "light" version of our articulatory-to-acoustic mapping algorithm.

Visual articulatory feedback

 

Visual articulatory feedback systems aim to provide the speaker with real-time visual information about his/her own articulation. Several studies show that this kind of system can be useful for both speech therapy and Computer-Aided Pronunciation Training (CAPT). The system developed at GIPSA-lab is based on a 3D talking head used in an augmented-speech scenario, i.e. it displays all speech articulators, including the tongue and the velum. The talking head is animated automatically from the audio speech signal using acoustic-to-articulatory inversion, which is achieved with statistical mapping techniques based on Gaussian mixture models (GMM) and hidden Markov models (HMM).
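As a rough illustration of the GMM-based mapping idea (not the GIPSA-lab implementation itself), the sketch below trains a joint Gaussian mixture on stacked acoustic and articulatory frames and recovers the articulatory features of a new acoustic frame as the conditional expectation E[y|x]. The data, dimensions and number of mixture components are placeholder assumptions.

    # Sketch of GMM-based acoustic-to-articulatory inversion (regression):
    # fit a joint GMM on [acoustic; articulatory] frames, then map a new
    # acoustic frame x to E[y | x] under that joint model.
    import numpy as np
    from scipy.stats import multivariate_normal
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)

    # Placeholder parallel corpus: acoustic features x (24-dim) and
    # articulatory features y (12-dim, e.g. coordinates of tongue/lip points).
    n, dx, dy = 4000, 24, 12
    X = rng.normal(size=(n, dx))
    A = rng.normal(size=(dx, dy))
    Y = X @ A + 0.1 * rng.normal(size=(n, dy))

    gmm = GaussianMixture(n_components=8, covariance_type="full", random_state=0)
    gmm.fit(np.hstack([X, Y]))  # joint model over [x, y]

    def invert(x):
        """Map one acoustic frame x (shape (dx,)) to articulatory features E[y|x]."""
        post = np.zeros(gmm.n_components)
        cond_means = np.zeros((gmm.n_components, dy))
        for m in range(gmm.n_components):
            mu_x, mu_y = gmm.means_[m, :dx], gmm.means_[m, dx:]
            S = gmm.covariances_[m]
            S_xx, S_yx = S[:dx, :dx], S[dx:, :dx]
            # Responsibility of component m given the acoustic frame only.
            post[m] = gmm.weights_[m] * multivariate_normal.pdf(x, mu_x, S_xx)
            # Conditional mean of y given x under component m.
            cond_means[m] = mu_y + S_yx @ np.linalg.solve(S_xx, x - mu_x)
        post /= post.sum()
        return post @ cond_means

    print(invert(X[0]).shape)  # (12,)

In practice, GMM- and HMM-based mapping systems usually also smooth the predicted trajectories over time (e.g. using dynamic features), which this frame-by-frame sketch omits.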

Visual biofeedback based on acoustic-to-articulatory inversion

 

Check out this video of our first real-time prototype!

Related projects:

  • Diandra Fabre's PhD (funded by Région Rhône-Alpes)
  • Project Living Book of Anatomy (Persyval-lab, funding for the post-doctoral position of Eric Tatulli)
  • "Vizart3D" (Pôle CSVB, Université Joseph Fourier, Grenoble, 2012-2013, GIPSA-lab)

Incremental Speech Synthesis

 

This research project aims at developing an incremental text-to-speech system (iTTS) in order to improve the user experience of people with communication disorders who use a TTS system in their daily life. Contrary to a conventional TTS system, an iTTS system delivers the synthetic voice while the user is typing (possibly with a delay of one word), and thus before the full sentence is available. By reducing the latency between text input and speech output, iTTS should enhance the interactivity of communication. Moreover, iTTS could be chained with incremental speech recognition to design highly responsive speech-to-speech conversion systems (for applications such as automatic translation, silent speech interfaces, or real-time enhancement of pathological voice).
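To make the word-level buffering concrete, here is a toy sketch (not the actual iTTS system) of incremental synthesis with a one-word look-ahead: each word is sent to the synthesis back-end as soon as the following word is typed, rather than waiting for the full sentence. The synthesize_word function is a hypothetical stand-in for a real TTS back-end.

    # Toy sketch of word-level incremental TTS with a one-word look-ahead:
    # audio for each word can start before the sentence is complete.
    # `synthesize_word` is a hypothetical stand-in for a real back-end.
    from typing import Iterable, Optional

    def synthesize_word(word: str, next_word: Optional[str]) -> None:
        # A real back-end could use the look-ahead word to choose prosody
        # (e.g. phrasing before a pause) and stream the audio out immediately.
        context = next_word if next_word is not None else "<end>"
        print(f"speak: {word!r} (look-ahead: {context!r})")

    def incremental_tts(typed_words: Iterable[str]) -> None:
        """Consume words as they are typed; synthesize with a one-word delay."""
        pending: Optional[str] = None
        for word in typed_words:
            if pending is not None:
                synthesize_word(pending, next_word=word)
            pending = word
        if pending is not None:  # flush the last word at the end of input
            synthesize_word(pending, next_word=None)

    # Simulated keyboard input, one word at a time.
    incremental_tts(["the", "meeting", "starts", "at", "noon"])

The one-word delay illustrates the trade-off mentioned above: a small look-ahead gives the synthesizer some right-hand context while keeping the latency close to the typing rate.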

 

Check out this video of our first prototype:

 

Related projects:

  • Project SpeakRightNow (AGIR, funding for the post-doctoral position of Olha Nahorna)
  • Maël Pouget's PhD (National grant)
