Teaching activities

Hidden Markov Models and Gaussian Mixture Models: application to automatic speech recognition - Master SIGMA (since 2017)

  • Lecture 1 (2h)
    • Introduction (speech technologies, speech production)
    • Basic considerations on ASR (problem formulation, signal encoding, general overview of an ASR system)
    • Hidden Markov Model (HMM), episode 1
  • Lecture 2 (2h)
    • Gaussian Mixture Model (GMM), a spin-off of HMM
    • Hidden Markov Model (HMM), episode 2
    • Practical implementation of an HMM-GMM-based ASR system


Real-time audio programming - PHELMA (since 2015)

  • Lecture 1 (2h)
    • Definition(s) of a "real-time system"; classification of RT systems (hard/soft, safety-critical, etc.)
    • Theoretical models: synchronous/scheduled, time-triggered/event-based models
    • Hardware aspects (DSP, GPU, etc.)
  • Lecture 2 (2h)
    • Common implementation issues in real-time audio programming on standard operating systems (preemption, scheduling strategies, context switching, priority inversion, memory allocation, etc.)
    • Specific aspects of real-time signal processing (circular buffering, overlap-add, etc.)
  • Lab work (16h)


Speech technologies - ENSIMAG (2013-2015)

  • Lecture 1: Automatic speech recognition (ASR)
    • Introduction
    • Speech analysis/coding for ASR
    • Template-based ASR systems (DTW, level-building/one-stage DTW)
    • Introduction to machine learning
    • HMM-based ASR (discrete Markov models, hidden Markov models, training/evaluation/decoding, context dependency, state tying, introduction to language modeling)

  • Lecture 2: Text-to-speech synthesis
    • A brief history
    • Introduction to text analysis for TTS (morpho-syntactic analysis, prosody generation, phonetization)
    • Corpus-based (unit selection) TTS
    • HMM-based TTS

  • Lecture 3: Multimodal speech technologies
    • Introduction: speech is "multimodal"
    • Audiovisual speech recognition (visual feature extraction, feature/decision fusion)
    • Audiovisual speech synthesis (image-based systems, model-based systems, talking heads)
    • Multimodal mapping (goals, practical applications, neural-network-based systems, GMM-based systems)

  • Lecture 4: Voice transformation and conversion (an introduction)
    • Goals and practical applications
    • Speech transformation (pitch shifting and time stretching, TD-PSOLA, harmonic+noise model)
    • Voice conversion (GMM-based mapping)

Grenoble Images Parole Signal Automatique laboratory

UMR 5216 CNRS - Grenoble INP - Université Joseph Fourier - Université Stendhal