Sankar, S., Lenglet, M., Bailly, G., Beautemps, D., Hueber, T. (2025) "Cued Speech Generation Leveraging a Pre-trained Audiovisual Text-to-Speech Model", Proc. of ICASSP, to appear. (preprint)
Georges, M-A., Lavechin, M., Schwartz, J-L., Hueber, T. (2024) "Decode, Move and Speak! Self-supervised Learning of Speech Units, Gestures, and Sound Relationships Using Vocal Imitation", Computational Linguistics, vol. 50, no. 4, pp. 1345-1373. (preprint)
Ortiz, A., Schatz, T., Hueber, T., Dupoux, E., "Simulating articulatory trajectories with phonological feature interpolation", Proc. of Interspeech, 2024, pp. 3595-3599 (preprint)
Sankar, S., Beautemps, D., Elisei, F., Perrotin, O., Hueber, T., "Investigating the dynamics of hand and lips in French Cued Speech using attention mechanisms and CTC-based decoding", Proc. of Interspeech, 2023, pp. 4978-4982 (preprint)
Ouakrim, Y., Beautemps, D., Gouiffes, M., Hueber, T., Berthommier, F., Braffort, A., "A Multistream Model for Continuous Recognition of Lexical Unit in French Sign Language", Proc. of GRETSI, 2023, to appear (preprint)
Georges, M-A., Schwartz, J-L., Hueber, T., "Self-supervised speech unit discovery from articulatory and acoustic features using VQ-VAE", Proc. of Interspeech, 2022, to appear (preprint)
Stephenson, B., Besacier, L., Girin, L., Hueber, T., "BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model", Proc. of Interspeech, 2022, to appear
Georges, M-A., Diard, J., Girin, L., Schwartz, J-L., Hueber, T., "Repeat after me: self-supervised learning of acoustic-to-articulatory mapping by vocal imitation", Proc. of ICASSP, pp. 8252-8256, 2022 (preprint)
Sankar, S., Beautemps, D., Hueber, T., "Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding", Proc. of ICASSP, pp. 8477-8481, 2022 (preprint, dataset)
Girin, L., Leglaive, S., Bie, X., Diard, J., Hueber, T., Alameda-Pineda, X. (2021), "Dynamical Variational Autoencoders: A Comprehensive Review", Foundations and Trends in Machine Learning, vol. 15, no. 1-2, pp. 1-175 (preprint).
Georges, M-A., Girin, L., Schwartz, J-L., Hueber, T., "Learning robust speech representation with an articulatory-regularized variational autoencoder", Proc. of Interspeech, pp. 3345-3349, 2021 (preprint)
Stephenson, B., Hueber, T., Girin, L., Besacier, L., "Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input", Proc. of Interspeech, pp. 3865-3869, 2021 (preprint)
Perrotin, O., El Amouri, H., Bailly, G., Hueber, T., "Evaluating the Extrapolation Capabilities of Neural Vocoders to Extreme Pitch Values", Proc. of Interspeech, pp. 11-15, 2021 (preprint, video)
Bie, X., Girin, L., Leglaive, S., Hueber, T., Alameda-Pineda, X., "A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling", Proc. of Interspeech, pp. 46-50, 2021 (preprint, source code)
Roche, F., Hueber, T., Garnier, M., Limier, S., & Girin, L. (2021). "Make That Sound More Metallic: Towards a Perceptually Relevant Control of the Timbre of Synthesizer Sounds Using a Variational Autoencoder", Transactions of the International Society for Music Information Retrieval, 4(1), pp. 52-66.
Stephenson, B., Besacier, L., Girin, L., Hueber, T., "What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS", in Proc. of Interspeech, Shanghai, 2020, pp. 215-219 (preprint)
Haldin, C., Loevenbruck, H., Hueber, T., Marcon, V., Piscicelli, C., Perrier, P., Chrispin, A., Pérennou, D., Baciu, M. (2020) "Speech rehabilitation in post-stroke aphasia using visual illustration of speech articulators: A case report study", Clinical Linguistics & Phonetics, 32(7):595-621 (preprint)
Hueber, T., Tatulli, E., Girin, L., Schwartz, J-L. (2020), "Evaluating the potential gain of auditory and audiovisual speech predictive coding using deep learning", Neural Computation, vol. 32(3), pp. 596-625. (preprint, source code, dataset/pretrained models)
Girod-Roux, M., Hueber, T., Fabre, D., Gerber, S., Canault, M., Bedoin, N., Acher, A., Beziaud, N., Truy, E., Badin, P., "Rehabilitation of speech disorders following glossectomy, based on ultrasound visual illustration and feedback", Clinical Linguistics & Phonetics, 34(9), 826-843 (preprint).
Hueber, T. (2019), "Traitement automatique de la parole multimodale : application à la suppléance vocale et à la rééducation orthophonique", Thèse d'Habilitation à diriger des recherches, Université Grenoble Alpes.
Girin, L., Roche, F., Leglaive, S., Hueber, T., "Notes on the use of variational autoencoders for speech and audio spectrogram modeling", in Proc. of International Conference on Digital Audio Effects (DAFx), Birmingham, UK, 2019.
Roche, F., Hueber, T., Limier, S., Girin, L., "Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models", in Proc. of SMC, Malaga, Spain, 2019.
Liu, L., Hueber, T., Feng, G., Beautemps, D., "Visual recognition of continuous Cued Speech using a tandem CNN-HMM approach", Proceedings of Interspeech, 2018, pp. 2643-2647 (dataset).
Treille, A., Vilain, C., Schwartz, J-L., Hueber, T., Sato, M. (2017) "Electrophysiological evidence for Audio-visuo-lingual speech integration", Neuropsychologia, vol. 109, pp. 126-133.
Schultz, T., Hueber, T., Krusienski, D. J., Brumberg, J. S. (2017) "Introduction to the Special Issue on Biosignal-Based Spoken Communication", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 12, pp. 2254-2256 (guest editors).
Haldin, C., Acher, A., Kauffmann, L., Hueber, T., Cousin, E., Badin, P., Perrier, P., Fabre, D., Perennou, D., Detante, O., Jaillard, A., Loevenbruck, H., Baciu, M. (2017) "Speech recovery and language plasticity can be facilitated by Sensori-Motor Fusion (SMF) training in chronic non-fluent aphasia. A case report study", Clinical Linguistics & Phonetics, 32(7):595-621.
Schultz, T., Wand, M., Hueber, T., Krusienski, D. J., Herff, C., & Brumberg, J. S. (2017) "Biosignal-based Spoken Communication: A Survey", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 12, pp. 2257-2271 (preprint).
Fabre, D., Hueber, T., Girin, L., Alameda-Pineda, X., Badin, P. (2017) "Automatic animation of an articulatory tongue model from ultrasound images of the vocal tract", Speech Communication, vol. 93, pp. 63-75 (preprint pdf, dataset).
Bocquelet, F., Hueber, T., Girin, L., Chabardès, S., Yvert, B. (2017) "Key considerations in designing a speech brain-computer interface", Journal of Physiology-Paris, vol. 110, no. 4(A), pp. 392-401.
Girin, L., Hueber, T., Alameda-Pineda, X. (2017), "Extending the Cascaded Gaussian Mixture Regression Framework for Cross-Speaker Acoustic-Articulatory Mapping", in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 3, pp. 662-673 (preprint, source code)
Tatulli, E., Hueber, T., "Feature extraction using multimodal convolutional neural networks for visual speech recognition", Proceedings of IEEE ICASSP, New Orleans, 2017, pp. 2971-2975.
Girin, L., Hueber, T., Alameda-Pineda, X., "Adaptation of a Gaussian Mixture Regressor to a New Input Distribution: Extending the C-GMR Framework", Proceedings of Int. Conf. on Latent Variable Analysis and Signal Separation (LVA-ICA), Grenoble, France, 2017, to appear (preprint, source code).
Treille, A., Vilain, C., Hueber, T., Lamalle, L., Sato, M. (2017) "Inside speech: multisensory and modality specific processing of tongue and lip speech actions", Journal of Cognitive Neuroscience, vol. 29, no. 3, pp. 448-466.
Baciu, M., Acher, A., Kauffmann, L., Cousin, E., Boilley, C., Hueber, T., ... & Detante, O. (2016). "Effect of visual feedback on speech recovery and language plasticity in patients with post-stroke non-fluent aphasia. Functional MRI assessment", Annals of Physical and Rehabilitation Medicine, 59, e75-e76.
Fabre, D., Hueber, T., Canault, M., Bedoin, N., Acher, A., Bach, C., Lambourion, L. & Badin, P. (2016). "Apport de l'échographie linguale à la rééducation orthophonique", in XVIèmes Rencontres Internationales d'Orthophonie. Orthophonie et technologies innovantes (UNADREO) (N. Joyeux & S. Topouzkhanian, Eds.), pp. 199-225. Paris, France: Ortho Edition.
Acher, A., Fabre, D., Hueber, T., Badin, P., Detante, O., Cousin, E., Pichat, C., Loevenbruck, H., Haldin, C. & Baciu, M. (2016). "Retour visuel en aphasiologie : résultats comportementaux, acoustiques et en neuroimagerie", in XVIèmes Rencontres Internationales d'Orthophonie. Orthophonie et technologies innovantes (UNADREO) (N. Joyeux & S. Topouzkhanian, Eds.), pp. 227-260. Paris, France: Ortho Edition.
Bocquelet, F., Hueber, T., Girin, L., Savariaux, C., Yvert, B. (2016) "Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces", PLOS Computational Biology, 12(11): e1005119. doi: 10.1371/journal.pcbi.1005119
Pouget, M., Nahorna, O., Hueber, T., Bailly, G., "Adaptive Latency for Part-of-Speech Tagging in Incremental Text-to-Speech Synthesis", Proceedings of Interspeech, San Francisco, USA, 2016, pp. 2846-2850.
Hueber, T., Bailly, G. (2016), "Statistical Conversion of Silent Articulation into Audible Speech using Full-Covariance HMM", Computer Speech and Language, vol. 36, pp. 274-293 (preprint pdf).
Hueber, T., Girin, L., Alameda-Pineda, X., Bailly, G. (2015), "Speaker-Adaptive Acoustic-Articulatory Inversion using Cascaded Gaussian Mixture Regression", in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 12, pp. 2246-2259 (preprint pdf - source code)
Pouget, M., Hueber, T., Bailly, G., Baumann, T., "HMM Training Strategy for Incremental Speech Synthesis", Proceedings of Interspeech, Dresden, 2015, pp. 1201-1205.
Bocquelet, F., Hueber, T., Girin, L., Savariaux, C., Yvert, B., "Real-time Control of a DNN-based Articulatory Synthesizer for Silent Speech Conversion: a pilot study", Proceedings of Interspeech, Dresden, 2015, pp. 2405-2409.
Fabre, D., Hueber, T., Badin, P., "Tongue Tracking in Ultrasound Images using EigenTongue Decomposition and Artificial Neural Networks", Proceedings of Interspeech, Dresden, 2015, pp. 2410-2414.
Bocquelet, F., Hueber, T., Girin, L., Savariaux, C., Yvert, B., "Real-time articulatory speech synthesis for brain-computer interfaces", Society for Neuroscience Annual Meeting, Chicago, 2015 (abstract)
Bocquelet, F., Hueber, T., Girin, L., Badin, P., Yvert, B., "Robust articulatory speech synthesis using deep neural networks for BCI applications", Proceedings of Interspeech, Singapore, 2014, pp. 2288-2292.
Fabre, D., Hueber, T. & Badin, P., "Automatic animation of an articulatory tongue model from ultrasound images using Gaussian mixture regression", Proceedings of Interspeech, Singapore, 2014, pp. 2293-2297.
Wang, X., Hueber, T., Badin, P., "On the use of an articulatory talking head for second language pronunciation training: the case of Chinese learners of French", Proceedings of the 10th International Seminar on Speech Production (ISSP), Köln, Germany, 2014, pp. 449-452.
Treille, A., Vilain, C., Hueber, T., Schwartz, J.-L., Lamalle, L. & Sato, M., "Hearing tongue and seeing voices: neural correlates of audio-visuo-lingual speech perception", Proceedings of the 10th International Seminar on Speech Production (ISSP), Köln, Germany, 2014, pp. 429-432.
Treille, A., Vilain, C., Hueber, T., Schwartz, J.-L., Lamalle, L. & Sato, M. (2014). "Inside speech: neural correlates of audio-lingual speech perception", Neurobiology of Language Conference, August 27-29, Amsterdam, The Netherlands (abstract)
Baciu, M., Cousin, E., Hueber, T., Pichat, C., Minotti, L., Krainik, A., Kahane, P., Perrone-Bertolotti, M. (2014) "A combined language-memory fMRI paradigm to assess cerebral networks", 20th Annual Meeting of the Organization for Human Brain Mapping, June 7-12, 2014, Hamburg, Germany. (abstract)
Barbulescu, A., Hueber, T., Bailly, G., Ronfard, R., "Audio-Visual Speaker Conversion using Prosody Features", Proceedings of Int. Conf. of Audio-visual Speech Processing (AVSP), Annecy, France, 2013.
Treille, A., Vilain, C., Hueber, T., Schwartz, J-L., Lamalle, L., Sato, M., "The sight of your tongue: neural correlates of audio-lingual speech perception", Proceedings of Int. Conf. of Audio-visual Speech Processing (AVSP), Annecy, France, 2013.
Hueber, T., "Ultraspeech-player: Intuitive visualization of ultrasound articulatory data for speech therapy and pronunciation training", Proceedings of Interspeech (show & tell), Lyon, France, 2013, pp. 752-753.
Hueber, T., Bailly, G., Badin, P., Elisei, F., "Speaker Adaptation of an Acoustic-Articulatory Inversion Model using Cascaded Gaussian Mixture Regressions", Proceedings of Interspeech, Lyon, France, 2013, pp. 2753-2757.
Hueber, T., "Ultraspeech-tools: acquisition, processing and visualization of ultrasound speech data for phonetics and speech therapy", in Ultrafest VI, Edinburgh, Scotland, 6-8 November 2013. (abstract)
d'Alessandro, N., Tilmanne, J., Astrinaki, M., Hueber, T., Dall, R. et al. (2013), "Reactive Statistical Mapping: Towards the Sketching of Performative Control with Data", Proceedings of the 9th International Summer Workshop on Multimodal Interfaces - eNTERFACE'13, in Innovative and Creative Developments in Multimodal Interaction Systems - IFIP Advances in Information and Communication Technology (IFIP AICT), Volume 425, pp. 20-49.
Hueber, T., Bailly, G., Denby, B., "Continuous Articulatory-to-Acoustic Mapping using Phone-based Trajectory HMM for a Silent Speech Interface", Proceedings of Interspeech, Portland, USA, 2012.
Hueber, T., Ben Youssef, A., Bailly, G., Badin, P., Elisei, F., "Cross-speaker Acoustic-to-Articulatory Inversion using Phone-based Trajectory HMM for Pronunciation Training", Proceedings of Interspeech, Portland, USA, 2012.
Hueber, T., Ben Youssef, A., Badin, P., Bailly, G. & Elisei, F. (2012). "Vizart3D : retour articulatoire visuel pour l'aide à la prononciation", in 29èmes Journées d'Etude de la Parole (L. Besacier, B. Lecouteux & G. Sérasset, Eds.), vol. 5, pp. 17-18. Grenoble, France, June 2012. (abstract & show & tell)
Hueber, T., Benaroya, E.L., Denby, B., Chollet, G., "Statistical Mapping between Articulatory and Acoustic Data for an Ultrasound-based Silent Speech Interface", Proceedings of Interspeech, pp. 593-596, Florence, Italy, 2011.
Ben Youssef, A., Hueber, T., Badin, P., Bailly, G., "Toward a multi-speaker visual articulatory feedback system", Proceedings of Interspeech, Florence, Italy, pp. 489-492, 2011.
Cai, J., Hueber, T., Denby, B., Benaroya, E.L., Chollet, G., Roussel, P., Dreyfus, G., Crevier-Buchman, L., "A Visual Speech Recognition System for an Ultrasound-based Silent Speech Interface", Proceedings of ICPhS, pp. 384-387, Hong Kong, 2011.
Hueber, T., Badin, P., Savariaux, C., Vilain, C., Bailly, G., "Differences in articulatory strategies between silent, whispered and normal speech? A pilot study using electromagnetic articulography", Proceedings of International Seminar on Speech Production, Montreal, 2011.
Ben Youssef, A., Hueber, T., Badin, P., Bailly, G., Elisei, F., "Toward a speaker-independent visual articulatory feedback system", Proceedings of International Seminar on Speech Production, Montreal, 2011.
Denby, B., Cai, J., Hueber, T., Roussel, P., Dreyfus, G., Crevier-Buchman, L., Pillot-Loiseau, C., Chollet, G., Stone, M., "Towards a practical silent interface based on vocal tract imaging", Proceedings of International Seminar on Speech Production, pp. 89-94, Montreal, 2011.
Ben Youssef, A., Hueber, T., Badin, P., Bailly, G. & Elisei, F. (2011). "Toward a speaker-independent visual articulatory feedback system", in 9th International Seminar on Speech Production, ISSP9, Montreal, Canada, 2011. (abstract)
Hueber, T., Badin, P., Ben Youssef, A., Bailly, G. & Elisei, F. (2011). "Toward a real-time and speaker-independent system of visual articulatory feedback", in SLaTE 2011, ISCA Special Interest Group on Speech and Language Technology in Education Workshop, Venice, Italy, 24-26 August 2011. (abstract)
Hueber, T., Ben Youssef, A., Badin, P., Bailly, G. & Elisei, F. (2011). "Articulatory-to-acoustic mapping: application to silent speech interface and visual articulatory feedback", in 9th Pan European Voice Conference (PEVOC 2011), pp. 74-75, Marseille, France, Aug 31-Sept 3, 2011. (abstract)
Hueber, T., Badin, P., Bailly, G., Ben Youssef, A., Elisei, F., Denby, B. & Chollet, G. (2011). "Statistical mapping between articulatory and acoustic data. Application to Silent Speech Interface and Visual Articulatory Feedback", in 1st International Workshop on Performative Speech and Singing Synthesis [P3S], Vancouver, BC, Canada, March 11-13.
Hueber, T., Dubois, R., Roussel, P., Denby, B., and Dreyfus, G., "Device for reconstructing speech by ultrasonically probing the vocal apparatus", Patent No. WO/2011/032688, published on 24/03/2011.
Hueber, T., Benaroya, E.L., Chollet, G., Denby, B., Dreyfus, G., Stone, M. (2010) "Development of a Silent Speech Interface Driven by Ultrasound and Optical Images of the Tongue and Lips", Speech Communication, 52(4), pp. 288-300.
Denby, B., Schultz, T., Honda, K., Hueber, T., Gilbert, J.M., Brumberg, J.S. (2010) "Silent speech interfaces", Speech Communication, 52(4), pp. 270-287.
Badin, P., Ben Youssef, A., Bailly, G., Elisei, F., Hueber, T. (2010), "Visual articulatory feedback for phonetic correction in second language learning", Proceedings of L2SW (Tokyo, Japan).
Florescu, V-M., Crevier-Buchman, L., Denby, B., Hueber, T., Colazo-Simon, A., Pillot-Loiseau, C., Roussel, P., Gendrot, C., Quattrochi, S. (2010), "Silent vs Vocalized Articulation for a Portable Ultrasound-Based Silent Speech Interface", Proceedings of Interspeech (Makuhari, Japan), pp. 450-453.
Hueber, T., Chollet, G., Denby, B. (2010), "Ultraspeech, a portable system for acquisition of high-speed ultrasound, video and acoustic speech data", in Ultrafest V, New Haven, Connecticut, USA, March 19-21. (abstract)
Hueber, T., Chollet, G., Denby, B., Dreyfus, G., and Stone, M. (2009). "Visuo-Phonetic Decoding using Multi-Stream and Context-Dependent Models for an Ultrasound-based Silent Speech Interface", Proceedings of Interspeech (Brighton, UK), pp. 640-643.
Hueber, T. (2009), "Reconstitution de la parole par imagerie ultrasonore et vidéo de l'appareil vocal : vers une communication parlée silencieuse", Thèse de doctorat, Université Pierre et Marie Curie.
Hueber, T., Denby, B. (2009). "Analyse du conduit vocal par imagerie ultrasonore", in L'imagerie médicale pour l'étude de la parole, Alain Marchal, Christian Cavé, Traité Cognition et Traitement de l'Information, IC2, Hermes Science, pp. 147-174.
Hueber, T., Chollet, G., Denby, B., and Stone, M. (2008). "Acquisition of ultrasound, video and acoustic speech data for a silent-speech interface application", Proceedings of International Seminar on Speech Production (Strasbourg, France), pp. 365-369.
Hueber, T., Chollet, G., Denby, B., Dreyfus, G., and Stone, M. (2008). "Towards a Segmental Vocoder Driven by Ultrasound and Optical Images of the Tongue and Lips", Proceedings of Interspeech (Brisbane, Australia), pp. 2028-2031.
Hueber, T., Chollet, G., Denby, B., Dreyfus, G., and Stone, M. (2008). "Phone Recognition from Ultrasound and Optical Video Sequences for a Silent Speech Interface", Proceedings of Interspeech (Brisbane, Australia), pp. 2032-2035.
Hueber, T., Chollet, G., Denby, B. (2008), "An Ultrasound-based silent speech interface", in Acoustics'08, Paris, France, June 29-July 4. (abstract)
Hueber, T., Chollet, G., Denby, B., Stone, M., and Zouari, L. (2007). "Ouisper: Corpus Based Synthesis Driven by Articulatory Data", Proceedings of the International Congress of Phonetic Sciences (Saarbrücken, Germany), pp. 2193-2196.
Hueber, T., Chollet, G., Denby, B., Dreyfus, G., and Stone, M. (2007). "Continuous-Speech Phone Recognition from Ultrasound and Optical Images of the Tongue and Lips", Proceedings of Interspeech (Antwerp, Belgium), pp. 658-661.
Hueber, T., Aversano, G., Chollet, G., Denby, B., Dreyfus, G., Oussar, Y., Roussel, P., and Stone, M. (2007). "Eigentongue feature extraction for an ultrasound-based silent speech interface", Proceedings of ICASSP (Honolulu, USA), pp. 1245-1248.
Chollet, G., Landais, R., Hueber, T., Bredin, H., Mokbel, C., Perrot, P., Zouari, L. (2007). "Some Experiments in Audio-Visual Speech Processing", Advances in Nonlinear Speech Processing, vol. 4885, Springer, pp. 28-56.
Hueber, T., Chollet, G., Denby, B., Stone, M. (2007), "Ouisper, toward a silent speech interface", in Ultrafest IV, New York, USA, September 28-29. (abstract)
Beller, G., Hueber, T., Schwarz, D., Rodet, X. (2006). "Speech Rates in French Expressive Speech", Proceedings of Speech Prosody (Dresden, Germany), pp. 672-675.