Sankar, S., Lenglet, M., Bailly, G., Beautemps, D., Hueber, T. (2025) "Cued Speech Generation Leveraging a Pre-trained Audiovisual Text-to-Speech Model", Proc. of ICASSP, to appear. (preprint)
Georges, M-A., Lavechin, M., Schwartz, J-L., Hueber, T. (2024) "Decode, Move and Speak! Self-supervised Learning of Speech Units, Gestures, and Sound Relationships Using Vocal Imitation", Computational Linguistics, vol. 50, no. 4, pp. 1345-1373. (preprint)
Ortiz, A., Schatz, T., Hueber, T., Dupoux, E., "Simulating articulatory trajectories with phonological feature interpolation", Proc. of Interspeech, 2024, pp. 3595-3599 (preprint)
Sankar, S., Beautemps, D., Elisei, F., Perrotin, O., Hueber, T., "Investigating the dynamics of hand and lips in French Cued Speech using attention mechanisms and CTC-based decoding", Proc. of Interspeech, 2023, pp. 4978-4982 (preprint)
Ouakrim, Y., Beautemps, D., Gouiffes, M., Hueber, T., Berthommier, F., Braffort, A., "A Multistream Model for Continuous Recognition of Lexical Unit in French Sign Language", Proc. of GRETSI, 2023, to appear (preprint)
Georges, M-A., Schwartz, J-L., Hueber, T., "Self-supervised speech unit discovery from articulatory and acoustic features using VQ-VAE", Proc. of Interspeech, 2022, to appear (preprint)
Stephenson, B., Besacier, L., Girin, L., Hueber, T., "BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model", Proc. of Interspeech, 2022, to appear
Georges, M-A., Diard, J., Girin, L., Schwartz, J-L., Hueber, T., "Repeat after me: self-supervised learning of acoustic-to-articulatory mapping by vocal imitation", Proc. of ICASSP, pp. 8252-8256, 2022 (preprint)
Sankar, S., Beautemps, D., Hueber, T., "Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding", Proc. of ICASSP, pp. 8477-8481, 2022 (preprint, dataset)
Girin, L., Leglaive, S., Bie, X., Diard, J., Hueber, T., Alameda-Pineda, X. (2021), "Dynamical Variational Autoencoders: A Comprehensive Review", Foundations and Trends in Machine Learning, vol. 15, no. 1-2, pp. 1-175 (preprint).
Georges, M-A., Girin, L., Schwartz, J-L., Hueber, T., "Learning robust speech representation with an articulatory-regularized variational autoencoder", Proc. of Interspeech, pp. 3345-3349, 2021 (preprint)
Stephenson, B., Hueber, T., Girin, L., Besacier, L., "Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input", Proc. of Interspeech, pp. 3865-3869, 2021 (preprint)
Perrotin, O., El Amouri, H., Bailly, G., Hueber, T., "Evaluating the Extrapolation Capabilities of Neural Vocoders to Extreme Pitch Values", Proc. of Interspeech, pp. 11-15, 2021 (preprint, video)
Bie, X., Girin, L., Leglaive, S., Hueber, T., Alameda-Pineda, X., "A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling", Proc. of Interspeech, pp. 46-50, 2021 (preprint, source code)
Roche, F., Hueber, T., Garnier, M., Limier, S., & Girin, L. (2021). "Make That Sound More Metallic: Towards a Perceptually Relevant Control of the Timbre of Synthesizer Sounds Using a Variational Autoencoder", Transactions of the International Society for Music Information Retrieval, 4(1), pp. 52-66.
Stephenson, B., Besacier, L., Girin, L., Hueber, T., "What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS", in Proc. of Interspeech, Shanghai, 2020, pp. 215-219 (preprint)
Haldin, C., Loevenbruck, H., Hueber, T., Marcon, V., Piscicelli, C., Perrier, P., Chrispin, A., Pérennou, D., Baciu, M. (2020) "Speech rehabilitation in post-stroke aphasia using visual illustration of speech articulators: A case report study", Clinical Linguistics & Phonetics, 32(7):595-621 (preprint)
Hueber, T., Tatulli, E., Girin, L., Schwartz, J-L. (2020), "Evaluating the potential gain of auditory and audiovisual speech predictive coding using deep learning", Neural Computation, vol. 32(3), pp. 596-625. (preprint, source code, dataset/pretrained models)
Girod-Roux, M., Hueber, T., Fabre, D., Gerber, S., Canault, M., Bedoin, N., Acher, A., Beziaud, N., Truy, E., Badin, P., "Rehabilitation of speech disorders following glossectomy, based on ultrasound visual illustration and feedback", Clinical Linguistics & Phonetics, 34(9), 826-843 (preprint).
Hueber, T. (2019), "Traitement automatique de la parole multimodale : application à la suppléance vocale et à la rééducation orthophonique", Thèse d'Habilitation à diriger des recherches, Université Grenoble Alpes.
Girin, L., Roche, F., Leglaive, S., Hueber, T., "Notes on the use of variational autoencoders for speech and audio spectrogram modeling", in Proc. of International Conference on Digital Audio Effects (DAFx), Birmingham, UK, 2019.
Roche, F., Hueber, T., Limier, S., Girin, L., "Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models", in Proc. of SMC, Malaga, Spain, 2019.
Liu, L., Hueber, T., Feng, G., Beautemps, D., "Visual recognition of continuous Cued Speech using a tandem CNN-HMM approach", Proceedings of Interspeech, 2018, pp. 2643-2647 (dataset).
Treille, A., Vilain, C., Schwartz, J-L., Hueber, T., Sato, M. (2017) "Electrophysiological evidence for Audio-visuo-lingual speech integration", Neuropsychologia, vol. 109, pp. 126-133.
Schultz, T., Hueber, T., Krusienski, D. J., Brumberg, J. S. (2017) "Introduction to the Special Issue on Biosignal-Based Spoken Communication", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 12, pp. 2254-2256 (guest editors).
Haldin, C., Acher, A., Kauffmann, L., Hueber, T., Cousin, E., Badin, P., Perrier, P., Fabre, D., Perennou, D., Detante, O., Jaillard, A., Loevenbruck, H., Baciu, M. (2017) "Speech recovery and language plasticity can be facilitated by Sensori-Motor Fusion (SMF) training in chronic non-fluent aphasia. A case report study", Clinical Linguistics & Phonetics, 32(7):595-621.
Schultz, T., Wand, M., Hueber, T., Krusienski, D. J., Herff, C., & Brumberg, J. S. (2017) "Biosignal-based Spoken Communication: A Survey", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 12, pp. 2257-2271 (preprint).
Fabre, D., Hueber, T., Girin, L., Alameda-Pineda, X., Badin, P. (2017) "Automatic animation of an articulatory tongue model from ultrasound images of the vocal tract", Speech Communication, vol. 93, pp. 63-75 (preprint pdf, dataset).
Bocquelet, F., Hueber, T., Girin, L., Chabardès, S., Yvert, B. (2017) "Key considerations in designing a speech brain-computer interface", Journal of Physiology-Paris, vol. 110, no. 4(A), pp. 392-401.
Girin, L., Hueber, T., Alameda-Pineda, X. (2017), "Extending the Cascaded Gaussian Mixture Regression Framework for Cross-Speaker Acoustic-Articulatory Mapping", in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 3, pp. 662-673 (preprint, source code)
Tatulli, E., Hueber, T., "Feature extraction using multimodal convolutional neural networks for visual speech recognition", Proceedings of IEEE ICASSP, New Orleans, 2017, pp. 2971-2975.
Girin, L., Hueber, T., Alameda-Pineda, X., "Adaptation of a Gaussian Mixture Regressor to a New Input Distribution: Extending the C-GMR Framework", Proceedings of Int. Conf. on Latent Variable Analysis and Signal Separation (LVA-ICA), Grenoble, France, 2017, to appear (preprint, source code).
Treille, A., Vilain, C., Hueber, T., Lamalle, L., Sato, M. (2017) "Inside speech: multisensory and modality specific processing of tongue and lip speech actions", Journal of Cognitive Neuroscience, vol. 29, no. 3, pp. 448-466.
Baciu, M., Acher, A., Kauffmann, L., Cousin, E., Boilley, C., Hueber, T., ... & Detante, O. (2016). "Effect of visual feedback on speech recovery and language plasticity in patients with post-stroke non-fluent aphasia. Functional MRI assessment", Annals of Physical and Rehabilitation Medicine, 59, e75-e76.
Fabre, D., Hueber, T., Canault, M., Bedoin, N., Acher, A., Bach, C., Lambourion, L. & Badin, P. (2016). "Apport de l'échographie linguale à la rééducation orthophonique", in XVIèmes Rencontres Internationales d'Orthophonie. Orthophonie et technologies innovantes (UNADREO) (N. Joyeux & S. Topouzkhanian, Eds.), pp. 199-225. Paris, France: Ortho Edition.
Acher, A., Fabre, D., Hueber, T., Badin, P., Detante, O., Cousin, E., Pichat, C., Loevenbruck, H., Haldin, C. & Baciu, M. (2016). "Retour visuel en aphasiologie : résultats comportementaux, acoustiques et en neuroimagerie", in XVIèmes Rencontres Internationales d'Orthophonie. Orthophonie et technologies innovantes (UNADREO) (N. Joyeux & S. Topouzkhanian, Eds.), pp. 227-260. Paris, France: Ortho Edition.
Bocquelet, F., Hueber, T., Girin, L., Savariaux, C., Yvert, B. (2016) "Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces", PLOS Computational Biology, 12(11): e1005119. doi: 10.1371/journal.pcbi.1005119
Pouget, M., Nahorna, O., Hueber, T., Bailly, G., "Adaptive Latency for Part-of-Speech Tagging in Incremental Text-to-Speech Synthesis", Proceedings of Interspeech, San Francisco, USA, 2016, pp. 2846-2850.
Hueber, T., Bailly, G. (2016), "Statistical Conversion of Silent Articulation into Audible Speech using Full-Covariance HMM", Computer Speech and Language, vol. 36, pp. 274-293 (preprint pdf).
Hueber, T., Girin, L., Alameda-Pineda, X., Bailly, G. (2015), "Speaker-Adaptive Acoustic-Articulatory Inversion using Cascaded Gaussian Mixture Regression", in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 12, pp. 2246-2259 (preprint pdf - source code)
Pouget, M., Hueber, T., Bailly, G., Baumann, T., "HMM Training Strategy for Incremental Speech Synthesis", Proceedings of Interspeech, Dresden, 2015, pp. 1201-1205.
Bocquelet, F., Hueber, T., Girin, L., Savariaux, C., Yvert, B., "Real-time Control of a DNN-based Articulatory Synthesizer for Silent Speech Conversion: a pilot study", Proceedings of Interspeech, Dresden, 2015, pp. 2405-2409.
Fabre, D., Hueber, T., Badin, P., "Tongue Tracking in Ultrasound Images using EigenTongue Decomposition and Artificial Neural Networks", Proceedings of Interspeech, Dresden, 2015, pp. 2410-2414.
Bocquelet, F., Hueber, T., Girin, L., Savariaux, C., Yvert, B., "Real-time articulatory speech synthesis for brain-computer interfaces", Society for Neuroscience Annual Meeting, Chicago, 2015 (abstract)
Bocquelet, F., Hueber, T., Girin, L., Badin, P., Yvert, B., "Robust articulatory speech synthesis using deep neural networks for BCI applications", Proceedings of Interspeech, Singapore, 2014, pp. 2288-2292.
Fabre, D., Hueber, T. & Badin, P., "Automatic animation of an articulatory tongue model from ultrasound images using Gaussian mixture regression", Proceedings of Interspeech, Singapore, 2014, pp. 2293-2297.
Wang, X., Hueber, T., Badin, P., "On the use of an articulatory talking head for second language pronunciation training: the case of Chinese learners of French", Proceedings of the 10th International Seminar on Speech Production (ISSP), Köln, Germany, 2014, pp. 449-452.
Treille, A., Vilain, C., Hueber, T., Schwartz, J.-L., Lamalle, L. & Sato, M., "Hearing tongue and seeing voices: neural correlates of audio-visuo-lingual speech perception", Proceedings of the 10th International Seminar on Speech Production (ISSP), Köln, Germany, 2014, pp. 429-432.
Treille, A., Vilain, C., Hueber, T., Schwartz, J.-L., Lamalle, L. & Sato, M. (2014). "Inside speech: neural correlates of audio-lingual speech perception", Neurobiology of Language Conference, August 27-29, Amsterdam, The Netherlands (abstract)
Baciu, M., Cousin, E., Hueber, T., Pichat, C., Minotti, L., Krainik, A., Kahane, P., Perrone-Bertolotti, M. (2014) "A combined language-memory fMRI paradigm to assess cerebral networks", 20th Annual Meeting of the Organization for Human Brain Mapping, June 7-12, 2014, Hamburg, Germany. (abstract)
Barbulescu, A., Hueber, T., Bailly, G., Ronfard, R., "Audio-Visual Speaker Conversion using Prosody Features", Proceedings of Int. Conf. of Audio-visual Speech Processing (AVSP), Annecy, France, 2013.
Treille, A., Vilain, C., Hueber, T., Schwartz, J-L., Lamalle, L., Sato, M., "The sight of your tongue: neural correlates of audio-lingual speech perception", Proceedings of Int. Conf. of Audio-visual Speech Processing (AVSP), Annecy, France, 2013.
Hueber, T., "Ultraspeech-player: Intuitive visualization of ultrasound articulatory data for speech therapy and pronunciation training", Proceedings of Interspeech (show & tell), Lyon, France, 2013, pp. 752-753.
Hueber, T., Bailly, G., Badin, P., Elisei, F., "Speaker Adaptation of an Acoustic-Articulatory Inversion Model using Cascaded Gaussian Mixture Regressions", Proceedings of Interspeech, Lyon, France, 2013, pp. 2753-2757.
Hueber, T., "Ultraspeech-tools: acquisition, processing and visualization of ultrasound speech data for phonetics and speech therapy", in Ultrafest VI, Edinburgh, Scotland, 6-8 November 2013. (abstract)
d'Alessandro, N., Tilmanne, J., Astrinaki, M., Hueber, T., Dall, R. et al. (2013), "Reactive Statistical Mapping: Towards the Sketching of Performative Control with Data", Proceedings of the 9th International Summer Workshop on Multimodal Interfaces - eNTERFACE'13, in Innovative and Creative Developments in Multimodal Interaction Systems - IFIP Advances in Information and Communication Technology (IFIP AICT), Volume 425, pp. 20-49.
Hueber, T., Bailly, G., Denby, B., "Continuous Articulatory-to-Acoustic Mapping using Phone-based Trajectory HMM for a Silent Speech Interface", Proceedings of Interspeech, Portland, USA, 2012.
Hueber, T., Ben Youssef, A., Bailly, G., Badin, P., Elisei, F., "Cross-speaker Acoustic-to-Articulatory Inversion using Phone-based Trajectory HMM for Pronunciation Training", Proceedings of Interspeech, Portland, USA, 2012.
Hueber, T., Ben Youssef, A., Badin, P., Bailly, G. & Elisei, F. (2012). "Vizart3D : retour articulatoire visuel pour l'aide à la prononciation", in 29èmes Journées d'Etude de la Parole (L. Besacier, B. Lecouteux & G. Sérasset, Eds.), vol. 5, pp. 17-18. Grenoble, France, June 2012. (abstract & show & tell)
Hueber, T., Benaroya, E.L., Denby, B., Chollet, G., "Statistical Mapping between Articulatory and Acoustic Data for an Ultrasound-based Silent Speech Interface", Proceedings of Interspeech, pp. 593-596, Florence, Italy, 2011.
Ben Youssef, A., Hueber, T., Badin, P., Bailly, G., "Toward a multi-speaker visual articulatory feedback system", Proceedings of Interspeech, Florence, Italy, pp. 489-492, 2011.
Cai, J., Hueber, T., Denby, B., Benaroya, E.L., Chollet, G., Roussel, P., Dreyfus, G., Crevier-Buchman, L., "A Visual Speech Recognition System for an Ultrasound-based Silent Speech Interface", Proceedings of ICPhS, pp. 384-387, Hong Kong, 2011.
Hueber, T., Badin, P., Savariaux, C., Vilain, C., Bailly, G., "Differences in articulatory strategies between silent, whispered and normal speech? A pilot study using electromagnetic articulography", Proceedings of International Seminar on Speech Production, Montreal, 2011.
Ben Youssef, A., Hueber, T., Badin, P., Bailly, G., Elisei, F., "Toward a speaker-independent visual articulatory feedback system", Proceedings of International Seminar on Speech Production, Montreal, 2011.
Denby, B., Cai, J., Hueber, T., Roussel, P., Dreyfus, G., Crevier-Buchman, L., Pillot-Loiseau, C., Chollet, G., Stone, M., "Towards a practical silent interface based on vocal tract imaging", Proceedings of International Seminar on Speech Production, pp. 89-94, Montreal, 2011.
Ben Youssef, A., Hueber, T., Badin, P., Bailly, G. & Elisei, F. (2011). "Toward a speaker-independent visual articulatory feedback system", in 9th International Seminar on Speech Production, ISSP9, Montreal, Canada, 2011. (abstract)
Hueber, T., Badin, P., Ben Youssef, A., Bailly, G. & Elisei, F. (2011). "Toward a real-time and speaker-independent system of visual articulatory feedback", in SLaTE 2011, ISCA Special Interest Group on Speech and Language Technology in Education Workshop, Venice, Italy, 24-26 August 2011. (abstract)
Hueber, T., Ben Youssef, A., Badin, P., Bailly, G. & Elisei, F. (2011). "Articulatory-to-acoustic mapping: application to silent speech interface and visual articulatory feedback", in 9th Pan European Voice Conference (PEVOC 2011), pp. 74-75, Marseille, France, Aug 31-Sept 3, 2011. (abstract)
Hueber, T., Badin, P., Bailly, G., Ben Youssef, A., Elisei, F., Denby, B. & Chollet, G. (2011). "Statistical mapping between articulatory and acoustic data. Application to Silent Speech Interface and Visual Articulatory Feedback", in 1st International Workshop on Performative Speech and Singing Synthesis [P3S], Vancouver, BC, Canada, March 11-13.
Hueber, T., Dubois, R., Roussel, P., Denby, B., and Dreyfus, G., "Device for reconstructing speech by ultrasonically probing the vocal apparatus", Patent No. WO/2011/032688, published on 24/03/2011.
Hueber, T., Benaroya, E.L., Chollet, G., Denby, B., Dreyfus, G., Stone, M. (2010) "Development of a Silent Speech Interface Driven by Ultrasound and Optical Images of the Tongue and Lips", Speech Communication, 52(4), pp. 288-300.
Denby, B., Schultz, T., Honda, K., Hueber, T., Gilbert, J.M., Brumberg, J.S. (2010) "Silent speech interfaces", Speech Communication, 52(4), pp. 270-287.
Badin, P., Ben Youssef, A., Bailly, G., Elisei, F., Hueber, T. (2010), "Visual articulatory feedback for phonetic correction in second language learning", Proceedings of L2SW (Tokyo, Japan).
Florescu, V-M., Crevier-Buchman, L., Denby, B., Hueber, T., Colazo-Simon, A., Pillot-Loiseau, C., Roussel, P., Gendrot, C., Quattrochi, S. (2010), "Silent vs Vocalized Articulation for a Portable Ultrasound-Based Silent Speech Interface", Proceedings of Interspeech (Makuhari, Japan), pp. 450-453.
Hueber, T., Chollet, G., Denby, B. (2010), "Ultraspeech, a portable system for acquisition of high-speed ultrasound, video and acoustic speech data", in Ultrafest V, New Haven, Connecticut, USA, March 19-21. (abstract)
Hueber, T., Chollet, G., Denby, B., Dreyfus, G., and Stone, M. (2009). "Visuo-Phonetic Decoding using Multi-Stream and Context-Dependent Models for an Ultrasound-based Silent Speech Interface", Proceedings of Interspeech (Brighton, UK), pp. 640-643.
Hueber, T. (2009), "Reconstitution de la parole par imagerie ultrasonore et vidéo de l'appareil vocal : vers une communication parlée silencieuse", Thèse de doctorat, Université Pierre et Marie Curie.
Hueber, T., Denby, B. (2009). "Analyse du conduit vocal par imagerie ultrasonore", in L'imagerie médicale pour l'étude de la parole, Alain Marchal, Christian Cavé, Traité Cognition et Traitement de l'Information, IC2, Hermes Science, pp. 147-174.
Hueber, T., Chollet, G., Denby, B., and Stone, M. (2008). "Acquisition of ultrasound, video and acoustic speech data for a silent-speech interface application", Proceedings of International Seminar on Speech Production (Strasbourg, France), pp. 365-369.
Hueber, T., Chollet, G., Denby, B., Dreyfus, G., and Stone, M. (2008). "Towards a Segmental Vocoder Driven by Ultrasound and Optical Images of the Tongue and Lips", Proceedings of Interspeech (Brisbane, Australia), pp. 2028-2031.
Hueber, T., Chollet, G., Denby, B., Dreyfus, G., and Stone, M. (2008). "Phone Recognition from Ultrasound and Optical Video Sequences for a Silent Speech Interface", Proceedings of Interspeech (Brisbane, Australia), pp. 2032-2035.
Hueber, T., Chollet, G., Denby, B. (2008), "An Ultrasound-based silent speech interface", in Acoustics'08, Paris, France, June 29-July 4. (abstract)
Hueber, T., Chollet, G., Denby, B., Stone, M., and Zouari, L. (2007). "Ouisper: Corpus Based Synthesis Driven by Articulatory Data", Proceedings of the International Congress of Phonetic Sciences (Saarbrücken, Germany), pp. 2193-2196.
Hueber, T., Chollet, G., Denby, B., Dreyfus, G., and Stone, M. (2007). "Continuous-Speech Phone Recognition from Ultrasound and Optical Images of the Tongue and Lips", Proceedings of Interspeech (Antwerp, Belgium), pp. 658-661.
Hueber, T., Aversano, G., Chollet, G., Denby, B., Dreyfus, G., Oussar, Y., Roussel, P., and Stone, M. (2007). "Eigentongue feature extraction for an ultrasound-based silent speech interface", Proceedings of ICASSP (Honolulu, USA), pp. 1245-1248.
Chollet, G., Landais, R., Hueber, T., Bredin, H., Mokbel, C., Perrot, P., Zouari, L. (2007). "Some Experiments in Audio-Visual Speech Processing", Advances in Nonlinear Speech Processing, vol. 4885, Springer, pp. 28-56.
Hueber, T., Chollet, G., Denby, B., Stone, M. (2007), "Ouisper, toward a silent speech interface", in Ultrafest IV, New York, USA, September 28-29. (abstract)
Beller, G., Hueber, T., Schwarz, D., Rodet, X. (2006). "Speech Rates in French Expressive Speech", Proceedings of Speech Prosody (Dresden, Germany), pp. 672-675.