Studies in Speech Science
Endo N, Vilain C, Nakazawa K, Ito T (2025)
Somatosensory influence on auditory cortical response of self-generated sound,
Neuropsychologia, 211:109103, doi: 10.1016/j.neuropsychologia.2025.109103.
[Abstract]
Motor execution that results in the generation of sounds attenuates the cortical response to these self-generated sounds.
This attenuation has been explained as a result of motor-related processing. The current study shows that corresponding
somatosensory inputs can also change the auditory processing of a self-generated sound. We recorded auditory event-related potentials (ERP)
in response to self-generated sounds and assessed how the amount of auditory attenuation changed according to the somatosensory inputs.
The sound stimuli were generated by a finger movement that pressed on a virtual object, which was produced by a haptic robotic device.
Somatosensory inputs were modulated by changing the stiffness of this virtual object (low and high) in an unpredictable manner.
For comparison purposes, we carried out the same test with a computer keyboard, which is conventionally used to induce the auditory
attenuation of self-generated sound. While N1 and P2 attenuations were clearly induced in the control condition with the keyboard
as has been observed in previous studies, when using the robotic device the amplitude of N1 was found to vary according to
the stiffness of the virtual object. The amplitude of N1 in the low stiffness condition was similar to that found with
the keyboard, but the amplitude in the high stiffness condition was not. In addition, P2 attenuation did not differ between
stiffness conditions. The waveforms of auditory ERP after 200 ms also differed according to the stiffness conditions.
The estimated source of N1 attenuation was located in the right parietal area. These results suggest that somatosensory
inputs during movement can modify the auditory processing of self-generated sound. The auditory processing of self-generated sound
may represent self-referenced processing like an embodied process or an action-perception mechanism.
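The abstract does not specify how the haptic device rendered the virtual object; purely as a minimal sketch of the general idea of modulating somatosensory input by changing virtual stiffness, the following illustrative code renders the object as a simple spring (the spring law and all parameter values are assumptions, not taken from the paper).

```python
def virtual_object_force(finger_pos, surface_pos, stiffness):
    """Restoring force of a virtual object rendered as a simple spring (Hooke's law)."""
    penetration = surface_pos - finger_pos      # how far the finger has pressed into the object (m)
    if penetration <= 0.0:                      # no contact, no force
        return 0.0
    return stiffness * penetration              # F = k * x

# Hypothetical low- vs. high-stiffness conditions (values are illustrative only).
for label, k in [("low", 200.0), ("high", 2000.0)]:
    f = virtual_object_force(finger_pos=0.195, surface_pos=0.200, stiffness=k)
    print(f"{label} stiffness: {f:.2f} N at 5 mm penetration")
```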
Bourhis M, Perrier P, Savariaux C, Ito T (2024)
Quick speech motor correction in the absence of auditory feedback,
Front. Hum. Neurosci., 18:1399316, doi: 10.3389/fnhum.2024.1399316.
[Abstract]
A quick correction mechanism of the tongue has previously been observed experimentally in speech posture stabilization in response
to a sudden tongue stretch perturbation. Given its relatively short latency (< 150 ms), the response could be driven by somatosensory
feedback alone. The current study assessed this hypothesis by examining whether this response is induced in the absence of auditory
feedback. We compared the response under two auditory conditions: with normal versus masked auditory feedback. Eleven participants
were tested. They were asked to whisper the vowel /e/ for a few seconds. The tongue was stretched horizontally with step patterns
of force (1 N for 1 s) using a robotic device. The articulatory positions were recorded using electromagnetic articulography
simultaneously with the produced sound. The tongue perturbation was randomly and unpredictably applied in one-fifth of trials.
The two auditory conditions were tested in random order. A quick compensatory response was induced in a similar way to the previous
study. We found that the amplitudes of the compensatory responses were not significantly different between the two auditory conditions,
either for the tongue displacement or for the produced sounds. These results suggest that the observed quick correction mechanism is
primarily based on somatosensory feedback. This correction mechanism could be learned in such a way as to maintain the auditory goal
on the sole basis of somatosensory feedback.
Ashokumar M, Schwartz J-L, Ito T (2024)
Changes in speech production following perceptual training with orofacial somatosensory inputs,
J Speech Lang Hear Res., 67(10S):3962-3973, doi: 10.1044/2023_JSLHR-23-00249.
[Abstract]
[Kudos]
Purpose: Orofacial somatosensory inputs play an important role in speech motor control and speech learning.
Since receiving specific auditory-somatosensory inputs during speech perceptual training alters speech perception,
similar perceptual training could also alter speech production. We examined whether the production performance was
changed by perceptual training with orofacial somatosensory inputs.
Methods: We focused on the French vowels /e/ and /ø/, contrasted in their articulation by horizontal gestures.
Perceptual training consisted of a vowel identification task contrasting /e/ and /ø/. Along with training, for the
first group of participants, somatosensory stimulation was applied as a facial skin stretch in the backward direction. We
recorded the target vowels uttered by the participants before and after the perceptual training and compared their F1,
F2 and F3 formants. We also tested a control group with no somatosensory stimulation and another somatosensory group with
a different vowel continuum (/e/-/i/) for perceptual training.
Results: Perceptual training with somatosensory stimulation induced changes in F2 and F3 in the produced vowel sounds.
F2 decreased consistently in the two somatosensory groups. F3 increased following the /e/-/ø/ training and decreased following
the /e/-/i/ training. This change was significantly correlated with the perceptual shift between the first and second half of
the training phase in the somatosensory group with the /e/-/ø/ training, but not with the /e/-/i/ training. The control group
displayed no effect on F2 and F3, and just a tendency of F1 increase.
Conclusions: The results suggest that somatosensory inputs associated with speech sound inputs can play a role in speech
training and learning in both production and perception.
Ito T, Bouguerra M, Bourhis M, Perrier P (2024)
Tongue reflex for speech posture control,
Sci. Rep., 14, 6386, doi: 10.1038/s41598-024-56813-9.
[Abstract]
Although there is no doubt from an empirical viewpoint that reflex mechanisms can contribute to tongue motor control in humans,
there is limited neurophysiological evidence to support this idea. Previous results failing to observe any tonic stretch reflex
in the tongue had reduced the likelihood of a reflex contribution in tongue motor control. The current study presents
experimental evidence of a human tongue reflex in response to a sudden stretch while holding a posture for speech. The
latency was relatively long (50 ms), possibly mediated through a cortical arc. The activation peak in a speech task
was greater than in a non-speech task while background activation levels were similar in both tasks, and the peak amplitude
in a speech task was not modulated by the additional task to react voluntarily to the perturbation. Computer simulations with
a simplified linear mass-spring-damper model showed that the recorded muscle activation response is suited for the generation
of tongue movement responses observed in a previous study with the appropriate timing, once a possible physiological delay
between reflex muscle activation and the corresponding force is taken into account. Our results provide clear evidence that
reflex mechanisms contribute to tongue posture stabilization for speech production.
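The simulation details are not reported in the abstract; the sketch below illustrates the kind of simplified linear mass-spring-damper model described there, driven by a delayed reflex force at the 50 ms latency mentioned above (mass, damping, stiffness, gains, and perturbation values are hypothetical).

```python
import numpy as np

# Hypothetical parameters for a simplified linear mass-spring-damper tongue model.
m, b, k = 0.05, 2.0, 100.0      # mass (kg), damping (N*s/m), stiffness (N/m)
dt, T = 0.001, 0.6              # time step and duration (s)
t = np.arange(0.0, T, dt)

f_ext = np.where(t >= 0.1, 1.0, 0.0)                    # 1 N step-like stretch perturbation at 100 ms
reflex_delay = 0.05                                     # 50 ms reflex latency, as in the abstract
reflex_gain = 0.8                                       # fraction of the load opposed by the reflex (assumed)
f_reflex = -reflex_gain * np.where(t >= 0.1 + reflex_delay, 1.0, 0.0)

x, v = 0.0, 0.0
disp = np.zeros_like(t)
for i in range(t.size):
    a = (f_ext[i] + f_reflex[i] - b * v - k * x) / m    # Newton's second law
    v += a * dt
    x += v * dt
    disp[i] = x

print(f"peak displacement: {disp.max()*1000:.1f} mm, final: {disp[-1]*1000:.1f} mm")
```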
Ashokumar M, Guichet C, Schwartz J-L, Ito T (2023)
Correlation between the effect of orofacial somatosensory inputs in speech perception and speech production performance,
Audit Percept Cogn, 6(1-2):97-107, doi: 10.1080/25742442.2022.2134674.
[Abstract]
Orofacial somatosensory inputs modify the perception of speech sounds. Such auditory-somatosensory
integration likely develops alongside speech production acquisition. We examined whether the somatosensory effect in speech
perception varies depending on individual characteristics of speech production.
The somatosensory effect in speech perception was assessed by changes in the category boundary between /e/ and /ø/
in a vowel identification test. The somatosensory stimulation consisted of facial skin deformation in the rearward
direction, corresponding to the articulatory movement for /e/, and was applied together with the auditory input. Speech production
performance was quantified by the acoustic distances between the average first, second and third formants of /e/ and /ø/
utterances recorded in a separate test.
The category boundary between /e/ and /ø/ was significantly shifted towards /ø/ due to the somatosensory
stimulation, which is consistent with previous research. The amplitude of the category boundary shift was significantly correlated
with the acoustic distance between the mean second – and marginally third – formants of /e/ and /ø/ productions, with no
correlation with the first formant distance.
Greater acoustic distances can be related to larger contrasts between the articulatory targets of vowels
in speech production. These results suggest that the somatosensory effect in speech perception can be linked to speech
production performance.
Ito T, Ogane R (2022)
Repetitive exposure to orofacial somatosensory inputs in speech perceptual training modulates the vowel categorization in speech perception,
Front. Psychol., 13, doi: 10.3389/fpsyg.2022.839087.
[Abstract]
Orofacial somatosensory inputs may play a role in the link between speech perception and production.
Given the fact that speech motor learning, which involves paired auditory and somatosensory
inputs, results in changes to speech perceptual representations, somatosensory inputs may also
be involved in learning or adaptive processes of speech perception. Here we show that
repetitive pairing of somatosensory inputs and sounds, such as occurs during speech
production and motor learning, can also induce a change of speech perception. We examined
whether the category boundary between /ε/ and /a/ was changed as a result of perceptual
training with orofacial somatosensory inputs. The experiment consisted of three phases:
Baseline, Training, and Aftereffect. In all phases, a vowel identification test was used
to identify the perceptual boundary between /ε/ and /a/. In the Baseline and the Aftereffect
phase, an adaptive method based on the maximum-likelihood procedure was applied to detect
the category boundary using a small number of trials. In the Training phase, we used the
method of constant stimuli in order to expose participants to stimulus variants which
covered the range between /ε/ and /a/ evenly. In this phase, to mimic the sensory input
that accompanies speech production and learning in an experimental group, somatosensory
stimulation was applied in the upward direction when the stimulus sound was presented.
A control group followed the same training procedure in the absence of somatosensory
stimulation. When we compared category boundaries prior to and following paired
auditory-somatosensory training, the boundary for participants in the experimental
group reliably changed in the direction of /ε/, indicating that the participants
perceived /a/ more than /ε/ as a consequence of training. In contrast, the control
group did not show any change. Although a limited number of participants were tested,
the perceptual shift was reduced and almost eliminated one week later. Our data suggest
that repetitive exposure to somatosensory inputs, in a task that simulates the sensory
pairing which occurs during speech production, changes the perceptual system and supports
the idea that somatosensory inputs play a role in speech perceptual adaptation,
probably contributing to the formation of sound representations for speech perception.
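As an illustration of how a category boundary can be estimated by maximum likelihood from identification responses, a short sketch with simulated data follows; the logistic psychometric function, the continuum steps, and the optimizer are assumptions, not the exact adaptive procedure used in the study.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)

# Simulated identification data: stimulus steps along an /epsilon/-/a/ continuum and
# binary "/a/" responses (true boundary at step 5, slope 1.2; values are illustrative).
steps = np.repeat(np.arange(1, 10), 20).astype(float)
p_true = expit(1.2 * (steps - 5.0))
resp = rng.binomial(1, p_true)                          # 1 = "/a/" response

def neg_log_likelihood(params):
    boundary, slope = params
    p = expit(slope * (steps - boundary))
    p = np.clip(p, 1e-6, 1 - 1e-6)                      # avoid log(0)
    return -np.sum(resp * np.log(p) + (1 - resp) * np.log(1 - p))

fit = minimize(neg_log_likelihood, x0=[4.0, 1.0], method="Nelder-Mead")
print(f"estimated category boundary: step {fit.x[0]:.2f} (slope {fit.x[1]:.2f})")
```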
Endo N, Ito T, Watanabe K, Nakazawa K (2021)
Enhancement of loudness discrimination acuity for self-generated sound is independent of musical experience,
PLOS ONE, 16(12):e0260859, doi: 10.1371/journal.pone.0260859.
[Abstract]
Musicians tend to have better auditory and motor performance than non-musicians because of their extensive
musical experience. In a previous study, we established that loudness discrimination acuity is enhanced when
sound is produced by a precise force generation task. In this study, we compared the enhancement effect between
experienced pianists and non-musicians. Without the force generation task, loudness discrimination acuity was
better in pianists than in non-musicians. However, the force generation task enhanced loudness
discrimination acuity similarly in both pianists and non-musicians. The reaction time was also reduced with the
force control task, but only in the non-musician group. The results suggest that the enhancement of loudness
discrimination acuity with the precise force generation task is independent of musical experience and is,
therefore, a fundamental function in auditory-motor interaction.
Ito T, Ohashi H, Gracco VL (2021)
Somatosensory contribution to audio-visual speech processing, Cortex, 143:195-204,
doi: 10.1016/j.cortex.2021.07.013.
[Abstract]
Recent studies have demonstrated that the auditory speech perception of a listener can be modulated
by somatosensory input applied to the facial skin suggesting that perception is an embodied process.
However, speech perception is a multisensory process involving both the auditory and visual modalities.
It is unknown whether and to what extent somatosensory stimulation to the facial skin modulates audio-visual
speech perception. If speech perception is an embodied process, then somatosensory stimulation applied to the
perceiver should influence audio-visual speech processing. Using the McGurk effect (the perceptual illusion
that occurs when a sound is paired with the visual representation of a different sound, resulting in the
perception of a third sound) we tested the prediction using a simple behavioral paradigm and at the neural
level using event-related potentials (ERPs) and their cortical sources. We recorded ERPs from 64 scalp sites
in response to congruent and incongruent audio-visual speech randomly presented with and without somatosensory
stimulation associated with facial skin deformation. Subjects judged whether the production was /ba/ or not
under all stimulus conditions. In the congruent audio-visual condition, subjects identified the sound as /ba/,
but not in the incongruent condition, consistent with the McGurk effect. Concurrent somatosensory stimulation
improved participants' ability to correctly identify the production as /ba/ relative to the
non-somatosensory condition in both congruent and incongruent conditions. ERP in response to the somatosensory
stimulation for the incongruent condition reliably diverged 220 ms after stimulation onset. Cortical sources were
estimated around the left anterior temporal gyrus, the right middle temporal gyrus, the right posterior superior
temporal lobe and the right occipital region. The results demonstrate a clear multisensory convergence of
somatosensory and audio-visual processing in both behavioral and neural processing consistent with the perspective
that speech perception is a self-referenced, sensorimotor process.
Endo N, Ito T, Mochida T, Ijiri T, Watanabe K, Nakazawa K (2021)
Precise force controls enhance loudness discrimination of self-generated sound,
Exp Brain Res, 239:1141–1149, doi: 10.1007/s00221-020-05993-7.
[Abstract]
Motor executions alter sensory processes. Studies have shown that loudness perception changes when a sound
is generated by active movement. However, it is still unknown whether and how motor-related changes in
loudness perception depend on the task demands of motor execution. We examined whether different levels of
precision demand in motor control affect loudness perception. We carried out a loudness discrimination
test, in which the sound stimulus was produced in conjunction with the force generation task. We tested
three target force amplitude levels. The force target was presented on a monitor as a fixed visual target.
The generated force was also presented on the same monitor as a movement of the visual cursor. Participants
adjusted their force amplitude within a predetermined range without overshooting, using the visual target and the
moving cursor. In the control condition, the sound and visual stimuli were generated externally (without a
force generation task). We found that the discrimination performance was significantly improved when the
sound was produced by the force generation task compared to the control condition, in which the sound was
produced externally, although we did not find that this improvement in discrimination performance changed
depending on the different target force amplitude levels. The results suggest that the demand for precise
control to produce a fixed amount of force may be key to obtaining the facilitatory effect of motor execution
in auditory processes.
Ogane R, Selila L, Ito T (2020)
An experimental device for multi-directional somatosensory perturbation and its evaluation in a pilot psychophysical experiment,
J. Acoust. Soc. Am, 148:EL279, doi: 10.1121/10.0001942.
[Abstract]
Somatosensory stimulation associated with facial skin deformation has been developed and efficiently
applied in the study of speech production and speech perception. However, the technique is limited to
a simplified unidirectional pattern of stimulation, and cannot adapt to realistic stimulation patterns
related to multidimensional orofacial gestures. To overcome this issue, we developed a new multi-actuator
system that can synchronously deform the facial skin in multiple directions. The first prototype
involves stimulation in two directions and we demonstrate its efficiency through a temporal order
judgement test involving vertical and horizontal facial skin stretches at the sides of the mouth.
Ito T, Bai J, Ostry DJ (2020)
Contribution of sensory memory to speech motor learning,
J Neurophysiol, 124(4):1103-1109, doi: 10.1152/jn.00457.2020.
[Abstract]
Speech learning requires precise motor control but it likewise requires transient storage of information
to enable the adjustment of upcoming movements based on the success or failure of previous attempts.
The contribution of somatic sensory memory for limb position has been documented in work on arm movement.
In speech, however, the sensory support for production comes from both somatosensory and auditory inputs,
and accordingly sensory memory for sounds, somatic inputs, or both might contribute to
learning. In the present study, adaptation to altered auditory feedback was used as an experimental model
of speech motor learning. Participants also underwent tests of both auditory and somatic sensory memory.
We found that although auditory memory for speech sounds is better than somatic memory for speech-like
facial skin deformations, somatic sensory memory predicts adaptation, whereas auditory sensory memory does not.
Thus, even though speech relies substantially on auditory inputs and in the present manipulation adaptation
requires the minimization of auditory error, it is somatic inputs that provide the memory support for learning.
Ito T, Szabados A, Caillet JL, Perrier P (2020)
Quick compensatory mechanisms for tongue posture stabilization during speech production,
J Neurophysiol 123(6):2491-2503, doi: 10.1152/jn.00756.2019.
[Abstract]
The human tongue is atypical as a motor system since its movement is determined by deforming
its soft tissues via muscles that are in large part embedded in it (muscular hydrostats).
However, the neurophysiological mechanisms enabling fine tongue motor control are not well understood.
We investigated sensorimotor control mechanisms of the tongue through a perturbation experiment.
A mechanical perturbation was applied to the tongue during the articulation of three vowels (/i/, /e/, /ε/)
under conditions of voicing, whispering and posturing. Tongue movements were measured at three surface
locations in the sagittal plane using electromagnetic articulography. We found that the displacement
induced by the external force was quickly compensated for. Individual sensors did not return to their
original positions but went towards a position on the original tongue contour for that vowel.
The amplitude of compensatory response at each tongue site varied systematically according to
the articulatory condition. A mathematical simulation that included reflex mechanisms suggested
that the observed compensatory response can be attributed to a reflex mechanism, rather than passive
tissue properties. The results provide evidence for the existence of quick compensatory mechanisms
in the tongue that may be dependent on tunable reflexes. The tongue posture for vowels could be regulated
in relation to the shape of the tongue contour, rather than to specific positions for individual tissue points.
Ito T, Ohashi H, Gracco VL (2020)
Changes of orofacial somatosensory attenuation during speech production,
Neurosci Lett 730:135045, doi: 10.1016/j.neulet.2020.135045.
[Abstract]
Modulation of auditory activity occurs before and during voluntary speech movement.
However, it is unknown whether orofacial somatosensory input is modulated in the same manner.
The current study examined whether or not the somatosensory event-related potentials (ERPs)
in response to facial skin stretch are changed during speech and nonspeech production tasks.
Specifically, we compared ERP changes to somatosensory stimulation for different orofacial
postures and speech utterances. Participants produced three different vowel sounds (voicing)
or non-speech oral tasks in which participants maintained a similar posture without voicing.
ERPs were recorded from 64 scalp sites in response to the somatosensory stimulation under
six task conditions (three vowels × voicing/posture) and compared to a resting baseline condition.
The first negative peak for the vowel /u/ was reliably reduced from the baseline in both the voicing
and posturing tasks, but the other conditions did not differ. The second positive peak was reduced for
all voicing tasks compared to the posturing tasks. The results suggest that the sensitivity of somatosensory
ERP to facial skin deformation is modulated by the task and that somatosensory processing during speaking may
be modulated differently relative to phonetic identity.
Ogane R, Schwartz J-L, Ito T (2020)
Orofacial Somatosensory Inputs Modulate Word Segmentation in Lexical Decision,
Cognition, 197:104163, doi: 10.1016/j.cognition.2019.104163.
[Abstract]
There is accumulating evidence that articulatory/motor knowledge plays a role in phonetic processing,
such as the recent finding that orofacial somatosensory inputs may influence phoneme categorization.
We here show that somatosensory inputs also contribute at a higher level of the speech perception chain,
that is, in the context of word segmentation and lexical decision. We carried out an auditory identification
test using a set of French phrases consisting of a definite article “la” followed by a noun, which may be
segmented differently according to the placement of accents within the phrase. Somatosensory stimulation was
applied to the facial skin at various positions within the acoustic utterances corresponding to these phrases,
which had been recorded with neutral accent, that is, with all syllables given similar emphasis. We found that
lexical decisions reflecting word segmentation were significantly and systematically biased depending on the
timing of somatosensory stimulation. This bias was not induced when somatosensory stimulation was applied to
the skin other than on the face. These results provide evidence that the orofacial somatosensory system contributes
to lexical perception in situations that would be disambiguated by different articulatory movements, and suggest
that articulatory/motor knowledge might be involved in speech segmentation.
Trudeau-Fisette P, Ito T, Ménard L (2019)
Auditory and Somatosensory Interaction in Speech Perception in Children and Adults,
Front. Hum. Neurosci., 13:344, doi: 10.3389/fnhum.2019.00344.
[Abstract]
Multisensory integration allows us to link sensory cues from multiple sources and plays a crucial role
in speech development. However, it is not clear whether humans have an innate ability or whether repeated
sensory input while the brain is maturing leads to efficient integration of sensory information in speech.
We investigated the integration of auditory and somatosensory information in speech processing in a bimodal
perceptual task in 15 young adults (age 19 to 30) and 14 children (age 5 to 6). The participants were asked
to identify if the perceived target was the sound /e/ or /ø/. Half of the stimuli were presented under a
unimodal condition with only auditory input. The other stimuli were presented under a bimodal condition
with both auditory input and somatosensory input consisting of facial skin stretches provided by a robotic
device, which mimics the articulation of the vowel /e/. The results indicate that the effect of somatosensory
information on sound categorization was larger in adults than in children. This suggests that integration of
auditory and somatosensory information evolves throughout the course of development.
Ohashi H, Ito T (2019)
Recalibration of auditory perception of speech due to orofacial somatosensory inputs during speech motor adaptation,
J Neurophysiol, 122(5):2076-2084, doi: 10.1152/jn.00028.2019.
[Abstract]
Speech motor control and learning rely both on somatosensory and auditory inputs. Somatosensory inputs associated
with speech production can also affect the process of auditory perception of speech, and the
somatosensory-auditory interaction may play a fundamental role in auditory perception of speech.
Here, we show that the somatosensory system contributes to perceptual recalibration, separate from
its role in motor function. Subjects participated in speech motor adaptation to altered auditory
feedback. Auditory perception of speech was assessed in phonemic identification tests prior to and
following speech adaptation. To investigate a role of the somatosensory system in motor adaptation
and subsequent perceptual change, we applied orofacial skin stretch in either a backward or forward
direction during the auditory feedback alteration as a somatosensory modulation. We found that the
somatosensory modulation did not affect the amount of adaptation at the end of training although it
changed the rate of adaptation. However, the perception following speech adaptation was altered
depending on the direction of the somatosensory modulation. Somatosensory inflow rather than motor
outflow thus drives changes to auditory perception of speech following speech adaptation, suggesting
that somatosensory inputs play an important role in the tuning of the perceptual system.
van den Bunt MR, Groen MA, Ito T, Francisco AA, Gracco VL, Pugh KR, Verhoeven L (2017)
Increased Response to Altered Auditory Feedback in Dyslexia: A Weaker Sensorimotor Magnet Implied in the Phonological Deficit,
J Speech Lang Hear Res, 60(3):654-667, doi: 10.1044/2016_JSLHR-L-16-0201.
[Abstract]
Purpose: The purpose of this study was to examine whether developmental dyslexia (DD) is
characterized by deficiencies in speech sensory and motor feedforward and feedback mechanisms,
which are involved in the modulation of phonological representations.
Method: A total of 42 adult native speakers of Dutch (22 adults with DD; 20 participants who
were typically reading controls) were asked to produce /bep/ while the first formant (F1) of the /e/
was not altered (baseline), increased (ramp), held at maximal perturbation (hold), and not altered
again (after-effect). The F1 of the produced utterance was measured for each trial and used for statistical
analyses. The measured F1s produced during each phase were entered in a linear mixed-effects model.
Results: Participants with DD adapted more strongly during the ramp phase and returned to
baseline to a lesser extent when feedback was back to normal (after-effect phase) when compared with
the typically reading group. In this study, a faster deviation from baseline during the ramp phase,
a stronger adaptation response during the hold phase, and a slower return to baseline during the
after-effect phase were associated with poorer reading and phonological abilities.
Conclusion: The data of the current study are consistent with the notion that the phonological
deficit in DD is associated with a weaker sensorimotor magnet for phonological representations.
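A minimal sketch of the kind of linear mixed-effects analysis described in the Method is given below; the synthetic data, column names, effect sizes, and the use of statsmodels are assumptions for illustration only, not the paper's actual analysis code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Hypothetical per-trial data: produced F1 (Hz) by perturbation phase, group, and subject.
phases = ["baseline", "ramp", "hold", "aftereffect"]
rows = []
for group in ["DD", "control"]:
    for subj in range(10):
        base = 450 + rng.normal(0, 20)                      # subject-specific baseline F1
        for phase in phases:
            shift = {"baseline": 0, "ramp": -15, "hold": -30, "aftereffect": -10}[phase]
            if group == "DD":
                shift *= 1.4                                # stronger adaptation, as reported
            for _ in range(5):                              # 5 trials per phase (made up)
                rows.append({"subject": f"{group}{subj}", "group": group,
                             "phase": phase, "F1": base + shift + rng.normal(0, 8)})
df = pd.DataFrame(rows)

# Random intercept per participant; fixed effects of phase, group, and their interaction.
result = smf.mixedlm("F1 ~ phase * group", data=df, groups=df["subject"]).fit()
print(result.summary())
```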
Ito T, Coppola JH, Ostry DJ (2016)
Speech motor learning changes the neural response to both auditory and somatosensory signals,
Sci. Rep., 6:25926, doi: 10.1038/srep25926.
[Abstract]
In the present paper, we present evidence for the idea that speech motor learning is accompanied
by changes to the neural coding of both auditory and somatosensory stimuli. Participants in our
experiments undergo adaptation to altered auditory feedback, an experimental model of speech motor
learning which, like visuo-motor adaptation in limb movement, requires that participants change their
speech movements and associated somatosensory inputs to correct for systematic real-time changes to
auditory feedback. We measure the sensory effects of adaptation by examining changes to auditory and
somatosensory event-related responses. We find that adaptation results in progressive changes to speech
acoustical outputs that serve to correct for the perturbation. We also observe changes in both auditory
and somatosensory event-related responses that are correlated with the magnitude of adaptation.
These results indicate that sensory change occurs in conjunction with the processes involved in speech motor adaptation.
Ito T, Ostry DJ, Gracco VL (2015)
Somatosensory event-related potentials from orofacial skin stretch stimulation,
J Vis. Exp.(106), e53621, doi: 10.3791/53621.
[Abstract]
Cortical processing associated with orofacial somatosensory function in speech has received limited
experimental attention due to the difficulty of providing precise and controlled stimulation.
This article introduces a technique for recording somatosensory event-related potentials (ERP) that
uses a novel mechanical stimulation method involving skin deformation using a robotic device.
Controlled deformation of the facial skin is used to modulate kinesthetic inputs through excitation
of cutaneous mechanoreceptors. By combining somatosensory stimulation with electroencephalographic recording,
somatosensory evoked responses can be successfully measured at the level of the cortex. Somatosensory
stimulation can be combined with the stimulation of other sensory modalities to assess multisensory interactions.
For speech, orofacial stimulation is combined with speech sound stimulation to assess the contribution of
multisensory processing, including the effects of timing differences, and to examine the manner in which
the two sensory signals combine. The ability to precisely control orofacial somatosensory stimulation
during speech perception and speech production with ERP recording is an important tool providing new insights
into the neural organization and neural representations for speech.
Suemitsu A, Dang J, Ito T, Tiede M (2015)
A real-time articulatory visual feedback approach with target presentation for second language pronunciation learning,
J. Acoust. Soc. Am., 138:EL382, doi: 10.1121/1.4931827.
[Abstract]
Articulatory information can support learning or remediating pronunciation of a second language (L2).
This paper describes an electromagnetic articulometer-based visual-feedback approach using an articulatory
target presented in real-time to facilitate L2 pronunciation learning. This approach trains learners
to adjust articulatory positions to match targets for a novel L2 vowel estimated from productions of vowels
that overlap in both L1 and L2. Training that included visual feedback improved Japanese learners' pronunciation
of the target American English vowel, regardless of whether audio training was also included.
Articulatory visual feedback is shown to be an effective method for facilitating L2 pronunciation learning.
Ito T, Gracco VL, Ostry DJ (2014)
Temporal factors affecting somatosensory-auditory interactions in speech processing,
Front. Psychol. 5:1198. doi: 10.3389/fpsyg.2014.01198.
[Abstract]
Speech perception is known to rely on both auditory and visual information. However, sound-specific
somatosensory input has been shown also to influence speech perceptual processing (Ito et al., 2009).
In the present study, we addressed further the relationship between somatosensory information and
speech perceptual processing by addressing the hypothesis that the temporal relationship between
orofacial movement and sound processing contributes to somatosensory–auditory interaction in speech
perception. We examined the changes in event-related potentials (ERPs) in response to multisensory
synchronous (simultaneous) and asynchronous (90 ms lag and lead) somatosensory and auditory stimulation
compared to individual unisensory auditory and somatosensory stimulation alone. We used a robotic device
to apply facial skin somatosensory deformations that were similar in timing and duration to those
experienced in speech production. Following synchronous multisensory stimulation the amplitude of
the ERP was reliably different from the two unisensory potentials. More importantly, the magnitude of
the ERP difference varied as a function of the relative timing of the somatosensory–auditory stimulation.
Event-related activity change due to stimulus timing was seen between 160 and 220 ms following somatosensory
onset, mostly around the parietal area. The results demonstrate a dynamic modulation of somatosensory–auditory
convergence and suggest that the contribution of somatosensory information to speech processing is
dependent on the specific temporal order of sensory inputs in speech production.
Ito T, Johns AR, Ostry DJ (2013)
Left lateralized enhancement of orofacial somatosensory processing due to speech sounds,
J Speech Lang Hear Res., 56(6):S1875-1881, doi: 10.1044/1092-4388(2013/12-0226).
[Abstract]
[PDF]
Purpose: Somatosensory information associated with speech articulatory movements affects the perception
of speech sounds and vice versa, suggesting an intimate linkage between speech production and perception
systems. However, it is unclear which cortical processes are involved in the interaction between speech
sounds and orofacial somatosensory inputs. The authors examined whether speech sounds modify orofacial
somatosensory cortical potentials that were elicited using facial skin perturbations.
Method: Somatosensory event-related potentials in EEG were recorded in 3 background sound conditions
(pink noise, speech sounds, and nonspeech sounds) and also in a silent condition. Facial skin deformations
that are similar in timing and duration to those experienced in speech production were used for somatosensory
stimulation.
Results: The authors found that speech sounds reliably enhanced the first negative peak of the
somatosensory event-related potential when compared with the other 3 sound conditions. The enhancement
was evident at electrode locations above the left motor and premotor area of the orofacial system.
The result indicates that speech sounds interact with somatosensory cortical processes that are produced
by speech-production-like patterns of facial skin stretch.
Conclusion: Neural circuits in the left hemisphere, presumably in left motor and premotor cortex,
may play a prominent role in the interaction between auditory inputs and speech-relevant somatosensory
processing.
Ito T, Ostry DJ (2012)
Speech sounds alter facial skin sensation.
J Neurophysiol, 107(1):442-7, doi: 10.1152/jn.00029.2011.
[Abstract]
[PDF]
Interactions between auditory and somatosensory information are relevant to the neural processing
of speech since speech processes, and certainly speech production, involve both auditory information
and inputs that arise from the muscles and tissues of the vocal tract. We previously demonstrated that
somatosensory inputs associated with facial skin deformation alter the perceptual processing of speech
sounds. We show here that the reverse is also true, that speech sounds alter the perception of facial
somatosensory inputs. As a somatosensory task, we used a robotic device to create patterns of facial
skin deformation that would normally accompany speech production. We found that the perception of the
facial skin deformation was altered by speech sounds in a manner that reflects the way in which auditory
and somatosensory effects are linked in speech production. The modulation of orofacial somatosensory
processing by auditory inputs was specific to speech and likewise to facial skin deformation.
Somatosensory judgments were not affected when the skin deformation was delivered to the forearm or
palm or when the facial skin deformation accompanied nonspeech sounds. The perceptual modulation that
we observed in conjunction with speech sounds shows that speech sounds specifically affect neural
processing in the facial somatosensory system and suggests the involvement of the somatosensory system
in both the production and perceptual processing of speech.
Ito T, Ostry DJ (2010)
Somatosensory contribution to motor learning due to facial skin deformation.
J Neurophysiol, 104(3):1230-8, doi: 10.1152/jn.00199.2010.
[Abstract]
[PDF]
Motor learning is dependent on kinesthetic information that is obtained both from cutaneous afferents
and from muscle receptors. In human arm movement, information from these two kinds of afferents is
largely correlated. The facial skin offers a unique situation in which there are plentiful cutaneous
afferents and essentially no muscle receptors and, accordingly, experimental manipulations involving
the facial skin may be used to assess the possible role of cutaneous afferents in motor learning.
We focus here on the information for motor learning provided by the deformation of the facial skin
and the motion of the lips in the context of speech. We used a robotic device to slightly stretch the
facial skin lateral to the side of the mouth in the period immediately preceding movement. We found
that facial skin stretch increased lip protrusion in a progressive manner over the course of a series
of training trials. The learning was manifest in a changed pattern of lip movement, when measured after
learning in the absence of load. The newly acquired motor plan generalized partially to another speech
task that involved a lip movement of different amplitude. Control tests indicated that the primary source
of the observed adaptation was sensory input from cutaneous afferents. The progressive increase in lip
protrusion over the course of training fits with the basic idea that change in sensory input is attributed
to motor performance error. Sensory input, which in the present study precedes the target movement, is
credited to the target-related motion, even though the skin stretch is released prior to movement initiation.
This supports the idea that the nervous system generates motor commands on the assumption that sensory
input and kinematic error are in register.
Ito T, Tiede M, Ostry DJ (2009)
Somatosensory function in speech perception.
Proc Natl Acad Sci U S A 106:1245-1248, doi: 10.1073/pnas.0810063106.
[Abstract]
[PDF]
Somatosensory signals from the facial skin and muscles of the vocal tract provide a rich source of
sensory input in speech production. We show here that the somatosensory system is also involved in
the perception of speech. We use a robotic device to create patterns of facial skin deformation that
would normally accompany speech production. We find that when we stretch the facial skin while people
listen to words, it alters the sounds they hear. The systematic perceptual variation we observe in
conjunction with speech-like patterns of skin stretch indicates that somatosensory inputs affect
the neural processing of speech sounds and shows the involvement of the somatosensory system in the
perceptual processing of speech.
Ito T, Gomi H (2007).
Cutaneous mechanoreceptors contribute to the generation of a cortical reflex in speech.
Neuroreport 18(9): 907-910, doi: 10.1097/WNR.0b013e32810f2dfb.
[Abstract]
[PDF]
Owing to the lack of muscle spindles and tendon organs in the perioral system, cutaneous receptors
may contribute to speech sensorimotor processes. We have investigated this possibility in the context
of upper lip reflexes, which we have induced by unexpectedly stretching the facial skin lateral to the
oral angle. Skin stretch at this location resulted in long latency reflex responses that were similar
to the cortical reflexes observed previously. This location reliably elicited the reflex response, whereas
the skin above the oral angle and the skin on the cheek did not. The data suggest that cutaneous
mechanoreceptors are narrowly tuned to deformation of the facial skin and provide kinesthetic information
for rapid sensorimotor processing in speech.
Ito T, Kimura T, and Gomi H (2005).
The motor cortex is involved in reflexive compensatory adjustment of speech articulation.
Neuroreport 16(16): 1791-4, doi: 10.1097/01.wnr.0000185956.58099.f4.
[Abstract]
[PDF]
Although speech articulation relies heavily on sensorimotor processing, little is known about its
brain control mechanisms. Here, we investigate, using transcranial magnetic stimulation, whether the
motor cortex contributes to the generation of quick sensorimotor responses involved in speech motor
coordination. By applying a jaw-lowering perturbation, we induced a reflexive compensatory upper-lip
response, which assists in maintaining the intact labial aperture in the production of bilabial fricative
consonants. This reflex response was significantly facilitated by subthreshold transcranial magnetic
stimulation over the motor cortex, whereas a simple perioral reflex that is mediated only within the
brainstem was not. This suggests that the motor cortex is involved in generating this functional
reflexive articulatory compensation.
Ito T, Gomi H, Honda M (2004).
Dynamical simulation of speech cooperative articulation by muscle linkages.
Biol Cybern 91: 275-282, doi: 10.1007/s00422-004-0510-6.
[Abstract]
[PDF]
Different kinds of articulators, such as the upper and lower lips, jaw, and tongue, are precisely coordinated
in speech production. Based on a perturbation study of the production of a fricative consonant using the upper
and lower lips, it has been suggested that increasing the stiffness in the muscle linkage between the upper
lip and jaw is beneficial for maintaining the constriction area between the lips (Gomi et al. 2002).
This hypothesis is crucial for examining the mechanism of speech motor control, that is, whether mechanical
impedance is controlled for the speech motor coordination. To test this hypothesis, in the current study we
performed a dynamical simulation of lip compensatory movements based on a muscle linkage model and then evaluated
the performance of compensatory movements. The temporal pattern of stiffness of muscle linkage was obtained
from the electromyogram (EMG) of the orbicularis oris superior (OOS) muscle by using the temporal
transformation (second-order dynamics with time delay) from EMG to stiffness, whose parameters were experimentally
determined. The dynamical simulation using stiffness estimated from empirical EMG successfully reproduced the
temporal profile of the upper lip compensatory articulations. Moreover, the estimated stiffness variation
contributed significantly to reproducing a functional modulation of the compensatory response. This result supports
the idea that mechanical impedance contributes substantially to organizing coordination among the lips and jaw.
The motor command would be programmed not only to generate movement in each articulator but also to regulate
mechanical impedance among articulators for robust coordination of speech motor control.
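The exact EMG-to-stiffness transformation is not reproduced in the abstract; a generic second-order dynamics with time delay of the kind described can be written as follows, with placeholder symbols rather than values from the paper.

```latex
% Second-order dynamics with time delay mapping rectified EMG e(t) to stiffness k(t):
\ddot{k}(t) + 2\zeta\omega_n\,\dot{k}(t) + \omega_n^{2}\,k(t) = G\,\omega_n^{2}\,e(t-\tau)
\qquad\Longleftrightarrow\qquad
K(s) = \frac{G\,\omega_n^{2}\,e^{-s\tau}}{s^{2} + 2\zeta\omega_n s + \omega_n^{2}}\,E(s)
```

Here ω_n is the natural frequency, ζ the damping ratio, τ the time delay, and G a static gain, the parameters determined experimentally in the companion identification study.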
Ito T, Murano EZ, Gomi H (2004).
Fast force generation dynamics of human articulatory muscles.
J Appl Physiol 96: 2318-2324, doi: 10.1152/japplphysiol.01048.2003.
[Abstract]
[PDF]
[Commentary]
To explore the mechanisms of speech articulation, which is one of the most sophisticated human motor skills
controlled by the central nervous system, we investigated the force-generation dynamics of the human speech articulator
muscles [orbicularis oris superior (OOS) and inferior (OOI) muscles of the lips]. Short-pulse electrical stimulation (300 μs)
with approximately three or four times the sensation threshold intensity of each subject induced the muscle response.
The responses of these muscles were modeled as second-order dynamics with a time delay (TD), and the model parameters
[natural frequency (NF), damping ratio (DR), and TD] were identified with a nonlinear least mean squares method.
The OOS (NF: 6.1 Hz, DR: 0.71, TD: 14.5 ms) and OOI (NF: 6.1 Hz, DR: 0.68, TD: 15.6 ms) showed roughly similar characteristics
in eight subjects. The dynamics in the tongue (generated by combined muscles) also showed similar characteristics
(NF: 6.1 Hz, DR: 0.68, TD: 17.4 ms) in two subjects. The NF was higher, and the DR was lower than results measured for
arm muscles (NF: 4.25 Hz, DR: 1.05, TD: 23.8 ms for triceps long head), indicating that articulatory organs adapt for
more rapid movement. In contrast, slower response dynamics were estimated when muscle force data from a voluntary contraction
task were used for force-generation dynamics modeling. We discuss methodological problems in estimating muscle dynamics
when different kinds of muscle contraction methods are used.
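The identification step can be illustrated with a short sketch: fit a delayed second-order impulse response to a twitch-force trace by nonlinear least squares. The response form, the synthetic data, and the use of scipy's curve_fit are assumptions; the paper's exact fitting procedure may differ.

```python
import numpy as np
from scipy.optimize import curve_fit

def twitch_response(t, amp, nf_hz, zeta, delay):
    """Impulse response of an underdamped second-order system with time delay."""
    wn = 2.0 * np.pi * nf_hz                            # natural frequency (rad/s)
    wd = wn * np.sqrt(1.0 - zeta ** 2)                  # damped frequency
    ts = np.clip(t - delay, 0.0, None)                  # response is zero before the delay
    return amp * np.exp(-zeta * wn * ts) * np.sin(wd * ts)

# Synthetic "measured" twitch force, roughly matching the reported OOS parameters
# (NF 6.1 Hz, DR 0.71, TD 14.5 ms); amplitude and noise level are made up for illustration.
rng = np.random.default_rng(2)
t = np.arange(0.0, 0.5, 0.001)
force = twitch_response(t, 1.0, 6.1, 0.71, 0.0145) + rng.normal(0.0, 0.01, t.size)

popt, _ = curve_fit(twitch_response, t, force, p0=[0.5, 5.0, 0.5, 0.01],
                    bounds=([0.0, 1.0, 0.05, 0.0], [10.0, 20.0, 0.99, 0.1]))
amp, nf, dr, td = popt
print(f"identified NF = {nf:.2f} Hz, DR = {dr:.2f}, TD = {td*1000:.1f} ms")
```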
Ito T, Gomi H, Honda M (2003).
Articulatory coordination by muscle-linkage during bilabial utterances.
Acoust. Sci. & Tech. Acoustical letter 24(6): 391-393, doi: 10.1250/ast.24.391.
[Abstract]
[PDF]
Previous studies showed that the upper lip compensatory movement against the downward perturbation to the jaw is effective
in maintaining labial constriction or in attaining labial contact for the production of bilabial utterances. We measured
the stiffness of the muscle-linkage between the upper lip and jaw and proposed a model in which the muscle-linkage plays
a key role in coordinating articulators during speech. To further examine the compensatory mechanism using muscle-linkage,
we here investigate the coordination change of the lip-jaw system under the upward perturbation condition, and examine
the directional difference between the upward and downward perturbation. Based on the experimental observations, we propose
an extended lip-jaw model, which could explain the directional difference.
Ito T, Gomi H, Honda M (2003).
Articulatory coordination using mechanical linkage between upper lip and jaw examined by jaw perturbation.
IEICE Trans. Inf. & Syst., Pt. 2, J86-D-II(2): 333-341 (in Japanese).
[Abstract]
In this paper, we examine the coordinated action of the upper lip and jaw that exploits mechanical properties,
through experiments in which jaw perturbations are applied during the production of labial sounds.
During production of the labial fricative /φ/, a compensatory response that maintained the constriction between
the lips was observed: the upper lip lowered when a perturbation opened the jaw, and rose when a perturbation was
applied in the jaw-closing direction. When the stiffness between the upper lip and jaw was estimated from the
measured movement changes under perturbation, using an upper lip-jaw model that takes mechanical properties into
account, the stiffness varied according to the speech task. Moreover, during /φ/ production the stiffness differed
with perturbation direction, being higher when the perturbation was applied in the jaw-closing direction.
Based on these estimates, we discuss the compensatory mechanism using a lip-jaw model that includes the lower lip.
These results suggest that coordinated compensatory action of the upper lip and jaw exploiting muscle stiffness is
important in the control of speech movements.
Gomi H, Ito T, Murano EZ, Honda M (2002).
Compensatory articulation during bilabial fricative production by regulating muscle stiffness.
J Phonetics 30(3): 261-279, doi: 10.1006/jpho.2002.0173.
[Abstract]
[PDF]
The cooperative mechanisms in articulatory movements were examined by using mechanical perturbations during bilabial phonemic tasks.
The first experiment compares the differences in compensatory responses during sustained productions of the bilabial fricative /φ/
for which lip constriction is required, and /a/, for which the lips and jaw are relatively relaxed.
In the second experiment, we perturbed jaw movement with different load onsets in the sentence “kono /aφaφa/ mitai”.
In both experiments, labial distances were recovered partly or fully by the downward shifts of the upper lip.
The upper lip response frequently preceded the EMG response observed in the sustained task. Additionally, initial downward
displacement of the upper lip was frequently larger when the load was supplied during /φ/ than when it was supplied during
/a/ in the sustained and sentence tasks, respectively. The stiffness variation estimated by using a muscle linkage model indicates
that the stiffness increases for the bilabial phonemic task in order to robustly configure a labial constriction.
The results suggest that the change in passive stiffness regulated by the muscle activation level is important in generating
quick cooperative articulation.
Studies in Automatic Control
Fukao Y, Nonami K, Ohnuki O, Ito T, Fujimoto T, Naruke SA (2000).
Sensorless positioning control of two link robot arm by means of parameter identification.
Transactions of the Japan Society of Mechanical Engineers (C) 66(648): 190-198, doi: 10.1299/kikaic.66.2660 (in Japanese).
[Abstract]
[PDF]
In this study, we have extended the previous sensorless positioning control of a single-link robot arm to a two-link robot arm system.
Sliding mode control is useful for multi-link robot arms, which contain nonlinear characteristics in the drive system.
In the sensorless control, the variation of coil resistance is treated as an uncertain parameter of the actuator.
This uncertain parameter is estimated using a parameter identifier. Moreover, it has been verified that the proposed control
scheme is very useful for sensorless control systems with unknown parameters. We demonstrated its usefulness through experiments.
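The abstract does not state which identification algorithm was used; purely as an illustration of on-line identification of an uncertain actuator parameter such as coil resistance, a generic recursive least-squares sketch follows (the voltage model and all numerical values are assumptions, not details from the paper).

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic DC-motor-like data: v = R*i + Ke*omega + noise (a simplified assumed model).
R_true, Ke_true = 2.5, 0.05
n = 500
i_meas = 1.0 + 0.5 * np.sin(0.05 * np.arange(n))        # measured current (A)
omega = 50.0 + 10.0 * np.cos(0.02 * np.arange(n))       # measured speed (rad/s)
v_meas = R_true * i_meas + Ke_true * omega + rng.normal(0.0, 0.02, n)

# Recursive least squares with forgetting factor lam.
theta = np.zeros(2)                  # estimates of [R, Ke]
P = np.eye(2) * 1e3                  # covariance of the estimate
lam = 0.995
for k in range(n):
    phi = np.array([i_meas[k], omega[k]])               # regressor
    err = v_meas[k] - phi @ theta                       # prediction error
    gain = P @ phi / (lam + phi @ P @ phi)              # RLS gain
    theta = theta + gain * err                          # parameter update
    P = (P - np.outer(gain, phi) @ P) / lam             # covariance update

print(f"estimated R = {theta[0]:.3f} ohm (true {R_true}), Ke = {theta[1]:.4f} (true {Ke_true})")
```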
Ito T, Nonami K (1999).
Application of sliding mode control with robust hyperplane to flexible structure.
Transactions of the Japan Society of Mechanical Engineers (C) 65(629): 161-166
doi: 10.1299/kikaic.65.161 (in Japanese).
[Abstract]
[PDF]
Sliding mode control is a nonlinear robust control method and a form of variable structure control.
It performs well against uncertainties that satisfy the matching condition. However, a conventional
sliding mode control system often becomes unstable due to high-frequency vibration caused by unmodeled
dynamics. In this paper, we consider the application of frequency-shaped sliding mode control (FSSMC) to
a flexible structure. A flexible structure has uncertainties on the control input side, e.g., time delay,
friction, and so on. These uncertainties make the structure difficult to control when a linear controller
is applied. By applying sliding mode control, we can design a system that is robust against these
uncertainties and dynamics because the matching condition is satisfied. Moreover, the closed-loop system
does not become unstable due to unmodeled dynamics because the robust hyperplane is designed
using H∞ theory.
As an example application, we applied the method to a flexible truss structure with a control moment
gyro (CMG). We have verified from simulations and experiments that this method has good performance.
Ito T, Nonami K (1997).
Sliding mode control with frequency shaping to suppress spillover.
Transactions of the Japan Society of Mechanical Engineers (C) 63(611): 120-126,
doi: 10.1299/kikaic.63.2308 (in Japanese).
[Abstract]
[PDF]
This paper deals with a new design method for sliding mode control with frequency shaping.
The concept is suppression of spillover phenomena for a flexible system. A conventional hyperplane
consists of a desired reference model without dynamics. Therefore, the sliding mode control system
often becomes unstable due to chattering and spillover phenomena in the high-frequency region.
This method aims to suppress the control input in the high-frequency region by filtering the control input
or measured output using a low-pass filter. The new sliding mode control system is designed as an
augmented system which consists of a reduced-order model and a low-pass filter. We have applied
this method to a four-story flexible structure. We have verified from simulations and experiments
that this method has good performance and is very useful for suppressing spillover phenomena.
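The design is easier to see in equations. The sketch below states a conventional sliding surface and control law, and the frequency-shaped augmentation, in generic form; the notation is mine rather than the paper's.

```latex
% Conventional sliding mode control for the reduced-order model \dot{x} = Ax + Bu:
\sigma = S\,x, \qquad u = -(SB)^{-1}\bigl(S A x + k\,\operatorname{sgn}(\sigma)\bigr)
% Frequency-shaped design: append a low-pass filter state x_f driven by the control
% input (or the measured output) and design the hyperplane for the augmented state:
\dot{x}_f = A_f x_f + B_f u, \qquad
\sigma = S_a \begin{bmatrix} x \\ x_f \end{bmatrix}
```

Penalizing the filtered component through the augmented hyperplane is what suppresses the control input in the high-frequency region and hence the spillover into the unmodeled flexible modes.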
Ito T, Nonami K, Iwamoto K, Nishimura H (1996).
Active vibration control of flexible structure by frequency-shaped sliding mode control with μ synthesis theory.
Transactions of the Japan Society of Mechanical Engineers (C) 62(602): 112-119,
doi: 10.1299/kikaic.62.3850 (in Japanese).
[Abstract]
[PDF]
This paper proposes a new design method for a sliding mode control system using μ synthesis theory.
This concept is based on a frequency-shaped approach. A general hyperplane consists of a reference model
without dynamics. Therefore, a conventional sliding mode control system often becomes unstable due to
spillover phenomena of high frequency caused by high-speed switching. The proposed design method suppresses
such spillover phenomena because of frequency shaping. In addition, the hyperplane has good robustness,
realizing a minimum of the H∞ norm and of the structured singular value from noise to the state variables.
We applied this new method to a four-story flexible test-rig structure resembling a high-rise building.
We have obtained good performance from simulations and experiments.
Nonami K, Ito T (1996).
μ Synthesis of flexible rotor-magnetic bearing systems.
IEEE Transactions on Control Systems Technology 4(5): 503-512, doi: 10.1109/87.531917.
[Abstract]
[PDF]
The μ synthesis design method was evaluated for a flexible rotor magnetic bearing system with a
five-axis-control system using both simulations and experiments. After modeling the full-order system
using the finite element method, we obtained a reduced-order model in the modal domain by truncating
the flexible modes. After choosing appropriate weighting functions with respect to frequency, we designed
the μ-synthesis control system using the μ-toolbox in MATLAB. We then carried out simulations of
the control system for a flexible rotor magnetic bearing system with a five-axis-control system and
obtained good performance. Next, we conducted experiments to verify the robustness of the controllers
on a test rig during initial levitation. The controllers provided robust stability and performance over
a wide range of parameter variations in the test rig.
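For reference, the structured singular value underlying the design can be stated compactly; the following is the standard textbook definition rather than text from the paper.

```latex
\mu_{\Delta}(M) \;=\; \Bigl(\min\bigl\{\bar{\sigma}(\Delta) \;:\; \Delta \in \boldsymbol{\Delta},\ \det(I - M\Delta) = 0\bigr\}\Bigr)^{-1},
\qquad \mu_{\Delta}(M) = 0 \ \text{if no such } \Delta \text{ exists.}
```

Robust performance of the closed loop M = F_l(P, K) against the structured uncertainty set holds when sup_ω μ_Δ(M(jω)) < 1; D-K iteration alternates an H∞ synthesis step for the controller K with a fit of frequency-dependent scalings D(ω) that tighten the upper bound σ̄(D M D⁻¹) on μ.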
Ito T, Nonami K (1995).
μ Synthesis of flexible rotor-magnetic bearing systems.
Transactions of the Japan Society of Mechanical Engineers (C) 61(584): 173-178. (in Japanese).
[Abstract]
[PDF]
The H∞ control theory, which is the most powerful method for robust control
theory to date, is applied to the flexible rotor-magnetic bearing system.
However, the H∞ control system has the disadvantage of being conservative and
cannot handle robust performance. This is due to its reliance on the maximum singular value.
Doyle proposed the structured singular value instead of the maximum singular value; the resulting
μ synthesis theory handles robust performance using D-K iteration.
This paper is concerned with the μ control of a flexible rotor-magnetic bearing system (FR-MBS).
Plant dynamics, consisting of actuator dynamics and flexible rotor dynamics, are described.
The μ controller for the reduced-order model is designed by D-K iteration, and its robust
performance is evaluated in several experiments. The relationship between μ control and
H∞ control, and their robust performance, is discussed for the flexible
rotor-magnetic bearing system.