Synthesis of the laryngeal source of throat singing
using a 2×2-mass model
Ken-Ichi Sakakibara, Hiroshi Imagawa, Seiji Niimi,
Naotoshi Osaka
Abstract
Singing voices have various
timbres. Throat singing and some other Asian traditional singing voices have a
pressed timbre that is significantly different from the European classical
singing voice. In our previous study on throat singing, vibration of the
false vocal folds as well as of the vocal folds was observed and was
found to be essential to the pressed timbre. This paper describes a
2×2-mass model as a physical model, defines an adduction parameterization of
its parameters, and presents a simulation of vocal fold and false vocal fold
vibrations in the larynx. Furthermore, a visual simulator of the laryngeal
movements is demonstrated. By using this model, the vibration patterns of the
two different laryngeal voices in throat singing (the squeezed and kargyraa
voices) and the normal pressed voice have been simulated. The results show
the possibility of synthesizing various timbres for singing.
1 Introduction
The singing voice has
numerous variations of timbre. There are considerable differences, for
instance, between European classical singing voices, such as bel canto and
German lied, and Asian traditional pressed singing voices, such as throat
singing, Japanese Youkyoku, and Korean Pansori.
The laryngeal source is an
essential factor in determining the timbre of the singing voice, especially for
pressed quality. In general, the pressed quality is obtained by excessive
adduction of the supraglottal structure. The laryngeal adjustments in Asian
traditional pressed singing differ considerably from those in European
classical singing [5, 6, 9].
Synthesizing
such varying timbres in singing voices requires a flexible laryngeal source
model. A glottal waveform model allows us to control its parameters to
approximate the perception of voice [8, 10]. On the other hand, a physical
model allows us to control its parameters according to the physical and
physiological mechanism of laryngeal adjustment. Based on the physiological
observations, we have constructed a 2×2-mass model as a physical model, which
is devised by attaching a two-mass model for the false vocal folds to the
ordinary two-mass model for the vocal folds [3, 10].
In this paper, after
summarizing the physiological observations in throat singing, we describe the
mechanism of a 2×2-mass model and its adduction parameterization. We also
present a visual simulation tool for the model. Finally, using the model, we
simulate the laryngeal sources of throat singing and the normal pressed voice.
2 Laryngeal Source in throat singing
2.1 Throat singing
Throat singing is a
traditional singing style of people who live around the
The
production of the highly pitched overtone is mainly due to the pipe resonance
of the cavity from the larynx to the point of articulation in the vocal tract
[1]. On the other hand, the laryngeal voice of throat singing has a special
pressed timbre and supports the generation of the overtone.
The laryngeal voices of
throat singing can be classified as squeezed and kargyraa based on the
listener’s impression, acoustical characteristics, and the singer’s personal
observation of voice production. The squeezed voice is the basic laryngeal
voice in throat singing and is used as a drone. The kargyraa voice is a very
low-pitched voice that lies outside the modal register.
2.2 False vocal folds
The false vocal folds (ventricular folds) are a pair of soft and
flaccid folds which attach to the anterolateral surface of the arytenoid
cartilages (Fig. 1). While the vocal folds (VFs) have a mechanism that changes
their stiffness, thickness, and length by means of the muscles (mainly by the
action of the thyroarytenoid muscle), the false vocal folds (FVFs) are
incapable of becoming tense, since they contain very few muscle fibres. The
FVFs are capable of moving with the arytenoid cartilages. They are also
abducted and adducted by the action of certain laryngeal muscles. In normal
phonation, they do not vibrate [11].
2.3 Physiological observation of laryngeal movements
Here, we summarize the
results of the physiological observation of laryngeal movements using
simultaneous recording of high-speed digital images, EGG, and sound waveforms
in [9, 10].
The common features of the
squeezed and kargyraa voices are an overall constriction of the suprastructures
of the glottis and vibration of the FVFs. The differences lie in the narrowness
of the constriction and the manner of FVF vibration. In the squeezed voice, the
FVFs vibrate at the same frequency as the VFs but in opposite phase. In the
kargyraa voice, the FVFs can be assumed to close once for every two periods of
closure of the VFs, and thus contribute to the generation of the subharmonic
tone of kargyraa [2, 6, 7, 9, 10].
3 Physical model
3.1 Two-mass model
The VF vibrations are modelled with the two-mass model [4], which makes
it possible to simulate the movements of the upper and lower portions of the
VFs in different phases. The model parameters are defined as follows. m1, m2:
paired masses of the upper and lower portions of the VF; d1, d2: thicknesses;
k1, k2: stiffnesses; r1, r2: viscous resistances; ζ1, ζ2: damping ratios,
which satisfy ri = 2ζi√(mi ki) (i = 1, 2); kc: stiffness of the linear
coupling spring for the upper and lower portions; lg: the length of the
glottis; Ag1, Ag2: the cross-sectional areas between the masses; Ag01, Ag02:
the cross-sectional areas between the masses at rest.
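As a small worked example of how a viscous resistance follows from a damping ratio, the sketch below evaluates r = 2ζ√(mk) in CGS units. The function name and the numerical values are illustrative assumptions chosen for easy arithmetic; they are not the constants used in this paper.

```python
import math


def viscous_resistance(zeta: float, mass: float, stiffness: float) -> float:
    """Return r = 2 * zeta * sqrt(m * k).

    Units are CGS: mass in g, stiffness in dyn/cm, result in dyn*s/cm.
    """
    return 2.0 * zeta * math.sqrt(mass * stiffness)


# Illustrative values only (assumptions, not taken from this paper).
r1 = viscous_resistance(zeta=0.1, mass=0.125, stiffness=80000.0)
r2 = viscous_resistance(zeta=0.6, mass=0.025, stiffness=8000.0)
print(f"r1 = {r1:.3f} dyn*s/cm, r2 = {r2:.3f} dyn*s/cm")
```

A larger damping ratio on one mass gives a more heavily damped motion of that portion of the fold, independently of how heavy or stiff it is.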
A tension parameter Q, which
controls the pitch of the synthesized sound, parameterizes several model
parameters related to the physical properties of the VF as follows:
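The defining equation is not reproduced in this copy of the text. Purely as a hedged illustration, the sketch below uses the scaling commonly associated with the two-mass model of [4], in which masses and thicknesses are divided by Q and stiffnesses are multiplied by Q. Whether this matches the authors' exact formula is an assumption, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MassParams:
    """Rest-state parameters of one mass (CGS units)."""
    mass: float       # g
    stiffness: float  # dyn/cm
    thickness: float  # cm


def apply_tension(rest: MassParams, q: float) -> MassParams:
    """Scale rest parameters by a tension parameter q (assumed form).

    Assumption: m = m0 / q, k = k0 * q, d = d0 / q.
    """
    if q <= 0.0:
        raise ValueError("tension parameter must be positive")
    return replace(
        rest,
        mass=rest.mass / q,
        stiffness=rest.stiffness * q,
        thickness=rest.thickness / q,
    )


# Illustrative rest values (assumptions, not the paper's constants).
rest = MassParams(mass=0.125, stiffness=80000.0, thickness=0.25)
tense = apply_tension(rest, q=2.0)
print(tense)
```

Under this assumed scaling the natural frequency √(k/m) of each mass grows in proportion to Q, which is why a single parameter can act as a pitch control.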

3.2 2×2-mass model
For a physical simulation of
the VF and FVF vibrations, we have proposed a 2×2-mass model as a
self-oscillating model of VF and FVF vibrations [10]. The model (Fig. 2) was devised
by attaching a two-mass model for FVFs to the ordinary two-mass model for VFs
with a laryngeal ventricle space between the models.
The laryngeal ventricle is
assumed to be a cylinder that does not deform. The mechanical transmission of
vibrations between the VFs and FVFs was not considered. The area function of
the vocal tract, which interacts acoustically with the VF vibration, varies in
time because of the FVF vibrations.
Control parameters for
We adopt a two-mass model
instead of a one-mass model for the FVFs because the FVFs are as thick as the
VFs, and a two-mass model reproduces the same movement as a one-mass model if
kc is set sufficiently large.
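The claim about large kc can be checked on a simplified system. The sketch below treats two undamped masses, each on its own spring and joined by a coupling spring kc, and computes the lowest natural frequency in closed form. It ignores airflow, collision, and damping, so it is only a linear small-oscillation illustration; the function names and numbers are assumptions.

```python
import math


def lowest_mode_hz(m1, m2, k1, k2, kc):
    """Lowest natural frequency (Hz) of two spring-mounted masses coupled by kc."""
    # Eigenvalues of [[(k1+kc)/m1, -kc/m1], [-kc/m2, (k2+kc)/m2]] are omega^2.
    trace = (k1 + kc) / m1 + (k2 + kc) / m2
    det = (k1 * k2 + kc * (k1 + k2)) / (m1 * m2)
    root = math.sqrt(trace * trace - 4.0 * det)
    omega_sq = 2.0 * det / (trace + root)  # smaller root, avoids cancellation
    return math.sqrt(omega_sq) / (2.0 * math.pi)


def one_mass_hz(m1, m2, k1, k2):
    """Natural frequency (Hz) when both masses move rigidly together."""
    return math.sqrt((k1 + k2) / (m1 + m2)) / (2.0 * math.pi)


# Illustrative CGS values (assumptions, not the paper's FVF constants).
m1, m2, k1, k2 = 0.125, 0.025, 80000.0, 8000.0
f_rigid = one_mass_hz(m1, m2, k1, k2)
for kc in (2.5e4, 1.0e6, 1.0e9):
    print(f"kc = {kc:.1e}: {lowest_mode_hz(m1, m2, k1, k2, kc):.2f} Hz "
          f"(rigid limit {f_rigid:.2f} Hz)")
```

As kc grows, the lowest mode approaches the rigid one-mass frequency from below, which is consistent with the two masses moving as a single body.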
3.3 Adduction parameter for the false vocal folds
As stated above, the FVFs
contain few muscle fibres and, unlike the VFs, their physical properties
essentially do not change. Therefore, it is meaningless to define a tension
parameter for FVFs. Hence some other parameterization is necessary.
It
is a physiological fact that the FVFs are adducted by the action of certain
laryngeal muscles, but it is unclear whether their physiological properties,
such as mass and stiffness, are changed by the adduction. We take into
account the changing shape of the FVFs and, as one possible parameterization
of the model parameters, introduce an adduction parameter Q’ for
To confirm the validity of this
parameterization, we must wait for detailed measurements of the physical
properties of the FVFs using fresh excised human larynges.
4 Visual Environment

A visual simulation tool called VibLaVie (vibrated larynx viewer) is
implemented on a Windows PC. Fig. 3
shows its main panel.
The default initial values
are given, but users can set arbitrary initial parameters using the initial
parameter setting panel. After setting the initial parameters, users can also
set piecewise-linear envelopes that describe how the parameters vary over
time. Fig. 4 shows the displacements of the masses, the laryngeal airflow,
and a synthesized mouth-output sound obtained by convolving the laryngeal
airflow with a vocal tract resonator whose formant parameters can also be set
by users. Fig. 5 shows the VF and FVF vibration visualization panel, in which
users can see the vibrations in the larynx. This visual environment is very
useful in simulating the model, which has many complicated parameters and
behaves as a chaotic complex system.
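A piecewise-linear parameter envelope of the kind described above can be sketched as follows. This is not VibLaVie's code; the breakpoint format and the function name are assumptions made for illustration.

```python
from bisect import bisect_right


def envelope_value(breakpoints, t):
    """Evaluate a piecewise-linear envelope at time t.

    breakpoints: list of (time, value) pairs sorted by time.
    Values are held constant before the first and after the last breakpoint.
    """
    times = [bp[0] for bp in breakpoints]
    if t <= times[0]:
        return breakpoints[0][1]
    if t >= times[-1]:
        return breakpoints[-1][1]
    i = bisect_right(times, t)           # first breakpoint strictly after t
    t0, v0 = breakpoints[i - 1]
    t1, v1 = breakpoints[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Hypothetical envelope: a parameter ramps from 0 to 1 over two seconds.
ramp = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]
print([envelope_value(ramp, t) for t in (0.0, 0.5, 1.5, 3.0)])
```

Evaluating the envelope once per simulation step gives the time-varying parameter value to feed into the model.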
5 Experiments
5.1 Basic parameter setting
We set the initial values of
VF parameters as follows:

These constants are the same
as the ones in [4], which were deduced from physiological measurements. We
also set the initial values of the parameters for the FVFs and laryngeal
ventricles as follows:

These constants are not
precisely based on physiological measurements. However, the length and
width of the false glottis and the thickness of the FVFs were estimated from
images and are not far from the real values. It was verified by MRI that the
laryngeal ventricle space exists in throat singing phonation. The vocal tract
is assumed to be a uniform pipe, 16 cm long and 5 cm² in cross-section.
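For orientation, the resonances of such a uniform pipe can be estimated with the quarter-wavelength formula F_n = (2n - 1)c / (4L). The sketch assumes a lossless tube closed at the glottis and open at the lips, and a sound speed of 35000 cm/s; these are textbook simplifications, not values stated in the paper.

```python
def uniform_tube_resonances(length_cm, n_formants=3, c_cm_per_s=35000.0):
    """Resonance frequencies (Hz) of a lossless closed-open uniform tube."""
    return [
        (2 * n - 1) * c_cm_per_s / (4.0 * length_cm)
        for n in range(1, n_formants + 1)
    ]


# 16 cm pipe as in the text; the sound speed is an assumed value.
formants = uniform_tube_resonances(16.0)
print([round(f, 1) for f in formants])
```

In this idealization the cross-sectional area does not affect the resonance frequencies; it matters for the acoustic load seen by the glottal flow.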
5.2 Results and discussions
We chose several values from
0 to 1.0 cm for the adduction parameter Q’. The results are shown in Fig. 6;
for each Q’, the horizontal displacements of m1, m2, m’1, and m’2 are shown at
the top and the laryngeal airflow (volume velocity) Ug is shown at the bottom.
In the top panel, the solid, dashed, dotted, and dashed-dotted lines show the
displacements of m1, m2, m’1, and m’2, respectively.
The
normal pressed voice, without vibration of the FVFs, is observed when Q’ =
0.05 and 0.1. In general, this type of phonation is observed in normal
phonation and some Japanese traditional singing voices. In these voices, the
false glottis is somewhat wider than in throat singing [5]. The simulation
based on the 2×2-mass model is in good agreement with the observations. A
period-triple kargyraa, in which the FVFs vibrate once every three periods of
VF vibration, is observed when Q’ = 0.35. In this pattern, the pitch of the
subharmonic tone should be perceived an octave and a perfect fifth lower than
that of the basic phonation. Some throat singers are known to be able to sing
the period-triple kargyraa. The normal kargyraa vibration occurs when Q’ =
0.5. For this vibration, the shape of the laryngeal airflow also agreed with
that estimated by inverse filtering [10]. When Q’ = 0.6, the vibration is not
periodic or may have a very long period (> 1 s). When Q’ = 0.7, the
period-triple kargyraa is observed again. The squeezed voice of throat
singing is observed when Q’ = 0.85 and 1.0.
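The pitch relations mentioned above follow from simple arithmetic: if the FVFs close once every n periods of the VFs, the waveform repeats at f0/n, which lies 12·log2(n) equal-tempered semitones below f0. The sketch below works this out; the 150 Hz fundamental is a hypothetical value, not a measurement from the paper.

```python
import math


def subharmonic(f0_hz: float, vf_periods_per_fvf_cycle: int):
    """Return (subharmonic frequency in Hz, interval below f0 in semitones)."""
    n = vf_periods_per_fvf_cycle
    if n < 1:
        raise ValueError("need at least one VF period per FVF cycle")
    return f0_hz / n, 12.0 * math.log2(n)


# Hypothetical VF fundamental of 150 Hz.
for n, label in ((2, "normal kargyraa"), (3, "period-triple kargyraa")):
    f_sub, semitones = subharmonic(150.0, n)
    print(f"{label}: {f_sub:.1f} Hz, {semitones:.2f} semitones below f0")
```

An interval of 12 semitones is one octave, and about 19 semitones is an octave plus a perfect fifth, matching the description of the period-triple pattern.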
The
phase difference between the vibrations of the VFs and FVFs at Q’ = 1.0
differs from that at Q’ = 0.85. The shape of the simulated laryngeal airflow
was in agreement with the laryngeal airflow estimated by inverse filtering
[10].
According to
the physiological observations [9, 10], the vibration patterns depend on how
closely the FVFs are approximated. The squeezed voice vibration was observed
with close approximation, and the kargyraa voice vibration with intermediate
approximation. The results of the simulation also agree with these
physiological observations.
6 Conclusions
We simulated laryngeal
movements for throat singing using a 2×2-mass model. The results were in good
agreement with physiological observations. By using the model, it is possible
to synthesize various laryngeal voices. As future work, we should measure the
realistic physical properties of the FVF and improve the model, and investigate
details of this model from the viewpoint of physics and chaos.
Acknowledgments
We would like to thank Seiji
Adachi, Takafumi Hikichi, Kiyoshi Honda, Emi Z. Murano, Johan Sundberg, Sayoko
Takano, Niro Tayama, and Masahiko Todoriki for their helpful discussions. We
also would like to thank the reviewers for their useful comments.
Bibliography
[1] S. Adachi and M. Yamada.
An acoustical study of sound production in biphonic singing xöömij. J.
Acoust. Soc. Am., Vol. 105, No. 5, pp. 2920–2932, 1999.
[2] L. Fuks, B. Hammarberg,
and J. Sundberg. A self-sustained vocal-ventricular phonation mode: acoustical,
aerodynamic and glottographic evidences. KTH TMH-QPSR, Vol. 3/1998, pp. 49–59,
1998.
[3] H. Imagawa, K.-I.
Sakakibara, T. Konishi, E. Z. Murano, and S. Niimi. Throat singing synthesis by
a laryngeal voice model based on vocal fold and false vocal fold vibrations.
Proc. of Study Group on Musical Info. of IPSJ, Vol. 01-MUS-39, pp. 71–78,
2001. In Japanese.
[4] K. Ishizaka and J. L.
Flanagan. Synthesis of voiced sounds from a two-mass model of the vocal cords.
[5] N. Kobayashi, Y. Tohkura,
S. Tenpaku, and S. Niimi. Acoustic and physiological characteristics of
traditional singing in
[6] T. C. Levin and M. E.
Edgerton. The throat singers of Tuva. Scientific
[7] P.-A. Lindestad, M.
Sodersten, B. Merker, and S. Granqvist. Voice source characteristics in
Mongolian "throat singing" studied with high-speed imaging technique, acoustic
spectra, and inverse filtering. J. Voice, Vol. 15, No. 1, pp. 78–85, 2001.
[8] H.-L. Liu and J. O. Smith
III. Glottal source modelling for singing voice synthesis. In Proc. ICMC 2000,
pp. 90–97. ICMA, 2000.
[9] K.-I. Sakakibara,
M. Kumada, M. Todoriki, H.
Imagawa, and S. Niimi. Analysis of vocal fold vibrations in throat singing.
Tech. Rep. Musical Acoust. of Acoust. Soc. Jpn., Vol. 19, No. 4, pp. 41–48,
2000. In Japanese.
[10] K.-I. Sakakibara, T.
Konishi, K. Kondo, E. Z. Murano, M. Kumada, H. Imagawa, and S. Niimi. Vocal
fold and false vocal fold vibrations and synthesis of khöömei. In Proc. ICMC
2001, pp. 135–138. ICMA, 2001.
[11] W. R. Zemlin. Speech and
hearing science — anatomy and physiology. Allyn and Bacon, 4th edition, 1998.