The Effect of the Hypopharyngeal and
Supra-Glottic Shapes on The Singing Voice
Hiroshi Imagawa, Ken-Ichi Sakakibara, Niro Tayama,
Seiji Niimi, 2003
Abstract
The timbre of the singing
voice is strongly affected by the shapes of the laryngeal tube and hypopharynx.
We propose a physical model of the vocal tract and larynx, including the vocal
and ventricular folds, for synthesis. We study the effect of the shapes of the
laryngeal tube and hypopharynx on the synthesized voice using the proposed
model.
We
synthesized normal phonation, operatic singing, and throat singing voices by
changing the hypopharyngeal and supra-glottic shapes and evaluated the
acoustical effects of the various shapes of the hypopharynx and laryngeal tube.
The results show that all the shapes of the hypopharynx, larynx tube, piriform
fossa, and laryngeal ventricle play an important role in determining voice
quality in singing.
1. Introduction
The voice quality in singing
is determined by both the laryngeal voice and the vocal tract shape. In most
cases, the laryngeal voice is characterized by the vibratory pattern of the
vocal folds. These vibratory patterns vary depending on the vocal registers
(whistle, falsetto, modal, and vocal fry) and singing style (belting, opera,
and so on). In throat singing (Khöömei, Khöömij, Kai, etc.), the ventricular
folds (also referred to as the false vocal folds) vibrate as well and their
vibration is essential for the special timbres of the drone and kargyraa voices
[3, 8, 11, 12].
The vocal tract shape also contributes to other
aspects of voice quality besides phonetics. In particular, the contributions of
the lower vocal tract (the larynx tube, piriform fossa, and hypopharynx) seem
to be important in generating the special voice quality in singing. Previous
fiberscope observations have shown differences in the lower vocal tract shape
in various styles of singing. In operatic singing the hypopharynx and piriform
fossa are relatively wide, and the larynx tube is relatively narrow. The volume
ratio matching of the larynx tube to the hypopharynx produces “the singer’s
formant” [13]. The belting voice is characterized by a narrow hypopharynx and
piriform fossa and a narrow larynx tube [14]. Japanese Min-yoh is characterized
by a narrow larynx tube and ventricular fold constriction [7].
In
this paper, we investigate the production mechanisms of various timbres in
singing voices using a physical model that allows the vibration of the
ventricular folds. The model is realized by acoustical coupling of the
two-by-two-mass model and vocal tract with the piriform fossa and infraglottic
cavity. We use the model to study the acoustic effect of the shape of the
larynx tube (the laryngeal ventricle, ventricular fold, and upper part) and
hypopharynx on the voice quality.
2. Synthesis Model
The mechanism of the physical
model for the synthesis of singing voices is depicted in Fig. 1. The laryngeal
part is described by the 2x2-mass model, which represents the ventricular folds
in a self-oscillating model as well as the vocal folds. The 2x2-mass model was
obtained by improving the two-mass model [2, 4, 10].
The
vocal tract is represented as a lossless transmission line. The length of each
section is set to 0.4 cm. The
ventricular folds and laryngeal ventricle are also assumed to be parts of the
vocal tract. In the case that the ventricular folds vibrate, the area of the
ventricular-fold section varies.
The
effects of the nasal cavity and of changes in lung pressure are
neglected for simplicity. Fig. 2 shows the equivalent circuit of our proposed
model.
2.1. Subglottic region
The
cross-sectional areas of the subglottal system are determined based on the
anatomical data reported in [5]. We roughly approximate the subglottic region
by 66 cylindrical sections, each 0.4 cm long, and calculate the acoustic
characteristics by using the equivalent circuit as in Fig. 2. Let y cm be the
distance from the glottis. Then the area A (in cm squared) is determined as
follows: A = 2.5 if
![]()
2.2. Vocal folds
The vocal folds are represented
as the two-mass model proposed in [4]. The F0 of vocal-fold oscillation is
controlled by changing the tension parameter Q [4]. For the initial setting of
physical parameters, we use all of the normal values described in [4] except
the rest glottal areas
2.3. Laryngeal ventricle
The laryngeal ventricle is represented as a
cylindrical section (A1, l1) with l1 = 0.4 cm. The area A1 is set depending on the
phonation type, with 1.5 cm squared as the normal value. Note that even if the
ventricular folds strongly constrict and contact as in throat singing, the
space of the laryngeal ventricle is observed [9].
2.4. Ventricular folds
The ventricular folds contain
few muscle fibres and, unlike the vocal folds, their physical properties essentially
do not change. Therefore, it is meaningless to define a tension parameter for
the ventricular folds. Hence, some other parameterisation is necessary.
It
is a physiological fact that the ventricular folds are adducted by the action
of certain laryngeal muscles, such as the cricoepiglottic muscle and
thyroepiglottic muscle [6], but it is unclear whether their physiological
properties, such as mass and stiffness, are changed or not by the adduction.
We
take into account the changing shapes of the ventricular folds and introduce an
adduction parameter Q’ for the ventricular folds, which is one possible
parameterization for the stiffness, mass, and the false glottal area at rest
[10]. We set the initial values of the parameters for the ventricular folds and
laryngeal ventricles as
![]()
The
two different laryngeal voices in throat singing, drone and kargyraa, are
generated by coupling of the vocal and ventricular vibrations. In the drone
voice, the ventricular folds vibrate in the same period as the vocal folds, and
in the kargyraa voice, the ventricular folds vibrate with a period that is an
integer multiple (usually double or triple) of the vocal-fold period. The constriction of the
ventricular folds at rest is strong for both cases. However, the constriction
in the case of kargyraa is relatively loose. Therefore, by changing the area
between the ventricular folds at rest, the vibratory patterns of normal, drone,
and kargyraa phonations can be simulated by using the 2x2-mass model [10].
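The period relation described above means that the ventricular folds vibrate at the vocal-fold fundamental in the drone voice and at a subharmonic of it in the kargyraa voice. The following Python sketch only illustrates that arithmetic; the function name and the example fundamental of 140 Hz are illustrative assumptions, not values taken from the model.

```python
def ventricular_fold_frequency(vocal_f0_hz, period_multiple):
    """Return the ventricular-fold vibration frequency in Hz.

    period_multiple is the ratio of the ventricular-fold period to the
    vocal-fold period: 1 for drone, usually 2 or 3 for kargyraa.
    """
    if period_multiple < 1 or period_multiple != int(period_multiple):
        raise ValueError("period_multiple must be a positive integer")
    # A period that is N times longer corresponds to a frequency N times lower.
    return vocal_f0_hz / period_multiple


# Hypothetical vocal-fold fundamental of 140 Hz (assumed for illustration).
drone_hz = ventricular_fold_frequency(140.0, 1)     # same period as the vocal folds
kargyraa_hz = ventricular_fold_frequency(140.0, 2)  # doubled period
print(drone_hz, kargyraa_hz)  # prints: 140.0 70.0
```

With a tripled period the same call returns one third of the vocal-fold fundamental.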

2.5. Piriform fossa
The bilateral piriform fossa
is assumed to be symmetric and is therefore implemented as one cavity. In a
preliminary experiment, the acoustic characteristics did not differ
significantly between the one-cavity and two-cavity cases.
The
piriform fossa is implemented as follows. First, according to the MRI data in
[1], we assume each piriform fossa to be a cone whose depth is 2 cm and volume
is 1.5 cm cubed for each side. We represent the piriform fossa by five cylindrical
sections

of the cylinder portion above
the arytenoid apex plane according to [1], and use the value 0.75 for the end
correction coefficient. Finally, if these adjustments require us to extend the
number of sections, we add necessary sections such that
![]()
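The step from a cone to cylindrical sections can be sketched numerically. The following Python fragment shows one possible volume-preserving discretization, which is not necessarily the one used in this paper: it assumes the apex of the cone lies at the bottom of the fossa, slices the cone into equal-length sections, and gives each section the area that preserves the volume of its slice. The function name and the volume-preserving rule are assumptions; the end correction and the adjustments made according to [1] are not modelled.

```python
def cone_to_cylinders(volume_cm3=1.5, depth_cm=2.0, n_sections=5):
    """Approximate a cone by equal-length cylinders with the same slice volumes.

    Returns the section areas (cm^2), ordered from the apex (bottom of the
    fossa, assumed) to the opening.
    """
    # Cone volume V = (1/3) * A_top * depth, so the opening area is:
    top_area = 3.0 * volume_cm3 / depth_cm
    section_len = depth_cm / n_sections
    areas = []
    for i in range(n_sections):
        x0 = i * section_len
        x1 = (i + 1) * section_len
        # Area at height x above the apex is A_top * (x / depth)^2;
        # integrating over [x0, x1] gives the slice volume.
        slice_volume = top_area * (x1 ** 3 - x0 ** 3) / (3.0 * depth_cm ** 2)
        areas.append(slice_volume / section_len)
    return areas
```

With the default arguments each section is 0.4 cm long, matching the section length used for the vocal tract, and the section volumes sum to the volume of the cone.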
2.6. Vocal tract
The vocal tract is
represented as a transmission line of n
cylindrical hard-walled sections (An, ln) with cross-sectional area An and
length ln. We assume that (A1, l1) is the region of the laryngeal ventricle
with l1 = 0.4 cm and that (A2, l2), (A3, l3) are the spaces between the
ventricular folds with l2 = l3 = 0.4 cm. The ventricular folds are able to
vibrate. We set n = 43. Hence, the length of the vocal tract is 17.2 cm. For k = 3,…,43 each section has length
lk = 0.4 cm and variable Ak.
3. Acoustic Measurement Using the Synthesis Model

We set the length of each
part of the vocal tract as shown in Fig. 3. The length of the larynx tube is
2.4 cm, which includes the ventricular-fold sections with a length of 0.8 cm
and the laryngeal ventricle section with a length of 0.4 cm. We also set the length
of the hypopharynx to 2 cm. We set Pl = 5 cm H2O by default and attached the
piriform fossa as shown in Fig. 2. The default areas were set to 1.5 cm squared
for the laryngeal ventricles, 0.5 cm squared for the ventricular folds and
remaining larynx tube section, and 3.14 cm squared for the hypopharynx and the
upper vocal tract.
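To illustrate how an area function of this kind maps to resonance frequencies, the following Python sketch computes the resonances of the default supraglottal tube with lossless chain (ABCD) matrices. It is a simplified stand-in for the equivalent circuit of Fig. 2, not the proposed model: the glottis is assumed to be closed, the lips are treated as an ideal open end, and the piriform fossa, the subglottic system, and the fold vibration are omitted. The sound speed, the air density, and all function names are assumptions chosen for illustration.

```python
import numpy as np

C_SOUND = 35000.0  # speed of sound in cm/s (assumed value)
RHO = 1.14e-3      # air density in g/cm^3 (assumed value)


def default_areas():
    """Default areas (cm^2) of the 43 sections, ordered from glottis to lips."""
    areas = np.full(43, 3.14)  # hypopharynx and upper vocal tract
    areas[0] = 1.5             # A1: laryngeal ventricle
    areas[1:6] = 0.5           # A2..A6: ventricular folds and rest of larynx tube
    return areas


def chain_d(freq_hz, areas, lengths):
    """D element of the cascaded lossless chain matrix at one frequency."""
    k = 2.0 * np.pi * freq_hz / C_SOUND
    total = np.eye(2, dtype=complex)
    for area, length in zip(areas, lengths):
        z = RHO * C_SOUND / area  # characteristic impedance of the section
        section = np.array([[np.cos(k * length), 1j * z * np.sin(k * length)],
                            [1j * np.sin(k * length) / z, np.cos(k * length)]])
        total = total @ section
    # For a lossless tube the D element is purely real.
    return total[1, 1].real


def resonances(areas, lengths, f_max=5000.0, step=2.0):
    """Resonances (Hz) below f_max for a closed glottis and an ideal open end.

    With zero pressure at the lips, U_glottis = D * U_lips, so the resonances
    are the zeros of D. They are located by sign changes on a frequency grid
    and refined by linear interpolation.
    """
    freqs = np.arange(step, f_max, step)
    d = np.array([chain_d(f, areas, lengths) for f in freqs])
    crossings = np.where(np.sign(d[:-1]) != np.sign(d[1:]))[0]
    return [freqs[i] - d[i] * step / (d[i + 1] - d[i]) for i in crossings]
```

Calling `resonances(default_areas(), np.full(43, 0.4))` gives the resonances of the default configuration under these assumptions. For a uniform tube of the same length the routine returns the odd quarter-wavelength resonances, which provides a quick sanity check.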

3.1. Effect of the larynx tube
We changed the areas of the
larynx tube A2,…, A6 excluding the laryngeal ventricle section. The spectral
envelopes of the synthesized sounds are shown in Fig. 4 (LPC analysis, p=24).
With decreasing larynx tube area, F2, F3, and F4 move lower. If the area of the larynx
tube is greater than 0.05 cm squared, the ventricular folds do not make contact. In
addition, F5 moves slightly lower and gradually disappears, and the power in the
range above 4000 Hz decreases.
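The spectral envelopes in this section are obtained by LPC analysis with p = 24. As a minimal sketch of that kind of analysis, the following Python code estimates the LPC coefficients with the autocorrelation method and the Levinson-Durbin recursion and then evaluates the all-pole envelope in dB. The Hamming window, the FFT size, and the function names are assumptions; the analysis settings actually used are not specified beyond the order.

```python
import numpy as np


def lpc_autocorr(signal, order):
    """LPC coefficients [1, a1, ..., ap] and residual energy (autocorrelation method)."""
    x = np.asarray(signal, dtype=float) * np.hamming(len(signal))
    # Autocorrelation at lags 0..order.
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for step i of the Levinson-Durbin recursion.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_envelope_db(signal, order=24, n_fft=1024):
    """All-pole spectral envelope in dB on n_fft // 2 + 1 frequency bins."""
    a, err = lpc_autocorr(signal, order)
    # The envelope is sqrt(err) / |A(e^{jw})|, with A evaluated by a zero-padded FFT.
    denom = np.abs(np.fft.rfft(a, n_fft))
    return 10.0 * np.log10(err) - 20.0 * np.log10(denom)
```

Bin m of the returned envelope corresponds to the frequency m * fs / n_fft, where fs is the sampling rate of the analysed signal.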

3.2. Effect of the hypopharynx
Fig. 5 shows the spectral
envelopes of the synthesized sounds when we changed the hypopharynx area. They
show that F3 and F5 move close to F4. In some range (A = 3 or 6 cm squared in Fig. 5), the formant
cluster of F3, F4, and F5 is observed around 3000 Hz.

3.3. Effect of the laryngeal ventricle
Fig. 6 shows the results
when the laryngeal ventricle area was changed. As the area increases, F3 and F5
move lower and are pushed close to F2, and F2 is sharpened. The effect of a zero
in the range from 4000 to 5000 Hz also becomes larger.
3.4. Effect of the piriform fossa
The spectral envelope of
synthesized sounds is shown in Fig. 7 for changes in piriform fossa volume. As
the volume is increased, the effect of a zero around 4500 Hz becomes larger. As
pointed out in [15], increasing the volume of the piriform fossa repels the
formants, i.e. F1, F2, F3, and F4 are pushed lower.
4. Singing Voice Synthesis
Based on the results of the
acoustic measurements of the effects of varying the shape of the larynx tube,
hypopharynx, laryngeal ventricle, and piriform fossa, we chose the nominal
settings
of the parameters for normal phonation, operatic singing, and throat singing
(Table 1). In the case of operatic singing, we assume that the lowest four
sections in the hypopharynx have constant areas A7 = A8 = 5.0 cm squared and A9
= A10 = 5.0 cm squared to maintain the large volume ratio of the hypopharynx to
the larynx tube. In throat singing, by controlling the adduction parameter Q’, we
synthesized both drone and kargyraa phonations. We set the initial value of
A2 and A3 to 0.04 cm squared, with Q’ = 1 for drone and Q’ = 0.55 for kargyraa. No significant
differences were observed between the spectral envelopes of the two phonations.
In Fig. 8, the spectral envelope of the drone voice is shown.
The
spectral envelopes are shown in Fig. 8. The synthesized operatic singing voice
has the singer’s formant around 3000 Hz. In the synthesized throat singing
voice, F2 is sharp and the power in the range from 4000 to 6000 Hz is
relatively low. The sharpness of the F2 is the effect of the laryngeal
ventricle resonance and contributes to the generation of the whistle-like tone.
5. Conclusions
We studied the acoustic
effect of the shape of the larynx tube and hypopharynx. The acoustic
characteristics of the synthesized singing voices are in good agreement with
known results for operatic and throat singing. Our results show that the
dimensions of the laryngeal ventricle, larynx tube, hypopharynx, and piriform
fossa play an important role in determining voice quality in singing.
For
operatic singing, a relatively narrow larynx tube and a wide hypopharynx act to
create the singer’s formant, as reported in previous studies [13, 15]. The piriform
fossa also contributes to the cluster of F3, F4. For throat singing, strong
constriction of the ventricular folds sharpens F2 and produces the special
laryngeal voices by vibration of the ventricular folds as well.
Most
previous studies assumed that the larynx tube has uniform area. However, our
results reveal that the laryngeal ventricle volume and the ventricular fold
constriction also affect the acoustic characteristics of the singing voices.
In
this study, we did not consider the mechanical interaction of each space in the
lower vocal tract. However, the physiological mechanism determining the shape
of the lower vocal tract is very important for finding appropriate parameters
to synthesize various voice qualities. In particular, laryngeal height is very
important in determining the configuration of the lower vocal tract. The
activation of the thyropharyngeal muscle, cricopharyngeal muscle, and
palatopharyngeal muscle is also important for regulating the shape of the
hypopharynx, and the intrinsic laryngeal muscles are important for constricting
the larynx tube [6]. Further investigations by using EMG and MRI are needed in
order to understand their physiology and, thereby, control the parameters of
the synthesis model.
Acknowledgments
We would like to thank Seiji
Adachi, Kiyoshi Honda, Emi Z. Murano, and Sotaro Sekimoto for their helpful
discussions.
6. References
[1] J. Dang and K. Honda.
Acoustic characteristics of the piriform fossa in models and humans. J. Acoust.
Soc. Am., 101(1):456–465, 1997.
[2] J. L. Flanagan, K.
Ishizaka, and K. L. Shipley. Synthesis of speech from a dynamical model of the
vocal cords and vocal tract.
[3] L. Fuks, B. Hammarberg,
and J. Sundberg. A self-sustained vocal ventricular phonation mode: acoustical,
aerodynamic and glottographic evidences. KTH TMH-QPSR, 3/1998:49–59, 1998.
[4] K. Ishizaka and J. L.
Flanagan. Synthesis of voiced sounds from a two-mass model of the vocal cords.
[5] K. Ishizaka, M.
Matsudaira, and T. Kaneko. Input acoustic-impedance measurement of the
subglottal system. J. Acoust. Soc. Am., 60(1):190–197, 1976.
[6] M. Kimura, K.-I.
Sakakibara, H. Imagawa, R. Chan, S. Niimi, and N. Tayama. Histological
investigation of the supra-glottal structures in human for understanding
abnormal phonation. J. Acoust. Soc. Am., 112:2446, 2002.
[7] N. Kobayashi, Y. Tohkura,
S. Tenpaku, and S. Niimi. Acoustic and physiological characteristics of
traditional singing in
[8] T. C. Levin and M. E.
Edgerton. The throat singers of Tuva. Scientific
[9] V. T. Maslov. Functional
peculiarities of the larynx during the vocal formation in Tuva two-voice
singing. Vestn. Otorinolaringol., Mar.–Apr.:58–61, 1979. In Russian.
[10] K.-I. Sakakibara, H.
Imagawa, S. Niimi, and
[11] K.-I. Sakakibara, T.
Konishi, H. Imagawa, E. Z. Murano, K. Kondo, M. Kumada, and S. Niimi.
Observation of the laryngeal movements for throat singing — vibration of two
pairs of the folds in human larynx. Acoust. Soc. Am. World Wide Press Room,
144th meeting of the ASA, 2002. http://www.acoustics.org/press/.
[12] K.-I. Sakakibara, T.
Konishi, K. Kondo, E. Z. Murano, M. Kumada, H. Imagawa, and S. Niimi. Vocal
fold and false vocal fold vibrations and synthesis of khöömei. In Proc. ICMC
2001, pages 135–138. ICMA, 2001.
[13] J. Sundberg. The science
of the singing voice. Northern Illinois Univ. Pr., 1989.
[14] J. Sundberg, P.
Gramming, and J. Lovetri. Comparisons of pharynx, source, formant and pressure
characteristics in Operatic and musical theatre singing. J. Voice,
7(4):301–310, 1993.
[15]