The Laryngeal Flow Model for Pressed-Type Singing Voices
Ken-Ichi Sakakibara, Hiroshi Imagawa, Seiji Niimi, Naotoshi Osaka, 2006
Abstract
Asian traditional pressed-type singing voices differ from the European traditional singing voice in both timbre and voice production mechanism. In throat singing, both the ventricular folds and the true vocal folds vibrate, generating a distinctive laryngeal voice. In some other pressed-type singing voices, such as Japanese Min-yoh, the ventricular folds approximate but do not vibrate.
We propose a new laryngeal flow model that incorporates the effects of ventricular fold vibration and laryngeal ventricle resonance. The model combines a known glottal airflow model (the R-model), the laryngeal ventricle resonance (modeled as a Helmholtz resonator), and modulation by the ventricular fold vibration. We also demonstrate the relation between the model parameters and voice quality. The results show that the proposed model is effective for synthesizing pressed-type singing voices.
1. Introduction
Non-interactive parametric glottal models assume that there is no interaction between the glottal source and the vocal tract [2]; the glottal source is described by mathematical equations. Such models are very effective in speech synthesis and coding and have therefore been used in many studies. The R-model [8] and the LF-model [3] have become the reference models of this type.
All of these models assume that the laryngeal voice source is determined by the vocal fold vibratory pattern, and they control voice quality by changing vocal fold vibratory parameters such as the open quotient (OQ), speed quotient (SQ), closing quotient (CQ), and amplitude quotient (AQ) [1]. However, in throat singing, vibration of the ventricular folds (VTFs, also referred to as the false vocal folds) and strong constriction of the supraglottic structures are observed [11], and in some Asian traditional pressed-type singing, such as Japanese Min-yoh, supraglottic constriction is also observed even though the VTFs do not vibrate [5]. Therefore, to synthesize various styles of singing voice, the effects of VTF vibration and laryngeal ventricle resonance must be considered in addition to vocal fold vibration.
In this paper, we propose a new laryngeal model based on the glottal flow, laryngeal ventricle resonance, and modulation by the VTF vibration. Used as the laryngeal source in source-filter synthesis, the proposed model can control various timbres of singing voice.
2. The Laryngeal Flow Model with Ventricular-Fold Vibratory Modulation
2.1. VTF-modulation model
VTF vibration is observed in various types of phonation. In throat singing, both the drone and the kargyraa voice are always accompanied by VTF vibration as well as vocal fold (VF) vibration. In the drone voice, the VTFs vibrate with the same period as the VFs; in the kargyraa voice, the VTFs vibrate with a period that is an integer multiple (usually two or three times) of the VF period [4, 6, 11]. Simulations using a 2x2-mass model suggest the possible vibratory patterns of the VFs and VTFs [9, 11].
Here, we use “laryngeal flow (source)” to mean the airflow through the VTF slit, and “glottal airflow (source)” to mean the airflow through the slit between the VFs. Fig. 1 shows the laryngeal flows of the drone and kargyraa voices of two different singers. These flows were obtained from recorded sounds by inverse-filter analysis: we marked five poles on the spectrum in the range 0 to 5 kHz, constructed the inverse filter, and manually adjusted it to make the result smooth. Combining the high-speed images, the EGG waveforms, and these inverse-filtered laryngeal sources, we concluded that in throat singing the VTF vibration is indispensable for generating the laryngeal flow. Modelling the laryngeal flow in throat singing therefore requires a new laryngeal model that includes the effect of VTF vibration.
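The inverse-filter step can be sketched as follows, assuming each marked pole is a complex-conjugate pair given by a frequency and a bandwidth, and that the vocal tract is all-pole below 5 kHz. The pole values, sampling rate, and function names are hypothetical, and the manual smoothing adjustment described above is not modelled. The inverse filter cascades one zero pair per marked pole, so an impulse passed through the matching all-pole filter and then through the inverse filter should come back unchanged.

```python
import math

def pole_pair_section(freq_hz, bw_hz, fs):
    """FIR section 1 - 2r cos(theta) z^-1 + r^2 z^-2 that cancels one pole pair."""
    r = math.exp(-math.pi * bw_hz / fs)      # pole radius from the bandwidth
    theta = 2.0 * math.pi * freq_hz / fs     # pole angle from the frequency
    return [1.0, -2.0 * r * math.cos(theta), r * r]

def convolve(a, b):
    """Full linear convolution of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def inverse_filter_coeffs(poles, fs):
    """Cascade one zero pair per marked (frequency, bandwidth) pole."""
    coeffs = [1.0]
    for freq_hz, bw_hz in poles:
        coeffs = convolve(coeffs, pole_pair_section(freq_hz, bw_hz, fs))
    return coeffs

def fir_filter(coeffs, x):
    """Causal FIR filtering with zero initial state."""
    return [sum(c * x[k - i] for i, c in enumerate(coeffs) if k - i >= 0)
            for k in range(len(x))]

def all_pole_filter(a, x):
    """Apply 1/A(z) with a[0] == 1; used here only to build a test signal."""
    y = []
    for k in range(len(x)):
        y.append(x[k] - sum(a[i] * y[k - i] for i in range(1, len(a)) if k - i >= 0))
    return y

# Hypothetical marked poles below 5 kHz: (frequency in Hz, bandwidth in Hz).
FS = 16000
POLES = [(700, 80), (1200, 90), (2500, 120), (3500, 150), (4500, 200)]
A = inverse_filter_coeffs(POLES, FS)

# Round trip: impulse -> all-pole "tract" -> inverse filter.
impulse = [1.0] + [0.0] * 199
tract_response = all_pole_filter(A, impulse)
recovered = fir_filter(A, tract_response)
```

With a real recording, `fir_filter(A, signal)` would give the estimated source; the pole list would then be refined by hand, as the text describes.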
The VTF-modulation model ũ(t) is simply defined as follows:

$$\tilde{u}(t) = M(t)\,u(t),$$

where u(t) is the glottal flow and M(t) is the VTF-modulation function.
A block diagram of the model is shown in Fig. 2. In this paper, we choose the simple R-model [8] for the glottal flow. The R-model is described as follows:


where α is the amplitude, Tp the opening time, Tn the closing time, and T0 the period. All of these variables are in ℝ>0. The open quotient (OQ) is (Tp + Tn)/T0.
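As a hedged sketch of the R-model, the code below assumes the widely used trigonometric form of Rosenberg's pulse, with the parameters α, Tp, Tn, and T0 defined above; the authors' exact expression may differ in detail. The function names and the choice of a unit amplitude are illustrative.

```python
import math

def r_model(t, alpha, tp, tn, t0):
    """Glottal flow u(t), periodic in t0 (assumed trigonometric Rosenberg form).

    opening  0 <= tau <= tp       : alpha/2 * (1 - cos(pi * tau / tp))
    closing  tp < tau <= tp + tn  : alpha * cos(pi * (tau - tp) / (2 * tn))
    closed   tp + tn < tau < t0   : 0
    """
    if not (alpha > 0 and tp > 0 and tn > 0 and tp + tn <= t0):
        raise ValueError("need alpha, tp, tn > 0 and tp + tn <= t0")
    tau = t % t0  # time within the current period
    if tau <= tp:
        return 0.5 * alpha * (1.0 - math.cos(math.pi * tau / tp))
    if tau <= tp + tn:
        return alpha * math.cos(math.pi * (tau - tp) / (2.0 * tn))
    return 0.0

def open_quotient(tp, tn, t0):
    """OQ = (Tp + Tn) / T0, as defined in the text."""
    return (tp + tn) / t0

# Settings used in Section 3.1: T0 = 8 ms, Tp/T0 = 0.42, Tn/T0 = 0.18.
T0 = 0.008
TP, TN = 0.42 * T0, 0.18 * T0
oq = open_quotient(TP, TN, T0)       # about 0.6
peak = r_model(TP, 1.0, TP, TN, T0)  # the flow peaks at t = Tp
```

In this form the pulse rises smoothly to α at Tp, falls to zero at Tp + Tn, and stays closed for the rest of the period, so a smaller OQ means a longer closed phase.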
High-speed images indicate that the VTF vibratory patterns are not exactly sinusoidal [7, 10, 11]. Nevertheless, we define the VTF-modulation function M(t) as the false glottal area function A′g(t) multiplied by a constant M, and we model A′g(t) as a sine function:

$$M(t) = M\,A'_g(t), \qquad A'_g(t) = A'_{g0} + \alpha' \sin(\omega' t + \theta'),$$
where α′ is the amplitude of the VTF vibration, A′g0 the area between the VTFs at rest, ω′ the angular frequency of the VTF vibration, and θ′ the phase difference of the VTF vibration relative to the VF vibration. The quantities α′, A′g0, and ω′ are in ℝ>0.
Physiological observations and simulations using the 2x2-mass model suggest that the periods of the VF and VTF vibrations satisfy 2π/ω′ = nT0, where n ∈ ℤ>0.
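A minimal sketch of the modulation function, using the sine form of A′g(t) described above and locking the VTF period to n VF periods through 2π/ω′ = nT0. The constant M, the parameter values (expressed relative to max u(t) = 1), and the function names are illustrative assumptions.

```python
import math

def false_glottal_area(t, a_g0, alpha_p, omega_p, theta_p):
    """A'_g(t) = A'_g0 + alpha' * sin(omega' * t + theta')."""
    return a_g0 + alpha_p * math.sin(omega_p * t + theta_p)

def vtf_modulation(t, m, a_g0, alpha_p, t0, n, theta_p):
    """M(t) = M * A'_g(t) with the VTF period fixed to n VF periods (2*pi/omega' = n*T0)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    omega_p = 2.0 * math.pi / (n * t0)
    return m * false_glottal_area(t, a_g0, alpha_p, omega_p, theta_p)

# Illustrative values: A'_g0 = 0.5, alpha' = 0.35, theta' = -pi/4, T0 = 8 ms.
# n = 1 corresponds to the drone, n = 2 to a kargyraa-like double period.
T0 = 0.008
drone = [vtf_modulation(k * 1e-4, 1.0, 0.5, 0.35, T0, 1, -math.pi / 4)
         for k in range(160)]
```

Multiplying M(t) sample by sample with a glottal flow u(t) gives the modulated flow. Because α′ < A′g0 here, M(t) stays positive, so the modulation reshapes each pulse without reversing the flow.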
2.2. VTF-modulation and LVT-resonance model
The laryngeal ventricle is the space between the VFs and the VTFs. When the VTFs are strongly constricted, the effect of this small space on the laryngeal voice apparently cannot be ignored. Physical model simulations suggest that some acoustic effects occur around 2000 Hz [9, 11]. The inverse-filtered laryngeal voices of throat singing show ripples (Fig. 1) that closely agree with the physical model simulation results [7]; hence, a model that includes laryngeal ventricle resonance is required. Fig. 3 shows spectra of the drone voices of two different singers.

A block diagram of the proposed model (the VTF-modulation and LVT-resonance model) is shown in Fig. 4. The model is constructed as follows: the glottal airflow is convolved with a time-varying laryngeal ventricle resonator that depends on the VTF vibration, and the result is modulated by the VTF vibration.
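We denote the resonator formed by the laryngeal ventricle by h[t](z). The laryngeal voice with laryngeal ventricle resonance and VTF modulation is then described as:

$$\tilde{u}(t) = M(t)\,\bigl(h_{[t]} * u\bigr)(t),$$

where ∗ denotes convolution with the time-varying resonator h[t].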
We realize h[t](z) as a time-varying one-pole filter. We calculate the resonance frequency of the laryngeal ventricle, i.e., the frequency of the pole of h[t], by treating the ventricle as a Helmholtz resonator. Let Fv(t) be the resonance frequency, d′ the thickness of the VTFs, and Vv the volume of the laryngeal ventricle. Then,

where c is the sound velocity, 3.53 × 10⁴ cm/s.
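One way to evaluate Fv(t) is sketched below, under the assumption that the textbook Helmholtz formula Fv = (c/2π)·√(A/(l·V)) is used with the neck area taken as the false glottal area A′g(t) and the neck length as d′. Using the values given in Section 3.2 (A′g0 = 0.10 cm², α′ = 0.05 cm², d′ = 1.0 cm, and a ventricle volume of 1.5 cm² × 0.5 cm = 0.75 cm³), the resting resonance comes out at roughly 2.05 kHz. The function names are illustrative.

```python
import math

C_CM_PER_S = 3.53e4  # sound velocity: 353 m/s expressed in cm/s

def helmholtz_frequency(neck_area_cm2, neck_length_cm, volume_cm3, c=C_CM_PER_S):
    """Textbook Helmholtz resonance in Hz: c / (2*pi) * sqrt(A / (l * V))."""
    if min(neck_area_cm2, neck_length_cm, volume_cm3) <= 0:
        raise ValueError("area, length and volume must be positive")
    return c / (2.0 * math.pi) * math.sqrt(neck_area_cm2 / (neck_length_cm * volume_cm3))

def ventricle_resonance(t, a_g0, alpha_p, omega_p, theta_p, d_vtf, v_v):
    """Fv(t), assuming the neck area is A'_g(t) = A'_g0 + alpha' sin(omega' t + theta')."""
    area = a_g0 + alpha_p * math.sin(omega_p * t + theta_p)
    return helmholtz_frequency(area, d_vtf, v_v)

V_V = 1.5 * 0.5                                     # cm^2 x cm = 0.75 cm^3
f_rest = helmholtz_frequency(0.10, 1.0, V_V)        # about 2051 Hz
f_min = helmholtz_frequency(0.10 - 0.05, 1.0, V_V)  # smallest VTF opening
f_max = helmholtz_frequency(0.10 + 0.05, 1.0, V_V)  # largest VTF opening
```

Under this assumption the pole frequency sweeps between about 1.45 kHz and 2.51 kHz over each VTF cycle, which is why h[t] must be time-varying.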
To permit flexible control, we define the resonance bandwidth as K times the bandwidth of a Helmholtz resonator, where the variable K changes with the phonation type.

The resistance Rv(t), inductance Lv(t), conductance G, and capacitance C satisfy the following equations.

where ω := 2π/T0 is the angular frequency of the VF vibration and dv is the thickness of the laryngeal ventricle. The constants are set as follows: air density ρ = 1.14 × 10⁻³ g/cm³; viscosity μ = 1.86 × 10⁻⁴ dyn·s/cm²; adiabatic gas constant η = 1.4; and specific heat ξ = 0.24 cal/(g·°C).
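The text realizes h[t](z) as a time-varying "one-pole" filter. Since a single real pole cannot resonate near 2 kHz, the sketch below reads this as one complex-conjugate pole pair, which is an assumption. The per-sample resonance frequency Fv and bandwidth B are passed in as inputs rather than derived from Rv, Lv, G, C, and K; the sampling rate, the unit-DC-gain normalization, and all names are hypothetical.

```python
import math

def resonator_coeffs(freq_hz, bw_hz, fs):
    """Gain and feedback coefficients of a two-pole resonator with unit DC gain."""
    r = math.exp(-math.pi * bw_hz / fs)                  # pole radius from bandwidth
    a1 = -2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs)
    a2 = r * r
    gain = 1.0 + a1 + a2                                 # H(1) = gain / (1 + a1 + a2) = 1
    return gain, a1, a2

def time_varying_resonator(x, freqs_hz, bws_hz, fs):
    """y[k] = g[k] x[k] - a1[k] y[k-1] - a2[k] y[k-2], coefficients updated per sample."""
    if not (len(x) == len(freqs_hz) == len(bws_hz)):
        raise ValueError("x, freqs_hz and bws_hz must have equal length")
    y1 = y2 = 0.0
    out = []
    for xk, f, b in zip(x, freqs_hz, bws_hz):
        g, a1, a2 = resonator_coeffs(f, b, fs)
        yk = g * xk - a1 * y1 - a2 * y2
        out.append(yk)
        y2, y1 = y1, yk
    return out

def laryngeal_flow(u, m, freqs_hz, bws_hz, fs):
    """Hypothetical discrete form of M(t) * (h[t] * u)(t): filter first, then modulate."""
    filtered = time_varying_resonator(u, freqs_hz, bws_hz, fs)
    return [mk * yk for mk, yk in zip(m, filtered)]

FS = 16000
N = 2000
step = time_varying_resonator([1.0] * N, [2000.0] * N, [300.0] * N, FS)
impulse_resp = time_varying_resonator([1.0] + [0.0] * 63, [2000.0] * 64, [300.0] * 64, FS)
```

The impulse response rings at the resonance frequency, which is the mechanism behind the ripples after glottal closure. Recomputing direct-form coefficients every sample is the simplest time-varying scheme; fast changes in Fv can add small transients.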
3. Acoustical Characteristics
3.1. VTF-modulation model
We study the effect of the phase difference between the VF and VTF vibrations. In the sine function A′g(t), we fix A′g0 = 0.5 max u(t) and α′ = 0.35 max u(t). For u(t), we set T0 = 8 ms, Tp/T0 = 0.42, and Tn/T0 = 0.18, hence OQ = 0.6. With these settings, the spectral tilt of u(t) is close to −12 dB/octave. We also set ω′ = ω, i.e., we study a laryngeal flow like that of the drone voice in throat singing. Fig. 5 shows the laryngeal flows for various θ′.

The EGG waveforms and high-speed images of the same subjects as in Fig. 1 suggest that the VTF vibration lags the VF vibration by about π/4, i.e., θ′ = −π/4 [11]. This value also appears frequently in physical model simulations [9]. In Fig. 5, the laryngeal flows for these θ′ show characteristics similar to those of the drone in Fig. 1: the opening phase is short relative to the closing phase.
Fig. 6 shows the spectral envelopes of the laryngeal flows for different θ′. Among the θ′ values in Fig. 6, the spectral tilt is largest for θ′ = −π/4 (−16 dB/octave) and smallest for θ′ = −π/2 (−12 dB/octave).
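The text does not state how the tilt values were measured. One simple possibility, sketched here as an assumption, is to take harmonic magnitudes from a DFT of exactly one period of the flow and fit a least-squares line to level (dB) against log2 frequency, so that the slope is in dB/octave. Harmonics with zero magnitude would have to be excluded before the fit; the function names are illustrative.

```python
import math

def harmonic_magnitudes(period_samples, n_harmonics):
    """|DFT| at harmonics 1..n_harmonics of exactly one period of a periodic signal."""
    n = len(period_samples)
    mags = []
    for k in range(1, n_harmonics + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(period_samples))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(period_samples))
        mags.append(math.hypot(re, im))
    return mags

def spectral_tilt_db_per_octave(freqs_hz, mags):
    """Least-squares slope of 20*log10|X| against log2(f), i.e. dB per octave."""
    xs = [math.log2(f) for f in freqs_hz]
    ys = [20.0 * math.log10(m) for m in mags]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Check on a synthetic spectrum falling as 1/f^2 (f0 = 125 Hz, i.e. T0 = 8 ms).
F0 = 125.0
freqs = [F0 * k for k in range(1, 41)]
tilt = spectral_tilt_db_per_octave(freqs, [1.0 / k ** 2 for k in range(1, 41)])
# tilt equals -40*log10(2), about -12.04 dB/octave
```

Because ω′ = ω in this subsection, one VF period is also one period of the laryngeal flow, so a single-period DFT is enough; a kargyraa-like flow would need nT0 samples.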
3.2. VTF-modulation and LVT-resonance model
We set A′g0 = 0.10 cm² and α′ = 0.05 cm². We normalize u(t) by multiplying it by a positive real constant and treat u(t) as the glottal area function, setting the maximal glottal area max u(t) to 0.2 cm². We set the VTF thickness d′ to 1.0 cm, the cross-sectional area of the laryngeal ventricle to 1.5 cm², the depth of the laryngeal ventricle to 0.5 cm, and K = 20 in Eq. (7). These values were used to compute h[t](z); the other values are the same as above. Fig. 7 shows the laryngeal flows with LVT resonance for various θ′.
In all cases, ripples are observed after glottal closure. Fig. 8 shows the spectra of two of the flows. The effect of the laryngeal ventricle (LVT) resonance is observed around 2000 Hz, and this feature appears in all the synthesized sources.

3.3. False glottal area at rest
When A′g0 is decreased, Eq. (6) implies that the resonance frequency is pushed higher. Fig. 9 shows the spectra for different A′g0; other conditions are the same as above.

3.4. Modulation amplitude
We synthesized laryngeal flows while changing the VTF vibration amplitude α′. No significant trends were observed in the behaviour of the synthesized flows.
3.5. Laryngeal source for kargyraa
If ω′ = ω/2, i.e., 2π/ω′ = 2T0, then ũ(t) has twice the period of u(t) and behaves similarly to kargyraa phonation. Because u(t) has a closed phase, the laryngeal flow falls to 0 in the middle of each period. However, the inverse-filtered kargyraa voice maintains flow throughout each period. Incomplete closure of the VFs is also observed in the physical model simulation [9]. To obtain a similar laryngeal flow shape, either the second u(t) pulse must start before Tp + Tn, or u(t) needs a sufficiently large OQ.
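The argument above can be checked with a small sketch, assuming the trigonometric Rosenberg form for u(t), the Section 3.1 settings, M = 1, and a hypothetical 16 kHz sampling rate: with n = 2 the modulated flow repeats every 2T0 rather than every T0, yet it still drops to exactly zero in every VF cycle, because each u(t) pulse has a closed phase when OQ < 1.

```python
import math

def r_model(t, alpha, tp, tn, t0):
    """Glottal flow pulse, periodic in t0 (assumed trigonometric Rosenberg form)."""
    tau = t % t0
    if tau <= tp:
        return 0.5 * alpha * (1.0 - math.cos(math.pi * tau / tp))
    if tau <= tp + tn:
        return alpha * math.cos(math.pi * (tau - tp) / (2.0 * tn))
    return 0.0

def modulated_flow(fs, n_vf_periods, t0=0.008, tp_ratio=0.42, tn_ratio=0.18,
                   a_g0=0.5, alpha_p=0.35, theta_p=-math.pi / 4, n=2):
    """Samples of M(t) * u(t) with M = 1 and 2*pi/omega' = n * T0."""
    tp, tn = tp_ratio * t0, tn_ratio * t0
    omega_p = 2.0 * math.pi / (n * t0)
    out = []
    for k in range(int(round(n_vf_periods * t0 * fs))):
        t = k / fs
        m = a_g0 + alpha_p * math.sin(omega_p * t + theta_p)  # A'_g(t)
        out.append(m * r_model(t, 1.0, tp, tn, t0))
    return out

FS = 16000
PER = int(round(0.008 * FS))   # 128 samples per VF period
flow = modulated_flow(FS, 4)   # four VF periods = two VTF periods
cycle_minima = [min(flow[p * PER:(p + 1) * PER]) for p in range(4)]
```

Every entry of `cycle_minima` is zero, which is the behaviour that conflicts with the inverse-filtered kargyraa flow; removing it requires overlapping pulses or a larger OQ, as stated above.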
4. Conclusions
A new laryngeal flow model was proposed, and its acoustic characteristics were studied by varying its parameters. To reproduce the laryngeal flow shape of the drone voice, VTF modulation is indispensable. In addition, laryngeal ventricle resonance is effective for producing ripples after the closure of the vocal folds. These results show that the proposed model is effective for synthesizing pressed-type singing voices such as throat singing. Parameter fitting by analysis-by-synthesis and perceptual evaluation will be addressed in future work. In addition, an effective inverse-filtering method for cases in which the source has poles and the filter has zeros remains to be studied.
Acknowledgments
We thank Seiji Adachi, Parham Mokhtari, Yoshinao
Shiraki, Niro Tayama, and Masahiko Todoriki for their helpful discussions.
5. References
[1] P. Alku, T. Bäckström, and E. Vilkman. Normalized amplitude quotient for parametrization of the glottal flow. J. Acoust. Soc. Am., 112(2):701–710, 2002.
[2] K. E. Cummings and M. A. Clements. Glottal models for digital speech processing: A historical survey and new results. Digital Signal Processing, 5:21–42, 1995.
[3] G. Fant, J. Liljencrants, and Q.-A. Lin. A
four-parameter model of glottal flow. KTH STL QPSR, pages 1–14, 1985.
[4] L. Fuks, B. Hammarberg, and J. Sundberg. A self-sustained
vocal-ventricular phonation mode: acoustical, aerodynamic and glottographic
evidences. KTH TMH-QPSR, 3/1998:49–59, 1998.
[5] N. Kobayashi, Y. Tohkura, S. Tenpaku, and S.
Niimi. Acoustic and physiological characteristics of traditional singing in
[6] T. C. Levin and M. E. Edgerton. The throat singers of Tuva. Scientific
[7] P.-Å. Lindestad, M. Södersten, B. Merker, and S. Granqvist. Voice source characteristics in Mongolian “throat singing” studied with high-speed imaging technique, acoustic spectra, and inverse filtering. J. Voice, 15(1):78–85, 2001.
[8] A.
[9] K.-I. Sakakibara, H. Imagawa, S. Niimi, and
[10] K.-I. Sakakibara, T. Konishi, H. Imagawa, E. Z.
Murano, K. Kondo, M. Kumada, and S. Niimi. Observation of the laryngeal
movements for throat singing — vibration of two pairs of the folds in human
larynx. Acoust. Soc. Am. World Wide Press Room, 144th meeting of the
ASA, 2002. http://www.acoustics.org/press/.
[11] K.-I. Sakakibara, T. Konishi, K. Kondo, E. Z. Murano, M. Kumada, H. Imagawa, and S. Niimi. Vocal fold and false vocal fold vibrations and synthesis of khöömei. In Proc. ICMC 2001, pages 135–138. ICMA, 2001.