• Title/Summary/Keyword: Vocal tract characteristics

Voice transformation for HTS using correlation between fundamental frequency and vocal tract length (기본주파수와 성도길이의 상관관계를 이용한 HTS 음성합성기에서의 목소리 변환)

  • Yoo, Hyogeun;Kim, Younggwan;Suh, Youngjoo;Kim, Hoirin
    • Phonetics and Speech Sciences / v.9 no.1 / pp.41-47 / 2017
  • The main advantage of statistical parametric speech synthesis is its flexibility in changing voice characteristics. A personalized text-to-speech (TTS) system can be implemented by combining a speech synthesis system with a voice transformation system, and such systems are widely used in many application areas. It is known that the fundamental frequency and the spectral envelope of a speech signal can be modified independently to convert voice characteristics. It is also important to maintain the naturalness of the transformed speech. In this paper, a hidden Markov model (HMM)-based speech synthesis system (HTS) using the STRAIGHT vocoder is constructed, and voice transformation is conducted by modifying the fundamental frequency and the spectral envelope. The fundamental frequency is transformed by scaling, and the spectral envelope is transformed by frequency warping to control the speaker's vocal tract length. In particular, this study proposes a voice transformation method that uses the correlation between fundamental frequency and vocal tract length. Subjective evaluations were conducted to assess preference and mean opinion scores (MOS) for the naturalness of the synthetic speech. Experimental results showed that the proposed voice transformation method achieved higher preference than baseline systems while maintaining the naturalness of the speech.
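
The two modifications named in this abstract, fundamental frequency scaling and frequency warping of the spectral envelope, can be illustrated with a minimal sketch. The abstract does not specify the warping function or how the scaling factor and warping factor are tied together, so the linear warp, the function names `scale_f0` and `warp_envelope`, and the parameters `factor` and `alpha` are all assumptions made for illustration.

```python
def scale_f0(f0_track, factor):
    """Scale voiced F0 values; unvoiced frames (marked as 0) stay at 0."""
    return [f * factor if f > 0 else 0.0 for f in f0_track]


def warp_envelope(envelope, alpha):
    """Linearly warp the frequency axis of one spectral-envelope frame.

    Output bin k takes the value found at input position k / alpha
    (linear interpolation). With alpha > 1 the envelope peaks move to
    higher frequencies, which mimics a shorter vocal tract; alpha < 1
    moves them lower. Linear warping is an assumption, not the paper's
    stated method.
    """
    n = len(envelope)
    warped = []
    for k in range(n):
        src = k / alpha
        lo = int(src)
        if lo >= n - 1:
            # Past the last bin: hold the final value.
            warped.append(envelope[-1])
        else:
            frac = src - lo
            warped.append((1 - frac) * envelope[lo] + frac * envelope[lo + 1])
    return warped
```

For example, `warp_envelope(frame, 1.1)` together with `scale_f0(track, 1.3)` would raise both the formant positions and the pitch of a frame; how the two factors should be related is what the paper's proposed correlation addresses and is not reproduced here.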

The Aerodynamic Study of the Vocal Tract (음성기관의 공기역학적 고찰)

  • 김기령;박인용;김희남;심상열;최홍식
    • Proceedings of the KOR-BRONCHOESO Conference / 1979.05a / pp.8.3-8 / 1979
  • Dohne (1944) studied the consumption of air during phonation in patients with dysphonia, and Arnold (1955, 1958) reported that the maximum phonation time is frequently reduced to a few seconds in paralytic dysphonia. Nishikawa also investigated the relation among vital capacity, maximum phonation time, calculated mean flow rate, and various vocal characteristics in patients with hoarseness. The authors studied the aerodynamic characteristics of the vocal tract in the following aspects, using a 9 L respirometer made by Collins Inc.: 1. Maximum phonation time 2. Maximum phonation volume 3. Mean flow rate 4. Vocal velocity index

Computation of Laryngeal Flow and Sound through a Dynamic Model of the Vocal Folds (동적 성대 모델을 이용한 후두 내 유동 및 음향장에 대한 수치 연구)

  • Bae, Young-Min;Moon, Young-J.
    • 한국전산유체공학회:학술대회논문집 / 2008.03b / pp.21-24 / 2008
  • The present study numerically investigates the glottal airflow characteristics and the acoustic features of phonation, fully coupled with the dynamic behavior of the vocal folds. The vocal folds are described by a low-dimensional body-cover model characterized by biomechanical parameters such as glottal width, vocal fold stiffness, and subglottal pressure. The flow in the vocal tract is modeled with an incompressible, axisymmetric form of the Navier-Stokes equations (INS), while the acoustic field is predicted by the linearized perturbed compressible equations (LPCE). The computed results show that a two-mass model of the vocal folds is sufficient to reproduce the temporal variations in oral airflow and glottal motion produced by female speakers. It is also found that i) the glottal width has a significant effect on the amplitude of the glottal flow, and thus on the amplitude of the acoustic wave in the vocal tract, ii) the vocal fold tension is the main control parameter for the fundamental frequency of phonation, iii) the subglottal pressure plays an appreciable role in reproducing the self-sustained oscillation of the vocal folds, and iv) the strength of the pulsating airflow and the vortical structures are primarily affected by glottal width and subglottal pressure, and are closely related to pitch, loudness, and voice quality. Finally, a more comprehensive explanation of the difference between one- and two-mass models is presented, with a discussion of the effectiveness of vocal fold oscillation and voice quality.

A Comparative Study of Vocal Fold Vibratory Behaviors Shown in the Phonation of the /i/ Vowel between Persons who Stutter and Persons with Muscle Tension Dysphonia Using High-Speed Digital Imaging (초고속 성대촬영기(High-Speed Digital Imaging)를 이용한 말더듬인과 근 긴장성 발성장애인의 /이/모음 발성 시 성대 진동 양상에 관한 비교 연구)

  • Jung, Hun;Ahn, Jong-Bok;Park, Jin-Hyaung;Choi, Byung-Heun;Kwon, Do-Ha
    • Phonetics and Speech Sciences / v.1 no.4 / pp.195-201 / 2009
  • The purpose of this study was to use high-speed digital imaging (HSDI) to compare the vocal fold vibratory behaviors of persons who stutter (PWS) and persons with muscle tension dysphonia (PMTD) while uttering the /i/ vowel, in order to identify the characteristics of the vocal fold vibratory behaviors of PWS. This study examined seven developmental PWSs and seven PMTDs. The findings were as follows. First, regarding the two groups' vocal fold vibratory behaviors, of the seven PWSs, three were found to be close vocal tract (VC) and four were found to be combination vocal tract (VCB). Of the seven PMTDs, one was found to be VC, and the other six were found to be VCB. These results indicate that a voiceprint different from the open vocal tract (VO) found in normal groups by Jung et al. (2008b) appeared in both groups of this study. The two groups also differed from each other in the voiceprint before vocalization. Second, a VKG analysis was conducted to identify the two groups' vocal cord contact quotient. In the PWS group, the vocal cord contact quotient changed gradually from irregular at the initial vocalization stage to regular. The PMTD group maintained the tension present at the initial vocalization. Taken together, these results show a difference in vocal fold vibratory behaviors between PWSs and PMTDs when they speak, and thus a difference in muscular tension between the two groups.

A Comparative Study on Formant Frequency Extraction Performances (포먼트 주파수 추출 알고리즘들의 성능 비교평가 연구)

  • Son Sungyung;Kim Sang-Jin;Kim YoungMin;Hahn Minsoo
    • Proceedings of the KSPS conference / 2003.05a / pp.141-144 / 2003
  • In this paper, we compare formant frequency extraction algorithms under various conditions and report their performance. A formant frequency is a resonance frequency determined by the vocal tract characteristics. It is related to the phoneme being produced and to the physical condition of the vocal tract. Since the speech signal is influenced by both the sound source and the vocal tract, it is difficult to calculate the exact formant frequencies. Many studies on formant frequency extraction have already been carried out, but new formant frequency extraction algorithms have rarely appeared recently.
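
The abstract does not name the algorithms it compares. As a hedged illustration of what formant extraction can look like, the sketch below uses one widely known approach: fit a linear prediction (LPC) model with the autocorrelation method and pick the peaks of the resulting all-pole spectrum. The function names, the grid step `step_hz`, and the choice of LPC peak picking itself are assumptions, not the paper's method.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    """Autocorrelation values R(0)..R(max_lag) of one frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return LPC coefficients a[0..order], a[0] = 1, for A(z) = sum a[k] z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)                # updated prediction error
    return a


def formant_peaks(a, fs, step_hz=5.0):
    """Frequencies (Hz) of local maxima of the all-pole spectrum 1/|A|."""
    n_steps = int((fs / 2) / step_hz)
    freqs = [i * step_hz for i in range(n_steps + 1)]
    mags = []
    for f in freqs:
        w = 2 * math.pi * f / fs
        denom = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        mags.append(1.0 / abs(denom))
    return [freqs[i] for i in range(1, len(mags) - 1)
            if mags[i - 1] < mags[i] > mags[i + 1]]
```

A typical call would be `formant_peaks(levinson_durbin(autocorrelation(frame, order), order), fs)`, where the LPC order is a tuning choice. Because the excitation source also shapes the spectrum, the peaks found this way are estimates of the resonances rather than exact values, which is the difficulty the abstract points out.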

A study on the 5-Tone Analysis and Classification (5음의 분석과 분류)

  • Cho, B.S.;Lee, Y.D.;Kim, J.K.;Hur, W.;Pak, Y.B.
    • Proceedings of the IEEK Conference / 2001.06e / pp.219-222 / 2001
  • Human speech sounds are used for diagnosis in oriental medicine under the ‘O-sung’ theory. In general, the human voice consists of sound waves generated by phonation. The two major parts involved in phonation are the vocal cords and the vocal tract. The uniqueness of an individual's voice depends on the structure and usage of their vocal cords and vocal tract. In oriental medicine, “O-sung (5-tones)” has been used to classify the constitution of the human body. In order to characterize the “O-sung”, their frequency characteristics are investigated, and a principal frequency component is extracted. Then, the principal component is applied to classify sounds into “O-sung.”

A Study on Voice Color Control Rules for Speech Synthesis System (음성합성시스템을 위한 음색제어규칙 연구)

  • Kim, Jin-Young;Eom, Ki-Wan
    • Speech Sciences / v.2 / pp.25-44 / 1997
  • When listening to the various speech synthesis systems developed and used in our country, we find that although the quality of these systems has improved, they lack naturalness. Moreover, since the voice color of each system is limited to a single recorded speech DB, another speech DB must be recorded to create a different voice color. 'Voice color' is an abstract concept that characterizes voice personality, so speech synthesis systems need a voice color control function to create various voices. The aim of this study is to examine several factors of voice color control rules for a text-to-speech system that produces natural and varied voice types in synthetic speech. In order to find such rules from natural speech, glottal source parameters and frequency characteristics of the vocal tract were studied for several voice colors. In this paper, voice colors were catalogued as deep, sonorous, thick, soft, harsh, high tone, shrill, and weak. The LF-model was used as the voice source model, and the formant frequencies, bandwidths, and amplitudes were used for the frequency characteristics of the vocal tract. These acoustic parameters were tested through multiple regression analysis to obtain the general relation between these parameters and voice colors.

A Study on the Voice Conversion Algorithm with High Quality (고음질을 갖는 음색변경에 관한 연구)

  • 박형빈;배명진
    • Proceedings of the IEEK Conference / 2000.09a / pp.157-160 / 2000
  • In general, voice conversion has used vector quantization (VQ) to partition the spectral features and has been performed by adding an appropriate offset vector to the source speaker's spectral vector. However, the target speaker's various characteristics are not represented well because the transformed parameters are discrete. In this paper, these problems are addressed by using linear multivariate regression (LMR), instead of a mapping codebook, to model the relationship between the source and target speakers' vocal tract characteristics. We also propose a method for resolving the discontinuity caused by applying parameters time-aligned with dynamic time warping (DTW) to time- or pitch-scale-modified speech. In the proposed algorithm, to overcome these transitional discontinuities, we first leave the time and pitch scales unchanged and use LMR to change the speaker's vocal tract characteristics in speech whose time and pitch are unmodified. Compared with existing methods based on VQ and LMR, the proposed algorithm yields much better voice quality.
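
The core idea of replacing a discrete codebook with a continuous linear mapping can be sketched as an ordinary least-squares fit from source spectral vectors to target spectral vectors. The abstract gives no details of the paper's LMR formulation, so this is only a generic illustration: the function names, the added bias term, and the assumption that the training frames are already paired one to one (for example by DTW) are all hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is a square matrix (list of rows); b is a matrix with the same
    number of rows. Assumes a is non-singular.
    """
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_lmr(source, target):
    """Least-squares weights W such that target ≈ [source, 1] @ W."""
    xs = [list(x) + [1.0] for x in source]   # append a bias input
    d, k = len(xs[0]), len(target[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    xty = [[sum(x[i] * y[j] for x, y in zip(xs, target)) for j in range(k)]
           for i in range(d)]
    return solve_linear(xtx, xty)            # normal equations


def convert(x, weights):
    """Map one source spectral vector to the target speaker's space."""
    xa = list(x) + [1.0]
    return [sum(xi * row[j] for xi, row in zip(xa, weights))
            for j in range(len(weights[0]))]
```

Unlike a VQ codebook, which can only output one of a finite set of offset vectors, the fitted mapping varies continuously with its input, which is the property the abstract credits for representing the target speaker's characteristics better.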

A Study on Comparison of Pronunciation Accuracy of Soprano Singers

  • Song, Uk-Jin;Park, Hyungwoo;Bae, Myung-Jin
    • International journal of advanced smart convergence / v.6 no.2 / pp.59-64 / 2017
  • There are three voice types for female vocalists: soprano, mezzo-soprano, and contralto according to the transliteration. Among them, the soprano has the highest vocal range. Since the voice is generated through the human vocal tract, as described by the voice generation model, it is greatly influenced by the vocal tract. The structure of the vocal organs differs from person to person, and the formant characteristics of vocalization differ accordingly. A formant characteristic is one in which a specific frequency band appears distinctly because of resonance occurring in the vocal tract during vocalization. Formant characteristics include individual traits arising in the throat, jaw, lips, and teeth, as well as the phonological properties of phonemes. The first formant is attributed to the throat and the second formant to the jaw, while the third and fourth formants are caused by resonance in the lips and the teeth. Pronunciation is therefore influenced not only by phonological information but also by the jaw, lips, and teeth. When the mouth opening is small or the jaw is stiff, pronunciation becomes unclear. Therefore, the higher the accuracy of the pronunciation characteristics, the more clearly the formant characteristics appear in the grammar spectrum. However, many soprano singers cannot open their mouths because their jaws, lips, teeth, and facial muscles are held rigid to maintain high tones when singing, which makes the pronunciation unclear and thus the formant characteristics unclear. In this paper, to confirm the accuracy of the pronunciation characteristics of soprano singers, Korean soprano singers A, B, C, D, and E were selected as the experimental group; we analyzed the grammar spectrum and conducted an MOS test for pronunciation recognition. As a result, soprano singer B showed clearly recognizable formants from F1 to F5, and the MOS test showed the highest recognition rate, 4.6 points. For soprano singers A, C, and D, formants appeared from F1 to F3, but it was difficult to find formants above 2 kHz. Finally, for soprano singer E it was difficult to find formants overall, and the MOS test showed the lowest recognition rate, 2.1 points. Therefore, we confirmed that soprano singer B, who exhibits the most distinct formant characteristics in the grammar spectrum, has the best pronunciation accuracy.

A Study on Extraction of Vocal Tract Characteristic After Canceling the Vocal Cord Property Using the Line Spectrum Pairs (선형 스펙트럼쌍을 이용한 성문특성이 제거된 성도특성 추출법에 관한 연구)

  • 민소연;장경아;배명진
    • The Journal of the Acoustical Society of Korea / v.21 no.7 / pp.665-670 / 2002
  • The most common form of pre-emphasis is y(n) = s(n) - A·s(n-1), where A typically lies between 0.9 and 1.0 for voiced signals. This value reflects the degree of pre-emphasis and equals R(1)/R(0) in the conventional method. This paper proposes a new flattening method to compensate for the weakened high-frequency components caused by the vocal cord characteristics. We use the interval information of the line spectrum pairs (LSP) to estimate the formant frequencies. After obtaining the slope and inverse slope by linear interpolation between formant frequencies, the flattening process follows. Experimental results show that the proposed method effectively flattens the weakened high-frequency components. That is, the flattening characteristics can be improved by using the LSP interval information as the flattening factor when compensating for the weakened high-frequency components.
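
The conventional pre-emphasis described at the start of this abstract, y(n) = s(n) - A·s(n-1) with A = R(1)/R(0), can be written out as a short sketch. Only this conventional step is shown; the paper's LSP-based flattening method is not reproduced. The function names are illustrative, and passing the first sample through unchanged is an assumed boundary convention.

```python
def autocorr(s, lag):
    """Autocorrelation R(lag) of one frame."""
    return sum(s[n] * s[n - lag] for n in range(lag, len(s)))


def adaptive_preemphasis(s):
    """Apply y(n) = s(n) - A*s(n-1) with A = R(1)/R(0).

    Returns (A, y). The first output sample is copied from the input
    because s(-1) is not available (an assumed boundary convention).
    """
    r0 = autocorr(s, 0)
    a = autocorr(s, 1) / r0 if r0 > 0 else 0.0
    y = [s[0]] + [s[n] - a * s[n - 1] for n in range(1, len(s))]
    return a, y
```

For a slowly varying frame, R(1) is close to R(0), so A approaches 1 and the filter strongly attenuates low frequencies relative to high ones, which is consistent with the 0.9 to 1.0 range quoted for voiced speech.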