Integrated Search | Korea Science

기본주파수와 성도길이의 상관관계를 이용한 HTS 음성합성기에서의 목소리 변환 (Voice transformation for HTS using correlation between fundamental frequency and vocal tract length)

유효근;김영관;서영주;김회린
- Phonetics and Speech Sciences (말소리와 음성과학), Vol. 9, No. 1, pp. 41-47, 2017
The main advantage of statistical parametric speech synthesis is its flexibility in changing voice characteristics. A personalized text-to-speech (TTS) system can be implemented by combining a speech synthesis system and a voice transformation system, and it is widely used in many application areas. It is known that the fundamental frequency and the spectral envelope of a speech signal can be independently modified to convert the voice characteristics. It is also important to maintain the naturalness of the transformed speech. In this paper, a hidden Markov model (HMM)-based speech synthesis system (HTS) using the STRAIGHT vocoder is constructed, and voice transformation is conducted by modifying the fundamental frequency and spectral envelope. The fundamental frequency is transformed by a scaling method, and the spectral envelope is transformed through a frequency warping method to control the speaker's vocal tract length. In particular, this study proposes a voice transformation method using the correlation between fundamental frequency and vocal tract length. Subjective evaluations were conducted to assess preference and mean opinion scores (MOS) for naturalness of synthetic speech. Experimental results showed that the proposed voice transformation method achieved higher preference than baseline systems while maintaining the naturalness of the speech quality.
https://doi.org/10.13064/KSSS.2017.9.1.041
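The two operations named in the abstract above, scaling the fundamental frequency and warping the frequency axis of the spectral envelope, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the use of 0.0 to mark unvoiced frames, and the choice of a simple linear warp with linear interpolation are all assumptions made for illustration, and the rule the paper uses to tie the warping factor to the F0 scaling factor is not given in the abstract, so it is left out.

```python
def scale_f0(f0_contour, factor):
    """Multiply voiced F0 values by factor; 0.0 is assumed to mark unvoiced frames."""
    return [f * factor if f > 0 else 0.0 for f in f0_contour]


def warp_envelope(envelope, alpha):
    """Linearly warp the frequency axis: output bin k reads the input at k / alpha.

    alpha > 1 moves spectral peaks upward (as for a shorter vocal tract),
    alpha < 1 moves them downward (as for a longer vocal tract).
    """
    last = len(envelope) - 1
    warped = []
    for k in range(len(envelope)):
        pos = min(k / alpha, last)  # clamp to the highest available bin
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        # Linear interpolation between the two neighbouring input bins.
        warped.append((1 - frac) * envelope[lo] + frac * envelope[hi])
    return warped


if __name__ == "__main__":
    print(scale_f0([100.0, 0.0, 200.0], 1.5))
    print(warp_envelope([0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0], 2.0))
```

With `alpha = 2.0` the single peak in the toy envelope moves from bin 2 to bin 4, which is the kind of formant shift that frequency warping uses to imitate a change in vocal tract length.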

음성기관의 공기역학적 고찰 (The Aerodynamic Study of the Vocal Tract)

김기령;박인용;김희남;심상열;최홍식
- Korean Bronchoesophagological Society (대한기관식도과학회): Conference Proceedings, 13th Conference (1979) Program and Abstracts, pp.8.3-8, 1979
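Voice is produced when the subglottal airflow is regulated at the vocal folds and modulated in the vocal tract above them, so a laryngeal abnormality changes the airflow passing through the larynx during phonation. Abroad, many researchers, including Dohne (1944) and Arnold (1955, 1958), have measured the aerodynamic changes associated with laryngeal disease and contributed greatly to its diagnosis. Before measuring aerodynamic changes in laryngeal disease, the authors sought to establish normal values to serve as a reference. Using a respirometer made by Collins, we measured the mean airflow rate, maximum phonation volume, maximum phonation time, and phonation velocity value (발성속력치) in 20 men and 20 women with normal voices, aged 21 to 30, and present the results here as a first report.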

동적 성대 모델을 이용한 후두 내 유동 및 음향장에 대한 수치 연구 (Computation of Laryngeal Flow and Sound through a Dynamic Model of the Vocal Folds)

배영민;문영준
- Korean Society for Computational Fluids Engineering (한국전산유체공학회): Conference Proceedings, 2008 Spring Conference Proceedings, pp. 21-24, 2008
The present study numerically investigates the glottal airflow characteristics as well as the acoustic features of phonation, fully coupled with the dynamic behavior of the vocal folds. The vocal folds are described by a low-dimensional body-cover model characterized by bio-mechanical parameters such as glottal width, vocal fold stiffness, and subglottal pressure. The flow in the vocal tract is modeled with an incompressible, axisymmetric form of the Navier-Stokes equations (INS), while the acoustic field is predicted by the linearized perturbed compressible equations (LPCE). The computed results show that a two-mass model of the vocal folds is sufficient to reproduce the temporal variations in oral airflow and glottis motion produced by female speakers. It is also found that i) the glottal width has a significant effect on the amplitude of the glottal flow, and thus on the amplitude of the acoustic wave in the vocal tract, ii) the vocal fold tension is the main control parameter for the fundamental frequency of phonation, iii) the subglottal pressure plays an appreciable role in reproducing the self-sustained oscillation of the vocal folds, and iv) the strength of the pulsating airflow and vortical structures is primarily affected by glottal width and subglottal pressure, and is closely related to pitch, loudness, and voice quality. Finally, a more comprehensive explanation of the difference between one- and two-mass models is presented, with discussion of the effectiveness of vocal fold oscillation and voice quality.

초고속 성대촬영기(High-Speed Digital Imaging)를 이용한 말더듬인과 근 긴장성 발성장애인의 /이/모음 발성 시 성대 진동 양상에 관한 비교 연구 (A Comparative Study of Vocal Fold Vibratory Behaviors Shown in the Phonation of the /i/ Vowel between Persons who Stutter and Persons with Muscle Tension Dysphonia Using High-Speed Digital Imaging)

정훈;안종복;박진향;최병흔;권도하
- Phonetics and Speech Sciences (말소리와 음성과학), Vol. 1, No. 4, pp. 195-201, 2009
The purpose of this study was to use high-speed digital imaging (HSDI) to compare vocal vibratory behaviors of persons who stutter (PWS) and persons with muscle tension dysphonia (PMTD) for uttering the /i/ vowel in a bid to identify the characteristics of vocal fold vibratory behaviors of PWS. This study surveyed seven developmental PWSs and seven PMTDs. The findings of the study indicated the following: first, regarding the two groups' vocal fold vibratory behaviors, of seven PWSs, three were found to be close vocal tract (VC) and four were found to be combination vocal tract (VCB). Of the seven PMTDs, one was found to be VC, and the other six were found to be VCB. These results indicate that a voiceprint which is different from the open vocal tract (VO) found in normal groups in research conducted by Jung, et al. (2008b) appeared in both groups of this study. Even between the two groups, there is a difference in the voiceprint before vocalization. Second, a VKG analysis was conducted to identify the two groups' vocal cord contact quotient. As a result, the PWS group's vocal cord contact quotient changed gradually from an irregular one at the initial vocalization stage to a regular one. The PMTD group continued the tension at the initial vocalization. Putting together all of these results, there is a difference in vocal fold vibratory behaviors between PWSs and PMTDs when they speak. Thus, there was a difference in muscular tension between the two groups.

포먼트 주파수 추출 알고리즘들의 성능 비교평가 연구 (A Comparative Study on Formant Frequency Extraction Performances)

손성용;김상진;김영민;한민수
- Korean Society of Phonetic Sciences and Speech Technology (대한음성학회): Conference Proceedings, May 2003 Conference Proceedings, pp. 141-144, 2003
In this paper, we compare formant frequency extraction algorithms under various conditions and show their performance. The formant frequency is the resonance frequency determined by the vocal tract characteristics. It is related to phonemes and to the physical condition of the vocal tract. Since the speech signal is influenced by both the sound source and the vocal tract, it is difficult to calculate the exact formant frequencies. Many studies on formant frequency extraction have already been carried out. In addition, hardly any new formant frequency extraction algorithms have appeared recently.

5음의 분석과 분류 (A study on the 5-Tone Analysis and Classification)

조병서;이용동;;허웅;박영배
- Institute of Electronics Engineers of Korea (대한전자공학회): Conference Proceedings, 2001 Summer Conference Proceedings (5), pp. 219-222, 2001
Human speech sounds are used for diagnosis in oriental medicine under the "O-sung" theory. In general, the human voice consists of sound waves generated by phonation. The two major parts of phonation are the vocal cords and the vocal tract. The uniqueness of an individual's vocal sound depends on the structure and usage of their vocal cords and vocal tract. In oriental medicine, "O-sung (5-tones)" has been used to classify the constitution of the human body. In order to characterize the "O-sung", their frequency characteristics are investigated and a principal frequency component is extracted. Then the principal component is applied to classify sounds into "O-sung".

음성합성시스템을 위한 음색제어규칙 연구 (A Study on Voice Color Control Rules for Speech Synthesis System)

김진영;엄기완
- Speech Sciences (음성과학), Vol. 2, pp. 25-44, 1997
When listening to the various speech synthesis systems developed and used in our country, we find that although the quality of these systems has improved, they lack naturalness. Moreover, since the voice color of each system is limited to a single recorded speech DB, another speech DB must be recorded to create a different voice color. 'Voice color' is an abstract concept that characterizes voice personality, so speech synthesis systems need a voice color control function to create various voices. The aim of this study is to examine several factors of voice color control rules for a text-to-speech system that produces natural and varied voice types for synthetic speech. In order to find such rules from natural speech, glottal source parameters and frequency characteristics of the vocal tract were studied for several voice colors. In this paper, voice colors were catalogued as deep, sonorous, thick, soft, harsh, high tone, shrill, and weak. The LF-model was used as the voice source model, and the formant frequencies, bandwidths, and amplitudes were used for the frequency characteristics of the vocal tract. These acoustic parameters were tested through multiple regression analysis to obtain the general relation between the parameters and voice colors.

고음질을 갖는 음색변경에 관한 연구 (A Study on the Voice Conversion Algorithm with High Quality)

박형빈;배명진
- Institute of Electronics Engineers of Korea (대한전자공학회): Conference Proceedings, 2000 13th Joint Conference on Signal Processing Proceedings, pp. 157-160, 2000
Voice conversion has generally used vector quantization (VQ) to partition the spectral features and has been performed by adding an appropriate offset vector to the source speaker's spectral vector. However, this approach cannot represent the target speaker's varied characteristics because the transformed parameters are discrete. In this paper, these problems are addressed by using linear multivariate regression (LMR) in place of the mapping codebook that describes the relationship between the source and target speakers' vocal tract characteristics. We also propose a method for resolving the discontinuity that arises when time-aligned parameters obtained by dynamic time warping are applied to speech whose time or pitch scale has been modified. To overcome these transitional discontinuities, the proposed algorithm first leaves the time and pitch scales unchanged and uses LMR to change the speaker's vocal tract characteristics in the unmodified speech. Compared with existing methods based on VQ and LMR, the proposed algorithm yields much better voice quality.

A Study on Comparison of Pronunciation Accuracy of Soprano Singers

Song, Uk-Jin;Park, Hyungwoo;Bae, Myung-Jin
- International journal of advanced smart convergence, Vol. 6, No. 2, pp. 59-64, 2017
There are three sorts of voices of female vocalists: soprano, mezzo-soprano, and contralto according to the transliteration. Among them, the soprano has the highest vocal range. Since the voice is generated through the human vocal tract based on the voice generation model, it is greatly influenced by the vocal tract. The structure of the vocal organs differs from person to person, and the formant characteristics of vocalization differ accordingly. The formant characteristic refers to a characteristic in which a specific frequency band appears distinctly due to resonance occurring in each vocal tract in the vocal process. Formant characteristics include personality that occurs in the throat, jaw, lips, and teeth, as well as phonological properties of phonemes. The first formant is attributed to the throat and the second formant to the jaw, while the third and fourth formants are caused by the resonance phenomenon in the lips and the teeth. Among them, pronunciation is influenced not only by phonological information but also by the jaw, lips, and teeth. When the mouth is small or the jaw is stiff when pronouncing, pronunciation becomes unclear. Therefore, the higher the accuracy of the pronunciation characteristics, the more clearly the formant characteristics appear in the grammar spectrum. However, many soprano singers cannot open their mouths because their jaws, lips, teeth, and facial muscles are rigid to maintain high tones when singing, which makes the pronunciation unclear, and thus the formant characteristics become unclear. In this paper, in order to confirm the accuracy of the pronunciation characteristics of soprano singers, soprano singers A, B, C, D, and E of Korea were selected as the experimental group, the grammar spectrum was analyzed, and a MOS test for pronunciation recognition was conducted. As a result, soprano singer B showed clear recognition from F1 to F5, and the MOS test result showed the highest recognition rate with 4.6 points. Soprano singers A, C, and D showed formants from F1 to F3, but it was difficult to find formants above 2 kHz. Finally, soprano singer E had difficulty showing formants as a whole, and the MOS test showed the lowest recognition rate at 2.1 points. Therefore, we confirmed that soprano singer B, who exhibits the most distinct formant characteristics in the grammar spectrum, has the best pronunciation accuracy.
https://doi.org/10.7236/IJASC.2017.6.2.59

선형 스펙트럼쌍을 이용한 성문특성이 제거된 성도특성 추출법에 관한 연구 (A Study on Extraction of Vocal Tract Characteristic After Canceling the Vocal Cord Property Using the Line Spectrum Pairs)

민소연;장경아;배명진
- The Journal of the Acoustical Society of Korea (한국음향학회지), Vol. 21, No. 7, pp. 665-670, 2002
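The general form of the pre-emphasis filter is y(n) = s(n) - A s(n-1), where A lies between 0.9 and 1.0 for voiced speech. The value of A reflects the slope of the pre-emphasis, and conventional methods use the autocorrelation ratio R(1)/R(0). This paper proposes a new flattening technique to compensate for the attenuation of high-frequency components caused by the glottal characteristics. First, the spacing information of the LSP parameters is used to estimate the formant frequencies. Slope and inverse-slope values are then obtained by linear interpolation between the estimated formant frequencies and used to carry out the flattening. Experimental results show that the proposed method has better flattening characteristics than the conventional method. In short, this paper uses LSP spacing information as the flattening factor when compensating for the weakened high-frequency components.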
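The conventional pre-emphasis step described in the abstract above, y(n) = s(n) - A s(n-1) with A taken as R(1)/R(0), can be sketched as follows. This covers only the conventional baseline, not the LSP-based flattening the paper proposes. The function names, the handling of the first sample (s(-1) is treated as 0), and the fallback for an all-zero frame are assumptions made for illustration.

```python
def autocorr(signal, lag):
    """Autocorrelation sum R(lag) over one frame of samples."""
    return sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))


def preemphasis_coefficient(signal):
    """Estimate A as R(1)/R(0); returns 0.0 for an all-zero frame (assumed fallback)."""
    r0 = autocorr(signal, 0)
    return autocorr(signal, 1) / r0 if r0 > 0 else 0.0


def preemphasize(signal, a):
    """Apply y(n) = s(n) - a * s(n-1), treating s(-1) as 0."""
    out = []
    prev = 0.0
    for s in signal:
        out.append(s - a * prev)
        prev = s
    return out


if __name__ == "__main__":
    frame = [1.0, 1.0, 1.0, 1.0]
    a = preemphasis_coefficient(frame)  # R(1)/R(0) = 3/4 for this frame
    print(a, preemphasize(frame, a))
```

For a slowly varying frame, adjacent samples are strongly correlated, so R(1)/R(0) approaches 1 and the filter removes most of the low-frequency content, which is consistent with the 0.9 to 1.0 range the abstract gives for voiced speech.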

43 search results; processing time 0.02 s
