Integrated Search | Korea Science

Voice transformation for HTS using correlation between fundamental frequency and vocal tract length

유효근;김영관;서영주;김회린
- Phonetics and Speech Sciences / Vol. 9, No. 1 / pp. 41-47 / 2017
The main advantage of statistical parametric speech synthesis is its flexibility in changing voice characteristics. A personalized text-to-speech (TTS) system can be implemented by combining a speech synthesis system with a voice transformation system, and such systems are widely used in many application areas. It is known that the fundamental frequency and the spectral envelope of a speech signal can be modified independently to convert voice characteristics. It is also important to maintain the naturalness of the transformed speech. In this paper, a hidden Markov model (HMM)-based speech synthesis system (HTS) using the STRAIGHT vocoder is constructed, and voice transformation is performed by modifying the fundamental frequency and the spectral envelope. The fundamental frequency is transformed by scaling, and the spectral envelope is transformed by frequency warping to control the speaker's vocal tract length. In particular, this study proposes a voice transformation method that uses the correlation between fundamental frequency and vocal tract length. Subjective evaluations were conducted to assess preference and mean opinion scores (MOS) for the naturalness of the synthetic speech. Experimental results showed that the proposed voice transformation method achieved higher preference than the baseline systems while maintaining the naturalness of the speech.
https://doi.org/10.13064/KSSS.2017.9.1.041
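The two operations named in this abstract, F0 scaling and frequency warping of the spectral envelope, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the linear warping function, and in particular the power-law link `alpha = ratio ** gamma` between the F0 ratio and the warping factor are hypothetical, since the abstract does not say how the correlation is modeled.

```python
import numpy as np

def scale_f0(f0, ratio):
    """Multiply voiced F0 values by `ratio`; unvoiced frames (F0 == 0) stay 0."""
    f0 = np.asarray(f0, dtype=float)
    return np.where(f0 > 0, f0 * ratio, 0.0)

def warp_envelope(envelope, alpha):
    """Linearly warp the frequency axis of one spectral-envelope frame.

    Output bin k takes the value of the input envelope at bin k / alpha,
    so alpha > 1 moves spectral peaks upward (a shorter vocal tract) and
    alpha < 1 moves them downward (a longer vocal tract).
    """
    envelope = np.asarray(envelope, dtype=float)
    bins = np.arange(len(envelope))
    # Clamp source positions so the top of the band repeats the last bin.
    source = np.clip(bins / alpha, 0, len(envelope) - 1)
    return np.interp(source, bins, envelope)

def alpha_from_f0_ratio(ratio, gamma=0.3):
    """Hypothetical coupling: derive the warping factor from the F0 ratio.

    The exponent `gamma` is an illustrative placeholder, not a value
    taken from the paper.
    """
    return ratio ** gamma
```

With this convention, raising F0 (ratio above 1) also shortens the simulated vocal tract, which is the direction of coupling the abstract implies; the strength of that coupling would have to be estimated from data.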

Estimation of Fundamental Frequency Using an Instantaneous Frequency Based on the Symmetric Higher Order Differential Energy Operator

임병관
- The Transactions of the Korean Institute of Electrical Engineers / Vol. 60, No. 12 / pp. 2374-2379 / 2011
The fundamental frequency of voiced speech is estimated using the instantaneous frequency based on the symmetric higher order differential energy operator. The instantaneous frequency based on the symmetric higher order energy operator gives better frequency estimates because it is aligned to the time instant of the signal. The speech is first pre-processed by a low-pass filter to remove higher frequency components and then processed by the instantaneous frequency estimator to obtain the fundamental frequency estimates. The symmetric higher order energy operator is also used as an indicator for the voiced/unvoiced decision. The fundamental frequency estimates are further processed by a moving average filter to obtain smoothly varying estimates. The resulting estimates were compared with the spectrogram of the speech to confirm their accuracy.
https://doi.org/10.5370/KIEE.2011.60.12.2374

The Effect of Visual Feedback Intervention on Voice Pitch of Adult with Hearing Impairment

어수지;윤미선
- Speech Sciences / Vol. 12, No. 4 / pp. 215-226 / 2005
This study investigates the effect of a pitch treatment program using visual feedback for profoundly deaf adults. The Dr. Speech program was used as the training tool. The subjects were three profoundly deaf adults. Speech samples for evaluation were vowel prolongations and connected speech. Analysis followed the principles of single-subject research design. All subjects showed treatment effects, reflected in lowered fundamental frequency and speaking fundamental frequency.

Fundamental Frequency Estimation of Voiced Speech Signals Based on the Inflection Point Detection

임병관
- Journal of IKEEE / Vol. 27, No. 4 / pp. 472-476 / 2023
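Pitch, or fundamental frequency, is a key characteristic of speech signals and is used in a variety of speech applications such as speech coding, speech recognition, and speaker recognition. This paper uses the inflection points of the speech signal to estimate the pitch period, which is the reciprocal of the fundamental frequency. An inflection point is defined as a local maximum, a local minimum, or a point where the slope of the signal changes. The speech signal is first pre-processed with a low-pass filter to remove high-frequency components. This removes unnecessary inflection points, and only the local maxima useful for pitch period estimation are extracted using the inflection point detection method. The pitch period is estimated by measuring the time interval between the detected inflection points, and its reciprocal gives the fundamental frequency estimate. Conventional pitch estimation methods assume that speech is locally time-invariant, process it block by block, and obtain one pitch period per block. The proposed method instead processes speech sample by sample to detect inflection points, so the pitch period is obtained as a function of time and the resulting fundamental frequency estimates reflect the time-varying nature of speech. Computer simulations demonstrate the usefulness of the proposed method as a fundamental frequency estimator.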
https://doi.org/10.7471/ikeee.2023.27.4.472
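The pipeline described in this abstract (low-pass filtering, local-maximum detection, interval measurement, reciprocal) can be sketched in Python as below. This is a simplified illustration, not the author's detector: the moving-average filter, the positivity test used to discard minor peaks, and the function name `f0_from_local_maxima` are all assumptions introduced here.

```python
import numpy as np

def f0_from_local_maxima(x, fs, ma_len=5):
    """Estimate F0 over time from the spacing of local maxima.

    x      : speech samples (1-D array)
    fs     : sampling rate in Hz
    ma_len : length of the moving-average low-pass filter (an assumption)
    Returns (peak_indices, f0_estimates), one F0 value per peak interval.
    """
    x = np.asarray(x, dtype=float)
    kernel = np.ones(ma_len) / ma_len
    # "valid" mode avoids edge transients from zero padding.
    y = np.convolve(x, kernel, mode="valid")
    mid = y[1:-1]
    # A local maximum rises from the left and does not rise to the right;
    # keeping only positive peaks is a crude way to drop minor ripples.
    is_peak = (mid > y[:-2]) & (mid >= y[2:]) & (mid > 0)
    peaks = np.flatnonzero(is_peak) + 1
    periods = np.diff(peaks) / fs      # pitch period in seconds
    return peaks, 1.0 / periods        # F0 is the reciprocal of the period
```

Because each estimate comes from one pair of adjacent maxima, the output is a sequence of F0 values along time rather than one value per analysis block, which is the property the abstract emphasizes.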

Filtering of a Dissonant Frequency for Speech Enhancement

Kang, Sang-Ki;Baek, Seong-Joon;Lee, Ki-Yong;Sun, Koeng-Mo
- The Journal of the Acoustical Society of Korea / Vol. 22, No. 3E / pp. 110-112 / 2003
There have been numerous studies on the enhancement of noisy speech signals. In this paper, we propose a completely new speech enhancement scheme: filtering of a dissonant frequency (especially F# in each octave of the tempered scale) based on the fundamental frequency, developed in the frequency domain. To evaluate the performance of the proposed enhancement scheme, subjective tests (MOS tests) were conducted. The subjective test results indicate that the proposed method provides a significant audible improvement, especially for speech contaminated by colored noise and for speech spoken in a husky voice. Therefore, when the filter is employed as a pre-filter for speech enhancement, the output speech quality and intelligibility are greatly enhanced.

Fundamental Frequency Estimation based on Time-Frequency Analysis

임병관
- The Transactions of the Korean Institute of Electrical Engineers: Systems and Control Section D / Vol. 55, No. 1 / pp. 31-34 / 2006
A simple, robust fundamental frequency estimator in the time-frequency domain is proposed. Combined with an appropriately designed low-pass filter, the instantaneous frequency estimator based on the Teager-Kaiser energy function can detect the fundamental frequency of a speech signal. The Teager-Kaiser function can be obtained through real-valued computation and shows how the frequency changes over time. When a speech block of $N$ samples is processed with a low-pass filter of length $L$, the method requires $O(N \cdot (L+5))$ operations, compared to $O(N \cdot (2\log_2 N + L))$ operations for the recently introduced wavelet and conventional instantaneous frequency methods. Computer simulation confirms the usefulness of the proposed fundamental frequency estimation method.
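As an illustration of an instantaneous frequency estimator built on the Teager-Kaiser energy function, the Python sketch below uses the discrete operator $\Psi[x(n)] = x(n)^2 - x(n-1)\,x(n+1)$, which equals $A^2 \sin^2\Omega$ for a sinusoid $A\cos(\Omega n + \phi)$. The abstract does not state which estimator the paper uses; the DESA-2 energy separation formula shown here is a well-known choice and is an assumption, as are the function names. A low-pass filter would be applied beforehand so that only the fundamental remains.

```python
import numpy as np

def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The result has two fewer samples than x (no value at either end).
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def desa2_frequency(x, fs):
    """Per-sample frequency estimate in Hz using the DESA-2 formula.

    omega = 0.5 * arccos(1 - psi[x(n+1) - x(n-1)] / (2 * psi[x(n)]))
    Valid for frequencies below fs / 4. Output length is len(x) - 4.
    """
    x = np.asarray(x, dtype=float)
    y = x[2:] - x[:-2]                 # symmetric difference, centered at n
    psi_x = teager_kaiser(x)[1:-1]     # trimmed to align with psi_y
    psi_y = teager_kaiser(y)
    ratio = np.ones_like(psi_x)        # ratio 1 maps to 0 Hz where energy ~ 0
    valid = psi_x > 1e-12
    ratio[valid] = 1.0 - psi_y[valid] / (2.0 * psi_x[valid])
    omega = 0.5 * np.arccos(np.clip(ratio, -1.0, 1.0))
    return omega * fs / (2.0 * np.pi)
```

Each output sample needs only a handful of real multiplications and additions plus one arccos, which is in line with the per-sample cost the abstract attributes to the energy-function approach.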

Vowel Fundamental Frequency in Manner Differentiation of Korean Stops and Affricates

Jang, Tae-Yeoub
- Speech Sciences / Vol. 7, No. 1 / pp. 217-232 / 2000
In this study, I investigate the role of post-consonantal fundamental frequency (F0) as a cue for automatic distinction of types of Korean stops and affricates. Rather than examining data obtained by restricting contexts to a minimum to prevent the interference of irrelevant factors, a relatively natural speaker independent speech corpus is analysed. Automatic and statistical approaches are adopted to annotate data, to minimise speaker variability, and to evaluate the results. In spite of possible loss of information during those automatic analyses, statistics obtained suggest that vowel F0 is a useful cue for distinguishing manners of articulation of Korean non-continuant obstruents having the same place of articulation, especially of lax and aspirated stops and affricates. On the basis of the statistics, automatic classification is attempted over the relevant consonants in a specific context where the micro-prosodic effects appear to be maximised. The results confirm the usefulness of this effect in application for Korean phone recognition.

A Study of F0's realization in Emotional speech

박미영;박미경
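- The Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings / Proceedings of the 2005 Fall Conference of the Korean Society of Phonetic Sciences and Speech Technology / pp. 79-85 / 2005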
In this paper, we compare normal speech with emotional speech (happy, sad, and angry states) in terms of changes in fundamental frequency. The distribution charts of normal and emotional speech show distinctive cues such as the range of the distribution, average, maximum, and minimum. On the whole, the range of the fundamental frequency is extended in the happy and angry states, whereas the sad state yields a relatively narrower range. Nevertheless, the range of the fundamental frequency in the sad state is still wider than in normal speech. In addition, we verify that ending boundary tones reflect information about the whole utterance.

A comparison of the absolute error of estimated speaking fundamental frequency (AEF0) among etiological groups of voice disorders

이승진;임재열;김재옥
- Phonetics and Speech Sciences / Vol. 15, No. 4 / pp. 53-60 / 2023
This study compared the absolute error of estimated speaking fundamental frequency (AEF0), obtained using the voice range profile (VRP) and the speech range profile (SRP), among etiological groups of patients with voice disorders, and examined the correlations between AEF0 and related variables within each etiological group. The participants were 120 people in total: 30 each (15 men, 15 women) in the functional (FUNC), organic (ORGAN), and neurogenic (NEUR) voice disorder groups and in a normal control (NC) group. Each participant performed the voice and speech range profile tasks, and speaking fundamental frequency was measured using electroglottography (EGG). In the between-group comparison, Grade and Severity did not differ among the etiological groups, whereas AEF0_VRP and AEF0_SUM did: AEF0_VRP was higher in ORGAN than in FUNC and NC, and AEF0_SUM was higher in ORGAN than in NC. In addition, AEF0 was positively correlated with Grade in FUNC and NEUR, whereas in ORGAN it was positively correlated with the closed quotient (CQ). Caution therefore appears warranted when applying AEF0 and examining related voice variables according to etiological group, and this study is considered to contribute basic data for such clinical judgments.
https://doi.org/10.13064/KSSS.2023.15.4.053

AM-FM Decomposition and Estimation of Instantaneous Frequency and Instantaneous Amplitude of Speech Signals for Natural Human-robot Interaction

이희영
- Speech Sciences / Vol. 12, No. 4 / pp. 53-70 / 2005
Vowels in speech signals are multicomponent signals composed of AM-FM components whose instantaneous frequency and instantaneous amplitude are time-varying. Changes in emotional state cause variation in the instantaneous frequencies and instantaneous amplitudes of the AM-FM components. Therefore, it is important to estimate the instantaneous frequencies and instantaneous amplitudes of the AM-FM components accurately in order to extract key information representing emotional states and their changes in speech signals. In this paper, firstly, a method for decomposing speech signals into AM-FM components is addressed. Secondly, the fundamental frequency of a vowel sound is estimated by a simple method based on the spectrogram. The estimate of the fundamental frequency is used for decomposing speech signals into AM-FM components. Thirdly, an estimation method is suggested for separating the instantaneous frequencies and the instantaneous amplitudes of the decomposed AM-FM components, based on the Hilbert transform and the demodulation property of the extended Fourier transform. The estimates of the instantaneous frequencies and instantaneous amplitudes can be used for modifying the spectral distribution and for smoothly connecting two words in corpus-based speech synthesis systems.
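For a single AM-FM component, the Hilbert-transform step mentioned in this abstract can be illustrated with the analytic signal. The Python sketch below is a generic textbook construction under stated assumptions, not the paper's method: it does not reproduce the decomposition into components or the extended Fourier transform demodulation, and the function names are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT.

    Negative-frequency bins are zeroed and positive-frequency bins are
    doubled, so the real part is x and the imaginary part is its
    Hilbert transform.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep DC unchanged
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist bin unchanged
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

def instantaneous_amplitude_frequency(x, fs):
    """Return (amplitude per sample, frequency in Hz per sample interval)."""
    z = analytic_signal(x)
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    # Phase increment per sample converted to Hz; one value fewer than x.
    frequency = np.diff(phase) * fs / (2.0 * np.pi)
    return amplitude, frequency
```

This only gives meaningful results when the input contains one dominant component, which is why a decomposition step such as the one the paper describes has to come first for vowels.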

203 search results (processing time: 0.026 s)
