A Neural Network Based Korean Segmental Duration Modeling Using Tonal Information of Phonemes


The Journal of the Acoustical Society of Korea (한국음향학회지)

Volume 18, Issue 6 / Pages 84-88 / 1999 / 1225-4428 (pISSN) / 2287-3775 (eISSN)

The Acoustical Society of Korea (한국음향학회)


음소별 성조 정보를 이용한 신경망 기반의 한국어 음소 지속시간 모델링

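김은경 (Korea Advanced Institute of Science and Technology, KAIST);
이상호 (Korea Advanced Institute of Science and Technology, KAIST);
오영환 (Korea Advanced Institute of Science and Technology, KAIST)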

Published : 1999.08.01


Abstract

Accurate estimation of segmental duration is crucial for natural-sounding text-to-speech synthesis. Conventional methods for predicting Korean segmental durations have used phonemic context, part-of-speech context, and positional information within the prosodic phrase. In this paper, the tonal information of phonemes is employed as an additional feature for more accurate prediction. After defining two non-boundary tones and six boundary tones, we annotated each syllable of 400 sentences with a tonal label. To predict segmental duration using tonal information, we constructed neural networks with a real-valued output node that predicts phonemic duration and trained them with the backpropagation algorithm. Experimental results showed that the proposed features are effective for predicting Korean segmental durations: we obtained a correlation coefficient of 0.863 between the observed and predicted durations.

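Accurate prediction of per-phoneme duration plays an important role in improving the naturalness of TTS systems. Feature variables previously used for modeling Korean phoneme duration include phonemic context, part-of-speech information, and position within the prosodic phrase. This paper defines per-phoneme tonal information as a new feature variable in order to improve prediction performance. To represent tonal information, we defined two non-boundary tones and six boundary tones and then annotated each syllable of a 400-sentence speech corpus. To predict duration using tonal information, we built a neural network whose output node emits the phoneme duration as a real value and trained it with the error backpropagation algorithm. In the experiments, using tonal information yielded a correlation coefficient of 0.863 between predicted and actual values on the test data, an improvement over the case without tonal information.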
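The pipeline described in the abstract (tonal labels as input features, a network with one real-valued output trained by backpropagation, evaluation by correlation coefficient) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the authors' configuration: the tone names in `TONES` are hypothetical placeholders (the abstract gives only the counts, two non-boundary and six boundary tones), the network size and learning rate are arbitrary, the data are synthetic, and the other features the paper mentions (phonemic context, part of speech, position) are omitted.

```python
import numpy as np

# Hypothetical tone inventory: only the counts (2 non-boundary + 6 boundary)
# come from the abstract; the label names are placeholders.
TONES = ["NB1", "NB2", "B1", "B2", "B3", "B4", "B5", "B6"]

def one_hot_tone(label):
    """Encode a tonal label as an 8-dimensional one-hot vector."""
    vec = np.zeros(len(TONES))
    vec[TONES.index(label)] = 1.0
    return vec

def train_mlp(X, y, hidden=8, lr=0.1, epochs=3000, seed=0):
    """Train a one-hidden-layer regression network with batch backpropagation."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)              # hidden activations
        pred = h @ W2 + b2                    # single linear, real-valued output
        err = pred - y                        # d(0.5 * squared error) / d(pred)
        grad_W2 = h.T @ err / n
        grad_b2 = err.mean()
        delta_h = np.outer(err, W2) * (1.0 - h ** 2)  # backpropagate through tanh
        grad_W1 = X.T @ delta_h / n
        grad_b1 = delta_h.mean(axis=0)
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
    return W1, b1, W2, b2

def predict(params, X):
    """Forward pass of the trained network."""
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two 1-D arrays."""
    o = observed - observed.mean()
    p = predicted - predicted.mean()
    return float((o @ p) / np.sqrt((o @ o) * (p @ p)))

# Synthetic stand-in data: each tone gets an invented mean duration in ms.
rng = np.random.default_rng(1)
idx = rng.integers(0, len(TONES), size=400)
labels = [TONES[i] for i in idx]
tone_mean_ms = np.linspace(40.0, 120.0, len(TONES))
durations = tone_mean_ms[idx] + rng.normal(0.0, 5.0, size=400)

X = np.stack([one_hot_tone(t) for t in labels])
mu, sigma = durations.mean(), durations.std()
params = train_mlp(X, (durations - mu) / sigma)   # train on standardized targets
predicted = predict(params, X) * sigma + mu       # map back to milliseconds
r = pearson_r(durations, predicted)
print(f"correlation on synthetic data: {r:.3f}")
```

The printed correlation reflects only the synthetic data above and is not comparable to the 0.863 reported in the abstract. In a realistic setting the one-hot tone vector would be concatenated with the other context features, and the correlation would be measured on held-out sentences.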

