Automatic Recognition of Pitch Accent Using Distributed Time-Delay Recursive Neural Network

Kim Sung-Suk

doi:10.7776/ASK.2006.25.6.277

The Journal of the Acoustical Society of Korea

Vol. 25, No. 6 / Pages 277-281 / 2006 / 1225-4428 (pISSN) / 2287-3775 (eISSN)

The Acoustical Society of Korea

Kim Sung-Suk (Department of Computer Information, Yongin University)

Published: 2006.08.01

https://doi.org/10.7776/ASK.2006.25.6.277

Abstract

This paper presents a method for the automatic recognition of pitch accents at the syllable level. The proposed method is based on the time-delay recursive neural network (TDRNN), a neural network classifier with two different representations of dynamic context: the delayed input nodes represent explicit pitch and energy trajectories, such as F0(t), along the time axis, while the recursive nodes provide long-term context information that reflects the characteristics of pitch accentuation in spoken English. We apply the TDRNN to pitch accent recognition in two forms. In the normal TDRNN, all of the prosodic features (pitch, energy, duration) are fed together as one set to the input nodes of a single TDRNN. In the distributed TDRNN, the network consists of several TDRNNs, each trained on a single prosodic feature as its input. The final output of the distributed TDRNN is a weighted sum of the outputs of the individual TDRNNs. We used the Boston Radio News Corpus (BRNC) for speaker-independent pitch accent recognition experiments. The experimental results show that the distributed TDRNN achieves an average recognition accuracy of 83.64% over both pitch events and non-events.
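The distributed combination described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names, the single-unit network shape, the sigmoid activation, and every weight and feature value below are assumptions chosen for illustration, and nothing here is trained. Each per-feature scorer reads the current and previous syllable values (standing in for the delayed input nodes) and feeds its own previous output back in (standing in for the recursive nodes). The fused score is a weighted sum of the per-feature outputs.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tdrnn_like_scores(track, delay_weights, recur_weight, bias=0.0):
    """Score each syllable of one prosodic feature track.

    Hypothetical single-unit stand-in for one TDRNN: the delayed inputs
    play the role of the time-delay input nodes, and the fed-back
    previous output plays the role of the recursive context nodes.
    """
    prev_out = 0.0
    scores = []
    for t in range(len(track)):
        total = bias + recur_weight * prev_out
        for d, w in enumerate(delay_weights):
            # Positions before the start of the track are treated as zero.
            x = track[t - d] if t - d >= 0 else 0.0
            total += w * x
        prev_out = sigmoid(total)
        scores.append(prev_out)
    return scores


def fuse(per_feature_scores, fusion_weights):
    """Weighted sum of the per-feature outputs, syllable by syllable."""
    n = len(next(iter(per_feature_scores.values())))
    fused = [0.0] * n
    for name, scores in per_feature_scores.items():
        for t in range(n):
            fused[t] += fusion_weights[name] * scores[t]
    return fused


def classify(fused, threshold=0.5):
    """Label a syllable as accented when its fused score reaches the threshold."""
    return [s >= threshold for s in fused]


# Made-up, normalized per-syllable feature values (not BRNC data).
tracks = {
    "pitch": [0.1, 0.9, 0.2, 0.1],
    "energy": [0.3, 0.8, 0.4, 0.2],
    "duration": [0.2, 0.7, 0.3, 0.2],
}
# Hypothetical fusion weights; in practice these would be tuned on held-out data.
fusion_weights = {"pitch": 0.5, "energy": 0.3, "duration": 0.2}

per_feature = {
    name: tdrnn_like_scores(track, delay_weights=[2.0, -1.0],
                            recur_weight=0.5, bias=-0.5)
    for name, track in tracks.items()
}
fused = fuse(per_feature, fusion_weights)
accented = classify(fused)
```

Because the fusion weights here sum to one and each scorer outputs a value between 0 and 1, the fused score also stays between 0 and 1, which makes a fixed threshold easy to interpret. The normal (single) TDRNN in the abstract would instead feed all three features into one network.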
