Automatic Recognition of Pitch Accents Using Time-Delay Recurrent Neural Network

Kim, Sung-Suk; Kim, Chul; Lee, Wan-Joo

The Journal of the Acoustical Society of Korea

Vol. 23, No. 4E / pp. 112-119 / 2004 / ISSN 1225-4428 (print)

Publisher: The Acoustical Society of Korea (한국음향학회)


Affiliation (all authors): School of Computer & Information, Yong-In University

Published: 2004.12.01


Abstract

This paper presents a method for the automatic recognition of pitch accents with no prior knowledge of the phonetic content of the signal (no knowledge of word or phoneme boundaries or of phoneme labels). The recognition algorithm is a time-delay recurrent neural network (TDRNN): a neural network classifier with two different representations of dynamic context. Delayed input nodes represent the F0(t) trajectory explicitly, while recurrent nodes provide long-term context information that can be used to normalize the input F0 trajectory. The TDRNN's performance is compared with that of a multi-layer perceptron (MLP) and a hidden Markov model (HMM) on the same task. The TDRNN correctly recognizes 91.9% of pitch events and 91.0% of pitch non-events, for an average accuracy of 91.5% over both classes. The MLP with contextual input achieves 85.8%, 85.5%, and 85.6% on pitch events, pitch non-events, and on average, respectively, while the HMM correctly recognizes 36.8% of pitch events and 87.3% of pitch non-events, for an average accuracy of 62.2%. These results suggest that the TDRNN architecture is useful for the automatic recognition of pitch accents.
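To make the two kinds of dynamic context concrete, here is a minimal forward-pass sketch of a frame-level TDRNN pitch-event detector. It is an illustration under stated assumptions, not the authors' network: the number of delay taps, the hidden-layer size, the Elman-style recurrence (previous hidden state fed back into the hidden layer), the tanh/sigmoid activations, the 0.5 decision threshold, and the names `TinyTDRNN` and `class_rates` are all hypothetical. Weights are random and untrained; training is omitted. The abstract also does not say how its "average accuracy" is computed, so `class_rates` reports the unweighted mean of the two class rates as one plausible choice.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyTDRNN:
    """Hypothetical TDRNN sketch (not the paper's exact topology).

    A tapped delay line exposes the explicit trajectory F0(t), F0(t-1), ...,
    F0(t-D) to the hidden layer, and an assumed Elman-style recurrent
    connection feeds the previous hidden state back in for longer-term context.
    """

    def __init__(self, n_delays=4, n_hidden=3, seed=0):
        rng = random.Random(seed)

        def init():
            return rng.uniform(-0.5, 0.5)

        self.n_delays = n_delays
        self.n_hidden = n_hidden
        # Input weights: one row per hidden unit, one column per delay tap.
        self.w_in = [[init() for _ in range(n_delays + 1)] for _ in range(n_hidden)]
        # Recurrent weights: hidden state at t-1 -> hidden state at t.
        self.w_rec = [[init() for _ in range(n_hidden)] for _ in range(n_hidden)]
        self.b_hid = [0.0] * n_hidden
        # Single sigmoid output: probability that frame t is a pitch event.
        self.w_out = [init() for _ in range(n_hidden)]
        self.b_out = 0.0

    def forward(self, f0):
        """Return one pitch-event probability per frame of an F0 contour."""
        hidden = [0.0] * self.n_hidden
        probs = []
        for t in range(len(f0)):
            # Delay taps F0(t), F0(t-1), ..., F0(t-D), zero-padded before t=0.
            taps = [f0[t - d] if t - d >= 0 else 0.0 for d in range(self.n_delays + 1)]
            new_hidden = []
            for j in range(self.n_hidden):
                a = self.b_hid[j]
                a += sum(w * x for w, x in zip(self.w_in[j], taps))
                a += sum(w * h for w, h in zip(self.w_rec[j], hidden))
                new_hidden.append(math.tanh(a))
            hidden = new_hidden
            out = self.b_out + sum(w * h for w, h in zip(self.w_out, hidden))
            probs.append(sigmoid(out))
        return probs


def class_rates(pred, gold):
    """Recognition rate on event frames (gold=1), on non-event frames (gold=0),
    and the unweighted mean of the two (an assumed averaging rule)."""
    events = [p == g for p, g in zip(pred, gold) if g == 1]
    non_events = [p == g for p, g in zip(pred, gold) if g == 0]
    ev_rate = sum(events) / len(events)
    non_rate = sum(non_events) / len(non_events)
    return ev_rate, non_rate, (ev_rate + non_rate) / 2


if __name__ == "__main__":
    # Toy contour in Hz, divided by 100 only to keep tanh out of saturation.
    f0_hz = [120.0, 124.0, 160.0, 152.0, 118.0, 115.0]
    model = TinyTDRNN()
    probs = model.forward([f / 100.0 for f in f0_hz])
    pred = [1 if p >= 0.5 else 0 for p in probs]
    print(probs, pred)
```

The delay taps give the classifier a fixed window of raw trajectory at every frame, while the recurrent hidden state summarizes everything seen so far; that running summary is what the abstract suggests can serve to normalize F0 (for example, relative to a speaker's range). Because the network only looks backward in time, its output at frame t does not depend on later frames.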


