[KSCI] Korea Science Citation Index Service

Automatic Recognition of Pitch Accents Using Time-Delay Recurrent Neural Network

Kim, Sung-Suk (School of Computer & Information, Yong-In University)
Kim, Chul (School of Computer & Information, Yong-In University)
Lee, Wan-Joo (School of Computer & Information, Yong-In University)

Publication Information

The Journal of the Acoustical Society of Korea / v.23, no.4E, 2004, pp. 112-119

Abstract

This paper presents a method for the automatic recognition of pitch accents with no prior knowledge about the phonetic content of the signal (no knowledge of word or phoneme boundaries or of phoneme labels). The recognition algorithm used in this paper is a time-delay recurrent neural network (TDRNN). A TDRNN is a neural network classifier with two different representations of dynamic context: delayed input nodes allow the representation of an explicit trajectory F0(t), while recurrent nodes provide long-term context information that can be used to normalize the input F0 trajectory. The performance of the TDRNN is compared with that of an MLP (multi-layer perceptron) and an HMM (hidden Markov model) on the same task. The TDRNN correctly recognizes 91.9% of pitch events and 91.0% of pitch non-events, for an average accuracy of 91.5% over both pitch events and non-events. The MLP with contextual input achieves 85.8%, 85.5%, and 85.6% recognition accuracy on the same three measures, respectively, while the HMM correctly recognizes 36.8% of pitch events and 87.3% of pitch non-events, for an average accuracy of 62.2% over both pitch events and non-events. These results suggest that the TDRNN architecture is useful for the automatic recognition of pitch accents.
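The two kinds of dynamic context described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no layer sizes, activation functions, or training procedure, so the names `make_delay_windows` and `TinyTDRNN`, the tanh hidden units, the sigmoid output, and the random untrained weights are all assumptions chosen for illustration. The sketch only shows how a tapped delay line over F0 (explicit short-term trajectory) can be combined with a recurrent hidden state (longer-term context) to produce one pitch-event score per frame.

```python
import math
import random


def make_delay_windows(f0, n_delays):
    """For each frame t, return [f0[t - n_delays], ..., f0[t]], zero-padded at the start.

    This plays the role of the delayed input nodes: the network sees an
    explicit short F0 trajectory rather than a single F0 value.
    """
    padded = [0.0] * n_delays + list(f0)
    return [padded[t:t + n_delays + 1] for t in range(len(f0))]


class TinyTDRNN:
    """Illustrative (hypothetical) time-delay recurrent network, forward pass only."""

    def __init__(self, n_delays, n_hidden, seed=0):
        rng = random.Random(seed)
        n_in = n_delays + 1
        self.n_delays = n_delays
        self.n_hidden = n_hidden
        # Weights are random and untrained; they stand in for learned parameters.
        self.w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                     for _ in range(n_hidden)]
        self.w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                      for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def forward(self, f0):
        """Return one score in (0, 1) per frame of the F0 sequence."""
        hidden = [0.0] * self.n_hidden  # recurrent state carried across frames
        scores = []
        for window in make_delay_windows(f0, self.n_delays):
            new_hidden = []
            for j in range(self.n_hidden):
                # Delayed inputs: explicit local F0 trajectory.
                act = sum(w * x for w, x in zip(self.w_in[j], window))
                # Recurrent inputs: summary of everything seen so far.
                act += sum(w * h for w, h in zip(self.w_rec[j], hidden))
                new_hidden.append(math.tanh(act))
            hidden = new_hidden
            z = sum(w * h for w, h in zip(self.w_out, hidden))
            scores.append(1.0 / (1.0 + math.exp(-z)))
        return scores


# Usage: F0 values are assumed to be pre-scaled to a small range.
net = TinyTDRNN(n_delays=2, n_hidden=4)
frame_scores = net.forward([0.1, 0.2, 0.4, 0.3, 0.1])
```

Because the weights are untrained, the scores carry no meaning; a real system would fit them to labeled pitch events and threshold the per-frame score.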

Keywords

Pitch accent; Prosody recognition; Speech recognition; TDRNN; MLP; HMM

