Automatic Recognition of Pitch Accents Using Time-Delay Recurrent Neural Network  

Kim, Sung-Suk (School of Computer & Information, Yong-In University)
Kim, Chul (School of Computer & Information, Yong-In University)
Lee, Wan-Joo (School of Computer & Information, Yong-In University)
Abstract
This paper presents a method for the automatic recognition of pitch accents with no prior knowledge about the phonetic content of the signal (no knowledge of word or phoneme boundaries or of phoneme labels). The recognition algorithm is a time-delay recurrent neural network (TDRNN). A TDRNN is a neural network classifier with two different representations of dynamic context: delayed input nodes allow the representation of an explicit trajectory F0(t), while recurrent nodes provide long-term context information that can be used to normalize the input F0 trajectory. The performance of the TDRNN is compared with that of an MLP (multi-layer perceptron) and an HMM (hidden Markov model) on the same task. The TDRNN correctly recognizes 91.9% of pitch events and 91.0% of pitch non-events, for an average accuracy of 91.5% over both pitch events and non-events. The MLP with contextual input achieves 85.8%, 85.5%, and 85.6% recognition accuracy, respectively, while the HMM correctly recognizes 36.8% of pitch events and 87.3% of pitch non-events, for an average accuracy of 62.2% over both pitch events and non-events. These results suggest that the TDRNN architecture is useful for the automatic recognition of pitch accents.
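As a rough illustration of the two kinds of dynamic context described in the abstract, the following sketch runs a toy network over an F0 contour: a window of delayed inputs supplies the explicit local trajectory, and a recurrent hidden state carries longer-term context. It is not the authors' implementation. The function name `tdrnn_forward`, the layer sizes, the random untrained weights, and the assumption that F0 has already been scaled to a small numeric range are all hypothetical choices made only for this example.

```python
import math
import random


def tdrnn_forward(f0, n_delays=5, n_hidden=4, seed=0):
    """Toy forward pass of a time-delay recurrent network (illustrative only).

    f0 is a list of per-frame F0 values, assumed to be pre-scaled.
    Returns one pitch-event score in (0, 1) per frame.
    """
    if not f0:
        return []
    rng = random.Random(seed)
    # Assumed, untrained weights: delayed inputs -> hidden units.
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_delays)]
            for _ in range(n_hidden)]
    # Assumed, untrained weights: previous hidden state -> hidden units.
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
             for _ in range(n_hidden)]
    # Assumed, untrained weights: hidden units -> single output unit.
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    hidden = [0.0] * n_hidden
    # Repeat the first frame so every frame has a full window of delays.
    padded = [f0[0]] * (n_delays - 1) + list(f0)
    scores = []
    for t in range(len(f0)):
        # Delayed input nodes: F0(t - n_delays + 1), ..., F0(t).
        window = padded[t:t + n_delays]
        new_hidden = []
        for j in range(n_hidden):
            a = sum(w * x for w, x in zip(w_in[j], window))
            # Recurrent nodes: context carried over from earlier frames.
            a += sum(w * h for w, h in zip(w_rec[j], hidden))
            new_hidden.append(math.tanh(a))
        hidden = new_hidden
        z = sum(w * h for w, h in zip(w_out, hidden))
        scores.append(1.0 / (1.0 + math.exp(-z)))  # logistic output
    return scores


if __name__ == "__main__":
    contour = [0.1, 0.2, 0.5, 0.9, 0.6, 0.3, 0.2, 0.1]  # made-up values
    for frame, score in enumerate(tdrnn_forward(contour)):
        print(frame, round(score, 3))
```

In a trained system the weights would be learned from labeled pitch events, and each frame's score would be thresholded to decide between event and non-event. With the random weights used here, the scores only demonstrate the data flow.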
Keywords
Pitch accent; Prosody recognition; Speech recognition; TDRNN; MLP; HMM