http://dx.doi.org/10.7776/ASK.2006.25.6.277

Automatic Recognition of Pitch Accent Using Distributed Time-Delay Recursive Neural Network  

Kim Sung-Suk (Department of Computer Information, Yongin University)
Abstract
This paper presents a method for the automatic recognition of pitch accents over syllables. The proposed method is based on the time-delay recursive neural network (TDRNN), a neural network classifier with two different representations of dynamic context: the delayed input nodes allow the representation of an explicit trajectory F0(t) over time, while the recursive nodes provide long-term context information that reflects the characteristics of pitch accentuation in spoken English. We apply the TDRNN to pitch accent recognition in two forms: in the normal TDRNN, all of the prosodic features (pitch, energy, duration) are used as an entire set in a single TDRNN, while in the distributed TDRNN, the network consists of several TDRNNs, each taking a single prosodic feature as its input. The final output of the distributed TDRNN is a weighted sum of the outputs of the individual TDRNNs. We used the Boston Radio News Corpus (BRNC) for experiments on speaker-independent pitch accent recognition. The experimental results show that the distributed TDRNN achieves an average recognition accuracy of 83.64% over both pitch events and non-events.
Keywords
Pitch accent; Prosody; TDRNN; Distributed TDRNN
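
The weighted-sum combination described in the abstract can be illustrated with a small sketch. This is only a toy under stated assumptions, not the network from the paper: the function names `tdrnn_score` and `combine`, the single recurrent unit, the tap weights, the combination weights, and the sample frame values are all hypothetical. Each per-feature scorer reads a short window of delayed inputs (the explicit trajectory) and feeds its previous state back in (the long-term context); the distributed output is then the weighted sum of the per-feature scores.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tdrnn_score(frames, delay_weights, recur_weight, bias=0.0):
    """Toy single-feature scorer (hypothetical, not the paper's model).

    frames        : per-frame values of one prosodic feature
    delay_weights : weights on the current frame and the preceding frames
    recur_weight  : weight on the unit's own previous state
    """
    n_delays = len(delay_weights)
    state = 0.0
    for t in range(len(frames)):
        # Tapped delay line: current frame plus earlier frames, zero-padded.
        window = [frames[t - d] if t - d >= 0 else 0.0 for d in range(n_delays)]
        drive = sum(w * x for w, x in zip(delay_weights, window))
        # Recurrent feedback carries context from earlier frames.
        state = math.tanh(drive + recur_weight * state + bias)
    return sigmoid(state)


def combine(scores, weights):
    """Weighted sum of the per-feature outputs."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores))


# Made-up feature trajectories for one syllable (illustrative values only).
pitch = [0.1, 0.4, 0.9, 0.6]
energy = [0.2, 0.5, 0.7, 0.3]
duration = [0.5, 0.5, 0.5, 0.5]

scores = [
    tdrnn_score(pitch, [0.8, 0.4], 0.3),
    tdrnn_score(energy, [0.6, 0.2], 0.3),
    tdrnn_score(duration, [0.5, 0.1], 0.3),
]
# Assumed combination weights; the abstract does not state how they are set.
final = combine(scores, [0.5, 0.3, 0.2])
print(final)
```

In this sketch the combination weights are fixed by hand purely for illustration; the abstract states only that the final output is a weighted sum and does not say how the weights are chosen.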