Automatic Recognition of Pitch Accents Using Time-Delay Recurrent Neural Network

  • Kim, Sung-Suk (School of Computer & Information, Yong-In University) ;
  • Kim, Chul (School of Computer & Information, Yong-In University) ;
  • Lee, Wan-Joo (School of Computer & Information, Yong-In University)
  • Published: 2004.12.01

Abstract

This paper presents a method for the automatic recognition of pitch accents that requires no prior knowledge of the phonetic content of the signal (no word or phoneme boundaries and no phoneme labels). The recognition algorithm is a time-delay recurrent neural network (TDRNN). A TDRNN is a neural network classifier with two different representations of dynamic context: delayed input nodes represent an explicit trajectory F0(t), while recurrent nodes provide long-term context information that can be used to normalize the input F0 trajectory. The performance of the TDRNN is compared with that of an MLP (multi-layer perceptron) and an HMM (hidden Markov model) on the same task. The TDRNN correctly recognizes 91.9% of pitch events and 91.0% of pitch non-events, for an average accuracy of 91.5% over both pitch events and non-events. The MLP with contextual input achieves 85.8%, 85.5%, and 85.6% recognition accuracy, respectively, while the HMM correctly recognizes 36.8% of pitch events and 87.3% of pitch non-events, for an average accuracy of 62.2% over both pitch events and non-events. These results suggest that the TDRNN architecture is useful for the automatic recognition of pitch accents.
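
The two kinds of dynamic context described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the Elman-style recurrence, the activation functions, the random (untrained) weights, and the names `tdrnn_forward` and `f0_track` are all assumptions made for illustration. The delayed input window stands in for the explicit F0(t) trajectory, and the recurrent hidden state stands in for the long-term context.

```python
import math
import random


def tdrnn_forward(f0, n_delays=3, n_hidden=4, seed=0):
    """Return one score in (0, 1) per frame of the F0 track `f0`.

    Hypothetical sketch: weights are random and untrained, so the scores
    carry no meaning beyond showing how the data flows.
    """
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_delays)]
            for _ in range(n_hidden)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
             for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    hidden = [0.0] * n_hidden  # recurrent state, carried across frames
    scores = []
    for t in range(len(f0)):
        # Delayed input nodes: the current frame and the n_delays - 1
        # frames before it, zero-padded at the start of the track.
        window = [f0[t - d] if t - d >= 0 else 0.0 for d in range(n_delays)]

        new_hidden = []
        for j in range(n_hidden):
            a = sum(w_in[j][d] * window[d] for d in range(n_delays))
            # Recurrent nodes: feed back the previous hidden state.
            a += sum(w_rec[j][k] * hidden[k] for k in range(n_hidden))
            new_hidden.append(math.tanh(a))
        hidden = new_hidden

        z = sum(w_out[j] * hidden[j] for j in range(n_hidden))
        scores.append(1.0 / (1.0 + math.exp(-z)))  # event vs. non-event score
    return scores


# Hypothetical, already-normalized F0 values for seven frames.
f0_track = [0.0, 0.1, 0.4, 0.9, 0.7, 0.3, 0.1]
scores = tdrnn_forward(f0_track)
```

In a trained system the per-frame score would be thresholded to label each frame as a pitch event or non-event; here the weights are random, so only the structure is meaningful.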
