http://dx.doi.org/10.7776/ASK.2013.32.2.167

Classification of Diphthongs using Acoustic Phonetic Parameters  

Lee, Suk-Myung (School of Electrical and Electronic Engineering, Yonsei University)
Choi, Jeung-Yoon (School of Electrical and Electronic Engineering, Yonsei University)
Abstract
This work examines the classification of diphthongs as part of a distinctive feature-based speech recognition system. Acoustic measurements related to the vocal tract and the voice source are examined, and analysis of variance (ANOVA) shows that vowel duration, energy trajectory, and formant variation are significant. On the TIMIT database, a balanced error rate of 17.8% is obtained for 2-way diphthong classification; for 4-way classification, error rates of 32.9%, 29.9%, and 20.2% are obtained for /aw/, /ay/, and /oy/, respectively. Adding the acoustic features to the widely used Mel-frequency cepstral coefficients (MFCCs) also improves classification.
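The abstract reports a balanced error rate but does not define it. A minimal sketch follows, assuming the common definition: the mean of the per-class error rates, so that a rare class weighs as much as a frequent one. The function name, the class labels, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def balanced_error_rate(y_true, y_pred):
    """Return the mean of per-class error rates (assumed definition)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    totals = defaultdict(int)  # number of samples per true class
    errors = defaultdict(int)  # number of misclassified samples per true class
    for truth, guess in zip(y_true, y_pred):
        totals[truth] += 1
        if truth != guess:
            errors[truth] += 1

    # Average the per-class error rates so each class counts equally.
    return sum(errors[c] / totals[c] for c in totals) / len(totals)


# Hypothetical 2-way example with imbalanced classes (labels are illustrative).
y_true = ["diphthong"] * 2 + ["other"] * 8
y_pred = ["diphthong", "other"] + ["other"] * 7 + ["diphthong"]

ber = balanced_error_rate(y_true, y_pred)
plain = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced error rate: {ber:.4f}, plain error rate: {plain:.4f}")
```

In this toy case the per-class error rates are 1/2 and 1/8, so the balanced error rate is 0.3125 while the plain error rate is 0.2. The gap shows why a balanced measure is informative when one class is much rarer than the other.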
Keywords
Diphthong; Diphthong classification; Acoustic phonetic parameter; Speech recognition