http://dx.doi.org/10.5762/KAIS.2012.13.1.333

Automatic Vowel Onset Point Detection Based on Auditory Frequency Response  

Zang, Xian (Electronics and Information Department, Chonbuk National University)
Kim, Hag-Tae (Electronics and Information Department, Chonbuk National University)
Chong, Kil-To (Electronics and Information Department, Chonbuk National University)
Publication Information
Journal of the Korea Academia-Industrial cooperation Society / v.13, no.1, 2012, pp. 333-342
Abstract
This paper presents a vowel onset point (VOP) detection method based on the human auditory system. The method maps the "perceptual" frequency scale, i.e. the Mel scale, onto the linear acoustic frequency axis and then constructs a bank of triangular Mel-weighted filters that simulates the band-pass filtering performed by the human ear. This nonlinear critical-band filter bank greatly reduces the data dimensionality and suppresses the effect of harmonics, making the formants more prominent in the nonlinearly spaced Mel spectrum. For each frame, the summed energy of the Mel-spectrum peaks is extracted as a feature, and the instant at which this energy starts rising sharply is detected as the VOP by convolving the feature contour with a Gabor window. On a single-word database containing 12 vowels articulated with different kinds of consonants, the method achieved an average detection rate of 72.73%, higher than that of vowel detection methods based on short-time energy and zero-crossing rate.
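The Mel mapping and triangular filter bank can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract gives neither the Mel formula nor the filter-bank parameters, so the common analytic approximation mel = 2595 * log10(1 + f / 700) and the defaults `n_filters=24`, `n_fft=512` and `sample_rate=16000` are hypothetical choices. The filter edges are spaced uniformly on the Mel scale and mapped back onto the linear FFT-bin axis, so low-frequency filters come out narrow and high-frequency filters wide.

```python
import numpy as np


def hz_to_mel(f):
    """Hz -> Mel using a common analytic approximation (assumed, not from the paper)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filter_bank(n_filters=24, n_fft=512, sample_rate=16000, f_min=0.0, f_max=None):
    """Triangular filters whose edges are equally spaced on the Mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1) that maps a one-sided
    power spectrum onto Mel-band energies. Default parameters are hypothetical.
    """
    if f_max is None:
        f_max = sample_rate / 2.0
    # n_filters + 2 edge frequencies: uniform in Mel, then mapped back to Hz.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2))
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    fbank = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        left, centre, right = hz_points[i : i + 3]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        # Triangle: 0 at `left`, rises linearly to 1 at `centre`, back to 0 at `right`.
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank


# Usage: reduce one windowed frame's 257-bin power spectrum to 24 Mel-band energies.
fbank = mel_filter_bank()
frame = np.random.default_rng(0).standard_normal(512)
power = np.abs(np.fft.rfft(frame * np.hamming(512))) ** 2
mel_spectrum = fbank @ power
```

Multiplying each frame's power spectrum by the filter-bank matrix collapses 257 FFT bins into 24 band energies, which illustrates the dimensionality reduction described in the abstract.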
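The abstract describes the per-frame feature only as the sum of Mel-spectrum peak energies. One possible reading, assumed here, treats every Mel band whose energy strictly exceeds both neighbouring bands as a peak and sums those energies frame by frame. The function name `peak_energy_feature` and the toy input are hypothetical.

```python
import numpy as np


def peak_energy_feature(mel_spec):
    """Sum of Mel-spectrum peak energies for each frame.

    mel_spec has shape (n_frames, n_bands). A 'peak' is assumed to be a band
    whose energy strictly exceeds both neighbouring bands; the first and last
    bands are never counted because they have only one neighbour.
    """
    mel_spec = np.asarray(mel_spec, dtype=float)
    inner = mel_spec[:, 1:-1]
    is_peak = (inner > mel_spec[:, :-2]) & (inner > mel_spec[:, 2:])
    return np.where(is_peak, inner, 0.0).sum(axis=1)


# Usage: two toy frames of five Mel bands each.
frames = np.array([
    [0.0, 1.0, 0.0, 3.0, 0.0],  # local maxima at bands 1 and 3 -> 1 + 3
    [1.0, 1.0, 1.0, 1.0, 1.0],  # flat spectrum: no strict peaks -> 0
])
feature = peak_energy_feature(frames)
```

If, as the abstract argues, the Mel spectrum makes the formants prominent, the surviving peaks should largely correspond to formant regions, so this value should rise when a vowel begins.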
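The onset decision can be illustrated by convolving a per-frame energy contour with an odd-symmetric (sine-phase) Gabor kernel, which behaves like a smoothed first derivative and peaks where the contour rises sharply. The kernel shape, `sigma`, the wavelength of 8 * `sigma`, and the relative threshold of 0.5 are all assumptions for this sketch; the abstract does not specify the Gabor parameters or how peaks in the filtered contour are picked.

```python
import numpy as np


def gabor_kernel(sigma=3.0, wavelength=None):
    """Odd-symmetric (sine-phase) Gabor kernel; sigma and wavelength in frames.

    Both defaults are hypothetical. A wavelength of 8 * sigma keeps a single
    lobe on each side within the +/- 3 sigma support.
    """
    if wavelength is None:
        wavelength = 8.0 * sigma
    half = int(np.ceil(3.0 * sigma))
    n = np.arange(-half, half + 1, dtype=float)
    envelope = np.exp(-(n ** 2) / (2.0 * sigma ** 2))
    # Sign chosen so that, after np.convolve flips the kernel, later frames
    # are weighted positively and earlier frames negatively: a rising edge
    # in the contour produces a positive response.
    kernel = -envelope * np.sin(2.0 * np.pi * n / wavelength)
    return kernel / np.sum(np.abs(kernel))


def detect_vops(energy, sigma=3.0, rel_threshold=0.5):
    """Frame indices of local maxima of the Gabor-filtered contour that exceed
    rel_threshold * (maximum response). The threshold rule is an assumption."""
    energy = np.asarray(energy, dtype=float)
    kernel = gabor_kernel(sigma)
    if energy.size < kernel.size:
        raise ValueError("contour must be at least as long as the kernel")
    evidence = np.convolve(energy, kernel, mode="same")
    if evidence.max() <= 0.0:
        return np.array([], dtype=int)
    threshold = rel_threshold * evidence.max()
    mid = evidence[1:-1]
    is_peak = (mid > evidence[:-2]) & (mid >= evidence[2:]) & (mid > threshold)
    return np.nonzero(is_peak)[0] + 1


# Usage: a synthetic contour with two sharp rises, at frames 30 and 120.
contour = np.zeros(160)
contour[30:80] = 1.0
contour[120:150] = 0.8
vops = detect_vops(contour)
```

A wider `sigma` suppresses spurious onsets caused by frame-to-frame fluctuations, at the cost of blurring the timing of closely spaced onsets; falling edges give negative responses and are ignored.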
Keywords
VOP detection; Formant; Human auditory system; Mel scale; Triangular Mel-weighted filter bank; Gabor window