http://dx.doi.org/10.5909/JBE.2019.24.5.735

Speech Feature Extraction based on Spikegram for Phoneme Recognition  

Han, Seokhyeon (Dept. of Electronics Engineering, Kwangwoon University)
Kim, Jaewon (Dept. of Electronics Engineering, Kwangwoon University)
An, Soonho (Dept. of Electronics Engineering, Kwangwoon University)
Shin, Seonghyeon (Dept. of Electronics Engineering, Kwangwoon University)
Park, Hochong (Dept. of Electronics Engineering, Kwangwoon University)
Publication Information
Journal of Broadcast Engineering, Vol. 24, No. 5, 2019, pp. 735-742
Abstract
In this paper, we propose a method of extracting speech features for phoneme recognition based on a spikegram. Fourier-transform-based features are widely used in phoneme recognition, but they are not extracted in a biologically plausible way and cannot achieve high temporal resolution because of their frame-based operation. For better phoneme recognition, therefore, it is desirable to have a new method of extracting speech features that analyzes the speech signal at high temporal resolution, following the model of the human auditory system. In this paper, we analyze the speech signal based on a spikegram, which models feature extraction and transmission in the auditory system, and then propose a method of extracting features from the spikegram for phoneme recognition. We evaluate the performance of the proposed features using a DNN-based phoneme recognizer and confirm that they outperform Fourier-transform-based features for short phonemes. This result verifies the feasibility of new speech features extracted with an auditory model for phoneme recognition.
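The spikegram pipeline summarized in the abstract (a bank of gammatone kernels plus greedy matching pursuit producing a sparse spike code) can be sketched as follows. This is a minimal illustration in the spirit of Smith and Lewicki's auditory coding; the kernel count, center frequencies, kernel length, and spike budget are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def gammatone(fc, fs, n_samples, order=4, b=1.019):
    """Unit-energy gammatone kernel at center frequency fc (Hz)."""
    t = np.arange(n_samples) / fs
    erb = 24.7 * (4.37 * fc / 1000 + 1)            # equivalent rectangular bandwidth
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * erb * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)

def spikegram(x, kernels, n_spikes):
    """Greedy matching pursuit over shift-invariant kernels.
    Returns a list of spikes (kernel index, time offset, amplitude)
    and the final residual signal."""
    residual = x.copy()
    spikes = []
    for _ in range(n_spikes):
        best = (0.0, 0, 0)                          # (correlation, kernel idx, offset)
        for k, g in enumerate(kernels):
            corr = np.correlate(residual, g, mode='valid')
            i = np.argmax(np.abs(corr))
            if np.abs(corr[i]) > abs(best[0]):
                best = (corr[i], k, i)
        amp, k, i = best
        residual[i:i + len(kernels[k])] -= amp * kernels[k]   # subtract matched atom
        spikes.append((k, i, amp))
    return spikes, residual

fs = 16000
freqs = [250, 500, 1000, 2000, 4000]                # assumed channel centers (Hz)
kernels = [gammatone(fc, fs, 256) for fc in freqs]
t = np.arange(2048) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * t)              # toy 1 kHz test signal
spikes, residual = spikegram(x, kernels, n_spikes=50)
```

Each spike gives the time, scale (kernel), and amplitude of an auditory event, so the representation keeps sample-level timing instead of frame-level timing; phoneme-recognition features would then be derived from statistics of these spikes.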
Keywords
Spikegram; Speech feature; Phoneme recognition; Deep neural network
Citations & Related Records
Times Cited By KSCI: 1
1 D. Yu and L. Deng, Automatic Speech Recognition: A Deep Learning Approach, Springer Publishing Company, Incorporated, 2014.
2 O. Abdel-Hamid, A. Mohamed, H. Jiang, L. Deng, G. Penn and D. Yu, "Convolutional Neural Networks for Speech Recognition," IEEE/ACM Trans. on Audio, Speech, and Language Processing, Vol. 22, No. 10, pp. 1533-1545, Oct. 2014, doi:10.1109/TASLP.2014. 2339736.   DOI
3 E. Smith and M. Lewicki, "Efficient Auditory Coding," Nature, Vol. 439, No. 7079, pp. 978-982, Feb. 2006, doi:10.1038/nature04485.   DOI
4 W.-J. Jang, H.-W. Yun, S.-H. Shin and H. Park, "Music genre classification using spikegram and deep neural network," J. of Broadcast Engineering, Vol. 22, No. 6, pp. 693-701, Nov. 2017, doi:10.5909/JBE. 2017.22.6.693.   DOI
5 S.-H. Shin, H.-W. Yun, W.-J. Jang and H. Park, "Extraction of acoustic features based on auditory spike code and its application to music genre classification," IET Signal Processing, Vol. 13, No. 2, pp. 230-234, Apr. 2019, doi:10.1049/iet-spr.2018.5158.   DOI
6 G. Mather, Foundations of Perception, Psychology Press, 2006.
7 M. Slaney, "An Efficient Implementation of the Patterson - Holdsworth Auditory Filter Bank," Apple Computer Technical Report #35, 1993.
8 J. Tropp and A. Gilbert, "Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit," IEEE Trans. on Information Theory, Vol. 53, No. 12, Dec. 2007, doi:10.1109/TIT. 2007.909108.
9 X. Huang, A. Acero, and H. Hon. Spoken Language Processing: A guide to theory, algorithm, and system development. Prentice Hall, 2001.
10 I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, The MIT Press, Cambridge and London, 2016.
11 K. F. Lee and H. W. Hon, "Speaker-independent phone recognition using hidden markov models," IEEE Trans. on Audio, Speech, Lang. Process., Vol. 37, No. 11, pp. 1641-1648, Nov. 1989, doi:10.1109/29. 46546.   DOI
12 P. Ladefoged and I. Maddieson. The Sounds of the World's Languages. Oxford, OX, UK: Blackwell Publishers, 1996.
13 N. Faraji, S. M. Ahadi and H. Sheikhzadeh, "Sequential method for speech segmentation based on Random Matrix Theory," IET Signal Processing, Vol. 7, No. 7, pp. 625-633, Sept. 2013, doi:10.1049/ietspr.2011.0471.   DOI