
Development of a Korean Speech Recognition Platform (ECHOS)  

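Kwon Oh-Wook (Chungbuk National University)
Kwon Sukbong (Information and Communications University)
Jang Gyucheol (Korea Advanced Institute of Science and Technology)
Yun Sungrack (Korea Advanced Institute of Science and Technology)
Kim Yong-Rae (Chungbuk National University)
Jang Kwang-Dong (Chungbuk National University)
Kim Hoi-Rin (Information and Communications University)
Yoo Changdong (Korea Advanced Institute of Science and Technology)
Kim Bong-Wan (Speech Information Technology and Industry Promotion Center)
Lee Yong-Ju (Speech Information Technology and Industry Promotion Center)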
Abstract
We introduce a Korean speech recognition platform (ECHOS) developed for education and research purposes. ECHOS lowers the entry barrier to speech recognition research and can serve as a reference engine by providing elementary speech recognition modules. It has a simple object-oriented architecture, implemented in C++ with the standard template library. The input to ECHOS is digital speech data sampled at 8 or 16 kHz. Its output is the 1-best recognition result, N-best recognition results, and a word graph. The recognition engine is composed of MFCC/PLP feature extraction, HMM-based acoustic modeling, n-gram language modeling, and finite state network (FSN)- and lexical tree-based search algorithms. It can handle tasks ranging from isolated word recognition to large vocabulary continuous speech recognition. For validation, we compare the performance of ECHOS with that of the hidden Markov model toolkit (HTK). In an FSN-based task, ECHOS shows similar word accuracy, while the recognition time is doubled because of the object-oriented implementation. In an 8000-word continuous speech recognition task, ECHOS uses a lexical tree search algorithm that differs from the algorithm used in HTK; it increases the word error rate by $40\%$ in relative terms but halves the recognition time.
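The lexical tree mentioned in the abstract can be illustrated with a minimal C++/STL sketch. This is not taken from the ECHOS source: the names `LexNode` and `LexicalTree`, the phone labels, and the interface are all hypothetical, and the sketch covers only the tree construction, not the search itself. It shows the core idea of a pronunciation prefix tree, in which words that begin with the same phones share nodes, so a search over the tree evaluates each shared prefix once.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type for illustration; not the ECHOS data structure.
struct LexNode {
    // Child nodes keyed by phone label.
    std::map<std::string, std::unique_ptr<LexNode>> children;
    // Words whose pronunciation ends at this node.
    std::vector<std::string> words;
};

class LexicalTree {
public:
    // Insert a word with its phone sequence, reusing nodes for shared prefixes.
    void add(const std::string& word, const std::vector<std::string>& phones) {
        LexNode* node = &root_;
        for (const std::string& p : phones) {
            std::unique_ptr<LexNode>& child = node->children[p];
            if (!child) {
                child = std::make_unique<LexNode>();
                ++nodeCount_;
            }
            node = child.get();
        }
        node->words.push_back(word);
    }

    // Number of phone nodes in the tree, excluding the root.
    std::size_t nodeCount() const { return nodeCount_; }

    // Return the words pronounced exactly as `phones` (empty if none).
    std::vector<std::string> lookup(const std::vector<std::string>& phones) const {
        const LexNode* node = &root_;
        for (const std::string& p : phones) {
            auto it = node->children.find(p);
            if (it == node->children.end()) return {};
            node = it->second.get();
        }
        return node->words;
    }

private:
    LexNode root_;
    std::size_t nodeCount_ = 0;
};
```

As a usage example under these assumptions, adding "cat" with phones {"k", "ae", "t"} and "cap" with phones {"k", "ae", "p"} creates four phone nodes rather than six, because the two words share the "k" and "ae" nodes. The sketch requires C++14 for `std::make_unique`.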
Keywords
Speech recognition; search algorithm