Feature Extraction by Optimizing the Cepstral Resolution of Frequency Sub-bands

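  • 지상문 (Division of Information Science, Kyungsung University)
  • 조훈영 (Computer Science Major, Department of Electrical Engineering and Computer Science, KAIST)
  • 오영환 (Computer Science Major, Department of Electrical Engineering and Computer Science, KAIST)
  • Published: 2003.01.01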

Abstract

Feature vectors for conventional speech recognition are usually extracted over the full frequency band, so each sub-band contributes equally to the final recognition result. In this paper, feature vectors are extracted independently in each sub-band, and the cepstral resolution of each sub-band feature is controlled to optimize speech recognition. To this end, cepstral vectors of different dimensions are extracted for each sub-band using the multi-band approach, which extracts a feature vector independently for each sub-band. Speech recognition rate and clustering quality are proposed as criteria for finding the optimal combination of sub-band vector dimensions. In connected-digit recognition experiments on the TIDIGITS database, the proposed method, while requiring less computation than full-band feature extraction, achieved a string accuracy of 99.125%, a percent correct of 99.775%, and a percent accuracy of 99.705%. These correspond to relative error rate reductions of 38%, 32%, and 37%, respectively, compared with the baseline full-band feature vector.
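The core idea is that each sub-band gets its own cepstral vector, and the number of coefficients kept per band sets that band's cepstral resolution. Below is a minimal Python sketch under stated assumptions: the input is one frame of log filterbank energies, sub-bands are contiguous groups of filters given by index boundaries, and each band gets an unnormalized DCT-II that keeps c_0 through c_{d-1}. The band boundaries, the DCT normalization, whether c_0 is kept, and the helper names `dct_ii` and `subband_cepstra` are illustrative assumptions, not the authors' configuration.

```python
import math
from typing import List, Sequence


def dct_ii(x: Sequence[float], num_coeffs: int) -> List[float]:
    """Unnormalized DCT-II of x, keeping coefficients c_0 .. c_{num_coeffs-1}."""
    n = len(x)
    if not 0 < num_coeffs <= n:
        raise ValueError("num_coeffs must be between 1 and the number of filters in the band")
    return [
        sum(x[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
        for k in range(num_coeffs)
    ]


def subband_cepstra(
    log_energies: Sequence[float],
    band_edges: Sequence[int],
    dims: Sequence[int],
) -> List[float]:
    """Hypothetical helper: one cepstral vector per sub-band, concatenated.

    Filters band_edges[b] .. band_edges[b + 1] - 1 form sub-band b, and dims[b]
    is the number of cepstral coefficients kept for it (its cepstral resolution).
    """
    if len(band_edges) != len(dims) + 1:
        raise ValueError("need exactly one more band edge than sub-bands")
    features: List[float] = []
    for b, d in enumerate(dims):
        band = log_energies[band_edges[b]:band_edges[b + 1]]
        features.extend(dct_ii(band, d))  # each band is transformed independently
    return features


# Example: 24 log filterbank energies split into four 6-filter sub-bands.
energies = [math.log(1.0 + i) for i in range(24)]
vector = subband_cepstra(energies, [0, 6, 12, 18, 24], [5, 4, 3, 2])
print(len(vector))
```

With `dims=[5, 4, 3, 2]` the frame yields a 14-dimensional vector; a full-band extractor would instead apply a single DCT across all 24 filters, so every band would share one resolution.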
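The abstract names clustering quality as one criterion for picking the per-band dimensions but does not define the measure. The sketch below assumes a common scatter-based stand-in, the ratio tr(S_b)/tr(S_w) of between-class to within-class scatter, and searches exhaustively over candidate dimension tuples whose total is fixed. The frame labels, the candidate sets, the fixed-budget constraint, and the names `truncate`, `scatter_ratio`, and `best_dims` are all hypothetical; using recognition rate as the criterion would replace `scatter_ratio` with a full train-and-test run.

```python
import itertools
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

# Per-band cepstra for one frame, each computed at that band's maximum dimension.
Frame = Sequence[Sequence[float]]


def truncate(frame: Frame, dims: Sequence[int]) -> List[float]:
    """Keep the first dims[b] coefficients of band b and concatenate the bands."""
    return [c for band, d in zip(frame, dims) for c in band[:d]]


def scatter_ratio(vectors: List[List[float]], labels: Sequence[Hashable]) -> float:
    """tr(S_b) / tr(S_w): between-class over within-class scatter (higher is better)."""
    dim, n = len(vectors[0]), len(vectors)
    overall = [sum(v[i] for v in vectors) / n for i in range(dim)]
    classes: Dict[Hashable, List[List[float]]] = {}
    for v, y in zip(vectors, labels):
        classes.setdefault(y, []).append(v)
    between = within = 0.0
    for members in classes.values():
        mean = [sum(v[i] for v in members) / len(members) for i in range(dim)]
        between += len(members) * sum((mean[i] - overall[i]) ** 2 for i in range(dim))
        within += sum((v[i] - mean[i]) ** 2 for v in members for i in range(dim))
    return between / within if within > 0 else float("inf")


def best_dims(
    frames: Sequence[Frame],
    labels: Sequence[Hashable],
    candidates: Sequence[Sequence[int]],
    total_dim: int,
) -> Optional[Tuple[Tuple[int, ...], float]]:
    """Score every per-band dimension tuple whose sum is total_dim; return the best."""
    best: Optional[Tuple[Tuple[int, ...], float]] = None
    for dims in itertools.product(*candidates):
        if sum(dims) != total_dim:
            continue
        score = scatter_ratio([truncate(f, dims) for f in frames], labels)
        if best is None or score > best[1]:
            best = (dims, score)
    return best


# Toy data: band 0 separates classes "a" and "b"; band 1 does not.
frames = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[1.0, 1.0], [-1.0, -1.0]],
    [[10.0, 10.0], [1.0, 1.0]],
    [[11.0, 11.0], [-1.0, -1.0]],
]
labels = ["a", "a", "b", "b"]
result = best_dims(frames, labels, candidates=[[1, 2], [1, 2]], total_dim=3)
print(result)
```

Truncating a precomputed cepstrum gives the same coefficients as computing fewer DCT terms, since each coefficient depends only on that band's energies, so candidates can be scored without re-extracting features. The total dimension is held fixed because trace ratios generally shift as dimensions are added, which would otherwise bias the comparison.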
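The reported reductions are relative: if E_base and E_prop are the baseline and proposed error rates (100 minus the respective accuracy figure, in percent), the reduction is (E_base - E_prop) / E_base. The abstract does not state the baseline scores, so the sketch below only back-calculates the baseline accuracies implied by the rounded figures; these are approximations, not reported results, and the function names are hypothetical.

```python
def relative_error_reduction(baseline_acc: float, proposed_acc: float) -> float:
    """Relative error rate reduction; accuracies are given in percent."""
    e_base = 100.0 - baseline_acc
    e_prop = 100.0 - proposed_acc
    return (e_base - e_prop) / e_base


def implied_baseline_accuracy(proposed_acc: float, reduction: float) -> float:
    """Baseline accuracy (percent) implied by a proposed accuracy and a relative reduction."""
    return 100.0 - (100.0 - proposed_acc) / (1.0 - reduction)


# Figures stated in the abstract: (proposed accuracy in %, relative reduction).
reported = {
    "string accuracy": (99.125, 0.38),
    "percent correct": (99.775, 0.32),
    "percent accuracy": (99.705, 0.37),
}
for name, (acc, red) in reported.items():
    base = implied_baseline_accuracy(acc, red)
    print(f"{name}: implied baseline ~ {base:.2f}%")
```

Because the reductions are rounded to whole percent, the implied baselines (roughly 98.59%, 99.67%, and 99.53%) carry that rounding error.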
