Feature Parameter Extraction and Speech Recognition Using Matrix Factorization

Lee Kwang-Seok; Hur Kang-In

Journal of the Korea Institute of Information and Communication Engineering (한국정보통신학회논문지)

Volume 10, Issue 7 / Pages 1307-1311 / 2006 / 2234-4772 (pISSN) / 2288-4165 (eISSN)

The Korea Institute of Information and Communication Engineering (한국정보통신학회)


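Lee Kwang-Seok (Department of Electronic Engineering, Jinju National University);
Hur Kang-In (Department of Electronic Engineering, Dong-A University)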

Published: 2006.07.01


Abstract

In this paper, we propose a new speech feature parameter that uses matrix factorization to capture part-based features of the speech spectrum. The proposed parameter is obtained by factorizing multi-dimensional feature data under the constraint that all matrix elements are non-negative, which yields an effective reduced-dimension representation. The reduced feature data represent part-based features of the input data. We verify the usefulness of the NMF (Non-negative Matrix Factorization) algorithm for speech feature extraction by applying it to the Mel-scaled filter bank output and using the resulting feature parameter for recognition. The recognition experiments confirm that the proposed feature parameter outperforms MFCC (Mel-Frequency Cepstral Coefficients), the feature generally used in speech recognition.

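In this study, we propose a new speech parameter that can represent partial features of the speech spectrum by using matrix factorization. The proposed parameter is obtained through a matrix factorization process under the condition that no element of the matrices is negative, and the high-dimensional data are shown to be reduced effectively. The dimension-reduced data express partial characteristics of the input data. The output of the Mel filter bank, which is commonly used in speech feature extraction, is used as the input to the Non-negative Matrix Factorization (NMF) algorithm, and the dimension-reduced data obtained through the algorithm are used as the input to the speech recognizer; the results are compared with the recognition results of Mel-Frequency Cepstral Coefficients (MFCC). The recognition results show that the proposed feature parameter achieves better recognition performance than MFCC, which is generally used to evaluate the performance of speech recognizers.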
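The dimension-reduction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard multiplicative update rules for the Frobenius-norm objective, and the function name `nmf`, the matrix sizes (24 filter bank bands, 300 frames, 12 reduced dimensions) and the synthetic input are all hypothetical choices made for illustration. Real Mel filter bank energies would take the place of `V`.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (bands x frames) as V ~ W @ H.

    Uses multiplicative updates for the Frobenius-norm objective.
    W holds `rank` non-negative basis spectra (one per column) and
    H holds the non-negative activations (one column per frame).
    """
    rng = np.random.default_rng(seed)
    n_bands, n_frames = V.shape
    # Strictly positive random initialisation keeps every update well defined.
    W = rng.random((n_bands, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    errors = []
    for _ in range(n_iter):
        # Each factor is rescaled element-wise, so non-negativity is preserved.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors


# Synthetic, non-negative stand-in for Mel filter bank energies
# (assumed sizes: 24 bands x 300 frames).
rng = np.random.default_rng(1)
V = rng.random((24, 8)) @ rng.random((8, 300))

W, H, errors = nmf(V, rank=12)

# One 12-dimensional reduced feature vector per frame; in a pipeline like
# the one described above, these would be passed to the recognizer.
features = H.T
```

Because every update multiplies by a non-negative ratio, `W` and `H` stay non-negative throughout, which is what allows the columns of `W` to be read as additive parts of the spectrum.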



