• Title/Summary/Keyword: MFCC

Performance comparison of Text-Independent Speaker Recognizer Using VQ and GMM (VQ와 GMM을 이용한 문맥독립 화자인식기의 성능 비교)

  • Kim, Seong-Jong;Chung, Hoon;Chung, Ik-Joo
    • Speech Sciences / v.7 no.2 / pp.235-244 / 2000
  • This paper focuses on implementing a text-independent speaker recognizer using the VQ and GMM algorithms and on studying the characteristics of speaker recognizers that adopt these two algorithms. Because it is difficult to ascertain theoretically the effect the two algorithms have on the speaker recognizer, we performed recognition experiments using various parameters. The experiments showed that the GMM algorithm had better recognition performance than the VQ algorithm, as follows. The GMM performed better with small amounts of training data, and its recognition rate varied only slightly with the kind of feature vector and the length of the input data. On the whole, the GMM showed better recognition performance than the VQ.
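
As a rough illustration of the VQ side of this comparison, the sketch below scores an utterance against per-speaker codebooks and picks the speaker with the lowest average quantization distortion. The function names, the Euclidean distance, and the codebook layout are assumptions made for illustration; the abstract does not describe the authors' actual implementation.

```python
import math

def vq_distortion(frames, codebook):
    """Average distance from each feature frame to its nearest codeword."""
    return sum(min(math.dist(f, c) for c in codebook) for f in frames) / len(frames)

def identify_speaker(frames, codebooks):
    """Return the speaker whose codebook gives the lowest average distortion.

    codebooks maps a (hypothetical) speaker name to a list of codewords.
    """
    return min(codebooks, key=lambda name: vq_distortion(frames, codebooks[name]))
```

A GMM-based recognizer would replace the distortion with a per-speaker log-likelihood and pick the maximum instead of the minimum.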

Parts-Based Feature Extraction of Spectrum of Speech Signal Using Non-Negative Matrix Factorization

  • Park, Jeong-Won;Kim, Chang-Keun;Lee, Kwang-Seok;Koh, Si-Young;Hur, Kang-In
    • Journal of information and communication convergence engineering / v.1 no.4 / pp.209-212 / 2003
  • In this paper, we propose a new speech feature parameter obtained through parts-based feature extraction of the speech spectrum using Non-Negative Matrix Factorization (NMF). NMF can effectively reduce the dimension of multi-dimensional data through matrix factorization under non-negativity constraints, and the dimensionally reduced data represent parts-based features of the input data. For speech feature extraction, we applied Mel-scaled filter bank outputs as inputs to NMF, then used the outputs of NMF as inputs to the speech recognizer. The recognition experiments confirmed that the proposed feature parameter gives better recognition performance than the commonly used mel frequency cepstral coefficient (MFCC).
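
The factorization step described above can be sketched with the standard multiplicative-update rules for NMF. This is a minimal pure-Python illustration, not the authors' implementation: the matrix layout (rows as frames, columns as filter bank channels), the rank, the iteration count, and the helper names are all assumptions.

```python
import random

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def nmf(v, rank, iters=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix v (n x m) into w (n x rank) and h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start keeps every update nonnegative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h
```

With frames as rows, each row of `w` is a reduced-dimension representation of one frame and each row of `h` is a nonnegative basis over the filter bank channels. Which factor is passed to the recognizer is an assumption here.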

Intelligent Speech Recognition System based on Situation Awareness for u-Green City (u-Green City 구현을 위한 상황인지기반 지능형 음성인식 시스템)

  • Cho, Young-Im;Jang, Sung-Soon
    • Journal of Institute of Control, Robotics and Systems / v.15 no.12 / pp.1203-1208 / 2009
  • A Green IT based u-City is a u-City that incorporates the Green IT concept. Whether or not situation awareness is adopted affects how much Green IT processing can be reduced. For example, recognizing all of the speech sounds captured on CCTV in a u-City environment takes a lot of processing time and cost, whereas recognizing only emergency sounds on CCTV takes much less. Therefore, to detect emergency states dynamically through CCTV, we propose an advanced speech recognition system. For this purpose, we adopt an HMM (Hidden Markov Model) for feature extraction. We also adopt a Wiener filter technique to eliminate noise in the information coming from CCTV in the u-City environment.

Parts-based Feature Extraction of Speech Spectrum Using Non-Negative Matrix Factorization (Non-Negative Matrix Factorization을 이용한 음성 스펙트럼의 부분 특징 추출)

  • 박정원;김창근;허강인
    • Proceedings of the IEEK Conference / 2003.11a / pp.49-52 / 2003
  • In this paper, we propose a new speech feature parameter using NMF (Non-Negative Matrix Factorization). NMF can represent multi-dimensional data through effective dimensional reduction by matrix factorization under the non-negativity constraint, and the reduced data present parts-based features of the input data. We verify the usefulness of the NMF algorithm for speech feature extraction by applying it to Mel-scaled filter bank outputs to obtain the feature parameter. The recognition experiments confirmed that the proposed feature parameter gives better recognition performance than the commonly used MFCC (mel frequency cepstral coefficient).

Implementation of Speech Recognition Security System Using Speaker Defendent Algorithm (화자 종속 알고리즘을 이용한 음성 인식 보안 시스템 구현)

  • 김영현;문철홍
    • Proceedings of the IEEK Conference / 2003.11a / pp.65-68 / 2003
  • In this paper, a speech recognition system using a speaker-dependent algorithm is implemented on a PC. Results are loaded onto an LDM display system that employs an Intel StrongARM SA-1110. This research was carried out to correct the shortcomings of the earlier speech recognition system, which was sometimes triggered by similar speech rather than by the same utterance. To solve this defect, the utterance is input and processed twice. When references are created, variable start and end points are used to make the references efficient. These references and the new references are converted into feature parameters, LPC and MFCC, and DTW is executed on the feature parameters. The security system grants the user permission only when the executions give the same result.
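
The DTW matching step can be sketched as follows. This is a textbook dynamic time warping recurrence over sequences of feature vectors, with a two-utterance acceptance check added to mirror the "processed twice" idea. The Euclidean frame distance, the `threshold` parameter, and the function names are assumptions; the abstract does not give the authors' decision rule.

```python
import math

def dtw_distance(ref, test):
    """Cumulative DTW cost between two sequences of feature vectors."""
    n, m = len(ref), len(test)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning ref[:i] with test[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], test[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def accept(ref, first, second, threshold):
    """Grant access only if both utterances match the reference (hypothetical rule)."""
    return dtw_distance(ref, first) <= threshold and dtw_distance(ref, second) <= threshold
```

Requiring both utterances to pass makes it harder for a single similar-sounding input to trigger the system, at the cost of more false rejections.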

Performance Comparison of Korean Connected Digit Telephone Speech Recognition According to Aurora Feature Extraction (Aurora 특징파라미터 추출기법에 따른 한국어 연속숫자음 전화음성의 인식 성능 비교)

  • Kim Min Sung;Jung Sung Yun;Son Jong Mok;Bae Keun Sung;Kim Sang Hun
    • Proceedings of the KSPS conference / 2003.10a / pp.145-148 / 2003
  • To improve the recognition performance of Korean connected digit telephone speech, this paper investigates both the Aurora feature extraction method, which employs a 2-stage Wiener filter for noise reduction, and the DWFBA method. CMN and MRTCN are applied to the static features for channel compensation. The telephone digit speech database released by SITEC is used for recognition experiments with the HTK system. Experimental results show that the Aurora feature is slightly better than MFCC and DWFBA without channel compensation. When channel compensation is included, the Aurora feature is slightly better than DWFBA with MRTCN.
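
Assuming CMN here denotes cepstral mean normalization (the abstract does not expand the acronym), channel compensation on static features can be sketched as subtracting the per-utterance mean of each cepstral coefficient. The function name and the list-of-frames layout are illustrative assumptions.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-utterance mean from each cepstral coefficient.

    frames is a list of equal-length feature vectors, one per frame.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    return [[f[k] - mean[k] for k in range(dim)] for f in frames]
```

A stationary channel adds a roughly constant offset in the cepstral domain, so removing the mean cancels much of its effect.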

Classification of pathological and normal voice based on dimension reduction of feature vectors (피처벡터 축소방법에 기반한 장애음성 분류)

  • Lee, Ji-Yeoun;Jeong, Sang-Bae;Choi, Hong-Shik;Hahn, Min-Soo
    • Proceedings of the KSPS conference / 2007.05a / pp.123-126 / 2007
  • This paper suggests a method to improve the performance of pathological/normal voice classification. The effectiveness of the mel frequency-based filter bank energies is analyzed using the Fisher discriminant ratio (FDR). Mel frequency cepstrum coefficients (MFCCs) and feature vectors obtained through the linear discriminant analysis (LDA) transformation of the filter bank energies (FBE) are implemented. The paper shows that the FBE-LDA-based GMM is a more discriminative method for pathological/normal voice classification than the MFCC-based GMM.
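
For a single scalar feature and two classes, the Fisher discriminant ratio is commonly computed as the squared difference of the class means divided by the sum of the class variances. The sketch below uses that common definition with population variances; whether the paper uses exactly this form is an assumption.

```python
from statistics import mean, pvariance

def fisher_discriminant_ratio(class_a, class_b):
    """FDR of one scalar feature: squared mean gap over summed class variances."""
    gap = mean(class_a) - mean(class_b)
    return gap * gap / (pvariance(class_a) + pvariance(class_b))
```

Evaluating this per filter bank channel gives a ranking of which channels separate the two classes best.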

Phoneme Segmentation Using Voice/Unvoiced/Silence Classifier and Spectral Information (유성/무성/묵음 분류기와 주파수 스펙트럼을 이용한 음소 경계 검출)

  • Lee Sang-Rae;Han Hyun-Bae;Hahn Minsoo
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.86-91 / 1999
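  • In this paper, we implement a phoneme boundary detector using a voiced/unvoiced/silence classifier and frequency spectrum comparison. Phoneme boundary detection is very important in fields such as speech recognition, synthesis, and analysis. Segments that the classifier judges to be voiced are subdivided into phoneme units through spectral comparison, while segments judged to be unvoiced are treated as a single phoneme unit in consideration of the phonetic characteristics of Korean. Spectral comparison in voiced segments uses a modified Itakura-Saito distance measure and a Euclidean MFCC (Mel Frequency Cepstrum Coefficients) distance measure, and comparing frames with one frame skipped between them gave the best results. Finally, average phoneme length information is used to further split or merge the segments detected as phoneme boundaries. The voiced/unvoiced/silence classifier showed an accuracy of 94.247% on isolated words recorded in an office, and phoneme boundary detection showed an accuracy of 72.8%.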
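
The Euclidean MFCC comparison with one frame skipped can be sketched as below: each frame is compared with the frame two positions later, and local peaks of that distance above a threshold are reported as boundary candidates. The threshold, the peak-picking rule, and the function name are assumptions for illustration; the abstract does not specify how boundaries are chosen from the distance curve.

```python
import math

def boundary_candidates(mfcc, threshold, skip=1):
    """Return frame indices where the MFCC vectors change sharply.

    Frame t is compared with frame t + skip + 1, so skip=1 skips one frame.
    The threshold is a hypothetical tuning parameter.
    """
    step = skip + 1
    dists = [math.dist(mfcc[t], mfcc[t + step]) for t in range(len(mfcc) - step)]
    peaks = []
    for t, d in enumerate(dists):
        left = dists[t - 1] if t > 0 else float("-inf")
        right = dists[t + 1] if t + 1 < len(dists) else float("-inf")
        # Keep local maxima of the distance curve that exceed the threshold.
        if d > threshold and d >= left and d > right:
            peaks.append(t + step // 2)  # place the boundary between the compared frames
    return peaks
```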

Korean Phonological Viseme for Lip Synch Based on Phoneme Recognition (음소인식 기반의 립싱크 구현을 위한 한국어 음운학적 Viseme의 제안)

  • Joo Heeyeol;Kang Sunmee;Ko Hanseok
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.70-73 / 1999
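  • In this paper, we use a phonological approach to Korean to propose the visemes (visual phonemes) that are essential for implementing lip synch through real-time phoneme recognition of Korean, and we carry out vowel recognition experiments and analyze the results, since vowels have a decisive influence on lip shape in lip synch. In the vowel recognition experiments, each of the 51 Korean phonemes is modeled by a CHMM (Continuous Hidden Markov Model) with 3 states, and a phoneme network in which the phonemes are connected in parallel is used. Features are extracted from the input speech as 12th-order MFCCs, the Viterbi algorithm is used as the recognition algorithm, and phoneme sequence rules with a structure similar to a bigram grammar are used during recognition to improve the recognition rate and speed.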

A Study on Speech Recognition inside the Car (차량내에서의 음성인식에 관한 연구)

  • Park Jeong-Hoon;Im Hyung-Kyu;Kim Chong-Kyo
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.56-60 / 1999
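  • This paper builds recognizers using noise-robust parameters for speech mixed with the various kinds of noise that can occur in a car, and compares and evaluates these parameters. The speech data used in the experiments were collected in a variety of noise environments, varying the car model, speed, road conditions, radio on/off, and whether the windows were open or closed. The parameters compared are MFCC (Mel-Frequency Cepstral Coefficient) and PLP (Perceptual Linear Prediction); for each parameter, a codebook was built using MKM (Modified k-means), and a DHMM (Discrete Hidden Markov Model) was used as the recognition algorithm. The highest recognition rate, 96.25%, was obtained when driving at 60 km/h on an asphalt road with the windows closed and the radio off, and the lowest, 60%, when driving at 100 km/h on a highway with the windows open.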
