• Title/Summary/Keyword: 묵음 정규화 (silence normalization)

Search Result 7

Cepstral Normalization Combined with CSFN for Noisy Speech Recognition (켑스트럼 정규화와 켑스트럼 거리기반 묵음특징정규화 방법을 이용한 잡음음성 인식)

  • Choi, Sook-Nam;Shen, Guang-Hu;Chung, Hyun-Yeol
    • Journal of Korea Multimedia Society / v.14 no.10 / pp.1221-1228 / 2011
  • Speech recognition systems work well in typical indoor environments. However, recognition performance drops sharply in real environments because of various kinds of noise. In this paper we propose CSFN-CMVN to improve the recognition performance of the existing CSFN (Cepstral distance based SFN). CSFN-CMVN combines cepstral normalization with CSFN, which normalizes silence features using the cepstral Euclidean distance to classify speech/silence. In tests on the Aurora 2.0 DB, the proposed CSFN-CMVN improves average word accuracy over all test sets by about 7% compared with the typical silence feature normalization method SFN-I. It also improves accuracy by 6% and 5% compared with the conventional SFN-II and CSFN, respectively, showing the effectiveness of the proposed method.
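
The cepstral normalization step named in this abstract can be illustrated with a minimal sketch of per-utterance cepstral mean and variance normalization (CMVN). This is a generic textbook form, not the authors' implementation; the function name `cmvn`, the `eps` guard, and the list-of-lists input layout are assumptions made for illustration.

```python
import math


def cmvn(frames, eps=1e-10):
    """Per-utterance cepstral mean and variance normalization (sketch).

    frames: list of equal-length cepstral vectors, one per frame.
    Returns new vectors with zero mean and unit variance per coefficient.
    """
    n = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient over the utterance.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Population standard deviation of each coefficient.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    # eps guards against division by zero for constant coefficients
    # (an assumption of this sketch, not taken from the paper).
    return [
        [(f[d] - means[d]) / max(stds[d], eps) for d in range(dim)]
        for f in frames
    ]
```

Normalizing over the whole utterance is only one choice; a sliding-window variant would be needed for online use.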

Cepstral Distance and Log-Energy Based Silence Feature Normalization for Robust Speech Recognition (강인한 음성인식을 위한 켑스트럼 거리와 로그 에너지 기반 묵음 특징 정규화)

  • Shen, Guang-Hu;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea / v.29 no.4 / pp.278-285 / 2010
  • The mismatch between training and test environments is one of the major causes of performance degradation in noisy speech recognition, and many silence feature normalization methods have been proposed to reduce it. The conventional silence feature normalization method classifies speech/silence well at high SNR, but its performance degrades at low SNR because speech/silence classification becomes inaccurate. Cepstral distance, on the other hand, represents the characteristic distribution of speech/silence (or noise) well at low SNR. In this paper, we propose a Cepstral distance and Log-energy based Silence Feature Normalization (CLSFN) method, which uses both log-energy and cepstral Euclidean distance to classify speech/silence. Because the proposed method combines the merit of log-energy, which is less affected by noise at high SNR, with the merit of cepstral distance, which discriminates speech/silence accurately at low SNR, classification accuracy is expected to improve. Experimental results showed that the proposed CLSFN improved recognition performance compared with the conventional SFN-I/II and CSFN methods in all kinds of noisy environments.
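
The idea of using both log-energy and cepstral Euclidean distance for speech/silence classification can be sketched as follows. The abstract does not state how the two cues are combined, so the OR rule, the thresholds (`energy_margin`, `dist_threshold`), and the assumption that the first `n_noise` frames contain only noise are all hypothetical choices for illustration, not the CLSFN rule itself.

```python
import math


def cepstral_distance(a, b):
    """Euclidean distance between two cepstral vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_frames(cepstra, log_energies, n_noise=3,
                    energy_margin=3.0, dist_threshold=1.0):
    """Label each frame True (speech) or False (silence/noise).

    Assumes the first n_noise frames are noise-only and uses them as
    the reference for both cues (hypothetical setup).
    """
    dim = len(cepstra[0])
    # Reference noise cepstrum and noise log-energy from the leading frames.
    noise_cep = [sum(c[d] for c in cepstra[:n_noise]) / n_noise
                 for d in range(dim)]
    noise_energy = sum(log_energies[:n_noise]) / n_noise

    labels = []
    for cep, energy in zip(cepstra, log_energies):
        loud = (energy - noise_energy) > energy_margin
        far = cepstral_distance(cep, noise_cep) > dist_threshold
        # Either cue alone is enough to call the frame speech.
        labels.append(loud or far)
    return labels
```

With this rule, a frame whose energy is close to the noise floor can still be labeled speech when its cepstrum is far from the noise reference, which is the low-SNR case the abstract highlights.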

Voice Activity Detection in Noisy Environment using Speech Energy Maximization and Silence Feature Normalization (음성 에너지 최대화와 묵음 특징 정규화를 이용한 잡음 환경에 강인한 음성 검출)

  • Ahn, Chan-Shik;Choi, Ki-Ho
    • Journal of Digital Convergence / v.11 no.6 / pp.169-174 / 2013
  • In speech recognition, one cause of performance degradation is the mismatch between the model training and recognition environments. Silence feature normalization is used as a way to reduce this mismatch. The existing silence feature normalization method has a problem at low signal-to-noise ratios: the energy level of silence intervals rises, the accuracy of voice/non-voice classification falls, and recognition performance degrades. This paper proposes a speech detection method that is robust in noisy environments, using silence feature normalization and speech energy maximization. At high signal-to-noise ratios, the proposed method maximizes speech energy so that the features are less affected by noise. At low signal-to-noise ratios, it uses the cepstral feature distribution of voice/non-voice to improve recognition performance. In recognition experiments, recognition performance improved compared with the conventional method.

Voice Recognition Performance Improvement using the Convergence of Voice signal Feature and Silence Feature Normalization in Cepstrum Feature Distribution (음성 신호 특징과 셉스트럽 특징 분포에서 묵음 특징 정규화를 융합한 음성 인식 성능 향상)

  • Hwang, Jae-Cheon
    • Journal of the Korea Convergence Society / v.8 no.5 / pp.13-17 / 2017
  • With existing methods for extracting speech features from a speech signal, recognition errors occur because the threshold value is unclear and speech is detected incorrectly. This article presents a modeling method for improving speech recognition performance that combines speech feature extraction with silence feature normalization applied to non-speech. The proposed method minimizes the effect of noise: the speech recognition model combines speech signal feature extraction for each speech frame with silence feature normalization. The method also produces a speech signal whose energy spectrum is similar to entropy, so that it is less affected by noise. Silence feature normalization improves performance with respect to signal-to-noise ratio. The reference value for speech/non-speech classification was fixed in the cepstrum. In a performance analysis comparing the results with CHMM and HMM, the recognition rate improved by 2.7%p in the speech-dependent case and by 0.7%p in the speech-independent case.

A study on the Voiced, Unvoiced and Silence Classification (유, 무성음 및 묵음 식별에 관한 연구)

  • 김명환;김순협
    • The Journal of the Acoustical Society of Korea / v.3 no.2 / pp.46-58 / 1984
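  • This paper studies voiced, unvoiced, and silence classification for Korean speech recognition. A pattern recognition approach is used to classify a given speech segment into one of these three classes of speech signal. The analysis parameters are the zero-crossing rate of the speech signal, the log energy, the normalized first autocorrelation coefficient, the first predictor coefficient obtained from linear prediction analysis, and the energy of the prediction error. The segment is classified with a minimum-distance rule derived under the assumption that the measured parameters are distributed according to a multidimensional Gaussian probability density function. Among the various combinations of measured parameters tested, using the zero-crossing rate, the first predictor coefficient, and the prediction error energy as the measurement parameters gave a classification error rate below 1%.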
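
Three of the frame-level parameters listed in this abstract have simple standard definitions, sketched here for a single frame of samples. The exact definitions used in the paper are not given in the abstract, so the normalizations chosen (crossings per sample pair, energy in dB of the mean square, lag-1 autocorrelation divided by lag-0) and the function names are assumptions of this sketch.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def log_energy(frame, eps=1e-10):
    """Log energy in dB of the mean squared amplitude; eps avoids log(0)."""
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_square + eps)


def normalized_autocorr_lag1(frame, eps=1e-10):
    """Lag-1 autocorrelation divided by the lag-0 autocorrelation."""
    num = sum(a * b for a, b in zip(frame, frame[1:]))
    den = sum(x * x for x in frame)
    return num / (den + eps)
```

Voiced frames tend to show a low zero-crossing rate and a lag-1 autocorrelation near 1, while unvoiced frames show the opposite, which is why these parameters are useful inputs to a distance-based classifier.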

Speech Recognition in Noisy Environments Using Modified Gain Function (변형된 이득함수를 이용한 잡음 환경에서의 음성인식)

  • Jin, Ho-Sung;Lee, Sang-Ho;Hong, Jae-Keun
    • Proceedings of the KAIS Fall Conference / 2010.05a / pp.119-123 / 2010
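  • In this paper, we adjust the gain function of the harmonic regeneration noise reduction method, which uses the gain function of the two-step noise reduction method, to improve speech enhancement over the existing method, and we apply the speech enhanced by the proposed method to speech recognition. With the existing method, in the transition from a silence interval to a speech interval the spectral gain function is computed from the estimated speech signal of the previous frame, which causes distortion at speech onsets. To address this, we propose a speech enhancement method that adjusts the gain function by comparing the gain function of the two-step noise reduction method with the estimated a priori SNR, and by comparing the gain function of the two-step noise reduction method with that of the harmonic regeneration method. For feature vector extraction, we also apply Log Energy Normalization, which normalizes the log energy of the speech enhanced by the proposed method, to the speech recognition method.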

Performance Improvements for Silence Feature Normalization Method by Using Filter Bank Energy Subtraction (필터 뱅크 에너지 차감을 이용한 묵음 특징 정규화 방법의 성능 향상)

  • Shen, Guanghu;Choi, Sook-Nam;Chung, Hyun-Yeol
    • The Journal of Korean Institute of Communications and Information Sciences / v.35 no.7C / pp.604-610 / 2010
  • In this paper we propose the FSFN (Filter bank sub-band energy subtraction based CLSFN) method to improve the recognition performance of the existing CLSFN (Cepstral distance and Log-energy based Silence Feature Normalization). The proposed FSFN reduces the energy of noise components in the filter bank sub-band domain when extracting features from speech data. This yields enhanced cepstral features, which in turn improve the accuracy of speech/silence classification. It can therefore be expected to outperform the existing CLSFN. Experiments on the Aurora 2.0 DB showed that the proposed FSFN method improves average word accuracy by 2% compared with the conventional CLSFN method, and that FSFN combined with CMVN (Cepstral Mean and Variance Normalization) gave the best recognition performance among the methods compared.
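
Reducing noise energy in the filter bank sub-band domain can be sketched as a subtraction with a floor, in the style of spectral subtraction. The abstract does not give the FSFN formula, so the noise estimate from the first `n_noise` frames, the over-subtraction factor `alpha`, and the relative `floor` are hypothetical parameters introduced only for illustration.

```python
def subtract_noise_energy(fbank_frames, n_noise=2, alpha=1.0, floor=0.01):
    """Subtract an estimated noise energy from each filter bank sub-band.

    fbank_frames: list of frames, each a list of non-negative sub-band
    energies (linear scale, before the log and DCT steps).
    Assumes the first n_noise frames are noise-only (hypothetical setup).
    """
    dim = len(fbank_frames[0])
    # Per-band noise estimate: average energy over the leading frames.
    noise = [
        sum(f[k] for f in fbank_frames[:n_noise]) / n_noise
        for k in range(dim)
    ]
    # Floor at a fraction of the original energy so the result stays
    # positive and the later logarithm is defined.
    return [
        [max(f[k] - alpha * noise[k], floor * f[k]) for k in range(dim)]
        for f in fbank_frames
    ]
```

The cleaned sub-band energies would then go through the usual log and DCT steps to produce cepstral features for speech/silence classification.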