• Title/Summary/Keyword: 잡음음성인식 (noisy speech recognition)

Method for Spectral Enhancement by Binary Mask for Speech Recognition Enhancement Under Noise Environment (잡음환경에서 음성인식 성능향상을 위한 바이너리 마스크를 이용한 스펙트럼 향상 방법)

  • Choi, Gab-Keun;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea / v.29 no.7 / pp.468-474 / 2010
  • The major obstacle to the practical use of speech recognition is distortion caused by ambient and channel noise. Ambient noise generally degrades performance and restricts where recognition can be used. DSR (Distributed Speech Recognition) based speech recognition has the same problem. Various noise-cancelling algorithms have been applied to solve it, but at low SNR the spectral loss and residual noise caused by incorrect noise estimation lower the recognition rate. This paper proposes a speech enhancement method that uses MMSE-STSA for noise cancelling and an ideal binary mask to compensate for the damaged spectrum. In experiments in noisy environments (SNR 15 dB to 0 dB), the proposed method showed better spectral results and recognition performance.
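
As an illustration of the ideal binary mask mentioned in this abstract, the sketch below shows only the textbook definition: a time-frequency cell is kept when its local SNR exceeds a threshold. It assumes oracle access to the clean and noise power spectra, and the function names and the 0 dB threshold are hypothetical. The abstract does not say how the paper combines the mask with MMSE-STSA, so that step is not shown.

```python
import math

def ideal_binary_mask(clean_power, noise_power, threshold_db=0.0):
    """Return a 0/1 mask per time-frequency cell (1 = local SNR above threshold)."""
    mask = []
    for clean_frame, noise_frame in zip(clean_power, noise_power):
        row = []
        for s, n in zip(clean_frame, noise_frame):
            # Local SNR in dB; the small constant avoids log(0) and division by zero.
            snr_db = 10.0 * math.log10((s + 1e-12) / (n + 1e-12))
            row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask

def apply_mask(spectrum, mask):
    """Zero out the cells that the mask marks as noise-dominated."""
    return [[x * m for x, m in zip(frame, mrow)]
            for frame, mrow in zip(spectrum, mask)]

# Two frames with two frequency bins each (illustrative power values).
clean = [[4.0, 0.1], [0.5, 9.0]]
noise = [[1.0, 1.0], [1.0, 1.0]]
mask = ideal_binary_mask(clean, noise)
noisy = [[c + n for c, n in zip(cf, nf)] for cf, nf in zip(clean, noise)]
masked = apply_mask(noisy, mask)
```

Only the cells where the speech power exceeds the noise power survive the mask; the rest are set to zero.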

Background Noise Classification in Noisy Speech of Short Time Duration Using Improved Speech Parameter (개량된 음성매개변수를 사용한 지속시간이 짧은 잡음음성 중의 배경잡음 분류)

  • Choi, Jae-Seung
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.9 / pp.1673-1678 / 2016
  • In speech recognition, background noise causes incorrect responses to the speech input and therefore lowers the recognition rate. Because countering this kind of noise is not simple, more advanced noise processing techniques are required. This paper therefore proposes an algorithm that distinguishes stationary background noise, non-stationary background noise, and short-duration speech signals in noisy environments. The proposed algorithm uses an improved speech parameter as the key measure for distinguishing the different types of background noise from the speech signals. It then estimates the various kinds of background noise with a multi-layer perceptron neural network. The experiments confirmed that the background noises and the speech signals could be estimated.

Speech Recognition in the Noisy Environments using Hybrid Method of Spectral Subtraction and Noise Masking (스펙트럼 차감법과 잡음 마스킹의 hybrid 방식을 이용한 잡음환경에서의 음성인식)

  • 권영욱
    • Proceedings of the Acoustical Society of Korea Conference / 1998.06e / pp.343-346 / 1998
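  • To improve speech recognition performance in noisy environments, this paper uses a hybrid of spectral subtraction and noise masking. To overcome the mismatch caused by the residual noise that remains after spectral subtraction, it applies noise masking with a target noise level instead of the flooring factor used in conventional spectral subtraction. This method overcomes the weakness of conventional noise masking, which brings no improvement at low SNR, and at the same time mitigates the residual noise problem of spectral subtraction. In particular, applying time/frequency-domain smoothing further reduced some of the noise that still remained after the hybrid method was applied, and yielded further improved recognition performance.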
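
The difference between a flooring factor and masking to a target noise level can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the over-subtraction factor `alpha`, the flooring factor `beta`, and the value of `target_level` are all hypothetical, and the time/frequency smoothing step is omitted.

```python
def subtract_with_flooring(noisy_power, noise_power, alpha=1.0, beta=0.01):
    """Conventional power spectral subtraction: floor each bin at beta * noise."""
    return [max(y - alpha * n, beta * n)
            for y, n in zip(noisy_power, noise_power)]

def subtract_with_masking(noisy_power, noise_power, target_level, alpha=1.0):
    """Hybrid variant (assumed form): clamp each bin to a fixed target noise level."""
    return [max(y - alpha * n, target_level)
            for y, n in zip(noisy_power, noise_power)]

# One frame with three frequency bins (illustrative power values).
noisy = [10.0, 2.0, 1.0]
noise = [1.0, 1.5, 2.0]
floored = subtract_with_flooring(noisy, noise)
masked = subtract_with_masking(noisy, noise, target_level=0.5)
```

With flooring, the floor varies with the noise estimate in each bin; with masking, every low-energy bin is raised to the same target level, so the residual looks like a known, constant noise.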

Speech Recognition Using Noise Processing in Spectral Dimension (스펙트럴 차원의 잡음처리를 이용한 음성인식)

  • Lee, Gwang-seok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2009.10a / pp.738-741 / 2009
  • This research is concerned with improving speech recognition results for noisy speech. We found that spectral subtraction and recovery of the valleys in the spectral envelope obtained from noisy speech are more effective for improving recognition. In this research, the averaged spectral envelope obtained from vowel spectra is used to emphasize the valleys. The vocalic spectral information in the lower frequency range is emphasized, while the spectrum obtained from consonants is left unchanged. In simulation, the emphasis coefficients are varied in the cepstral domain. The method is applied to the recognition of noisy digits and improves the recognition results.

Comparison of Integration Methods of Speech and Lip Information in the Bi-modal Speech Recognition (바이모달 음성인식의 음성정보와 입술정보 결합방법 비교)

  • 박병구;김진영;최승호
    • The Journal of the Acoustical Society of Korea / v.18 no.4 / pp.31-37 / 1999
  • Bimodal speech recognition using visual and audio information has been proposed and studied to improve the performance of ASR (Automatic Speech Recognition) systems in noisy environments. Methods for integrating the two modalities are usually classified into early integration and late integration. The early integration methods include one that uses a fixed weight for the lip parameters and one that uses a variable weight based on speech SNR information. The four late integration methods are: using audio and visual information independently, using the speech optimal path, using the lip optimal path, and using speech SNR information. Among these six methods, the one using a fixed weight for the lip parameters showed a better recognition rate.
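
One common reading of early integration with a fixed lip weight is feature-level fusion: the lip parameters are scaled by a constant and appended to the audio feature vector before recognition. The sketch below assumes that reading. The function name, the weight value, and the feature values are hypothetical, and the abstract does not confirm that the paper weights the features in exactly this way.

```python
def early_integration(audio_features, lip_features, lip_weight=0.5):
    """Append lip features, scaled by a fixed weight, to the audio features."""
    return list(audio_features) + [lip_weight * v for v in lip_features]

# One frame: two audio coefficients and two lip parameters (illustrative values).
frame = early_integration([1.0, 2.0], [4.0, 6.0], lip_weight=0.5)
```

A variable-weight version would replace the constant `lip_weight` with a value derived from the estimated speech SNR; the mapping from SNR to weight is not given in the abstract.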

Nonlinear Speech Enhancement Method for Reducing the Amount of Speech Distortion According to Speech Statistics Model (음성 통계 모형에 따른 음성 왜곡량 감소를 위한 비선형 음성강조법)

  • Choi, Jae-Seung
    • The Journal of the Korea institute of electronic communication sciences / v.16 no.3 / pp.465-470 / 2021
  • Robust speech recognition technology is required that does not degrade recognition performance or speech quality when recognition is performed in real environments where speech is mixed with noise. As such technology develops, applications are needed that achieve a stable and high recognition rate even in noisy environments whose spectrum resembles that of human speech. This paper therefore proposes a speech enhancement algorithm that suppresses noise using the MMSE-STSA estimation algorithm, a short-time spectral amplitude method based on the minimum mean-square error. It is an effective nonlinear speech enhancement algorithm that takes a single-channel input and has high noise suppression performance. Moreover, it reduces the amount of speech distortion on the basis of a statistical model of speech. In the experiments, the effectiveness of the algorithm is verified by comparing the input and output speech waveforms.
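
For orientation, the standard MMSE-STSA gain of Ephraim and Malah can be computed per frequency bin from the a priori SNR and the a posteriori SNR. The sketch below implements that textbook gain only. The function names are hypothetical, the Bessel functions use a plain power series that is suitable for moderate SNR values only, and the SNR estimation the paper uses is not described in the abstract and is not shown.

```python
import math

def bessel_i0(x, terms=200):
    """Modified Bessel function I0 via its power series (moderate x only)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total

def bessel_i1(x, terms=200):
    """Modified Bessel function I1 via its power series (moderate x only)."""
    total, term = 0.0, x / 2.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) * (k + 2))
    return total

def mmse_stsa_gain(xi, gamma):
    """Spectral gain; xi = a priori SNR, gamma = a posteriori SNR (linear, not dB)."""
    v = xi / (1.0 + xi) * gamma
    bracket = (1.0 + v) * bessel_i0(v / 2.0) + v * bessel_i1(v / 2.0)
    return (math.sqrt(math.pi) / 2.0) * (math.sqrt(v) / gamma) \
        * math.exp(-v / 2.0) * bracket

# Illustrative SNR values for a single bin.
gain = mmse_stsa_gain(xi=1.0, gamma=2.0)
```

The enhanced amplitude of a bin is the gain multiplied by the noisy amplitude, and the noisy phase is normally reused when the waveform is resynthesized.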

A Study on Front-End Processing Methods of Environmental Noise for Speech Recognition (음성인식을 위한 환경잡음의 전처리기법에 관한 검토)

  • 김광수
    • Proceedings of the Acoustical Society of Korea Conference / 1997.06a / pp.17-22 / 1997
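  • This paper introduces a histogram processing technique into noise processing as a way to reduce, at the same time, two factors that degrade speech recognizer performance: additive noise and the channel distortion caused by microphone variation. The technique improves on the shortcomings of existing front-end environmental noise processing methods, and its effectiveness is confirmed. To confirm this, word recognition experiments were conducted to compare it with several well-known existing noise processing methods. When only additive noise was present, the commonly used SS, CMN, and RASTA methods improved the recognition rate by 25% or more, depending on the SNR, over the baseline rate obtained without front-end processing. In particular, CDCN processing and H-RASTA improved the recognition rate by about 15 to 30% regardless of SNR for speech containing both channel distortion and additive noise, confirming that these are the superior existing methods. Applying the histogram-based estimation on top of them improved front-end performance by a further 10 to 15% or so, confirming the effectiveness of the introduced method.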

Feature Extraction through the post processing of WFBA based on MMSE-STSA for Robust Speech Recognition (강인한 음성인식을 위한 MMSE-STSA기반 후처리 가중필터뱅크분석을 통한 특징추출)

  • Jung Sungyun;Bae Keunsung
    • Proceedings of the Acoustical Society of Korea Conference / autumn / pp.39-42 / 2004
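  • This paper presents a feature extraction method for speech recognition that is robust to noisy speech. The method consists of a two-stage noise removal process. The first stage enhances the noisy speech signal with the MMSE-STSA speech enhancement technique. The second stage reduces the effect of residual noise by applying post-processing weighted filter bank analysis to the MMSE-STSA-enhanced speech. To evaluate the proposed method, recognition experiments are performed on test set A of the AURORA2 noisy speech DB, and the results are compared with those of existing methods.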

Voice Recognition Performance Improvement using a convergence of Voice Energy Distribution Process and Parameter (음성 에너지 분포 처리와 에너지 파라미터를 융합한 음성 인식 성능 향상)

  • Oh, Sang-Yeob
    • Journal of Digital Convergence / v.13 no.10 / pp.313-318 / 2015
  • Traditional speech enhancement methods distort the speech spectrum depending on how the residual noise is estimated, or leave noise unremoved, which lowers speech recognition performance. In this paper, we propose a speech detection method that combines sound energy distribution processing with sound energy parameters. The proposed method reduces the influence of noise so as to maximize the voice energy. In addition, in intervals where the feature parameters of the speech signal have small values, the log energy feature is raised relative to regions with large energy so that it resembles the log energy feature of a noisy voice signal, which reduces the mismatch between the training and recognition environments. Recognition experiments confirmed improved recognition performance compared with the conventional method. In a car noise environment, the pause hit rate was 97.1% and 97.3% at the low SNRs of 0 dB and 5 dB, and 98.3% and 98.6% at the high SNRs of 10 dB and 15 dB.

Spectrum Estimation Technique Using a Noise Suppression Neural Network (잡음억제 신경회로망에 의한 스펙트럼의 추정 기법)

  • Choe, Jae-Seung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2012.10a / pp.597-599 / 2012
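  • Noting that neural networks in speech recognition and speech signal processing are mainly used for category classification, this paper proposes a neural network for noise suppression that takes the amplitude spectrum and phase spectrum of speech as its input signals. The proposed algorithm estimates the FFT spectrum of each frame using a noise suppression neural network that works on the amplitude and phase spectra obtained by the fast Fourier transform (FFT).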
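
The amplitude and phase spectra that this abstract names as network inputs can be obtained from one frame as sketched below. This is an illustration under assumptions: the function names are hypothetical, a naive O(n²) DFT stands in for the FFT (it gives the same values), and the neural network itself is not shown because the abstract does not describe its architecture.

```python
import cmath

def dft(frame):
    """Naive discrete Fourier transform of one frame (stand-in for an FFT)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def amplitude_and_phase(frame):
    """Split the spectrum of a frame into amplitude and phase spectra."""
    spectrum = dft(frame)
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]

def reconstruct(amplitude, phase):
    """Rebuild the time-domain frame from amplitude and phase (inverse DFT)."""
    n = len(amplitude)
    spectrum = [cmath.rect(a, p) for a, p in zip(amplitude, phase)]
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

# A four-sample frame (illustrative values).
frame = [0.0, 1.0, 0.0, -1.0]
amp, ph = amplitude_and_phase(frame)
restored = reconstruct(amp, ph)
```

In a noise suppression setting, a network would map the noisy amplitude and phase to estimated ones, and `reconstruct` would then turn the estimates back into a waveform frame.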
