• Title/Summary/Keyword: 유효 음성

Search Result 183, Processing Time 0.028 seconds

An Proposal and Evaluation of the New formant Tracking Algorithm for Speech Recognition (음성인식을 위한 새로운 포만트트랙킹 알고리즘의 제안과 평가)

  • 송정영
    • Journal of Internet Computing and Services
    • /
    • v.3 no.4
    • /
    • pp.51-59
    • /
    • 2002
  • For speech recognition, this paper proposes an improved formant tracking algorithm. The recognition data used in the simulation are Korean digit speech. The improved algorithm achieves a recognition rate of 91% on 300 spoken Korean digits. The effectiveness of this research has been confirmed through recognition simulations.


Noise reduction system using time-delay neural network (시간지연 신경회로망을 이용한 잡음제거 시스템)

  • Choi Jae-Seung
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.42 no.3 s.303
    • /
    • pp.121-128
    • /
    • 2005
  • In speech signal research, neural networks are mainly used for category classification in speech recognition and are also applied to signal processing. Accordingly, this paper proposes a noise reduction system using a time-delay neural network, which implements the mapping from the space of speech signals degraded by noise to the space of clean speech signals. The noise reduction system, which restores the amplitude component of the fast Fourier transform, is confirmed to be effective for speech degraded not only by white noise but also by colored noise.

Noise Suppression of Speech Signal using TDNN for each Frequency Band (주파수대역별 TDNN을 이용한 음성신호의 잡음억제)

  • Choi, Jae Seung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2009.05a
    • /
    • pp.341-344
    • /
    • 2009
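  • This paper aims to enhance speech by removing noise from noisy speech signals using a time-delay neural network (TDNN), which introduces temporal structure into a neural network. The FFT amplitude components of each frame are first classified into voiced and unvoiced sections; for unvoiced sections, speech is enhanced by taking a moving average in each frame. For voiced sections, the FFT amplitude components of each frame are split into low, middle, and high bands, and the components of each band are fed to a low-band TDNN, a middle-band TDNN, and a high-band TDNN, respectively, which are trained to produce the final FFT amplitude components. The experiments use the Aurora2 database with this noise reduction algorithm, which restores the FFT amplitude components, and experimentally confirm its effectiveness for various noises.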


Reduction Algorithm of Environmental Noise by Multi-band Filter (멀티밴드필터에 의한 환경잡음억압 알고리즘)

  • Choi, Jae-Seung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.8
    • /
    • pp.91-97
    • /
    • 2012
  • This paper first proposes a speech recognition algorithm based on detecting the speech and noise sections at each frame, and then proposes an environmental noise reduction algorithm using a multi-band filter, which removes the background noise at each frame according to the detected speech and noise sections. The proposed algorithm reduces the background noise in the filter bank sub-band domain after extracting features from the speech data. The experimental results of the proposed multi-band filter noise reduction algorithm are demonstrated using speech and noise data at each frame. Based on spectral distortion measurements, the experiments confirm that the proposed algorithm is effective for speech corrupted by noise.

A Study on the Redundancy Reduction in Speech Recognition (음성인식에서 중복성의 저감에 대한 연구)

  • Lee, Chang-Young
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.7 no.3
    • /
    • pp.475-483
    • /
    • 2012
  • The characteristic features of a speech signal do not vary significantly from frame to frame. Therefore, it is advisable to reduce the redundancy among similar feature vectors. The objective of this paper is to search for the optimal condition of minimum redundancy and maximum relevancy of the speech feature vectors in speech recognition. For this purpose, we realize redundancy reduction by way of a vigilance parameter and investigate the resulting effect on speaker-independent recognition of isolated words using FVQ/HMM. Experimental results showed that the number of feature vectors might be reduced by 30% without deteriorating the speech recognition accuracy.
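
The abstract above does not spell out how the vigilance parameter is applied, so the following is only a minimal sketch under one common interpretation: a frame's feature vector is kept only if its Euclidean distance from the last kept vector exceeds the vigilance threshold. The function names, the distance measure, and the sample values are all assumptions for illustration, not the paper's method.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prune_redundant_frames(frames, vigilance):
    """Drop frames that are too similar to the most recently kept frame.

    Assumption: 'similar' means Euclidean distance <= vigilance.
    """
    kept = []
    for frame in frames:
        if not kept or euclidean(frame, kept[-1]) > vigilance:
            kept.append(frame)
    return kept


# Hypothetical 2-dimensional feature vectors for five consecutive frames.
frames = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.05, 1.0], [2.0, 2.0]]
reduced = prune_redundant_frames(frames, vigilance=0.5)
print(len(frames), "->", len(reduced))
```

A larger vigilance value discards more frames, so in a setup like this the threshold would be tuned against recognition accuracy.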

Speaker-dependent Speech Recognition Algorithm for Male and Female Classification (남녀성별 분류를 위한 화자종속 음성인식 알고리즘)

  • Choi, Jae-Seung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.17 no.4
    • /
    • pp.775-780
    • /
    • 2013
  • This paper proposes a speaker-dependent speech recognition algorithm that uses a neural network to classify speakers as male or female in white noise and car noise. The neural network is trained on LPC (Linear Predictive Coding) cepstrum coefficients to recognize the speaker's gender. In the experimental results, after training a total of six neural networks, the maximal improvement of the total speech recognition rate is 96% for white noise and 88% for car noise. Finally, the proposed algorithm is compared with a conventional speech recognition algorithm in a noisy background environment.

Noise Reduction Algorithm in Speech by Wiener Filter (위너필터에 의한 음성 중의 잡음제거 알고리즘)

  • Choi, Jae-Seung
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.8 no.9
    • /
    • pp.1293-1298
    • /
    • 2013
  • This paper proposes a noise reduction algorithm that uses a Wiener filter to remove noise components from noisy speech and thereby improve the speech signal. The proposed algorithm first removes the white noise spectra from the noisy signal at each frame, based on a noise reshaping and reduction method. It then enhances the speech signal using a Wiener filter based on linear predictive coding analysis. The experimental results of the proposed algorithm are demonstrated using speech and noise data from a Japanese male speaker. Based on the spectral distortion (SD) measure, the experiments confirm that the proposed algorithm is effective for speech contaminated by white noise. In the experiments, the maximum improvement in the output SD values for white noise was 4.94 dB compared with the former Wiener filter.

Language Models Using Iterative Learning Method for the Improvement of Performance of CSR System (연속음성인식 시스템의 성능 향상을 위한 반복학습법을 이용한 언어모델)

  • Oh Se-Jin;Hwang Cheol-Jun;Kim Bum-Koog;Jung Ho-Youl;Chung Hyun-Yeol
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.82-85
    • /
    • 1999
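  • To improve the performance of a continuous speech recognition system, this study proposes an effective method for building a language model that takes into account the speech recording environment and the amount of data. Applied to a flight reservation system, the method achieved a recognition rate of 91.6% in performance evaluation experiments, confirming its effectiveness. To obtain more robust word occurrence probabilities from a small set of 200 flight reservation text sentences, we adopted the word N-gram language model widely used in large-vocabulary continuous speech recognition, expanded the text to 1,154 sentences to account for various speaking environments, and then built the language model by repeatedly training on the same sentences. For recognition, a stack decoding algorithm with a forward-backward pass was used to minimize misrecognition and grammatical errors. In the recognition experiments, 200 sentences from three evaluation speakers were evaluated against each language model trained with a different number of iterations, giving average sentence recognition rates of 84.1% for the forward pass and 91.6% for the backward pass. As the number of training iterations increased, the recognition rate of the backward pass did not change, whereas that of the forward pass rose and then converged to a constant value; the perplexity of the language model also decreased gradually and converged. These results confirm that the iterative learning method is effective for building a language model for a limited task from a small amount of text data.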


A Korean speech recognition based on conformer (콘포머 기반 한국어 음성인식)

  • Koo, Myoung-Wan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.40 no.5
    • /
    • pp.488-495
    • /
    • 2021
  • We propose a speech recognition system based on the conformer. The conformer is a convolution-augmented transformer, which combines a transformer model for capturing global information with a Convolutional Neural Network (CNN) for exploiting local features effectively. The baseline system is a transformer-based speech recognizer with a Long Short-Term Memory (LSTM)-based language model. The proposed system uses a conformer instead of a transformer, together with a transformer-based language model. When the Electronics and Telecommunications Research Institute (ETRI) speech corpus in AI-Hub is used for evaluation, the proposed system yields a Character Error Rate (CER) of 5.7%, while the baseline system yields a CER of 11.8%. Even when the speech corpus is extended to another AI-Hub domain, such as the NHNdiguest speech corpus, the proposed system performs robustly on both domains. These experiments validate the proposed system.
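
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a Character Error Rate is commonly computed: the Levenshtein edit distance between the reference and hypothesis character sequences, divided by the reference length. Removing spaces before scoring is an assumption made here; the abstract does not state the paper's scoring conventions, and the function names and sample strings are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions, insertions, deletions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / number of reference characters (spaces ignored by assumption)."""
    ref = reference.replace(" ", "")
    hyp = hypothesis.replace(" ", "")
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substituted syllable out of seven reference characters.
cer = character_error_rate("음성인식 시스템", "음성인식 시스탬")
print(f"CER = {cer:.3f}")
```

Because insertions count as errors, CER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.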

A Noise Robust Speech Recognition Method Using Model Compensation Based on Speech Enhancement (음성 개선 기반의 모델 보상 기법을 이용한 강인한 잡음 음성 인식)

  • Shen, Guang-Hu;Jung, Ho-Youl;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.27 no.4
    • /
    • pp.191-199
    • /
    • 2008
  • In this paper, we propose an MWF-PMC noise processing method for speech recognition in noisy environments, which enhances the input speech using Mel-warped Wiener Filtering (MWF) at the pre-processing stage and compensates the recognition model using PMC (Parallel Model Combination) at the post-processing stage. The PMC uses the residual noise extracted from the silence regions of the speech enhanced at the pre-processing stage to compensate the clean speech model, and this is expected to improve speech recognition performance in noisy environments. For the recognition experiments, we down-sampled the KLE PBW (Phoneme Balanced Words) 452-word speech data to 8 kHz and created noisy speech at 5 different SNR levels, i.e., 0 dB, 5 dB, 10 dB, 15 dB, and 20 dB, by adding Subway, Car, and Exhibition noise to the clean speech. The recognition results confirm the effectiveness of the proposed MWF-PMC method, which improved overall recognition performance compared with the existing combined methods.
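
The abstract above mentions creating noisy speech at fixed SNR levels by adding noise to clean speech. A minimal sketch of that general procedure follows, assuming SNR is defined as the ratio of average signal power to average noise power over the whole utterance. The function names and the synthetic signals are hypothetical, and the paper's exact mixing procedure is not given in the abstract.

```python
import math


def mean_power(samples):
    """Average power (mean of squared samples)."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power equals the target SNR, then add."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech, p_noise = mean_power(speech), mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR in dB recovered from the clean signal and the mixture."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))


# Synthetic stand-ins for clean speech and recorded noise (illustrative only).
speech = [math.sin(0.1 * i) for i in range(1000)]
noise = [((i * 37) % 11 - 5) / 5 for i in range(1000)]

for target in (0, 5, 10, 15, 20):
    noisy = mix_at_snr(speech, noise, target)
    print(target, round(measured_snr_db(speech, noisy), 2))
```

In practice the noise segment is often drawn from a random offset in a longer recording; this sketch simply takes the first samples to stay deterministic.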