• Title/Summary/Keyword: robust speech recognition


Model adaptation employing DNN-based estimation of noise corruption function for noise-robust speech recognition (잡음 환경 음성 인식을 위한 심층 신경망 기반의 잡음 오염 함수 예측을 통한 음향 모델 적응 기법)

  • Yoon, Ki-mu; Kim, Wooil
    • The Journal of the Acoustical Society of Korea, v.38 no.1, pp.47-50, 2019
  • This paper proposes an acoustic model adaptation method for effective speech recognition in noisy environments. In the proposed algorithm, the noise corruption function is estimated with a DNN (Deep Neural Network), and the estimated function is applied to model parameter estimation. Experimental results using the Aurora 2.0 framework and database demonstrate that the proposed model adaptation method is more effective than conventional methods in both known and unknown noisy environments. In particular, experiments in unknown environments show a 15.87% relative improvement in average WER (Word Error Rate).
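
The 15.87% figure is a relative improvement: the WER reduction is measured as a fraction of the baseline WER, not in absolute percentage points. A minimal Python sketch of that arithmetic, assuming the usual definition (baseline - proposed) / baseline; the function name and the WER values in the usage line are hypothetical and are not taken from the paper.

```python
def relative_wer_improvement(baseline_wer: float, proposed_wer: float) -> float:
    """Relative WER reduction in percent: (baseline - proposed) / baseline * 100."""
    if baseline_wer <= 0.0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - proposed_wer) / baseline_wer * 100.0


# Hypothetical WERs (not from the paper): 20.0% -> 17.0% is a 3-point absolute drop,
# which counts as a 15% relative improvement.
print(f"{relative_wer_improvement(20.0, 17.0):.2f} %")  # prints "15.00 %"
```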

Acoustic Channel Compensation at Mel-frequency Spectrum Domain

  • Jeong, So-Young; Oh, Sang-Hoon; Lee, Soo-Young
    • The Journal of the Acoustical Society of Korea, v.22 no.1E, pp.43-48, 2003
  • The effects of linear acoustic channels have been analyzed and compensated in the mel-frequency feature domain. Unlike the popular RASTA filtering, our approach uses a separate filter for each mel-frequency band, which yields better recognition performance for heavily reverberated speech.
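
For a channel whose impulse response is short compared with the analysis frame, convolution acts approximately as a fixed gain in each mel band, i.e., an additive offset in the log mel spectrum; RASTA suppresses such slowly varying components by applying one band-pass filter to every band's log-energy trajectory. The abstract's alternative is a separate filter per band. The Python sketch below shows only that structural difference: each band k gets its own FIR coefficients. The function name and the coefficients in the usage example are hypothetical placeholders; the paper's actual filters and their design are not described in the abstract.

```python
from typing import List, Sequence


def filter_per_band(log_mel: Sequence[Sequence[float]],
                    band_filters: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply a separate causal FIR filter to each mel band's log-energy trajectory.

    log_mel[t][k] is the log mel energy of frame t in band k;
    band_filters[k] holds the FIR coefficients used for band k only.
    Frames before t = 0 are treated as zeros.
    """
    num_frames = len(log_mel)
    num_bands = len(band_filters)
    out = [[0.0] * num_bands for _ in range(num_frames)]
    for k, coeffs in enumerate(band_filters):
        for t in range(num_frames):
            acc = 0.0
            for i, c in enumerate(coeffs):
                if t - i >= 0:
                    acc += c * log_mel[t - i][k]
            out[t][k] = acc
    return out


# Hypothetical filters: a first difference [1, -1] removes the constant offset in
# band 0 (after the first frame), while band 1 is passed through unchanged.
frames = [[5.0, 2.0], [5.0, 3.0], [5.0, 4.0]]
print(filter_per_band(frames, [[1.0, -1.0], [1.0]]))
# [[5.0, 2.0], [0.0, 3.0], [0.0, 4.0]]
```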

Spectral Feature Transformation for Compensation of Microphone Mismatches

  • Jeong, So-Young; Oh, Sang-Hoon; Lee, Soo-Young
    • The Journal of the Acoustical Society of Korea, v.22 no.4E, pp.150-154, 2003
  • The distortion effects of microphones have been analyzed and compensated in the mel-frequency feature domain. Unlike popular bias-removal algorithms, our approach applies a linear transformation to the mel-frequency spectrum. Although a diagonal-matrix transformation is sufficient for medium-quality microphones, a full-matrix transformation is required for low-quality microphones with severe nonlinearity. The proposed compensation algorithms were tested on the HTIMIT database and improved the recognition rate by about 5 percent over the conventional CMS (cepstral mean subtraction) algorithm.
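
The contrast between bias removal and a linear spectral transform can be sketched as follows. The Python code below implements plain CMS (subtracting each coefficient's utterance mean) and a diagonal transform of the mel spectrum whose per-band gains are fitted by least squares. It assumes parallel (stereo) recordings of the same speech through a reference and a mismatched microphone; that assumption and the function names are illustrative, not details taken from the abstract. The full-matrix variant the abstract reports for low-quality microphones would replace each per-band gain with a multivariate least-squares fit and is omitted to keep the sketch dependency-free.

```python
from typing import List, Sequence


def cepstral_mean_subtraction(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """CMS baseline: subtract each coefficient's mean over the utterance."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]


def fit_diagonal_transform(mismatched: Sequence[Sequence[float]],
                           reference: Sequence[Sequence[float]]) -> List[float]:
    """Per-band gains a_k minimising sum_t (ref[t][k] - a_k * mis[t][k])^2.

    Assumes frame t of both inputs holds the same speech (stereo data).
    """
    gains = []
    for k in range(len(mismatched[0])):
        num = sum(m[k] * r[k] for m, r in zip(mismatched, reference))
        den = sum(m[k] * m[k] for m in mismatched)
        gains.append(num / den if den > 0.0 else 1.0)  # fall back to identity
    return gains


def apply_diagonal_transform(frames: Sequence[Sequence[float]],
                             gains: Sequence[float]) -> List[List[float]]:
    """Scale each mel band of every frame by its fitted gain."""
    return [[g * x for g, x in zip(gains, f)] for f in frames]
```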

A Study on Speech Recognition in a Running Automobile (주행중인 자동차 환경에서의 음성인식 연구)

  • 양진우; 김순협
    • The Journal of the Acoustical Society of Korea, v.19 no.5, pp.3-8, 2000
  • In this paper, we study the design and implementation of a speech recognition system that is robust in noisy car environments. The system uses DMS (Dynamic Multi-Section) reference patterns. We propose two separate acoustic models, one for speech in a car moving below 80 km/h and one for a car moving above 80 km/h, which are selected automatically according to the car noise environment. A 13th-order PLP (Perceptual Linear Predictive) feature vector is used, and decoding is performed with OSDP (One-Stage Dynamic Programming). The system also supports phone-book editing for voice dialing. With a 33-word vocabulary, the system achieves a recognition rate of 89.75% for male speakers in SI (speaker-independent) mode in a car running above 80 km/h on a cement-paved expressway. It also achieves 92.29% for male speakers in SI mode in a car running above 80 km/h on a paved expressway.

Features for Figure Speech Recognition in Noise Environment (잡음환경에서의 숫자음 인식을 위한 특징파라메타)

  • Lee, Jae-Ki; Koh, Si-Young; Lee, Kwang-Suk; Hur, Kang-In
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference, v.9 no.2, pp.473-476, 2005
  • This paper proposes several feature parameters that are robust to noise. The MFCC (Mel Frequency Cepstral Coefficient) feature parameters used in conventional speech recognition perform well. To obtain more robust performance in noise, feature parameters produced by transforming the MFCC feature space with PCA (Principal Component Analysis) and ICA (Independent Component Analysis) are compared with the conventional MFCC parameters. The results show that the ICA-transformed feature parameters outperform both the PCA-transformed parameters and the plain MFCC.

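
As an illustration of transforming the MFCC feature space, the Python sketch below estimates the first principal component of a set of feature vectors by power iteration on the sample covariance matrix and projects the vectors onto it. This is generic PCA, not the paper's configuration (the number of retained components and other settings are not given in the abstract), and the function names are hypothetical. ICA, the other transform compared, typically requires an iterative algorithm based on non-Gaussianity, such as FastICA, and is not shown.

```python
import math
import random
from typing import List, Sequence


def top_principal_component(data: Sequence[Sequence[float]],
                            iters: int = 200, seed: int = 0) -> List[float]:
    """Unit-length leading eigenvector of the sample covariance, via power iteration."""
    n, dims = len(data), len(data[0])
    mean = [sum(row[d] for row in data) / n for d in range(dims)]
    centered = [[row[d] - mean[d] for d in range(dims)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
           for i in range(dims)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(dims)]  # positive random start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all vectors identical: no dominant direction
            break
        v = [x / norm for x in w]
    return v


def project(data: Sequence[Sequence[float]], component: Sequence[float]) -> List[float]:
    """Score of each mean-centred vector along the given component."""
    n, dims = len(data), len(data[0])
    mean = [sum(row[d] for row in data) / n for d in range(dims)]
    return [sum((row[d] - mean[d]) * component[d] for d in range(dims)) for row in data]
```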

Robust Feature Parameter for Implementation of Speech Recognizer Using Support Vector Machines (SVM음성인식기 구현을 위한 강인한 특징 파라메터)

  • 김창근; 박정원; 허강인
    • Journal of the Institute of Electronics Engineers of Korea SP, v.41 no.3, pp.195-200, 2004
  • In this paper, we propose an effective speech recognizer based on two recognition experiments. In general, an SVM (Support Vector Machine) is a classification method that separates two classes by finding an arbitrary nonlinear boundary in the vector space, and it achieves high classification performance even with little training data. We compare the recognition performance of HMM and SVM as the amount of training data varies, and we investigate the recognition performance of each feature parameter while transforming the MFCC feature space with Independent Component Analysis (ICA) and Principal Component Analysis (PCA). The experiments show that SVM outperforms HMM when training data are scarce, and that the ICA-based feature parameters achieve the highest recognition performance owing to their superior linear classification.
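
The two-class SVM described above can be sketched in a simplified, linear form. The Python code below trains a soft-margin linear SVM by Pegasos-style stochastic subgradient descent on the regularized hinge loss. Unlike the nonlinear boundary mentioned in the abstract, it uses no kernel, and the bias is handled by appending a constant 1.0 feature (so the bias is regularized too, a common simplification). The function names and hyperparameters are hypothetical, not the paper's settings.

```python
import random
from typing import List, Sequence


def train_linear_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 100, seed: int = 0) -> List[float]:
    """Linear SVM via Pegasos-style updates; labels must be +1 or -1.

    Returns weights whose last entry acts as the bias (constant-1 feature).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in xs]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = ys[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 regulariser step
            if margin < 1.0:  # hinge loss active: move toward this sample
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Sign of the decision function w . [x, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1
```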

A Study on a Method of U/V Decision by Using The LSP Parameter in The Speech Signal (LSP 파라미터를 이용한 음성신호의 성분분리에 관한 연구)

  • 이희원; 나덕수; 정찬중; 배명진
    • Proceedings of the IEEK Conference, 1999.06a, pp.1107-1110, 1999
  • In speech signal processing, an accurate voiced/unvoiced decision is important for robust word recognition and analysis and for high coding efficiency. In this paper, we propose a voiced/unvoiced decision method using LSP parameters, which represent the spectral characteristics of the speech signal. Voiced sounds have more LSP parameters in the low-frequency region, whereas unvoiced sounds have more LSP parameters in the high-frequency region; that is, the LSP distribution of voiced sounds differs from that of unvoiced sounds. In addition, for voiced sounds the minimum interval between consecutive LSP parameters lies in the low-frequency region, whereas for unvoiced sounds it lies in the high-frequency region. We make the voiced/unvoiced decision using these characteristics. We applied the proposed method to continuous speech and achieved good performance.

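
The two LSP cues described above (where most LSPs lie, and where the narrowest gap between consecutive LSPs occurs) can be combined into a simple rule. The Python sketch below assumes the LSP frequencies of a frame have already been computed, for example from 10th-order LPC at an 8 kHz sampling rate so that all values lie below 4 kHz. The 2000 Hz split between the "low" and "high" regions, the rule that both cues must agree, and the function name are hypothetical choices, not the paper's thresholds.

```python
from typing import Sequence


def is_voiced(lsp_hz: Sequence[float], split_hz: float = 2000.0) -> bool:
    """Hypothetical voiced/unvoiced rule built from the two LSP cues.

    lsp_hz: the frame's LSP frequencies in ascending order (Hz), at least two values.
    split_hz: assumed boundary between the low- and high-frequency regions.
    """
    # Cue 1: do most LSPs fall in the low-frequency region?
    low_count = sum(1 for f in lsp_hz if f < split_hz)
    more_low = low_count > len(lsp_hz) - low_count
    # Cue 2: is the narrowest gap between consecutive LSPs in the low region?
    gaps = [(lsp_hz[i + 1] - lsp_hz[i], (lsp_hz[i] + lsp_hz[i + 1]) / 2.0)
            for i in range(len(lsp_hz) - 1)]
    _, narrowest_mid = min(gaps)
    min_gap_low = narrowest_mid < split_hz
    return more_low and min_gap_low
```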

Speech Feature Extraction Using Auditory Model (청각모델을 이용한 음성신호의 특징 추출 방법에 관한 연구)

  • Park, Kyu-Hong; Kim, Young-Ho; Jung, Sang-Kuk; Rho, Seung-Yong
    • Proceedings of the KIEE Conference, 1998.07g, pp.2259-2261, 1998
  • Auditory models capable of achieving human performance would provide a basis for effective speech processing systems. Perceptual invariance to adverse signal conditions (noise, microphone and channel distortions, room reverberation) may provide a basis for robust speech recognition and highly efficient speech coding. This paper describes an auditory model that simulates the auditory periphery up through the auditory-nerve level, together with a new distance measure defined as the angle between vectors.

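
The distance measure defined as the angle between vectors can be computed from the cosine of that angle. In this Python sketch (function name hypothetical), scaling either vector by a positive constant leaves the distance unchanged, which is one reason such a measure can be less sensitive to overall level changes than Euclidean distance; the abstract does not say which feature vectors the authors compare with it.

```python
import math
from typing import Sequence


def angle_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle in radians between two feature vectors; 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("angle is undefined for a zero vector")
    cos_theta = max(-1.0, min(1.0, dot / (na * nb)))  # clamp rounding error
    return math.acos(cos_theta)
```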

N-gram Based Robust Spoken Document Retrievals for Phoneme Recognition Errors (음소인식 오류에 강인한 N-gram 기반 음성 문서 검색)

  • Lee, Su-Jang; Park, Kyung-Mi; Oh, Yung-Hwan
    • MALSORI, no.67, pp.149-166, 2008
  • In spoken document retrieval (SDR), subword indexing terms (typically phonemes) are used to avoid the out-of-vocabulary (OOV) problem. This makes the indexing and retrieval process independent of any vocabulary, and it requires only a small corpus to train the acoustic model. However, the subword indexing approach has a major drawback: it shows higher word error rates than a large vocabulary continuous speech recognition (LVCSR) system. In this paper, we propose a probabilistic slot detection and n-gram based string matching method for phone-based spoken document retrieval to overcome the high error rates of the phone recognizer. Experimental results show a 9.25% relative improvement in mean average precision (mAP) and a 1.7-fold speed-up compared with the baseline system.

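
The idea behind n-gram based matching on phone strings is that a single phone recognition error destroys only the few n-grams that contain it, so a document can still match the query partially. The Python sketch below scores a recognized phone string against a query by the fraction of the query's phone n-grams (with clipped counts) that also occur in the document. It does not model the paper's probabilistic slot detection, and the function names and phone symbols are hypothetical.

```python
from collections import Counter
from typing import Sequence


def phone_ngrams(phones: Sequence[str], n: int) -> Counter:
    """Multiset of contiguous phone n-grams."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))


def ngram_match_score(query: Sequence[str], doc: Sequence[str], n: int = 2) -> float:
    """Fraction of the query's n-grams also found in the document (clipped counts)."""
    q = phone_ngrams(query, n)
    if not q:
        return 0.0
    d = phone_ngrams(doc, n)
    overlap = sum(min(count, d[gram]) for gram, count in q.items())
    return overlap / sum(q.values())


# One substitution error (m -> n) removes two of the query's three bigrams.
print(ngram_match_score(["k", "a", "m", "p"], ["k", "a", "n", "p"]))
```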

Design of Robust Speech Recognition System Using Tandem Architecture (탠덤 구조를 이용한 강인한 음성 인식 시스템 설계)

  • Yun, Young-Sun; Lee, Yun-Keun
    • Proceedings of the KSPS conference, 2007.05a, pp.323-326, 2007
  • Various studies have combined neural networks and hidden Markov models within a single system in the expectation of combining the advantages of both. Influenced by these studies, the tandem approach was proposed, which uses a neural network as the classifier and hidden Markov models as the decoder. In this paper, we apply the trend information of segmental features to the tandem architecture and use the posterior probabilities output by the neural network as inputs to the recognition system. Experiments on the Aurora2 database examine the potential of the trend-feature-based tandem architecture. The proposed method gives better results than the baseline system in very low SNR environments.

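
In a tandem system, the network's per-frame class posteriors become the observation vectors of the HMM recognizer. The Python sketch below converts output-layer activations (logits) into log posteriors with a numerically stable log-softmax. Taking the log is a common tandem choice because raw posteriors are highly skewed while HMM observation densities are typically Gaussian mixtures, and it is often followed by a PCA/KLT decorrelation step, not shown here. The function names are hypothetical, and the trend-feature front end described in the abstract is not modeled.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log of softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def tandem_features(frame_logits: Sequence[Sequence[float]]) -> List[List[float]]:
    """Per-frame log posteriors, used as the HMM's observation vectors."""
    return [log_softmax(frame) for frame in frame_logits]
```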