• Title/Summary/Keyword: Continuous speech enhancement


Performance Analysis of Noisy Speech Recognition Depending on Parameters for Noise and Signal Power Estimation in MMSE-STSA Based Speech Enhancement (MMSE-STSA 기반의 음성개선 기법에서 잡음 및 신호 전력 추정에 사용되는 파라미터 값의 변화에 따른 잡음음성의 인식성능 분석)

  • Park Chul-Ho;Bae Keun-Sung
    • MALSORI / no.57 / pp.153-164 / 2006
  • The MMSE-STSA based speech enhancement algorithm is widely used as a preprocessing step for noise-robust speech recognition. It weights the gain of each spectral bin of the noisy speech using estimates of the noise and signal power spectra. In this paper, we investigate how the parameters used to estimate the speech signal and noise power in MMSE-STSA affect the recognition performance on noisy speech. For the experiments, we use the Aurora2 DB, which contains noisy speech with subway, babble, car, and exhibition noises. An HTK-based continuous HMM system is constructed for the recognition experiments. Experimental results are presented and discussed along with our findings.

Combining deep learning-based online beamforming with spectral subtraction for speech recognition in noisy environments (잡음 환경에서의 음성인식을 위한 온라인 빔포밍과 스펙트럼 감산의 결합)

  • Yoon, Sung-Wook;Kwon, Oh-Wook
    • The Journal of the Acoustical Society of Korea / v.40 no.5 / pp.439-451 / 2021
  • We propose a deep learning-based beamformer combined with spectral subtraction for continuous speech recognition in noisy environments. Conventional beamforming systems have mostly been evaluated on pre-segmented audio signals, typically generated by mixing speech and noise continuously on a computer. However, since speech utterances occur sparsely along the time axis in real environments, conventional beamforming systems degrade when noise-only signals without speech are input. To alleviate this drawback, we combine an online beamforming algorithm with spectral subtraction. We construct a Continuous Speech Enhancement (CSE) evaluation set to evaluate the online beamforming algorithm in noisy environments. The evaluation set is built by mixing sparsely occurring speech utterances from the CHiME3 evaluation set with continuously played CHiME3 background noise and background music from MUSDB. Using a Kaldi-based toolkit and the Google web speech recognizer as speech recognition back-ends, we confirm that the proposed online beamforming algorithm with spectral subtraction performs better than the baseline online algorithm.
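
As a rough illustration of the spectral subtraction component mentioned in this abstract, the sketch below applies textbook per-bin power spectral subtraction with a spectral floor. It is not the authors' implementation: the function name `spectral_subtract` and the parameters `over_subtraction` and `floor` are assumptions introduced here, and the noise power spectrum is assumed to have been estimated elsewhere (for example, from noise-only frames).

```python
def spectral_subtract(noisy_power, noise_power, over_subtraction=1.0, floor=0.01):
    """Generic per-bin power spectral subtraction (illustrative sketch).

    noisy_power: power spectrum of one noisy frame, one value per bin.
    noise_power: estimated noise power spectrum with the same number of bins.
    over_subtraction and floor are hypothetical tuning parameters.
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    cleaned = []
    for y, n in zip(noisy_power, noise_power):
        residual = y - over_subtraction * n
        # The floor keeps each bin at a small fraction of the noisy power,
        # which avoids negative power values after subtraction.
        cleaned.append(max(residual, floor * y))
    return cleaned


if __name__ == "__main__":
    print(spectral_subtract([4.0, 1.0, 0.5], [1.0, 1.0, 1.0]))
```

In a noise-only frame the noisy power is close to the noise estimate, so most bins fall to the floor; this is one plausible reason such a stage helps when no speech is present.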

Robust Speech Recognition in the Car Interior Environment having Car Noise and Audio Output (자동차 잡음 및 오디오 출력신호가 존재하는 자동차 실내 환경에서의 강인한 음성인식)

  • Park, Chul-Ho;Bae, Jae-Chul;Bae, Keun-Sung
    • MALSORI / no.62 / pp.85-96 / 2007
  • In this paper, we carry out recognition experiments on noisy speech containing various levels of car noise and audio system output, using a speech interface. The speech interface consists of three parts: pre-processing, an acoustic echo canceller, and post-processing. First, a high-pass filter is employed as the pre-processing part to remove some of the engine noise. Then, an echo canceller implemented as an FIR-type filter with an NLMS adaptive algorithm is used to remove the music or speech coming from the car audio system. Finally, the MMSE-STSA based speech enhancement method is applied to the output of the echo canceller to further remove the residual noise. For the recognition experiments, we generated test signals by adding music to the car-noise speech from the Aurora 2 database. An HTK-based continuous HMM system is constructed as the recognition system. Experimental results show that the proposed speech interface is very promising for robust speech recognition in a noisy car environment.

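The echo canceller described in this abstract (an FIR filter adapted with NLMS) can be sketched as follows. This is a minimal, generic NLMS loop and not the authors' code: the function name `nlms_echo_cancel`, the tap count, the step size, and the regularization constant `eps` are all assumptions chosen for illustration.

```python
def nlms_echo_cancel(reference, microphone, num_taps=8, step_size=0.5, eps=1e-8):
    """Adaptive FIR echo canceller using the NLMS update (illustrative sketch).

    reference:  samples sent to the loudspeaker (e.g. the car audio signal).
    microphone: samples picked up by the microphone (echo plus anything else).
    Returns the error signal (microphone minus estimated echo) and the final taps.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference sample first
    errors = []
    for x, d in zip(reference, microphone):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        # Normalizing by the input energy makes the step size independent
        # of the reference signal level; eps guards against division by zero.
        energy = sum(h * h for h in history) + eps
        gain = step_size * e / energy
        weights = [w + gain * h for w, h in zip(weights, history)]
        errors.append(e)
    return errors, weights
```

The returned error signal is what would be passed on to the next stage; in the interface described above, that next stage is the MMSE-STSA enhancement.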

A Model for Post-processing of Speech Recognition Using Syntactic Unit of Morphemes (구문형태소 단위를 이용한 음성 인식의 후처리 모델)

  • 양승원;황이규
    • Journal of Korea Society of Industrial Information Systems / v.7 no.3 / pp.74-80 / 2002
  • There is much research on post-processing methods that use natural language processing techniques to enhance Korean continuous speech recognition. It is very difficult to use a formal morphological analyzer to improve speech recognition, because the analysis techniques of natural language processing are mainly designed for formal written language. In this paper, we propose a speech recognition enhancement model using syntactic units of morphemes. This approach uses a functional-word-level longest match that does not consider word spacing. We describe a post-processing mechanism for improving speech recognition with the proposed model, which uses phonological structure information about the relationship between predicates and the auxiliary predicates or bound nouns that occur frequently in Korean sentences.

Study on Efficient Generation of Dictionary for Korean Vocabulary Recognition (한국어 음성인식을 위한 효율적인 사전 구성에 관한 연구)

  • Lee Sang-Bok;Choi Dae-Lim;Kim Chong-Kyo
    • Proceedings of the KSPS conference / 2002.11a / pp.41-44 / 2002
  • This paper concerns improving the speech recognition rate by using an enhanced pronunciation dictionary. Modern large-vocabulary continuous speech recognition systems have pronunciation dictionaries. A pronunciation dictionary provides pronunciation information for each word in the vocabulary in phonemic units, which are modeled in detail by the acoustic models. However, most speech recognition systems based on Hidden Markov Models disregard actual pronunciation variations. Without these pronunciation variations, the phonetic transcriptions in the dictionary do not match the actual occurrences in the database. In this paper, we propose applying the unvoiced rule of semivowels, among the allophone rules, to the pronunciation dictionary. Experimental results show that the speech recognition system achieves higher performance with the proposed dictionary than with existing pronunciation dictionaries.

Phonetic Tied-Mixture Syllable Model for Efficient Decoding in Korean ASR (효율적 한국어 음성 인식을 위한 PTM 음절 모델)

  • Kim Bong-Wan;Lee Yong-Jn
    • MALSORI / no.50 / pp.139-150 / 2004
  • A Phonetic Tied-Mixture (PTM) model has been proposed as a means of efficient decoding in large vocabulary continuous speech recognition (LVCSR) systems. It has been reported that the PTM model decodes better than triphones by sharing a set of mixture components among states of the same topological location[5]. In this paper we propose a Phonetic Tied-Mixture Syllable (PTMS) model, which extends the PTM technique to syllables. The proposed PTMS model decodes 13% faster than PTM. Despite the difference in context-dependent modeling (PTM: cross-word context-dependent modeling; PTMS: word-internal left-phone-dependent modeling), the proposed model shows less than 1% degradation in word accuracy relative to PTM at the same beam width. With a different beam width, it achieves better word accuracy than PTM at the same or higher speed.

A Study on the PMC Adaptation for Speech Recognition under Noisy Conditions (잡음 환경에서의 음성인식을 위한 PMC 적응에 관한 연구)

  • 김현기
    • Journal of Korea Society of Industrial Information Systems / v.7 no.3 / pp.9-14 / 2002
  • In this paper we propose a method for enhancing the performance of a speech recognizer under noisy conditions. The parallel combination model of the PMC method, which uses multiple Gaussian-distributed mixtures, is adapted to the variation of each mixture. CDHMMs (continuous observation density HMMs) with multiple Gaussian-distributed mixtures are combined by the proposed PMC method. In addition, the EM (expectation maximization) algorithm is used to adapt the model mean parameters in order to reduce the variation of the mixture density. Simulation results show that the proposed PMC adaptation method performs better than the conventional PMC method.

Variation Analysis of Feature Parameters According to the Channel Distortion of Korean Telephone Digit Speech (한국어 숫자음 전화음성의 채널왜곡에 따른 특징파라미터의 변이 분석)

  • 정성윤;손종목;김민성;배건성
    • Proceedings of the IEEK Conference / 2002.06d / pp.191-194 / 2002
  • The ultimate goal of this paper is to improve the speech recognition rate under matched telephone conditions between training data and test data. To analyze the effect of the telephone channel distortion, which changes with every call, MFCC is used as the feature parameter, and CMN, RTCN, and RASTA are used as channel compensation techniques. For each case, the variation of the feature parameters of all phones is analyzed. We then measure the recognition rate for each compensation method using a continuous HMM recognizer and examine the relationship between variation and recognition rate.

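Of the channel compensation techniques named in this abstract, CMN (cepstral mean normalization) is the simplest to illustrate. A stationary channel acts as a convolution in the time domain, which becomes an approximately constant additive offset in the cepstral domain, so subtracting the per-utterance mean of each coefficient removes it. The sketch below is a generic version under that assumption; the function name `cepstral_mean_normalize` and the list-of-frames input layout are hypothetical and not taken from the paper.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over an utterance (illustrative sketch).

    frames: list of feature vectors (e.g. MFCCs), one list of floats per frame,
            all with the same dimension.
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient across all frames of the utterance.
    means = [sum(f[k] for f in frames) / num_frames for k in range(dim)]
    return [[f[k] - means[k] for k in range(dim)] for f in frames]
```

A constant offset added to every frame changes the mean by the same amount, so the normalized features are unaffected by it.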

Signal Enhancement of a Variable Rate Vocoder with a Hybrid domain SNR Estimator

  • Park, Hyung Woo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.2 / pp.962-977 / 2019
  • The human voice is a convenient means of information transfer between different parties: between people, between people and machines, and between machines. With the development of information and communication technology, voice can be transmitted farther than before. To communicate, the voice is converted to another form, transmitted, and then converted back to sound. In such a communication process, a vocoder is a method of converting and re-converting voice and sound. The CELP (Code-Excited Linear Prediction) type vocoder, one of the voice codecs, has been adopted as a standard codec since it provides high-quality sound even though its transmission speed is relatively low. EVRC (Enhanced Variable Rate CODEC) and QCELP (Qualcomm Code-Excited Linear Prediction), both variable bit rate vocoders, are used for mobile phones in 3G environments. In the real-time implementation of a vocoder, reduced sound quality is a typical problem. To improve the sound quality, it is important to know the size and shape of the noise. Existing sound quality improvement methods either rely on voice activity detection or apply statistical methods to a large amount of data. However, they have the disadvantage that noise cannot be detected when the signal is continuous or when the noise changes substantially. This paper focuses on finding a better way to limit the loss of sound quality in lower-bit-rate transmission environments. Based on simulation results, this study proposes a preprocessor application that estimates the SNR (Signal to Noise Ratio) using the spectral SNR estimation method. The SNR estimation method adopted IMBE (Improved Multi-Band Excitation) instead of using the SNR, which is a continuous speech signal. Finally, this application improves the quality of the vocoder by enhancing sound quality adaptively.

A Case of Subdural Hematoma after Epidural Blood Patch in a Spontaneous Intracranial Hypotensive Patient - A case report - (자발성 두개강내 저혈압성 두통 환자에서 치료 도중 발생한 경막하혈종 - 증례보고 -)

  • Kim, Yeui Seok;Han, Kyung Ream;Kim, Chan
    • The Korean Journal of Pain / v.20 no.2 / pp.235-239 / 2007
  • Spontaneous intracranial hypotension (SIH) is believed to be a benign disease. However, numerous studies have reported serious complications related to SIH, including subdural hematoma. In this case report, a 54-year-old male patient visited the emergency room with orthostatic headache. A brain magnetic resonance imaging (MRI) study showed diffuse mild thickening and enhancement of pachymeninges, with a suspicious minimal amount of subdural fluid collected in the left posterior parietal area. His orthostatic headache showed no improvement with conservative treatment; but his pain was almost completely relieved after two trials of cervical epidural blood patch. On the 74th day after the onset of his pain, the patient showed a drowsy mental status and slurred speech when he visited the pain clinic. Brain computerized tomography indicated a left subdural hemorrhage, and he underwent emergency operation to drain the SDH. In conclusion, pain clinicians should pay attention to abrupt changes in mental status as well as continuous headache, for the early diagnosis of SDH in SIH patients.