• Title/Abstract/Keywords: Continuous speech enhancement

11 search results (processing time: 0.019 s)

MMSE-STSA 기반의 음성개선 기법에서 잡음 및 신호 전력 추정에 사용되는 파라미터 값의 변화에 따른 잡음음성의 인식성능 분석 (Performance Analysis of Noisy Speech Recognition Depending on Parameters for Noise and Signal Power Estimation in MMSE-STSA Based Speech Enhancement)

  • 박철호;배건성
    • 대한음성학회지:말소리 / No. 57 / pp. 153-164 / 2006
  • The MMSE-STSA based speech enhancement algorithm is widely used as a preprocessing step for noise-robust speech recognition. It weights the gain of each spectral bin of the noisy speech using estimates of the noise and signal power spectra. In this paper, we investigate how the parameters used to estimate the speech signal and noise power in MMSE-STSA influence the recognition performance on noisy speech. For the experiments, we use the Aurora2 DB, which contains noisy speech with subway, babble, car, and exhibition noises. An HTK-based continuous HMM system is constructed for the recognition experiments. Experimental results are presented and discussed along with our findings.


잡음 환경에서의 음성인식을 위한 온라인 빔포밍과 스펙트럼 감산의 결합 (Combining deep learning-based online beamforming with spectral subtraction for speech recognition in noisy environments)

  • 윤성욱;권오욱
    • 한국음향학회지 / Vol. 40, No. 5 / pp. 439-451 / 2021
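  • In this paper, we propose a beamformer that combines a deep learning-based online beamforming algorithm with spectral subtraction for continuous speech enhancement in real environments. Conventional beamforming systems have mostly been evaluated on pre-segmented audio signals generated on a computer by mixing speech and noise so that they fully overlap. In real environments, however, speech utterances occur sparsely along the time axis, so the performance of conventional beamforming algorithms degrades when noise-only signals containing no speech are fed to the system. To mitigate this effect, we combine the deep learning-based online beamforming algorithm with spectral subtraction. To evaluate the online beamforming algorithm in noisy environments, we constructed a continuous speech enhancement set. The evaluation set was built by mixing speech utterances extracted from the CHiME3 evaluation set with CHiME3 background noise and continuously played background music extracted from MUSDB. A Kaldi-based toolkit and the Google web speech recognizer were used as speech recognizers. We confirmed that the proposed combination of online beamforming and spectral subtraction improves performance over the baseline beamforming algorithm.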
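
The spectral subtraction step mentioned in this abstract can be illustrated with a minimal sketch of the textbook magnitude-domain form. This is not the authors' implementation: the function names `average_noise` and `spectral_subtract`, the `floor` parameter, and the assumption that a noise estimate comes from averaging noise-only frames are all hypothetical choices made for illustration.

```python
def average_noise(frames):
    """Average the magnitude spectra of frames assumed to contain only noise."""
    count = len(frames)
    return [sum(bins) / count for bins in zip(*frames)]


def spectral_subtract(noisy_mag, noise_mag, floor=0.01):
    """Subtract an estimated noise magnitude from each spectral bin.

    The spectral floor (a hypothetical default of 1% of the noisy
    magnitude) keeps a bin from going negative after subtraction.
    """
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    return [max(y - n, floor * y) for y, n in zip(noisy_mag, noise_mag)]


# Usage: estimate the noise from two noise-only frames, then clean one frame.
noise = average_noise([[1.0, 2.0], [3.0, 4.0]])
cleaned = spectral_subtract([5.0, 1.0], noise)
```

In the second bin the noise estimate exceeds the noisy magnitude, so the floor value is used instead of a negative result.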

자동차 잡음 및 오디오 출력신호가 존재하는 자동차 실내 환경에서의 강인한 음성인식 (Robust Speech Recognition in the Car Interior Environment having Car Noise and Audio Output)

  • 박철호;배재철;배건성
    • 대한음성학회지:말소리 / No. 62 / pp. 85-96 / 2007
  • In this paper, we carry out recognition experiments with a speech interface on noisy speech containing various levels of car noise and audio system output. The speech interface consists of three parts: pre-processing, an acoustic echo canceller, and post-processing. First, a high-pass filter is employed as the pre-processing part to remove some of the engine noise. Then, an echo canceller implemented as an FIR-type filter with an NLMS adaptive algorithm is used to remove the music or speech coming from the audio system in the car. Finally, the MMSE-STSA based speech enhancement method is applied to the output of the echo canceller to further remove the residual noise. For the recognition experiments, we generated test signals by adding music to the car-noise speech from the Aurora 2 database. An HTK-based continuous HMM system is constructed as the recognition system. Experimental results show that the proposed speech interface is very promising for robust speech recognition in a noisy car environment.

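
The FIR echo canceller with NLMS adaptation described in this abstract can be sketched as follows. This is a generic textbook NLMS loop under stated assumptions, not the authors' implementation: the function name `nlms_echo_cancel` and the defaults for `taps`, `mu`, and `eps` are hypothetical.

```python
def nlms_echo_cancel(reference, microphone, taps=4, mu=0.5, eps=1e-8):
    """Cancel the echo of `reference` in `microphone` with an NLMS-adapted FIR filter.

    Returns the error signal (microphone minus the estimated echo),
    one value per input sample.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    errors = []
    for x, d in zip(reference, microphone):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        # Normalized LMS update: the step is scaled by the input power,
        # with eps guarding against division by zero.
        power = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / power for w, h in zip(weights, history)]
        errors.append(e)
    return errors
```

Here `reference` stands for the audio system output and `microphone` for the signal picked up in the car; the returned error signal is what a later enhancement stage would receive.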

구문형태소 단위를 이용한 음성 인식의 후처리 모델 (A Model for Post-processing of Speech Recognition Using Syntactic Unit of Morphemes)

  • 양승원;황이규
    • 한국산업정보학회논문지 / Vol. 7, No. 3 / pp. 74-80 / 2002
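  • Post-processing techniques based on natural language processing are used to improve the results of Korean continuous speech recognition. However, because most natural language processing techniques have analyzed well-formed input sentences with word spacing, it is difficult to apply a morphological analyzer directly to improving speech recognition results. In this paper, we propose a model for improving speech recognition results using analysis based on syntactic units of morphemes, a function-word-based longest-match morphological analysis method that does not take word spacing into account. We describe how the proposed model improves speech recognition results by exploiting structural information among the phonemes between predicates, auxiliary predicates, and bound nouns, which occur frequently in continuous speech recognition results.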


한국어 음성인식을 위한 효율적인 사전 구성에 관한 연구 (Study on Efficient Generation of Dictionary for Korean Vocabulary Recognition)

  • 이상복;최대림;김종교
    • 대한음성학회:학술대회논문집 / 대한음성학회 November 2002 Conference Proceedings / pp. 41-44 / 2002
  • This paper concerns improving the speech recognition rate by using an enhanced pronunciation dictionary. Modern large-vocabulary continuous speech recognition systems have pronunciation dictionaries. A pronunciation dictionary provides pronunciation information for each word in the vocabulary in phonemic units, which are modeled in detail by the acoustic models. However, most speech recognition systems based on hidden Markov models disregard actual pronunciation variations. Without these pronunciation variations, the phonetic transcriptions in the dictionary do not match the actual occurrences in the database. In this paper, we propose adding the unvoiced rule of semivowels, one of the allophone rules, to the pronunciation dictionary. Experimental results show that the speech recognition system performs better with the proposed dictionary than with existing pronunciation dictionaries.


효율적 한국어 음성 인식을 위한 PTM 음절 모델 (Phonetic Tied-Mixture Syllable Model for Efficient Decoding in Korean ASR)

  • 김봉완;이용주
    • 대한음성학회지:말소리 / No. 50 / pp. 139-150 / 2004
  • The Phonetic Tied-Mixture (PTM) model has been proposed as a way to decode efficiently in large vocabulary continuous speech recognition (LVCSR) systems. It has been reported that the PTM model decodes better than triphones by sharing a set of mixture components among states of the same topological location[5]. In this paper, we propose a Phonetic Tied-Mixture Syllable (PTMS) model, which extends the PTM technique to syllables. The proposed PTMS model decodes 13% faster than PTM. Despite the difference in context-dependent modeling (PTM: cross-word context-dependent modeling; PTMS: word-internal left-phone-dependent modeling), the proposed model shows less than 1% degradation in word accuracy relative to PTM at the same beam width. With a different beam width, it achieves better word accuracy than PTM at the same or higher speed.


잡음 환경에서의 음성인식을 위한 PMC 적응에 관한 연구 (A Study on the PMC Adaptation for Speech Recognition under Noisy Conditions)

  • 김현기
    • 한국산업정보학회논문지 / Vol. 7, No. 3 / pp. 9-14 / 2002
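  • In this paper, we propose a method for improving the performance of a speech recognizer in noisy environments. The proposed method re-estimates the parameters combined at the state level to compensate for the change in the probability density distribution that arises when the conventional PMC method builds models with many mixture components per state, thereby adapting to the change in the mixture probability distribution in each state. A CDHMM with multiple mixture components per state is combined with the proposed PMC method. In addition, the EM algorithm is used to adapt the model mean parameters in order to reduce the variance of the mixture means. Simulations show that the proposed PMC method achieves better performance than the conventional PMC method.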


한국어 숫자음 전화음성의 채널왜곡에 따른 특징파라미터의 변이 분석 (Variation Analysis of Feature Parameters According to the Channel Distortion of Korean Telephone Digit Speech)

  • 정성윤;손종목;김민성;배건성
    • 대한전자공학회:학술대회논문집 / 대한전자공학회 2002 Summer Conference Proceedings (4) / pp. 191-194 / 2002
  • The ultimate aim of this work is to improve the speech recognition rate when the training and test data come from matched telephone environments. To analyze the effect of channel distortion, which changes with every telephone call, MFCC is used as the feature parameter, and CMN, RTCN, and RASTA are used as channel compensation techniques. For each case, the variation of the feature parameters of all phones is analyzed. We then measure the recognition rate for each compensation method using a continuous HMM recognizer and examine the relationship between the variation and the recognition rate.

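
Of the channel compensation techniques named in this abstract, CMN (cepstral mean normalization) is the simplest to illustrate. The sketch below is the generic per-utterance form, not the authors' code; the function name `cepstral_mean_normalize` and the representation of an utterance as a list of per-frame coefficient lists are assumptions made for illustration.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over the utterance from every frame.

    `frames` is a list of cepstral vectors (for example MFCCs), one per frame.
    """
    if not frames:
        return []
    count = len(frames)
    means = [sum(coeffs) / count for coeffs in zip(*frames)]
    return [[c - m for c, m in zip(frame, means)] for frame in frames]
```

A stationary channel appears as a constant additive offset in the cepstral domain, so two recordings of the same utterance that differ only by such an offset yield the same normalized features.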

Signal Enhancement of a Variable Rate Vocoder with a Hybrid domain SNR Estimator

  • Park, Hyung Woo
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 2 / pp. 962-977 / 2019
  • The human voice is a convenient means of transferring information between people, between people and machines, and between machines. With the development of information and communication technology, the voice can be transmitted farther than before. To communicate, the voice is converted to another form, transmitted, and then converted back to sound. In such a communication process, a vocoder is the means of converting and re-converting voice and sound. The CELP (Code-Excited Linear Prediction) type vocoder, one of the voice codecs, has been adopted as a standard codec because it provides high sound quality even though its transmission rate is relatively low. The EVRC (Enhanced Variable Rate CODEC) and QCELP (Qualcomm Code-Excited Linear Prediction), both variable bit rate vocoders, are used for mobile phones in 3G environments. In real-time implementations of a vocoder, reduced sound quality is a typical problem. To improve the sound quality, it is important to know the size and shape of the noise. Existing sound quality improvement methods either detect and use voice activity or apply statistical methods to a large amount of data. However, they have the disadvantage that noise cannot be detected when the signal is continuous or when the noise changes greatly. This paper focuses on finding a better way to limit the loss of sound quality in low bit rate transmission environments. Based on simulation results, this study proposes a preprocessor application that estimates the SNR (Signal to Noise Ratio) using the spectral SNR estimation method. The SNR estimation method adopts IMBE (Improved Multi-Band Excitation) instead of using the SNR of a continuous speech signal. Finally, this application improves the quality of the vocoder by enhancing sound quality adaptively.

자발성 두개강내 저혈압성 두통 환자에서 치료 도중 발생한 경막하혈종 - 증례보고 - (A Case of Subdural Hematoma after Epidural Blood Patch in a Spontaneous Intracranial Hypotensive Patient - A case report -)

  • 김의석;한경림;김찬
    • The Korean Journal of Pain / Vol. 20, No. 2 / pp. 235-239 / 2007
  • Spontaneous intracranial hypotension (SIH) is believed to be a benign disease. However, numerous studies have reported serious complications related to SIH, including subdural hematoma. In this case report, a 54-year-old male patient visited the emergency room with orthostatic headache. A brain magnetic resonance imaging (MRI) study showed diffuse mild thickening and enhancement of pachymeninges, with a suspicious minimal amount of subdural fluid collected in the left posterior parietal area. His orthostatic headache showed no improvement with conservative treatment; but his pain was almost completely relieved after two trials of cervical epidural blood patch. On the 74th day after the onset of his pain, the patient showed a drowsy mental status and slurred speech when he visited the pain clinic. Brain computerized tomography indicated a left subdural hemorrhage, and he underwent emergency operation to drain the SDH. In conclusion, pain clinicians should pay attention to abrupt changes in mental status as well as continuous headache, for the early diagnosis of SDH in SIH patients.