• Title/Summary/Keyword: speech enhancement

Search Result 340, Processing Time 0.027 seconds

A Study on Lip-reading Enhancement Using Time-domain Filter (시간영역 필터를 이용한 립리딩 성능향상에 관한 연구)

  • 신도성;김진영;최승호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.5
    • /
    • pp.375-382
    • /
    • 2003
  • Lip-reading technique based on bimodal is to enhance speech recognition rate in noisy environment. It is most important to detect the correct lip-image. But it is hard to estimate stable performance in dynamic environment, because of many factors to deteriorate Lip-reading's performance. There are illumination change, speaker's pronunciation habit, versatility of lips shape and rotation or size change of lips etc. In this paper, we propose the IIR filtering in time-domain for the stable performance. It is very proper to remove the noise of speech, to enhance performance of recognition by digital filtering in time domain. While the lip-reading technique in whole lip image makes data massive, the Principal Component Analysis of pre-process allows to reduce the data quantify by detection of feature without loss of image information. For the observation performance of speech recognition using only image information, we made an experiment on recognition after choosing 22 words in available car service. We used Hidden Markov Model by speech recognition algorithm to compare this words' recognition performance. As a result, while the recognition rate of lip-reading using PCA is 64%, Time-domain filter applied to lip-reading enhances recognition rate of 72.4%.

Speech Recognition Using Noise Robust Features and Spectral Subtraction (잡음에 강한 특징 벡터 및 스펙트럼 차감법을 이용한 음성 인식)

  • Shin, Won-Ho;Yang, Tae-Young;Kim, Weon-Goo;Youn, Dae-Hee;Seo, Young-Joo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.5
    • /
    • pp.38-43
    • /
    • 1996
  • This paper compares the recognition performances of feature vectors known to be robust to the environmental noise. And, the speech subtraction technique is combined with the noise robust feature to get more performance enhancement. The experiments using SMC(Short time Modified Coherence) analysis, root cepstral analysis, LDA(Linear Discriminant Analysis), PLP(Perceptual Linear Prediction), RASTA(RelAtive SpecTrAl) processing are carried out. An isolated word recognition system is composed using semi-continuous HMM. Noisy environment experiments usign two types of noises:exhibition hall, computer room are carried out at 0, 10, 20dB SNRs. The experimental result shows that SMC and root based mel cepstrum(root_mel cepstrum) show 9.86% and 12.68% recognition enhancement at 10dB in compare to the LPCC(Linear Prediction Cepstral Coefficient). And when combined with spectral subtraction, mel cepstrum and root_mel cepstrum show 16.7% and 8.4% enhanced recognition rate of 94.91% and 94.28% at 10dB.

  • PDF

A Study on a Generation of a Syllable Restoration Candidate Set and a Candidate Decrease (음절 복원 후보 집합의 생성과 후보 감소에 관한 연구)

  • 김규식;김경징;이상범
    • Journal of the Korea Computer Industry Society
    • /
    • v.3 no.12
    • /
    • pp.1679-1690
    • /
    • 2002
  • This paper, describe about a generation of a syllable restoration regulation for a post processing of a speech recognition and a decrease of a restoration candidate. It created a syllable restoration regulation to create a restoration candidate pronounced with phonetic value recognized through a post processing of the formula system that was a tone to recognize syllable unit phonetic value for a performance enhancement of a dialogue serial speech recognition. Also, I presented a plan to remove a regulation to create unused notation from a real life in a restoration regulation with a plan to reduce number candidate of a restoration meeting. A design implemented a restoration candidate set generator in order a syllable restoration regulation display that it created a proper restoration candidate set. The proper notation meeting that as a result of having proved about a standard pronunciation example and a word extracted from a pronunciation dictionary at random, the notation that an utterance was former was included in proved with what a generation became.

  • PDF

Enhancement of a language model using two separate corpora of distinct characteristics

  • Cho, Sehyeong;Chung, Tae-Sun
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.14 no.3
    • /
    • pp.357-362
    • /
    • 2004
  • Language models are essential in predicting the next word in a spoken sentence, thereby enhancing the speech recognition accuracy, among other things. However, spoken language domains are too numerous, and therefore developers suffer from the lack of corpora with sufficient sizes. This paper proposes a method of combining two n-gram language models, one constructed from a very small corpus of the right domain of interest, the other constructed from a large but less adequate corpus, resulting in a significantly enhanced language model. This method is based on the observation that a small corpus from the right domain has high quality n-grams but has serious sparseness problem, while a large corpus from a different domain has more n-gram statistics but incorrectly biased. With our approach, two n-gram statistics are combined by extending the idea of Katz's backoff and therefore is called a dual-source backoff. We ran experiments with 3-gram language models constructed from newspaper corpora of several million to tens of million words together with models from smaller broadcast news corpora. The target domain was broadcast news. We obtained significant improvement (30%) by incorporating a small corpus around one thirtieth size of the newspaper corpus.

음성통신을 위한 잡음처리 기술

  • Sin, Jong-Won;Jang, Jun-Hyeok;Kim, Nam-Su
    • Information and Communications Magazine
    • /
    • v.24 no.4
    • /
    • pp.27-35
    • /
    • 2007
  • 음성 통신을 할 때 배경 잡음이 존재하게 되면 일반적으로 음질이 저하된다. 이것은 잡음 자체가 듣기 싫다거나 음성을 더 작게 들리게 만들기 때문이기도 하고 음성 코덱이 잡음이 섞이지 않은 깨끗한 음성에 최적화되어 있어서 잡음이 섞인 음성에 대한 코딩 효율이 떨어지기 때문이기도 하다. 이 논문에서는 잡음에 의한 음성 통신의 품질 저하를 막기 위한 방법으로서 음성 향상(speech enhancement) 기술과 음성 강화(speech reinforcement) 기술에 대해 소개한다. 음성 향상 기술이란 전송부의 마이크에서 녹음된 잡음과 음성이 섞인 입력 음성으로부터 깨끗한 음성을 추정하는 기술을 말한다. 음성 향상 기술은 상당히 오랜 기간 동안 연구되어 온 기술이며, 최근에는 각 파라미터의 분포에 의존하는 방법보다 확률 모델에 기반한 방법이 각광을 받고 있으며 인간의 청각 특성을 고려한 음성 향상 방법도 제안되고 있다. 음성 강화 기술이란 수신단에서 주변 잡음에 따라 전송되어 온 음성을 주파수별로 증폭하여 더 잘 들리도록 만드는 기술이다. 음성 향상이 내 주위의 잡음이 상대방에게 들리는 음성에 미치는 영향 혹은 상대방 주변의 잡음이 나에게 들리는 소리에 미치는 영향을 줄여주는 기술이라면 음성 강화는 내 주위의 잡음이 나에게 들리는 음성에 미치는 영향을 상쇄해 주는 기술이다. 이 경우 주변 잡음은 어떤 전자 시스템도 거치지 않고 귀로 직접 들어오기 때문에 잡음 자체를 줄여 주는 것은 힘들고 전송되어 온 음성을 적절히 증폭 혹은 변형함으로써 귀에 들리는 음질 또는 명료성을 개선하게 된다. 이 논문에서는 통계 모델을 기반으로 한 음성 향상 기법과 인간의 청각 특성을 고려한 음성 향상 기법, 그리고 음성 강화 기법에 대해 설명한다.을 시도한 결과 안정적이고 반복 가능한 급성 심부전 모델을 얻을 수 있었다. bench scale실험결과와 같이 AOC는 배수관망에서의 박테리아 증식과 크게 상관관계를 갖고 있는 것으로 밝혀졌다.)', 'have a headache (2.10±0.79)', 'poor memory (2.09±0.83)', 'no appetite (1.99±0.85)', As for the correlation between iron parameter and clinical symptoms related to anemia, the hematocrit rate was negatively correlated with 'get a cold easily', 'pale face', 'feeling blue', 'difficult digestion' (p<0.05). The level of iron was negatively correlated with 'tired out easily', 'get a cold easily' (p<0.05) and TS (%) were negatively correlated with 'tired out easily (p<0.05)', 'get a cold easily (p<0.01). Our study resulted that the prevalence of a iron deficiency of a middle school girl is very high, therefore the guidelines for iron supplementation and nutritional education to improve their iron status should be provided.한 질소제거를 N-balance로부터

Adaptation Mode Controller for Adaptive Microphone Array System (마이크로폰 어레이를 위한 적응 모드 컨트롤러)

  • Jung Yang-Won;Kang Hong-Goo;Lee Chungyong;Hwang Youngsoo;Youn Dae Hee
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.11C
    • /
    • pp.1573-1580
    • /
    • 2004
  • In this paper, an adaptation mode controller for adaptive microphone array system is proposed for high-quality speech acquisition in real environments. To ensure proper adaptation of the adaptive array algorithm, the proposed adaptation mode controller uses not only temporal information, but also spatial information. The proposed adaptation mode controller is constructed with two processing stages: an initialization stage and a running stage. In the initialization stage, a sound source localization technique is adopted, and a signal correlation characteristic is used in the running stage. For the adaptive may algorithm, a generalized sidelobe canceller with an adaptive blocking matrix is used. The proposed adaptation mode controller can be used even when the adaptive blocking matrix is not adapted, and is much stable than the power ratio method. The proposed algorithm is evaluated in real environment, and simulation results show 13dB SINR improvement with the speaker sitting 2m distance from the may.

Performance Improvement of Speech Enhancement Using Independent Component Analysis and Perceptual Filtering (독립 성분 분석과 지각 필터를 이용한 음질 개선)

  • Koo, Kyo-Sik;Cha, Hyung-Tai
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.4
    • /
    • pp.270-277
    • /
    • 2010
  • In this paper, we proposed an algorithm that improves tone quality of noisy audio signals by using ICA(Independent Component Analysis) algorithm and perceptual filters. Many algorithms have been proposed to eliminate the noise from the audio signals, such as spectral subtraction method, perceptual filter, etc. The perceptual filter uses a noise that is acquired from silent ranges in the input signal. In this case, the improvement rate of tone quality decreases if the noise energy is changed by the environmental variation in a signal frame. But the proposed method estimates a noise that is changed at each frame using ICA algorithm. The estimated noise is applied to perceptual filter. To show the performance of the proposed algorithm, several tests are performed to various input signals. With the proposed algorithm, we could confirm the enhancement of tone quality in terms of segmental SNR (SSNR), noise-to-mask ratio (NMR) and Degradation Category Rating (DCR) test.

Audio Listening Enhancement in Adverse Environment based on Loudness Restoration (라우드니스 복원에 기반한 잡음 환경에서의 오디오 청취 향상)

  • Pak, Junhyeong;Shin, Jong Won
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.12
    • /
    • pp.210-216
    • /
    • 2013
  • It is hard to listen to the music clearly in the presence of background noise. In this paper, a method that modifies the audio signal automatically to enhance the audio listening experience in adverse environment is proposed. Specifically, the method that amplifies the audio signal so that the perceived loudness of audio signal in each band becomes similar to that of the noiseless signal. The loudness perception model proposed by Moore et. al is utilized. Extending the previous work that is applied to speech reinforcement, the full band signal sampled at 48kHz is manipulated based on the loudness restoration principle. Moreover, based on the observation that the audio clarity is compromised even with loudness restored signal, a modification that intentionally boosts high frequency loudness more than lower band is also proposed. Experimental results showed that the proposed algorithm can enhance the audio listening experience in adverse environment.

A Preprocessing Approach to Improving the Quality of the Music Produced by the EVRC (EVRC 코덱으로 재생하는 음악의 품질을 개선하기 위한 전처리 기법)

  • 남영한;하태균;전윤호;김재수;박섭형
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.28 no.5C
    • /
    • pp.476-485
    • /
    • 2003
  • This paper proposers a preprocessing approach to improving the quality of the music produced by the EVRC(enhanced variable rate codec) which is one of the CDMA(Code Division Multiple Access) voice codecs. Since the EVRC is optimized only for speech signals, it can deteriorate the quality of the music passed through it. One of the problems with the EVRC-coded music is time-clipping, which usually occurs when subsequent frames are encoded at Rate l/8. Since the EVRC determines the bit rate for an input frame based on the long-term prediction gain, we increase the long-term prediction gain in order for the most of the frames to be encoded at Rate 1 or Rate 1/2. Experimental results show that the approach works well on music signals and the number of time-clipped frames is considerably reduced.

Signal Enhancement of a Variable Rate Vocoder with a Hybrid domain SNR Estimator

  • Park, Hyung Woo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.2
    • /
    • pp.962-977
    • /
    • 2019
  • The human voice is a convenient method of information transfer between different objects such as between men, men and machine, between machines. The development of information and communication technology, the voice has been able to transfer farther than before. The way to communicate, it is to convert the voice to another form, transmit it, and then reconvert it back to sound. In such a communication process, a vocoder is a method of converting and re-converting a voice and sound. The CELP (Code-Excited Linear Prediction) type vocoder, one of the voice codecs, is adapted as a standard codec since it provides high quality sound even though its transmission speed is relatively low. The EVRC (Enhanced Variable Rate CODEC) and QCELP (Qualcomm Code-Excited Linear Prediction), variable bit rate vocoders, are used for mobile phones in 3G environment. For the real-time implementation of a vocoder, the reduction of sound quality is a typical problem. To improve the sound quality, that is important to know the size and shape of noise. In the existing sound quality improvement method, the voice activated is detected or used, or statistical methods are used by the large mount of data. However, there is a disadvantage in that no noise can be detected, when there is a continuous signal or when a change in noise is large.This paper focused on finding a better way to decrease the reduction of sound quality in lower bit transmission environments. Based on simulation results, this study proposed a preprocessor application that estimates the SNR (Signal to Noise Ratio) using the spectral SNR estimation method. The SNR estimation method adopted the IMBE (Improved Multi-Band Excitation) instead of using the SNR, which is a continuous speech signal. Finally, this application improves the quality of the vocoder by enhancing sound quality adaptively.