• Title/Summary/Keyword: Speech Separation

Robust Blind Source Separation to Noisy Environment For Speech Recognition in Car (차량용 음성인식을 위한 주변잡음에 강건한 브라인드 음원분리)

  • Kim, Hyun-Tae;Park, Jang-Sik
    • The Journal of the Korea Contents Association / v.6 no.12 / pp.89-95 / 2006
  • The performance of blind source separation (BSS) using independent component analysis (ICA) declines significantly in reverberant environments. The post-processing method proposed in this paper is designed to remove the residual cross-talk component precisely. It uses a modified NLMS (normalized least mean square) filter in the frequency domain to estimate the cross-talk path that causes the residual cross-talk components. Because the residual cross-talk components in one channel correspond to the direct components in the other channel, the cross-talk path can be estimated by an adaptive filter that takes the other channel's signal as its input. In a conventional NLMS filter the step size is normalized by the input signal power, whereas in the modified NLMS filter it is normalized by the sum of the input signal power and the error signal power; this prevents misadjustment of the filter weights. The estimated residual cross-talk components are then removed by non-stationary spectral subtraction. Computer simulations with speech signals show that the proposed method improves the noise reduction ratio (NRR) by approximately 3 dB over conventional frequency-domain ICA (FDICA).
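The modified NLMS update and the subsequent cross-talk subtraction can be sketched per frequency bin as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the single complex weight per bin, the step size `mu`, the regularizer `eps`, and the frame-wise magnitude subtraction with a spectral floor `beta` are all assumptions, since the abstract specifies neither the filter length nor the exact form of its non-stationary spectral subtraction.

```python
import numpy as np


def modified_nlms(X, D, mu=0.5, eps=1e-10):
    """Estimate residual cross-talk in D from the other channel X, bin by bin.

    X, D: complex STFT arrays of shape (frames, bins).
    X is the other channel's BSS output (the adaptive-filter input);
    D is this channel's BSS output, which still contains residual cross-talk.
    Assumption: one complex weight per frequency bin.
    """
    n_frames, n_bins = X.shape
    W = np.zeros(n_bins, dtype=complex)
    Y = np.zeros(X.shape, dtype=complex)   # per-frame cross-talk estimates
    for t in range(n_frames):
        y = W * X[t]                       # current cross-talk estimate
        e = D[t] - y                       # error signal
        # Modified normalization: input power PLUS error power.
        # A conventional NLMS filter would divide by |X|^2 (+ eps) only.
        norm = np.abs(X[t]) ** 2 + np.abs(e) ** 2 + eps
        W = W + mu * np.conj(X[t]) * e / norm
        Y[t] = y
    return W, Y


def subtract_crosstalk(D, Y, beta=0.01):
    """Subtract the estimated cross-talk magnitude from D, keeping D's phase.

    Assumption: frame-wise magnitude subtraction with a floor of beta * |D|.
    Because Y is re-estimated every frame, the subtracted spectrum follows
    time-varying (non-stationary) cross-talk.
    """
    mag = np.maximum(np.abs(D) - np.abs(Y), beta * np.abs(D))
    return mag * np.exp(1j * np.angle(D))
```

Adding the error power to the denominator shrinks the effective step while the error is large (for example, right after a source becomes active), which is one way to read the abstract's claim that the modification prevents misadjustment of the weights.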

Independent Component Analysis on a Subband Domain for Robust Speech Recognition (음성의 특징 단계에 독립 요소 해석 기법의 효율적 적용을 통한 잡음 음성 인식)

  • Park, Hyeong-Min;Jeong, Ho-Yeong;Lee, Tae-Won;Lee, Su-Yeong
    • Journal of the Institute of Electronics Engineers of Korea CI / v.37 no.6 / pp.22-31 / 2000
  • In this paper, we propose a method for removing noise components during feature extraction for robust speech recognition. The method is based on blind separation using independent component analysis (ICA): given two noisy speech recordings, the algorithm linearly separates speech from the unwanted noise signal. To apply ICA as close as possible to the feature level used for recognition, a new spectral analysis is presented. It modifies the computation of band energies by first averaging FFT points within several sub-ranges of each mel-scaled band. A simple analysis of the sample variances of speech and noise band energies, together with recognition experiments, demonstrated its noise robustness. For noisy speech signals recorded in real environments, applying ICA to the new spectral analysis improved recognition performance considerably and was particularly effective at low signal-to-noise ratios (SNRs). The method offers some insight into applying ICA at the feature level and appears useful for robust speech recognition.
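One plausible reading of the sub-range band-energy computation is sketched below in NumPy. The abstract does not say whether complex FFT values or their powers are averaged, so this sketch assumes complex averaging inside each sub-range followed by summing squared magnitudes; the rectangular mel-spaced bands, the helper names, and `n_sub` are likewise hypothetical choices for illustration.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_fft, sr, n_bands):
    """FFT-bin edges of n_bands non-overlapping, mel-spaced rectangular bands.

    Band b covers bins [edges[b], edges[b + 1]). Assumption: rectangular bands;
    common front ends use overlapping triangular filters instead.
    """
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_bands + 1)
    return np.round(mel_to_hz(mels) / sr * n_fft).astype(int)


def subrange_band_energies(frame_fft, edges, n_sub=3):
    """Band energies where FFT points are first averaged within sub-ranges.

    Assumption: each band is split into n_sub contiguous sub-ranges, the
    complex FFT values in each sub-range are averaged, and the band energy is
    the sum of the squared magnitudes of those averages.
    """
    energies = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        chunks = np.array_split(frame_fft[lo:hi], n_sub)
        energies.append(sum(np.abs(c.mean()) ** 2 for c in chunks if c.size > 0))
    return np.array(energies, dtype=float)
```

Under this reading, averaging before squaring is a linear operation on the FFT values, so components that alternate in sign within a sub-range cancel (the band `[1, -1, 1, -1, 1, -1]` with three sub-ranges yields zero energy, whereas a plain power sum gives 6).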

Automatic Vowel Sequence Reproduction for a Talking Robot Based on PARCOR Coefficient Template Matching

  • Vo, Nhu Thanh;Sawada, Hideyuki
    • IEIE Transactions on Smart Processing and Computing / v.5 no.3 / pp.215-221 / 2016
  • This paper describes an automatic vowel sequence reproduction system for a talking robot built to reproduce the human voice based on the working behavior of the human articulatory system. A sound analysis system is developed to record a sentence spoken by a human (mainly vowel sequences in Japanese) and then analyze it to generate the correct command packet so that the talking robot can repeat it. An algorithm based on a short-time energy method is developed to separate and count phonemes. Template matching with partial correlation (PARCOR) coefficients is applied to find a voice in the talking robot's database that is similar to the spoken voice. By combining the phoneme separation and counting results with the detection of vowels in human speech, the talking robot can reproduce a vowel sequence similar to the one spoken by the human. Two tests are performed to verify the working behavior of the robot. The results indicate that the robot can repeat a sequence of vowels spoken by a human with an average success rate of more than 60%.
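The two analysis steps named above, short-time-energy segmentation and PARCOR template matching, can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' system: the frame length, hop, relative energy threshold, LPC order, Euclidean distance, and the `templates` dictionary of vowel feature vectors are all hypothetical choices. PARCOR coefficients are computed with the Levinson-Durbin recursion.

```python
import numpy as np


def segment_by_energy(signal, frame_len=256, hop=128, rel_threshold=0.1):
    """Return (start, end) sample ranges of runs of high short-time energy.

    Assumptions: len(signal) >= frame_len, and a frame is active when its
    energy exceeds rel_threshold times the maximum frame energy.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    energy = np.array([np.sum(signal[i * hop:i * hop + frame_len] ** 2)
                       for i in range(n_frames)])
    active = energy > rel_threshold * energy.max()
    segments, start = [], None
    for i, on in enumerate(active):
        if on and start is None:
            start = i                      # a new segment begins
        elif not on and start is not None:
            segments.append((start * hop, (i - 1) * hop + frame_len))
            start = None                   # the current segment ends
    if start is not None:                  # segment runs to the last frame
        segments.append((start * hop, (n_frames - 1) * hop + frame_len))
    return segments


def parcor(frame, order=10):
    """PARCOR coefficients of one frame via the Levinson-Durbin recursion.

    Returned as partial correlations, so the first value equals r[1] / r[0];
    some texts use the opposite sign for reflection coefficients.
    """
    x = np.asarray(frame, dtype=float) - np.mean(frame)
    r = np.array([np.dot(x[:len(x) - lag], x[lag:]) for lag in range(order + 1)])
    a = np.zeros(order + 1)                # prediction-error filter, a[0] = 1
    a[0] = 1.0
    err = r[0]
    k = np.zeros(order)
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        ki = -acc / err                    # reflection coefficient of stage i
        k[i - 1] = ki
        a[1:i] = a[1:i] + ki * a[i - 1:0:-1]
        a[i] = ki
        err *= 1.0 - ki * ki               # prediction error shrinks each stage
    return -k


def nearest_template(features, templates):
    """Label of the template closest (Euclidean distance) to `features`."""
    return min(templates, key=lambda label: np.linalg.norm(features - templates[label]))
```

Counting the returned segments gives the number of phonemes to reproduce, and running `parcor` on each segment followed by `nearest_template` picks the closest stored vowel.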

A Comparison of Resonance Parameters before and after Pharyngeal Flap Surgery:A Preliminary Report (인두피판술 전.후의 공명파라미터의 비교: 예비연구)

  • Kang, Young-Ae;Kang, Nak-Heon;Lee, Tae-Yong;Seong, Cheol-Jae
    • Phonetics and Speech Sciences / v.1 no.3 / pp.133-144 / 2009
  • Pharyngeal flap surgery changes the space and shape of the oral cavity and vocal tract, and these changes alter resonance. The purpose of this study was to determine the most reliable and valuable parameters for evaluating hypernasality by distinguishing the speech of two patients before and after pharyngeal flap surgery. Each patient was asked to clearly produce the vowels /a/, /i/, /u/, /e/, /o/ for recording. Nine parameters were measured for each vowel: formants (F1, F2, F3), bandwidths (BW1, BW2, BW3), LPC energy slope ($\Delta = \left|\frac{A_2 - A_1}{F_2 - F_1}\right|$), and band energy (0-500 Hz, 500-1000 Hz). Discriminant analyses of the acoustic parameters showed that the vowels /a/ and /e/ were not significant, whereas /i/, /u/, and /o/ were effective for separation. Recognition scores of 95%, 100%, and 100% were reached when /i/, /u/, and /o/, respectively, were analyzed. The results showed that F2, BW3, and LPC slope are more important parameters than the others. Finally, a least-squares fit showed a relationship between the perceptual evaluation score and the LPC energy slope.

A Study on the Realization of Wireless Home Network System Using High-performance Speech Recognition in Variable Position (가변위치 고음성인식 기술을 이용한 무선 홈 네트워크 시스템 구현에 관한 연구)

  • Yoon, Jun-Chul;Choi, Sang-Bang;Park, Chan-Sub;Kim, Se-Yong;Kim, Ki-Man;Kang, Suk-Youb
    • Journal of the Korea Institute of Information and Communication Engineering / v.14 no.4 / pp.991-998 / 2010
  • In a wireless home network system that uses speech recognition in an indoor environment, background noise and reverberation are the two main causes of degraded recognition performance. This study implements a home network system that is robust to reverberation and background noise by using a voice-section detection method based on spectral entropy. Spectral subtraction can reduce the effect of reverberation and remove noise that is independent of the speech signal by eliminating reverberation-distorted components from the spectrum. Effective spectral subtraction requires accurate separation of voice sections from silent sections, so a voice-section detection method based on spectral entropy is applied to improve this separation. Experiments and indoor tests were carried out to measure the command recognition rate in an indoor recognition environment. The results show that the voice-section detection method based on spectral entropy improved the command recognition rate in both static environments and reverberant room conditions.
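Spectral-entropy voice-section detection can be sketched as below. The idea is that speech frames have peaky spectra (low entropy) while broadband noise spreads power evenly (high entropy). The frame length, hop, Hann window, base-2 entropy, and the midpoint threshold rule are assumptions for illustration; the abstract does not give the authors' settings.

```python
import numpy as np


def spectral_entropy(frames, eps=1e-12):
    """Spectral entropy (bits) of each row of `frames`.

    Each frame's power spectrum is normalized into a probability distribution
    over frequency bins before the entropy is computed.
    """
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    p = power / (power.sum(axis=1, keepdims=True) + eps)
    return -np.sum(p * np.log2(p + eps), axis=1)


def detect_voice_frames(signal, frame_len=256, hop=128, threshold=None):
    """Boolean array marking frames whose spectral entropy is below a threshold.

    Assumptions: len(signal) >= frame_len; if no threshold is given, use the
    midpoint between the minimum and maximum frame entropy in the signal.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hanning(frame_len)   # reduce spectral leakage
    h = spectral_entropy(frames)
    if threshold is None:
        threshold = 0.5 * (h.min() + h.max())
    return h < threshold
```

The resulting voice/silence labels are what a spectral-subtraction stage would use to decide which frames update its noise estimate.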

Laryngotracheal Separation in Patient with Chronic Intractable Aspiration (후두기관 분리술로 치료한 만성 흡인 15례)

  • Kong, Il-Gyu;An, Soo-Youn;Kim, Bong-Jik;Jung, Eun-Jung;Lee, Myung-Chul;Kwon, Tack-Kyun;Sung, Myung-Whun;Kim, Kwang-Hyun
    • Korean Journal of Bronchoesophagology / v.13 no.1 / pp.23-28 / 2007
  • Background and Objectives: Intractable aspiration in patients with impaired protective function of the larynx often results in multiple episodes of aspiration pneumonia, repeated hospitalizations, and expensive nursing care. The authors have reported preliminary results of laryngotracheal separation (LTS) in patients with chronic intractable aspiration; the purpose of this study was to report follow-up results of patient outcomes after LTS. Materials and Methods: A retrospective review was conducted of 15 patients who underwent LTS between 1996 and 2006. Ages ranged from 3 to 72 years. Results: Eight patients had morbid aspiration as a consequence of acquired neurologic injuries and seven as a consequence of congenital neurologic injuries. Two patients developed a postoperative fistula, which was well controlled with local wound care. Following LTS, aspiration was effectively controlled in all patients, and eight were able to tolerate a regular diet. Conclusion: LTS is a low-risk, successful, definitive procedure that decreases the potential for aspiration, pulmonary complications, and the duration of hospitalization, and increases quality of life, especially in patients with irreversible upper airway dysfunction and poor speech potential.

Multi-dimensional Representation and Correlation Analyses of Acoustic Cues for Stops (폐쇄음 음향 단서의 다차원 표현과 상관관계 분석)

  • Yun, Weon-Hee
    • MALSORI / v.55 / pp.45-60 / 2005
  • The purpose of this paper is to represent the values of acoustic cues for Korean oral stops in multi-dimensional space and to look for possible relationships among the cues through correlation analyses. The acoustic cues used to differentiate the three types of Korean stops are closure duration, voice onset time, and the fundamental frequency of the vowel following the stop. The values of these cues are plotted in two- and three-dimensional space to see which cues are critical for separating the different types of stops. Correlation coefficient analyses show that a multivariate approach to the statistical analysis is justified, and that there are statistically significant relationships among the acoustic cues, but they are not strong enough to support the conjecture that the articulatory or laryngeal mechanisms underlying these cues are related.
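A correlation analysis of this kind can be sketched with NumPy: compute the Pearson correlation matrix across cues and a t statistic for each coefficient to judge significance. The column layout (for example closure duration, VOT, and following-vowel F0 per token) is a hypothetical data format, and the sketch does not reproduce the paper's statistics.

```python
import numpy as np


def correlation_with_t(cues):
    """Pearson correlation matrix of acoustic cues plus t statistics.

    cues: array of shape (n_tokens, n_cues), one column per cue
    (hypothetical layout). For each pair, t = r * sqrt((n - 2) / (1 - r^2))
    tests H0: r = 0 with n - 2 degrees of freedom.
    """
    n = cues.shape[0]
    r = np.corrcoef(cues, rowvar=False)        # columns are variables
    with np.errstate(divide="ignore", invalid="ignore"):
        t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    np.fill_diagonal(t, np.nan)                # self-correlation is not tested
    return r, t
```

With 50 tokens, |t| above roughly 2.01 is significant at the 5% level (two-sided). A coefficient can pass this test while still being too weak to suggest a shared mechanism, which is the distinction the abstract draws.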

Generalization of DUET using neighborhood relationship (Neighborhood 관계를 이용한 DUET Generalization)

  • Woo, Sung-Min;Jeong, Hong
    • Proceedings of the IEEK Conference / 2008.06a / pp.1017-1018 / 2008
  • In this paper, we propose a method that uses neighborhood relationships in the 2D spectrogram of the separated sources to generalize the binary mask of the Degenerate Unmixing Estimation Technique (DUET). The new generalized mask can consist of five to ten masks. The original spectrogram power at each time-frequency point is then assigned according to the new mask. The results showed a smooth, soft waveform, indicating better speech separation performance than the original method.
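The contrast between DUET's binary mask and a neighborhood-based soft mask can be sketched as follows. This is not the paper's mask construction, which the abstract does not fully specify. It assumes the per-source attenuation and delay parameters have already been estimated (full DUET finds them as peaks of an attenuation-delay histogram), assigns each time-frequency point to a source with the standard DUET cost, and then generalizes each binary mask by averaging it over a hypothetical 3x3 time-frequency neighborhood.

```python
import numpy as np


def duet_binary_masks(X1, X2, omega, amps, delays):
    """DUET-style binary masks given already-estimated mixing parameters.

    X1, X2: complex STFTs (freq_bins, frames) of the two mixtures.
    omega: angular frequency (rad/sample) of each bin, shape (freq_bins,).
    amps, delays: relative attenuation a_j and delay d_j (samples) per source.
    """
    w = omega[:, None]
    costs = np.stack([
        np.abs(a * np.exp(-1j * w * d) * X1 - X2) ** 2 / (1.0 + a ** 2)
        for a, d in zip(amps, delays)
    ])                                     # shape (n_sources, freq_bins, frames)
    best = np.argmin(costs, axis=0)        # winning source at each TF point
    return np.stack([(best == j).astype(float) for j in range(len(amps))])


def neighborhood_masks(binary_masks):
    """Soften each mask by averaging it over its 3x3 time-frequency neighborhood.

    Because the binary masks partition the TF plane, the averaged masks still
    sum to 1 at every point.
    """
    padded = np.pad(binary_masks, ((0, 0), (1, 1), (1, 1)), mode="edge")
    F, T = binary_masks.shape[1:]
    return sum(
        padded[:, 1 + df:1 + df + F, 1 + dt:1 + dt + T]
        for df in (-1, 0, 1) for dt in (-1, 0, 1)
    ) / 9.0
```

Multiplying each soft mask by the mixture spectrogram (`masks * X1[None]`) then distributes the original power at every point across the sources instead of giving it all to one, which tends to reduce the abrupt on/off artifacts of binary masking.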

Multi-dimensional Representation of Acoustic Cues for Korean Stops (한국어 폐쇄음 음향단서의 다차원 표현)

  • Yun, Weon-Hee
    • Proceedings of the KSPS conference / 2005.04a / pp.25-28 / 2005
  • The purpose of this paper is to represent the values of acoustic cues for Korean oral stops in multi-dimensional space and to look for possible relationships among the cues through correlation coefficient analyses. The acoustic cues used to differentiate the three types of Korean stops are closure duration, voice onset time, and the fundamental frequency of the vowel following the stop. The values of these cues are plotted in two- and three-dimensional space to see which cues are critical for complete separation of the different types of stops. Correlation coefficient analyses show that there are statistically significant relationships among the acoustic cues, but they are not strong enough to support the conjecture that the mechanisms producing these cues are articulatorily related.

A Study on MLP Neural Network Architecture and Feature Extraction for Korean Syllable Recognition (한국어 음절 인식을 위한 MLP 신경망 구조 및 특징 추출에 관한 연구)

  • 금지수;이현수
    • Proceedings of the IEEK Conference / 1999.11a / pp.672-675 / 1999
  • In this paper, we propose an MLP neural network architecture and a feature extraction method for Korean syllable recognition. In the proposed syllable recognition system, the onset is first classified by an onset-classification neural network, and the output of this network is used to select features for the input pattern vectors. Feature extraction for Korean syllables is based on sonority: the syllable is segmented using a threshold rate, and the segmentation results are used as features for the onset, nucleus, and coda. ETRI's SAMDORI was used as the speech database. The recognition rate is 96% for speaker-dependent and 93.3% for speaker-independent recognition.