• Title/Summary/Keyword: speech signal

Reduction of Environmental Background Noise using Speech and Noise Recognition (음성 및 잡음 인식 알고리즘을 이용한 환경 배경잡음의 제거)

  • Choi, Jae-Seung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.15 no.4
    • /
    • pp.817-822
    • /
    • 2011
  • This paper first proposes a speech recognition algorithm that detects the speech and noise sections of each frame using a neural network trained with the back-propagation algorithm, and then proposes a spectral subtraction method that removes the noise from each frame according to the detected speech and noise sections. In the experiments, the performance of the proposed recognition system was evaluated in terms of recognition rate on various speech signals degraded by white noise and car noise. The noise reduction achieved by spectral subtraction, using the frame-wise speech and noise sections detected by the recognition algorithm, was then assessed by measuring the signal-to-noise ratio; the experiments confirm that the proposed algorithm is effective for speech corrupted by noise.
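
A minimal sketch of the pipeline described above, in Python/NumPy. A simple energy threshold stands in for the paper's back-propagation neural network classifier, and all parameter values are illustrative, not taken from the paper:

```python
import numpy as np

def spectral_subtract(x, frame_len=256, hop=128, floor=0.02):
    # Window and frame the signal.
    win = np.hanning(frame_len)
    starts = range(0, len(x) - frame_len + 1, hop)
    spectra = [np.fft.rfft(x[s:s + frame_len] * win) for s in starts]
    # Crude frame classifier: low-energy frames are treated as noise.
    # This stands in for the paper's back-propagation neural network.
    energy = np.array([np.sum(np.abs(S) ** 2) for S in spectra])
    noise_frames = energy < np.median(energy)
    noise_mag = np.mean([np.abs(spectra[i])
                         for i in np.flatnonzero(noise_frames)], axis=0)
    # Subtract the noise magnitude, keep a spectral floor, overlap-add.
    y = np.zeros(len(x))
    for s, S in zip(starts, spectra):
        mag = np.maximum(np.abs(S) - noise_mag, floor * np.abs(S))
        y[s:s + frame_len] += np.fft.irfft(mag * np.exp(1j * np.angle(S)),
                                           frame_len)
    return y
```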

Gain Compensation Method for Codebook-Based Speech Enhancement (코드북 기반 음성향상 기법을 위한 게인 보상 방법)

  • Jung, Seungmo;Kim, Moo Young
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.9
    • /
    • pp.165-170
    • /
    • 2014
  • Speech enhancement techniques that remove surrounding noise are emphasized as a preprocessing step for speech recognition. Among the various speech enhancement techniques, Codebook-based Speech Enhancement (CBSE) operates efficiently in non-stationary noise environments. However, CBSE can estimate inaccurate gains when a mismatch occurs between the input noisy signal and the trained speech/noise codevectors. In this paper, a Normalized Weighting Factor (NWF) is calculated by a long-term noise estimation algorithm based on the signal-to-noise ratio and used to compensate the conventional, inaccurate gains. The proposed CBSE shows better performance than conventional CBSE.
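
A rough sketch of codebook gain compensation under stated assumptions: for one pair of trained speech/noise power-spectral shapes, gains are estimated by least squares and the noise gain is then pulled toward a long-term noise estimate. The blending factor `alpha` is a hypothetical stand-in for the paper's NWF, whose exact formula is not given here:

```python
import numpy as np

def compensate_gains(P_y, S, N, P_noise_lt, alpha=0.5):
    # Least-squares gains for one speech/noise codevector pair:
    # P_y ~ g_s * S + g_n * N, with S, N fixed power-spectral shapes.
    A = np.stack([S, N], axis=1)                      # (bins, 2)
    g, *_ = np.linalg.lstsq(A, P_y, rcond=None)
    g = np.maximum(g, 0.0)                            # gains cannot be negative
    # Noise gain implied by the long-term noise estimate (projection onto N).
    g_n_lt = np.dot(N, P_noise_lt) / np.dot(N, N)
    # Blend instantaneous and long-term noise gains; alpha is a
    # hypothetical stand-in for the paper's Normalized Weighting Factor.
    g[1] = (1.0 - alpha) * g[1] + alpha * g_n_lt
    return g                                          # [speech gain, noise gain]
```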

Perception of Tamil Mono-Syllabic and Bi-Syllabic Words in Multi-Talker Speech Babble by Young Adults with Normal Hearing

  • Gnanasekar, Sasirekha;Vaidyanath, Ramya
    • Journal of Audiology & Otology
    • /
    • v.23 no.4
    • /
    • pp.181-186
    • /
    • 2019
  • Background and Objectives: This study compared the perception of mono-syllabic and bi-syllabic words in Tamil by young adults with normal hearing in the presence of multi-talker speech babble at two signal-to-noise ratios (SNRs). For this comparison, a speech-perception-in-noise test was constructed using existing mono-syllabic and bi-syllabic word lists in Tamil. Subjects and Methods: A total of 30 participants with normal hearing, aged 18 to 25 years, took part in the study. The speech-in-noise test in Tamil (SPIN-T), constructed from mono-syllabic and bi-syllabic Tamil words, served as the stimuli, presented against multi-talker speech babble at two SNRs (0 dB and +10 dB). Results: The effect of noise on SPIN-T varied with SNR. All participants performed better at +10 dB SNR, the higher of the two SNRs. At +10 dB SNR, performance did not differ significantly between mono-syllabic and bi-syllabic words; however, a significant difference existed at 0 dB SNR. Conclusions: The study indicates that a higher SNR leads to better performance, and that bi-syllabic words were identified with fewer errors than mono-syllabic words. Spectral cues were the most affected in noise, leading to more place-of-articulation errors for both mono-syllabic and bi-syllabic words.
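
Constructing stimuli at fixed SNRs, as in the SPIN-T study above, amounts to scaling the babble before mixing. A minimal sketch, assuming `speech` and `babble` are NumPy arrays at the same sampling rate:

```python
import numpy as np

def mix_at_snr(speech, babble, snr_db):
    # Scale the babble so that 10*log10(P_speech / P_babble) == snr_db.
    babble = babble[:len(speech)]
    p_s = np.mean(speech ** 2)
    p_b = np.mean(babble ** 2)
    scale = np.sqrt(p_s / (p_b * 10.0 ** (snr_db / 10.0)))
    return speech + scale * babble

# Stimuli at the two SNRs used in the study:
# noisy_0db  = mix_at_snr(speech, babble, 0.0)
# noisy_10db = mix_at_snr(speech, babble, 10.0)
```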

A study on the interactive speech recognition mobile robot (대화형 음성인식 이동로봇에 관한 연구)

  • 이재영;윤석현;홍광석
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.33B no.11
    • /
    • pp.97-105
    • /
    • 1996
  • This paper presents the implementation of a mobile robot that uses interactive speech recognition. Sentential connected-word speech commands are uttered through a wireless microphone system. LPC-cepstrum and short-time energy features, computed from the received signal on a DSP board, are transferred to a notebook PC, where a DP-matching recognizer processes them; the recognition result is sent to a motor control unit that outputs pulse signals corresponding to the recognized command and drives the stepping motors. A grammar network is applied to reduce the recognizer's search time, so that real-time recognition is achieved. Misrecognized commands can be corrected interactively through conversation with the robot, so the user can steer the mobile robot in the desired direction.
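
The recognizer above relies on DP matching of cepstral feature sequences. Below is a minimal dynamic-programming (DTW) sketch over generic (frames × coefficients) feature arrays; the grammar network and the LPC-cepstrum front end are omitted:

```python
import numpy as np

def dtw_distance(ref, test):
    # ref, test: (frames, coeffs) arrays of cepstral features.
    n, m = len(ref), len(test)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(ref[i - 1] - test[j - 1])   # local distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)   # length-normalized path cost

# The command whose template minimizes dtw_distance(template, utterance)
# would be taken as the recognition result.
```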

A Fixed Rate Speech Coder Based on the Filter Bank Method and the Inflection Point Detection

  • Iem, Byeong-Gwan
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.16 no.4
    • /
    • pp.276-280
    • /
    • 2016
  • A fixed-rate speech coder based on a filter bank and a non-uniform sampling technique is proposed. The non-uniform sampling is achieved by detecting inflection points (IPs). A speech block is band-pass filtered by the filter bank, the subband signals are processed by the IP detector, and the detected IP patterns are compared with the entries of an IP database. For each subband signal, the address of the closest database member and the energy of the IP pattern are transmitted through the channel. At the receiver, the decoder recovers the subband signals from the received addresses and energy information and reconstructs the speech via filter-bank summation. As a result, the coder provides a fixed data rate, in contrast to existing speech coders based on non-uniform sampling. Computer simulation confirms the usefulness of the proposed technique: below a 20 kbps data rate, the signal-to-noise ratio (SNR) performance of the proposed method is comparable to that of uniformly sampled pulse code modulation (PCM).
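
A sketch of the IP detector at the heart of the coder above, using one common definition of an inflection point (a sign change of the second difference); the filter bank, IP database, and energy coding are omitted:

```python
import numpy as np

def inflection_points(x):
    # Sign changes of the second difference mark changes of curvature,
    # one common definition of an inflection point.
    d2 = np.diff(x, n=2)
    return np.flatnonzero(np.diff(np.sign(d2)) != 0) + 2

# For each subband frame, the detected IP pattern would then be matched
# against the database (e.g. by Euclidean distance), and only the closest
# entry's address plus the pattern energy transmitted.
```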

Speaker Separation Based on Directional Filter and Harmonic Filter (Directional Filter와 Harmonic Filter 기반 화자 분리)

  • Baek, Seung-Eun;Kim, Jin-Young;Na, Seung-You;Choi, Seung-Ho
    • Speech Sciences
    • /
    • v.12 no.3
    • /
    • pp.125-136
    • /
    • 2005
  • Automatic speech recognition is much more difficult in the real world. Recognition degrades with the SIR (Signal-to-Interference Ratio) when surrounding environmental noise and multiple speakers are present. Extracting the main speaker's voice from binaural sound is therefore a very important topic in speech signal processing. In this paper, we use a directional filter and a harmonic filter, among existing methods, to extract the main speaker's information from binaural sound. The main speaker's voice is extracted with the directional filter, and the remaining speakers' information is removed with the harmonic filter, driven by the detected pitch of the main speaker. As a result, the voice of the main speaker is enhanced.
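
The harmonic filter step can be approximated with a frequency-domain comb keyed to the main speaker's pitch. A rough single-block sketch; the paper's system works frame by frame with per-frame pitch detection, and the parameters here are illustrative:

```python
import numpy as np

def harmonic_comb(x, fs, f0, keep=True, n_harmonics=20, bw=30.0):
    # Pass (or reject) narrow bands of width bw (Hz) around each harmonic
    # of the pitch f0; keep=False removes the harmonics instead.
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    mask = np.zeros_like(freqs)
    for k in range(1, n_harmonics + 1):
        mask[np.abs(freqs - k * f0) < bw / 2.0] = 1.0
    if not keep:
        mask = 1.0 - mask
    return np.fft.irfft(X * mask, len(x))
```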

The Pattern Recognition Methods for Emotion Recognition with Speech Signal (음성신호를 이용한 감성인식에서의 패턴인식 방법)

  • Park Chang-Hyun;Sim Kwee-Bo
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.12 no.3
    • /
    • pp.284-288
    • /
    • 2006
  • In this paper, we apply several pattern recognition algorithms to an emotion recognition system based on speech signals and compare the results. First, an emotional speech database is required, and the speech features for emotion recognition are determined in the database analysis step. Second, the recognition algorithms are applied to these speech features. The algorithms we evaluate are an artificial neural network, Bayesian learning, Principal Component Analysis, and the LBG algorithm. The performance gap between these methods is presented in the experimental results section. Emotion recognition technology is not yet mature: which emotion features to select and which classification method is appropriate remain open questions, and we hope this paper serves as a reference in those debates.
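
A hedged sketch of the kind of classifier comparison the paper performs, using off-the-shelf scikit-learn models on placeholder data; PCA followed by a nearest-neighbour classifier stands in here for the paper's PCA and LBG vector-quantization steps, and the random features and labels are purely illustrative:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X = np.random.randn(200, 13)          # placeholder prosodic/spectral features
y = np.random.randint(0, 4, 200)      # placeholder emotion labels

models = {
    "neural network": MLPClassifier(max_iter=1000),
    "Bayesian": GaussianNB(),
    "PCA + nearest neighbour": make_pipeline(PCA(n_components=5),
                                             KNeighborsClassifier()),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.2f}")
```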

A Multimodal Emotion Recognition Using the Facial Image and Speech Signal

  • Go, Hyoun-Joo;Kim, Yong-Tae;Chun, Myung-Geun
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.5 no.1
    • /
    • pp.1-6
    • /
    • 2005
  • In this paper, we propose an emotion recognition method using facial images and speech signals. Six basic emotions are investigated: happiness, sadness, anger, surprise, fear, and dislike. Facial expression recognition is performed using multi-resolution analysis based on the discrete wavelet transform, with feature vectors obtained through ICA (Independent Component Analysis). For the speech signal, on the other hand, the recognition algorithm is run independently on each wavelet subband and the final result is obtained from a multi-decision-making scheme. After merging the facial and speech emotion recognition results, we obtained better performance than previous methods.
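
The final fusion step can be sketched as a weighted combination of per-emotion scores from the two recognizers. The weighted sum below is a simple stand-in for the paper's multi-decision-making scheme, and `w_face` is a hypothetical parameter:

```python
import numpy as np

def fuse_decisions(face_scores, speech_scores, w_face=0.5):
    # face_scores, speech_scores: non-negative per-emotion scores over the
    # six classes (happiness, sadness, anger, surprise, fear, dislike).
    face = face_scores / np.sum(face_scores)
    speech = speech_scores / np.sum(speech_scores)
    combined = w_face * face + (1.0 - w_face) * speech
    return int(np.argmax(combined))   # index of the winning emotion
```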

The Pattern Recognition Methods for Emotion Recognition with Speech Signal (음성신호를 이용한 감성인식에서의 패턴인식 방법)

  • Park Chang-Hyun;Sim Kwee-Bo
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2006.05a
    • /
    • pp.347-350
    • /
    • 2006
  • In this paper, we apply several pattern recognition algorithms to an emotion recognition system based on speech signals and compare the results. First, an emotional speech database is required, and the speech features for emotion recognition are determined in the database analysis step. Second, the recognition algorithms are applied to these speech features. The algorithms we evaluate are an artificial neural network, Bayesian learning, Principal Component Analysis, and the LBG algorithm. The performance gap between these methods is presented in the experimental results section. Emotion recognition technology is not yet mature: which emotion features to select and which classification method is appropriate remain open questions, and we hope this paper serves as a reference in those debates.
