• Title/Summary/Keyword: Speech signal processing

Search Result 331, Processing Time 0.023 seconds

Speech Query Recognition for Tamil Language Using Wavelet and Wavelet Packets

  • Iswarya, P.;Radha, V.
    • Journal of Information Processing Systems
    • /
    • v.13 no.5
    • /
    • pp.1135-1148
    • /
    • 2017
  • Speech recognition is one of the fascinating fields in the area of Computer science. Accuracy of speech recognition system may reduce due to the presence of noise present in speech signal. Therefore noise removal is an essential step in Automatic Speech Recognition (ASR) system and this paper proposes a new technique called combined thresholding for noise removal. Feature extraction is process of converting acoustic signal into most valuable set of parameters. This paper also concentrates on improving Mel Frequency Cepstral Coefficients (MFCC) features by introducing Discrete Wavelet Packet Transform (DWPT) in the place of Discrete Fourier Transformation (DFT) block to provide an efficient signal analysis. The feature vector is varied in size, for choosing the correct length of feature vector Self Organizing Map (SOM) is used. As a single classifier does not provide enough accuracy, so this research proposes an Ensemble Support Vector Machine (ESVM) classifier where the fixed length feature vector from SOM is given as input, termed as ESVM_SOM. The experimental results showed that the proposed methods provide better results than the existing methods.

IMBE Model Based SNR Estimation of Continuous Speech Signals (연속음성신호에서 IMBE 모델을 이용한 SNR 추정 연구)

  • Park, Hyung-Woo;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.2
    • /
    • pp.148-153
    • /
    • 2010
  • In speech signal processing, speech signal corrupted by noise should be enhanced to improve quality. Usually noise estimation methods need flexibility for variable environment. Noise profile is renewed on silence region to avoid effects of speech properties. So we have to preprocess finding voice region before noise estimation. However, if received signal does not have silence region, we cannot apply that method. In this paper, we proposed SNR estimation method for continuous speech signal. A Speech signal consists of Voice and Unvoiced Band in The MBE excitation model. And the energy of speech signal is mostly distributed on voiced region, so we can estimate SNR by the ratio of voiced region energy to unvoiced. We use the IMBE vocoder for the Voice or Unvoice band of segmented speech signal. Continuously we calculate the segmented SNR using that information and the energy of each band. And we estimate the SNR of continuous speech signal.

A car number retrieving system using speech recognition for PDA (PDA상에서 음성인식을 이용한 차량번호 조회시스템)

  • 김우성;김동환;윤재선;홍광석
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2001.06a
    • /
    • pp.281-284
    • /
    • 2001
  • In this paper, we present a car number retrieving system using speech recogntion and speech synthesis for PDA. This system consist of 4-digit numbers and command speech recognition as well its speech synthesis. Experiment results showed 4-digit numbers recognition rate 97% and commands recognition 99% through speaker-independent method.

  • PDF

Fixed Point Implementation of the QCELP Speech Coder

  • Yoon, Byung-Sik;Kim, Jae-Won;Lee, Won-Myoung;Jang, Seok-Jin;Choi, Song_in;Lim, Myoung-Seon
    • ETRI Journal
    • /
    • v.19 no.3
    • /
    • pp.242-258
    • /
    • 1997
  • The Qualcomm code excited linear prediction (QCELP) speech coder was adopted to increase the capacity of the CDMA Mobile System (CMS). In this paper, we implemented the QCELP speech coding algorithm by using TMS320C50 fixed point DSP chip. Also the fixed point simulation was done with C language. The computation complexity of QCELP on TMS320C50 was 10k words and data memory was 4k words. In the normal call test on the CMS, where mobile to mobile call test was done in the bypass mode without double vocoding, mean opinion score for the speech quality was he Qualcomm code excited linear prediction (QCELP) speech quality was 3.11.

  • PDF

Korean Speech Segmentation and Recognition by Frame Classification via GMM (GMM을 이용한 프레임 단위 분류에 의한 우리말 음성의 분할과 인식)

  • 권호민;한학용;고시영;허강인
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2003.06a
    • /
    • pp.18-21
    • /
    • 2003
  • In general it has been considered to be the difficult problem that we divide continuous speech into short interval with having identical phoneme quality. In this paper we used Gaussian Mixture Model (GMM) related to probability density to divide speech into phonemes, an initial, medial, and final sound. From them we peformed continuous speech recognition. Decision boundary of phonemes is determined by algorithm with maximum frequency in a short interval. Recognition process is performed by Continuous Hidden Markov Model(CHMM), and we compared it with another phoneme divided by eye-measurement. For the experiments result we confirmed that the method we presented is relatively superior in auto-segmentation in korean speech.

  • PDF

Dimension Reduction Method of Speech Feature Vector for Real-Time Adaptation of Voice Activity Detection (음성구간 검출기의 실시간 적응화를 위한 음성 특징벡터의 차원 축소 방법)

  • Park Jin-Young;Lee Kwang-Seok;Hur Kang-In
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.7 no.3
    • /
    • pp.116-121
    • /
    • 2006
  • In this paper, we propose the dimension reduction method of multi-dimension speech feature vector for real-time adaptation procedure in various noisy environments. This method which reduces dimensions non-linearly to map the likelihood of speech feature vector and noise feature vector. The LRT(Likelihood Ratio Test) is used for classifying speech and non-speech. The results of implementation are similar to multi-dimensional speech feature vector. The results of speech recognition implementation of detected speech data are also similar to multi-dimensional(10-order dimensional MFCC(Mel-Frequency Cepstral Coefficient)) speech feature vector.

  • PDF

Introduction of ETRI Broadcast News Speech Recognition System (ETRI 방송뉴스음성인식시스템 소개)

  • Park Jun
    • Proceedings of the KSPS conference
    • /
    • 2006.05a
    • /
    • pp.89-93
    • /
    • 2006
  • This paper presents ETRI broadcast news speech recognition system. There are two major issues on the broadcast news speech recognition: 1) real-time processing and 2) out-of-vocabulary handling. For real-time processing, we devised the dual decoder architecture. The input speech signal is segmented based on the long-pause between utterances, and each decoder processes the speech segment alternatively. One decoder can start to recognize the current speech segment without waiting for the other decoder to recognize the previous speech segment completely. Thus, the processing delay is not accumulated. For out-of-vocabulary handling, we updated both the vocabulary and the language model, based on the recent news articles on the internet. By updating the language model as well as the vocabulary, we can improve the performance up to 17.2% ERR.

  • PDF

A Study on Realization of Speech Recognition System based on VoiceXML for Railroad Reservation Service (철도예약서비스를 위한 VoiceXML 기반의 음성인식 구현에 관한 연구)

  • Kim, Beom-Seung;Kim, Soon-Hyob
    • Journal of the Korean Society for Railway
    • /
    • v.14 no.2
    • /
    • pp.130-136
    • /
    • 2011
  • This paper suggests realization method for real-time speech recognition using VoiceXML in telephony environment based on SIP for Railroad Reservation Service. In this method, voice signal incoming through PSTN or Internet is treated as dialog using VoiceXML and the transferred voice signal is processed by Speech Recognition System, and the output is returned to dialog of VoiceXML which is transferred to users. VASR system is constituted of dialog server which processes dialog, APP server for processing voice signal, and Speech Recognition System to process speech recognition. This realizes transfer method to Speech Recognition System in which voice signal is recorded using Record Tag function of VoiceXML to process voice signal in telephony environment and it is played in real time.

A Study on SNR Estimation of Continuous Speech Signal (연속음성신호의 SNR 추정기법에 관한 연구)

  • Song, Young-Hwan;Park, Hyung-Woo;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.4
    • /
    • pp.383-391
    • /
    • 2009
  • In speech signal processing, speech signal corrupted by noise should be enhanced to improve quality. Usually noise estimation methods need flexibility for variable environment. Noise profile is renewed on silence region to avoid effects of speech properties. So we have to preprocess finding voice region before noise estimation. However, if received signal does not have silence region, we cannot apply that method. In this paper, we proposed SNR estimation method for continuous speech signal. The waveform which is stationary region of voiced speech is very correlated by pitch period. So we can estimate the SNR by correlation of near waveform after dividing a frame for each pitch. For unvoiced speech signal, vocal track characteristic is reflected by noise, so we can estimate SNR by using spectral distance between spectrum of received signal and estimated vocal track. Lastly, energy of speech signal is mostly distributed on voiced region, so we can estimate SNR by the ratio of voiced region energy to unvoiced.

A Study on Objective Speech Quality Measure under CDMA Telephone Networks Environment (CDMA 통신망에서의 객관적 음질 평가 척도에 관한 연구)

  • 김광수;김민정;석수영;정호열;정현열
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.2 no.4
    • /
    • pp.53-58
    • /
    • 2001
  • In this paper to develop objective speech quality measure for CDMA telephone network environments, recent developed measures are investigated first. But those measures show low performances in CDMA telephone networks. To solve this problem, new objective speech quality measure adopting noise masking threshold is proposed and studied. To acquire better performance, scaled noise masking threshold calculation for speech signals is employed instead of conventional tone signals. To verify effectiveness of proposed method performance comparison experiments are carried out for CDMA telephone network speech databases, for the results proposed methods show improved performances compared to existing meaures.

  • PDF