• Title/Summary/Keyword: Speech Processing

Search Result 950, Processing Time 0.037 seconds

Improved Melody Recognition Performance of a Cochlear Implant Speech Processing Strategy Using Instantaneous Frequency Encoding Based on Teager Energy Operator

  • Choi, Sung-Jin;Ryu, Sang-Baek;Kim, Kyung-Hwan
    • Journal of Biomedical Engineering Research
    • /
    • v.31 no.6
    • /
    • pp.417-426
    • /
    • 2010
  • We present a speech processing strategy incorporating instantaneous frequency (IF) encoding for the enhancement of melody recognition performance of cochlear implants. For the IF extraction from incoming sound, we propose the use of a Teager energy operator (TEO), which is advantageous for its lower computational load. From time-frequency analysis, we verified that the TEO-based method provides proper IF encoding of input sound, which is crucial for melody recognition. Similar benefit could be obtained also from the use of a Hilbert transform (HT), but much higher computational cost was required. The melody recognition performance of the proposed speech processing strategy was compared with those of a conventional strategy using envelope extraction, and the HT-based IF encoding. Hearing tests on normal subjects were performed using acoustic simulation and a musical contour identification task. Insignificant difference in melody recognition performance was observed between the TEO-based and HT-based IF encodings, and both were superior to the conventional strategy. However, the TEO-based strategy was advantageous considering that it was approximately 35% faster than the HT-based strategy.

On-Line Linear Combination of Classifiers Based on Incremental Information in Speaker Verification

  • Huenupan, Fernando;Yoma, Nestor Becerra;Garreton, Claudio;Molina, Carlos
    • ETRI Journal
    • /
    • v.32 no.3
    • /
    • pp.395-405
    • /
    • 2010
  • A novel multiclassifier system (MCS) strategy is proposed and applied to a text-dependent speaker verification task. The presented scheme optimizes the linear combination of classifiers on an on-line basis. In contrast to ordinary MCS approaches, neither a priori distributions nor pre-tuned parameters are required. The idea is to improve the most accurate classifier by making use of the incremental information provided by the second classifier. The on-line multiclassifier optimization approach is applicable to any pattern recognition problem. The proposed method needs neither a priori distributions nor pre-estimated weights, and does not make use of any consideration about training/testing matching conditions. Results with Yoho database show that the presented approach can lead to reductions in equal error rate as high as 28%, when compared with the most accurate classifier, and 11% against a standard method for the optimization of linear combination of classifiers.

On a Detection for the Fundamental Frequency of Speech Signals (음성신호의기본주파수 검출)

  • 배명진
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1994.06c
    • /
    • pp.42-47
    • /
    • 1994
  • A pitch detector is an essential component in a variety of speech processing systems. Besides providing valuable insights into the nature of the exciation source for speech production, the pitch contour of an utterance is useful for recognizing speakers, aids-to-the handicapped, and is required in almost all speech analysis-synthesis system. Because of the importance of the pitch detection, a wide variety algorithms for pitch detection have been proposed in speech procesing literature. Thus, in this paper we discuss th evarious type of pitch detection algorithms which have been proposed until now. Then we provide th eperformance measurements for seven pitch detection algorithms.

  • PDF

On a Detection of the ZCR-Parameter for Higher Formants of Speech Signals (음성신호의 상위 포만트에 대한 ZCR-파라미터 검출에 관한 연구)

  • 유건수
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1992.06a
    • /
    • pp.49-53
    • /
    • 1992
  • In many applications such as speech analysis, speech coding, speech recognition, etc., the voiced-unvoiced decision should be performed correctly for efficient processing. One of the parameters which are used for voice-unvoiced decision is zero-crossing. But the information of higher formants have not represented as the zero-crossing rate for higher formants of speech signals.

  • PDF

On a detecting the transition segments of speech signal by energ approximatio degree of the synchronized pitch (피치 동기된 에너지 유사도에 의한 음성신호의 전이구간 검출)

  • 김종득;박형빈;김대호;배명진
    • Proceedings of the IEEK Conference
    • /
    • 1998.06a
    • /
    • pp.603-606
    • /
    • 1998
  • In a large number of words and the continued speech recognition system using a phoneme as teh recognition unit, it is necessary to segment processing. In this paper, a normalized AMDF new method. The suggested parameter represents a degree of sharpness at valley point. This method can detect the speech segment between the steady state and transient region to the continued speech without a prior information of speech signal.

  • PDF

Recognition of Emotion and Emotional Speech Based on Prosodic Processing

  • Kim, Sung-Ill
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.3E
    • /
    • pp.85-90
    • /
    • 2004
  • This paper presents two kinds of new approaches, one of which is concerned with recognition of emotional speech such as anger, happiness, normal, sadness, or surprise. The other is concerned with emotion recognition in speech. For the proposed speech recognition system handling human speech with emotional states, total nine kinds of prosodic features were first extracted and then given to prosodic identifier. In evaluation, the recognition results on emotional speech showed that the rates using proposed method increased more greatly than the existing speech recognizer. For recognition of emotion, on the other hands, four kinds of prosodic parameters such as pitch, energy, and their derivatives were proposed, that were then trained by discrete duration continuous hidden Markov models(DDCHMM) for recognition. In this approach, the emotional models were adapted by specific speaker's speech, using maximum a posteriori(MAP) estimation. In evaluation, the recognition results on emotional states showed that the rates on the vocal emotions gradually increased with an increase of adaptation sample number.

Creation of Speech Corpora for STiLL at SiTEC (SiTEC의 STiLL관련 음성 코퍼스의 구축 현황)

  • Kim, Young-Il;Kim, Bong-Wan;Choi, Dae-Lim;Lee, Kwang-Hyun;Jeong, Eun-Soon;Lee, Yong-Ju
    • Proceedings of the KSPS conference
    • /
    • 2005.11a
    • /
    • pp.13-16
    • /
    • 2005
  • As language learning that utilizes speech and information processing technology is getting popular. Speech Information Technology & Promotion Center(SiTEC) has created and is distributing speech corpora for STiLL in order to support basic research and development of products. We will introduce the corpus for Korean and those for English which we have created and are distributing.

  • PDF

Background Noise Classification in Noisy Speech of Short Time Duration Using Improved Speech Parameter (개량된 음성매개변수를 사용한 지속시간이 짧은 잡음음성 중의 배경잡음 분류)

  • Choi, Jae-Seung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.9
    • /
    • pp.1673-1678
    • /
    • 2016
  • In the area of the speech recognition processing, background noises are caused the incorrect response to the speech input, therefore the speech recognition rates are decreased by the background noises. Accordingly, a more high level noise processing techniques are required since these kinds of noise countermeasures are not simple. Therefore, this paper proposes an algorithm to distinguish between the stationary background noises or non-stationary background noises and the speech signal having short time duration in the noisy environments. The proposed algorithm uses the characteristic parameter of the improved speech signal as an important measure in order to distinguish different types of the background noises and the speech signals. Next, this algorithm estimates various kinds of the background noises using a multi-layer perceptron neural network. In this experiment, it was experimentally clear the estimation of the background noises and the speech signals.

A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise Using Whitening Transformation (유색 잡음에 오염된 음성의 향상을 위한 백색 변환을 이용한 일반화 부공간 접근)

  • Lee, Jeong-Wook;Son, Kyung-Sik;Park, Jang-Sik;Kim, Hyun-Tae
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.15 no.8
    • /
    • pp.1665-1674
    • /
    • 2011
  • In this paper, we proposed an algorithm for speech enhancement of speeches corrupted by colored noise. When there is no correlation between colored noise and speech signal, the colored noise turns into white noise through whitening transformation. This transformed signal has been applied to the generalized subspace approach for speech enhancement. The speech spectral distortion, produced by the whitening transformation as pre-processing, has been restored by using the inverse whitening transformation as post-processing of the proposed algorithm. The performance of the proposed algorithm for speech enhancement has been confirmed by computer simulation. The colored noises used in this experiment were car noise and multi-talker babble. It is confirmed that the proposed algorithm shows better performance from SNR and SSD viewpoint over the previous approach with the data from the AURORA and TIMIT data base.

DSP Implementation of Speech Enhancement System Using Microphone Array with Adaptive Post-processing (적응 후처리 과정을 갖는 마이크로폰 배열을 이용한 잡음제거기의 DSP 구현)

  • 권홍석;김시호;배건성
    • Proceedings of the IEEK Conference
    • /
    • 2002.06d
    • /
    • pp.413-416
    • /
    • 2002
  • In this paper, a speech enhancement system using microphone array with adaptive Post-Processing is implemented in real-lime with TMS320C6201 DSP. It consists of delay-and-sum beamformer and adaptive post-processing filters with NLMS (Normalized Least Mean Square) algorithm. THS1206 ADC is used for collection of 4-channel microphone signals. Sizes of program memory, data ROM and data RAM of the implemented system are 15,744, 748 and 47,540 bytes, respectively. Finally 21.839${\times}$106 clocks per second is required for real-time operation.

  • PDF