• Title/Summary/Keyword: cepstral mean

Search Result 59, Processing Time 0.025 seconds

Spectral and Cepstral Analyses of Esophageal Speakers (식도발성화자 음성의 spectral & cepstral 분석)

  • Shim, Hee-Jeong;Jang, Hyo-Ryung;Shin, Hee-Baek;Ko, Do-Heung
    • Phonetics and Speech Sciences
    • /
    • v.6 no.2
    • /
    • pp.47-54
    • /
    • 2014
  • The purpose of this study was to analyze spectral versus cepstral measurements in esophageal speakers. The comparison between the measurements in thirteen male esophageal speakers was compared with the control group of thirteen normal speakers using the sustained vowel /a/. The main results can be summarized as below: (a) the CPP and L/H ratio of the esophageal group were significantly lower than those of the control group (b) the CPP was significantly correlated with the spectral parameters such as jitter, shimmer, NHR and VTI, and (c) the ROC analysis showed that the threshold of 10.25dB for the CPP achieved a good classification for esophageal speakers, with 100% perfect sensitivity and specificity. Thus, it was known that cepstral-based acoustic measures such as CPP, may be more reliable predictors than other spectral-based acoustic measures such as jitter and shimmer. And it was found that cepstral-based acoustic measures were effective in distinguishing esophageal voice quality from normal voice quality. This research will contribute to establishing a baseline related to speech characteristics in voice rehabilitation with laryngectomees.

A Study on Environment Parameter Compensation Method for Robust Speech Recognition (잡음에 강인한 음성 인식을 위한 환경 파라미터 보상에 관한 연구)

  • Hong, Mi-Jung;Lee, Ho-Woong
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.5 no.2 s.10
    • /
    • pp.1-10
    • /
    • 2006
  • In this paper, VTS(Vector Taylor Series) algorithm, which was proposed by Moreno at Carnegie Mellon University in 1996, is analyzed and simulated. VTS is considered to be one of the robust speech recognition techniques where model parameter conversion technique is adapted. To evaluation performance of the VTS algorithm, We used CMN(Cepstral Mean Normalization) technique which is one of the well-known noise processing methods. And the recognition rate is evaluated when white gaussian and street noise are employed as background noise. Also, the simulation result is analyzed in order to be compared with the previous one which was performed by Moreno.

  • PDF

The Effect of the Telephone Channel to the Performance of the Speaker Verification System (전화선 채널이 화자확인 시스템의 성능에 미치는 영향)

  • 조태현;김유진;이재영;정재호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.5
    • /
    • pp.12-20
    • /
    • 1999
  • In this paper, we compared speaker verification performance of the speech data collected in clean environment and in channel environment. For the improvement of the performance of speaker verification gathered in channel, we have studied on the efficient feature parameters in channel environment and on the preprocessing. Speech DB for experiment is consisted of Korean doublet of numbers, considering the text-prompted system. Speech features including LPCC(Linear Predictive Cepstral Coefficient), MFCC(Mel Frequency Cepstral Coefficient), PLP(Perceptually Linear Prediction), LSP(Line Spectrum Pair) are analyzed. Also, the preprocessing of filtering to remove channel noise is studied. To remove or compensate for the channel effect from the extracted features, cepstral weighting, CMS(Cepstral Mean Subtraction), RASTA(RelAtive SpecTrAl) are applied. Also by presenting the speech recognition performance on each features and the processing, we compared speech recognition performance and speaker verification performance. For the evaluation of the applied speech features and processing methods, HTK(HMM Tool Kit) 2.0 is used. Giving different threshold according to male or female speaker, we compare EER(Equal Error Rate) on the clean speech data and channel data. Our simulation results show that, removing low band and high band channel noise by applying band pass filter(150~3800Hz) in preprocessing procedure, and extracting MFCC from the filtered speech, the best speaker verification performance was achieved from the view point of EER measurement.

  • PDF

Formant-broadened CMS Using the Log-spectrum Transformed from the Cepstrum (켑스트럼으로부터 변환된 로그 스펙트럼을 이용한 포먼트 평활화 켑스트럴 평균 차감법)

  • 김유진;정혜경;정재호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.4
    • /
    • pp.361-373
    • /
    • 2002
  • In this paper, we propose a channel normalization method to improve the performance of CMS (cepstral mean subtraction) which is widely adopted to normalize a channel variation for speech and speaker recognition. CMS which estimates the channel effects by averaging long-term cepstrum has a weak point that the estimated channel is biased by the formants of voiced speech which include a useful speech information. The proposed Formant-broadened Cepstral Mean Subtraction (FBCMS) is based on the facts that the formants can be found easily in log spectrum which is transformed from the cepstrum by fourier transform and the formants correspond to the dominant poles of all-pole model which is usually modeled vocal tract. The FBCMS evaluates only poles to be broadened from the log spectrum without polynomial factorization and makes a formant-broadened cepstrum by broadening the bandwidths of formant poles. We can estimate the channel cepstrum effectively by averaging formant-broadened cepstral coefficients. We performed the experiments to compare FBCMS with CMS, PFCMS using 4 simulated telephone channels. In the experiment of channel estimation, we evaluated the distance cepstrum of real channel from the cepstrum of estimated channel and found that we were able to get the mean cepstrum closer to the channel cepstrum due to an softening the bias of mean cepstrum to speech. In the experiment of text-independent speaker identification, we showed the result that the proposed method was superior than the conventional CMS and comparable to the pole-filtered CMS. Consequently, we showed the proposed method was efficiently able to normalize the channel variation based on the conventional CMS.

Robust Speech Parameters for the Emotional Speech Recognition (감정 음성 인식을 위한 강인한 음성 파라메터)

  • Lee, Guehyun;Kim, Weon-Goo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.22 no.6
    • /
    • pp.681-686
    • /
    • 2012
  • This paper studied the speech parameters less affected by the human emotion for the development of the robust emotional speech recognition system. For this purpose, the effect of emotion on the speech recognition system and robust speech parameters of speech recognition system were studied using speech database containing various emotions. In this study, mel-cepstral coefficient, delta-cepstral coefficient, RASTA mel-cepstral coefficient, root-cepstral coefficient, PLP coefficient and frequency warped mel-cepstral coefficient in the vocal tract length normalization method were used as feature parameters. And CMS (Cepstral Mean Subtraction) and SBR(Signal Bias Removal) method were used as a signal bias removal technique. Experimental results showed that the HMM based speaker independent word recognizer using frequency warped RASTA mel-cepstral coefficient in the vocal tract length normalized method, its derivatives and CMS as a signal bias removal showed the best performance.

Front-End Processing for Speech Recognition in the Telephone Network (전화망에서의 음성인식을 위한 전처리 연구)

  • Jun, Won-Suk;Shin, Won-Ho;Yang, Tae-Young;Kim, Weon-Goo;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.4
    • /
    • pp.57-63
    • /
    • 1997
  • In this paper, we study the efficient feature vector extraction method and front-end processing to improve the performance of the speech recognition system using KT(Korea Telecommunication) database collected through various telephone channels. First of all, we compare the recognition performances of the feature vectors known to be robust to noise and environmental variation and verify the performance enhancement of the recognition system using weighted cepstral distance measure methods. The experiment result shows that the recognition rate is increasedby using both PLP(Perceptual Linear Prediction) and MFCC(Mel Frequency Cepstral Coefficient) in comparison with LPC cepstrum used in KT recognition system. In cepstral distance measure, the weighted cepstral distance measure functions such as RPS(Root Power Sums) and BPL(Band-Pass Lifter) help the recognition enhancement. The application of the spectral subtraction method decrease the recognition rate because of the effect of distortion. However, RASTA(RelAtive SpecTrAl) processing, CMS(Cepstral Mean Subtraction) and SBR(Signal Bias Removal) enhance the recognition performance. Especially, the CMS method is simple but shows high recognition enhancement. Finally, the performances of the modified methods for the real-time implementation of CMS are compared and the improved method is suggested to prevent the performance degradation.

  • PDF

A study on Effective Feature Parameters Comparison for Speaker Recognition (화자인식에 효과적인 특징벡터에 관한 비교연구)

  • Park TaeSun;Kim Sang-Jin;Kwang Moon;Hahn Minsoo
    • Proceedings of the KSPS conference
    • /
    • 2003.05a
    • /
    • pp.145-148
    • /
    • 2003
  • In this paper, we carried out comparative study about various feature parameters for the effective speaker recognition such as LPC, LPCC, MFCC, Log Area Ratio, Reflection Coefficients, Inverse Sine, and Delta Parameter. We also adopted cepstral liftering and cepstral mean subtraction methods to check their usefulness. Our recognition system is HMM based one with 4 connected-Korean-digit speech database. Various experimental results will help to select the most effective parameter for speaker recognition.

  • PDF

Efficient Compensation of Spectral Tilt for Speech Recognition in Noisy Environment (잡음 환경에서 음성인식을 위한 스펙트럼 기울기의 효과적인 보상 방법)

  • Cho, Jungho
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.17 no.1
    • /
    • pp.199-206
    • /
    • 2017
  • Environmental noise can degrade the performance of speech recognition system. This paper presents a procedure for performing cepstrum based feature compensation to make recognition system robust to noise. The approach is based on direct compensation of spectral tilt to remove effects of additive noise. The noise compensation scheme operates in the cepstral domain by means of calculating spectral tilt of the log power spectrum. Spectral compensation is applied in combination with SNR-dependent cepstral mean compensation. Experimental results, in the presence of white Gaussian noise, subway noise and car noise, show that the proposed compensation method achieves substantial improvements in recognition accuracy at various SNR's.

Cepstral Normalization Combined with CSFN for Noisy Speech Recognition (켑스트럼 정규화와 켑스트럼 거리기반 묵음특징정규화 방법을 이용한 잡음음성 인식)

  • Choi, Sook-Nam;Shen, Guang-Hu;Chung, Hyun-Yeol
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.10
    • /
    • pp.1221-1228
    • /
    • 2011
  • The speech recognition system works well in general indoor environment. However, the recognition performance is dramatically decreased when the system is used in the real environment because of the several noises. In this paper we proposed CSFN-CMVN to improve the recognition performance of the existing CSFN(Cepstral distance based SFN). The CSFN-CMVN method is a combined method of cepstral normalization with CSFN that normalizes silence features using cepstral euclidean distance to classify speech/silence for better performance. From the test results using Aurora 2.0 DB, we could find out that our proposed CSFN-CMVN improves about 7% of more average word accuracy in all the test sets comparing with the typical silence features normalization SFN-I. We can also get improved accuracy of 6% and 5% respectively in compared tests with the conventional SFN-II and CSFN, showing the effectiveness of our proposed method.

A 3-Level Endpoint Detection Algorithm for Isolated Speech Using Time and Frequency-based Features

  • Eng, Goh Kia;Ahmad, Abdul Manan
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2004.08a
    • /
    • pp.1291-1295
    • /
    • 2004
  • This paper proposed a new approach for endpoint detection of isolated speech, which proves to significantly improve the endpoint detection performance. The proposed algorithm relies on the root mean square energy (rms energy), zero crossing rate and spectral characteristics of the speech signal where the Euclidean distance measure is adopted using cepstral coefficients to accurately detect the endpoint of isolated speech. The algorithm offers better performance than traditional energy-based algorithm. The vocabulary for the experiment includes English digit from one to nine. These experimental results were conducted by 360 utterances from a male speaker. Experimental results show that the accuracy of the algorithm is quite acceptable. Moreover, the computation overload of this algorithm is low since the cepstral coefficients parameters will be used in feature extraction later of speech recognition procedure.

  • PDF