• Title/Summary/Keyword: 켑스트럼

Search Result 163, Processing Time 0.025 seconds

Glottal Weighted Cepstrum for Robust Speech Recognition (잡음에 강한 음성 인식을 위한 성문 가중 켑스트럼에 관한 연구)

  • 전선도;강철호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.5
    • /
    • pp.78-82
    • /
    • 1999
  • This paper is a study on weighted cepstrum used broadly for robust speech recognition. Especially, we propose the weighted function of asymmetric glottal pulse shape. which is used for weighted cepstrum extracted by PLP(Perceptual Linear Predictive) based on auditory model. Also, we analyze this glottal weighted cepstrum from the glottal pulse of glottal model in connection with the cepstrum. And we obtain speech features analyzed by both the glottal model and the auditory model. The isolated-word recognition rate is adopted for the test of proposed method in the car moise and street environment. And the performance of glottal weighted cepstrum is compared with both that of weighted cepstrum extracted by LP(Linear Prediction) and that of weighted cepstrum extracted by PLP. The result of computer simulation shows that recognition rate of the proposed glottal weighted cepstrum is better than those of other weighted cepstrums.

  • PDF

Robust Speech Recognition for Emotional Variation (감정 변화에 강인한 음성 인식)

  • Kim, Won-Gu
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2007.11a
    • /
    • pp.431-434
    • /
    • 2007
  • 본 논문에서는 인간의 감정 변화의 영향을 적게 받는 음성 인식 시스템의 특정 파라메터에 관한 연구를 수행하였다. 이를 위하여 우선 다양한 감정이 포함된 음성 데이터베이스를 사용하여 감정 변화가 음성 인식 시스템의 성능에 미치는 영향과 감정 변화의 영향을 적게 받는 특정 파라메터에 관한 연구를 수행하였다. 본 연구에서는 LPC 켑스트럼 계수, 멜 켑스트럼 계수, 루트 켑스트럼 계수, PLP 계수와 RASTA 처리를 한 멜 켑스트럼 계수와 음성의 에너지를 사용하였다. 또한 음성에 포함된 편의(bias)를 제거하는 방법으로 CMS 와 SBR 방법을 사용하여 그 성능을 비교하였다. HMM 기반의 화자독립 단어 인식기를 사용한 실험 결과에서 RASTA 멜 켑스트럼과 델타 켑스트럼을 사용하고 신호편의 제거 방법으로 CMS를 사용한 경우에 가장 우수한 성능을 나타내었다. 이러한 것은 멜 켑스트럼을 사용한 기준 시스템과 비교하여 59%정도 오차가 감소된 것이다.

  • PDF

A Study on the Channel Normalized Pitch Synchronous Cepstrum for Speaker Recognition (채널에 강인한 화자 인식을 위한 채널 정규화 피치 동기 켑스트럼에 관한 연구)

  • 김유진;정재호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.1
    • /
    • pp.61-74
    • /
    • 2004
  • In this paper, a contort- and speaker-dependent cepstrum extraction method and a channel normalization method for minimizing the loss of speaker characteristics in the cepstrum were proposed for a robust speaker recognition system over the channel. The proposed extraction method creates a cepstrum based on the pitch synchronous analysis using the inherent pitch of the speaker. Therefore, the cepstrum called the 〃pitch synchronous cepstrum〃 (PSC) represents the impulse response of the vocal tract more accurately in voiced speech. And the PSC can compensate for channel distortion because the pitch is more robust in a channel environment than the spectrum of speech. And the proposed channel normalization method, the 〃formant-broadened pitch synchronous CMS〃 (FBPSCMS), applies the Formant-Broadened CMS to the PSC and improves the accuracy of the intraframe processing. We compared the text-independent closed-set speaker identification on 56 females and 112 males using TIMIT and NTIMIT database, respectively. The results show that pitch synchronous km improves the error reduction rate by up to 7.7% in comparison with conventional short-time cepstrum and the error rates of the FBPSCMS are more stable and lower than those of pole-filtered CMS.

Speaker Recognition using LPC cepstrum Coefficients and Neural Network (LPC 켑스트럼 계수와 신경회로망을 사용한 화자인식)

  • Choi, Jae-Seung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.15 no.12
    • /
    • pp.2521-2526
    • /
    • 2011
  • This paper proposes a speaker recognition algorithm using a perceptron neural network and LPC (Linear Predictive Coding) cepstrum coefficients. The proposed algorithm first detects the voiced sections at each frame. Then, the LPC cepstrum coefficients which have speaker characteristics are obtained by the linear predictive analysis for the detected voiced sections. To classify the obtained LPC cepstrum coefficients, a neural network is trained using the LPC cepstrum coefficients. In this experiment, the performance of the proposed algorithm was evaluated using the speech recognition rates based on the LPC cepstrum coefficients and the neural network.

남녀의 음향학적 특징벡터의 비교 분석에 관한 연구

  • Choe, Jae-Seung;Jeong, Byeong-Gu
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2012.05a
    • /
    • pp.887-890
    • /
    • 2012
  • 본 논문에서는 켑스트럼 계수의 변화에 따른 남성화자와 여성화자의 음향학적인 특징벡터를 비교하여 분석하는 기초적인 연구를 수행한다. 특히 FFT 켑스트럼 및 LPC 켑스트럼에 대한 남녀의 음향학적인 특징벡터의 차이점을 나타낸다. 향후 이러한 차이점을 기초로 하여 신경회로망 등에 의한 성별 인식에 대한 연구를 수행함으로써 남성화자 및 여성화자를 분리할 수 있는 근거를 마련하는 기초연구이다.

  • PDF

LPC 켑스트럼 및 FFT 스펙트럼에 의한 성별 인식 알고리즘

  • Choe, Jae-Seung;Jeong, Byeong-Gu
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2012.10a
    • /
    • pp.63-65
    • /
    • 2012
  • 본 논문에서는 입력된 음성이 남성화자인지 여성화자인지를 구분하는 FFT 스펙트럼 및 LPC 켑스트럼 입력에 의한 성별인식 알고리즘을 제안한다. 본 논문에서는 특히 남성화자와 여성화자의 특징벡터를 비교 분석하여, 이러한 남녀의 음향학적인 특징벡터의 차이점을 이용하여 신경회로망에 의한 성별 인식에 대한 실험을 수행한다. 특히 12차의 LPC 켑스트럼 및 8차의 저역 FFT 스펙트럼의 특징벡터를 사용한 경우에, 남성화자 및 여성화자에 대해서 양호한 남녀 성별인식률이 구해졌다.

  • PDF

Cepstral Normalization Combined with CSFN for Noisy Speech Recognition (켑스트럼 정규화와 켑스트럼 거리기반 묵음특징정규화 방법을 이용한 잡음음성 인식)

  • Choi, Sook-Nam;Shen, Guang-Hu;Chung, Hyun-Yeol
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.10
    • /
    • pp.1221-1228
    • /
    • 2011
  • The speech recognition system works well in general indoor environment. However, the recognition performance is dramatically decreased when the system is used in the real environment because of the several noises. In this paper we proposed CSFN-CMVN to improve the recognition performance of the existing CSFN(Cepstral distance based SFN). The CSFN-CMVN method is a combined method of cepstral normalization with CSFN that normalizes silence features using cepstral euclidean distance to classify speech/silence for better performance. From the test results using Aurora 2.0 DB, we could find out that our proposed CSFN-CMVN improves about 7% of more average word accuracy in all the test sets comparing with the typical silence features normalization SFN-I. We can also get improved accuracy of 6% and 5% respectively in compared tests with the conventional SFN-II and CSFN, showing the effectiveness of our proposed method.

A Study on the Dynamic Feature of Phoneme for Word Recognition (단어인식을 위한 음소의 동적 특징에 관한 검토)

  • 김주곤
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1997.06a
    • /
    • pp.35-39
    • /
    • 1997
  • 본 연구에서는 음소를 인식의 기본단위로 하는 한국어 단어인식 시스템의 인식정도를 개선하기 이해 각 음소의 시간방향의 정보를 포함하고 있는 동적특징인 회귀계수와 K-L(Karhunen-Loeve)변환으로 얻은 특징파라미터(이하 K-L계수라 함)를 이용하여 음소인식과 단어인식 실험을 수행한 결과 그 유효성을 확인하였다. 이를 위해 먼저 파열음을 대상으로 정적 특징과 파라미터인 멜-켑스트럼(Mel-Cepstrum)과 동적 특징 파라미터인 회귀계수(Regressive Coefficient) 와 K-L 계수(Karhunen-Loeve Coefficient)를 추출하여 음소 인식실험을 수행하였다. 그 결과 멜-켑스트럼을 사용한 경우 39.84%, 회귀계수를 사용한 경우 48.52%, K-L계수를 사용한 경우 52.40%의 인식률을 얻었다. 이를 참고로 각각의 특징 파라미터를 결합하여 인식실험한 결과 멜-켑스트럼과 K-L계수를 사용한 경우 47.17%,멜 -켑스트럼과 회귀계수의 경우 60.11%,K-L계수와 회귀계수의 경우 60.35%, 멜-켑스트럼과 K-L계수 , 회귀계수를 사용한 경우 58.13%를 인식률을 얻어 동적특징인 K-L 계수와 회귀계수를 사용한 경우와 멜-켑스트럼과 회귀계수를 사용한 경우가 높은 인식률을 보였으며 이를 단어로 확장하여 인식실험을 수행한 결과 기존의 특징 파라미터를 이용한 경우보다 높은 인식률을 얻어 동적 파라미터의 유효성을 확인하였다

  • PDF

Cepstrum PDF Normalization Method for Speech Recognition in Noise Environment (잡음환경에서의 음성인식을 위한 켑스트럼의 확률분포 정규화 기법)

  • Suk Yong Ho;Lee Hwang-Soo;Choi Seung Ho
    • The Journal of the Acoustical Society of Korea
    • /
    • v.24 no.4
    • /
    • pp.224-229
    • /
    • 2005
  • In this paper, we Propose a novel cepstrum normalization method which normalizes the probability density function (pdf) of cepstrum for robust speech recognition in additive noise environments. While the conventional methods normalize the first- and/or second-order statistics such as the mean and/or variance of the cepstrum. the proposed method fully normalizes the statistics of cepstrum by making the pdfs of clean and noisy cepstrum identical to each other For the target Pdf, the generalized Gaussian distribution is selected to consider various densities. In recognition phase, we devise a table lookup method to save computational costs. From the speaker-independent isolated-word recognition experiments, we show that the Proposed method gives improved Performance compared with that of the conventional methods, especially in heavy noise environments.

Formant-broadened CMS Using the Log-spectrum Transformed from the Cepstrum (켑스트럼으로부터 변환된 로그 스펙트럼을 이용한 포먼트 평활화 켑스트럴 평균 차감법)

  • 김유진;정혜경;정재호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.4
    • /
    • pp.361-373
    • /
    • 2002
  • In this paper, we propose a channel normalization method to improve the performance of CMS (cepstral mean subtraction) which is widely adopted to normalize a channel variation for speech and speaker recognition. CMS which estimates the channel effects by averaging long-term cepstrum has a weak point that the estimated channel is biased by the formants of voiced speech which include a useful speech information. The proposed Formant-broadened Cepstral Mean Subtraction (FBCMS) is based on the facts that the formants can be found easily in log spectrum which is transformed from the cepstrum by fourier transform and the formants correspond to the dominant poles of all-pole model which is usually modeled vocal tract. The FBCMS evaluates only poles to be broadened from the log spectrum without polynomial factorization and makes a formant-broadened cepstrum by broadening the bandwidths of formant poles. We can estimate the channel cepstrum effectively by averaging formant-broadened cepstral coefficients. We performed the experiments to compare FBCMS with CMS, PFCMS using 4 simulated telephone channels. In the experiment of channel estimation, we evaluated the distance cepstrum of real channel from the cepstrum of estimated channel and found that we were able to get the mean cepstrum closer to the channel cepstrum due to an softening the bias of mean cepstrum to speech. In the experiment of text-independent speaker identification, we showed the result that the proposed method was superior than the conventional CMS and comparable to the pole-filtered CMS. Consequently, we showed the proposed method was efficiently able to normalize the channel variation based on the conventional CMS.