• Title/Summary/Keyword: Vocal tract characteristics

Search Result 43, Processing Time 0.021 seconds

A Study on the Channel Normalized Pitch Synchronous Cepstrum for Speaker Recognition (채널에 강인한 화자 인식을 위한 채널 정규화 피치 동기 켑스트럼에 관한 연구)

  • 김유진;정재호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.1
    • /
    • pp.61-74
    • /
    • 2004
  • In this paper, a contort- and speaker-dependent cepstrum extraction method and a channel normalization method for minimizing the loss of speaker characteristics in the cepstrum were proposed for a robust speaker recognition system over the channel. The proposed extraction method creates a cepstrum based on the pitch synchronous analysis using the inherent pitch of the speaker. Therefore, the cepstrum called the 〃pitch synchronous cepstrum〃 (PSC) represents the impulse response of the vocal tract more accurately in voiced speech. And the PSC can compensate for channel distortion because the pitch is more robust in a channel environment than the spectrum of speech. And the proposed channel normalization method, the 〃formant-broadened pitch synchronous CMS〃 (FBPSCMS), applies the Formant-Broadened CMS to the PSC and improves the accuracy of the intraframe processing. We compared the text-independent closed-set speaker identification on 56 females and 112 males using TIMIT and NTIMIT database, respectively. The results show that pitch synchronous km improves the error reduction rate by up to 7.7% in comparison with conventional short-time cepstrum and the error rates of the FBPSCMS are more stable and lower than those of pole-filtered CMS.

A Study on the Correlation Between Sasang Constitution and Sound Characteristics Used Harmonics and Formant Bandwidth (Harmonics(배음)와 Formant Bandwidth(포먼트 폭)를 이용한 음성특성(音聲特性)과 사상체질간(四象體質間)의 상관성(相關性) 연구(硏究))

  • Park, Sung-Jin;Kim, Dal-Rae
    • Journal of Sasang Constitutional Medicine
    • /
    • v.16 no.1
    • /
    • pp.61-73
    • /
    • 2004
  • This study was prepared to investigate the correlation between Sasang constitutional groups and voice characteristics using voice analysis system(in this study, CSL). I focused on the voice characteristics in terms of harmonics, Formant frequency and Formant Bandwidth. The subjects were 71 males. I classified them into three groups, that is Soeumin group, Soyangin group and Taeumin group. The classification method of Constitution used two ways, QSCCII(Questionnarie for the Sasang Constitution Classification II) and Interview with a specialist in Sasang Constitution. So 71 people were categorized into 31 Soeumin(people), 18 Soyangin(people) and 22 Taeumin(people). Pitch is approximately similar to the fundamental frequency(F0) in voices. Shimmer in dB gives an evaluation of the period-to-period variability of the peak-to-peak amplitude within the analyzed voice sample. FFT(Fast Fourier Transform) method in CSL can display sampled voices into harmonics. H1 is the first peak and h2 is the second peak in the harmonics. The amplitude difference of h1 and h2(h1-h2) can be explained as the speaker's phonation type, And Formant frequency and bandwidth can be explained as the speaker's vocal tract. So I checked the harmonics and Formant frequency and Bandwidth as the voice parameters. First I have captured /e/ voices from all subjects using microphone. And then I analyzed /e/ voices with CSL. Power Spectrum and Formant History is the menu in the CSL which can display harmonics and Formant frequency and bandwidth. The results about the correlation between Sasang Constitutional Groups and voice parameters are as follows; 1. There is no significant amplitude difference of harmonics(h1-h2) among three groups. 2. There is the significant difference between Soeumin Group and Soyangin Group in Formant Frequency 1 and Formant Bandwidth 1(p<0.05). Any other parameters have no significance. I assume that Soyangin Group has clearer and brighter voice than Soeumin Group according to the Formant Bandwidth difference. And I think its result has coincidence with the context of "Dongyi Suse Bowon" and "Sasangimhejinam".

  • PDF

Analysis of Acoustic Characteristics of Vowel and Consonants Production Study on Speech Proficiency in Esophageal Speech (식도발성의 숙련 정도에 따른 모음의 음향학적 특징과 자음 산출에 대한 연구)

  • Choi, Seong-Hee;Choi, Hong-Shik;Kim, Han-Soo;Lim, Sung-Eun;Lee, Sung-Eun;Pyo, Hwa-Young
    • Speech Sciences
    • /
    • v.10 no.3
    • /
    • pp.7-27
    • /
    • 2003
  • Esophageal Speech uses the esophageal air during phonation. Fluent esophageal speakers frequently intake air in oral communication, but unskilled esophageal speakers are difficult with swallowing lots of air. The purpose of this study was to investigate the difference of acoustic characteristics of vowel and consonants production according to the speech proficiency level in esophageal speech. 13 normal male speakers and 13 male esophageal speakers (5 unskilled esophageal speakers, 8 skilled esophageal speakers) with age ranging from 50 to 70 years old. The stimuli were sustained /a/ vowel and 36 meaningless two syllable words. Used vowel is /a/ and consonants were 18 : /k, n, t, m, p, s, c, $C^{h},\;k^{h},\;t^{h},\;p^{h}$, h, I, k', t', p', s', c'/. Fundermental frequency (Fx), Jitter, shimmer, HNR, MPT were measured with by electroglottography using Lx speech studio (Laryngograph Ltd, London, UK). 36 meaningless words produced by esophageal speakers were presented to 3 speech-language pathologists who phonetically transcribed their responses. Fx, Jitter, HNR parameters is significant different between skilled esophageal speakers and unskilled esophageal speakers (P<.05). Considering manner of articulation, ANOVA showed that differences in two esophageal speech groups on speech proficiency were significant; Glide had the highest number of confusion with the other phoneme class, affricates are the most intelligible in the unskilled esophageal speech group, whereas in the skilled esophageal speech group fricatives resulted highest number of confusions, nasals are the most intelligible. In the place of articulation, glottal /h/ is the highest confusion consonant in both groups. Bilabials are the most intelligible in the skilled esophageal speech, velars are the most intelligible in the unskilled esophageal speech. In the structure of syllable, 'CV+V' is more confusion in the skilled esophageal group, unskilled esophageal speech group has similar confusion in both structures. In unskilled esophageal speech, significantly different Fx, Jitter, HNR acoustic parameters of vowel and the highest confusions of Liquid, Nasals consonants could be attributed to unstable, improper contact of neoglottis as vibratory source and insufficiency in the phonatory air supply, and higher motoric demand of remaining articulation due to morphological characteristics of vocal tract after laryngectomy.

  • PDF