• Title/Abstract/Keywords: 4 Formant Frequency

71 search results, processing time 0.028 s

정현파 모델을 이용한 2.4kbps 음성부호화 알고리즘 (2.4kbps Speech Coding Algorithm Using the Sinusoidal Model)

  • 백성기;배건성
    • 한국통신학회논문지 / Vol. 27, No. 3A / pp. 196-204 / 2002
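  • STC (Sinusoidal Transform Coding) is a speech coding method that models the spectral peaks of the speech signal as sinusoids in the frequency domain and synthesizes speech from them. Low-bit-rate STC does not use every spectral peak; instead, it synthesizes speech from the magnitudes of the spectral envelope at the fundamental frequency and its harmonics, together with the corresponding phases. This paper proposes a 2.4 kbps speech coding algorithm based on the sinusoidal model. Pitch is estimated from the mean squared error between speech synthesized from all spectral peaks and speech synthesized from the selected frequency and its harmonics. Phase is obtained from the onset time, which marks the start of the excitation pulse, and from the phase of the vocal tract transfer function. Magnitude is estimated using the SEEVOC algorithm and linear prediction coefficients. Experiments confirmed that the spectrum of the synthesized speech retains most of the formant information of the original speech and that its phase follows the original phase closely. An informal MOS (Mean Opinion Score) test was conducted to evaluate the quality of the synthesized speech; in comparison with 2.0 kbps HVXC, it generally achieved an MOS of 3.1 or higher.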
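
As a rough illustration of the low-rate idea described in the abstract above, a voiced frame can be rebuilt as a sum of sinusoids at the fundamental frequency and its harmonics, each with an amplitude and a phase. The sketch below is not the authors' coder: the function name `synthesize_frame`, the 8 kHz sampling rate, and the toy amplitudes and phases are all assumptions made for the example.

```python
import math

def synthesize_frame(f0, amplitudes, phases, num_samples, sample_rate=8000):
    """Sum the harmonics k*f0 (k = 1..K) with the given amplitudes and phases.

    All arguments are illustrative. A real coder would take the amplitudes
    from a spectral envelope and the phases from an onset time and a vocal
    tract model, as the abstract describes.
    """
    frame = []
    for n in range(num_samples):
        t = n / sample_rate
        sample = 0.0
        for k, (a, phi) in enumerate(zip(amplitudes, phases), start=1):
            sample += a * math.cos(2.0 * math.pi * k * f0 * t + phi)
        frame.append(sample)
    return frame

# Toy values (assumed): 100 Hz pitch, three harmonics, zero phase.
frame = synthesize_frame(100.0, [1.0, 0.5, 0.25], [0.0, 0.0, 0.0], 160)
```

With these toy values the frame repeats every 80 samples, which is one pitch period at 100 Hz and 8 kHz sampling.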

포만트 주파수를 이용한 음성인식 전처리 시스템의 설계 및 구현 (A Design and Implementation of Speech Recognition Preprocessing System using Formant Frequency)

  • 김태욱;한승진;김민성;이정현
    • 한국정보과학회:학술대회논문집 / 한국정보과학회 1999년도 가을 학술발표논문집 Vol. 26, No. 2 (2) / pp. 198-200 / 1999
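  • Human speech carries not only information about meaning but also characteristics specific to the speaker's gender. That is, speech can be classified into female speech, in which high frequencies are strong, and male speech. Nevertheless, existing HMM-based speech recognition systems do not take this difference into account and use a single HMM. Experiments with the algorithm presented in this paper showed that male and female formant frequencies differ by 100~30 Hz, and a method that uses this property to distinguish male from female speech is proposed. In addition, after training a GMM separately on male and female speech, recognition is performed with the male HMM when the formant characteristics of the input indicate male speech and with the female HMM when they indicate female speech, which improved on the existing recognition method by 5.2% for male speech and 4.4% for female speech.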

비강 공명이 한국어 모음에 미치는 음향학적 영향 (Effect of the Nasal Cavity Resonance on the Acoustic Characteristics of Korean Vowels)

  • 성명훈;오승하;강명구;고태용;김광현;김진영
    • 대한후두음성언어의학회지 / Vol. 4, No. 1 / pp. 24-32 / 1991
  • Cleft palate or velopharyngeal incompetence causes many disorders and disabilities affecting speech transmission, including distortion, substitution, and nasalization of vowels. Nasalized vowels are produced primarily by lowering the velum, which opens a side passage for air flow through the nasal cavity. These abnormal movements give rise to complex modifications of the physical properties of the sound, or of the sound spectrum. The authors used the Sonagraph® as a sound analyzer to ascertain the features that characterize the nasalization of vowels. Twenty healthy Korean adult male volunteers were analyzed under artificial conditions of anterior and posterior nasal obstruction and of velopharyngeal incompetence (VPI). The results were as follows: 1) Fundamental frequency was not changed by nasal obstruction or velopharyngeal incompetence. 2) There was no significant difference in formant intensity between normal and nasal vowels. 3) In VPI, a decrease in the frequency of $F_2$ was observed in the vowels /e/ and /i/ (p<0.001). 4) In VPI, $F_2$ was frequently missing in the vowels /o/ and /u/. 5) In the consonant spectra of VPI, the 'release burst' was usually not observed.

경직형과 불수의운동형 뇌성마비아동의 /아/ 모음 음향학적 비교 (A comparative study of the acoustic characteristics of the vowel /a/ between children with spastic and dyskinetic cerebral palsy)

  • 정필연;심현섭
    • 말소리와 음성과학 / Vol. 12, No. 1 / pp. 65-74 / 2020
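  • The purpose of this study was to determine whether the acoustic characteristics of children with spastic and dyskinetic cerebral palsy differ. Thirty-four children with cerebral palsy aged 4 to 12 years participated (26 spastic, 8 dyskinetic). The task was sustained phonation of the vowel /a/, and MPT, F0, jitter, shimmer, NHR, F1, and F2 were measured with Praat. To examine acoustic differences between the two types, the data were analyzed with the two independent samples t-test, or with the Welch-Aspin test when the assumption of equal variances was not met. The results were as follows. First, the dyskinetic group showed a significantly lower MPT than the spastic group. Second, shimmer was significantly higher in the dyskinetic group. Third, F1 and F2 did not differ significantly between the two types. These results suggest that, compared with the spastic type, the dyskinetic type has more limited respiratory capacity and respiratory control and shows greater vocal instability and irregularity. The findings confirm differences in speech motor control between cerebral palsy types and may provide information needed to plan type-specific interventions.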
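
The unequal-variance test mentioned in the abstract above can be sketched as follows. This is only an illustration of the Welch statistic and the Welch-Satterthwaite degrees of freedom, not the authors' analysis: the helper name `welch_t` is hypothetical, and the two small samples are made-up numbers, not data from the study.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances (n - 1)
    se2 = va / na + vb / nb                          # squared standard error
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Made-up example values, used only to exercise the function.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(group_a, group_b)
```

Unlike the pooled-variance t-test, the degrees of freedom here are generally not an integer and shrink as the two variances diverge.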

Statistical Speech Feature Selection for Emotion Recognition

  • Kwon Oh-Wook;Chan Kwokleung;Lee Te-Won
    • The Journal of the Acoustical Society of Korea / Vol. 24, No. 4E / pp. 144-151 / 2005
  • We evaluate the performance of emotion recognition via speech signals when a plain speaker talks to an entertainment robot. For each frame of a speech utterance, we extract the frame-based features: pitch, energy, formant, band energies, mel frequency cepstral coefficients (MFCCs), and velocity/acceleration of pitch and MFCCs. For discriminative classifiers, a fixed-length utterance-based feature vector is computed from the statistics of the frame-based features. Using a speaker-independent database, we evaluate the performance of two promising classifiers: support vector machine (SVM) and hidden Markov model (HMM). For angry/bored/happy/neutral/sad emotion classification, the SVM and HMM classifiers yield 42.3% and 40.8% accuracy, respectively. We show that the accuracy is significant compared to the performance by foreign human listeners.
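
The step from frame-based features to a fixed-length utterance vector, mentioned in the abstract above, might look like the sketch below. The abstract does not say which statistics were used, so the choice of mean, standard deviation, minimum, and maximum is an assumption, as are the function name `utterance_vector` and the toy frames.

```python
from statistics import mean, pstdev

def utterance_vector(frames):
    """Collapse per-frame feature lists into one fixed-length vector.

    frames: list of feature lists, all of the same length. The four
    statistics per feature are an assumed choice for illustration.
    """
    vector = []
    for track in zip(*frames):  # one track per feature dimension
        vector.extend([mean(track), pstdev(track), min(track), max(track)])
    return vector

# Two toy utterances with different frame counts but two features per frame.
short_utt = [[1.0, 10.0], [3.0, 10.0]]
long_utt = [[1.0, 10.0], [3.0, 10.0], [2.0, 13.0]]
vec_short = utterance_vector(short_utt)
vec_long = utterance_vector(long_utt)
```

Both utterances map to vectors of the same length, which is what lets a classifier with a fixed input size, such as an SVM, be applied to utterances of varying duration.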

인공와우이식을 받은 아동과 건청 아동이 산출한 단모음의 음향음성학적 특성 (A Comparison of Formant Frequency of Vowels Produced by Cochlear Implanted and Normal-Hearing Children)

  • 이주은;이봉원
    • 대한음성학회:학술대회논문집 / 대한음성학회 2007년도 한국음성과학회 공동학술대회 발표논문집 / pp. 64-66 / 2007
  • The purpose of this study was to compare and analyze acoustic parameters of cochlear implanted children (N=20, aged 3-10) and to provide basic data for the speech rehabilitation of cochlear implanted children. Acoustic analyses of seven Korean monophthongs produced in four contexts (V, CV, VC, CVC) were conducted for the cochlear implanted children and normal-hearing children (N=20, aged 3-10). Subjects were asked to pronounce a list of vowels three times. The results were as follows. First, in the cochlear implanted group, there were no significant differences in F1 and F2. Second, in the normal-hearing group, there were significant differences in F2 of /ㅜ/ between V and CVC and between VC and CVC. Third, there were significant differences in F1 and F2 between the CI group and the normal-hearing group.

음장과 외이도 내부에서의 음성 비교 (The comparison of the voice between the free field and the external auditory canal)

  • 허승덕;김리석;고도흥;이정학
    • 음성과학 / Vol. 7, No. 4 / pp. 83-90 / 2000
  • The purpose of this study was to examine some acoustic characteristics in the ear canal. It was assumed that a sound outside the external auditory canal could be different from the sound inside the external auditory canal. The acoustic signals were captured by a probe microphone placed at a distance within 1 cm from the tympanic membrane, and a reference microphone was placed over the upper pinna. Three vowels /a/, /i/, /u/ were recorded from a normal adult male speaker. The parameters such as the formant frequency ($F_1 \sim F_5$) and the peak intensity were measured using a speech analyser, PCquirer. It was found that the entering part of the external auditory canal functions as a narrowing point as to the speech that passes through the free field. Results show that acoustic characteristics were changed for speech discrimination rather than speech perception.

CELP보코더에서 Line Spectrum Frequency를 이용한 고속 피치검색 (A New Fast Pitch Search Algorithm using Line Spectrum Frequency in the CELP Vocoder)

  • 배명진;손상목;유하영;변경진
    • 한국음향학회지 / Vol. 15, No. 2 / pp. 90-94 / 1996
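  • The code-excited linear prediction (CELP) speech coder performs well even at low bit rates of 4.8 kbps or below. Its drawback is that it requires a large amount of computation. This paper proposes a new pitch search method that reduces complexity while maintaining the speech quality of the CELP vocoder. The method finds a preliminary pitch using the first formant obtained at the formant filter stage of the CELP vocoder and performs the pitch search only within the preliminary pitch interval. Applying the proposed method to the CELP vocoder reduced complexity by about 64% compared with the conventional method.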
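
The saving from searching only a preliminary pitch interval, as described in the abstract above, can be illustrated with a toy example. The sketch below is not the paper's method: a plain autocorrelation search stands in for the CELP pitch search, the preliminary interval is simply assumed to be known (the abstract does not detail how it is derived from the first formant), and the lag ranges and test signal are made up.

```python
def best_lag(signal, lag_min, lag_max):
    """Return the lag in [lag_min, lag_max] that maximizes the autocorrelation."""
    def autocorr(lag):
        return sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
    return max(range(lag_min, lag_max + 1), key=autocorr)

# Toy signal (assumed): a pulse train with a period of 40 samples.
signal = [1.0 if n % 40 == 0 else 0.0 for n in range(320)]

full_lag = best_lag(signal, 20, 147)        # full search over 128 candidate lags
restricted_lag = best_lag(signal, 35, 45)   # search over 11 lags near a preliminary pitch
```

Both searches return the same lag here, but the restricted one evaluates far fewer candidates, which is the kind of saving the abstract refers to.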

병적인 소리 떨림증과 소리꾼 떨림증의 음향학적인 비교연구 (The comparative Study of the Acoustic Representation between Pansori singer's and Spasmodic dysphonia patient's Voice)

  • 홍기환;김현기;이진국;조재식
    • 대한음성학회:학술대회논문집 / 대한음성학회 2007년도 한국음성과학회 공동학술대회 발표논문집 / pp. 143-145 / 2007
  • Muscle groups that are located in and around the vocal tract can produce audible changes in the frequency and/or intensity of the voice. Vocal vibrato is a characteristic feature of the singing of performers trained in the Western classical tradition and is generally considered to result from modulation in frequency, amplitude, and timbre. Vocal tremor is also characterized by periodic fluctuations in voice frequency or intensity and is a symptom of neurological diseases such as spasmodic dysphonia and Parkinson's disease. Vocal vibrato and vocal tremor may share many of the same origins and mechanisms in the voice production system. The purpose of this study is to find the acoustic characteristics of the vibrato of singers of Pansori, a Korean traditional song form, and of the vocal tremor of spasmodic dysphonia patients. Twelve Pansori singers and seven spasmodic dysphonia patients participated in this study. The power spectrum and real-time spectrogram were used to analyze the acoustic characteristics of Pansori singing and of the voices of spasmodic dysphonia patients. The results are as follows. First, the vowel formants differed between Pansori singing and the voices of spasmodic dysphonia patients, with higher F1 and F3. Second, the vibrato rate differed between Pansori singing and spasmodic dysphonia patients: 4~6/sec and 5~6/sec, respectively. The vibrato rate of pitch is 5.7 Hz ~ 42.4 Hz for Pansori singing and 3.8 Hz ~ 27.9 Hz for spasmodic dysphonia patients; the vibrato rate of intensity range is 0.07 dB ~ 8.26 dB for Pansori singing and 0.07 dB ~ 4.81 dB for spasmodic dysphonia patients.

우리말 모음의 발음시 음형대와 조음위치의 관계에 대한 연구 (Relationship between Formants and Constriction Areas of Vocal Tract in 9 Korean Standard Vowels)

  • 서경식;김재영;김영기
    • 대한후두음성언어의학회지 / Vol. 5, No. 1 / pp. 44-58 / 1994
  • The formants of the 9 Korean standard vowels (as used by average speakers in Seoul, in the central area of the Korean peninsula) were measured by analysis with linear predictive coding (LPC) and the fast Fourier transform (FFT). The author had already reported the constriction areas for the Korean standard vowels, and with the existing data, the distance from the glottis to the constriction area in the vocal tract was newly measured for each vowel with videovelopharyngograms and lateral roentgenograms of the vocal tract. We correlated the formant frequencies with the distance from the glottis to the constriction area of the vocal tract. We also tried to correlate the formant frequencies with the position of the tongue in the vocal tract, divided into two categories: the position of the tongue in the oral cavity, given by the distance from an imaginary palatal line to the highest point of the tongue, and the position in the pharyngeal cavity, given by the distance from the back of the tongue to the posterior pharyngeal wall. This study was performed with 10 adults (5 male, 5 female) who spoke the 9 primary Korean standard vowels. We had already reported in a previous article that the Korean vowels [i], [e], [ɛ] were articulated at the hard palate level, [ɨ], [u] at the soft palate level, [o] at the upper pharynx level, and [ʌ], [ə], [a] at the lower pharynx level. We had also noted the significance of the pharyngeal cavity in vowel articulation. From this study we have concluded that: 1) $F_1$ is related to the oral cavity articulated vowels [i, e, ɛ, ɨ, u]. 2) Within the oral cavity articulated vowels [i, e, ɛ, ɨ, u] and the upper pharynx articulated vowel [o], $F_2$ is elevated when the distance from the glottis to the constriction area is longer. But within the lower pharynx articulated vowels [ə, ʌ, a], $F_2$ is elevated when the distance from the glottis to the constriction area is shorter. 3) The stronger the back-vowel tendency, the higher the $F_1$ and $F_2$ frequencies. 4) $F_3$ and $F_4$ showed no correlation with the constriction area or with the position of the tongue in the vocal tract. 5) The parameter $F_2 - F_1$, the difference between the $F_2$ and $F_1$ frequencies, proved an excellent indicator for differentiating oral cavity articulated vowels from pharyngeal cavity articulated vowels: a value below about 600 Hz indicates that the vowel is articulated in the pharyngeal cavity, and a value above about 600 Hz indicates that it is articulated in the oral cavity.
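
The F2 - F1 rule of thumb from the abstract above can be written as a one-line check. This is a minimal sketch: the function name `articulation_region` is hypothetical, the abstract gives the boundary only as "about 600 Hz" (so the treatment of values exactly at the threshold is an arbitrary choice here), and the formant values in the example are illustrative numbers, not measurements from the paper.

```python
def articulation_region(f1_hz, f2_hz, threshold_hz=600.0):
    """Label a vowel 'oral' or 'pharyngeal' from the difference F2 - F1.

    Differences above the threshold count as oral, all others as
    pharyngeal; the abstract states the boundary only approximately.
    """
    return "oral" if (f2_hz - f1_hz) > threshold_hz else "pharyngeal"

# Illustrative formant pairs in Hz (assumed values).
front_like = articulation_region(300.0, 2300.0)  # difference of 2000 Hz
back_like = articulation_region(700.0, 1200.0)   # difference of 500 Hz
```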
