Search | Korea Science

Performance Improvement of Speaker Recognition Using Enhanced Feature Extraction in Glottal Flow Signals and Multiple Feature Parameter Combination (Glottal flow 신호에서의 향상된 특징추출 및 다중 특징파라미터 결합을 통한 화자인식 성능 향상)

Kang, Jihoon;Kim, Youngil;Jeong, Sangbae
- Journal of the Korea Institute of Information and Communication Engineering, v.19 no.12, pp.2792-2799, 2015
In this paper, we utilize source mel-frequency cepstral coefficients (SMFCCs), skewness, and kurtosis extracted from glottal flow signals to improve speaker recognition performance. Because the high-band magnitude response of glottal flow signals is generally somewhat flat, the SMFCCs are extracted using only the response below a predefined cutoff frequency. The extracted SMFCCs, skewness, and kurtosis are concatenated with conventional feature parameters. Dimensionality reduction by principal component analysis (PCA) and linear discriminant analysis (LDA) is then applied so that performance can be compared with conventional systems under equivalent conditions. The proposed recognition system outperformed the conventional system in large-scale speaker recognition experiments. In particular, the performance improvement was more noticeable with a small number of Gaussian mixtures.
https://doi.org/10.6109/jkiice.2015.19.12.2792
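
The abstract above appends the skewness and kurtosis of the glottal flow signal to conventional feature parameters. A minimal sketch of that per-frame computation follows, assuming NumPy, biased (population) moment estimates, and non-excess kurtosis. The function names, the framing parameters, and the placeholder base-feature matrix are hypothetical and not taken from the paper; glottal flow estimation, SMFCC extraction, and the PCA/LDA stage are not shown.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (no padding)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def skewness_kurtosis(frames):
    """Per-frame skewness and (non-excess) kurtosis from central moments."""
    centered = frames - frames.mean(axis=1, keepdims=True)
    m2 = (centered ** 2).mean(axis=1)   # variance
    m3 = (centered ** 3).mean(axis=1)   # third central moment
    m4 = (centered ** 4).mean(axis=1)   # fourth central moment
    eps = 1e-12                         # guards against constant frames
    return m3 / (m2 ** 1.5 + eps), m4 / (m2 ** 2 + eps)

def append_shape_features(base_features, frames):
    """Concatenate skewness and kurtosis to an existing per-frame feature matrix."""
    skew, kurt = skewness_kurtosis(frames)
    return np.column_stack([base_features, skew, kurt])
```

For example, `append_shape_features(mfcc, frame_signal(glottal_flow, 400, 160))` would add two columns to a hypothetical `mfcc` matrix, provided it has one row per frame.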

Computation of Laryngeal Flow and Sound through a Dynamic Model of the Vocal Folds (동적 성대 모델을 이용한 후두 내 유동 및 음향장에 대한 수치 연구)

Bae, Young-Min;Moon, Young-J.
- Proceedings of the Korean Society for Computational Fluids Engineering Conference, 2008.03b, pp.21-24, 2008
The present study numerically investigates the glottal airflow characteristics as well as the acoustic features of phonation, fully coupled with the dynamic behavior of the vocal folds. The vocal folds are described by a low-dimensional body-cover model characterized by biomechanical parameters such as glottal width, vocal fold stiffness, and subglottal pressure. The flow in the vocal tract is modeled by an incompressible, axisymmetric form of the Navier-Stokes equations (INS), while the acoustic field is predicted by the linearized perturbed compressible equations (LPCE). The computed results show that a two-mass model of the vocal folds is sufficient to reproduce the temporal variations in oral airflow and glottis motion produced by female speakers. It is also found that i) the glottal width has a significant effect on the amplitude of the glottal flow, and thus on the amplitude of the acoustic wave in the vocal tract, ii) the vocal fold tension is the main control parameter for the fundamental frequency of phonation, iii) the subglottal pressure plays an appreciable role in reproducing the self-sustained oscillation of the vocal folds, and iv) the strength of the pulsating airflow and the vortical structures are primarily affected by glottal width and subglottal pressure, and are closely related to pitch, loudness, and voice quality. Finally, a more comprehensive explanation of the difference between one- and two-mass models is presented, with a discussion of the effectiveness of vocal fold oscillation and voice quality.

Differences in Respiratory Function and Vocal Aerodynamics between Professional Sopranos and Female Subjects without Vocal Training (훈련된 여자 성악가와 일반인의 호흡능력에 대한 비교 연구)

최홍식;남도현;안철민;임성은;강성웅
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics, v.12 no.2, pp.121-125, 2001
Singing requires exquisite coordination between the respiratory and phonatory systems to efficiently control glottal airflow. Respiratory function and vocal aerodynamics were investigated in six female professional sopranos and in six female subjects without vocal training. All sopranos had more than 15 years of formal classical vocal training. Pulmonary function test data on simple pulmonary function, flow volume curve, static lung volumes, maximum inspiratory pressure (MIP), and maximum expiratory pressure (MEP) were obtained from all subjects. The vocal aerodynamic measures of maximum phonation time (MPT), phonation quotient, and mean glottal flow rates (MFR) were also obtained in all subjects. Simple pulmonary function in professional sopranos was generally the same as that of the female subjects without vocal training. However, MIP and MEP, which reflect respiratory muscle force, were significantly elevated in professional sopranos compared with the female subjects without vocal training. Maximum phonation time and phonation quotient in sopranos were longer than those of the other female subjects even though there were no differences in simple pulmonary function. In normal subjects, high-pitched tones were produced with significantly higher mean glottal flow rates (GFR) than low-pitched tones, whereas no changes in GFR were found in sopranos. The results indicated that sopranos demonstrated significant improvements in the aerodynamic measures of GFR and maximum phonation time, suggesting an increase in glottal efficiency.

Quantitative Measurement of the Glottal Area Waveform(GAW) in Unilateral Vocal Fold Paralysis (편측성대마비환자에서의 성문면적파형(Glottal Area Waveform)의 정량적 측정)

최홍식;김명상;최재영;안성윤;이세영;홍정표
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics, v.9 no.1, pp.71-78, 1998
Type Ⅰ thyroplasty in conjunction with arytenoid adduction is one of the excellent techniques for the treatment of unilateral vocal fold paralysis, but perioperative objective evaluation of the patients is difficult. With the development of videostroboscopy and image analysis programs, we were able to quantify the Glottal Area Waveform (GAW) in patients with unilateral vocal fold paralysis and investigated the relationship between the glottal area and aerodynamic and acoustic parameters. Eight female patients who underwent type Ⅰ thyroplasty in conjunction with arytenoid adduction and 5 females with normal vocal function were involved in this study. Preoperative and postoperative videostroboscopy and vocal function studies were performed. The GAW was analysed quantitatively with an image analysis program (Kay Stroboscope Image analysis, KSIP). Peak Glottal Area (PGA), Baseline Offset (BO), and Closing Phase (CP) were increased in patients with unilateral vocal fold paralysis, and they were reduced after the operation. Mean Flow Rate (MFR) was well correlated with the PGA in both the normal control group and the unilateral vocal fold paralysis patients. Noise-to-harmonic ratio (NHR) was correlated with PGA only in preoperative unilateral vocal fold paralysis patients. In conclusion, quantitative measurement of the GAW is a useful method for evaluating unilateral vocal fold paralysis patients.

The Role of the Electroglottography on the Laryngeal Articulation of Speech (전기 Glottography(EGG)를 이용한 후두구음역학적 특성)

홍기환;박병암;양윤수;서수영;김현기
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics, v.8 no.1, pp.18-26, 1997
There are two types of phonetic study, acoustic and physiologic, for differentiating the three manner categories of Korean stop consonants. The physiologic studies include endoscopic, electromyographic (EMG), electroglottographic (EGG), and aerodynamic studies. In this study, I investigated general features of Korean stops using EGG for the open quotient of the vocal folds and baseline shift during speech, and aerodynamic measurements of the subglottal air pressure, airflow, and glottal resistance at consonants. In the aerodynamic study, the glottalized and aspirated stops may be characterized by increased subglottal pressure at consonants compared with the lenis stop. The airflow is largest in the aspirated stops, followed by the lenis and then the glottalized stops. The glottal airway resistance (GAR) was highest in the glottalized stops, followed by the lenis, and lowest in the aspirated stops during the production of consonants; during the production of vowels it was highest in the aspirated stops but low in the glottalized and lenis stops. The glottal resistance at consonants showed a significant difference among consonants and a significant interaction between subject and type of consonant. The glottal resistance at vowels showed a significant difference among consonants, and an interaction occurred between subject and type of consonant. Electroglottography (EGG) has been used to investigate the functioning of the vocal folds during vibration. The EGG should be related to the patterns of vocal fold vibration during phonation in characterizing the temporal patterns of each vibratory cycle. The purpose of this study is to investigate the dynamic change of EGG waveforms during continuous speech. The dynamic changes of EGG waveforms for the three-way distinction of Korean stops were characterized as follows: the aspirated stop by the largest open quotient and smallest glottal contact area of the vocal folds in the initial portion of vocal fold vibration; the lenis stop by moderate open quotient and glottal contact area; and the glottalized stop by the smallest open quotient and largest glottal contact area. There may be a close relationship between the open quotient (OQ) at the initial voice onset and the glottal width at the time of consonant production: the larger glottal width just before vocal fold vibration results in the smaller OQ of the vocal fold vibration at the initial voice onset. The EGG changes of baseline shift during continuous speech production showed different patterns for the three types of Korean consonants. A small and less stiff change of baseline shift was found for the lenis and the glottalized stops, and the largest and stiffest change was found for the aspirated stops. The baseline shift at the initial voice onset showed patterns similar to those during consonant production, with a larger change in the aspirated stops. For the lenis and the glottalized stops during the initial voice onset, the three subjects showed individual differences. I suggest that these characteristics were strongly related to the articulatory activity of the vocal tract for the production of consonants, especially for the aspirated stop. The factors suspected of affecting EGG waveforms are glottal width, vertical laryngeal movement, and the intrapharyngeal pressure on neighboring tissue during connected speech. EGG may therefore be a useful method for describing laryngeal activity and classifying pulsing conditions of the larynx during speech production, and EGG research can serve as a control for monitoring vocal tract articulation, although the above factors affecting EGG may have played a role in the vocal fold vibratory behavior obtained during consonant production.

Performance Improvement of Speaker Recognition by MCE-based Score Combination of Multiple Feature Parameters (MCE기반의 다중 특징 파라미터 스코어의 결합을 통한 화자인식 성능 향상)

Kang, Ji Hoon;Kim, Bo Ram;Kim, Kyu Young;Lee, Sang Hoon
- Journal of the Korea Academia-Industrial cooperation Society, v.21 no.6, pp.679-686, 2020
In this thesis, an enhanced method for the feature extraction of vocal source signals and a score combination using MCE-based weight estimation of the scores of multiple feature vectors are proposed to improve the performance of speaker recognition systems. The proposed feature vector is composed of perceptual linear predictive cepstral coefficients, skewness, and kurtosis extracted from low-pass filtered glottal flow signals, which eliminates the flat spectrum region that carries no meaningful information. The proposed feature was used to improve a conventional speaker recognition system that uses Gaussian mixture models with the mel-frequency cepstral coefficients and the perceptual linear predictive cepstral coefficients extracted from the speech signals. In addition, to increase the reliability of the estimated scores, instead of estimating the weight from the probability distribution of the conventional score, the scores evaluated with the conventional vocal tract features and with the proposed feature are fused by the MCE-based score combination method to find the optimal speaker. The experimental results showed that the proposed feature vectors contained valid information for recognizing the speaker. In addition, when speaker recognition was performed by combining the MCE-based multiple feature parameter scores, the recognition system outperformed the conventional one, particularly with a small number of Gaussian mixtures.
https://doi.org/10.5762/KAIS.2020.21.6.679

The Comparison of the Acoustic and Aerodynamic Characteristics of PROVOX® Voice and Esophageal Voice Produced by the Same Laryngectomee (동일 후적자가 산출하는 기관식도 발성(PROVOX® 발성)과 식도 발성에 대한 음향학적 및 공기역학적 특성 비교)

Pyo, H.Y.;Choi, H.S.;Lim, S.E.;Choi, S.H.
- Speech Sciences, v.5 no.1, pp.121-139, 1999
Our experimental subject was a laryngectomee who had undergone total laryngectomy with PROVOX® insertion and learned esophageal speech after the surgery, so he could produce both PROVOX® voice and esophageal voice. Using this subject's PROVOX® and esophageal voice, we compared the acoustic and aerodynamic characteristics of the two voices under the same physical conditions of the same person. As a result, the fundamental frequency of esophageal voice was 137.2 Hz, and that of PROVOX® voice was 97.5 Hz. PROVOX® voice showed lower jitter, shimmer, and NHR than esophageal voice, which means that PROVOX® voice showed better voice quality than esophageal voice. In spectrographic analysis, the formation of formants and pseudoformants was more distinct in esophageal voice, and several temporal aspects of acoustic features, such as VOT and closure duration, were more similar to normal voice in PROVOX® voice. During sentence utterance, esophageal voice showed longer pause or silence durations than PROVOX® voice. Maximum phonation time and mean flow rate of PROVOX® voice were much longer and larger than those of esophageal voice, but the mean and range of sound pressure level, subglottic pressure, and voice efficiency were similar in the two voices. Glottal resistance of esophageal voice was much larger than that of PROVOX® voice, which in turn showed larger glottal resistance than normal voice.

An Aerodynamic Study of Velopharyngeal Closure Function in Cleft Palate Patients (구개열 환자의 비인강폐쇄 기능에 대한 공기역학적 연구)

Ahn, Tae-Sub;Yang, Sang-Ill;Shin, Hyo-Keun
- Speech Sciences, v.1, pp.237-259, 1997
Cleft Palate speech appears to have hyper/hyponasality with velopharyngeal insufficiency and articulation disorders. Previous studies on Cleft Palate speech have shown that it tends to have lower airflow and air pressure. To examine the aerodynamic characteristics of Cleft Palate speech, the Aerophone II Voice Function Analyzer was used. We measured sound pressure level, airflow, air pressure, and glottal power. Three Cleft Palate adults and five normal adults participated in this experiment. The test words were composed of: (1) the sustained vowel /o/, (2) /CiCi/, where C is one of three different stop consonants in Korean, and (3) /bimi/. Subjects were asked to produce /bimi/ five times without opening their lips. All the data were statistically tested by t-test between the Cleft Palate patients before operation and the control group, and by paired t-test between the Cleft Palate patients before and after operation. The results were as follows: (1) Cleft Palate patients generally speak with incomplete oral closure and lower oral air pressure. As a result, the SPL of Cleft Palate patients before operation is 3 dB lower than that of the control group. (2) Airflow of Cleft Palate patients in phonation and articulation is lower than that of the control group. However, it increased after operation. Lung volume and mean airflow in phonation increased significantly (p<0.05). (3) Although the velopharyngeal function (velar opening rate) of Cleft Palate patients is poor in comparison with the control group, it recovered after operation. In this case, maximum flow rate and mean airflow rate increased significantly (p<0.05). (4) Air pressure of Cleft Palate patients in speech is lower than that of the control group. In general, the air pressure of Cleft Palate patients increased after operation. In this case, the air pressure of the glottalized consonant increased significantly (p<0.04). (5) Glottal power (mean power, mean efficiency, and mean resistance) of Cleft Palate patients is lower than that of the control group, but mean efficiency and mean resistance of Cleft Palate patients increased significantly (p<0.05) after operation.

Quantitative Analysis of Voice Quality after Radiation Therapy for Stage T1a Glottic Carcinoma (T1a 병기 성문암의 방사선 치료 후 음성에 관한 연구)

Lee Joon-Kyoo;Chung Woong-Gi
- Radiation Oncology Journal, v.23 no.1, pp.17-21, 2005
Purpose: To evaluate the voices of irradiated patients with early glottic carcinoma and to compare these with the voices of healthy volunteers. Materials and Methods: The voice samples (sustained vowel) of seventeen male patients who had been irradiated for T1a glottic squamous carcinoma at least 1 year prior to the study were analyzed with an objective voice analyzer (acoustic voice analysis, aerodynamic test, and videostroboscopic analysis) and compared with those of a normal group of twenty age- and sex-matched volunteers. Average fundamental frequency, jitter, shimmer, and noise-to-harmonic ratio were obtained for the acoustic voice analysis. Maximal phonation time, mean flow rate, intensity, subglottic pressure, glottal resistance, glottal efficiency, and glottal power were obtained for the aerodynamic test. Results: The irradiated group presented higher values of shimmer in the acoustic voice analysis. There was no significant difference between the two groups in the other parameters. Conclusion: In this study, all the objective voice parameters except shimmer were not significantly different between the irradiated group and the control group. These results suggest that voice quality is minimally affected by radiation therapy for T1a glottic carcinoma.

Text-Independent Speaker Recognition Using Glottal Flow Waveform (성문파형을 이용한 문장독립 화자 인식기)

Yang Ki-Hyuk;Jeon Bumki;Baek SeongJoon;Kang Sang-Ki;Sung Koeng-Mo
- Proceedings of the Acoustical Society of Korea Conference, autumn, pp.57-60, 1999
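In this paper, we extract speaker-characteristic coefficients from the glottal waveform and apply them to a speaker recognizer. The residual signal of the speech is estimated with the covariance method and integrated to obtain the glottal waveform. A single glottal waveform segment was taken not between glottal closure instants but between the instants at which the error of the residual signal is largest. The resulting glottal waveform was resampled to M data points to form the feature vector, and the recognition rate was measured with a VQ-based recognizer. With 4 seconds of test data and a 30-dimensional feature vector, an average recognition rate of 96.08% was obtained for male speakers and 93.61% for female speakers.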
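
The abstract above resamples each glottal waveform segment to M points and scores the resulting vectors with a VQ-based recognizer. A minimal sketch of those two steps follows, assuming NumPy, linear interpolation for the resampling, and squared Euclidean distortion against per-speaker codebooks that have already been trained. The function names and the interpolation choice are hypothetical and not taken from the paper; residual estimation, integration, segment detection, and codebook training are not shown.

```python
import numpy as np

def resample_cycle(cycle, m):
    """Resample one glottal waveform segment to m points by linear interpolation."""
    cycle = np.asarray(cycle, dtype=float)
    src = np.linspace(0.0, 1.0, num=len(cycle))  # original sample positions
    dst = np.linspace(0.0, 1.0, num=m)           # m evenly spaced positions
    return np.interp(dst, src, cycle)

def vq_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    diffs = vectors[:, None, :] - codebook[None, :, :]
    d2 = (diffs ** 2).sum(axis=2)                # shape: (n_vectors, n_codewords)
    return d2.min(axis=1).mean()

def identify_speaker(vectors, codebooks):
    """Return the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda name: vq_distortion(vectors, codebooks[name]))
```

With M = 30, each segment becomes a 30-dimensional vector, and the vectors from a test utterance would be stacked into one array before calling `identify_speaker`.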
