Integrated Search | Korea Science

Performance Improvement of Speaker Recognition Using Enhanced Feature Extraction in Glottal Flow Signals and Multiple Feature Parameter Combination

강지훈;김영일;정상배
- Journal of the Korea Institute of Information and Communication Engineering / Vol. 19, No. 12 / pp. 2792-2799 / 2015
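In this paper, source mel-frequency cepstral coefficients (SMFCC), skewness, and kurtosis are extracted from the glottal flow and used to improve speaker recognition performance. Because the magnitude response of the glottal flow is generally flat in the high-frequency band, SMFCCs are extracted only below a predetermined cutoff frequency. The extracted SMFCC, skewness, and kurtosis are combined with conventional feature parameters, and dimensionality reduction by principal component analysis (PCA) and linear discriminant analysis (LDA) is then applied so that performance can be compared with a conventional speaker recognition system under equivalent conditions. Large-scale speaker recognition experiments confirmed that the proposed system outperforms the conventional system, with larger gains when the number of Gaussian mixtures is small.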
https://doi.org/10.6109/jkiice.2015.19.12.2792
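
As a rough illustration of the higher-order statistics named in the abstract above, the following Python sketch computes the skewness and kurtosis of one frame of samples. The function name `skewness_kurtosis`, the use of population moments, and the non-excess kurtosis convention are assumptions made for illustration; the paper's exact definitions, framing, and SMFCC extraction are not given in this listing.

```python
def skewness_kurtosis(frame):
    """Return (skewness, kurtosis) of a frame using population moments.

    Assumption: kurtosis is the plain fourth standardized moment
    (3.0 for a Gaussian), not excess kurtosis.
    """
    n = len(frame)
    mean = sum(frame) / n
    # Central moments of order 2, 3, and 4.
    m2 = sum((x - mean) ** 2 for x in frame) / n
    m3 = sum((x - mean) ** 3 for x in frame) / n
    m4 = sum((x - mean) ** 4 for x in frame) / n
    if m2 == 0.0:
        # A constant frame has no spread; return zeros by convention here.
        return 0.0, 0.0
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical frame of glottal-flow samples (illustrative values only).
frame = [-1.0, 0.0, 1.0]
skew, kurt = skewness_kurtosis(frame)
print(skew, kurt)
```

In a system like the one described, the two values would presumably be appended to the cepstral feature vector of each frame before PCA/LDA, but that pipeline is not shown here.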

Computation of Laryngeal Flow and Sound through a Dynamic Model of the Vocal Folds

배영민;문영준
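- Korean Society for Computational Fluids Engineering: Conference Proceedings / Proceedings of the 2008 Spring Conference of the Korean Society for Computational Fluids Engineering / pp. 21-24 / 2008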
The present study numerically investigates glottal airflow characteristics and the acoustic features of phonation, fully coupled with the dynamic behavior of the vocal folds. The vocal folds are described by a low-dimensional body-cover model characterized by biomechanical parameters such as glottal width, vocal fold stiffness, and subglottal pressure. The flow in the vocal tract is modeled with an incompressible, axisymmetric form of the Navier-Stokes equations (INS), while the acoustic field is predicted by the linearized perturbed compressible equations (LPCE). The computed results show that a two-mass model of the vocal folds is sufficient to reproduce the temporal variations in oral airflow and glottis motion produced by female speakers. It is also found that i) the glottal width has a significant effect on the amplitude of the glottal flow, and thus on the amplitude of the acoustic wave in the vocal tract, ii) the vocal fold tension is the main control parameter for the fundamental frequency of phonation, iii) the subglottal pressure plays an appreciable role in reproducing the self-sustained oscillation of the vocal folds, and iv) the strength of the pulsating airflow and vortical structures is primarily affected by glottal width and subglottal pressure, and is closely related to pitch, loudness, and voice quality. Finally, a more comprehensive explanation of the difference between one- and two-mass models is presented, with a discussion of the effectiveness of vocal fold oscillation and voice quality.

Differences in Respiratory Function and Vocal Aerodynamics between Professional Sopranos and Female Subjects without Vocal Training

최홍식;남도현;안철민;임성은;강성웅
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / Vol. 12, No. 2 / pp. 121-125 / 2001
Singing requires exquisite coordination between the respiratory and phonatory systems to control glottal airflow efficiently. Respiratory function and vocal aerodynamics were investigated in six female professional sopranos and in six female subjects without vocal training. All sopranos had more than 15 years of formal classical vocal training. Pulmonary function test data on simple pulmonary function, flow volume curve, static lung volumes, maximum inspiratory pressure (MIP), and maximum expiratory pressure (MEP) were obtained from all subjects. Vocal aerodynamic measures of maximum phonation time (MPT), phonation quotient, and mean glottal flow rate (MFR) were also obtained from all subjects. Simple pulmonary function in professional sopranos was generally the same as that of the female subjects without vocal training. However, MIP and MEP, which reflect respiratory muscle force, were significantly elevated in professional sopranos compared with the female subjects without vocal training. Maximum phonation time and phonation quotient in sopranos were longer than those of the other female subjects even though there were no differences in simple pulmonary function. In normal subjects, high-pitched tones were produced with significantly higher mean glottal flow rates (GFR) than low-pitched tones, whereas no changes in GFR were found in sopranos. The results indicate that sopranos demonstrated significant improvements in the aerodynamic measures of GFR and maximum phonation time, suggesting an increase in glottal efficiency.

Quantitative Measurement of the Glottal Area Waveform (GAW) in Unilateral Vocal Fold Paralysis

최홍식;김명상;최재영;안성윤;이세영;홍정표
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / Vol. 9, No. 1 / pp. 71-78 / 1998
Type I thyroplasty in conjunction with arytenoid adduction is an excellent technique for the treatment of unilateral vocal fold paralysis, but objective perioperative evaluation of the patients is difficult. With the development of videostroboscopy and an image analysis program, we were able to quantify the Glottal Area Waveform (GAW) in patients with unilateral vocal fold paralysis, and we investigated the relationship between the glottal area and aerodynamic and acoustic parameters. Eight female patients who underwent type I thyroplasty in conjunction with arytenoid adduction and 5 females with normal vocal function were involved in this study. Preoperative and postoperative videostroboscopy and vocal function studies were performed. The GAW was analysed quantitatively with an image analysis program (Kay Stroboscope Image analysis, KSIP). Peak Glottal Area (PGA), Baseline Offset (BO), and Closing Phase (CP) were increased in patients with unilateral vocal fold paralysis and were reduced after the operation. Mean Flow Rate (MFR) was well correlated with the PGA in both the normal control group and the unilateral vocal fold paralysis patients. Noise to harmonic ratio (NHR) was correlated with PGA only in preoperative unilateral vocal fold paralysis patients. In conclusion, quantitative measurement of the GAW is a useful method for evaluating patients with unilateral vocal fold paralysis.

The Role of the Electroglottography on the Laryngeal Articulation of Speech

홍기환;박병암;양윤수;서수영;김현기
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / Vol. 8, No. 1 / pp. 18-26 / 1997
There are two types of phonetic study, acoustic and physiologic, for differentiating the three manner categories of Korean stop consonants. The physiologic studies include endoscopic, electromyographic (EMG), electroglottographic (EGG), and aerodynamic studies. In this study, I investigated general features of Korean stops using EGG to measure the open quotient of the vocal folds and the baseline shift during speech, and aerodynamic measures of subglottal air pressure, airflow, and glottal resistance at consonants. In the aerodynamic study, the glottalized and aspirated stops may be characterized by higher subglottal pressure than the lenis stop at consonants. Airflow is largest in the aspirated stops, followed by the lenis stops and then the glottalized stops. The glottal airway resistance (GAR) was highest in the glottalized stops, followed by the lenis, and lowest in the aspirated stops during the production of consonants; during the production of vowels it was highest in the aspirated stops and low in the glottalized and lenis stops. The glottal resistance at the consonant showed a significant difference among consonants and a significant interaction between subject and type of consonant. The glottal resistance at the vowel showed a significant difference among consonants, and an interaction occurred between subject and type of consonant. Electroglottography (EGG) has been used to investigate the functioning of the vocal folds during vibration. The EGG should be related to the patterns of vocal fold vibration during phonation in characterizing the temporal patterns of each vibratory cycle. The purpose of this study is to investigate the dynamic change of EGG waveforms during continuous speech.
The dynamic changes of EGG waveforms for the three-way distinction of Korean stops were characterized as follows: the aspirated stop by the largest open quotient and smallest glottal contact area of the vocal folds in the initial portion of vocal fold vibration; the lenis stop by a moderate open quotient and glottal contact area; and the glottalized stop by the smallest open quotient and largest glottal contact area. There may be a close relationship between the open quotient (OQ) at the initial voice onset and the glottal width at the time of consonant production: the larger glottal width just before vocal fold vibration results in the smaller OQ of the vocal fold vibration at the initial voice onset. The EGG changes of baseline shift during continuous speech production showed different patterns for the three types of Korean consonants. A small and less steep change of baseline shift was found for the lenis and the glottalized stops, and the largest and steepest change was found for the aspirated stop. The baseline shift at the initial voice onset showed patterns similar to those for consonant production, with larger changes in the aspirated stop. For the lenis and the glottalized stops during the initial voice onset, the three subjects showed individual differences. I suggest that these characteristics were strongly related to the articulatory activity of the vocal tract for the production of consonants, especially for the aspirated stop. The factors suspected to affect EGG waveforms are glottal width, vertical laryngeal movement, and the intrapharyngeal pressure on neighboring tissue during connected speech.
So the EGG may be a useful method to describe laryngeal activity and to classify pulsing conditions of the larynx during speech production, and EGG research can serve as a control for monitoring vocal tract articulation, although the above factors affecting EGG may have played a potential role in the vocal fold vibratory behavior obtained during consonant production.
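
The two quantities this abstract relies on can be sketched in a few lines of Python. The formulas below are the common textbook definitions (open quotient as open-phase duration over cycle period, glottal airway resistance as subglottal pressure over mean airflow) and are assumed here, since the abstract does not state how the author computed them. The function names and sample values are hypothetical.

```python
def open_quotient(open_time_s, period_s):
    """Fraction of one vibratory cycle during which the glottis is open."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return open_time_s / period_s


def glottal_resistance(subglottal_pressure_cmh2o, airflow_l_per_s):
    """Glottal airway resistance in cmH2O/(L/s): pressure divided by flow."""
    if airflow_l_per_s <= 0:
        raise ValueError("airflow must be positive")
    return subglottal_pressure_cmh2o / airflow_l_per_s


# Illustrative values only, not data from the study.
oq = open_quotient(open_time_s=0.006, period_s=0.010)
gar = glottal_resistance(subglottal_pressure_cmh2o=8.0, airflow_l_per_s=0.2)
print(oq, gar)
```

Under these definitions, a higher airflow at the same subglottal pressure gives a lower resistance, which matches the ordering the abstract reports for aspirated stops during consonant production.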

Performance Improvement of Speaker Recognition by MCE-based Score Combination of Multiple Feature Parameters

강지훈;김보람;김규영;이상훈
- Journal of the Korea Academia-Industrial cooperation Society / Vol. 21, No. 6 / pp. 679-686 / 2020
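This paper proposes a score combination method to improve speaker recognition performance, using an improved feature extraction scheme for the source signal and weight estimation for multiple feature-vector scores based on minimum classification error (MCE). The proposed feature vector consists of perceptual linear prediction cepstral coefficients, skewness, and kurtosis extracted from the glottal flow after low-pass filtering, which removes the flat-spectrum region that carries no meaningful information. The proposed feature vector is used to improve a conventional speaker recognition system that extracts mel-frequency cepstral coefficients and perceptual linear prediction cepstral coefficients from the source signal and models them with Gaussian mixture models. In addition, to make score estimation more reliable, the weights are not estimated from the probability distribution of the existing scores; instead, they are estimated with the minimum classification error method from the scores evaluated on the proposed feature vector and the scores evaluated on the conventional feature vectors, and the scores are then combined to find the best-matching speaker. The experiments confirmed that the proposed feature vector contains information useful for recognizing speakers. Speaker recognition with MCE-based combination of multiple feature parameter scores also outperformed the conventional system, with larger gains when the number of Gaussian mixtures is small.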
https://doi.org/10.5762/KAIS.2020.21.6.679
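
The idea of weighting two score streams under a minimum classification error criterion can be sketched as follows. This is a simplified illustration, not the paper's procedure: it assumes a single scalar weight `w`, uses a sigmoid-smoothed misclassification measure as the loss, and replaces gradient-based MCE training with a plain grid search. All names (`combine`, `mce_loss`, `estimate_weight`, `identify`) and the toy scores are hypothetical.

```python
import math


def combine(scores_a, scores_b, w):
    """Weighted sum of per-speaker scores from two feature streams."""
    return [w * a + (1.0 - w) * b for a, b in zip(scores_a, scores_b)]


def mce_loss(trials, w, gamma=1.0):
    """Mean sigmoid-smoothed classification error over labelled trials.

    Each trial is (scores_a, scores_b, true_speaker_index).
    d > 0 means the best rival outscored the true speaker.
    """
    total = 0.0
    for scores_a, scores_b, true_idx in trials:
        g = combine(scores_a, scores_b, w)
        best_rival = max(s for i, s in enumerate(g) if i != true_idx)
        d = best_rival - g[true_idx]
        total += 1.0 / (1.0 + math.exp(-gamma * d))
    return total / len(trials)


def estimate_weight(trials, steps=101):
    """Assumption: grid search over w in [0, 1] instead of gradient descent."""
    candidates = [i / (steps - 1) for i in range(steps)]
    return min(candidates, key=lambda w: mce_loss(trials, w))


def identify(scores_a, scores_b, w):
    """Return the index of the speaker with the highest combined score."""
    g = combine(scores_a, scores_b, w)
    return max(range(len(g)), key=lambda i: g[i])


# Toy development trials: stream A is informative, stream B is misleading.
trials = [([2.0, 0.0], [0.0, 1.0], 0), ([0.0, 2.0], [1.0, 0.0], 1)]
w = estimate_weight(trials)
print(w, identify([2.0, 0.0], [0.0, 1.0], w))
```

With these toy trials the search puts all the weight on stream A, because any weight on stream B raises the smoothed error. With real scores the two streams would usually be normalized to comparable ranges first.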

The Comparison of the Acoustic and Aerodynamic Characteristics of PROVOX® Voice and Esophageal Voice Produced by the Same Laryngectomee

표화영;최홍식;임성은;최성희
- Speech Sciences / Vol. 5, No. 1 / pp. 121-139 / 1999
Our experimental subject was a laryngectomee who had undergone total laryngectomy with PROVOX® insertion and learned esophageal speech after the surgery, so he could produce both PROVOX® voice and esophageal voice. Using this subject's productions of PROVOX® and esophageal voice, we compared the acoustic and aerodynamic characteristics of the two voices under the same physical conditions of the same person. The fundamental frequency of esophageal voice was 137.2 Hz, and that of PROVOX® voice was 97.5 Hz. PROVOX® voice showed lower jitter, shimmer, and NHR than esophageal voice, which means that PROVOX® voice had better voice quality than esophageal voice. In spectrographic analysis, the formation of formants and pseudoformants was more distinct in esophageal voice, and several temporal acoustic features such as VOT and closure duration were closer to normal voice in PROVOX® voice. During sentence utterance, esophageal voice showed longer pause or silence durations than PROVOX® voice. Maximum phonation time and mean flow rate of PROVOX® voice were much longer and larger than those of esophageal voice, but the mean and range of sound pressure level, subglottic pressure, and voice efficiency were similar in the two voices. Glottal resistance of esophageal voice was much larger than that of PROVOX® voice, which in turn showed larger glottal resistance than normal voice.

An Aerodynamic Study of Velopharyngeal Closure Function in Cleft Palate Patients

안태섭;양상일;신효근
- Speech Sciences / Vol. 1 / pp. 237-259 / 1997
Cleft palate speech is characterized by hyper- or hyponasality with velopharyngeal insufficiency and articulation disorders. Previous studies on cleft palate speech have shown that it tends to have lower airflow and air pressure. To examine the aerodynamic characteristics of cleft palate speech, the Aerophone II Voice Function Analyzer was used. We measured sound pressure level (SPL), airflow, air pressure, and glottal power. Three adults with cleft palate and five normal adults participated in this experiment. The test items were: (1) the sustained vowel /o/; (2) /CiCi/, where C is one of three different stop consonants in Korean; and (3) /bimi/. Subjects were asked to produce /bimi/ five times without opening their lips. The data were tested statistically with a t-test between the preoperative cleft palate group and the control group, and with a paired t-test between the cleft palate group before and after operation. The results were as follows: (1) Cleft palate patients generally speak with incomplete oral closure and lower oral air pressure. As a result, the SPL of cleft palate patients before operation is 3 dB lower than that of the control group. (2) Airflow of cleft palate patients in phonation and articulation is lower than that of the control group. However, it increased after operation. Lung volume and mean airflow in phonation increased significantly (p<0.05). (3) Although the velopharyngeal function (velar opening rate) of cleft palate patients is poor in comparison with the control group, it recovered after operation. In this case, maximum flow rate and mean airflow rate increased significantly (p<0.05). (4) Air pressure of cleft palate patients in speech is lower than that of the control group. In general, the air pressure of cleft palate patients increased after operation. In this case, the air pressure of the glottalized consonant increased significantly (p<0.04). (5) Glottal power (mean power, mean efficiency, and mean resistance) of cleft palate patients is lower than that of the control group, but mean efficiency and mean resistance of cleft palate patients increased significantly (p<0.05) after operation.

Quantitative Analysis of Voice Quality after Radiation Therapy for Stage T1a Glottic Carcinoma

이준규;정웅기
- Radiation Oncology Journal / Vol. 23, No. 1 / pp. 17-21 / 2005
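Purpose: Radiation therapy is used as the primary treatment for early glottic cancer because it can preserve the voice. This study examined the effect of radiation therapy on the voice of patients with stage T1a glottic carcinoma. Materials and Methods: Voice was evaluated with objective voice tests (acoustic analysis, aerodynamic testing, and laryngeal stroboscopy) in 17 male patients at least 1 year after radiation therapy for early glottic cancer (T1a), and the results were compared with those of a sex- and age-matched normal control group. The acoustic analysis measured mean fundamental frequency (Fo), jitter, shimmer, and noise-to-harmonics ratio. The aerodynamic tests measured maximum phonation time, mean airflow rate, sound intensity, subglottal pressure, glottal resistance, glottal efficiency, and glottal power. Results: Only shimmer in the acoustic analysis was significantly higher in the patients who received radiation therapy. No other test, including the aerodynamic tests, showed a statistically significant difference between the two groups. Conclusion: Because only shimmer was higher in the radiation therapy group, radiation therapy for stage T1a glottic carcinoma does not appear to have a major effect on voice quality.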

Text-Independent Speaker Recognition Using Glottal Flow Waveform

양기혁;전범기;백성준;강상기;성굉모
- Acoustical Society of Korea: Conference Proceedings / Proceedings of the Acoustical Society of Korea Conference 1999, Vol. 18, No. 2 / pp. 57-60 / 1999
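This paper extracts speaker-characteristic coefficients from the glottal wave and applies them to a speaker recognizer. The residual signal of speech is estimated with the covariance method and then integrated to obtain the glottal wave. One glottal-wave segment is taken not between glottal closure instants but between the instants at which the residual error is maximal. The resulting glottal wave is resampled to M data points to form a feature vector, and the recognition rate is measured with a VQ-based recognizer. With 4 seconds of test data and a 30-dimensional feature vector, the average recognition rate was 96.08% for male speakers and 93.61% for female speakers.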
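
The feature pipeline outlined in this abstract (integrate the residual, resample one cycle to M points, match against a VQ codebook) can be sketched roughly as below. The sketch assumes the residual and the cycle boundaries are already available, uses a running sum as the integrator and linear interpolation as the resampler, and treats the codebook as given. These choices and all function names are assumptions, since the abstract does not specify them.

```python
def integrate(residual):
    """Running sum: a discrete stand-in for integrating the LPC residual."""
    total, out = 0.0, []
    for r in residual:
        total += r
        out.append(total)
    return out


def resample(cycle, m):
    """Linearly interpolate one cycle to exactly m points.

    Requires m >= 2 and len(cycle) >= 2.
    """
    n = len(cycle)
    out = []
    for i in range(m):
        pos = i * (n - 1) / (m - 1)   # fractional index into the cycle
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(cycle[lo] * (1.0 - frac) + cycle[hi] * frac)
    return out


def nearest_codeword(vec, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    def dist(k):
        return sum((v - c) ** 2 for v, c in zip(vec, codebook[k]))
    return min(range(len(codebook)), key=dist)


# Hypothetical residual for one cycle and a two-entry codebook (toy values).
glottal_wave = integrate([1.0, 1.0, -1.0, -1.0])
feature = resample(glottal_wave, 5)
codebook = [[0.0] * 5, [1.0, 1.75, 1.5, 0.75, 0.0]]
print(feature, nearest_codeword(feature, codebook))
```

In a VQ-based recognizer of this kind, each enrolled speaker would typically have a separate codebook and the speaker with the lowest accumulated distortion over the test utterance would be selected; that decision stage is not shown.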
