Search | Korea Science

Voice conversion using low dimensional vector mapping (낮은 차원의 벡터 변환을 통한 음성 변환)

Lee, Kee-Seung;Doh, Won;Youn, Dae-Hee
- Journal of the Korean Institute of Telematics and Electronics S / v.35S no.4 / pp.118-127 / 1998
In this paper, we propose a voice personality transformation method that makes one person's voice sound like another person's. To transform the voice personality, the vocal tract transfer function is used as the transformation parameter. Compared with previous methods, the proposed method obtains high-quality transformed speech with low computational complexity. Conversion between vocal tract transfer functions is implemented by a linear mapping based on soft clustering. In this process, the mean LPC cepstrum coefficients and the mean-removed LPC cepstrum, modeled by a low-dimensional vector, are used as transformation parameters. To evaluate the performance of the proposed method, mapping rules are generated from 61 Korean words uttered by two male speakers and one female speaker. These rules are then applied to 9 sentences uttered by the same speakers, and objective evaluation and subjective listening tests are performed on the transformed speech.
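As a rough illustration of a linear mapping based on soft clustering, the sketch below blends one affine transform per cluster using soft membership weights. The abstract does not give the exact formulation, so everything here is an assumption: the softmax-over-distance weighting, the `temperature` parameter, and the names `soft_weights` and `convert` are hypothetical and not taken from the paper.

```python
import math


def soft_weights(x, centroids, temperature=1.0):
    """Soft cluster memberships: softmax over negative squared distances.

    Assumption: the paper's membership function is not stated in the
    abstract; this is one common choice, used here for illustration only.
    """
    dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centroids]
    nearest = min(dists)  # subtract the minimum for numerical stability
    exps = [math.exp(-(d - nearest) / temperature) for d in dists]
    total = sum(exps)
    return [e / total for e in exps]


def convert(x, centroids, matrices, offsets, temperature=1.0):
    """Map a source feature vector to the target space.

    Each cluster k has its own affine map (matrices[k], offsets[k]);
    the output is the membership-weighted sum of the per-cluster maps.
    """
    weights = soft_weights(x, centroids, temperature)
    y = [0.0] * len(offsets[0])
    for w, A, b in zip(weights, matrices, offsets):
        for i, (row, bi) in enumerate(zip(A, b)):
            y[i] += w * (sum(a * xi for a, xi in zip(row, x)) + bi)
    return y
```

Because the weights vary smoothly with the input, the converted features change continuously across cluster boundaries, which is the usual motivation for soft rather than hard clustering.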

The Correlation between GRBAS Scales and MDVP Parameters on the Pathologic Voices of the Patients with Vocal Polyps (성대 폴립 환자를 대상으로 한 GRBAS 척도와 MDVP 측정치 간의 상관관계 연구)

표화영;최성희;임성은;심현섭;최홍식;김광문
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.10 no.2 / pp.154-163 / 1999
The GRBAS scale, a tool for the perceptual evaluation of voice, demands experienced judges, and the MDVP parameters of CSL, a tool for the objective measurement of voice quality, demand exact interpretation of the analyzed results. The two tools should be used as complementary evaluation methods, so this experimental study investigated the correlation between the GRBAS scales and the MDVP parameters using the pathologic voices of 30 patients with vocal polyps, and sought to identify the significant MDVP parameters to which inexperienced GRBAS judges should attend. The 30 subjects' voices, saved in the MDVP of CSL, were analyzed by its own analysis program, and three experienced voice therapists judged the same voices using the GRBAS scales. The correlations between them were analyzed with the Spearman rank correlation coefficient. Of the 29 MDVP parameters, 22 showed a statistically significant correlation with the Grade (G) scale (p<0.05). The Roughness (R) scale showed a significant correlation with 18 parameters, the Breathiness (B) scale with 17 parameters, and the Strain (S) scale with 12 parameters. For the Asthenicity (A) scale, no parameter showed a significant correlation. Overall, significantly high correlations were found for the parameters related to pitch and amplitude perturbation, especially amplitude perturbation.
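For readers unfamiliar with the Spearman rank correlation coefficient used above, here is a minimal sketch in plain Python. It ranks both variables (ties receive the average rank, which matters for ordinal ratings such as GRBAS scores) and then computes the Pearson correlation of the ranks. The function names are illustrative, and the sketch does not compute p-values or reproduce the study's data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

In a setting like this study, `x` would hold one perceptual rating per patient and `y` the corresponding value of one acoustic parameter; the call would be repeated for each scale and parameter pair.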

Aerodynamic Features and Voice Therapy Interventions of Functional Voice Disorder after Thyroidectomy (갑상선 절제 술 후 기능적 음성장애의 공기역학적 특징과 음성치료 중재)

Lee, Chang-Yoon;An, Soo-Youn;Chang, Hyun;Jeong, Hee Seok;Son, Hee Young
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.26 no.1 / pp.25-33 / 2015
Background and Objectives: The objective of this study was to investigate, through aerodynamic analysis, the features of post-thyroidectomy subjective voice disorder as assessed by the Voice Handicap Index (VHI) and Voice Symptom Scale (VOISS), and to investigate the appropriate voice therapy intervention. Materials and Methods: Twenty post-thyroidectomy patients in whom laryngeal stroboscopy showed no recurrent laryngeal nerve paralysis were enrolled in this study. Acoustic and aerodynamic evaluations were performed before the operation and at 2 weeks and 3 months after the operation. Subjective voice evaluation was performed with the VHI and VOISS. The aerodynamic evaluation compared and analysed maximum phonation time (MPT), phonation threshold pressure (PTP), mean air flow rate (MFR), etc. To evaluate patients' symptoms related to functional voice disorder, physical-domain scores in the VHI and VOISS were selected and compared across sessions. Results: The 10 of 20 participants who complained of voice symptoms showed no significant difference from pre-operation in the acoustic evaluation, but all showed higher scores at 2 weeks and 3 months after the operation than before it in the VHI physical domain and the selected VOISS items. They showed reduced MPT and increased PTP at the same time. Laryngeal massage and breathing training were given to them together, and 5 participants showed improvement in MPT and PTP compared to pre-treatment. Conclusion: Patients who complained of voice change without organic damage after thyroidectomy were all shown by aerodynamic evaluation to have reduced MPT, and some to have increased PTP. Reduced MPT may imply a problem in air flow beneath the glottis. Increased PTP suggests that vocalization requires much more effort than before the operation. Comparing aerodynamic evaluations after thyroidectomy may provide information for behavioral interventions. Additionally, a study of combined laryngeal massage and breathing training for patients with this voice disorder needs to be conducted with a larger number of participants.

Voice Personality Transformation Using an Optimum Classification and Transformation (최적 분류 변환을 이용한 음성 개성 변환)

이기승
- The Journal of the Acoustical Society of Korea / v.23 no.5 / pp.400-409 / 2004
In this paper, a voice personality transformation method is proposed, which makes one person's voice sound like another person's voice. To transform the voice personality, the vocal tract transfer function is used as the transformation parameter. Compared with previous methods, the proposed method makes the transformed speech closer to the target speaker's voice from both subjective and objective points of view. Conversion between vocal tract transfer functions is implemented by classification of the entire vector space followed by a linear transformation for each cluster. The LPC cepstrum is used as the feature parameter. A joint classification and transformation method is proposed, in which the optimum clusters and transformation matrices are estimated simultaneously under a minimum mean square error criterion. To evaluate the performance of the proposed method, transformation rules are generated from 150 sentences uttered by three male speakers and one female speaker. These rules are then applied to another 150 sentences uttered by the same speakers, and objective evaluation and subjective listening tests are performed.
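One plausible way to write the joint criterion described in this abstract is given below. The notation is assumed, not taken from the paper: $x_t$ is a source LPC cepstrum vector, $y_t$ its time-aligned target vector, $C_k$ the $k$-th cluster of the source space, and $A_k$ the transformation matrix for that cluster; any bias term is omitted.

```latex
\min_{\{C_k\},\,\{A_k\}} \; \sum_{k=1}^{K} \; \sum_{t \,:\, x_t \in C_k}
  \left\lVert\, y_t - A_k x_t \,\right\rVert^{2}
```

Under this reading, the clusters and the matrices are chosen together to minimize the total squared conversion error, rather than clustering the source space first and fitting the matrices afterwards.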

Significance of Acoustic Parameter - RAP, PPQ, APQ- in Hoarseness (애성환자에서 음향지표인 RAP, PPQ 및 APQ의 유용성)

안철민;이종혁;강현국;이용배
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.6 no.1 / pp.22-26 / 1995
Voice change, especially hoarseness, reflects irregular vibration of the vocal cords. Computerized acoustic analysis therefore offers many acoustic parameters for the objective evaluation of voice. We objectively investigated the vocal vibration of normal persons and patients with hoarseness in Korea. The RAP (relative average perturbation), PPQ (pitch period perturbation quotient), and APQ (amplitude perturbation quotient) of normal persons were compared with those of patients with hoarseness using the Multi-Dimensional Voice Program, to assess whether pathologic vocal vibration can be distinguished from normal vibration. In the multivariate statistical analysis, RAP, PPQ, and APQ showed differences between the normal persons and the patients with hoarseness. In conclusion, RAP, PPQ, and APQ might be meaningful screening parameters for distinguishing patients with hoarseness from normal persons.
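The three measures named above are commonly defined as smoothed perturbation quotients: the mean absolute difference between each cycle's value and a local moving average, divided by the overall mean. The sketch below follows those commonly cited definitions, with window lengths of 3 (RAP), 5 (PPQ), and 11 (APQ); the exact computation inside the analysis software used in the study may differ, and the function names are illustrative.

```python
def perturbation_quotient(values, window):
    """Mean |local average - center value| relative to the mean value, in percent.

    `values` holds one measurement per glottal cycle (pitch periods or
    peak amplitudes); `window` is the odd smoothing length.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    if len(values) < window:
        raise ValueError("need at least `window` cycles")
    half = window // 2
    deviations = []
    for i in range(half, len(values) - half):
        local_mean = sum(values[i - half:i + half + 1]) / window
        deviations.append(abs(local_mean - values[i]))
    mean_deviation = sum(deviations) / len(deviations)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_deviation / mean_value


def rap(periods):
    """Relative average perturbation: 3-cycle smoothing of pitch periods."""
    return perturbation_quotient(periods, 3)


def ppq(periods):
    """Pitch period perturbation quotient: 5-cycle smoothing."""
    return perturbation_quotient(periods, 5)


def apq(amplitudes):
    """Amplitude perturbation quotient: 11-cycle smoothing of peak amplitudes."""
    return perturbation_quotient(amplitudes, 11)
```

A perfectly regular voice gives 0 for all three measures; larger cycle-to-cycle irregularity raises them, which is why they are candidates for screening hoarseness.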

Characteristics of voice quality on clear versus casual speech in individuals with Parkinson's disease (명료발화와 보통발화에서 파킨슨병환자 음성의 켑스트럼 및 스펙트럼 분석)

Shin, Hee-Baek;Shim, Hee-Jeong;Jung, Hun;Ko, Do-Heung
- Phonetics and Speech Sciences / v.10 no.2 / pp.77-84 / 2018
The purpose of this study is to examine the acoustic characteristics of Parkinsonian speech, with respect to different utterance conditions, by employing acoustic/auditory-perceptual analysis. The subjects of the study were 15 patients (M=7, F=8) with Parkinson's disease who were asked to read out sentences under different utterance conditions (clear/casual). The sentences read out by each subject were recorded, and the recorded speech was subjected to cepstrum and spectrum analysis using Analysis of Dysphonia in Speech and Voice (ADSV). Additionally, auditory-perceptual evaluation of the recorded speech was conducted with respect to breathiness and loudness. Results indicate that in the case of clear speech, there was a statistically significant increase in the cepstral peak prominence (CPP), and a decrease in the L/H ratio SD (ratio of low to high frequency spectral energy SD) and CPP F0 SD values. In the auditory-perceptual evaluation, a decrease in breathiness and an increase in loudness were noted. Furthermore, CPP was found to be highly correlated to breathiness and loudness. This provides objective evidence of the immediate usefulness of clear speech intervention in improving the voice quality of Parkinsonian speech.
https://doi.org/10.13064/KSSS.2018.10.2.077

Speech Intelligibility of Alaryngeal Voices and Pre/Post Operative Evaluation of Voice Quality using the Speech Recognition Program(HUVOIS) (음성인식프로그램을 이용한 무후두 음성의 말 명료도와 병적 음성의 수술 전후 개선도 측정)

Kim, Han-Su;Choi, Seong-Hee;Kim, Jae-In;Lee, Jae-Yol;Choi, Hong-Shik
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.15 no.2 / pp.92-97 / 2004
Background and Objectives: The purpose of this study was to objectively examine pre- and postoperative voice quality and the intelligibility of alaryngeal voice using the speech recognition program HUVOIS. Materials and Methods: Two laryngologists and one speech pathologist evaluated 'G', 'R', and 'B' on the GRBAS scale, and speech intelligibility on the NTID rating scale, from a standard paragraph. Acoustic estimates such as jitter, shimmer, and HNR were also obtained from Lx Speech Studio. Results: The speech recognition rate did not differ significantly between pre- and post-operation for the pathological voice samples, although voice quality (G, B) and acoustic values (jitter, HNR) improved significantly after the operation. Among the alaryngeal voices, the reed type electrolarynx 'Moksori' scored highest in both speech intelligibility and speech recognition rate, whereas esophageal speech scored lowest. A correlation between speech intelligibility and speech recognition rate was found in the alaryngeal voices, but not in the pathological voices. Conclusion: The current study did not show that the speech recognition program HUVOIS, used as a telephone program, is an objective and efficient method for assisting the subjective GRBAS scale.

A Study of the Lesional Grade Discrimination Model for Vocal Fold Nodules and Polyps (성대 결절 및 폴립 병변 판별 예측모형에 대한 연구)

Park, Soo-Jung;Shim, Hyun-Sup;Chung, Sung-Min;Kim, Han-Soo;Park, Ae-Kyung
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.15 no.2 / pp.112-117 / 2004
Background and Objectives: This study aimed to find a statistically significant discrimination model for predicting the lesional grade of vocal fold nodules and polyps from patients' background data and objective voice evaluation parameters. Materials and Method: This retrospective study was carried out at the Ewha Womans University Hospital. Voice examination data from 122 patients were selected, and lesion screening (Grade I, II, and III) was conducted by 2 ENT specialists using each patient's vocal fold pictures obtained during the laryngoscopy examination. Results: The Lesional Grade Discrimination Model, with which the lesional grade of vocal fold nodules and polyps could be predicted, was derived by ordinal logistic regression analysis (using SPSS 10.0). With this model, the lesional grades of 73 of the 122 patients (59.8%) were predicted correctly, matching their previously screened grades. Conclusion: This model applied a multivariate approach that statistically combined the currently used parameters Jitter, Shimmer, MFR, and MPT with patient background data such as gender and dysphonia period. It might explain the status of benign lesions of the vocal folds and, furthermore, predict the physiological function of the vocal folds.

Current Standpoints on the Pathophysiology of Benign Vocal Fold Lesions (양성 성대 질환의 병태생리에 관한 최신지견)

Kwon, Tack-Kyun;Kim, Min-Su
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.26 no.2 / pp.91-93 / 2015
Substantial confusion exists regarding the nomenclature of benign vocal fold lesions (BVFL), because in the past there were no pathologically diagnostic findings and no deep understanding of their pathogenesis. There is no consensus on specific labels for BVFL, nor are the most commonly used terms defined. A defined nomenclature is needed to improve communication among clinicians and with patients. Furthermore, precise definitions of BVFL will facilitate clinical research on voice disorders and may lead to a better understanding of outcomes for BVFL treatment. Laryngoscopy, stroboscopy, and voice evaluation are used to diagnose BVFL. The objective of this review article was to develop a new paradigm of BVFL nomenclature using the patient's history, stroboscopy, voice therapy results, and operative findings.

Complex nested U-Net-based speech enhancement model using a dual-branch decoder (이중 분기 디코더를 사용하는 복소 중첩 U-Net 기반 음성 향상 모델)

Seorim Hwang;Sung Wook Park;Youngcheol Park
- The Journal of the Acoustical Society of Korea / v.43 no.2 / pp.253-259 / 2024
This paper proposes a new speech enhancement model based on a complex nested U-Net with a dual-branch decoder. The proposed model uses a complex nested U-Net to estimate the magnitude and phase components of the speech signal simultaneously, and its decoder has a dual-branch structure in which one branch performs spectral mapping and the other performs time-frequency masking. Compared to a single-branch decoder structure, the dual-branch decoder structure removes noise effectively while minimizing the loss of speech information. The experiments were conducted on the VoiceBank + DEMAND database, commonly used for training speech enhancement models, and the model was evaluated with various objective evaluation metrics. In the experiments, the complex nested U-Net-based speech enhancement model using a dual-branch decoder increased the Perceptual Evaluation of Speech Quality (PESQ) score by about 0.13 compared to the baseline, and showed higher objective evaluation scores than recently proposed speech enhancement models.
https://doi.org/10.7776/ASK.2024.43.2.253
