• Title/Summary/Keyword: speech parameter

A Study on Generation Method of Intonation using Peak Parameter and Pitch Lookup-Table (Peak 파라미터와 피치 검색테이블을 이용한 억양 생성방식 연구)

  • Jang, Seok-Bok;Kim, Hyung-Soon
    • Annual Conference on Human and Language Technology
    • /
    • 1999.10e
    • /
    • pp.184-190
    • /
    • 1999
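  • This paper takes a data-driven approach to the intonation model of a Text-to-Speech system: model parameters and a pitch lookup table are extracted from a speech database and built in advance, and at synthesis time they are estimated to generate the final F0 values. Word-phrase break indices are subdivided, according to their characteristics, into fixed and variable break indices, and intonation phrases (Intonation Phrase) and accentual phrases (Accentual Phrase) are set on the basis of the predicted break indices. In particular, the accentual phrase model focuses on accurately modeling the peak, which is perceptually and acoustically important: the time-axis and frequency-axis values of the peak and the slopes before and after it are estimated, giving four parameters, and prediction rules for these parameters are built with CART (Classification and Regression Tree). For particles and endings, where boundary tones appear, a lookup table consisting of normalized pitch values and a key-index is built so that pitch values are predicted more precisely. In a listening test on speech synthesized with the proposed intonation model, using a speech synthesizer built in the authors' laboratory, the synthesized speech was more natural than that of existing commercial Text-to-Speech systems.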

Classification of Diphthongs using Acoustic Phonetic Parameters (음향음성학 파라메터를 이용한 이중모음의 분류)

  • Lee, Suk-Myung;Choi, Jeung-Yoon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.32 no.2
    • /
    • pp.167-173
    • /
    • 2013
  • This work examines classification of diphthongs, as part of a distinctive feature-based speech recognition system. Acoustic measurements related to the vocal tract and the voice source are examined, and analysis of variance (ANOVA) results show that vowel duration, energy trajectory, and formant variation are significant. A balanced error rate of 17.8% is obtained for 2-way diphthong classification on the TIMIT database, and error rates of 32.9%, 29.9%, and 20.2% are obtained for /aw/, /ay/, and /oy/, for 4-way classification, respectively. Adding the acoustic features to widely used Mel-frequency cepstral coefficients also improves classification.
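
The balanced error rate quoted above can be illustrated with a minimal sketch. It assumes the usual definition (the mean of the per-class error rates, so that a rare class counts as much as a frequent one); the abstract does not spell out its formula. The function name and the toy labels are hypothetical and are not TIMIT data.

```python
from collections import defaultdict


def balanced_error_rate(y_true, y_pred):
    """Mean of per-class error rates, so each class counts equally."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        totals[truth] += 1
        if truth != guess:
            errors[truth] += 1
    return sum(errors[c] / totals[c] for c in totals) / len(totals)


# Toy labels invented for illustration (not TIMIT data).
y_true = ["diph", "diph", "diph", "diph", "mono", "mono"]
y_pred = ["diph", "diph", "diph", "mono", "mono", "diph"]

# Per-class error rates are 1/4 and 1/2, so the balanced rate is 0.375,
# whereas the plain error rate would be 2/6.
ber = balanced_error_rate(y_true, y_pred)
```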

Speaker-Dependent Emotion Recognition For Audio Document Indexing

  • Hung LE Xuan;QUENOT Georges;CASTELLI Eric
    • Proceedings of the IEEK Conference
    • /
    • summer
    • /
    • pp.92-96
    • /
    • 2004
  • Research on emotion is currently of great interest in speech processing and in human-machine interaction. In recent years, a growing body of work on emotion synthesis and emotion recognition has been developed for different purposes. Each approach uses its own methods and its own parameters measured on the speech signal. In this paper, we propose using a short-time parameter, MFCCs (Mel-Frequency Cepstral Coefficients), and a simple but efficient classification method, Vector Quantization (VQ), for speaker-dependent emotion recognition. Many other features (energy, pitch, zero crossing, phonetic rate, LPC, ...) and their derivatives are also tested and combined with the MFCCs in order to find the best combination. Other models, GMM and HMM (discrete and continuous Hidden Markov Models), are studied as well, in the hope that continuous distributions and the temporal behaviour of this feature set will improve the quality of emotion recognition. The maximum accuracy in recognizing five different emotions exceeds 88% using only MFCCs with the VQ model. This simple but efficient approach gives results much better than those obtained on the same database in human evaluation, in which listeners judged sentences without being allowed to go back or to compare sentences [8], and the result compares favourably with other approaches.

A SOUND SPECTROGRAPHICAL STUDY ON THE KOREAN VOWELS AND CONSONANTS PRONOUNCED BY OPENBITE PATIENTS - Frequency Analysis - (SOUND SPECTROGRAPH를 이용한 개교환자의 한국어 자·모음의 발성에 관한 연구 - 주파수 분석을 중심으로 -)

  • Kim, Ki-Dal;Yang, Won Sik
    • The korean journal of orthodontics
    • /
    • v.15 no.1
    • /
    • pp.55-66
    • /
    • 1985
  • The study was undertaken to ascertain the speech defects of patients with malocclusion, especially openbite patients, by means of spectral analysis. The experimental group was composed of ten female openbite patients with a mean age of 13.8 years. The control group was composed of ten girls with a mean age of 13.7 years. As speech material, eight Korean monophthongs, two Korean fricatives, and two affricates were used. Speech was recorded and then analyzed with a Kay 7800 digital sonagraph. Formant frequency level or range was used as a phonemic parameter. The results were as follows: 1. Among vowels: for /a:/, $F_1$, $F_3$, and $F_1/F_2$ showed abnormality; for /o:/ and /w:/, $F_2$, $F_2 - F_1$, and $F_1/F_2$ showed abnormality. 2. Among consonants: for /S/ and /h/, the upper and lower borders of the frequency range showed abnormality; for (equation omitted), the lower border of the frequency range showed abnormality; for $/C^{h}/$, the upper and lower borders of the frequency range and the concentration point showed abnormality.

The implementation of children's automated formant setting by Praat scripting (Praat을 이용한 아동 포먼트 자동 세팅 스크립트 구현)

  • Park, Jiyeon;Seong, Cheoljae
    • Phonetics and Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.1-10
    • /
    • 2018
  • This study introduces an automated Praat script that allows optimal formant analysis of children's vowels. Using Burg's algorithm in Praat, formants can be extracted by setting the maximum formant value and the number of formants. The optimal formant setting was determined by identifying the two conditions, F1 and F2, with minimum standard deviations. When the optimal formant setting determined by the script was applied, the results of normality tests were not significant among all vowels except /e/ for the maximum formant value, and among the vowels /a/, /e/, /i/, /o/, /u/ and /ʌ/ for the number of formants. This indicates that, when analyzing the formants of children's vowel sounds, applying a single parameter setting (the maximum formant value and the number of formants) uniformly to all vowels is problematic. The performance of the optimal formant setting script was evaluated alongside three other algorithms in order to determine whether it properly extracts formants for children's vowels. To this end, Korean monophthongs produced by 6-year-old children were collected and the Praat scripts were applied to the data. The resulting formant plots and statistical analysis showed that optimum_script and qtone_script, which links to the perceptual unit, performed very well in formant extraction compared to the remaining two scripts.
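
The selection step described above (choosing the formant setting whose F1 and F2 have minimum standard deviation) can be sketched roughly as follows. This is Python rather than Praat script, and it is only an illustration: the function name, the candidate settings, the measurements, and in particular the rule of summing the F1 and F2 standard deviations are assumptions, not the authors' script.

```python
from statistics import stdev


def pick_setting(measurements):
    """Choose the setting with the smallest summed F1 and F2 spread.

    measurements maps (max_formant_hz, n_formants) to a list of (F1, F2)
    pairs measured under that setting. Summing the two standard
    deviations is an assumption made for this sketch.
    """
    def spread(pairs):
        f1 = [p[0] for p in pairs]
        f2 = [p[1] for p in pairs]
        return stdev(f1) + stdev(f2)

    return min(measurements, key=lambda s: spread(measurements[s]))


# Invented measurements in Hz for two hypothetical candidate settings.
measurements = {
    (5500, 5): [(900, 1500), (1100, 1900), (1000, 1700)],
    (7000, 5): [(980, 1680), (1020, 1720), (1000, 1700)],
}

# The second setting gives much tighter F1 and F2 values, so it is chosen.
best = pick_setting(measurements)
```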

Rank-weighted reconstruction feature for a robust deep neural network-based acoustic model

  • Chung, Hoon;Park, Jeon Gue;Jung, Ho-Young
    • ETRI Journal
    • /
    • v.41 no.2
    • /
    • pp.235-241
    • /
    • 2019
  • In this paper, we propose a rank-weighted reconstruction feature to improve the robustness of a feed-forward deep neural network (FFDNN)-based acoustic model. In the FFDNN-based acoustic model, an input feature is constructed by vectorizing a submatrix that is created by slicing the feature vectors of frames within a context window. In this type of feature construction, the appropriate context window size is important because it determines the amount of trivial or discriminative information, such as redundancy or temporal context, in the input features. However, we examined whether a single parameter is sufficient to control the quantity of information. We therefore investigated the input feature construction from the perspectives of rank and nullity, and propose a rank-weighted reconstruction feature that retains speech information components and reduces trivial components. The proposed method was evaluated in the TIMIT phone recognition and Wall Street Journal (WSJ) domains. It reduced the phone error rate of the TIMIT domain from 18.4% to 18.0%, and the word error rate of the WSJ domain from 4.70% to 4.43%.
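
The baseline input construction described above (slicing the frames inside a context window and vectorizing the resulting submatrix) might look like the sketch below. It shows only that conventional splicing step, not the proposed rank-weighted reconstruction; the function name, the window sizes, and the edge padding by frame repetition are assumptions.

```python
def splice_frames(frames, left, right):
    """Stack each frame with its neighbours into one long input vector.

    frames: list of equal-length feature vectors, one per frame.
    Edges are padded by repeating the first or last frame (an assumption).
    """
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        window = []
        for offset in range(-left, right + 1):
            idx = min(max(t + offset, 0), last)  # clamp to a valid frame
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


# Three toy 2-dimensional frames; a window of one frame on each side
# yields 3 * 2 = 6 values per input vector.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
inputs = splice_frames(frames, left=1, right=1)
```

The single knob here is the window size (`left` and `right`), which is the one parameter whose sufficiency the abstract calls into question.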

Towards better acoustic conditions in school buildings in Korea-a need for Korean standard for classroom acoustics (국내 교육시설의 음향기준 제정의 필요성 제고)

  • Young-Ji Choi
    • The Journal of the Acoustical Society of Korea
    • /
    • v.42 no.2
    • /
    • pp.113-123
    • /
    • 2023
  • This paper describes the acoustical conditions of elementary school, high school, and university classrooms in Korea and suggests a need for Korean acoustic standards and guidelines for classroom design. Current standards and guidelines for classroom acoustics in several countries are briefly introduced to explain their acoustical performance criteria for background noise levels and reverberation times, and their noise isolation design requirements for various types of classrooms. The results of several acoustic surveys of domestic elementary school, high school, and university classrooms are described and compared to provide information on the acoustic characteristics of Korean school classrooms. The surveys include occupied and unoccupied data on the acoustical conditions, noise levels, and noise isolation performance of the classrooms. Acoustical parameter values for achieving 'good' speech intelligibility in active university classrooms are also presented.

Phonation Threshold Flow and Phonation Threshold Pressure in Patients with Adductor Spasmodic Dysphonia

  • Choi, Seong-Hee;Jiang, Jack J.;Yun, Bo-Ram;Lee, Ji-Yeoun;Lim, Sung-Eun;Choi, Hong-Shik
    • Phonetics and Speech Sciences
    • /
    • v.2 no.3
    • /
    • pp.157-164
    • /
    • 2010
  • This study investigated the characteristics of two aerodynamic indices, PTP (phonation threshold pressure) and PTF (phonation threshold flow), in patients with ADSD (adductor spasmodic dysphonia) and examined whether these two new indices can differentiate between normal and ADSD groups. Additionally, PTP and PTF values were compared in terms of the overall severity of ADSD in the patient group. The severity of ADSD was rated on a 7-point scale by two experienced speech-language pathologists. The Kay Elemetrics Phonatory Aerodynamic System (PAS) (Kay Elemetrics Corp., Lincoln Park, NJ) was used to collect PTP and PTF measurements from 16 normal female subjects and 31 female patients with ADSD. Significantly lower PTF values (P<0.05) were observed in the ADSD group than in the normal control group. Significantly lower PTF values were also observed in severe ADSD patients (P<.001). However, PTP could not distinguish patients with ADSD from the control group (P=0.119) or distinguish among the ADSD groups by severity (P=0.177). Consequently, PTF was more sensitive than PTP and may differentiate between normal speakers and ADSD patients and among different levels of severity within ADSD, suggesting that PTF could be a useful diagnostic parameter for measuring aerodynamic function in ADSD and for characterizing neurolaryngeal dysfunction in patients with ADSD.

Performance Improvement in Speech Recognition by Weighting HMM Likelihood (은닉 마코프 모델 확률 보정을 이용한 음성 인식 성능 향상)

  • 권태희;고한석
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.2
    • /
    • pp.145-152
    • /
    • 2003
  • In this paper, assuming that the score of a speech utterance is the product of the HMM log likelihood and an HMM weight, we propose a new method in which the HMM weights are adapted iteratively, as in general MCE training. The proposed method adjusts the HMM weights for better performance using a delta coefficient defined in terms of the misclassification measure. The parameter estimation and Viterbi algorithms of conventional HMMs can therefore be applied easily to the proposed model by constraining the sum of the HMM weights to the number of HMMs in an HMM set. Compared with the general segmental MCE training approach, computing time decreases because fewer parameters are estimated and gradient calculation through the optimal state sequence is avoided. To evaluate the performance of the HMM-based speech recognizer with weighted HMM likelihoods, we perform Korean isolated digit recognition experiments. The experimental results show better performance than the MCE algorithm with state weighting.
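
The scoring rule stated above (utterance score = HMM weight times HMM log likelihood, with the weights constrained to sum to the number of HMMs) can be sketched as below. The iterative MCE-style weight update is not shown; the function names, the digit labels, and all numbers are hypothetical.

```python
def normalize_weights(weights):
    """Rescale weights so that they sum to the number of HMMs."""
    scale = len(weights) / sum(weights.values())
    return {name: w * scale for name, w in weights.items()}


def recognize(log_likelihoods, weights):
    """Return the model with the highest weighted log likelihood."""
    scores = {name: weights[name] * ll for name, ll in log_likelihoods.items()}
    return max(scores, key=scores.get), scores


# Invented raw weights and log likelihoods for three digit models.
weights = normalize_weights({"one": 1.0, "two": 1.0, "three": 2.0})
log_likelihoods = {"one": -120.0, "two": -118.0, "three": -95.0}

best, scores = recognize(log_likelihoods, weights)
unweighted_best = max(log_likelihoods, key=log_likelihoods.get)
```

Because log likelihoods are negative, a larger weight penalizes a model: here the unweighted decision is "three", while the weighted decision is "two".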

Usefulness of Cepstral Peak Prominence (CPP) in Unilateral Vocal Fold Paralysis Dysphonia Evaluation (일측성 성대마비 환자 평가에서 Cepstral Peak Prominence의 유용성)

  • Lee, Chang-Yoon;Jeong, Hee Seok;Son, Hee Young
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.28 no.2
    • /
    • pp.84-88
    • /
    • 2017
  • Background and Objectives: The purpose of this study was to compare the usefulness of cepstral peak prominence (CPP) with the parameters of the Multi-Dimensional Voice Program (MDVP) in evaluating unilateral vocal fold paralysis patients with subjective voice impairment. Materials and Methods: From July 2014 to August 2016, 37 patients who had been diagnosed with unilateral vocal fold paralysis and had received two or more voice tests before and after the diagnosis were evaluated for maximum phonation time (MPT), MDVP, and CPP. Voice tests were performed with the short vowel /a/ and paragraph reading. Results: CPP-a (CPP with vowel /a/) and CPP-s (CPP with paragraph reading) were statistically negatively correlated with G, R, B, and A before voice therapy. Jitter, Shimmer, and NHR of the MDVP were positively correlated with G, R, and B, and were significantly correlated with the cepstral indices. G, B, and A showed a statistically significant negative correlation with CPP-a and CPP-s, with somewhat higher correlation coefficients between 0.5 and 0.78. Among the MDVP indices, by contrast, only Jitter showed a positive correlation with G and B, at 0.4. Conclusion: CPP can be an important tool for evaluating speech in unilateral vocal cord paralysis when speech energy changes or the cycle is not constant during speech.
