• Title/Summary/Keyword: Vocal pitch

2.4kbps Speech Coding Algorithm Using the Sinusoidal Model (정현파 모델을 이용한 2.4kbps 음성부호화 알고리즘)

  • 백성기;배건성
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.3A
    • /
    • pp.196-204
    • /
    • 2002
  • Sinusoidal Transform Coding (STC) is a vocoding scheme based on a sinusoidal model of the speech signal. Low bit-rate speech coding based on the sinusoidal model represents and synthesizes speech from the fundamental frequency and its harmonics, the spectral envelope, and phase in the frequency domain. In this paper, we propose a 2.4 kbps low-rate speech coding algorithm based on this model. In the proposed coder, the pitch frequency is estimated by choosing the frequency that minimizes the mean squared error between speech synthesized from all spectral peaks and speech synthesized from the chosen frequency and its harmonics. The spectral envelope is estimated using the SEEVOC (Spectral Envelope Estimation VOCoder) algorithm and the discrete all-pole model. Phase information is obtained from the time of pitch pulse occurrence, i.e., the onset time, as well as from the phase of the vocal tract system. Experimental results show that the synthetic speech preserves both the formant and phase information of the original speech very well. The coder's performance was evaluated with informal MOS listening tests, achieving a MOS score above 3.1.
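
The pitch-selection step described above, choosing the frequency whose harmonics best match the measured spectral peaks, can be sketched as follows. This is a minimal illustration with made-up peak data, using an amplitude-weighted frequency-mismatch error plus a small anti-subharmonic bias; it is not the paper's exact MSE criterion on synthesized waveforms:

```python
def estimate_pitch(peak_freqs, peak_amps, f0_candidates):
    """Pick the F0 whose harmonic grid best matches the measured peaks:
    amplitude-weighted squared frequency mismatch, plus a mild penalty
    that discourages subharmonic (half-pitch) solutions."""
    best_f0, best_err = None, float("inf")
    for f0 in f0_candidates:
        err = 0.001 * (peak_freqs[-1] / f0)   # anti-subharmonic bias
        for f, a in zip(peak_freqs, peak_amps):
            k = max(1, round(f / f0))         # nearest harmonic number
            err += a * (f - k * f0) ** 2
        if err < best_err:
            best_f0, best_err = f0, err
    return best_f0

# A synthetic voiced frame: peaks near harmonics of 120 Hz.
peaks = [119.0, 241.0, 359.5, 481.0, 599.0]
amps = [1.0, 0.8, 0.6, 0.4, 0.3]
candidates = [c / 2 for c in range(120, 601)]   # 60..300 Hz, 0.5 Hz steps
f0 = estimate_pitch(peaks, amps, candidates)
```

The bias term matters because a subharmonic (60 Hz here) places a harmonic on every true peak and would otherwise tie the correct answer.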

Personal Credit Evaluation System through Telephone Voice Analysis: By Support Vector Machine

  • Park, Hyungwoo
    • Journal of Internet Computing and Services
    • /
    • v.19 no.6
    • /
    • pp.63-72
    • /
    • 2018
  • The human voice is one of the easiest channels for transmitting information between human beings. Voice characteristics vary from person to person and include speaking rate, the form and function of the vocal organs, pitch, speech habits, and gender. In the era of the Fourth Industrial Revolution, voice is also a major means of communication between humans and between humans and machines, so speakers try to convey their intentions to others clearly, and in the process the voice carries various additional information alongside the linguistic content: emotional state, health status, degree of trust, presence of a lie, changes due to drinking, and so on. This linguistic and non-linguistic information appears in various parameters obtained through voice analysis and can serve as a basis for evaluating an individual's creditworthiness. In particular, it can be obtained by analyzing the relationship between the characteristics of the fundamental frequency (basic tonality) of the vocal folds and the characteristics of the resonance frequencies of the vocal tract. Previous research examined the need for various credit evaluation methods and the changes in voice characteristics according to changes in credit status. In this study, we propose a personal credit discriminator trained by machine learning on parameters extracted from the voice.

Emotion recognition in speech using hidden Markov model (은닉 마르코프 모델을 이용한 음성에서의 감정인식)

  • 김성일;정현열
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.3 no.3
    • /
    • pp.21-26
    • /
    • 2002
  • This paper presents a new approach to identifying human emotional states such as anger, happiness, normal, sadness, and surprise, using discrete duration continuous hidden Markov models (DDCHMM). Emotional feature parameters are first extracted from the input speech signals; in this study we used prosodic parameters such as pitch, energy, and their derivatives, which were then used to train the HMMs for recognition. Speaker-adapted emotional models based on maximum a posteriori (MAP) estimation were also considered for speaker adaptation. Simulation results showed that vocal emotion recognition rates gradually increased with the number of adaptation samples.
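
The per-frame prosodic feature vectors described above (pitch, energy, and their derivatives) might be assembled as in this minimal sketch; the HMM training itself is omitted, and the frame values are hypothetical:

```python
def delta(seq):
    """First-order difference, zero-padded so the length matches the input."""
    return [0.0] + [b - a for a, b in zip(seq, seq[1:])]

def prosodic_features(pitch, energy):
    """Stack pitch, energy, and their deltas into per-frame feature
    vectors of the kind fed to the emotion HMMs."""
    dp, de = delta(pitch), delta(energy)
    return [list(v) for v in zip(pitch, energy, dp, de)]

# Three hypothetical frames: rising pitch, varying energy.
feats = prosodic_features([100.0, 105.0, 110.0], [0.5, 0.6, 0.4])
# Each frame: [pitch, energy, delta-pitch, delta-energy]
```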

Analysis and synthesis of pseudo-periodicity on voice using source model approach (음성의 준주기적 현상 분석 및 구현에 관한 연구)

  • Jo, Cheolwoo
    • Phonetics and Speech Sciences
    • /
    • v.8 no.4
    • /
    • pp.89-95
    • /
    • 2016
  • The purpose of this work is to analyze and synthesize the pseudo-periodicity of voice using a source model. A speech signal has periodic characteristics; however, it is not completely periodic. While periodicity contributes significantly to the production of prosody, emotional status, etc., pseudo-periodicity contributes to the distinction between normal and abnormal status, the naturalness of normal speech, etc. Pseudo-periodicity is typically measured through parameters such as jitter and shimmer. When studying the pseudo-periodic nature of voice from collected natural speech, we can only observe the distributions of these parameters, which are limited by the size of the collected data; if voice samples can instead be generated in a controlled manner, more diverse experiments can be conducted. In this study, the probability distributions of vowel pitch variation are obtained from the speech signal. Based on the probability distribution of vocal fold pulses, pulse trains with a designated jitter value are synthesized, and the target and re-analyzed jitter values are compared to check the validity of the method. The jitter synthesis method was found to be useful for normal voice synthesis.
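
The synthesize-then-reanalyze check described above can be sketched as follows, assuming a simple Gaussian perturbation of the pitch period (the paper instead uses probability distributions measured from real speech):

```python
import random

def synth_periods(f0, n, target_jitter, seed=0):
    """Generate n pitch periods (seconds) whose cycle-to-cycle variation
    approximates a designated jitter value, via a hypothetical
    Gaussian perturbation model."""
    rng = random.Random(seed)
    t0 = 1.0 / f0
    return [t0 * (1.0 + rng.gauss(0.0, target_jitter)) for _ in range(n)]

def local_jitter(periods):
    """Mean absolute difference of consecutive periods, over the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

periods = synth_periods(100.0, 2000, target_jitter=0.01)
measured = local_jitter(periods)
# 'measured' lands near the target but differs by a known factor,
# since local jitter averages |differences| of two Gaussian draws.
```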

Voice Personality Transformation Using a Multiple Response Classification and Regression Tree (다중 응답 분류회귀트리를 이용한 음성 개성 변환)

  • 이기승
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.3
    • /
    • pp.253-261
    • /
    • 2004
  • In this paper, a new voice personality transformation method is proposed, which modifies speaker-dependent feature variables in the speech signal. The proposed method takes cepstrum vectors and pitch as the transformation parameters, representing the vocal tract transfer function and the excitation signal, respectively. To transform these parameters, a multiple response classification and regression tree (MR-CART) is employed. MR-CART is a vector-extended version of the conventional CART, whose response is given in vector form. We evaluated the performance of the proposed method by comparing it with a previously proposed codebook mapping method, and quantitatively analyzed the transformation performance and complexity under various conditions. Experimental results for four speakers show that the proposed method objectively outperforms the conventional codebook mapping method, and the transformed speech sounds closer to the target speech.
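
The core of a vector-response CART split, choosing a threshold that minimizes the summed squared deviation of the vector responses in the two child nodes, can be sketched as follows. This illustrates the MR-CART idea only; it is not the paper's implementation:

```python
def best_split(x, Y):
    """Find the threshold on scalar feature x minimizing the total squared
    deviation of vector responses Y within the two resulting children."""
    def sse(rows):
        # Sum of squared deviations from the component-wise mean vector.
        if not rows:
            return 0.0
        dim = len(rows[0])
        mean = [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
        return sum((r[d] - mean[d]) ** 2 for r in rows for d in range(dim))

    order = sorted(range(len(x)), key=lambda i: x[i])
    best_t, best_cost = None, float("inf")
    for j in range(1, len(order)):          # candidate midpoints
        t = 0.5 * (x[order[j - 1]] + x[order[j]])
        cost = sse([Y[i] for i in order[:j]]) + sse([Y[i] for i in order[j:]])
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

# Two clusters of 2-D responses, separable at x = 0.5.
x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
Y = [[1, 1], [1, 1], [1, 1], [5, 5], [5, 5], [5, 5]]
t = best_split(x, Y)
```

A full MR-CART applies this split search recursively over all feature dimensions; codebook mapping, by contrast, quantizes the whole feature space up front.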

Formant Trajectories of English Vowels Produced by American Children (미국인 아동이 발음한 영어모음의 포먼트 궤적)

  • Yang, Byung-Gon
    • Phonetics and Speech Sciences
    • /
    • v.3 no.1
    • /
    • pp.23-34
    • /
    • 2011
  • Many Korean children have difficulty learning English vowels: the gestures inside the oral and pharyngeal cavities are hard to control because they cannot be seen, and the target vowel system differs considerably from that of Korean. This study collects children's acoustic data for twelve English vowels published online by Hillenbrand et al. (1995) and examines the acoustic features of English vowels for phoneticians and English teachers. The author used Praat to obtain the data systematically at six equidistant timepoints over each vowel segment, avoiding any obvious errors. Results show, first, inherent acoustic properties of the vowels in the children's distributions of vowel duration, f0, and intensity values. Second, the children's gestures for each vowel coincide with the regression analysis of all formant values at the different timepoints, regardless of vocal fold and vocal tract differences. Third, locus points appear higher than those of American males and females, while the gestures along the timepoints display very similar patterns. The author concludes that vowel formant trajectories provide useful and important information on dynamic articulatory gestures, which may be applicable to teaching and correcting English vowels for Korean children. Further developmental studies of vowel formants and pitch values are desirable.
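
The six-equidistant-timepoint sampling described above can be illustrated with a small helper; the track values below are hypothetical, not Hillenbrand et al.'s data:

```python
def equidistant_samples(track, k=6):
    """Return k values taken at equidistant proportional timepoints
    over a formant (or f0/intensity) track."""
    n = len(track)
    return [track[round(i * (n - 1) / (k - 1))] for i in range(k)]

# A hypothetical F1 track (Hz) measured over one vowel segment.
f1_track = [400, 420, 450, 480, 500, 510, 515, 512, 505, 495, 480]
f1_six = equidistant_samples(f1_track)
```

Sampling at proportional timepoints makes tracks of different durations directly comparable, which is what allows trajectories from different speakers to be pooled in a regression.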

An Acoustical Study of English Diphthongs Produced by American Males and Females (미국인 남성과 여성이 발음한 영어이중모음의 음향적 연구)

  • Yang, Byung-Gon
    • Phonetics and Speech Sciences
    • /
    • v.2 no.2
    • /
    • pp.43-50
    • /
    • 2010
  • English vowels can be divided into monophthongs and diphthongs depending on the number of vocal tract shapes involved; diphthongs are produced with more than one shape. This study collects acoustical data for English diphthongs published online by Hillenbrand et al. (1995) and examines their acoustic features for phoneticians and English teachers. Sixty-three American males and females were chosen after excluding subjects with different target vowels or ambiguous formant tracks. The author used Praat to obtain the acoustical data systematically at eleven equidistant timepoints over each diphthongal segment; obvious errors were corrected based on the spectrographic display of each diphthong. Results show that the formant trajectories of the diphthongs produced by the American males and females were quite similar: when the female formant values were uniformly normalized to those of the males, an almost perfect collapse occurred. Secondly, the diphthongal movements in the vowel space were not linear, owing to the coarticulatory gesture for the following consonant. Thirdly, the average duration of the diphthongs produced by the females was 1.156 times that of the males, while the pitch ratio between the two groups was 1.746, with a similar contour over the measurement points. The author concludes that English diphthongs produced by various groups can be compared systematically when the acoustical values are obtained at proportional timepoints. Further studies comparing English diphthongs produced by native and nonnative speakers are desirable.
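
The uniform normalization mentioned above, a single scale factor mapping female formant values onto male ones, might look like this least-squares sketch; the factor form and the figures are illustrative assumptions, not the study's procedure or data:

```python
def uniform_scale(female, male):
    """Single multiplicative factor that best maps one set of formant
    values onto another in the least-squares sense."""
    num = sum(f * m for f, m in zip(female, male))
    den = sum(f * f for f in female)
    return num / den

# Illustrative F1/F2 values (Hz) for one measurement point.
female = [850.0, 2100.0]
male = [730.0, 1800.0]
k = uniform_scale(female, male)
normalized = [k * f for f in female]
```

A near-perfect collapse after such scaling suggests the male and female trajectories differ mainly by vocal tract length, not by gesture.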

VOICE SOURCE ESTIMATION USING SEQUENTIAL SVD AND EXTRACTION OF COMPOSITE SOURCE PARAMETERS USING EM ALGORITHM

  • Hong, Sung-Hoon;Choi, Hong-Sub;Ann, Sou-Guil
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1994.06a
    • /
    • pp.893-898
    • /
    • 1994
  • In this paper, the influence of voice source estimation and modeling on speech synthesis and coding is examined, and new estimation and modeling techniques are proposed and verified by computer simulation. Existing speech synthesizers are known to produce speech that sounds dull and unanimated, because existing estimation and modeling techniques cannot provide sufficiently accurate voice source parameters. We therefore propose a new voice source estimation algorithm and a modeling technique that can represent a variety of source characteristics. First, we divide the speech samples in one pitch period into four regions with different characteristics. Second, the vocal-tract parameters and voice source waveforms are estimated differently in each region using sequential SVD. Third, we propose the composite source model, a new voice source model represented as a weighted sum of pre-defined basis functions. Finally, the weights and time-shift parameters of the proposed composite source model are estimated using the EM (expectation-maximization) algorithm. Experimental results indicate that the proposed estimation and modeling methods can estimate voice source waveforms more accurately and represent various source characteristics.
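
The composite source model, a weighted sum of pre-defined basis functions with time shifts, can be sketched as follows; Gaussian bumps stand in here for the paper's unspecified basis functions:

```python
import math

def composite_source(weights, shifts, length, width=8.0):
    """Source waveform modeled as a weighted sum of time-shifted basis
    pulses (Gaussian bumps as placeholder bases)."""
    out = [0.0] * length
    for w, s in zip(weights, shifts):
        for n in range(length):
            out[n] += w * math.exp(-((n - s) ** 2) / (2.0 * width ** 2))
    return out

# One pitch period, 100 samples: a main excitation pulse plus a
# smaller negative component later in the cycle.
pulse = composite_source(weights=[1.0, -0.4], shifts=[30.0, 55.0], length=100)
```

In the paper's scheme, EM would iteratively re-estimate the weights and shifts so that this sum best matches the SVD-estimated source waveform.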

Change Analysis of Vocal Cords Vibration Parameter According to C2H5OH (C2H5OH에 따른 성대 진동 요소의 변화 분석)

  • Kim, Bong-Hyun;Jang, Young-Jo;Ka, Min-Kyoung;Lee, Se-Hwan;Cho, Dong-Uk
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2010.04a
    • /
    • pp.494-497
    • /
    • 2010
  • In this paper, we measured changes in phonetic analysis parameters according to the amount of alcohol, whose main component is $C_2H_5OH$, consumed. To this end, we measured changes in alcohol intake and carried out experiments applying various voice analysis techniques, following an experimental procedure applicable to alcohol-related fields so that the results can readily be used in a variety of settings. We analyzed the Pitch, Shimmer, and Jitter values of the voice and, through the experimental results, analyzed the effect of alcohol intake on the voice. Finally, the experiments demonstrated the usefulness of the proposed method.
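
The Jitter and Shimmer parameters analyzed above are conventionally computed as local cycle-to-cycle perturbation measures, for example:

```python
def local_jitter(periods):
    """Local jitter: mean absolute difference of consecutive pitch
    periods, divided by the mean period."""
    d = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return (sum(d) / len(d)) / (sum(periods) / len(periods))

def local_shimmer(amps):
    """Local shimmer: the same measure applied to peak amplitudes."""
    d = [abs(b - a) for a, b in zip(amps, amps[1:])]
    return (sum(d) / len(d)) / (sum(amps) / len(amps))

# Hypothetical cycle-by-cycle measurements (not the paper's data).
periods = [0.0100, 0.0102, 0.0099, 0.0101]   # seconds
amps = [1.00, 1.10, 0.90, 1.00]              # relative amplitude
jit = local_jitter(periods)
shim = local_shimmer(amps)
```

Elevated values of both measures relative to a sober baseline are the kind of change such a study looks for.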

Change of Voice Parameters After Thyroidectomy Without Apparent Injury to the Recurrent Laryngeal or External Branch of Superior Laryngeal Nerve: A Prospective Cohort Study

  • Lee, Doh Young;Choe, Goun;Park, Hanaro;Han, Sungjun;Park, Sung Joon;Kim, Seong Dong;Kim, Bo Hae;Jin, Young Ju;Lee, Kyu Eun;Park, Young Joo;Kwon, Tack-Kyun
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.33 no.2
    • /
    • pp.89-96
    • /
    • 2022
  • Background and Objectives Quality of life after thyroidectomy, including voice change, is considered as important as control of the disease. In this study, we aimed to evaluate changes in both subjective and objective voice parameters after thyroidectomy in patients with normal postoperative mobility of the vocal cords. Materials and Method In this prospective cohort study, 204 patients who underwent thyroidectomy with or without central neck dissection at a single referral center from Feb 2015 to Aug 2016 were enrolled. All patients underwent prospective voice evaluations, including both subjective and objective assessments, preoperatively and at 2 weeks, 3, 6, and 12 months postoperatively. Temporal changes in the voice parameters were analyzed. Results Values on the subjective assessment tool worsened during the early postoperative follow-up period and had not recovered to preoperative values at 12 months postoperatively. The maximal phonation time gradually decreased, whereas most objective parameters, including maximal vocal pitch (MVP), returned to preoperative values at 3-6 months postoperatively. The initial decrease in MVP was significantly greater in patients undergoing total thyroidectomy, and their MVP recovery time was faster than that of patients undergoing lobectomy (p=0.001). Patients whose external branch of the superior laryngeal nerve was confirmed intact by electroidentification showed no difference in recovery speed compared with patients without electroidentification (p=0.102), although the initial decrease in MVP was smaller with electroidentification. Conclusion Subjective voice quality and maximal phonation time after thyroidectomy did not recover to preoperative values. Aggravation of MVP was associated with surgical extent and electroidentification.