• Title/Summary/Keyword: speech parameter

Fast Harmonic Synthesis Method for Sinusoidal Speech-Audio Model (정현파 음성-오디오 모델의 빠른 하모닉 합성 방법)

  • Kim, Gyu-Jin;Kim, Jong-Hark;Jung, Gyu-Hyeok;Lee, In-Sung
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.44 no.4 s.316
    • /
    • pp.109-116
    • /
    • 2007
  • Most harmonic synthesis methods using phase information employ quadratic or cubic phase interpolation. These methods are computationally expensive because every component sine wave must be synthesized on a per-sample basis. In this paper, we propose a fast harmonic synthesis method for sinusoidal speech/audio coding, based on the quadratic and cubic phase functions, to overcome this complexity problem. To derive the fast method, we define an over-sampling function and a phase modulation function by constraining the phase-function parameters to be independent of the harmonic index, and we derive a fast synthesis procedure using the IFFT. Experimental results show that the proposed method significantly reduces the complexity of the conventional cosine synthesis method while maintaining its performance.
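
The abstract's key trick, synthesizing the whole harmonic bank with one inverse FFT instead of evaluating every cosine per sample, can be illustrated with the minimal numpy sketch below. It assumes harmonics that fall exactly on FFT bins and hypothetical amplitudes/phases; the paper's over-sampling and phase-modulation functions, which handle the general case, are not reproduced.

```python
import numpy as np

def cosine_synthesis(amps, freqs, phases, fs, n):
    """Reference method: sum every harmonic cosine sample by sample (O(K*n))."""
    t = np.arange(n) / fs
    return sum(a * np.cos(2 * np.pi * f * t + p)
               for a, f, p in zip(amps, freqs, phases))

def ifft_synthesis(amps, freqs, phases, fs, n):
    """Fast method: place each harmonic in the nearest FFT bin and take one IFFT.
    Exact only when every harmonic lies on a bin; real fast-synthesis schemes add
    an over-sampling / phase-modulation step to correct the bin mismatch."""
    spectrum = np.zeros(n, dtype=complex)
    for a, f, p in zip(amps, freqs, phases):
        k = int(round(f * n / fs))                       # nearest DFT bin
        spectrum[k] += 0.5 * a * n * np.exp(1j * p)
        spectrum[-k] += 0.5 * a * n * np.exp(-1j * p)    # conjugate bin for a real signal
    return np.real(np.fft.ifft(spectrum))

fs, n, f0 = 16000, 1024, 125.0        # frame length chosen so f0 sits exactly on a bin
harmonics = np.arange(1, 21)
amps = 1.0 / harmonics                # hypothetical harmonic amplitudes
freqs = f0 * harmonics
phases = np.zeros_like(freqs)
err = np.max(np.abs(cosine_synthesis(amps, freqs, phases, fs, n)
                    - ifft_synthesis(amps, freqs, phases, fs, n)))
print(f"max difference between direct and IFFT synthesis: {err:.2e}")
```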

The Correlation of Voice Characteristics and Depression Index Analysis in Accordance with Menstrual Cycle (월경주기에 따른 우울지수 정도와 음성특성과의 상관관계 분석)

  • Kim, YuMi;Jang, Seoung-Jin;Kim, Eunyeon;Choi, Yaelin
    • Phonetics and Speech Sciences
    • /
    • v.6 no.3
    • /
    • pp.41-48
    • /
    • 2014
  • This study investigated differences in the emotional parameters BDI, VHI, STAI-X-I, and STAI-X-II across the phases of the female menstrual cycle, and the relationship between changes in the depression index and voice characteristics (jitter, shimmer, CPP, HNR, pF0, F1, F2, F3, sF0, sF4, sB1, H1c/u, A1u, A3c, H1A3c/u, H1A1u). Twenty-three females (30±4.4 years old) living in Seoul and Gyeonggi Province participated in this study, answering the questionnaires and recording their voices. For the recordings, the participants sustained the vowel /a/ for 5 seconds in a natural condition. Voice data were analyzed using Matlab and Praat. A t-test and a correlation analysis were conducted in SPSS for the statistical analysis. The results are as follows. First, the BDI was significantly higher in group I (luteal phase versus the menstrual period) and group II (follicular phase versus the menstrual period) than in group III (luteal phase versus the follicular phase) (p<.05). Second, shimmer, CPP, and pF0 showed statistically significant correlations with the BDI in group I (luteal phase versus the menstrual period). Voice parameters may therefore be useful as a supplement in evaluating emotional change across the phases of the menstrual cycle.
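
As a rough illustration of the measurements described above, the sketch below extracts jitter and shimmer from sustained /a/ recordings and correlates shimmer with BDI scores. It assumes the parselmouth Python interface to Praat and scipy (the study used Praat and Matlab directly); file names and BDI values are hypothetical placeholders, not the study's data.

```python
import parselmouth
from parselmouth.praat import call
from scipy.stats import pearsonr

def jitter_shimmer(wav_path):
    """Local jitter and local shimmer of a sustained-vowel recording via Praat commands."""
    snd = parselmouth.Sound(wav_path)
    point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)
    jitter = call(point_process, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
    shimmer = call([snd, point_process], "Get shimmer (local)",
                   0, 0, 0.0001, 0.02, 1.3, 1.6)
    return jitter, shimmer

wavs = ["subj01_a.wav", "subj02_a.wav", "subj03_a.wav"]   # hypothetical /a/ recordings
bdi = [9, 14, 21]                                         # matching hypothetical BDI scores
shimmers = [jitter_shimmer(w)[1] for w in wavs]
r, p = pearsonr(shimmers, bdi)                            # correlation with depression index
print(f"shimmer vs. BDI: r={r:.2f}, p={p:.3f}")
```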

Analysis of Vocal Cord Function by Humidity Change Based on Voice Signal Analysis (음성신호 분석 기반의 습도 변화에 따른 성대 기능 분석)

  • Kim, Bong-Hyun;Cho, Dong-Uk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37A no.9
    • /
    • pp.792-798
    • /
    • 2012
  • In modern society, where the network quotient (NQ) is an important measure, speech that is intelligible enough to leave a favorable impression on a conversation partner has become an important issue, and the humidity of the air has a considerable influence on speech intelligibility. In this paper, we therefore applied voice-signal analysis techniques to examine how the vocal folds are affected in environments held at constant humidity levels of 30%, 50%, and 80%. Specifically, we measured the intensity and pitch of the voice signals of twenty males in their 20s under the 30%, 50%, and 80% humidity conditions. Finally, we performed a statistical analysis of the measured characteristic parameters of vocal fold function to identify significant changes with humidity.
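
A minimal sketch of extracting the two parameters analyzed above, mean pitch and mean intensity, for a recording made at each humidity level. It assumes the parselmouth interface to Praat (the paper does not name its analysis software), and the file names are hypothetical.

```python
import numpy as np
import parselmouth

def mean_pitch_and_intensity(wav_path):
    """Mean F0 (Hz) over voiced frames and mean intensity (dB) of a recording."""
    snd = parselmouth.Sound(wav_path)
    f0 = snd.to_pitch().selected_array["frequency"]
    f0 = f0[f0 > 0]                                   # drop unvoiced frames (F0 == 0)
    intensity = snd.to_intensity()                    # intensity contour in dB
    return float(np.mean(f0)), float(np.mean(intensity.values))

for humidity in (30, 50, 80):
    f0_mean, db_mean = mean_pitch_and_intensity(f"speaker01_h{humidity}.wav")
    print(f"{humidity}% humidity: mean F0 {f0_mean:.1f} Hz, mean intensity {db_mean:.1f} dB")
```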

Noisy Environmental Adaptation for Word Recognition System Using Maximum a Posteriori Estimation (최대사후확률 추정법을 이용한 단어인식기의 잡음환경적응화)

  • Lee, Jung-Hoon;Lee, Shi-Wook;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.2
    • /
    • pp.107-113
    • /
    • 1997
  • To achieve a Korean word recognition system that is robust to both channel distortion and additive noise, maximum a posteriori (MAP) adaptation is proposed, and the effectiveness of environmental adaptation in improving recognition performance is investigated in this paper. To do this, recognition experiments using MAP adaptation were carried out for three different speech conditions: 1) channel distortion only, 2) additive environmental noise only, and 3) both channel distortion and additive noise. The effectiveness of additional feature parameters, such as regression coefficients and durations, for environmental adaptation was also investigated. From speaker-independent 100-word recognition tests, we obtained recognition improvements of 9.0% for case 1), more than 75% for case 2), and 11%-61.4% for case 3), showing that MAP environmental adaptation is effective for both channel-distorted and noise-corrupted speech. However, duration information used as an additional feature parameter did not play an important role in the tests.
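
The core of MAP adaptation is interpolating between the prior (clean-trained) model parameters and statistics gathered from adaptation data. The sketch below shows the standard textbook MAP update for a Gaussian mean, not necessarily the exact estimator used in the paper; the adaptation frames are random stand-ins.

```python
import numpy as np

def map_adapt_mean(prior_mean, adaptation_frames, tau=10.0):
    """Standard MAP mean update: mu_map = (tau * mu_prior + sum_t x_t) / (tau + T).
    tau controls how strongly the prior (clean-condition) mean is trusted against
    the adaptation data."""
    frames = np.asarray(adaptation_frames)
    T = frames.shape[0]
    return (tau * prior_mean + frames.sum(axis=0)) / (tau + T)

rng = np.random.default_rng(0)
prior_mean = np.zeros(13)                                       # e.g. a 13-dim cepstral mean
noisy_frames = rng.normal(loc=0.5, scale=1.0, size=(200, 13))   # hypothetical adaptation data
adapted = map_adapt_mean(prior_mean, noisy_frames, tau=10.0)
print(adapted[:4])   # shifted toward the noisy-environment statistics
```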

Speaker Recognition Using Dynamic Time Variation of Orthogonal Parameters (직교인자의 동적 특성을 이용한 화자인식)

  • 배철수
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.17 no.9
    • /
    • pp.993-1000
    • /
    • 1992
  • Recently, many researchers have found that high speaker recognition rates can be obtained by statistically processing orthogonal parameters, which are derived from the analysis of the speech signal and carry much of the speaker's identity. This approach, however, has problems caused by vocalization speed and the time-varying nature of speech. To solve these problems, this paper proposes two speaker recognition methods that combine the DTW algorithm with orthogonal parameters extracted by the Karhunen-Loève transform: one applies the orthogonal parameters as feature vectors to the DTW algorithm, and the other applies the orthogonal parameters along the optimal path. In addition, we compare the speaker recognition rates obtained with the two proposed methods against that of the conventional statistical processing of orthogonal parameters. The orthogonal parameters used in this paper are derived from both the linear prediction coefficients and the partial correlation coefficients of the speech signal.
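
The alignment step both proposed methods rely on is ordinary dynamic time warping over sequences of orthogonal parameters. The numpy sketch below implements the textbook DTW recursion, not the paper's specific variant; the random feature sequences merely stand in for Karhunen-Loève-transformed LPC/PARCOR vectors.

```python
import numpy as np

def dtw_distance(a, b):
    """a: (n, d) and b: (m, d) feature sequences; returns the accumulated DTW cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])     # local Euclidean distance
            cost[i, j] = d + min(cost[i - 1, j],        # insertion
                                 cost[i, j - 1],        # deletion
                                 cost[i - 1, j - 1])    # match
    return cost[n, m]

rng = np.random.default_rng(1)
reference = rng.normal(size=(40, 10))    # stand-in for an enrolled speaker's parameter sequence
test = rng.normal(size=(55, 10))         # stand-in for a test utterance's parameter sequence
print(f"DTW cost: {dtw_distance(reference, test):.2f}")
```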

Measurement of Cardiac Function Improvement by Auricular Acupuncture Applying Speech Signal Analysis (음성신호 분석을 적용한 이침요법(耳針療法)에 따른 심장 기능 향상 측정)

  • Kim, Bong-Hyun;Cho, Dong-Uk;Han, Kil-Sung
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.12 no.12
    • /
    • pp.5588-5593
    • /
    • 2011
  • In this paper, we measure changes in speech-analysis parameters produced by stimulating the auricular acupuncture points corresponding to the heart. To do this, we selected 10 subjects with normal cardiac function and collected their voices before and after stimulation of the corresponding auricular points. We analyzed the changes before and after stimulation using jitter and the second formant frequency bandwidth, elements of voice analysis related to cardiac function. In our experiment, the values of jitter and the second formant frequency bandwidth decreased in 90% of the subjects, which allowed us to analyze the correlation between the voice and cardiac function following stimulation of the auricular points. Finally, the effectiveness of the proposed method is demonstrated through several experiments.
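
A hedged sketch of measuring the two parameters compared above, jitter and the second formant's bandwidth, using the parselmouth interface to Praat (an assumption; the paper does not name its tool). File names are hypothetical.

```python
import parselmouth
from parselmouth.praat import call

def jitter_and_b2(wav_path):
    """Local jitter and the F2 bandwidth at the midpoint of the recording."""
    snd = parselmouth.Sound(wav_path)
    point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)
    jitter = call(point_process, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
    formant = call(snd, "To Formant (burg)", 0.0, 5, 5500, 0.025, 50)
    mid = snd.duration / 2
    b2 = call(formant, "Get bandwidth at time", 2, mid, "Hertz", "Linear")
    return jitter, b2

for label in ("before", "after"):
    j, b2 = jitter_and_b2(f"subject01_{label}_stimulation.wav")
    print(f"{label}: jitter {j:.4f}, F2 bandwidth {b2:.1f} Hz")
```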

Acoustic analysis of wet voice among patients with swallowing disorders (삼킴장애 환자의 wet voice 관련 음향학적 분석)

  • Kang, Young Ae;Koo, Bon Seok;Kwon, In Sun;Seong, Cheoljae
    • Phonetics and Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.147-154
    • /
    • 2018
  • Wet voice quality (WVQ) is a characteristic that appears after swallowing. Although the concept is accepted by many clinicians worldwide, it remains ambiguous. In this study, we investigated WVQ in patients with swallowing disorders using acoustic analysis. A total of 106 patients diagnosed with penetration-aspiration by the videofluoroscopic swallowing study (VFSS) were recruited. Recordings of the vowel /a/ were made before and after the VFSS, and an acoustic analysis was then performed using Praat. The post-VFSS voices were used for perceptual judgment and divided into two groups: the Wet group (48 patients) and the Non-wet group (58 patients). At the post-VFSS stage, the two groups displayed significant differences in many acoustic parameters, including F0_SD, Jitter, RAP, Shimmer, APQ, HNR, NHR, FUF, DVB, and CPP. A logistic regression test identified Jitter and NHR as the parameters affecting the judgment of wetness. At the pre-VFSS stage, the two groups differed significantly in many acoustic parameters, including Intensity, Jitter, RAP, Shimmer, NHR, FUF, DVB, and CPP. In both pre- and post-VFSS recordings, the mean values of all significant parameters, except Intensity, HNR, and CPP, were higher in the Wet group. Across pre- and post-VFSS, the two groups showed interactions for many parameters (Intensity, F0_SD, Jitter, RAP, Shimmer, APQ, HNR, NHR, FUF, DVB, and CPP). In particular, Intensity increased in both groups after the VFSS, although the increase in the Non-wet group was greater. Based on these results, it is conjectured that the WVQ after swallowing resulted from the secretion effect of the mucous membrane due to the dry laryngeal characteristics of elderly patients, rather than from aspiration leaving food on the vocal folds.
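
The logistic-regression step reported above, predicting the Wet/Non-wet label from Jitter and NHR, can be sketched as follows with scikit-learn (an assumption; the paper only names the test). The data below are synthetic placeholders shaped like the study's groups, not its actual measurements.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_wet, n_nonwet = 48, 58
# columns: [jitter (%), NHR]; Wet group drawn with higher means, mirroring the reported direction
X = np.vstack([
    rng.normal([1.2, 0.25], [0.4, 0.08], size=(n_wet, 2)),
    rng.normal([0.7, 0.15], [0.3, 0.05], size=(n_nonwet, 2)),
])
y = np.array([1] * n_wet + [0] * n_nonwet)        # 1 = Wet, 0 = Non-wet

model = LogisticRegression().fit(X, y)
print("coefficients (jitter, NHR):", model.coef_[0])
print("odds ratios:", np.exp(model.coef_[0]))
```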

Comparisons of voice quality parameter values measured with MDVP, Praat, and TF32 (MDVP, Praat, TF32에 따른 음향학적 측정치에 대한 비교)

  • Ko, Hye-Ju;Woo, Mee-Ryung;Choi, Yaelin
    • Phonetics and Speech Sciences
    • /
    • v.12 no.3
    • /
    • pp.73-83
    • /
    • 2020
  • Measured values may differ between the Multi-Dimensional Voice Program (MDVP), Praat, and Time-Frequency Analysis software (TF32), all of which are widely used in voice quality analysis, because of differences in the algorithms each analyzer uses. This study therefore compared the values of normal-voice parameters measured with each analyzer. Tokens of the vowel /a/ were collected from 35 normal adult subjects (19 male and 16 female) and analyzed with MDVP, Praat, and TF32. The mean values obtained from Praat for the jitter variables (J local, J abs, J rap, and J ppq), shimmer variables (S local, S dB, and S apq), and noise-to-harmonics ratio (NHR) were significantly lower than those from MDVP in both males and females (p<.01). The mean values of J local, J abs, and S local decreased significantly in the order MDVP, Praat, then TF32 in both genders. In conclusion, measured values differ across voice analyzers because of the algorithms each analyzer uses, so it is important for clinicians to understand each analyzer's normative criteria before analyzing pathological voices in clinical practice.
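
The kind of paired comparison reported above, the same speakers' jitter measured by two analyzers, can be sketched with a paired t-test in scipy. The numbers are hypothetical placeholders chosen only to mirror the reported direction of the difference (MDVP higher than Praat), not the study's data.

```python
import numpy as np
from scipy.stats import ttest_rel

# J local (%) for the same eight speakers measured by two analyzers (placeholder values)
j_local_mdvp  = np.array([1.31, 0.98, 1.52, 1.10, 0.87, 1.24, 1.45, 1.02])
j_local_praat = np.array([0.62, 0.41, 0.78, 0.55, 0.39, 0.60, 0.71, 0.47])

t, p = ttest_rel(j_local_mdvp, j_local_praat)   # paired test: same speakers, two analyzers
print(f"mean MDVP {j_local_mdvp.mean():.2f}% vs. Praat {j_local_praat.mean():.2f}%")
print(f"paired t-test: t={t:.2f}, p={p:.4f}")
```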

Automatic detection and severity prediction of chronic kidney disease using machine learning classifiers (머신러닝 분류기를 사용한 만성콩팥병 자동 진단 및 중증도 예측 연구)

  • Jihyun Mun;Sunhee Kim;Myeong Ju Kim;Jiwon Ryu;Sejoong Kim;Minhwa Chung
    • Phonetics and Speech Sciences
    • /
    • v.14 no.4
    • /
    • pp.45-56
    • /
    • 2022
  • This paper proposes an optimal methodology for automatically diagnosing chronic kidney disease (CKD) and predicting its severity from patients' utterances. In patients with CKD, the voice changes because of the weakening of the respiratory and laryngeal muscles and vocal fold edema. Previous studies have analyzed the voices of patients with CKD phonetically, but no study has classified them. In this paper, the utterances of patients with CKD were classified using a variety of utterance types (sustained vowel, sentence, and general sentence), feature sets [handcrafted features, the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS), and CNN-extracted features], and classifiers (SVM and XGBoost). A total of 1,523 utterances, amounting to 3 hours, 26 minutes, and 25 seconds of speech, were used. F1-scores of 0.93 for automatic diagnosis, 0.89 for the 3-class severity problem, and 0.84 for the 5-class severity problem were achieved. The highest performance was obtained with the combination of general sentence utterances, the handcrafted feature set, and XGBoost. The results suggest that general sentence utterances, which can reflect all of a speaker's speech characteristics, together with an appropriate feature set extracted from them, are adequate for the automatic classification of the utterances of patients with CKD.
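
An illustrative pipeline for the eGeMAPS + XGBoost configuration named above, using the opensmile and xgboost Python packages (an assumption; the paper does not specify its tooling). File paths and labels are hypothetical placeholders, and this is not the authors' code.

```python
import opensmile
import pandas as pd
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# eGeMAPS functionals: one fixed-length feature vector per utterance
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

wav_paths = ["ckd_001.wav", "ckd_002.wav", "ctrl_001.wav", "ctrl_002.wav"]  # hypothetical files
labels = [1, 1, 0, 0]                                                       # 1 = CKD, 0 = control

features = pd.concat([smile.process_file(p) for p in wav_paths])
X_train, X_test, y_train, y_test = train_test_split(
    features.values, labels, test_size=0.5, stratify=labels, random_state=0)

clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
clf.fit(X_train, y_train)
print("F1:", f1_score(y_test, clf.predict(X_test)))
```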

Voice Personality Transformation Using an Optimum Classification and Transformation (최적 분류 변환을 이용한 음성 개성 변환)

  • 이기승
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.5
    • /
    • pp.400-409
    • /
    • 2004
  • In this paper, a voice personality transformation method is proposed that makes one person's voice sound like another person's. To transform the voice personality, the vocal tract transfer function is used as the transformation parameter. Compared with previous methods, the proposed method makes the transformed speech closer to the target speaker's voice from both subjective and objective points of view. Conversion between vocal tract transfer functions is implemented by a classification of the entire vector space followed by a linear transformation for each cluster, with the LPC cepstrum used as the feature parameter. A joint classification and transformation method is proposed in which the optimum clusters and transformation matrices are estimated simultaneously under a minimum mean-square-error criterion. To evaluate the performance of the proposed method, transformation rules were generated from 150 sentences uttered by three male and one female speakers, and these rules were then applied to another 150 sentences uttered by the same speakers; objective evaluations and subjective listening tests were performed.
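
The "classify the vector space, then apply a linear transform per cluster" structure described above can be sketched as a simpler two-step version: KMeans clustering of source cepstra followed by a least-squares transform per cluster. The paper estimates clusters and transforms jointly under an MMSE criterion, which this sketch does not reproduce; the data are random stand-ins, not LPC cepstra from real speech.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
dim, n_frames, n_clusters = 12, 2000, 4
source = rng.normal(size=(n_frames, dim))                 # stand-in for source-speaker cepstra
true_maps = rng.normal(size=(n_clusters, dim, dim)) * 0.3
labels_true = rng.integers(0, n_clusters, size=n_frames)
target = np.einsum("nd,nde->ne",
                   source, true_maps[labels_true])        # stand-in for aligned target cepstra

# Step 1: classify the source feature space into clusters
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(source)

# Step 2: fit one least-squares linear transform per cluster
transforms = []
for c in range(n_clusters):
    idx = kmeans.labels_ == c
    W, *_ = np.linalg.lstsq(source[idx], target[idx], rcond=None)   # source[idx] @ W ≈ target[idx]
    transforms.append(W)

converted = np.vstack([source[i] @ transforms[kmeans.labels_[i]]
                       for i in range(n_frames)])
mse = np.mean((converted - target) ** 2)
print(f"per-cluster linear transform MSE: {mse:.4f}")
```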