• Title/Summary/Keyword: Vocal Tract

Extraction of Speech Features for Emotion Recognition (감정 인식을 위한 음성 특징 도출)

  • Kwon, Chul-Hong;Song, Seung-Kyu;Kim, Jong-Yeol;Kim, Keun-Ho;Jang, Jun-Su
    • Phonetics and Speech Sciences / v.4 no.2 / pp.73-78 / 2012
  • Emotion recognition is an important technology in the field of human-machine interfaces. To apply speech technology to emotion recognition, this study aims to establish a relationship between emotional groups and their corresponding voice characteristics by investigating various speech features. Features related to both the speech source and the vocal tract filter are included. Experimental results show that the statistically significant speech parameters for classifying the emotional groups are mainly related to the speech source, such as jitter, shimmer, F0 (F0_min, F0_max, F0_mean, F0_std), harmonic parameters (H1, H2, HNR05, HNR15, HNR25, HNR35), and SPI.
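
As a rough illustration of the source-related parameters named in this abstract, the sketch below computes jitter, shimmer, and the four F0 statistics from per-cycle measurements. It assumes the widely used "local" definitions (mean absolute cycle-to-cycle difference divided by the mean); the abstract does not say which variants the study used, and the function names are hypothetical.

```python
import numpy as np

def local_jitter(periods):
    """Mean absolute difference between consecutive pitch periods,
    divided by the mean period (assumed 'local' jitter definition)."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def local_shimmer(amplitudes):
    """Same ratio as local_jitter, applied to per-cycle peak amplitudes."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

def f0_stats(periods):
    """F0 statistics (Hz) derived from pitch periods given in seconds."""
    f0 = 1.0 / np.asarray(periods, dtype=float)
    return {
        "F0_min": float(f0.min()),
        "F0_max": float(f0.max()),
        "F0_mean": float(f0.mean()),
        "F0_std": float(f0.std()),
    }
```

Both ratios are zero for a perfectly regular voice and grow as the cycle-to-cycle variation increases.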

Flattening Techniques for Pitch Detection (피치 검출을 위한 스펙트럼 평탄화 기법)

  • 김종국;조왕래;배명진
    • Proceedings of the IEEK Conference / 2002.06d / pp.381-384 / 2002
  • In speech signal processing, accurate pitch detection is very important for speech recognition, synthesis, and analysis. However, detecting pitch from a speech signal is difficult because of the effects of formants and transition amplitudes. In this paper, we therefore propose a pitch detection method using spectrum flattening techniques. Spectrum flattening eliminates the effects of formants and transition amplitudes. In the time domain, positive center clipping is applied to emphasize the pitch period, using the glottal component with the vocal tract characteristics removed. A rough formant envelope is then computed in the frequency domain by peak-fitting the spectrum of the original speech signal. As a result, we obtain a flattened harmonic waveform from the algebraic difference between the spectrum of the original speech signal and the smoothed formant envelope. Finally, we obtain a residual signal from which the vocal tract component has been removed. The performance was compared with the LPC, cepstrum, and ACF methods. With this algorithm, the accuracy of pitch detection improved and the gross error rate was reduced in voiced speech regions and in transition regions where the phoneme changes.
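
The time-domain step mentioned in this abstract can be sketched as follows. This is not the authors' spectrum flattening method: it is the classic combination of center clipping and autocorrelation peak picking, shown only to illustrate why clipping makes the pitch period easier to find. The function names, the clipping ratio of 0.3, and the 60 to 400 Hz search range are assumptions.

```python
import numpy as np

def center_clip(frame, ratio=0.3):
    """Zero every sample whose magnitude is below ratio * max|x| and
    shift the remaining samples toward zero by that threshold."""
    frame = np.asarray(frame, dtype=float)
    threshold = ratio * np.max(np.abs(frame))
    clipped = np.zeros_like(frame)
    above = frame > threshold
    below = frame < -threshold
    clipped[above] = frame[above] - threshold
    clipped[below] = frame[below] + threshold
    return clipped

def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0, clip_ratio=0.3):
    """Estimate F0 (Hz) of one voiced frame from the autocorrelation peak
    of the center-clipped signal within the allowed lag range."""
    clipped = center_clip(frame, clip_ratio)
    # Full autocorrelation; keep the non-negative lags only.
    acf = np.correlate(clipped, clipped, mode="full")[len(clipped) - 1:]
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(acf) - 1)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag

if __name__ == "__main__":
    sr = 8000
    t = np.arange(320) / sr  # one 40 ms frame
    voiced = (np.sin(2 * np.pi * 100 * t)
              + 0.5 * np.sin(2 * np.pi * 200 * t)
              + 0.3 * np.sin(2 * np.pi * 300 * t))
    print(estimate_f0(voiced, sr))
```

Clipping removes the low-amplitude detail that formants add between pitch pulses, so the autocorrelation is dominated by the spacing of the pulses themselves.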

Voice conversion using low dimensional vector mapping (낮은 차원의 벡터 변환을 통한 음성 변환)

  • Lee, Kee-Seung;Doh, Won;Youn, Dae-Hee
    • Journal of the Korean Institute of Telematics and Electronics S / v.35S no.4 / pp.118-127 / 1998
  • In this paper, we propose a voice personality transformation method that makes one person's voice sound like another person's voice. To transform the voice personality, the vocal tract transfer function is used as the transformation parameter. Compared with previous methods, the proposed method can obtain high-quality transformed speech with low computational complexity. Conversion between the vocal tract transfer functions is implemented by a linear mapping based on soft clustering. In this process, the mean LPC cepstrum coefficients and the mean-removed LPC cepstrum modeled by a low-dimensional vector are used as transformation parameters. To evaluate the performance of the proposed method, mapping rules are generated from 61 Korean words uttered by two male speakers and one female speaker. These rules are then applied to 9 sentences uttered by the same speakers, and objective evaluation and subjective listening tests are performed on the transformed speech.
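
One way to picture a "linear mapping based on soft clustering" is a weighted blend of per-cluster linear transforms, as in the sketch below. The abstract does not give the actual formulation, so everything here is an assumption made for illustration: the softmax weighting over squared distances, the temperature parameter, and all function and argument names.

```python
import numpy as np

def soft_weights(x, centroids, temperature=1.0):
    """Soft cluster membership: softmax over negative squared distances
    from the source vector x to each cluster centroid."""
    x = np.asarray(x, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    logits = -np.sum((centroids - x) ** 2, axis=1) / temperature
    logits -= logits.max()  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def convert(x, centroids, matrices, offsets, temperature=1.0):
    """Map a source feature vector to the target speaker's space as a
    membership-weighted sum of per-cluster linear maps A_k x + b_k."""
    x = np.asarray(x, dtype=float)
    matrices = np.asarray(matrices, dtype=float)  # shape (K, D, D)
    offsets = np.asarray(offsets, dtype=float)    # shape (K, D)
    weights = soft_weights(x, centroids, temperature)
    mapped = np.einsum("kij,j->ki", matrices, x) + offsets
    return weights @ mapped
```

Because the weights vary smoothly with the input, the converted features change continuously across cluster boundaries instead of jumping as they would with hard clustering.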

Short utterance speaker verification using PLDA model adaptation and data augmentation (PLDA 모델 적응과 데이터 증강을 이용한 짧은 발화 화자검증)

  • Yoon, Sung-Wook;Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.9 no.2 / pp.85-94 / 2017
  • Conventional speaker verification systems using a time delay neural network, identity vector, and probabilistic linear discriminant analysis (TDNN-Ivector-PLDA) are known to be very effective for verifying long-duration speech utterances. However, when test utterances are short, the duration mismatch between enrollment and test utterances significantly degrades the performance of TDNN-Ivector-PLDA systems. To compensate for the I-vector mismatch between long and short utterances, this paper proposes probabilistic linear discriminant analysis (PLDA) model adaptation with augmented data. A PLDA model is first trained on a vast amount of speech data, most of which is of long duration. The PLDA model is then adapted with I-vectors obtained from short-utterance data augmented by vocal tract length perturbation (VTLP). In computer experiments using the NIST SRE 2008 database, the proposed method achieves significantly better performance than conventional TDNN-Ivector-PLDA systems when there is a duration mismatch between enrollment and test utterances.
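
VTLP augments data by warping the frequency axis with a random factor before features are extracted. The sketch below shows a commonly used piecewise-linear warp; the abstract does not state which warping function or parameter ranges the paper used, so the formula, the `f_hi` default, and the function name are assumptions.

```python
import numpy as np

def vtlp_warp(freqs, alpha, sample_rate, f_hi=4800.0):
    """Piecewise-linear frequency warp for vocal tract length perturbation.

    Frequencies below a boundary are scaled by alpha; above it, a second
    linear segment maps the remaining range so that the Nyquist frequency
    stays fixed. Assumes the boundary lies below the Nyquist frequency.
    """
    freqs = np.asarray(freqs, dtype=float)
    nyquist = sample_rate / 2.0
    knee = f_hi * min(alpha, 1.0)  # warped value at the boundary
    boundary = knee / alpha        # unwarped boundary frequency
    upper_slope = (nyquist - knee) / (nyquist - boundary)
    return np.where(
        freqs <= boundary,
        freqs * alpha,
        nyquist - upper_slope * (nyquist - freqs),
    )
```

In a typical setup, one warp factor is drawn per utterance (for example uniformly from about 0.9 to 1.1) and applied to the filterbank center frequencies, which yields several slightly different "speakers" from each recording.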

Diagnostic Role of Stroboscopy (후두 내시경의 진단적 역할)

  • Lee, Sang-Hyuk
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.21 no.1 / pp.13-16 / 2010
  • Diagnosis of a patient with dysphonia begins with a thorough history and physical examination. The larynx can be visualized either indirectly or directly with a rigid or flexible laryngoscope. One notable limitation of simple indirect laryngoscopy is that the examination does not yield a recordable and reproducible image of the larynx and vocal tract. In addition, the unaided human eye is unable to visualize the vibratory patterns of the true vocal cords during phonation. When available, stroboscopy provides useful information regarding vocal fold closure, vibration, and mucosal wave, which helps in deciding between microsurgery, vocal reeducation, or a combined treatment. Although stroboscopy has some limitations, recognizing its advantages and disadvantages allows for optimal use, and it remains an essential diagnostic tool in the assessment of dysphonia.

Oral and Nasal Spectral Outputs in Korean Oral Vowels (정상 모음에 대한 구강 및 비강 spectral output 분석)

  • Hong, Ki-Hwan;Choi, Seung-Chul;Kim, Byum-Kyu;Yang, Yoon-Soo;Shim, Hyun-Ah
    • Speech Sciences / v.10 no.2 / pp.145-157 / 2003
  • Vowels are classified by the shape of the vocal tract. These shapes form constriction points along the tract, which influence vocal tract resonances such as F1, F2, and F3. The formant frequencies are influenced by the aperture and placement of the tongue, and the intensity is influenced by subglottal air pressure. The objective of this study is to compare and characterize the oral and nasal spectral outputs in terms of the formant frequencies and intensity of Korean oral vowels. The subjects were 20 normal persons (10 male and 10 female) without laryngeal pathology. The speech samples were the Korean oral vowels /a/, /e/, /i/, /o/, and /u/. The spectrum of each vowel was analysed with the Nasal View and Real Analysis programs of Dr. Speech. The results showed that nasal intensity decreased markedly from F1 to F2, whereas oral intensity and Intensity decreased only slightly from F1 to F2. Most nasal formant frequency values were similar to, or slightly smaller than, the oral formant frequency and Formant frequency values.

A Case of Laryngeal Inflammatory Myofibroblastic Tumor (후두에 발생한 염증성 근섬유모세포종 1 례)

  • Park, Sang Gyu;Kim, Yeseul;Woong, Jun Hyun;Song, Chang Myeon
    • Korean Journal of Head & Neck Oncology / v.35 no.2 / pp.71-75 / 2019
  • Inflammatory myofibroblastic tumor (IMT) is a rare borderline neoplasm. It frequently occurs in the lung but occasionally occurs in extrapulmonary sites such as the genitourinary tract, gastrointestinal tract, breast, salivary glands, sinonasal tract, orbit, and central nervous system. Laryngeal involvement of IMT is very rare. A 61-year-old woman who complained of hoarseness persisting for 3 months visited our hospital. Laryngoscopy showed an elevated lesion in the right true vocal cord. Incisional biopsy confirmed a laryngeal inflammatory myofibroblastic tumor. We performed a transoral excision with a CO2 laser under suspension examination. No regional recurrence or distant metastasis was observed after 9 months of follow-up. Herein we report a case of laryngeal inflammatory myofibroblastic tumor that was treated with surgery alone, with a literature review.

Myxoma in the Laryngeal Ventricle and the True Vocal Cord: A Case Report (후두실과 진성대에 발생한 점액종 1예)

  • Kim, Seung-Woo;Yum, Dong-Jin;Kang, Jae-Ho;Kim, Choon-Dong
    • Korean Journal of Head & Neck Oncology / v.23 no.1 / pp.67-70 / 2007
  • Myxoma is a tumor of uncertain mesenchymal cell origin, characterized by irregular round, stellate, or spindle cells surrounded by a matrix containing abundant mucoid material and scant vascularity. In descending order of frequency, it occurs in the heart, subcutaneous tissue, bone, and genitourinary tract. In the head and neck region, the most common sites of predilection are the mandible and maxilla (more than 80%). Laryngeal myxoma is extremely rare: only 5 cases have been reported in the English literature. We report a rare case of laryngeal myxoma. A 60-year-old man with hoarseness visited the out-patient department. The mass was located between the vocal fold and the vocal ligament, filling the left laryngeal ventricle. We planned laryngo-microsurgery and successfully excised the mass using a $CO_2$ laser. The histopathologic findings revealed a myxoma. Eighteen months after surgery, there is no evidence of recurrence or of mucosal scarring in the vocal fold. This report is the first case of laryngeal myxoma involving both the laryngeal ventricle and the true vocal cord.

Variation Analysis of Spectrogram for Indicators Design of Musicality Evaluation (음악성 평가 지표 설계를 위한 성도 모양의 변화 분석)

  • Kim, Bong-Hyun;Cho, Dong-Uk
    • Journal of the Korea Academia-Industrial cooperation Society / v.10 no.8 / pp.2110-2116 / 2009
  • The culture industry attracts great interest in modern society and, together with the health and medical industries, offers opportunities to enrich people's lives. In particular, the music industry, which rests on popular support, is recognized for its artistic value as an easily accessible means of expressing emotion that combines popularity with originality. In this paper, we aim to design indicators for evaluating a singer's musical talent, which plays a key role in the music industry. To this end, we applied spectrogram analysis elements to the changes in vocal tract shape in singers' voices and in ordinary people's voices for the same piece of music, and compared the two groups by analyzing the patterns of the resulting waveforms. We thus analyzed patterns of change in vocal tract shape over time, using voice samples of the same passage of a selected popular song collected from singers and from members of the public, and designed indicators for evaluating musicality.

Robust Speech Parameters for the Emotional Speech Recognition (감정 음성 인식을 위한 강인한 음성 파라메터)

  • Lee, Guehyun;Kim, Weon-Goo
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.6 / pp.681-686 / 2012
  • This paper studies speech parameters that are less affected by human emotion, with the aim of developing a robust emotional speech recognition system. For this purpose, the effect of emotion on a speech recognition system and the robustness of its speech parameters were studied using a speech database containing various emotions. Mel-cepstral coefficients, delta-cepstral coefficients, RASTA mel-cepstral coefficients, root-cepstral coefficients, PLP coefficients, and frequency-warped mel-cepstral coefficients from the vocal tract length normalization method were used as feature parameters. The CMS (Cepstral Mean Subtraction) and SBR (Signal Bias Removal) methods were used as signal bias removal techniques. Experimental results showed that the HMM-based speaker-independent word recognizer performed best when using frequency-warped RASTA mel-cepstral coefficients from the vocal tract length normalization method, their derivatives, and CMS for signal bias removal.
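
Of the techniques named in this abstract, CMS is the simplest to show. A stationary channel or speaker bias multiplies the spectrum, so it appears as a constant offset in the cepstral domain; subtracting the per-coefficient mean over the utterance removes it. The sketch below assumes features arranged as a (frames, coefficients) array, and the function name is hypothetical.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-coefficient mean over all frames of one utterance.

    cepstra: array-like of shape (num_frames, num_coeffs).
    Returns an array of the same shape whose columns have zero mean.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```

Adding any constant vector to every frame leaves the output unchanged, which is exactly the bias the method is meant to cancel.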