• Title/Abstract/Keywords: Vocal Tract

Search results: 172 items (processing time: 0.031 s)

감정 인식을 위한 음성 특징 도출 (Extraction of Speech Features for Emotion Recognition)

  • 권철홍;송승규;김종열;김근호;장준수
    • 말소리와 음성과학 / Vol. 4, No. 2 / pp.73-78 / 2012
  • Emotion recognition is an important technology in the field of human-machine interfaces. To apply speech technology to emotion recognition, this study aims to establish a relationship between emotional groups and their corresponding voice characteristics by investigating various speech features. Speech features related to both the speech source and the vocal tract filter are included. Experimental results show that the statistically significant speech parameters for classifying the emotional groups are mainly related to the speech source: jitter, shimmer, F0 (F0_min, F0_max, F0_mean, F0_std), harmonic parameters (H1, H2, HNR05, HNR15, HNR25, HNR35), and SPI.
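
The source-related parameters named in this abstract can be illustrated with a minimal sketch. It assumes the common "local" definitions of jitter and shimmer (mean absolute difference between consecutive cycles, divided by the mean); the abstract does not say which definitions the authors used, and the function names and input values below are hypothetical.

```python
from statistics import mean, pstdev

def local_perturbation(values):
    """Mean absolute difference of consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)

def f0_stats(periods_s):
    """Summary statistics of F0 (Hz) derived from glottal cycle lengths (s)."""
    f0 = [1.0 / p for p in periods_s]
    return {
        "F0_min": min(f0),
        "F0_max": max(f0),
        "F0_mean": mean(f0),
        "F0_std": pstdev(f0),  # population standard deviation (an assumption)
    }

# Hypothetical measurements for four consecutive glottal cycles.
periods = [0.0100, 0.0102, 0.0099, 0.0101]  # cycle lengths in seconds
amps = [0.80, 0.82, 0.79, 0.81]             # per-cycle peak amplitudes

jitter = local_perturbation(periods)   # period-to-period variation
shimmer = local_perturbation(amps)     # amplitude-to-amplitude variation
stats = f0_stats(periods)
```

In practice the cycle lengths and amplitudes would come from a pitch-marking step, which this sketch leaves out.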

피치 검출을 위한 스펙트럼 평탄화 기법 (Flattening Techniques for Pitch Detection)

  • 김종국;조왕래;배명진
    • 대한전자공학회:학술대회논문집 / 대한전자공학회 2002 Summer Conference Proceedings (4) / pp.381-384 / 2002
  • In speech signal processing, accurate pitch detection is very important for speech recognition, synthesis, and analysis. However, detecting pitch from a speech signal is difficult because of the effects of formants and transition amplitude. In this paper, we therefore propose a pitch detection method using spectrum flattening techniques. Spectrum flattening eliminates the formant and transition-amplitude effects. In the time domain, positive center clipping is applied to emphasize the pitch period with the glottal component, from which the vocal tract characteristic has been removed. In the frequency domain, a rough formant envelope is computed by peak-fitting the spectrum of the original speech signal. As a result, we obtain a flattened harmonic waveform from the algebraic difference between the spectrum of the original speech signal and the smoothed formant envelope. Finally, we obtain a residual signal from which the vocal tract component has been removed. The performance was compared with the LPC, cepstrum, and ACF methods. With this algorithm, the accuracy of pitch detection improved, and the gross error rate was reduced in voiced speech regions and in transition regions where the phoneme changes.
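
The time-domain step mentioned in this abstract can be sketched with a classical center clipper followed by an autocorrelation peak search. This is not the authors' full method: the frequency-domain flattening is omitted, standard symmetric center clipping stands in for their "positive center clipping", and the clipping ratio, search range, and test signal are all assumptions.

```python
import math

def center_clip(x, ratio=0.3):
    """Zero samples below ratio * peak magnitude and shift the rest toward zero."""
    c = ratio * max(abs(v) for v in x)
    return [v - c if v > c else v + c if v < -c else 0.0 for v in x]

def autocorr_pitch(x, fs, fmin=60.0, fmax=400.0):
    """Return fs / lag for the lag that maximizes the raw autocorrelation."""
    lo, hi = int(fs / fmax), int(fs / fmin)
    best_lag, best_val = lo, float("-inf")
    for lag in range(lo, hi + 1):
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return fs / best_lag

fs = 8000  # hypothetical sampling rate (Hz)
# Synthetic 100 Hz tone with a second harmonic, standing in for voiced speech.
signal = [
    math.sin(2 * math.pi * 100 * n / fs) + 0.5 * math.sin(2 * math.pi * 200 * n / fs)
    for n in range(800)
]
estimate = autocorr_pitch(center_clip(signal), fs)  # pitch estimate in Hz
```

Center clipping suppresses low-amplitude detail between the main excitation peaks, which sharpens the autocorrelation peak at the pitch period.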

낮은 차원의 벡터 변환을 통한 음성 변환 (Voice conversion using low dimensional vector mapping)

  • 이기승;도원;윤대희
    • 전자공학회논문지S / Vol. 35S, No. 4 / pp.118-127 / 1998
  • In this paper, we propose a voice personality transformation method that makes one person's voice sound like another person's voice. To transform the voice personality, the vocal tract transfer function is used as the transformation parameter. Compared with previous methods, the proposed method can obtain high-quality transformed speech with low computational complexity. Conversion between the vocal tract transfer functions is implemented by a linear mapping based on soft clustering. In this process, the mean LPC cepstrum coefficients and the mean-removed LPC cepstrum, modeled by a low-dimensional vector, are used as transformation parameters. To evaluate the performance of the proposed method, mapping rules are generated from 61 Korean words uttered by two male speakers and one female speaker. These rules are then applied to 9 sentences uttered by the same speakers, and an objective evaluation and subjective listening tests are performed on the transformed speech.

PLDA 모델 적응과 데이터 증강을 이용한 짧은 발화 화자검증 (Short utterance speaker verification using PLDA model adaptation and data augmentation)

  • 윤성욱;권오욱
    • 말소리와 음성과학 / Vol. 9, No. 2 / pp.85-94 / 2017
  • Conventional speaker verification systems using a time delay neural network, identity vectors, and probabilistic linear discriminant analysis (TDNN-Ivector-PLDA) are known to be very effective for verifying long-duration speech utterances. However, when test utterances are short, the duration mismatch between enrollment and test utterances significantly degrades the performance of TDNN-Ivector-PLDA systems. To compensate for the I-vector mismatch between long and short utterances, this paper proposes probabilistic linear discriminant analysis (PLDA) model adaptation with augmented data. A PLDA model is trained on a vast amount of speech data, most of which has long duration. The PLDA model is then adapted with I-vectors obtained from short-utterance data augmented by vocal tract length perturbation (VTLP). In computer experiments using the NIST SRE 2008 database, the proposed method is shown to achieve significantly better performance than conventional TDNN-Ivector-PLDA systems when there is a duration mismatch between enrollment and test utterances.
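
VTLP is usually described as a random piecewise-linear warp of the frequency axis applied when computing filter-bank features. The sketch below shows one commonly cited form of that warp; the abstract does not give the variant or parameters the authors used, so the function name, the 16 kHz sampling rate, and the 4800 Hz boundary are assumptions.

```python
def vtlp_warp(f, alpha, fs=16000.0, f_hi=4800.0):
    """Map frequency f (Hz) to a warped frequency for warp factor alpha.

    Below a boundary the axis is scaled by alpha; above it, a second linear
    segment brings the mapping back to the Nyquist frequency so that the
    warped axis still spans [0, fs / 2].
    """
    nyq = fs / 2.0
    boundary = f_hi * min(alpha, 1.0) / alpha
    if f <= boundary:
        return f * alpha
    slope = (nyq - f_hi * min(alpha, 1.0)) / (nyq - boundary)
    return nyq - slope * (nyq - f)

# Hypothetical usage: warp a few filter-bank center frequencies by +10%.
centers = [500.0, 1000.0, 4000.0, 8000.0]
warped = [vtlp_warp(f, alpha=1.1) for f in centers]
```

For augmentation, alpha is typically drawn at random from a narrow range around 1.0 for each utterance, so that each copy of the data looks as if it came from a slightly different vocal tract length.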

후두 내시경의 진단적 역할 (Diagnostic Role of Stroboscopy)

  • 이상혁
    • 대한후두음성언어의학회지 / Vol. 21, No. 1 / pp.13-16 / 2010
  • Diagnosis of a patient with dysphonia begins with a thorough history and physical examination. The larynx can be visualized either indirectly or directly with a rigid or flexible laryngoscope. One notable limitation of simple indirect laryngoscopy is that the examination does not yield a recordable and reproducible image of the larynx and vocal tract. In addition, the unaided human eye is unable to visualize the vibratory patterns of the true vocal cords during phonation. When available, stroboscopy provides useful information on vocal fold closure, vibration, and mucosal wave, which helps in deciding between microsurgery, vocal reeducation, or a combined treatment. Although it has some limitations, recognizing the advantages and disadvantages of stroboscopy allows it to be used optimally, and stroboscopy remains an essential diagnostic tool in the assessment of dysphonia.

정상 모음에 대한 구강 및 비강 spectral output 분석 (Oral and Nasal Spectral Outputs in Korean Oral Vowels)

  • 홍기환;최승철;김범규;양윤수;심현아
    • 음성과학 / Vol. 10, No. 2 / pp.145-157 / 2003
  • Vowels are classified by the shape of the vocal tract. These shapes form constriction points along the tract, which influence vocal tract resonances such as F1, F2, and F3. The formant frequency is influenced by the aperture and placement of the tongue, and the intensity is influenced by subglottal air pressure. The objective of this study is to compare and characterize the oral and nasal spectral outputs in terms of the formant frequencies and intensity of Korean oral vowels. The subjects were 20 normal persons (10 male and 10 female) without laryngeal pathology. The speech sample included the Korean oral vowels /a/, /e/, /i/, /o/, and /u/. The spectrum of each vowel was analysed with Nasal View and Real Analysis Program using Dr. Speech. The results showed that nasal intensity decreased markedly from F1 to F2, whereas oral intensity and Intensity decreased only slightly from F1 to F2. Most nasal formant frequency values were similar to, or slightly smaller than, the oral formant frequency and Formant frequency values.

후두에 발생한 염증성 근섬유모세포종 1 례 (A Case of Laryngeal Inflammatory Myofibroblastic Tumor)

  • 박상규;김예슬;전현웅;송창면
    • 대한두경부종양학회지 / Vol. 35, No. 2 / pp.71-75 / 2019
  • Inflammatory myofibroblastic tumor (IMT) is a rare borderline neoplasm. It frequently occurs in the lung but occasionally occurs in extrapulmonary sites such as the genitourinary tract, gastrointestinal tract, breast, salivary glands, sinonasal tract, orbit, and central nervous system. Laryngeal involvement of IMT is very rare. A 61-year-old woman who complained of hoarseness persisting for 3 months visited our hospital. Laryngoscopy showed an elevated lesion in the right true vocal cord. Incisional biopsy confirmed laryngeal inflammatory myofibroblastic tumor. We performed a transoral excision with a CO2 laser under suspension examination. No regional recurrence or distant metastasis was observed after 9 months of follow-up. Herein we report a case of laryngeal inflammatory myofibroblastic tumor that was treated with surgery alone, with a literature review.

후두실과 진성대에 발생한 점액종 1예 (Myxoma in the Laryngeal Ventricle and the True Vocal Cord:A Case Report)

  • 김승우;염동진;강재호;김춘동
    • 대한두경부종양학회지 / Vol. 23, No. 1 / pp.67-70 / 2007
  • Myxoma is a tumor of uncertain mesenchymal cell origin, characterized by irregular round, stellate, or spindle cells surrounded by a matrix containing abundant mucoid material and scant vascularity. In descending order of frequency, it occurs in the heart, subcutaneous tissue, bone, and genitourinary tract. In the head and neck region, the most common sites are the mandible and maxilla (more than 80%). Laryngeal myxoma is extremely rare: only 5 cases have been reported in the English literature. We report a rare case of laryngeal myxoma. A 60-year-old man with hoarseness visited the outpatient department. The mass was located between the vocal fold and the vocal ligament, filling the left laryngeal ventricle. We planned laryngeal microsurgery and successfully excised the mass using a CO2 laser. Histopathologic findings revealed myxoma. Eighteen months after surgery, there is no evidence of recurrence or mucosal scarring in the vocal fold. This is the first reported case of laryngeal myxoma involving the laryngeal ventricle and the true vocal cord together.

음악성 평가 지표 설계를 위한 성도 모양의 변화 분석 (Variation Analysis of Spectrogram for Indicators Design of Musicality Evaluation)

  • 김봉현;조동욱
    • 한국산학기술학회논문지 / Vol. 10, No. 8 / pp.2110-2116 / 2009
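  • The culture industry, together with the health and medical industries, attracts so much attention in modern society that it can be described as a field offering opportunities to enjoy the benefits of life. In particular, the music industry, which has a broad base of popular support, is recognized for its artistic value: popularity and originality coexist in it, and it expresses emotion and is easily accessible. In this paper, we design indicators for evaluating a singer's musical talent, which is a core element of the music industry. To this end, we applied spectrogram analysis elements to analyze changes in vocal tract shape in the voices of singers and of ordinary people performing the same music, and we compared the two groups through pattern analysis of the resulting waveforms. Specifically, we selected popular music for the experiment, collected the voices of singers and ordinary people for the same passages, analyzed the patterns of change in vocal tract shape over time, and compared them to design indicators that can evaluate musicality.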

감정 음성 인식을 위한 강인한 음성 파라메터 (Robust Speech Parameters for the Emotional Speech Recognition)

  • 이규현;김원구
    • 한국지능시스템학회논문지 / Vol. 22, No. 6 / pp.681-686 / 2012
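  • In this paper, we study speech parameters that are less affected by emotion in order to develop a robust emotional speech recognition system. To this end, we used data containing various emotions to analyze the effect of emotion on the speech recognition system and on the speech parameters. We used mel-cepstrum, delta mel-cepstrum, RASTA mel-cepstrum, root cepstrum, PLP coefficients, and frequency-warped mel-cepstral coefficients from the vocal tract length normalization method. In addition, the CMS and SBR methods were used for signal bias removal. In the experiments, RASTA mel-cepstrum with vocal tract length normalization, delta mel-cepstrum, and the CMS method gave the best results in HMM-based speaker-independent isolated word recognition.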
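
The CMS step named in this abstract is commonly expanded as cepstral mean subtraction; the abstract itself does not expand the acronym, so that reading is an assumption. Under it, a minimal sketch subtracts the per-coefficient mean over an utterance from every frame. The function name and example values are hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames from each frame.

    frames: list of equal-length lists, one cepstral vector per frame.
    """
    if not frames:
        return []
    n, dims = len(frames), len(frames[0])
    means = [sum(frame[d] for frame in frames) / n for d in range(dims)]
    return [[frame[d] - means[d] for d in range(dims)] for frame in frames]

# Hypothetical two-frame utterance with two cepstral coefficients per frame.
normalized = cepstral_mean_subtraction([[1.0, 2.0], [3.0, 6.0]])
```

A constant channel effect is additive in the cepstral domain, so removing the utterance-level mean cancels it while keeping the frame-to-frame variation.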