• Title/Abstract/Keywords: fundamental frequency of speech

205 search results (processing time: 0.03 s)

한국어 폐쇄음 음향단서의 다차원 표현 (Multi-dimensional Representation of Acoustic Cues for Korean Stops)

  • 윤원희
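    • Proceedings of the Korean Society of Speech Sciences Conference (대한음성학회:학술대회논문집) / Proceedings of the 2005 Spring Conference of the Korean Society of Speech Sciences / pp. 25-28 / 2005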
  • The purpose of this paper is to represent the values of acoustic cues for Korean oral stops in a multi-dimensional space and to look for possible relationships among the cues through correlation coefficient analyses. The acoustic cues used to differentiate the three types of Korean stops are closure duration, voice onset time (VOT), and the fundamental frequency of the vowel following the stop. The values of these cues are plotted in two- and three-dimensional space to determine which cues are critical for completely separating the different types of stops. Correlation coefficient analyses show statistically significant relationships among the acoustic cues, but these relationships are not strong enough to suggest an articulatory relationship among the mechanisms that produce the cues.

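
The correlation coefficient analysis described in this abstract can be sketched as follows. This is a minimal illustration assuming Pearson's r (the abstract does not name the coefficient) plus the usual t statistic for testing whether a correlation differs from zero; the cue values are hypothetical, not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length cue vectors."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (n >= 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic with n - 2 degrees of freedom for H0: rho = 0 (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical per-token cue measurements (illustrative only).
cues = {
    "closure_ms": [95.0, 110.0, 70.0, 65.0, 120.0, 130.0],
    "vot_ms": [10.0, 15.0, 60.0, 70.0, 90.0, 100.0],
    "f0_hz": [210.0, 220.0, 180.0, 175.0, 230.0, 240.0],
}
names = list(cues)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = pearson_r(cues[a], cues[b])
        print(f"{a} vs {b}: r = {r:.3f}, t = {r_to_t(r, len(cues[a])):.2f}")
```

Looping over all pairs yields the three pairwise coefficients (closure duration vs. VOT, closure duration vs. f0, VOT vs. f0); each t value would be compared against a t distribution with n - 2 degrees of freedom.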

Role of amplitude and pitch in the perception of Japanese stop length contrasts

  • Idemaru, Kaori
    • Cross-Cultural Studies (비교문화연구) / Vol. 24 / pp. 112-119 / 2011
  • This study presents experiments which examined the role of amplitude and fundamental frequency (f0) in the phonetic perception of short versus long stop length contrasts in Japanese (e.g., [t] vs. [tt]). Stop length contrasts are normally characterized by differences in the duration of stop closures. However, closure duration can be unreliable as a perceptual cue when one considers variability in the rate at which people speak. Acoustically, the amplitude and f0 of the vowel following stop consonants are known to covary with the length distinction of stops in Japanese. Given this fact, the current study examined amplitude and f0 as potential secondary cues to the distinction. The results indicate that even though both amplitude and f0 are robust correlates, Japanese listeners do not use these cues in categorizing short versus long stops.
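
The two secondary cues examined here, amplitude and f0 of the following vowel, can be measured frame by frame. A minimal sketch, assuming RMS amplitude in dB and a crude autocorrelation pitch estimate; the abstract does not say how the study measured either cue, and `rms_db` and `autocorr_f0` are hypothetical helper names.

```python
import math


def rms_db(frame, ref=1.0):
    """RMS amplitude of a frame in dB relative to `ref` (-inf for a silent frame)."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(rms / ref) if rms > 0 else float("-inf")


def autocorr_f0(frame, sr, fmin=75.0, fmax=400.0):
    """Crude f0 estimate (Hz): the lag of the largest autocorrelation in [1/fmax, 1/fmin]."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset before correlating
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), n - 1)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        val = sum(x[i] * x[i + lag] for i in range(n - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sr / best_lag


# Hypothetical 50 ms vowel frame: a 200 Hz sine sampled at 16 kHz.
sr = 16000
frame = [math.sin(2 * math.pi * 200 * i / sr) for i in range(800)]
print(round(autocorr_f0(frame, sr), 1), round(rms_db(frame), 2))
```

A real pitch tracker would add voicing decisions, normalisation, and octave-error handling; this sketch only shows where the two cue values come from.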

Variational autoencoder for prosody-based speaker recognition

  • Starlet Ben Alex; Leena Mary
    • ETRI Journal / Vol. 45, No. 4 / pp. 678-689 / 2023
  • This paper describes a novel end-to-end deep generative model-based speaker recognition system using prosodic features. The usefulness of variational autoencoders (VAE) in learning the speaker-specific prosody representations for the speaker recognition task is examined herein for the first time. The speech signal is first automatically segmented into syllable-like units using vowel onset points (VOP) and energy valleys. Prosodic features, such as the dynamics of duration, energy, and fundamental frequency (F0), are then extracted at the syllable level and used to train/adapt a speaker-dependent VAE from a universal VAE. The initial comparative studies on VAEs and traditional autoencoders (AE) suggest that the former can efficiently learn speaker representations. Investigations on the impact of gender information in speaker recognition also point out that gender-dependent impostor banks lead to higher accuracies. Finally, the evaluation on the NIST SRE 2010 dataset demonstrates the usefulness of the proposed approach for speaker recognition.
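
The energy-valley part of the segmentation step can be illustrated with short-time energy alone. A minimal sketch, assuming fixed-length frames and treating sufficiently deep local minima of the energy contour as syllable-like boundaries; the paper also uses vowel onset points (VOP), which this sketch omits, and the `rel_depth` threshold is a hypothetical parameter.

```python
def short_time_energy(signal, frame_len, hop):
    """Sum of squared samples in each analysis frame."""
    return [sum(s * s for s in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, hop)]


def energy_valleys(energy, rel_depth=0.5):
    """Frame indices of local minima lying below rel_depth * (peak energy)."""
    peak = max(energy)
    return [i for i in range(1, len(energy) - 1)
            if energy[i] <= energy[i - 1] and energy[i] < energy[i + 1]
            and energy[i] < rel_depth * peak]


def syllable_features(energy, rel_depth=0.5):
    """Split the contour at valleys (each valley frame starts the next unit).

    Returns (start, end, n_frames, mean_energy) for each syllable-like unit.
    """
    bounds = [0] + energy_valleys(energy, rel_depth) + [len(energy)]
    units = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        seg = energy[start:end]
        units.append((start, end, end - start, sum(seg) / len(seg)))
    return units


# Hypothetical frame energies containing three syllable-like bumps.
contour = [1, 5, 9, 5, 1, 6, 10, 6, 2, 7, 8, 3]
units = syllable_features(contour)
durations = [u[2] for u in units]
# "Dynamics" as differences in duration between consecutive units.
duration_deltas = [b - a for a, b in zip(durations, durations[1:])]
```

Durations here are in frames; multiplying by the hop size divided by the sampling rate converts them to seconds. Per-unit f0 statistics would be added the same way from a pitch track.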

음성과 사상체질: 음원을 중심으로 (Voice and Sasang Constitution: In terms of source functions)

  • 문승재;박종주;황혜정
    • Malsori: Journal of the Korean Society of Speech Sciences (대한음성학회지:말소리) / No. 48 / pp. 19-33 / 2003
  • Sasang Constitutional Medicine, a branch of traditional Korean medicine, holds that human health can be promoted by taking into account the fact that people have different constitutions. It uses characteristics of the human voice to diagnose patients' constitutions. This study aims to establish the relationship between Sasang constitutions and their corresponding voice characteristics by investigating source-related variables. Voice recordings were obtained from 23 patients of three different constitutions, which had already been diagnosed by experts in the field. Fundamental-frequency-related variables (average pitch, maximum/minimum pitch, pitch range), phonation type, and speaking tempo were measured and analyzed for each group. Phonation type appeared to be a possible candidate variable for determining constitution. No statistically significant relationship was found between the other variables and constitution. Although it failed to firmly establish a relationship between voice and constitution, the current study suggests that future research should include not only source-related variables

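
The f0-related variables listed in this abstract (average, maximum, and minimum pitch, and pitch range) are simple summaries of an f0 track. A minimal sketch, assuming the track marks unvoiced frames with 0 or `None`; reporting the range in semitones as well as in Hz is an added convention, not something the abstract specifies.

```python
import math


def pitch_summary(f0_track):
    """Summarise an f0 track (Hz per frame), ignoring unvoiced frames (0 or None)."""
    voiced = [f for f in f0_track if f is not None and f > 0]
    if not voiced:
        raise ValueError("no voiced frames")
    lo, hi = min(voiced), max(voiced)
    return {
        "mean_hz": sum(voiced) / len(voiced),
        "min_hz": lo,
        "max_hz": hi,
        "range_hz": hi - lo,
        "range_st": 12.0 * math.log2(hi / lo),  # pitch range in semitones
    }


# Hypothetical frame-level f0 values for one utterance.
print(pitch_summary([0, 100.0, None, 200.0, 150.0]))
```

The semitone range is less sensitive than the Hz range to a speaker's overall pitch level, which matters when comparing speakers with different average f0.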

청각장애아동의 음성 및 조음 특성 연구 (A Study On Voice and Articulation in Children with Hearing Impairment)

  • 박희정;채정희;박현;신혜정;석동일
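    • Proceedings of the Korean Society of Speech Sciences Conference (대한음성학회:학술대회논문집) / Proceedings of the October 2003 Conference of the Korean Society of Speech Sciences / pp. 129-132 / 2003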
  • The purpose of this study was to investigate the fundamental frequency (F0) of the voice signal, the first through third formants (F1-F3), and duration in children with hearing impairment. Each subject recorded sustained /i/ and /a/, four VbV sequences, and four VsV sequences. Praat 4.1.6 was used for analysis. The results were as follows. First, F0 was higher in children with hearing impairment than in normal-hearing children. Second, for the vowel /a/, F1, F2, and duration were higher than in normal-hearing children. Third, for the vowel /i/, F1 and duration were higher than in normal-hearing children, whereas F2 was lower. Therapeutic implications are discussed.


사용자 성격 적응형 영어학습 도구에 관한 연구 (Developing English Language Learning Tools Adaptable to Users' Personality)

  • 이인의;권순일;이경랑;김수연
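    • Proceedings of the Korea Information Processing Society Conference (한국정보처리학회:학술대회논문집) / Proceedings of the 2012 Fall Conference of the Korea Information Processing Society / pp. 1649-1652 / 2012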
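  • This study aimed to develop a method for automatically classifying a user's personality pattern using only the user's conversational speech, and to develop an application that applies personality-tailored learning strategies based on that classification. Using nonverbal cues in conversational speech, such as speech rate, loudness, fundamental frequency values and their patterns of change, and various statistics of silent intervals, personality patterns were recognized with an accuracy of up to 86.3%. In addition, English vocabulary learning methods were developed for each personality type; in an experiment based on pre- and post-tests, scores improved by about 24%. The core technology obtained through this research is expected to be useful not only in various edutainment content but also in dialogue systems with robots and in functional content for therapy or rehabilitation.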
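
Of the nonverbal cues listed, the silent-interval statistics are the easiest to illustrate. A minimal sketch, assuming a per-frame energy contour and a fixed energy threshold (both hypothetical choices; the abstract does not say how silences were detected or which statistics were used).

```python
def silence_stats(energy, frame_sec, threshold, min_frames=1):
    """Statistics of silent intervals, i.e. runs of frames whose energy is below threshold.

    min_frames (>= 1) is the shortest run, in frames, counted as a silence.
    """
    runs, run = [], 0
    for e in energy:
        if e < threshold:
            run += 1
        else:
            if run >= min_frames:
                runs.append(run)
            run = 0
    if run >= min_frames:  # a silence that lasts until the end of the signal
        runs.append(run)
    durations = [r * frame_sec for r in runs]
    total = len(energy) * frame_sec
    return {
        "count": len(durations),
        "mean_sec": sum(durations) / len(durations) if durations else 0.0,
        "max_sec": max(durations, default=0.0),
        "ratio": sum(durations) / total if total else 0.0,  # share of time spent silent
    }


# Hypothetical 10 ms frame energies.
print(silence_stats([5, 5, 0, 0, 5, 0, 0, 0, 5], frame_sec=0.01, threshold=1.0))
```

Raising `min_frames` filters out brief dips such as stop closures so that only genuine pauses are counted.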

동일 후적자가 산출하는 기관식도 발성(PROVOX® 발성)과 식도 발성에 대한 음향학적 및 공기역학적 특성 비교 (The Comparison of the Acoustic and Aerodynamic Characteristics of PROVOX® Voice and Esophageal Voice Produced by the Same Laryngectomee)

  • 표화영;최홍식;임성은;최성희
    • Speech Sciences (음성과학) / Vol. 5, No. 1 / pp. 121-139 / 1999
  • Our experimental subject was a laryngectomee who had undergone total laryngectomy with PROVOX® insertion and had learned esophageal speech after the surgery, so he could produce both PROVOX® voice and esophageal voice. Using this subject's PROVOX® and esophageal voices, we compared the acoustic and aerodynamic characteristics of the two voices under the same physical conditions, in the same person. The fundamental frequency of the esophageal voice was 137.2 Hz, and that of the PROVOX® voice was 97.5 Hz. The PROVOX® voice showed lower jitter, shimmer, and NHR than the esophageal voice, indicating better voice quality. In spectrographic analysis, formants and pseudoformants were more distinct in the esophageal voice, while several temporal acoustic features, such as VOT and closure duration, were closer to those of normal voice in the PROVOX® voice. During sentence production, the esophageal voice showed longer pauses or silences than the PROVOX® voice. Maximum phonation time and mean flow rate were, respectively, much longer and much larger for the PROVOX® voice than for the esophageal voice, but the mean and range of sound pressure level, subglottic pressure, and voice efficiency were similar in the two voices. Glottal resistance was much greater for the esophageal voice than for the PROVOX® voice, which in turn still showed greater glottal resistance than normal voice.

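
Jitter and shimmer are cycle-to-cycle perturbation measures. A minimal sketch of one common pair of definitions, the "local" variants reported by Praat: the mean absolute difference between consecutive periods (or peak amplitudes) divided by the mean period (or amplitude). The abstract does not state which definitions or software were used, and the input values below are hypothetical.

```python
def _local_perturbation(values):
    """Mean absolute difference of consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods_s):
    """Jitter (local) as a fraction; multiply by 100 for percent."""
    return _local_perturbation(periods_s)


def local_shimmer(peak_amplitudes):
    """Shimmer (local) as a fraction; multiply by 100 for percent."""
    return _local_perturbation(peak_amplitudes)


# Hypothetical glottal-cycle measurements for a sustained vowel near 100 Hz.
periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
amps = [0.80, 0.78, 0.82, 0.79, 0.81]
print(f"jitter {100 * local_jitter(periods):.2f} %, shimmer {100 * local_shimmer(amps):.2f} %")
```

Lower values indicate a more regular source, which is the sense in which the abstract reads lower jitter and shimmer as better voice quality.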

비강 공명이 한국어 모음에 미치는 음향학적 영향 (Effect of the Nasal Cavity Resonance on the Acoustic Characteristics of Korean Vowels)

  • 성명훈;오승하;강명구;고태용;김광현;김진영
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics (대한후두음성언어의학회지) / Vol. 4, No. 1 / pp. 24-32 / 1991
  • Cleft palate or velopharyngeal incompetence causes many disorders and disabilities affecting speech transmission, including distortion, substitution, and nasalization of vowels. Nasalized vowels are produced primarily by lowering the velum, which opens a side passage for airflow through the nasal cavity. These abnormal movements give rise to complex modifications of the sound's physical properties and spectrum. The authors used a Sonagraph® sound analyzer to identify the features that characterize vowel nasalization. Twenty healthy Korean adult male volunteers were analyzed under artificial conditions of anterior and posterior nasal obstruction and of velopharyngeal incompetence (VPI). The results were as follows: 1) Fundamental frequency was not changed by nasal obstruction or velopharyngeal incompetence. 2) There was no significant difference in formant intensity between normal and nasal vowels. 3) In VPI, the frequency of F2 decreased in the vowels /e/ and /i/ (p<0.001). 4) In VPI, F2 was frequently missing in the vowels /o/ and /u/. 5) In the consonant spectra of VPI, the 'release burst' was usually not observed.


A Study on Correcting Korean Pronunciation Error of Foreign Learners by Using Supporting Vector Machine Algorithm

  • Jang, Kyungnam;You, Kwang-Bock;Park, Hyungwoo
    • International Journal of Advanced Culture Technology / Vol. 8, No. 3 / pp. 316-324 / 2020
  • Anyone who has learned a foreign language knows how difficult it is to pronounce a language that differs from one's native language. Foreigners learning Korean aim to speak it as well as their native language so that they can communicate smoothly. However, the vocal habits of each learner's native language carry over into Korean pronunciation and hinder accurate communication. In this paper, the pronunciation of Chinese learners was compared with that of native Korean speakers. For the comparison, the fundamental frequency of the speech signal and its variation were examined, and spectrograms were analyzed. Formant frequencies, the resonant frequencies of the vocal tract, were also calculated. Using these feature parameters, a Support Vector Machine classifier was able to distinguish the pronunciation of native Korean speakers from that of Chinese learners. In particular, examining the Korean pronunciation of /ㄹ/, which Chinese learners find difficult, provided acoustic evidence for this linguistic claim.
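
The classification step can be sketched with a linear Support Vector Machine. This is a minimal, dependency-free illustration, assuming two already standardised features per utterance (for example, mean f0 and an F2 value) and labels +1 (native Korean) / -1 (Chinese learner). It trains by plain full-batch subgradient descent on the regularised hinge loss, which is not necessarily the kernel or solver the authors used, and the data are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on (lam/2)*||w||^2 + mean hinge loss.

    X: list of feature vectors (lists of floats); y: labels in {-1, +1}.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1.0:  # inside the margin: hinge loss is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Sign of the linear decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1


# Hypothetical standardised features [mean f0, F2]; +1 = native, -1 = learner.
X = [[1.0, 1.2], [1.1, 0.9], [0.8, 1.0],
     [-1.0, -1.1], [-0.9, -1.0], [-1.2, -0.8]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(predict(w, b, [0.9, 1.1]))
```

In practice the features would be z-scored before training, and a library SVM (with a nonlinear kernel if the classes are not linearly separable) would replace this loop.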

노인에서 성대 용종의 후두 미세수술 후 음성검사 결과 (Result of Voice Analysis after Laryngeal Microsurgery for Vocal Polyp in Elderly)

  • 최정임;여장옥;진성민;이상혁
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics (대한후두음성언어의학회지) / Vol. 22, No. 1 / pp. 47-51 / 2011
  • Background and Objectives: Vocal polyps are one of the most frequent benign laryngeal diseases. They are usually found at the midpoint of the vocal fold, are mainly caused by vocal overuse, and are usually removed surgically. Age-related changes in speech are generally attributed to changes in the anatomy and physiology of the speech mechanism, and these changes increase the variability of the acoustic properties of speech with age. Still, not all studies of age-related changes in speech have taken into account differences between younger and older patients after laryngeal microsurgery. The aim of this investigation was to compare the improvement in acoustic measures between young and elderly patients with vocal polyps, before and after laryngeal microsurgery. Materials and Method: One hundred and twenty-eight patients who underwent laryngeal microsurgery for vocal polyps from 2008 through 2011 were reviewed retrospectively. The 105 patients under age 60 were classified as the adult group (AG), and the remaining 23 patients as the elderly group (EG). The speech of the AG and EG was evaluated before and after surgery to identify age-group differences in fundamental frequency (F0), jitter, shimmer, and maximum phonation time (MPT). Results: There were no significant differences between the two groups in the improvement of F0, jitter, shimmer, NHR, and MPT before and after surgery, which suggests that surgery is about as effective in the elderly group as in the adult group. However, when the elderly group was compared with the young group (under age 40), the improvements in jitter and shimmer differed significantly. Conclusion: In general, the results showed significant improvement in vocal quality after phonosurgery for vocal polyps in both the elderly and adult groups. However, the improvements in jitter and shimmer differed significantly between the elderly and young groups. Therefore, treatment planning for elderly patients should take age-related changes of the vocal folds into account.

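
The group comparison of pre- to post-operative change can be sketched as follows. A minimal illustration, assuming improvement is scored as pre minus post (so that a drop in jitter or shimmer counts as a positive improvement) and that groups are compared with Welch's unequal-variance t test; the abstract does not name the statistical test, and the values below are hypothetical.

```python
import math
from statistics import mean, variance


def improvement(pre, post):
    """Per-patient change score; positive means the measure decreased after surgery."""
    return [a - b for a, b in zip(pre, post)]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical jitter (%) before and after microsurgery.
elderly = improvement([2.1, 1.8, 2.5, 1.9], [1.2, 1.1, 1.6, 1.3])
young = improvement([2.0, 2.4, 1.7, 2.2, 1.9], [0.7, 0.9, 0.6, 0.8, 0.7])
t, df = welch_t(elderly, young)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Welch's version is used here because the groups in such studies are very unequal in size (105 vs. 23 in the adult/elderly split), which makes the equal-variance assumption of Student's t test harder to justify.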