• Title/Summary/Keyword: Utterance

Search Results: 382

Short utterance speaker verification using PLDA model adaptation and data augmentation (PLDA 모델 적응과 데이터 증강을 이용한 짧은 발화 화자검증)

  • Yoon, Sung-Wook;Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.9 no.2 / pp.85-94 / 2017
  • Conventional speaker verification systems using a time delay neural network, identity vectors, and probabilistic linear discriminant analysis (TDNN-Ivector-PLDA) are known to be very effective for verifying long-duration speech utterances. However, when test utterances are short, the duration mismatch between enrollment and test utterances significantly degrades the performance of TDNN-Ivector-PLDA systems. To compensate for the I-vector mismatch between long and short utterances, this paper proposes probabilistic linear discriminant analysis (PLDA) model adaptation with augmented data. A PLDA model is first trained on a large amount of speech data, most of which consists of long utterances. The PLDA model is then adapted with I-vectors extracted from short-utterance data augmented by vocal tract length perturbation (VTLP). In computer experiments on the NIST SRE 2008 database, the proposed method achieves significantly better performance than conventional TDNN-Ivector-PLDA systems when there is a duration mismatch between enrollment and test utterances.
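The abstract does not spell out how VTLP is applied. As a minimal sketch, assuming the piecewise-linear warping function described by Jaitly and Hinton (2013), the augmentation step could map each filterbank frequency as shown below. The cutoff `f_hi = 4800` Hz, the 16 kHz sample rate, and the warp-factor range 0.9 to 1.1 are illustrative assumptions rather than values from the paper, and the PLDA adaptation step itself is not shown.

```python
import random


def vtlp_warp(freq, alpha, f_hi=4800.0, sample_rate=16000.0):
    """Map a frequency in Hz to its VTLP-warped value (assumed Jaitly-Hinton form).

    Below a boundary frequency the spectrum is scaled by `alpha`; above it a
    straight line keeps the Nyquist frequency fixed, so no energy leaves the band.
    """
    nyquist = sample_rate / 2.0
    boundary = f_hi * min(alpha, 1.0) / alpha
    if freq <= boundary:
        return freq * alpha
    # Line through (boundary, alpha * boundary) and (nyquist, nyquist).
    warped_boundary = f_hi * min(alpha, 1.0)
    slope = (nyquist - warped_boundary) / (nyquist - boundary)
    return nyquist - slope * (nyquist - freq)


def perturb_filterbank(center_freqs, rng, low=0.9, high=1.1):
    """Draw one warp factor and apply it to every filterbank center frequency."""
    alpha = rng.uniform(low, high)
    return alpha, [vtlp_warp(f, alpha) for f in center_freqs]


if __name__ == "__main__":
    rng = random.Random(0)
    centers = [250.0, 1000.0, 4000.0, 7000.0]  # hypothetical center frequencies
    alpha, warped = perturb_filterbank(centers, rng)
    print(round(alpha, 3), [round(f, 1) for f in warped])
```

Drawing a fresh `alpha` for each copy of a short utterance yields several spectrally perturbed versions of it, from which additional I-vectors for adaptation could be extracted.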

Study of Boundary Tone in Mandarin Chinese (표준 중국어의 경계억양에 관한 연구)

  • Sohn Nam-Ho
    • Proceedings of the KSPS conference / 2003.05a / pp.43-47 / 2003
  • This paper is a phonetic study of the $F_{0}$ range and boundary tones in Mandarin Chinese. Production data from 6 Chinese speakers show declination, pitch resetting, and tonal variation of the boundary tone. In declarative sentences, $F_{0}$ declines gradually over the utterance, but a mid-sentence boundary prevents the $F_{0}$ of the following syllable from declining because of pitch resetting. The $F_{0}$ range of the syllable is expanded before mid-sentence and sentence-final boundaries. In interrogative sentences, $F_{0}$ rises gradually over the utterance, and a mid-sentence boundary makes the $F_{0}$ of the following syllable rise further. The $F_{0}$ range of the sentence-final syllable is expanded, and its $F_{0}$ contour shows a rising curve.
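One way to quantify the declination, range expansion, and resetting described above is to fit a least-squares line to the $F_{0}$ values of an utterance, take max minus min within each syllable, and compare mean $F_{0}$ on either side of a boundary. This is a minimal sketch assuming $F_{0}$ has already been measured at known times; the abstract does not state which measures the study used, and the function names are hypothetical.

```python
from statistics import mean


def declination_slope(times, f0_values):
    """Least-squares slope of F0 over time (Hz per second).

    A negative slope indicates declination; a positive one, a gradual rise
    such as the one reported for interrogative sentences.
    """
    if len(times) != len(f0_values) or len(times) < 2:
        raise ValueError("need at least two matching (time, F0) points")
    t_bar, f_bar = mean(times), mean(f0_values)
    num = sum((t - t_bar) * (f - f_bar) for t, f in zip(times, f0_values))
    den = sum((t - t_bar) ** 2 for t in times)
    if den == 0:
        raise ValueError("time points must not all be equal")
    return num / den


def f0_range(syllable_f0):
    """F0 range of one syllable: highest minus lowest F0 value (Hz)."""
    return max(syllable_f0) - min(syllable_f0)


def pitch_reset(before_boundary, after_boundary):
    """Mean F0 jump across a boundary; a positive value is consistent with resetting."""
    return mean(after_boundary) - mean(before_boundary)
```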

DTW based Utterance Rejection on Broadcasting News Keyword Spotting System (방송뉴스 핵심어 검출 시스템에서의 오인식 거부를 위한 DTW의 적용)

  • Park, Kyung-Mi;Park, Jeong-Sik;Oh, Yung-Hwan
    • Proceedings of the KSPS conference / 2005.11a / pp.155-158 / 2005
  • Keyword spotting is effective for finding keywords in continuous speech. However, non-keywords may be accepted as keywords when environmental noise occurs or the speaker changes. To overcome this performance degradation, utterance rejection techniques that apply a confidence measure to the recognition result have been developed. In this paper, we apply dynamic time warping (DTW) to an HMM-based broadcast news keyword spotting system to reject non-keywords. Experimental results show that the false acceptance rate is decreased to 50%.
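The abstract does not describe how DTW is combined with the HMM keyword spotter. As a minimal sketch under that uncertainty, a spotted keyword segment could be aligned with stored template sequences of the same keyword, and the hypothesis rejected when even the closest template is too far away. The feature vectors, templates, and `threshold` here are hypothetical and would need tuning on development data.

```python
import math


def dtw_distance(query, template):
    """Length-normalized DTW distance between two sequences of feature vectors."""
    n, m = len(query), len(template)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local Euclidean distance plus the cheapest of the three predecessors.
            d = math.dist(query[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def accept_keyword(query, templates, threshold):
    """Accept a spotted keyword only if some template lies within `threshold`."""
    best = min(dtw_distance(query, t) for t in templates)
    return best <= threshold
```

Normalizing by `n + m` keeps scores comparable across keywords of different lengths, so that a single rejection threshold is at least plausible.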

Speech Emotion Recognition by Speech Signals on a Simulated Intelligent Robot (모의 지능로봇에서 음성신호에 의한 감정인식)

  • Jang, Kwang-Dong;Kwon, Oh-Wook
    • Proceedings of the KSPS conference / 2005.11a / pp.163-166 / 2005
  • We propose a speech emotion recognition method for a natural human-robot interface. In the proposed method, emotions are classified into 6 classes: angry, bored, happy, neutral, sad, and surprised. Features of an input utterance are extracted from statistics of phonetic and prosodic information. Phonetic information includes log energy, shimmer, formant frequencies, and Teager energy; prosodic information includes pitch, jitter, duration, and speech rate. Finally, a pattern classifier based on Gaussian support vector machines decides the emotion class of the utterance. We recorded speech commands and dialogs uttered 2 m away from microphones in 5 different directions. Experimental results show that the proposed method yields 59% classification accuracy, while human classifiers give about 50% accuracy, which confirms that the proposed method achieves performance comparable to that of humans.
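Several of the features listed above have standard definitions. The sketch below assumes the common relative ("local") forms of jitter and shimmer, computed from already-extracted pitch periods and peak amplitudes, together with the discrete Teager energy operator psi[n] = x[n]^2 - x[n-1]*x[n+1]. The paper may define or normalize these features differently, and the SVM classifier is not shown.

```python
import math


def _relative_mean_abs_diff(values):
    """Mean absolute difference between consecutive values, divided by their mean."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods):
    """Cycle-to-cycle variation of pitch periods, relative to the mean period."""
    return _relative_mean_abs_diff(periods)


def local_shimmer(amplitudes):
    """Cycle-to-cycle variation of peak amplitudes, relative to the mean amplitude."""
    return _relative_mean_abs_diff(amplitudes)


def teager_energy(x):
    """Discrete Teager energy operator psi[n] = x[n]**2 - x[n-1] * x[n+1].

    The first and last samples lack a neighbour on one side, so the output
    is two samples shorter than the input.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


if __name__ == "__main__":
    # For a unit-amplitude sampled cosine, every Teager value equals sin(omega)**2.
    omega = 0.3
    tone = [math.cos(omega * n) for n in range(20)]
    print(teager_energy(tone)[:3], math.sin(omega) ** 2)
```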

English listening error analyses based on intonation phrases (억양단위에 기초한 영어 청해 오류분석)

  • Lee Kyungmi
    • Proceedings of the KSPS conference / 2003.05a / pp.163-167 / 2003
  • Intonation, as a suprasegmental phonetic feature, conveys meaning at the postlexical or utterance level in a linguistically structured way. It has three aspects: tunes, relative prominence, and intonational phrasing. In this article, I examine how prosodic phrasing is functionally related to English listening comprehension by analysing students' listening comprehension errors. When the meaning of an utterance is conveyed, the utterance is divided into intonational phrases. A smaller intonational phrase is regarded as an intermediate phrase, which has a primary accent and a phrase tone or an audible break. Most of the students' listening errors occurred with linked pronunciation within the intermediate phrases of fast speech. Thus, by working with the smallest unit that carries a tune, we can help students improve their English pronunciation and listening ability.

A Phonetic Study of Vowel Raising: A Closer Look at the Realization of the Suffix {-go} (모음 상승 현상의 음성적 고찰: 어미 {-고}의 실현을 중심으로)

  • LEE, HYANG WON;Shin, Jiyoung
    • Korean Linguistics / v.81 / pp.267-297 / 2018
  • Vowel raising in Korean has been primarily treated as a phonological, categorical change. This study aims to show how the Korean connective suffix {-go} is realized in various environments and to propose a principle of vowel raising based on both acoustic and perceptual data. To that end, we used a corpus of spoken Korean to analyze the types of syntactic constructions, the realization of prosodic boundaries (IP and PP), and the types of boundary tone associated with {-go}. The vowel tended to be raised most frequently in utterance-final position, while in utterance-medial position it was raised more often when the syntactic and prosodic distance between {-go} and the following constituent was smaller. The results for boundary tone also showed a correlation between vowel raising and the discourse function of the boundary tone. In conclusion, we propose that vowel raising is not simply an optional phenomenon but rather a type of phonetic reduction related to the comprehension of the following constituent.

Korean Speech Act Tagging using Previous Sentence Features and Following Candidate Speech Acts (이전 문장 자질과 다음 발화의 후보 화행을 이용한 한국어 화행 분석)

  • Kim, Se-Jong;Lee, Yong-Hun;Lee, Jong-Hyeok
    • Journal of KIISE:Software and Applications / v.35 no.6 / pp.374-385 / 2008
  • Speech act tagging, which recognizes the speaker's intentions expressed in natural language utterances, is an important step in various dialogue applications. Previous approaches, such as rule-based and statistics-based methods, use the speech acts of previous utterances and the sentence features of the current utterance. This paper proposes a method that determines the speech act of the current utterance using the speech acts of the following utterances as well as the previous ones. Using the features of following utterances yields an accuracy of 95.27%, an improvement of 3.65% over previous methods. Moreover, sentence features of the previous utterances are employed to make maximal use of the information available for the current utterance. By applying an appropriate probability model for each speech act, a final accuracy of 97.97% is achieved.
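The abstract does not give the probability model. As a purely hypothetical sketch of using both neighbours, the score of each candidate act below combines (i) its probability given the current sentence features, (ii) a bigram transition probability from the previous act, and (iii) how well it predicts the candidate acts of the following utterance. All act labels, probabilities, and the smoothing floor `EPS` are assumptions for illustration, not the paper's model.

```python
import math

EPS = 1e-9  # hypothetical floor so unseen events do not produce log(0)


def choose_speech_act(feature_probs, prev_act, next_cands, trans):
    """Return the speech act with the highest combined log score.

    feature_probs: {act: P(act | sentence features of the current utterance)}
    prev_act:      speech act already assigned to the previous utterance
    next_cands:    {act: probability of each candidate act of the next utterance}
    trans:         {(act_i, act_j): P(act_j | act_i)} bigram transition table
    """
    def score(act):
        s = math.log(max(feature_probs[act], EPS))
        s += math.log(max(trans.get((prev_act, act), 0.0), EPS))
        # Expected fit with the following utterance's candidate acts.
        follow = sum(trans.get((act, nxt), 0.0) * p for nxt, p in next_cands.items())
        s += math.log(max(follow, EPS))
        return s

    return max(feature_probs, key=score)


if __name__ == "__main__":
    trans = {("greet", "ask"): 0.4, ("greet", "inform"): 0.6,
             ("ask", "inform"): 0.9, ("inform", "inform"): 0.1}
    # The previous act alone favours "inform", but the following utterance
    # (expected to be an "inform") tips the decision to "ask".
    print(choose_speech_act({"ask": 0.5, "inform": 0.5}, "greet", {"inform": 1.0}, trans))
```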

Study of Developing SOP for Extracting Stable Vocal Features for Accurate Diagnosis (음성의 안정적 변수 추출을 위한 SOP 개발 연구)

  • Kim, Keun-Ho;Jang, Jun-Su;Kim, Young-Su;Kim, Jong-Yeol
    • Journal of Physiology & Pathology in Korean Medicine / v.25 no.6 / pp.1108-1112 / 2011
  • Voice can be widely used to classify the four constitution types and to recognize a person's health condition by extracting meaningful features as physical quantities, in either traditional Korean medicine or Western medicine. In this paper, we propose a method for updating the standard operating procedure (SOP) for acquiring and recording voices so that stable vocal features can be extracted, since these features are sensitive to variations in a subject's utterance. First, we extracted pitch frequencies from vowels and a sentence, and intensity from the sentence, as features from voices acquired under different utterance conditions. We then computed the deviation ratios of the features from their median values for each utterance condition and selected the condition that minimized the ratio as the new SOP. As a result, considering both the deviation and qualitative requirements, we decided on an SOP in which subjects, after practice, utter vowels with a length of 2s~1s and sentences with intervals of over 2 s between them. Stable vocal features obtained with the updated SOP enable accurate diagnosis, and the SOP will be further developed and simplified for use in the u-Healthcare system of personalized medicine.
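The abstract does not define the deviation ratio exactly. Assuming it is the mean absolute deviation of repeated measurements from their median, divided by that median, and that per-feature ratios are averaged, the condition-selection step could be sketched as follows; the condition names and the data layout (condition, then feature name, then repeated values) are hypothetical.

```python
from statistics import mean, median


def deviation_ratio(values):
    """Mean absolute deviation from the median, relative to the median.

    Assumes positive feature values such as pitch (Hz) or intensity (dB).
    """
    med = median(values)
    return mean(abs(v - med) for v in values) / med


def select_condition(measurements):
    """Return the utterance condition with the smallest average deviation ratio.

    measurements: {condition: {feature_name: [repeated measurements]}}
    """
    def score(condition):
        features = measurements[condition]
        return mean(deviation_ratio(vals) for vals in features.values())

    return min(measurements, key=score)
```

Averaging ratios rather than raw deviations keeps pitch (hundreds of Hz) from dominating intensity (tens of dB) when several features are combined.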

Ambivalence in "Hyŏnsil kwa Parŏn"'s Relationship to Industrial Society, Mass Culture, and the City (산업사회, 대중문화, 도시에 대한 '현실과 발언'의 양가적 태도)

  • Shin, Chunghoon
    • The Journal of Art Theory & Practice / no.16 / pp.41-69 / 2013
  • The inauguration of the collective Reality and Utterance (Hyŏnsil kwa Parŏn) in 1979 and 1980 marked a watershed moment in Korean art. This is not only because the collective gave birth to the politically engaged art movement that would come to be labeled "Minjung Art" by the mid-1980s, but also because it enthusiastically embraced a wide range of images from urban culture. With a special focus on the members' early work, my research explores an issue largely neglected in the dominant narrative of Minjung art as a form of activism against the authoritarian Korean government during the 1980s: what was at stake in Reality and Utterance's exploration of contemporary urban visual culture. The aim of this essay is to recognize engagement with urban visual culture as central to the group's early project and to consider it at some distance from the anti-urban and anti-mass-culture perspective endorsed by the Minjung narrative. Focusing on the members' turn to urban visual culture, this essay instead argues that this turn was by no means merely a means of making art as social critique; more importantly, it was an experiment with the shared image world, as opposed to the rarefied visual vocabularies of abstract modernism. Visual productions such as advertisements, billboards, posters, and kitsch paintings, which came from outside the narrow confines of fine art, were certainly ominous signs of the colonization of everyday life in the capitalist city, but at the same time they were expected to serve as a catalyst for redefining Korean art in a more communicative, accessible, and democratized way. In this regard, in the early 1980s (in particular, 1980 and 1982), the members' gesture oscillated between critique and embrace, which allowed the group to occupy a unique domain in Korean art production.
