• Title/Summary/Keyword: speech understanding

Search Results: 188

An HMM-based Korean TTS synthesis system using phrase information (운율 경계 정보를 이용한 HMM 기반의 한국어 음성합성 시스템)

  • Joo, Young-Seon;Jung, Chi-Sang;Kang, Hong-Goo
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2011.07a / pp.89-91 / 2011
  • In this paper, phrase boundaries in a sentence are predicted and the resulting phrase-break information is applied to an HMM-based Korean text-to-speech synthesis system. Synthesis with phrase-break information increases the naturalness of the synthetic speech and the intelligibility of sentences. To predict the phrase boundaries, context-dependent features are used, such as the forward/backward POS (part-of-speech) of each eojeol, the position of the eojeol in the sentence, the length of the eojeol, and the presence or absence of punctuation marks. The experimental results show that the naturalness of synthetic speech increases with phrase-break information.

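
The context features listed in the abstract above (forward/backward POS of an eojeol, its position and length, and punctuation) can be sketched as inputs to a phrase-break predictor. The feature names, the toy decision rule standing in for the trained model, and the sample sentence below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): encoding the context features
# described in the abstract for phrase-break prediction at each eojeol boundary.

def extract_features(eojeols, pos_tags, index):
    """Features for the boundary after eojeols[index]."""
    n = len(eojeols)
    return {
        "pos_curr": pos_tags[index],                       # POS of current eojeol
        "pos_next": pos_tags[index + 1] if index + 1 < n else "<END>",
        "position": index / (n - 1) if n > 1 else 0.0,     # relative position in sentence
        "length": len(eojeols[index]),                     # eojeol length in characters
        "punct": eojeols[index][-1] in ",.?!",             # trailing punctuation mark
    }

def predict_break(feat):
    """Toy rule standing in for the trained model: break at punctuation,
    or late in the sentence before a noun eojeol."""
    return feat["punct"] or (feat["position"] > 0.4 and feat["pos_next"] == "NOUN")

eojeols = ["오늘은", "날씨가", "좋아서,", "공원에", "갔다."]
pos = ["NOUN", "NOUN", "ADJ", "NOUN", "VERB"]
breaks = [i for i in range(len(eojeols))
          if predict_break(extract_features(eojeols, pos, i))]
print(breaks)  # boundaries after eojeols 2 and 4 -> [2, 4]
```

In a real system the hand-written rule would be replaced by a classifier trained on a prosodically labeled corpus, and the predicted breaks would be fed into the HMM context labels.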

A Preliminary Study on Differences of Phonatory Offset-Onset between the Fluency and a Dysfluency (유창성과 비유창성 화자의 발성 종결-개시 차이에 관한 예비연구)

  • Han Ji-Yeon;Lee Ok-Bun
    • Proceedings of the KSPS conference / 2006.05a / pp.109-112 / 2006
  • This study investigated the acoustic characteristics of phonatory offset-onset mechanisms, comparing non-stutterers (N=3) with a stutterer (N=1). Phonatory offset-onset refers to laryngeal articulation in connected speech. In the phonetic context (V_V), pattern 0 (no change) appeared in all subjects, while pattern 4 (a trace of glottal fry and closure in the spectrogram) appeared only in the stutterer. In the high vowels (/i/, /u/), patterns 3 and 4 appeared only in the stutterer. Although there was no common pattern among the non-stutterers, each individual's preferred pattern was found. This study offers a key to understanding the physiological movements underlying a block in stuttering.


A corpus-based study on the effects of voicing and gender on American English Fricatives (성대진동 및 성별이 미국영어 마찰음에 미치는 효과에 관한 코퍼스 기반 연구)

  • Yoon, Tae-Jin
    • Phonetics and Speech Sciences / v.10 no.2 / pp.7-14 / 2018
  • The paper investigates the acoustic characteristics of English fricatives in the TIMIT corpus, with a special focus on the role of voicing in rendering fricatives in American English. The TIMIT database includes 630 talkers and 2,342 different sentences, and comprises more than five hours of speech. Acoustic analyses are conducted in the domain of spectral and temporal properties by treating gender, voicing, and place of articulation as independent factors. The results of the acoustic analyses revealed that acoustic signals interact in a complex way to signal the gender, place, and voicing of fricatives. Classification experiments using a multiclass support vector machine (SVM) revealed that 78.7% of fricatives are correctly classified. The majority of errors stem from the misclassification of /θ/ as [f] and /ʒ/ as [z]. The average accuracy of gender classification is 78.7%. Most errors result from the classification of female speakers as male speakers. The paper contributes to the understanding of the effects of voicing and gender on fricatives in a large-scale speech corpus.
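
The classification evaluation described above can be illustrated with a small confusion-count computation; the toy label lists below merely mimic the reported error pattern (/θ/ misread as [f], /ʒ/ as [z]) and are not the TIMIT results or the authors' SVM.

```python
from collections import Counter

def confusion(true_labels, pred_labels):
    """Count (true, predicted) label pairs and compute overall accuracy."""
    pairs = Counter(zip(true_labels, pred_labels))
    correct = sum(n for (t, p), n in pairs.items() if t == p)
    return pairs, correct / len(true_labels)

# Toy labels mimicking the reported confusions; not real TIMIT data.
true = ["s", "z", "f", "θ", "θ", "ʃ", "ʒ", "v"]
pred = ["s", "z", "f", "f", "θ", "ʃ", "z", "v"]
pairs, acc = confusion(true, pred)
print(acc)                # 6 of 8 correct -> 0.75
print(pairs[("θ", "f")])  # one /θ/ token misclassified as [f]
```

The same per-pair counts, aggregated over all tokens, give the confusion matrix from which the paper's 78.7% accuracy and dominant error cells would be read off.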

A Parallel Speech Recognition Model on Distributed Memory Multiprocessors (분산 메모리 다중프로세서 환경에서의 병렬 음성인식 모델)

  • 정상화;김형순;박민욱;황병한
    • The Journal of the Acoustical Society of Korea / v.18 no.5 / pp.44-51 / 1999
  • This paper presents a massively parallel computational model for the efficient integration of speech and natural language understanding. The phoneme model is based on a continuous hidden Markov model with context-dependent phonemes, and the language model is based on a knowledge-base approach. To construct the knowledge base, we adopt a hierarchically structured semantic network and a memory-based parsing technique that employs parallel marker-passing as an inference mechanism. Our parallel speech recognition algorithm is implemented on a multi-Transputer system using distributed-memory MIMD multiprocessors. Experimental results show that the parallel speech recognition system achieves better recognition accuracy than a word-network-based speech recognition system, and the accuracy is further improved by applying code-phoneme statistics. In addition, speedup experiments demonstrate the feasibility of constructing a real-time parallel speech recognition system.

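
The marker-passing inference mentioned above can be sketched as breadth-first marker propagation over a semantic network; the toy network, node names, and hop limit below are illustrative assumptions, not the paper's knowledge base.

```python
from collections import deque

# Toy semantic network: node -> linked nodes (hypothetical concept hierarchy).
network = {
    "apple": ["fruit"],
    "fruit": ["food"],
    "eat": ["food"],
    "food": [],
}

def pass_markers(network, source, max_hops=3):
    """Propagate a marker from `source` along links, breadth-first,
    recording the hop count at which each node is reached."""
    marked, queue = {source: 0}, deque([source])
    while queue:
        node = queue.popleft()
        if marked[node] >= max_hops:
            continue
        for nxt in network[node]:
            if nxt not in marked:
                marked[nxt] = marked[node] + 1
                queue.append(nxt)
    return marked

# Nodes marked from two sources intersect at a shared concept,
# which is the basic inference step in memory-based parsing.
a = pass_markers(network, "apple")
b = pass_markers(network, "eat")
print(sorted(set(a) & set(b)))  # common concept: ['food']
```

In the parallel setting, each processor would own a region of the network and forward markers for its nodes concurrently; the sequential BFS here only illustrates the inference itself.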

Study on Listening Diagnosis to Vocal Sound and Speech (문진(聞診) 중 성음(聲音).언어(言語)에 대한 연구)

  • Kim, Yong-Chan;Kang, Jung-Soo
    • Journal of Physiology & Pathology in Korean Medicine / v.20 no.2 / pp.320-327 / 2006
  • This study was written to aid the understanding of listening diagnosis of vocal sound and speech. The purpose of listening diagnosis is to know the states of essence (精), Qi (氣), and spirit (神). Vocal sound and speech are produced by Qi and spirit. Vocal sound originates from the center of the abdominal region (丹田) and comes out through the vocal organs, such as the lungs, larynx, nose, tongue, teeth, and lips. Speech is expressed through vocal sound and spirit, and both are controlled by the Five Vital Organs (五臟). The various changes of vocal sound and speech follow the rules of yin-yang. For example, whether a patient is inclined to speak indicates the heat or coldness of the illness; whether he speaks loudly or quietly indicates its severity; whether his speech is clear or thick indicates whether the illness is internal or external; whether his speech is damp or dry indicates its yin or yang nature; and a change of voice distinguishes a new illness from an old one. Symptoms involving changes in the five voices, the five sounds, dumbness, and huskiness are due to abnormal vocal sound, while symptoms such as mad talk, mumbling, and sleep talking are due to abnormal speech.

Differences in Patient Characteristics between Spasmodic Dysphonia and Vocal Tremor (연축성 발성장애와 음성 진전 환자의 감별)

  • Son, Hee Young
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.32 no.1 / pp.9-14 / 2021
  • Spasmodic dysphonia, essential tremor, and the vocal tremor associated with Parkinson's disease are distinct disorders that show fairly similar symptoms, such as difficulty at speech onset and a trembling voice. However, the causes and the corresponding treatments of these diseases differ. Spasmodic dysphonia is a vocal disorder characterized by spasms of the laryngeal muscles during speech, producing broken, tense, forced, and strangled voice patterns. This difficult-to-treat dysphonia is classified as a central-origin focal dystonia of as yet unknown etiology; its symptoms arise from intermittent, involuntary muscle contractions during speech. Essential tremor, on the other hand, is characterized by rhythmic laryngeal movement, resulting in rhythmic alterations of pitch and loudness during speech or even at rest. Severe tremor may cause speech breaks like those of adductor spasmodic dysphonia. In cases of hyperfunctional tension of the vocal folds with accompanying tremor, these disorders must be distinguished from muscular dysfunction. A diversified assessment based on specific speech tasks and a thorough understanding of the disorders are necessary for the accurate diagnosis and effective treatment of patients with vocal tremor.

ToBI and beyond: Phonetic intonation of Seoul Korean ani in Korean Intonation Corpus (KICo)

  • Ji-eun Kim
    • Phonetics and Speech Sciences / v.16 no.1 / pp.1-9 / 2024
  • This study investigated variation in the intonation of the Seoul Korean interjection ani across different meanings ("no" and "really?") and speech levels (intimate and polite), using data from the Korean Intonation Corpus (KICo). The investigation was conducted in two stages. First, IP-final tones in the dataset were categorized according to the K-ToBI convention (Jun, 2000). While significant relationships were observed between the meaning of ani and its IP-final tones, substantial overlap between groups was notable. Second, the F0 characteristics of the final syllable of ani were analyzed to elucidate the apparent many-to-many relationships between intonation and meaning/speech level. The results indicated that these seemingly overlapping relationships could be significantly distinguished. Overall, this study advocates a deeper analysis of phonetic intonation beyond ToBI-based categorical labels: by examining the F0 characteristics of the IP-final syllable, previously unclear connections between meaning/speech level and intonation become more comprehensible. Although ToBI remains a valuable tool and framework for studying intonation, it is imperative to explore beyond its categories to grasp the "distinctiveness" of intonation, thereby enriching our understanding of prosody.

DialogStudio: A Spoken Dialog System Workbench (음성대화시스템 워크벤취로서의 DialogStudio 개발)

  • Jung, Sang-Keun;Lee, Cheong-Jae;Lee, Gary Geun-Bae
    • MALSORI / no.63 / pp.101-112 / 2007
  • Spoken dialog system development involves many laborious and inefficient tasks. Since a spoken dialog system has many components, such as speech recognition, language understanding, dialog management, and knowledge management, a developer must edit the corpus and train each model separately. To reduce the cost of corpus editing and model training, a more systematic and efficient working environment is needed. As such an environment, we propose DialogStudio, a spoken dialog system workbench.


Phonology of Transcription (음운표기의 음운론)

  • Chung, Kook
    • Speech Sciences / v.10 no.4 / pp.23-40 / 2003
  • This paper examines the transcription of sounds from a phonological perspective. It finds that most transcriptions have been done on a segmental basis alone, without consideration of the whole phonological system and its levels, and without a full understanding of the nature of linguistic and phonetic alphabets. In a word, sound transcriptions have not been grounded in the phonology of the language and of the alphabet. This study presents a phonological model for transcribing foreign and native sounds, suggesting ways to improve some current transcription systems, such as the Hangeul transcription of loanwords and the romanization of Hangeul, as well as the phonetic transcription of English and other foreign languages.


Using Utterance and Semantic Level Confidence for Interactive Spoken Dialog Clarification

  • Jung, Sang-Keun;Lee, Cheong-Jae;Lee, Gary Geunbae
    • Journal of Computing Science and Engineering / v.2 no.1 / pp.1-25 / 2008
  • Spoken dialog tasks incur many errors, including speech recognition errors, understanding errors, and even dialog management errors. These errors create a wide gap between the user's intention and the system's understanding, which eventually results in misinterpretation. To close the gap, participants in human-to-human dialogs try to clarify the major causes of a misunderstanding and selectively correct them. This paper brings such clarification techniques to human-to-machine spoken dialog systems. We view the clarification dialog as a two-step problem: belief confirmation and clarification strategy establishment. To confirm the belief, we organize the clarification process into three systematic phases. In the belief confirmation phase, we consider the dialog system's overall processes, including speech recognition, language understanding, and semantic slot-value pairs, for clarification dialog management. A clarification expert is developed for establishing the clarification dialog strategy. In addition, we propose a new design for plugging a clarification dialog module into a given expert-based dialog system. The experimental results demonstrate that the error verifiers effectively catch word- and utterance-level semantic errors, and that the clarification experts increase both the dialog success rate and dialog efficiency.
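
The confidence-driven clarification described in this abstract can be sketched as a threshold decision over utterance- and slot-level confidence scores; the thresholds, slot names, and action labels below are illustrative assumptions, not the paper's clarification expert.

```python
def clarification_action(utterance_conf, slot_confs, hi=0.8, lo=0.4):
    """Choose a dialog action from utterance- and slot-level confidences
    (illustrative thresholds, not the paper's trained strategy)."""
    if utterance_conf < lo:
        return ("reject", [])      # utterance too unreliable: ask to rephrase
    weak = sorted(s for s, c in slot_confs.items() if c < hi)
    if not weak:
        return ("accept", [])      # belief confirmed: proceed with the task
    return ("clarify", weak)       # clarify only the low-confidence slots

print(clarification_action(0.9, {"city": 0.95, "date": 0.5}))  # ('clarify', ['date'])
print(clarification_action(0.2, {"city": 0.95}))               # ('reject', [])
```

Targeting only the weak slots, rather than re-asking the whole utterance, is what lets such a strategy raise dialog success rate without lengthening every dialog.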