• Title/Summary/Keyword: Speech Understanding

A Comparative Study of Korean and French Vowel Systems -An Experimental Phonetic and Phonological Perspective-

  • Kim, Seon-Jung;Lee, Eun-Yung
    • Speech Sciences
    • /
    • v.8 no.1
    • /
    • pp.53-66
    • /
    • 2001
  • This paper aims to investigate the acoustic characteristics of the vowels attested in Korean and French and to seek a way of understanding them from a phonological point of view. We first compare the two vowel systems by measuring the actual formant frequencies using CSL. The results show that the first and second formants vary over a wider range in French than in Korean. To interpret the two vowel systems phonologically, we apply the theory of Licensing Constraints, proposed and developed by Kaye (1994) and Charette and Kaye (1994), and propose the licensing constraints placed upon the vowels of Korean and French. For Korean, we propose the constraints that both elements I and U must be heads. For French, we claim the following licensing constraints: U in a headed expression must be the head, A cannot be the head, and nothing can license only an expression containing A.

  • PDF
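The comparison above rests on measuring the first and second formants (F1, F2) of each vowel. The study used CSL for this; purely as an illustration of the underlying idea, the sketch below estimates formant frequencies from a waveform by finding the pole angles of a linear-prediction (LPC) model. All function names and parameters here are our own, and the synthetic two-resonance signal stands in for a real vowel recording.

```python
import numpy as np

def lpc(x, order):
    """Least-squares linear prediction: x[n] ~ sum_k c[k] * x[n-k]."""
    X = np.column_stack([x[order - k:len(x) - k] for k in range(1, order + 1)])
    c, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
    return np.concatenate(([1.0], -c))      # A(z) = 1 - sum_k c[k] z^-k

def formants(x, fs, order=8):
    """Estimate formant frequencies (Hz) from the angles of the LPC poles."""
    roots = np.roots(lpc(x, order))
    roots = roots[np.imag(roots) > 0]       # keep one of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    return np.sort(freqs[(freqs > 90) & (freqs < fs / 2 - 90)])

# Synthetic "vowel": two damped resonances at 500 Hz and 1500 Hz
fs = 10000
t = np.arange(2048) / fs
x = (np.sin(2 * np.pi * 500 * t) * np.exp(-30 * t)
     + 0.8 * np.sin(2 * np.pi * 1500 * t) * np.exp(-30 * t))
print(formants(x, fs, order=4))             # ≈ [ 500. 1500.]
```

In practice, formant tracking also involves pre-emphasis, windowing, and frame-by-frame analysis; this sketch shows only the core LPC root-finding step.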

ConWis: Assistive Software for People with Hearing and Speaking Disorders

  • Kodirov, Khasanboy;Kodirov, Khusanboy;Lee, Young-Hee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2019.10a
    • /
    • pp.678-679
    • /
    • 2019
  • In this paper, we developed a medical computer application that gives both disabled children and adults the chance to communicate easily with others. Although many mobile healthcare apps are available nowadays, we believe that users should also have options among healthcare programs developed for computers, which is why we developed ConWis. This application helps a person with hearing loss or a voice, speech, or language disorder to communicate easily with others. The software makes it easier to hear and understand what is being said and to express one's thoughts. To use it, the patient inputs a sentence, which is converted to audio speech using built-in male or female voices. In addition, it can convert voice received through a microphone into text and display it on the screen.

How does focus-induced prominence modulate phonetic realizations for Korean word-medial stops?

  • Choi, Jiyoun
    • Phonetics and Speech Sciences
    • /
    • v.12 no.4
    • /
    • pp.57-61
    • /
    • 2020
  • Previous research has indicated that the patterns of phonetic modulations induced by prominence are not consistent across languages but are conditioned by sound systems specific to a given language. Most studies examining the prominence effects in Korean have been restricted to segments in word-initial and phrase-initial positions. The present study, thus, set out to explore the prominence effects for Korean stop consonants in word-medial intervocalic positions. A total of 16 speakers of Seoul Korean (8 males, 8 females) produced word-medial intervocalic lenis and aspirated stops with and without prominence. The prominence was induced by contrast focus on the phonation-type contrast, that is, lenis vs. aspirated stops. Our results showed that F0 of vowels following both lenis and aspirated stops became higher when the target stops received focus than when they did not, whereas voice onset time (VOT) and voicing during stop closure for both lenis and aspirated stops did not differ between the focus and no-focus conditions. The findings add to our understanding of diverse patterns of prominence-induced strengthening on the acoustic realizations of segments.

A Character Speech Animation System for Language Education for Each Hearing Impaired Person (청각장애우의 언어교육을 위한 캐릭터 구화 애니메이션 시스템)

  • Won, Yong-Tae;Kim, Ha-Dong;Lee, Mal-Rey;Jang, Bong-Seog;Kwak, Hoon-Sung
    • Journal of Digital Contents Society
    • /
    • v.9 no.3
    • /
    • pp.389-398
    • /
    • 2008
  • There has been some research into speech systems for communication between hearing-impaired people and people with normal hearing, but such instruction has been carried out inefficiently, with existing teachers teaching each individual one-on-one, owing to social indifference and a lack of marketability. To overcome this weakness, there was a need to develop contents utilizing 3D animation and digital technology. To establish a standard face and standard mouth shape for character preparation, the study collected sufficient data on third- to sixth-grade elementary school students in Seoul and Gyeonggi, Korea, and drew up standards for the face and mouth shape of such students. This is not merely baseline data for content development for the hearing impaired; it can also offer a standard measurement and a standard type realistically applicable to them. As a system for understanding conversations and teaching self-expression through 3D character animation, the character speech animation system, combining 3D technology and motion capture, supports effective language education for hearing-impaired children who need it within their families and in special education institutions.

  • PDF

Influence of Stimulus Polarity on the Auditory Brainstem Response From Level-Specific Chirp

  • Dzulkarnain, Ahmad Aidil Arafat;Salamat, Sabrina;Shahrudin, Fatin Amira;Jamal, Fatin Nabilah;Zakaria, Mohd Normani
    • Journal of Audiology & Otology
    • /
    • v.25 no.4
    • /
    • pp.199-208
    • /
    • 2021
  • Background and Objectives: No known studies have investigated the influence of stimulus polarity on the Auditory Brainstem Response (ABR) elicited from a level-specific (LS) chirp. This study is important as it provides a better understanding of stimulus polarity selection for the ABR elicited from an LS chirp stimulus. We explored the influence of stimulus polarity on the ABR from LS chirp compared to the ABR from click at 80 dBnHL in normal-hearing adults. Subjects and Methods: Nineteen adults with normal hearing participated. The ABRs were acquired using click and LS chirp stimuli with three stimulus polarities (rarefaction, condensation, and alternating) at 80 dBnHL. The ABRs were tested only on the right ear at a stimulus rate of 33.33 Hz. The ABR test was stopped when the recording reached a residual noise level of 0.04 μV. ABR amplitudes, absolute latencies, inter-peak latencies (IPLs), and the recorded number of averages were statistically compared across stimulus polarities and stimuli. Results: Rarefaction polarity yielded the largest ABR amplitudes and SNRs of the stimulus polarities for both stimuli. There were marginal differences in the absolute latencies and IPLs among stimulus polarities. No significant difference was found in the number of averages required to reach the stopping criteria. Conclusions: Stimulus polarity has a significant influence on the ABR to LS chirp. Rarefaction polarity is recommended for clinical use because of its larger ABR peak I, III, and V amplitudes compared with the other stimulus polarities.
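The recording here was stopped once the residual noise of the averaged response reached 0.04 μV. Since the residual noise of an N-sweep average of uncorrelated noise falls roughly as σ/√N, the number of sweeps such a criterion implies can be sketched as follows. This is a back-of-the-envelope model, not the authors' procedure, and the single-sweep noise value is an invented example.

```python
import math

def sweeps_for_noise_floor(single_sweep_noise_uV, target_uV):
    """Sweeps needed for an averaged response to reach a residual-noise
    target, assuming uncorrelated noise: sigma / sqrt(N) <= target."""
    return math.ceil((single_sweep_noise_uV / target_uV) ** 2)

# e.g., 1 uV of uncorrelated noise per sweep, 0.04 uV stopping criterion
print(sweeps_for_noise_floor(1.0, 0.04))  # → 625
```

The quadratic growth explains why tight residual-noise criteria can require thousands of sweeps when single-sweep noise is high.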

Comparison of subjective voice symptoms in elite vocal performers and professional voice users (전문 음성사용자와 직업적 음성사용자의 주관적 음성증상 비교)

  • Ji-sung Kim
    • Phonetics and Speech Sciences
    • /
    • v.15 no.4
    • /
    • pp.27-34
    • /
    • 2023
  • This study aimed to provide knowledge helpful for understanding occupation-related voice problems in the clinical field by investigating and comparing the subjective vocal symptoms of 12 professional actors and 12 speech-language pathologists. Among the 11 symptoms, "Difficulty with high pitch when singing," "Hypertension in the neck when speaking," and "Feel voice fatigue" were the most frequent in both groups. In addition, the professional voice users reported a higher frequency of "Difficulty with high pitch when singing" (p=.049), "Hoarse voice" (p=.021), "Difficulty (requiring effort) when speaking" (p=.032), "Pain in the neck when speaking" (p=.009), and "Feel vocal fatigue" (p=.018) than the elite vocal performers. This may be due to the two groups' different voice-related environments and different voice demands during occupational activities.
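The group differences above are reported as p-values on symptom frequencies. With 12 participants per group, comparing how many members of each group report a symptom amounts to a 2×2 exact test; the sketch below runs SciPy's Fisher exact test on invented counts (the paper's actual statistical test and raw data are not given here, so these numbers are purely illustrative).

```python
from scipy.stats import fisher_exact

# Hypothetical counts: members reporting a given symptom in each group of 12
reporting = {"professional voice users": 10, "elite vocal performers": 4}
table = [[10, 12 - 10],   # reporters vs non-reporters, group 1
         [4, 12 - 4]]     # reporters vs non-reporters, group 2
odds_ratio, p = fisher_exact(table)
print(round(p, 3))        # two-sided p-value for the group difference
```

With such small groups, an exact test is preferable to a chi-square approximation, whose assumptions break down at low expected cell counts.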

Surgical treatment of velopharyngeal insufficiency

  • Nam, Seung Min
    • Archives of Craniofacial Surgery
    • /
    • v.19 no.3
    • /
    • pp.163-167
    • /
    • 2018
  • Velopharyngeal insufficiency (VPI) is a common complication after primary palatoplasty. Although several surgical treatments for VPI have been introduced, there is no consensus guideline for selecting the optimal surgical treatment for VPI patients. The selection depends on a multimodal patient evaluation, including perceptual speech evaluation, nasometry, and nasoendoscopy. A deeper understanding of the anatomy and physiology of VPI allows us to provide more adequate treatment.

Speech Recognition in the Pager System displaying Defined Sentences (문자출력 무선호출기를 위한 음성인식 시스템)

  • Park, Gyu-Bong;Park, Jeon-Gue;Suh, Sang-Weon;Hwang, Doo-Sung;Kim, Hyun-Bin;Han, Mun-Sung
    • Annual Conference on Human and Language Technology
    • /
    • 1996.10a
    • /
    • pp.158-162
    • /
    • 1996
  • This paper describes a specialized speech recognition system that integrates speech recognition technology into a pager capable of displaying text. The system works as follows: once a caller connects to the speech recognition server, the server recognizes the caller's natural input speech and outputs the result in sentence form on the callee's pager terminal. The system adopts a statistical speech recognition approach, modeling each word as a continuous HMM. Each model, which uses Gaussian mixture probability density functions, is trained with the Baum-Welch algorithm, one of the traditional HMM training methods, and at recognition time a Viterbi beam search is applied to obtain the best result. A 26-dimensional feature vector combining MFCCs and power is extracted from each frame, yielding models for 83 domain vocabulary words and special vocabulary items such as silence. An FSN that performs both syntactic and semantic functions is combined with these models to build a continuous speech recognition system for spontaneous speech. In addition to the above, this paper examines external factors directly tied to system performance, such as the speech database and labeling, describes the various features implemented in the system, and discusses experimental results and directions for future improvement.

  • PDF
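The recognizer above decodes continuous-HMM word models with a Viterbi beam search. As a toy illustration of that decoding step (a discrete-emission HMM with invented parameters, not the paper's 26-dimensional Gaussian-mixture models), a log-domain Viterbi with beam pruning can be sketched as:

```python
import numpy as np

def viterbi_beam(obs, pi, A, B, beam=1e-3):
    """Log-domain Viterbi decoding with beam pruning.
    obs: observation indices; pi: initial state probs;
    A[i, j]: transition prob i -> j; B[i, k]: emission prob of symbol k in state i."""
    logpi, logA, logB = np.log(pi), np.log(A), np.log(B)
    delta = logpi + logB[:, obs[0]]            # best log-score ending in each state
    backptr = []
    for o in obs[1:]:
        scores = delta[:, None] + logA         # score of every (prev -> cur) pair
        best_prev = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, o]
        delta[delta < delta.max() + np.log(beam)] = -np.inf   # prune weak states
        backptr.append(best_prev)
    path = [int(delta.argmax())]
    for bp in reversed(backptr):               # trace the best path backwards
        path.append(int(bp[path[-1]]))
    return path[::-1]

# Toy 2-state HMM (invented numbers): symbols 0=walk, 1=shop, 2=clean
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
print(viterbi_beam([0, 1, 2], pi, A, B))       # → [1, 0, 0]
```

The beam threshold trades accuracy for speed: states whose score falls more than log(beam) below the current best are dropped, which is what makes real-time decoding of large vocabularies practical.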

Machine Learning Based Domain Classification for Korean Dialog System (기계학습을 이용한 한국어 대화시스템 도메인 분류)

  • Jeong, Young-Seob
    • Journal of Convergence for Information Technology
    • /
    • v.9 no.8
    • /
    • pp.1-8
    • /
    • 2019
  • Dialog systems are becoming a dominant new way for humans and computers to interact, allowing people to access various services through natural language. A dialog system commonly has a pipeline structure consisting of several modules (e.g., speech recognition, natural language understanding, and dialog management). In this paper, we tackle the task of domain classification for the natural language understanding module by employing machine learning models such as a convolutional neural network and a random forest. On our dataset of seven service domains, the random forest model achieved the best performance (F1 score 0.97). As future work, we will continue to search for better domain-classification approaches by investigating other machine learning models.
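A minimal sketch of the domain-classification step described above, using a TF-IDF bag-of-words with scikit-learn's random forest. The paper's dataset, features, and seven service domains are not reproduced here; the two toy domains and utterances below are invented.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented utterances for two toy service domains
texts = ["will it rain tomorrow", "is it sunny today in seoul",
         "play some jazz music", "play my favorite song again"]
domains = ["weather", "weather", "music", "music"]

vec = TfidfVectorizer()
X = vec.fit_transform(texts)                    # sparse TF-IDF features
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, domains)

print(clf.predict(vec.transform(["will it be sunny tomorrow"])))  # predicted domain
```

In a deployed dialog system, this classifier would sit in front of domain-specific language-understanding models, routing each utterance to the module that handles its service.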