• Title/Summary/Keyword: Utterance


An analysis of Speech Acts for Korean Using Support Vector Machines (지지벡터기계(Support Vector Machines)를 이용한 한국어 화행분석)

  • En Jongmin;Lee Songwook;Seo Jungyun
    • The KIPS Transactions:PartB / v.12B no.3 s.99 / pp.365-368 / 2005
  • We propose a speech act analysis method for Korean dialogue using Support Vector Machines (SVMs). As sentence features we use the lexical form of each word, its part-of-speech (POS) tag, and bigrams of POS tags; as context features we use the context of the previous utterance. Informative features are selected with chi-square statistics. After the SVMs are trained on the selected features, the SVM classifiers determine the speech act of each utterance. In experiments on a dialogue corpus from the hotel reservation domain, the method achieved an overall accuracy of $90.54\%$.
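The chi-square feature selection step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes the standard 2x2 contingency-table statistic for a binary feature against one speech act; the feature names, speech-act labels, and the helpers `chi_square` and `score_features` are hypothetical, and the SVM training step is left out.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 feature/class contingency table.

    n11: utterances that contain the feature and belong to the class
    n10: utterances that contain the feature but are outside the class
    n01: utterances in the class that lack the feature
    n00: utterances outside the class that lack the feature
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0  # degenerate table: the feature or class never varies
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def score_features(utterances, target_act):
    """Score every feature against one speech act.

    utterances: list of (set_of_feature_strings, speech_act_label) pairs.
    Returns a dict mapping each feature to its chi-square score.
    """
    vocabulary = set().union(*(feats for feats, _ in utterances))
    scores = {}
    for feature in vocabulary:
        n11 = n10 = n01 = n00 = 0
        for feats, act in utterances:
            has_feature = feature in feats
            in_class = act == target_act
            if has_feature and in_class:
                n11 += 1
            elif has_feature:
                n10 += 1
            elif in_class:
                n01 += 1
            else:
                n00 += 1
        scores[feature] = chi_square(n11, n10, n01, n00)
    return scores


# Hypothetical toy data: lexical, POS, and previous-utterance context features.
toy_dialogue = [
    ({"word=room", "pos=NN"}, "request"),
    ({"word=yes", "pos=IC"}, "accept"),
    ({"word=room", "pos=NN", "prev=ask-ref"}, "request"),
    ({"word=thanks", "pos=IC"}, "closing"),
]
scores = score_features(toy_dialogue, "request")
# Keep the highest-scoring features as classifier input.
selected = sorted(scores, key=scores.get, reverse=True)[:3]
```

Features with a high score are strongly associated (positively or negatively) with the speech act, which is why a score-based cutoff can shrink the feature set before training a classifier.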

Some Acoustical Aspects of Korean Stops in Various Utterance Positions : focusing on their temporal characteristics (음성 환경에 따른 한국어 폐쇄음의 음향적 특성 : 시간적 특성을 중심으로)

  • Pae, Jae-Yeon;Shin, Ji-Young;Ko, Do-Heung
    • Speech Sciences / v.5 no.2 / pp.139-159 / 1999
  • The purpose of this study is twofold: to identify the acoustic features of Korean stops in various utterance positions and to examine their influence on the neighbouring segments. Korean stops ($/p,\;p',\;p^{h};\;t,\;t',\;t^{h};\;k,\;k',\;k^{h}/$) are examined in CV, $V_1CV_2,\;V_1NCV_2,\;V_1LCV_2$ sequences. Three speakers of the Seoul dialect (two male and one female) served as subjects. VOT, the closure duration of the target stops, and the duration of the neighbouring segments were measured from the acoustic data. The results can be summarized as follows. First, stops show different temporal characteristics depending on their place of articulation as well as their voice type. Velar stops tend to have shorter closure duration and longer VOT due to relatively slower movement of the articulator (i.e. the tongue body) and higher supraglottal air pressure during the closure, respectively. Second, the temporal characteristics of the neighbouring segments appear to be influenced by the voice type of the stop. The preceding segment tends to be longer when a stop has shorter duration. On the other hand, the following segment tends to be shorter when a stop has longer VOT.


An Experimental Study on English Vowel Lengths as Produced by Korean College Students in Chungnam and Gyungnam Provinces (충남.경남지역 대학생들의 영어모음 발음길이에 대한 실험적 연구)

  • Park, Hee-Suk;Kim, Jung-Sook
    • Speech Sciences / v.10 no.3 / pp.157-173 / 2003
  • The purpose of this experimental study is to compare the lengths of English diphthongs and low vowels produced by native-English-speaking Americans with those produced by Korean college students from Chungnam and Gyungnam provinces. Eight words and sixteen sentences were uttered five times by twenty-five subjects from three groups: 1) Chungnam dialect speakers, 2) Gyungnam dialect speakers, and 3) five native-English-speaking Americans. Acoustic features (duration) were measured from sound spectrograms made by the PC Quire. The results showed that the lengths of English diphthongs and low vowels differed between the native English speakers and the Korean college students from Chungnam and Gyungnam provinces. On average, the native English speakers tended to pronounce the English diphthongs shorter than the Korean college students did, but they tended to pronounce the English low vowels longer. We also examined how the lengths of English diphthongs and low vowels vary with utterance position among the American speakers and the Chungnam and Gyungnam dialect speakers. A lengthening effect was observed in all three groups. However, the lengthening effect was more clearly observed in the pronunciation of the American speakers, especially for the English diphthongs.


Synthesis of Expressive Talking Heads from Speech with Recurrent Neural Network (RNN을 이용한 Expressive Talking Head from Speech의 합성)

  • Sakurai, Ryuhei;Shimba, Taiki;Yamazoe, Hirotake;Lee, Joo-Ho
    • The Journal of Korea Robotics Society / v.13 no.1 / pp.16-25 / 2018
  • A talking head (TH) is an animation of a speaking face generated from text and voice input. In this paper, we propose a method for generating a TH with facial expression and intonation from speech input alone. The problem of generating a TH from speech can be regarded as a regression problem from the acoustic feature sequence to the facial code sequence, where a facial code is a low-dimensional vector representation that can efficiently encode and decode a face image. The regression was modeled with a bidirectional RNN and trained on the SAVEE database, a database of frontal speaking-face animations. The proposed method is able to generate a TH with facial expression and intonation by using acoustic features such as MFCC, dynamic elements of MFCC, energy, and F0. According to the experiments, the configuration that used BLSTM layers as the first and second layers of the bidirectional RNN predicted the facial code best. For the evaluation, a questionnaire survey was conducted with 62 people who watched TH animations generated by the proposed method and by the previous method. Of the respondents, 77% answered that the TH generated by the proposed method matched the speech well.

Possibility for Gamification based on Alternate Reality Game : Toward Pro-baseball Manager Game (대체현실게임기반의 게임화 가능성 : 프로야구매니저게임을 중심으로)

  • Han, Sang Geun;Song, Seung Keun
    • Smart Media Journal / v.4 no.1 / pp.52-57 / 2015
  • Gamification, which applies game elements to fields unrelated to games, has recently emerged as a marketing technique. Alternate reality games, which break down the boundary between the real and the virtual, have also appeared. These new kinds of games go beyond the scope of existing games, and this possibility has been brought into online baseball simulation games. The objective of this research is to take an exploratory approach to building a GOMS model of a well-known pro-baseball manager game with three subjects. After a game-playing session, the three subjects' utterances about their game experience were recorded. We built the GOMS goal hierarchy for the pro-baseball manager game in order to analyze the vocabulary in the three subjects' utterances. We seek the possibility of gamification in alternate reality games, which interact between the real and the virtual and break down the boundary between them.

Characteristics of voice quality on clear versus casual speech in individuals with Parkinson's disease (명료발화와 보통발화에서 파킨슨병환자 음성의 켑스트럼 및 스펙트럼 분석)

  • Shin, Hee-Baek;Shim, Hee-Jeong;Jung, Hun;Ko, Do-Heung
    • Phonetics and Speech Sciences / v.10 no.2 / pp.77-84 / 2018
  • The purpose of this study is to examine the acoustic characteristics of Parkinsonian speech, with respect to different utterance conditions, by employing acoustic/auditory-perceptual analysis. The subjects of the study were 15 patients (M=7, F=8) with Parkinson's disease who were asked to read out sentences under different utterance conditions (clear/casual). The sentences read out by each subject were recorded, and the recorded speech was subjected to cepstrum and spectrum analysis using Analysis of Dysphonia in Speech and Voice (ADSV). Additionally, auditory-perceptual evaluation of the recorded speech was conducted with respect to breathiness and loudness. Results indicate that in the case of clear speech, there was a statistically significant increase in the cepstral peak prominence (CPP), and a decrease in the L/H ratio SD (ratio of low to high frequency spectral energy SD) and CPP F0 SD values. In the auditory-perceptual evaluation, a decrease in breathiness and an increase in loudness were noted. Furthermore, CPP was found to be highly correlated to breathiness and loudness. This provides objective evidence of the immediate usefulness of clear speech intervention in improving the voice quality of Parkinsonian speech.

Fluency Scoring of English Speaking Tests for Nonnative Speakers Using a Native English Phone Recognizer

  • Jang, Byeong-Yong;Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.7 no.2 / pp.149-156 / 2015
  • We propose a new method for automatic fluency scoring of English speaking tests spoken by nonnative speakers in a free-talking style. The proposed method differs from previous methods in that it does not require transcribed texts for the spoken utterances. First, an input utterance is segmented into a phone sequence by a phone recognizer trained on native speech databases. For each utterance, a feature vector with 6 features is extracted by processing the segmentation results of the phone recognizer. Then, a fluency score is computed by applying support vector regression (SVR) to the feature vector. The parameters of the SVR are learned from the rater scores for the utterances. In computer experiments with 3 tests taken by 48 Korean adults, we show that speech rate, phonation time ratio, and smoothed unfilled pause rate are the best features for fluency scoring. The correlation between the rater score and the SVR score is shown to be 0.84, which is higher than the correlation of 0.78 among raters. Although this correlation is slightly lower than the correlation of 0.90 obtained when the transcribed texts are given, it implies that the proposed method can be used as a preprocessing tool for fluency evaluation of speaking tests.
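As a rough illustration of how fluency features might be derived from a phone segmentation, the sketch below computes three quantities named in this abstract. The definitions are assumptions, since the abstract does not spell them out: speech rate is taken as phones per second, phonation time ratio as phone time over total time, and unfilled pause rate as silent pauses per second. The `"sil"` label, the 0.2 s pause threshold, and the function `fluency_features` are hypothetical, and the smoothing and SVR steps are not reproduced.

```python
def fluency_features(segments, pause_label="sil", min_pause=0.2):
    """Compute simple fluency features from a phone segmentation.

    segments: time-ordered list of (label, start_sec, end_sec) tuples,
              where pause_label marks silence (an assumed convention).
    min_pause: shortest silence, in seconds, counted as an unfilled pause.
    """
    if not segments:
        raise ValueError("segments must not be empty")
    total = segments[-1][2] - segments[0][1]
    if total <= 0:
        raise ValueError("segmentation must span a positive duration")

    phones = [s for s in segments if s[0] != pause_label]
    phonation = sum(end - start for _, start, end in phones)
    pauses = [
        s for s in segments
        if s[0] == pause_label and (s[2] - s[1]) >= min_pause
    ]
    return {
        "speech_rate": len(phones) / total,           # phones per second
        "phonation_time_ratio": phonation / total,    # fraction of time speaking
        "unfilled_pause_rate": len(pauses) / total,   # long silences per second
    }


# Hypothetical recognizer output for a two-second utterance.
example_segments = [
    ("h", 0.0, 0.1), ("e", 0.1, 0.3), ("sil", 0.3, 0.8),
    ("l", 0.8, 1.0), ("o", 1.0, 1.5), ("sil", 1.5, 1.6),
    ("w", 1.6, 2.0),
]
features = fluency_features(example_segments)
```

A feature vector built this way needs no transcript, only phone boundaries, which is the property the abstract emphasizes; a regressor trained on rater scores would then map such vectors to fluency scores.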

An Analysis of Tonal Characteristics in Pre-school Children's Word Utterance (학령전기 아동 발화 단어의 선율 특성 분석)

  • Yi, Soo Yon;Chong, Hyun Ju
    • Phonetics and Speech Sciences / v.7 no.2 / pp.85-94 / 2015
  • This study investigates the characteristics of tonal elements in the word utterances of 30 pre-school children. For the analysis, 240 utterances of 4-syllable words were processed to extract acoustic values, and the data were then transformed into tonal height in order to examine the contour. The results show that the mean pitch of a note is $C4{\frac{1}{2}}$ (271.17 Hz) and that the high- and low-pitched notes are $C5{\frac{1}{2}}$ (452.57 Hz) and $G{\sharp}3{\frac{1}{2}}$ (192.54 Hz), respectively. The pitch patterns of the 4 syllables measured at the frication and aspiration portions are $E4{\frac{1}{2}}-F4-B3{\frac{1}{2}}-A3$ and F4-E4-B3-A3. The pitch patterns of consonant clusters are $B3{\frac{1}{2}}-D4-B3{\frac{1}{2}}-A3{\frac{1}{2}}$ and $A{\sharp}3{\frac{1}{2}}-C4-A3-D4{\frac{1}{2}}$. The analyses of tonal elements in this study provide evidentiary data on tonal height that are helpful for developing melodic contour.
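Transforming a measured frequency into tonal height can be sketched as follows. This is only an illustration under stated assumptions: 12-tone equal temperament with A4 = 440 Hz, reporting the nearest note plus a deviation in cents. It does not reproduce the half-step notation used in this abstract, whose definition is not given here, and `hz_to_note` is a hypothetical helper.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_note(freq_hz, a4_hz=440.0):
    """Map a frequency to the nearest equal-tempered note.

    Returns (note_name, cents), where cents is the signed deviation of
    freq_hz from that note (positive means sharper than the note).
    Assumes 12-tone equal temperament tuned to a4_hz.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Fractional MIDI note number; 69 corresponds to A4.
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)
    nearest = round(midi)
    cents = 100 * (midi - nearest)
    octave = nearest // 12 - 1  # MIDI 60 is C4
    return NOTE_NAMES[nearest % 12] + str(octave), cents


# Usage: convert a sequence of syllable pitches into a note contour.
contour = [hz_to_note(f)[0] for f in (440.0, 880.0, 261.63)]
```

Applying the function to the pitch of each syllable yields a note sequence that can be read as a melodic contour.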

Early Vocalization and Phonological Developments of Typically Developing Children: A longitudinal study (일반 영유아의 초기 발성과 음운 발달에 관한 종단 연구)

  • Ha, Seunghee;Park, Bora
    • Phonetics and Speech Sciences / v.7 no.2 / pp.63-73 / 2015
  • This longitudinal study investigated the early vocalization and phonological development of typically developing children. Ten typically developing children participated in the study from 9 months to 18 months of age. Spontaneous utterance samples were collected at 9, 12, 15, and 18 months of age and were phonetically transcribed and analyzed. Utterance samples were classified into 5 levels using the Stark Assessment of Early Vocal Development-Revised (SAEVD-R). The data analysis focused on the level 4 and 5 vocalizations classified by SAEVD-R and on word productions. The percentage of each vocalization level, vocalization length, syllable structures, and consonant inventory were obtained. The results showed that the percentages of level 4 and 5 vocalizations and of words increased significantly with age, and that the production of syllable structures containing consonants increased significantly around 12 and 15 months of age. On average, the children produced 4 types of syllable structure and 5.4 consonants at 9 months, and 5 types of syllable structure and 9.8 consonants at 18 months. The phonological development patterns in this study were consistent with those analyzed from children's meaningful utterances in previous studies. The results support the perspective that there is continuity between babbling and early speech. This study has clinical implications for early identification and speech-language intervention for young children with, or at risk of, speech delays.

Speech Rhythm and the Three Aspects of Speech Timing: Articulatory, Acoustic and Auditory

  • Yun, Il-Sung
    • Speech Sciences / v.8 no.1 / pp.67-76 / 2001
  • This study introduces the three aspects of speech timing (articulatory, acoustic and auditory) and discusses their strengths and weaknesses in describing speech timing. Traditional (extrinsic) articulatory timing theories exclude timing representation from the speaker's articulatory plan for an utterance, while the (intrinsic) articulatory timing theories headed by Fowler incorporate time into the plan for an utterance. Compared with articulatory timing studies, which face crucial constraints in data collection, acoustic timing studies can deal with even several hours of speech relatively easily. This enables us to perform suprasegmental timing studies as well as segmental timing studies. On the other hand, the perception of speech timing is related to psychology rather than to physiology and physics. Therefore, auditory timing studies contribute to enhancing our understanding of speech timing from the psychological point of view. Traditionally, some theories of speech timing (e.g. the typology of speech rhythm: stress-timing, syllable-timing or mora-timing) have been based on our perception. However, it is problematic that auditory timing can be subjective despite some validity. Many questions about speech timing are expected to be answered more objectively. Acoustic and articulatory descriptions of timing will be the means of solving such problems of auditory timing.
