• Title/Summary/Keyword: Speech articulation

A corpus-based study on the effects of voicing and gender on American English Fricatives (성대진동 및 성별이 미국영어 마찰음에 미치는 효과에 관한 코퍼스 기반 연구)

  • Yoon, Tae-Jin
    • Phonetics and Speech Sciences / v.10 no.2 / pp.7-14 / 2018
  • The paper investigates the acoustic characteristics of English fricatives in the TIMIT corpus, with a special focus on the role of voicing in rendering fricatives in American English. The TIMIT database includes 630 talkers and 2,342 different sentences, and comprises more than five hours of speech. Acoustic analyses are conducted in the domain of spectral and temporal properties by treating gender, voicing, and place of articulation as independent factors. The results of the acoustic analyses revealed that acoustic signals interact in a complex way to signal the gender, place, and voicing of fricatives. Classification experiments using a multiclass support vector machine (SVM) revealed that 78.7% of fricatives are correctly classified. The majority of errors stem from the misclassification of /θ/ as [f] and /ʒ/ as [z]. The average accuracy of gender classification is 78.7%. Most errors result from the classification of female speakers as male speakers. The paper contributes to the understanding of the effects of voicing and gender on fricatives in a large-scale speech corpus.

Computer-Based Fluency Evaluation of English Speaking Tests for Koreans (한국인을 위한 영어 말하기 시험의 컴퓨터 기반 유창성 평가)

  • Jang, Byeong-Yong;Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.6 no.2 / pp.9-20 / 2014
  • In this paper, we propose an automatic fluency evaluation algorithm for English speaking tests. In the proposed algorithm, acoustic features are extracted from an input spoken utterance, and a fluency score is then computed using support vector regression (SVR). We estimate the parameters of the feature models and the SVR from speech signals and the corresponding scores assigned by human raters. Correlation analysis shows that speech rate, articulation rate, and mean length of runs are the best features for fluency evaluation. Experimental results show that the correlation between the human score and the SVR score is 0.87 for 3 speaking tests, which suggests that the proposed algorithm could serve as a secondary fluency evaluation tool.
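The three features singled out in this abstract can be sketched from pause-segmented timing data. The abstract does not define them, so the definitions below are the commonly used ones and are an assumption: speech rate counts syllables over total time including pauses, articulation rate counts syllables over speaking time only, and mean length of runs is the average number of syllables between silent pauses. The `Run` input format and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Run:
    """A stretch of speech between silent pauses (hypothetical input format)."""
    syllables: int    # syllables uttered in the run
    duration: float   # seconds of speech in the run


def fluency_features(runs: List[Run], pause_durations: List[float]) -> Dict[str, float]:
    """Compute three temporal fluency features under the assumed definitions."""
    total_syllables = sum(r.syllables for r in runs)
    speaking_time = sum(r.duration for r in runs)
    total_time = speaking_time + sum(pause_durations)
    return {
        # syllables per second, pauses included
        "speech_rate": total_syllables / total_time,
        # syllables per second, pauses excluded
        "articulation_rate": total_syllables / speaking_time,
        # average syllables per pause-free run
        "mean_length_of_runs": total_syllables / len(runs),
    }


# Invented example: three runs separated by two half-second pauses.
runs = [Run(6, 1.5), Run(4, 1.0), Run(10, 2.5)]
pauses = [0.5, 0.5]
features = fluency_features(runs, pauses)
```

In a pipeline like the one described, values of this kind would be the inputs to the regression model; the SVR step itself is not reproduced here.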

Use of Acoustic Analysis for Individualised Therapeutic Planning and Assessment of Treatment Effect in the Dysarthric Children (조음장애 환아에서 개별화된 치료계획 수립과 효과 판정을 위한 음향음성학적 분석방법의 활용)

  • Kim, Yun-Hee;Yu, Hee;Shin, Seung-Hun;Kim, Hyun-Gi
    • Speech Sciences / v.7 no.2 / pp.19-35 / 2000
  • Speech evaluation and treatment planning for patients with articulation disorders have traditionally been based on perceptual judgement by speech pathologists. Recently, various computerized speech analysis systems have been developed and are commonly used in clinical settings to obtain objective, quantitative data and to guide specific treatment strategies. Ten dysarthric children (6 with neurogenic and 4 with functional dysarthria) participated in this experiment. Speech was evaluated in two ways: first, by acoustic analysis with Visi-Pitch and a Computerized Speech Lab, and second, by perceptual scoring of phonetic error rates in a 100-word test. The results of the initial evaluation served as primary guidelines for individualized treatment planning for each patient's speech problems. After a mean treatment period of 5 months, the follow-up data of both dysarthric groups showed increased maximum phonation time, increased alternating motion rate, and decreased occurrence of articulatory deviation. The changes in acoustic data and the therapeutic effects were more prominent in children with dysarthria due to neurologic causes than in those with functional dysarthria. Three cases, including their pre- and post-treatment data, are illustrated in detail.

Efficacy of intensive treatment of dysarthria for people with multiple system atrophy (다계통위축증 환자를 대상으로 한 마비말장애 집중 치료의 효과)

  • Park, Youngmi
    • Phonetics and Speech Sciences / v.10 no.4 / pp.163-171 / 2018
  • A mixed dysarthria with combinations of hypokinetic, ataxic, and spastic components is a common clinical feature of multiple system atrophy (MSA). Due to the rapid progress of dysarthria after diagnosis, people with MSA experience difficulty with verbal communication, which eventually affects their quality of life negatively. In this study, SPEAK OUT!®, an intensive 1:1 treatment of dysarthria for improving functional communicative ability, was provided to twelve people with MSA. To evaluate the efficacy of SPEAK OUT!® in people with MSA, aerodynamic, acoustic, and perceptual analyses were conducted. Pre- and post-therapy data included maximum phonation time, vocal intensity, and fundamental frequency during /a/ sustained phonation and passage reading; frequency range between high /a/ and low /a/ phonation; jitter, shimmer, and HNR for vocal quality; speech rate during passage reading; and perceptual evaluation scores for articulation precision and intonation. The participants achieved statistically significant improvement in vocal intensity, pitch range, vocal quality, speech rate, and speech intelligibility. In conclusion, SPEAK OUT!® is a feasible treatment for people with MSA to efficaciously improve their speech ability.
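Jitter and shimmer, two of the vocal quality measures listed above, are cycle-to-cycle perturbation measures of period and amplitude. The abstract does not say which variants were used, so the sketch below assumes the common "local" definition: the mean absolute difference between consecutive cycles divided by the mean value. The period and amplitude values are invented for illustration.

```python
from typing import Sequence


def local_perturbation(values: Sequence[float]) -> float:
    """Mean absolute difference between consecutive values, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two glottal cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Invented per-cycle measurements from a sustained /a/.
periods_ms = [10.0, 10.2, 9.8, 10.0]   # glottal cycle durations
amplitudes = [1.0, 0.9, 1.1, 1.0]      # peak amplitude of each cycle

jitter = local_perturbation(periods_ms)    # local jitter, as a fraction
shimmer = local_perturbation(amplitudes)   # local shimmer, as a fraction
```

Multiplying either result by 100 gives the percentage form in which these measures are usually reported.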

Perception of Tamil Mono-Syllabic and Bi-Syllabic Words in Multi-Talker Speech Babble by Young Adults with Normal Hearing

  • Gnanasekar, Sasirekha;Vaidyanath, Ramya
    • Journal of Audiology & Otology / v.23 no.4 / pp.181-186 / 2019
  • Background and Objectives: This study compared the perception of mono-syllabic and bi-syllabic words in Tamil by young normal hearing adults in the presence of multi-talker speech babble at two signal-to-noise ratios (SNRs). For this comparison, a speech perception in noise test was constructed using existing mono-syllabic and bi-syllabic word lists in Tamil. Subjects and Methods: A total of 30 participants with normal hearing in the age range of 18 to 25 years participated in the study. The Speech-in-noise test in Tamil (SPIN-T), constructed using mono-syllabic and bi-syllabic words in Tamil, was used as stimuli. The stimuli were presented in the background of multi-talker speech babble at two SNRs (0 dB and +10 dB SNR). Results: The effect of noise on SPIN-T varied with SNR. All the participants performed better at +10 dB SNR, the higher of the two SNRs considered. Additionally, at +10 dB SNR performance did not differ significantly between mono-syllabic and bi-syllabic words. However, a significant difference existed at 0 dB SNR. Conclusions: The current study indicated that higher SNR leads to better performance. In addition, bi-syllabic words were identified with minimal errors compared to mono-syllabic words. Spectral cues were the most affected in the presence of noise, leading to more place-of-articulation errors for both mono-syllabic and bi-syllabic words.
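Presenting stimuli at a fixed SNR amounts to scaling the noise relative to the speech before adding the two. The abstract does not describe how the SPIN-T stimuli were mixed, so the following is only a minimal sketch of one common approach based on RMS levels. The function names are hypothetical, and the signals are synthetic stand-ins (a sine wave for speech, uniform random noise for babble).

```python
import math
import random
from typing import List, Sequence, Tuple


def rms(signal: Sequence[float]) -> float:
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def mix_at_snr(speech: Sequence[float], noise: Sequence[float],
               target_snr_db: float) -> Tuple[List[float], List[float]]:
    """Scale the noise so the speech-to-noise RMS ratio equals target_snr_db, then add."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    gain = rms(speech) / (rms(noise) * 10 ** (target_snr_db / 20))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


def measured_snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """SNR in dB computed from the RMS levels of the two signals."""
    return 20 * math.log10(rms(speech) / rms(noise))


# Synthetic stand-ins for a word recording and multi-talker babble.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(1600)]
babble = [rng.uniform(-1.0, 1.0) for _ in range(1600)]

mix_0, noise_0 = mix_at_snr(speech, babble, 0.0)      # 0 dB SNR condition
mix_10, noise_10 = mix_at_snr(speech, babble, 10.0)   # +10 dB SNR condition
```

At 0 dB the speech and noise have equal RMS level; at +10 dB the noise RMS is lower by a factor of about 3.16.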

The Percentage of Consonants Correct and the Ages of Consonantal Acquisition for 'Korean-Test of Articulation for Children(K-TAC)' (`아동용 조음검사`를 이용한 연령별 자음정확도와 우리말 자음의 습득연령)

  • Kim, Min-Jung;Pae, So-Yeong
    • Speech Sciences / v.12 no.2 / pp.139-149 / 2005
  • The purpose of this study was to propose a preliminary norm for the 'Korean-Test of Articulation for Children (K-TAC)'. The K-TAC was designed to test 19 Korean consonants in various phonetic contexts through 37 words. We collected data from 220 normally developing children aged 2;6 (years;months) to 6;5. We analyzed the mean percentage of consonants correct and the age of acquisition for the K-TAC. The results were as follows. First, the mean percentage was over 60% in late age 2, over 80% at the age of 3, and over 90% after the age of 4. There were significant differences among age groups. Second, based on the criterion of correct production by 75% of children, Korean children acquired stops and nasals except for SF velars, the glottal fricative, the SF liquid, and affricates by late age 2 or age 3. After that, they acquired SF velars at the age of 4 and the SI liquid at the age of 5. However, they had not acquired alveolar fricatives even by late age 6. Third, if distorted sounds were scored as correct, they acquired the SI liquid at 4 years of age and alveolar fricatives at 5 years of age.

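The two measures reported for the K-TAC can be illustrated with a short calculation. Percentage of consonants correct is the share of target consonants produced correctly, and the abstract's acquisition criterion is correct production by 75% of children. The sketch below assumes the age of acquisition is the youngest age group that meets that criterion, which is a simplification; all counts and proportions are invented for illustration and are not taken from the study.

```python
from typing import Dict, Optional


def percentage_consonants_correct(correct: int, total: int) -> float:
    """Share of target consonants produced correctly, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def age_of_acquisition(proportion_by_age: Dict[int, float],
                       criterion: float = 0.75) -> Optional[int]:
    """Youngest age group (in months) whose proportion of children producing
    the consonant correctly meets the criterion, or None if no group does."""
    for age in sorted(proportion_by_age):
        if proportion_by_age[age] >= criterion:
            return age
    return None


# Invented data: one child's score and one consonant's results by age group.
pcc = percentage_consonants_correct(correct=40, total=50)
liquid_by_age = {30: 0.40, 36: 0.60, 48: 0.80, 60: 0.95}
fricative_by_age = {30: 0.10, 36: 0.20, 48: 0.35, 60: 0.50}

liquid_age = age_of_acquisition(liquid_by_age)
fricative_age = age_of_acquisition(fricative_by_age)
```

A result of `None` corresponds to a consonant that no tested age group has acquired under the criterion.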

Korean and English affricates in bilingual children

  • Yu, Hye Jeong
    • Phonetics and Speech Sciences / v.9 no.3 / pp.1-6 / 2017
  • This study examined how early bilingual children produce sounds in their two languages articulated with the same manner of articulation but at different places of articulation. English affricates are palato-alveolar and Korean affricates are alveolar. This study analyzed the frequencies of center of gravity (COG), spectral peak (SP), and the second formant (F2) of word-initial affricates in English and Korean produced by twenty-four early Korean-English bilingual children (aged 4 to 7), and compared them with those of monolingual counterparts in the two languages. If early Korean-English bilingual children produce palato-alveolar affricates in English and alveolar affricates in Korean, they may produce Korean affricates with higher COGs, SPs, and F2s than English affricates. The early Korean-English bilingual children at the age of 4 produced English and Korean affricates with similar COGs, SPs, and F2s, and the COGs, SPs, and F2s of their Korean affricates were similar to those of the Korean monolingual counterparts. However, the early bilingual children at the age of 5 to 7 had lower COGs and SPs for English affricates with higher F2s compared to Korean affricates, and the COGs, SPs, and F2s of their English affricates were similar to those of the English monolingual counterparts.
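The center of gravity (COG) used in this study is the mean frequency of a spectrum, with each frequency weighted by its energy. The abstract does not give the analysis settings, so this is a minimal sketch that assumes power weighting and uses a naive discrete Fourier transform on an unwindowed frame. The function names are hypothetical, and the test signals are pure sinusoids rather than affricate recordings.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def power_spectrum(frame: Sequence[float], sample_rate: float) -> Tuple[List[float], List[float]]:
    """Naive DFT power spectrum for bins from 0 Hz up to the Nyquist frequency."""
    n = len(frame)
    freqs, powers = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        freqs.append(k * sample_rate / n)
        powers.append(abs(coeff) ** 2)
    return freqs, powers


def center_of_gravity(frame: Sequence[float], sample_rate: float) -> float:
    """Power-weighted mean frequency of the frame, in Hz."""
    freqs, powers = power_spectrum(frame, sample_rate)
    total = sum(powers)
    if total == 0:
        raise ValueError("silent frame has no center of gravity")
    return sum(f * p for f, p in zip(freqs, powers)) / total


def sine_frame(freq_hz: float, sample_rate: float = 8000.0, n: int = 64) -> List[float]:
    """Synthetic test frame: a sinusoid that falls exactly on a DFT bin."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


cog_low = center_of_gravity(sine_frame(1000.0), 8000.0)
cog_high = center_of_gravity(sine_frame(2000.0), 8000.0)
```

A frame with more high-frequency energy yields a higher COG, which is why the measure is used to compare frication produced at different places of articulation. Real measurements would use windowed frames from recordings and an FFT routine.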

The Production and Perception of the Korean Stops by English Learners (영어권 화자의 국어 폐쇄음 발화와 지각)

  • Kim, Kee-Ho;Park, Yoon-Jin;Chun, Yun-Sil
    • Speech Sciences / v.13 no.4 / pp.51-67 / 2006
  • This study examined the acoustic properties of initial stops in Korean produced by native Korean speakers and by English-speaking learners of Korean. The productions of native Korean speakers were compared with those of beginning and advanced learners of Korean. Fundamental frequency (F0) and Voice Onset Time (VOT) were measured in one- and two-syllable words containing word-initial lenis, fortis, and aspirated stops. The English-speaking learners produced stops with relatively shorter VOT and lower F0 than the native Korean speakers. With regard to manner of articulation, the learners had production difficulties in the order of lenis stops, aspirated stops, and fortis stops. With regard to place of articulation, they had production difficulties in the order of labial stops, velar stops, and alveolar stops. In the perception experiment, the learners had difficulty distinguishing lenis stops from aspirated stops. The results of the production experiment were therefore largely consistent with those of the perception experiment. Finally, comparing the two proficiency groups, the advanced learners produced and perceived Korean stops more easily than the beginners.

The Duration Feature of Acoustic Signals and Korean Speakers' Perception of English Stops (구간 신호 길이 자질과 한국인의 영어 파열음 지각)

  • Kim, Mun-Hyong;Jun, Jong-Sup
    • Phonetics and Speech Sciences / v.1 no.3 / pp.19-28 / 2009
  • This paper reports experimental findings about the duration feature of the acoustic components of English stops in Korean speakers' voicing perception. In our experiment, 35 participants discriminated between recorded stimuli and digitally transformed stimuli with different duration features from the original stimuli. Seventy-two sets of paired stimuli were generated to test the effects of the duration feature in various phonetic contexts. The result of our experiment is a complicated cross-tabulation with 540 cells defined by five categorical independent variables plus one response variable. To find a meaningful generalization out of this complex frequency table, we ran logit log-linear regression analyses. Surprisingly, we found that there is no single effect of the duration feature in all phonetic contexts on Korean speakers' perception of the voicing contrasts of English stops. Instead, the logit log-linear analyses reveal that there are interaction effects among phonetic contexts (=C), the places of articulation of stops (=P), and the voicing contrast (=V), and among duration (=T), phonetic contexts, and the places of articulation. To put it in mathematical terms, the distribution of the data can be explained by a simple log-linear equation, $\log F = \mu + \lambda_{CPV} + \lambda_{TCP}$.
