• Title/Summary/Keyword: speech duration

Search Result 469

Analysis of the Timing of Spoken Korean Using a Classification and Regression Tree (CART) Model

  • Chung, Hyun-Song;Huckvale, Mark
    • Speech Sciences / v.8 no.1 / pp.77-91 / 2001
  • This paper investigates the timing of Korean spoken in a news-reading style in order to improve the naturalness of durations used in Korean speech synthesis. Each segment in a corpus of 671 read sentences was annotated with 69 segmental and prosodic features so that the measured duration could be correlated with the context in which it occurred. A CART model based on the features showed a correlation coefficient of 0.79 with an RMSE (root mean squared prediction error) of 23 ms between actual and predicted durations in reserved test data. These results are comparable with recently published results for Korean and similar to results found in other languages. An analysis of the classification tree shows that phrasal structure has the greatest effect on segment duration, followed by syllable structure and the manner features of surrounding segments. The place features of surrounding segments have only small effects. The model has applications in Korean speech synthesis systems.
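The two evaluation figures quoted in this abstract, the correlation coefficient and the RMSE between actual and predicted durations, can be computed as in the sketch below. This is only an illustration of the metrics: the duration values are made up, the function names `rmse` and `pearson_r` are hypothetical, and nothing here reproduces the authors' CART model or data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Illustrative (made-up) segment durations in milliseconds.
actual_ms = [60.0, 85.0, 120.0, 45.0, 150.0]
predicted_ms = [70.0, 80.0, 100.0, 55.0, 140.0]

print(f"RMSE: {rmse(actual_ms, predicted_ms):.2f} ms")
print(f"r:    {pearson_r(actual_ms, predicted_ms):.3f}")
```

Reporting both values is useful because they answer different questions: the correlation shows how well the predictions track the ordering and spread of the measured durations, while the RMSE gives the typical size of the error in milliseconds.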


Lexical Status and the Degree of /l/-darkening

  • Ahn, Miyeon
    • Phonetics and Speech Sciences / v.7 no.3 / pp.73-78 / 2015
  • This study explores the degree of velarization of English word-final /l/ (i.e., /l/-darkness) according to lexical status. Lexical status is defined as whether a speech stimulus is considered a word or a non-word. We examined the temporal and spectral properties of word-final /l/ in terms of its duration and the F2-F1 frequency difference, varying the vowel immediately preceding the liquid. The results showed that both temporal and spectral properties were contrastive across all vowel contexts, with real words having shorter [l] duration and lower F2-F1 values than non-words. That is, /l/ is more heavily velarized in words than in non-words, which suggests that lexical status, i.e., whether language users encode the speech signal as a word or not, is deeply involved in their speech production.

Changes of Speech Discrimination Score Depending on Inter-syllable Pause Duration in Normal Hearing Children (정상 청력 아동의 음절 간 쉼 간격에 따른 어음이해도 변화)

  • Park, J.I.;Lee, J.Y.;Heo, S.D.
    • Journal of rehabilitation welfare engineering & assistive technology / v.8 no.2 / pp.139-144 / 2014
  • Speech discrimination is affected by the rate of speech. Speech rate can be adjusted through pause duration, and pauses provide resting time that helps avoid information overload. This study serves as basic research to establish normative data for examining the effects of aging, audiological rehabilitation, and auditory processing. Seven boys and eight girls participated; none had speech-language or audiological problems. Four test sets were used, each consisting of 20 three-syllable words. The pause duration of every word was set to normal (250 ms), slow (500 ms), or very slow (1,000 ms). Each item was a four-way multiple choice containing one correctly written word and three words written with one phoneme wrong. Participants heard the word and then chose one option. Speech discrimination scores at pause durations of 250, 500, and 1,000 ms were $73 \pm 19.4\%$, $84 \pm 12.2\%$, and $88 \pm 8.8\%$, respectively.


The Contribution of Prosody to the Foreign Accent of Chinese Talkers' English Speech

  • Liu, Xing;Lee, Joo-Kyeong
    • Phonetics and Speech Sciences / v.4 no.3 / pp.59-73 / 2012
  • This study investigates the contribution of prosody to the foreign accent in Chinese speakers' English production by examining synthesized speech in which native and non-native talkers' prosody and segments were crossed. For the stimuli of the foreign accent ratings, we transplanted gender-matched native speakers' prosody onto non-native talkers' segments and vice versa, utilizing the TD-PSOLA algorithm. Eight native English listeners judged the foreign accent and comprehensibility of the transplanted stimuli. Results showed that the synthesized stimuli were perceived as having a stronger foreign accent, regardless of speakers' proficiency, when English speakers' prosody was crossed with Chinese speakers' segments. This suggests that segments contribute more than prosody to native listeners' evaluation of foreign accent. When transplanted onto English speakers' segments, Chinese speakers' prosody showed a difference in duration rather than pitch between high and low proficiency, such that a stronger foreign accent was detected when low-proficiency Chinese speakers' duration was crossed with English speakers' segments. This indicates that prosody, more specifically duration, plays a role, although the prosodic role is not as significant overall as that of segments. According to the post hoc acoustic analysis, the temporal features that made the duration parameter more prominent than pitch were speaking rate, pause duration, and pause frequency. Finally, foreign accent and comprehensibility showed no significant correlation, such that native listeners had no difficulty listening to highly foreign-accented speech.

Acoustic analysis of English lexical stress produced by Korean, Japanese and Taiwanese-Chinese speakers

  • Jung, Ye-Jee;Rhee, Seok-Chae
    • Phonetics and Speech Sciences / v.10 no.1 / pp.15-22 / 2018
  • Stressed vowels in English are usually produced using longer duration, higher pitch, and greater intensity than unstressed vowels. However, many English as a foreign language (EFL) learners have difficulty producing English lexical stress because their mother tongues do not have such features. In order to investigate if certain non-native English speakers (Korean, Japanese, and Taiwanese-Chinese native speakers) are able to produce English lexical stress in a native-like manner, speech samples were extracted from the L2 learners' corpus known as AESOP (the Asian English Speech cOrpus Project). Sixteen disyllabic words were analyzed in terms of the ratio of duration, pitch, and intensity. The results demonstrate that non-native English speakers are able to produce English stress in a similar way to native English speakers, and all speakers (both native and non-native) show a tendency to use duration as the strongest cue in producing stress. The results also show that the duration ratio of native English speakers was significantly higher than that of non-native speakers, indicating that native speakers produce a bigger difference in duration between stressed and unstressed vowels.

Influences of Inter-syllable Pause Duration on Speech Discrimination Score in Children with Cochlear Implantation (음절 간 쉼 간격이 인공와우 아동의 어음이해도에 미치는 영향)

  • Park, J.I.;Heo, S.D.
    • Journal of rehabilitation welfare engineering & assistive technology / v.8 no.4 / pp.245-250 / 2014
  • The aim of this study was to investigate how the speech discrimination score (SDS) of children with cochlear implants (CI) depends on inter-syllable pause duration. Twelve child CI users participated. The SDS test used self-made meaningless three-syllable words. The inter-syllable pause duration was set to 250, 500, or 1,000 milliseconds (ms). Closed-set speech discrimination scores were obtained at the most comfortable loudness (MCL) level. SDS in the CI group rose across the three pause durations (62.08, 63.75, and 69.58%), but the change was not significant (p = .4635). SDS in children with CI thus improved numerically with longer inter-syllable pause duration.


Design and Implementation of a Text-to Speech System using the Prosody and Duration Information (운율 및 길이 정보를 이용한 무제한 음성 합성기의 설계 및 구현)

  • Yang, Jin-Seok;Kim, Jae-Beom;Lee, Jeong-Hyeon
    • The Transactions of the Korea Information Processing Society / v.3 no.5 / pp.1121-1129 / 1996
  • To produce more natural speech in a Text-to-Speech system, prosody and duration must be processed in advance; the prosody and duration information was extracted by means of trial-and-error experiments. In this paper, a method is proposed to improve naturalness in a Text-to-Speech system using this information. As a result, the Text-to-Speech system proposed and implemented in this paper produced more natural synthesized speech than systems that do not use this information.


A phoneme duration modeling in a speech recognition system based on decision tree state tying (결정트리기반 음성인식 시스템에서의 음소지속시간 사용방법)

  • Koo Myoung-Wan;Kim Ho-Kyoung
    • Proceedings of the KSPS conference / 2002.11a / pp.197-200 / 2002
  • In this paper, we propose a phoneme duration modeling method for a speech recognition system based on decision tree state tying. We assume that phone duration has a Gamma distribution. In training mode, we model the mean and variance of each state duration in a context-independent phone model based on decision tree state tying. In recognition mode, we obtain the mean and variance of each context-dependent phone duration from the state duration information obtained during training. We make a comparative study of the proposed method and conventional methods. Our method performs well compared with conventional methods.
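The abstract states only that phone duration is assumed to follow a Gamma distribution whose mean and variance are modeled. One common way to turn a mean and variance into Gamma parameters is moment matching; the sketch below assumes that approach, which the abstract does not confirm. The function names and the numeric values are hypothetical.

```python
import math


def gamma_params_from_moments(mean, variance):
    """Moment-matched Gamma parameters: mean = shape * scale,
    variance = shape * scale**2."""
    shape = mean * mean / variance
    scale = variance / mean
    return shape, scale


def gamma_log_pdf(duration, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at `duration`."""
    if duration <= 0:
        return float("-inf")
    return ((shape - 1.0) * math.log(duration)
            - duration / scale
            - math.lgamma(shape)
            - shape * math.log(scale))


# Illustrative (made-up) statistics for one phone: mean 80 ms, variance 400 ms^2.
shape, scale = gamma_params_from_moments(mean=80.0, variance=400.0)

# A duration near the mean scores higher than an implausibly long one.
typical = gamma_log_pdf(80.0, shape, scale)
unlikely = gamma_log_pdf(300.0, shape, scale)
print(shape, scale, typical > unlikely)
```

In a recognizer, a log density of this kind could be added to the acoustic score as a duration penalty, so that hypotheses with implausible phone lengths are ranked lower. How the authors actually combine the scores is not stated in the abstract.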


The Usage of Phoneme Duration Information for Rejecting Garbage Sentences (소음문장 제거를 위한 음소지속시간 사용)

  • Koo Myoung-Wan;Kim Ho-Kyoung;Park Sung-Joon;Kim Jae-In
    • Proceedings of the KSPS conference / 2003.05a / pp.219-222 / 2003
  • In this paper, we study the use of phoneme duration information for rejecting garbage sentences. First, we build a phoneme duration model in a speech recognition system based on decision tree state tying. We assume that phone duration has a Gamma distribution. Next, we build a verification module in which a word-level confidence measure is used. Finally, we make a comparative study of phoneme duration with a speech DB obtained from the live system. This DB consists of OOT (out-of-task) and ING (in-grammar) utterances. Using phone duration information improves the OOT recognition rate by 46%, and the error rate is reduced by a further 8.4% when it is combined with the utterance verification module.


On Tensity of Korean Stops (Electropalatographic Study)

  • Baik, Woon-Il
    • Speech Sciences / v.2 / pp.149-158 / 1997
  • An electropalatographic (EPG) study was made to investigate the articulatory distinction among the three series of Korean stops according to tensity, and the articulatory mechanism linking tensity and coarticulatory effects. The results indicated that the tensity of Korean stops is closely related to contact width and duration of complete closure, and that coarticulatory vocalic effects vary inversely with the degree of contact width and the duration of complete closure.
