• Title/Summary/Keyword: Phonetics

Short utterance speaker verification using PLDA model adaptation and data augmentation (PLDA 모델 적응과 데이터 증강을 이용한 짧은 발화 화자검증)

  • Yoon, Sung-Wook;Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.9 no.2 / pp.85-94 / 2017
  • Conventional speaker verification systems that combine a time delay neural network, identity vectors, and probabilistic linear discriminant analysis (TDNN-Ivector-PLDA) are known to be very effective for verifying long-duration speech utterances. However, when test utterances are short, the duration mismatch between enrollment and test utterances significantly degrades the performance of TDNN-Ivector-PLDA systems. To compensate for the I-vector mismatch between long and short utterances, this paper proposes PLDA model adaptation with augmented data. A PLDA model is first trained on a vast amount of speech data, most of which has long duration. The PLDA model is then adapted with I-vectors obtained from short-utterance data that are augmented using vocal tract length perturbation (VTLP). In computer experiments on the NIST SRE 2008 database, the proposed method achieves significantly better performance than conventional TDNN-Ivector-PLDA systems when there is a duration mismatch between enrollment and test utterances.
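
    As a rough illustration of the VTLP augmentation mentioned in this abstract, the sketch below implements the piecewise-linear frequency warping commonly used for VTLP. The abstract does not give the authors' settings, so the function name `vtlp_warp`, the 16 kHz sample rate, the 4800 Hz breakpoint `f_hi`, and the warp-factor range 0.9 to 1.1 are all assumptions made for this example.

```python
import numpy as np

def vtlp_warp(freqs, alpha, sample_rate=16000, f_hi=4800.0):
    """Piecewise-linear VTLP warp of frequencies in Hz (assumed settings).

    Frequencies below a breakpoint are scaled by alpha; frequencies above it
    are mapped linearly so that the Nyquist frequency stays fixed.
    """
    freqs = np.asarray(freqs, dtype=float)
    nyquist = sample_rate / 2.0
    scaled_hi = f_hi * min(alpha, 1.0)   # warped value at the breakpoint
    boundary = scaled_hi / alpha         # breakpoint on the input axis
    upper = nyquist - (nyquist - scaled_hi) / (nyquist - boundary) * (nyquist - freqs)
    return np.where(freqs <= boundary, freqs * alpha, upper)

# Hypothetical usage: draw one warp factor per utterance and warp the
# center frequencies of a filterbank before feature extraction.
rng = np.random.default_rng(0)
alpha = rng.uniform(0.9, 1.1)
centers = np.linspace(0.0, 8000.0, 9)
warped = vtlp_warp(centers, alpha)
```

    Each warped copy of a short utterance would then yield an additional I-vector for PLDA adaptation; that part of the pipeline is not shown here.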

Examination of aspiration in Korean fricatives and affricates

  • Lee, Goun
    • Phonetics and Speech Sciences / v.9 no.2 / pp.31-38 / 2017
  • This study examines the acoustic characteristics of Korean sibilants, especially aspiration in Korean fricatives (plain: /s/, fortis: /s'/) and affricates (aspirated: /tsʰ/, lenis: /ts/, and fortis: /ts'/). Duration values (closure duration, frication duration, aspiration duration), center of gravity (COG) (of the total duration, of the two portions, and in 10 ms windows), and H1-H2 values (at the vowel onset) were examined in order to investigate the phonetic feature of aspiration in frication noise. The study further discusses how to define criteria for identifying aspiration in sibilant sounds by adopting three visual criteria for assessing aspiration. These visually designated aspiration onset points are then matched with the COG decline points in 10 ms windows. The results show that all the non-fortis sounds (/s/, /tsʰ/, /ts/) contain aspiration, resulting in similar COG and H1-H2 values.
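
    The COG measure in 10 ms windows described in this abstract can be sketched as a power-weighted mean frequency per window. This is a minimal illustration, not the study's procedure: the Hann window, the non-overlapping framing, the power weighting, and the names `center_of_gravity` and `cog_track` are assumptions, and the input is a synthetic signal rather than speech.

```python
import numpy as np

def center_of_gravity(frame, sample_rate):
    """Power-weighted mean frequency (Hz) of one Hann-windowed frame."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = power.sum()
    return float((freqs * power).sum() / total) if total > 0 else 0.0

def cog_track(signal, sample_rate, window_s=0.010):
    """COG of consecutive non-overlapping windows (10 ms by default)."""
    n = int(round(window_s * sample_rate))
    return [center_of_gravity(signal[i:i + n], sample_rate)
            for i in range(0, len(signal) - n + 1, n)]

# Synthetic stand-in for a sibilant: 50 ms of high-frequency energy
# followed by 50 ms of lower-frequency energy.
sr = 16000
t = np.arange(800) / sr
signal = np.concatenate([np.sin(2 * np.pi * 4000 * t),
                         np.sin(2 * np.pi * 1000 * t)])
track = cog_track(signal, sr)
# The index of the largest drop is a simple proxy for a COG decline point.
decline_index = int(np.argmin(np.diff(track))) + 1
```

    On real frication noise the track is far less clean, so a decline point would normally be judged against a threshold or by visual inspection, as the abstract describes.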

DNN-based acoustic modeling for speech recognition of native and foreign speakers (원어민 및 외국인 화자의 음성인식을 위한 심층 신경망 기반 음향모델링)

  • Kang, Byung Ok;Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.9 no.2 / pp.95-101 / 2017
  • This paper proposes a new method for training deep neural network (DNN)-based acoustic models for speech recognition of native and foreign speakers. The proposed method consists of determining multi-set state clusters with various acoustic properties, training a DNN-based acoustic model, and recognizing speech based on the model. In the proposed method, the hidden nodes of the DNN are shared, but the output nodes are separated to accommodate the different acoustic properties of native and foreign speech. In an English speech recognition task for speakers of Korean and of English, the proposed method is shown to slightly improve recognition accuracy compared to the conventional multi-condition training method.
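
    The idea of sharing hidden nodes while separating output nodes can be sketched as a forward pass with one shared hidden layer and one output layer per state set. Everything concrete here is an assumption for illustration: the class name `SharedHiddenDNN`, the single ReLU hidden layer, the layer sizes, the random untrained weights, and the use of an independent softmax per set. The abstract does not specify the actual architecture or training procedure.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class SharedHiddenDNN:
    """Toy model: one shared hidden layer, one output layer per state set."""

    def __init__(self, n_in, n_hidden, n_states_per_set, seed=0):
        rng = np.random.default_rng(seed)
        self.W_h = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b_h = np.zeros(n_hidden)
        # Separate output weights for each set of tied states.
        self.heads = {
            name: (rng.normal(0.0, 0.1, (n_hidden, n_out)), np.zeros(n_out))
            for name, n_out in n_states_per_set.items()
        }

    def forward(self, features, state_set):
        hidden = np.maximum(0.0, features @ self.W_h + self.b_h)  # shared
        W_o, b_o = self.heads[state_set]                          # set-specific
        return softmax(hidden @ W_o + b_o)

# Hypothetical sizes: 40-dimensional features, 64 hidden units.
model = SharedHiddenDNN(40, 64, {"native": 120, "foreign": 90})
frames = np.random.default_rng(1).normal(size=(4, 40))
post_native = model.forward(frames, "native")
post_foreign = model.forward(frames, "foreign")
```

    Both calls reuse the same hidden activations, so training data from either speaker group would update the shared layer while only its own output layer changes.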

Selective pole filtering based feature normalization for performance improvement of short utterance recognition in noisy environments (잡음 환경에서 짧은 발화 인식 성능 향상을 위한 선택적 극점 필터링 기반의 특징 정규화)

  • Choi, Bo Kyeong;Ban, Sung Min;Kim, Hyung Soon
    • Phonetics and Speech Sciences / v.9 no.2 / pp.103-110 / 2017
  • The pole filtering concept has been successfully applied to cepstral feature normalization techniques for noise-robust speech recognition. This paper proposes applying pole filtering selectively, only to the speech intervals, in order to further improve recognition performance for short utterances in noisy environments. Experimental results on the AURORA 2 task with clean-condition training show that the proposed selectively pole-filtered cepstral mean normalization (SPFCMN) and selectively pole-filtered cepstral mean and variance normalization (SPFCMVN) yield error rate reductions of 38.6% and 45.8%, respectively, compared to the baseline system.
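
    For orientation, the sketch below shows plain per-utterance cepstral mean normalization (CMN) and cepstral mean and variance normalization (CMVN), the normalizations that the proposed SPFCMN and SPFCMVN build on. The pole filtering step and its selective application to speech intervals are deliberately not implemented, since the abstract gives no detail on them. The function names and the random stand-in features are assumptions.

```python
import numpy as np

def cmn(cepstra):
    """Subtract the per-utterance mean of each cepstral coefficient."""
    return cepstra - cepstra.mean(axis=0)

def cmvn(cepstra, eps=1e-8):
    """Normalize each coefficient to zero mean and unit variance."""
    centered = cepstra - cepstra.mean(axis=0)
    return centered / (cepstra.std(axis=0) + eps)

# Stand-in features: 50 frames of 13 cepstral coefficients, plus a copy
# with an artificial per-utterance scale and offset applied.
rng = np.random.default_rng(0)
clean = rng.normal(size=(50, 13))
shifted = 2.0 * clean + 5.0
normalized = cmvn(shifted)
```

    With only a few frames, as in short utterances, the mean and variance estimates become unreliable, which is the problem that motivates refinements such as the one proposed in the paper.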

Individual differences in the reduction degree of the Korean suffix 'nɨn'

  • Kim, Jungsun
    • Phonetics and Speech Sciences / v.12 no.2 / pp.9-16 / 2020
  • The present study examines the degree of suffix reduction that occurs when the Korean suffix [-nɨn] is attached to a root in spontaneous Seoul Korean speech. Specifically, it focuses on the degrees of reduction produced by individual speakers. The degree of reduction was measured as the duration of the suffix [-nɨn] to clarify the continuum between the full and reduced forms. The results revealed that, first, the reduced forms of the suffix [-nɨn] were significantly distinguished from the full forms in the suffixation processes. Second, regarding parts of speech, the differences among individual speakers in the degree of reduction were clearer when the suffix [-nɨn] was attached to verbs than when it was attached to nouns or pronouns. Finally, the length of the root played a critical role in determining the degree of reduction of the suffix [-nɨn]. The degrees of reduction for individual speakers differed significantly when the suffix [-nɨn] was attached to two-syllable roots rather than to three- and four-syllable roots. In conclusion, individual differences in the degree of reduction were likely to occur when the roots were verbs and when the roots had two syllables.

Role of amplitude and pitch in the perception of Japanese stop length contrasts

  • Idemaru, Kaori
    • Cross-Cultural Studies / v.24 / pp.112-119 / 2011
  • This study presents experiments which examined the role of amplitude and fundamental frequency (f0) in the phonetic perception of short versus long stop length contrasts in Japanese (e.g., [t] vs. [tt]). Stop length contrasts are normally characterized by differences in the duration of stop closures. However, closure duration can be unreliable as a perceptual cue when one considers variability in the rate at which people speak. Acoustically, the amplitude and f0 of the vowel following stop consonants are known to covary with the length distinction of stops in Japanese. Given this fact, the current study examined amplitude and f0 as potential secondary cues to the distinction. The results indicate that even though both amplitude and f0 are robust correlates, Japanese listeners do not use these cues in categorizing short versus long stops.

The imitation patterns of adults and children on f0 intervals in North Kyungsang Korean

  • Kim, Jungsun
    • Phonetics and Speech Sciences / v.11 no.2 / pp.23-31 / 2019
  • The present study examines whether pitch range variation in North Kyungsang Korean shows a categorical or continuous function. Specifically, the study focuses on data imitated by adults and children in the North Kyungsang region. To investigate pitch range variation, the log-produced f0 intervals were measured and statistically analyzed. The results of the study are as follows. First, both the adults' and children's imitations were more categorical than continuous, especially for the HL-LH patterns. For the other pitch accent patterns, such as HH-HL and HH-LH, the curves were continuous or flat for most of the speakers. Second, the children's imitations were poorer than those of the adults. That is, the children's imitative responses appeared as continuous or flat curves more often than as categorical ones. For the children, the HL-LH pattern showed a categorical function at the midpoint of the curves, though the shifts were not as distinctive as in the adults' data. This implies that the imitative responses of children follow the perceptual and productive trace of adults' speech behavior.

Acoustic characteristics of the sustained vowel phonation according to age groups (모음 연장 발성이 보이는 연령대별 음향음성학적 특성 연구)

  • Seo, Yoon-Jeong;Shin, Jiyoung
    • Phonetics and Speech Sciences / v.10 no.4 / pp.67-76 / 2018
  • This study investigated the acoustic characteristics of sustained vowels produced by Seoul Korean speakers. For this study, 309 healthy adults were chosen as participants from the Korean Standard Speech Database. These subjects were divided into five chronological age groups (20s, 30s, 40s, 50s, 60-70s) and two gender groups (male and female). Fundamental frequency (f0), jitter, shimmer, and NHR (noise-to-harmonics ratio) were measured for eight Korean vowels (/ɑ/, /æ/, /ʌ/, /e/, /o/, /u/, /ɯ/, /i/) using Praat. The results showed that vowel type significantly affected all acoustic parameters. Gender significantly affected f0, jitter, and NHR. The mean f0 of female speakers was greater than that of male speakers, and the mean jitter and NHR of male speakers were greater than those of female speakers. Moreover, age significantly affected shimmer and NHR; in particular, the shimmer and NHR of elderly speakers were greater than those of young speakers.
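
    Two of the measures named in this abstract, jitter and shimmer, can be illustrated with their usual "local" definitions: the mean absolute difference between consecutive glottal periods (or peak amplitudes), divided by the mean period (or amplitude). The sketch below is a simplification that assumes the periods and amplitudes have already been extracted. It omits the period-validity checks that Praat applies, does not cover NHR, and uses invented numbers rather than data from the study.

```python
import numpy as np

def local_perturbation(values):
    """Mean absolute consecutive difference, relative to the mean value."""
    values = np.asarray(values, dtype=float)
    return float(np.mean(np.abs(np.diff(values))) / np.mean(values))

# Hypothetical measurements from a sustained vowel.
periods = [0.0050, 0.0051, 0.0049, 0.0050, 0.0052]   # seconds per cycle
amplitudes = [0.80, 0.82, 0.79, 0.81, 0.80]          # peak amplitude per cycle

jitter_local = local_perturbation(periods)       # fraction; multiply by 100 for %
shimmer_local = local_perturbation(amplitudes)   # fraction; multiply by 100 for %
mean_f0 = 1.0 / float(np.mean(periods))          # Hz
```

    Because both measures are ratios to the mean, they can be compared across speakers with very different f0 or loudness, which is what makes the gender and age comparisons in the abstract meaningful.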

A longitudinal study on the development of English phonological awareness in preschool children (어린이집 유아의 영어 음운 인식 발달 종단 연구)

  • Chung, Hyunsong
    • Phonetics and Speech Sciences / v.10 no.4 / pp.53-66 / 2018
  • This longitudinal study investigated the development of English phonological awareness in preschool children. It carried out a phonological matching task, mispronunciation task, articulation test, explicit phoneme awareness task, rhyme matching task, and initial-phoneme matching task for three-, four-, and five-year-old children. A letter knowledge test was also added to the tests for the five-year-old children. The results revealed that the development of phonological awareness follows a progression of syllable, then onset and rhyme, then phoneme. It was also revealed that language skills such as vocabulary, detection of mispronunciations, and articulation were partially related to the development of phoneme awareness. Finally, letter knowledge was found to partially affect the children's development of phonological awareness.

Effects of gender, age, and individual speakers on articulation rate in Seoul Korean spontaneous speech

  • Kim, Jungsun
    • Phonetics and Speech Sciences / v.10 no.4 / pp.19-29 / 2018
  • The present study investigated whether there are differences in articulation rate by gender, age, and individual speakers in a spontaneous speech corpus produced by 40 Seoul Korean speakers. This study measured their articulation rates using a second-per-syllable metric and a syllable-per-second metric. The findings are as follows. First, in spontaneous Seoul Korean speech, there was a gender difference in articulation rates only in age group 10-19, among whom men tended to speak faster than women. Second, individual speakers showed variability in their rates of articulation. The tendency for some speakers to speak faster than others was variable. Finally, there were metric differences in articulation rate. That is, regarding the coefficients of variation, the values of the second-per-syllable metric were much higher than those for the syllable-per-second metric. The articulation rate for the syllable-per-second metric tended to be more distinct among individual speakers. The present results imply that data gathered in a corpus of Seoul Korean spontaneous speech may reflect speaker-specific differences in articulatory movements.
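
    The two articulation-rate metrics and the coefficient of variation mentioned in this abstract can be sketched as follows. The syllable counts and durations are invented for illustration, the helper names are assumptions, and the use of the sample standard deviation (`ddof=1`) is a choice made here, not one stated in the abstract.

```python
import numpy as np

def articulation_rates(n_syllables, speech_seconds):
    """Return (syllables per second, seconds per syllable) per stretch.

    speech_seconds is assumed to exclude pauses, as is conventional for
    articulation rate.
    """
    n = np.asarray(n_syllables, dtype=float)
    s = np.asarray(speech_seconds, dtype=float)
    return n / s, s / n

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    values = np.asarray(values, dtype=float)
    return float(values.std(ddof=1) / values.mean())

# Hypothetical pause-free stretches from one speaker.
syllables = [12, 8, 15, 5, 10]
seconds = [2.0, 1.6, 2.2, 1.3, 1.8]

syl_per_sec, sec_per_syl = articulation_rates(syllables, seconds)
cv_syl_per_sec = coefficient_of_variation(syl_per_sec)
cv_sec_per_syl = coefficient_of_variation(sec_per_syl)
```

    The two metrics are reciprocals of each other for every stretch, but their coefficients of variation generally differ because taking a reciprocal is a nonlinear transformation, which is why the choice of metric can matter when comparing speakers.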