• Title/Summary/Keyword: Corpus phonetics

Frequency of grammar items for Korean substitution of /u/ for /o/ in the word-final position (어말 위치 /ㅗ/의 /ㅜ/ 대체 현상에 대한 문법 항목별 출현빈도 연구)

  • Yoon, Eunkyung
    • Phonetics and Speech Sciences / v.12 no.1 / pp.33-42 / 2020
  • This study used a speech corpus to examine the substitution of /u/ for /o/ (e.g., pyəllo [pyəllu]) in Korean as a function of grammar item. Korean /o/ and /u/ share the vowel feature [+rounded] but are distinguished by tongue height. However, researchers have reported that a merger of Korean /o/ and /u/ is in progress, making them indistinguishable. Thus, in this study, the frequency with which underlying /o/ was phonetically realized as /u/ was calculated for each grammar item in the Korean Corpus of Spontaneous Speech (Seoul Corpus 2015), a large corpus of 40 speakers from Seoul or Gyeonggi-do. The results confirmed that linking endings, particles, and adverbs ending in word-final /o/ were realized with /u/ in approximately 50% of cases, whereas in nominal items /o/ was replaced at a frequency of less than 5%. Among high-frequency items, high substitution rates were found for the special particle "-do[du]" (59.6%) and the linking ending "-go[gu]" (43.5%). Observing Korean pronunciation in real life provides deep insight into its theoretical implications in terms of speech recognition.
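
The per-item frequency count described in this abstract can be illustrated with a minimal sketch. It assumes each corpus token has already been reduced to a (grammar category, underlying vowel, surface vowel) triple; that record layout, the function name `substitution_rates`, and the toy data are hypothetical and not taken from the paper or from the Seoul Corpus annotation format.

```python
from collections import defaultdict

def substitution_rates(tokens):
    """Return, per grammar category, the share of underlying /o/ realized as /u/.

    tokens: iterable of (category, underlying_vowel, surface_vowel) triples.
    This record layout is an assumption made for illustration only.
    """
    totals = defaultdict(int)       # tokens with underlying /o/ per category
    substituted = defaultdict(int)  # of those, tokens surfacing as /u/
    for category, underlying, surface in tokens:
        if underlying != "o":
            continue  # only word-final underlying /o/ is of interest here
        totals[category] += 1
        if surface == "u":
            substituted[category] += 1
    return {c: substituted[c] / totals[c] for c in totals}

# Toy data (hypothetical, not corpus counts)
sample = [
    ("particle", "o", "u"),
    ("particle", "o", "o"),
    ("noun", "o", "o"),
    ("noun", "o", "o"),
    ("ending", "o", "u"),
]
rates = substitution_rates(sample)
```

Keeping totals and substituted counts separate makes it straightforward to report raw counts alongside percentages for each grammar item.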

Transition of vowel harmony in Korean verbal conjugation: Patterns of variation in a spoken corpus (구어 말뭉치를 통한 한국어 용언활용에서의 모음조화 변이 및 변화 추이 연구)

  • Hijo Kang
    • Phonetics and Speech Sciences / v.15 no.2 / pp.21-29 / 2023
  • This study investigates the transitional aspect of vowel harmony in Korean verbal conjugation. By observing the patterns of harmonic and disharmonic tokens of 42 verbal stems searched for in the National Institute of Korean Language (NIKL) Korean Dialogue Corpus 2020/2021, I found that disharmonic tokens appeared less than 0.1% of the time, and that most of them consisted of an /a/-stem with a monosyllabic sentence-final suffix. It was noted that the disharmonic pattern had started to spread to other suffixes and possibly to /o/-stems. A simple perception test showed that the disharmonic forms might have originated from vowel reduction or undershoot. These results suggest that the ongoing change can be accounted for from both the articulatory and perceptual perspectives.

Prominence Detection Using Feature Differences of Neighboring Syllables for English Speech Clinics (영어 강세 교정을 위한 주변 음 특징 차를 고려한 강조점 검출)

  • Shim, Sung-Geon;You, Ki-Sun;Sung, Won-Yong
    • Phonetics and Speech Sciences / v.1 no.2 / pp.15-22 / 2009
  • Prominence of speech, which is often called 'accent,' greatly affects the fluency of spoken American English. In this paper, we present an accurate prominence detection method that can be utilized in computer-aided language learning (CALL) systems. We employed pitch movement, overall syllable energy, 300-2200 Hz band energy, syllable duration, and spectral and temporal correlation as features to model the prominence of speech. After the features for vowel syllables of speech were extracted, prominent syllables were classified by an SVM (Support Vector Machine). To further improve accuracy, the differences in characteristics of neighboring syllables were added as additional features. We also applied a speech recognizer to extract more precise syllable boundaries. The performance of our prominence detector was measured on the Intonational Variation in English (IViE) speech corpus. We obtained 84.9% accuracy, which is about 10% higher than previous research.
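
The idea of adding neighboring-syllable differences as extra features can be sketched as follows. The abstract does not say how the differences are computed, so this sketch assumes simple subtraction against the previous and next syllable, with edge syllables compared against themselves. The function name `add_neighbor_differences` and the two-value feature vectors are hypothetical, and the SVM classification step is not shown.

```python
def add_neighbor_differences(features):
    """Extend each syllable's feature vector with differences to its neighbors.

    features: list of per-syllable feature lists, all of the same length.
    Assumption: differences are plain subtraction (current minus neighbor);
    the first and last syllable use themselves as the missing neighbor,
    which yields zero differences at the edges.
    """
    n = len(features)
    augmented = []
    for i, cur in enumerate(features):
        prev = features[i - 1] if i > 0 else cur
        nxt = features[i + 1] if i < n - 1 else cur
        diff_prev = [c - p for c, p in zip(cur, prev)]
        diff_next = [c - x for c, x in zip(cur, nxt)]
        augmented.append(list(cur) + diff_prev + diff_next)
    return augmented

# Hypothetical [energy, duration] values for three syllables
syllables = [[1.0, 0.25], [3.0, 0.5], [2.0, 0.25]]
augmented = add_neighbor_differences(syllables)
```

Each augmented vector is three times the original length and could then be passed to any classifier; the middle syllable above stands out because both of its difference blocks are positive.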

Acoustic Cues in Spoken French for the Pronunciation Assessment Multimedia System (발음평가용 멀티미디어 시스템 구현을 위한 구어 프랑스어의 음향학적 단서)

  • Lee, Eun-Yung;Song, Mi-Young
    • Speech Sciences / v.12 no.3 / pp.185-200 / 2005
  • The objective of this study is to examine acoustic cues in spoken French for pronunciation assessment, which is necessary for implementing a multimedia system. The corpus is composed of simple expressions that cover the French phonological system and include all of its phonemes. The experiment was conducted with 4 male and female native French speakers and 20 Korean speakers, university students who had studied French for more than two years. We analyzed the recorded data with a spectrograph and compared features by their numerical values. First, we found the mean and the deviation for all phonemes, and then chose the features that had a high error frequency and showed great differences between French and Korean pronunciations. The selected data were simplified and compared with one another. After judging, in terms of articulatory and auditory aspects, whether each Korean speaker's pronunciation problems were utterance mistakes or interference from the mother tongue, we tried to find acoustic features that were as simple as possible. From this experiment, we were able to extract acoustic cues for the construction of a French pronunciation training system.

A Study of Intonation Curve Slopes in Korean Spontaneous Speech (자유 발화 자료에서 나타나는 한국어 억양 곡선의 기울기 특성에 대한 연구)

  • Oh, Jeahyuk
    • Phonetics and Speech Sciences / v.6 no.1 / pp.21-30 / 2014
  • This study discusses the pitch slope of Korean intonation curves in spontaneous speech data. For this study, 656 utterances were taken from a spoken corpus and processed with 'close-copy stylization', and the physical features of pitch movements were then extracted. The pitch slope was calculated on the basis of time and pitch range in each utterance. As a result, the average and distribution of pitch slope are similar between men and women within the range of pitch movement, except for essential differences. The slope of pitch movement confirms that there are no differences between men and women. Pitch slopes on a scale of -10 to 10 account for 90% of all pitch slopes; pitch slopes that move by time scale without curve account for 33.1%; pitch slopes that move half of the pitch bandwidth during the average time for pitch movement account for 23.4%; and pitch slopes that move 100% of the pitch bandwidth during half of the average time for pitch movement account for 10.4%. These results imply the possibility of standardizing Korean intonation by pitch slope.
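
One way to read "calculated on the basis of time and pitch range" is as a slope normalized by the utterance's pitch bandwidth and by the average duration of a pitch movement. The abstract does not give the formula, so the sketch below is an assumption rather than the author's definition; the function name `pitch_slope`, its parameters, and the use of Hz values are all hypothetical.

```python
def pitch_slope(f_start, f_end, t_start, t_end, pitch_range, mean_duration):
    """Normalized slope of one pitch movement (assumed formula, not the paper's).

    The pitch change is expressed as a fraction of the utterance's pitch range,
    the elapsed time as a fraction of the average pitch-movement duration,
    and the slope is the ratio of the two.
    """
    if t_end <= t_start:
        raise ValueError("t_end must be later than t_start")
    if pitch_range <= 0 or mean_duration <= 0:
        raise ValueError("pitch_range and mean_duration must be positive")
    relative_pitch = (f_end - f_start) / pitch_range
    relative_time = (t_end - t_start) / mean_duration
    return relative_pitch / relative_time

# Half of the bandwidth covered in the average movement time
half_band = pitch_slope(100.0, 150.0, 0.0, 0.2, pitch_range=100.0, mean_duration=0.2)
# The full bandwidth covered in half of the average movement time
full_band = pitch_slope(100.0, 200.0, 0.0, 0.1, pitch_range=100.0, mean_duration=0.2)
```

Under this assumed normalization the two example movements give slopes of 0.5 and 2.0, and falling movements give negative values.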

Performance of music section detection in broadcast drama contents using independent component analysis and deep neural networks (ICA와 DNN을 이용한 방송 드라마 콘텐츠에서 음악구간 검출 성능)

  • Heo, Woon-Haeng;Jang, Byeong-Yong;Jo, Hyeon-Ho;Kim, Jung-Hyun;Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.10 no.3 / pp.19-29 / 2018
  • We propose to use independent component analysis (ICA) and deep neural network (DNN) to detect music sections in broadcast drama contents. Drama contents mainly comprise silence, noise, speech, music, and mixed (speech+music) sections. The silence section is detected by signal activity detection. To detect the music section, we train noise, speech, music, and mixed models with DNN. In computer experiments, we used the MUSAN corpus for training the acoustic model, and conducted an experiment using 3 hours' worth of Korean drama contents. As the mixed section includes music signals, it was regarded as a music section. The segmentation error rate (SER) of music section detection was observed to be 19.0%. In addition, when stereo mixed signals were separated into music signals using ICA, the SER was reduced to 11.8%.

Vowel Variation in PC Communication Language and Phonetic Similarity (통신언어의 모음변이와 음성학적 유사성)

  • Ji, Yunjoo;Kim, Ilkyu
    • Phonetics and Speech Sciences / v.7 no.1 / pp.133-138 / 2015
  • The purpose of this study is to provide deeper understanding of how it is possible for people to understand PC communication language they have never seen or heard before without any problem. In order to answer this question, we focused on the vowel variation through which new variants are created (for PC communication), and hypothesized that there is a phonetic constraint which requires the vowel of the variant to be phonetically similar (to the maximum) to the vowel of the original word. Through the corpus analysis of the dictionary of PC communication language, we show that our hypothesis is justified by the fact that most of the variants we collected from the dictionary, that is, 90% of them, conformed to the phonetic constraint we postulated.

Performance of Pseudomorpheme-Based Speech Recognition Units Obtained by Unsupervised Segmentation and Merging (비교사 분할 및 병합으로 구한 의사형태소 음성인식 단위의 성능)

  • Bang, Jeong-Uk;Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.6 no.3 / pp.155-164 / 2014
  • This paper proposes a new method to determine the recognition units for large vocabulary continuous speech recognition (LVCSR) in Korean by applying unsupervised segmentation and merging. In the proposed method, a text sentence is segmented into morphemes, and position information is added to the morphemes. Then submorpheme units are obtained by splitting the morpheme units through the maximization of posterior probability terms. The posterior probability terms are computed from the morpheme frequency distribution, the morpheme length distribution, and the morpheme frequency-of-frequency distribution. Finally, the recognition units are obtained by sequentially merging the submorpheme pair with the highest frequency. Computer experiments are conducted using a Korean LVCSR system with a 100k-word vocabulary and a trigram language model obtained from a 300-million-eojeol (word phrase) corpus. The proposed method is shown to reduce the out-of-vocabulary rate to 1.8% and to reduce the syllable error rate by a relative 14.0%.
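
The final merging step (sequentially merging the most frequent submorpheme pair) can be sketched in a few lines. This shows only a generic greedy pair-merging loop; the posterior-probability splitting step, the position information, and the authors' actual stopping criterion are not modeled, and the function names and toy unit strings are hypothetical.

```python
from collections import Counter

def most_frequent_pair(sequences):
    """Return the most frequent adjacent unit pair, or None if there is none."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0][0] if counts else None

def merge_pair(sequences, pair):
    """Replace every occurrence of the adjacent pair with one concatenated unit."""
    merged = []
    for seq in sequences:
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(seq[i] + seq[i + 1])
                i += 2
            else:
                out.append(seq[i])
                i += 1
        merged.append(out)
    return merged

def merge_units(sequences, num_merges):
    """Greedily apply up to num_merges merges (the fixed count is an assumption)."""
    for _ in range(num_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        sequences = merge_pair(sequences, pair)
    return sequences

# Toy submorpheme sequences (hypothetical strings, not real Korean units)
corpus = [["ha", "ko"], ["ha", "ko", "yo"], ["ka", "yo"]]
units = merge_units(corpus, 1)
```

In the toy corpus the pair ("ha", "ko") occurs twice and is merged first, which shortens the sequences while keeping rarer units separate.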

Patterns of consonant deletion in the word-internal onset position: Evidence from spontaneous Seoul Korean speech

  • Kim, Jungsun;Yun, Weonhee;Kang, Ducksoo
    • Phonetics and Speech Sciences / v.8 no.1 / pp.45-51 / 2016
  • This study examined the deletion of word-internal onset consonants in spontaneous Seoul Korean speech. It used the dataset of speakers in their 20s extracted from the Korean Corpus of Spontaneous Speech (Yun et al., 2015). The proportion of deletion of word-internal onset consonants was analyzed using a linear mixed-effects regression model. The factors that promoted the deletion of onsets were primarily the types of consonants and their phonetic contexts. The results showed that onset deletion was more likely to occur for the lenis velar stop [k] than for the other consonants and, among the phonetic contexts, when the preceding vowel was the low central vowel [a]. Moreover, some speakers tended to delete onset consonants (e.g., [k] and [n]) more frequently than other speakers, which reflected individual differences. This study implies that word-internal onsets undergo a process of gradient reduction within individuals' articulatory strategies.

Feature Vector Processing for Speech Emotion Recognition in Noisy Environments (잡음 환경에서의 음성 감정 인식을 위한 특징 벡터 처리)

  • Park, Jeong-Sik;Oh, Yung-Hwan
    • Phonetics and Speech Sciences / v.2 no.1 / pp.77-85 / 2010
  • This paper proposes an efficient feature vector processing technique to guard a Speech Emotion Recognition (SER) system against a variety of noises. In the proposed approach, emotional feature vectors are extracted from speech processed by comb filtering. These vectors are then used in robust model construction based on feature vector classification. We modify conventional comb filtering by using speech presence probability to minimize the drawbacks caused by incorrect pitch estimation under background noise conditions. The modified comb filtering can correctly enhance the harmonics, which are an important factor in SER. The feature vector classification technique categorizes feature vectors as either discriminative or non-discriminative based on a log-likelihood criterion. This method can successfully select the discriminative vectors while preserving correct emotional characteristics. Thus, robust emotion models can be constructed by using only such discriminative vectors. In an SER experiment using an emotional speech corpus contaminated by various noises, our approach exhibited superior performance to the baseline system.
