Search | Korea Science

Phonetic Tied-Mixture Syllable Model for Efficient Decoding in Korean ASR (효율적 한국어 음성 인식을 위한 PTM 음절 모델)

Kim Bong-Wan;Lee Yong-Jn
- MALSORI / no.50 / pp.139-150 / 2004
The Phonetic Tied-Mixture (PTM) model has been proposed for efficient decoding in large vocabulary continuous speech recognition (LVCSR) systems. The PTM model has been reported to perform better in decoding than triphones by sharing a set of mixture components among states at the same topological location [5]. In this paper we propose a Phonetic Tied-Mixture Syllable (PTMS) model, which extends the PTM technique to syllables. The proposed PTMS model decodes 13% faster than PTM. Despite the difference in context-dependent modeling (PTM: cross-word context-dependent modeling; PTMS: word-internal left-phone-dependent modeling), the proposed model shows less than 1% degradation in word accuracy relative to PTM at the same beam width. With a different beam width, it achieves better word accuracy than PTM at the same or higher speed.

Subglottic Air Pressure in Different Phonetic Context (음성학적 문맥에 따른 성문하압의 차이에 관한 연구)

박상희;정옥란;석동일
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.13 no.1 / pp.23-27 / 2002
The purpose of the study is to examine differences in subglottic air pressure as a function of phonetic context. The phonetic contexts consisted of /i:pʰi:pʰi:/, /pʰi:pʰi:/, and /pʰpʰ/. The aerodynamic and phonatory parameters are investigated in 20 normal adult females. All measurements are taken and analysed using the Aerophone II voice function analyzer. The aerodynamic parameters are Peak Air Pressure (PAP) and Mean Air Pressure (MAP), and the phonatory parameters are Phonatory Flow Rate (PFR), Maximum SPL (MSPL), Phonatory SPL (PSPL), Phonatory Power (PP), Phonatory Efficiency (PE), and Phonatory Resistance*10-5 (PR). A one-way ANOVA revealed the following results. First, the aerodynamic parameters are not significantly different. Second, Peak Air Pressure (PAP) and Mean Air Pressure (MAP), as well as the phonatory parameters such as Phonatory Flow Rate (PFR), Maximum SPL (MSPL), Phonatory SPL (PSPL), and Phonatory Efficiency (PE) were significantly different. Therefore, it is advised that clinicians use only aerodynamic parameters but phonatory parameters when using Aerophone II.

Development of a test of Korean Speech Intelligibility in Noise(KSPIN) using sentence materials with controlled word predictability (소음환경에서 표적단어의 예상도가 조절된 한국어의 문장검사목록개발 시안)

Kim, Jin-Sook;Pae, So-Yeong;Lee, Jung-Hak
- Speech Sciences / v.7 no.2 / pp.37-50 / 2000
This paper describes a test of everyday speech understanding ability in which a listener's use of the context-situational information of speech is assessed and compared with the use of acoustic-phonetic information. The test items are sentences presented in babble-type noise, and the listener's response is the key word in the sentence. The key words are always disyllabic nouns, and questioning sentences are added to elicit the key words as responses. Two types of sentences are used. One is the high-predictability sentence, in which the key word is somewhat predictable from the context. The other is the low-predictability sentence, in which the key word cannot be predicted from the context. Both types are included in six 40-item forms of the test, which are balanced for intelligibility, key-word familiarity and predictability, phonetic content, and length. The performance of normal-hearing listeners shows significantly different functions for various signal-to-noise ratios. The potential applications of this test, particularly in assessing speech understanding ability in the hearing impaired, are discussed.

Phonetic Vowel Reduction Conditioned by Voicing of Adjacent Stops in English (음성적 모음 축소 현상에 영어 자음의 유무성 환경이 미치는 효과)

Oh, Eun-Jin
- Speech Sciences / v.14 no.4 / pp.81-98 / 2007
This study aims to investigate whether shortened vowel duration conditioned by a following voiceless stop induces phonetic reduction of vowel space in English, and whether the reduction appears more in the height dimension than in the backness dimension (Lindblom, 1963; Flemming, 2005). Fifteen native speakers of American English read minimal pairs containing ten American English vowels in [bVd] and [bVt] syllables in a carrier phrase. All the subjects produced shorter vowels in the voiceless than in the voiced context. However, a reduction in vowel space and a raising of low vowels due to the shortened vowel duration were generally not found. To the contrary, the speakers tended to exhibit even more lowering of low vowels in the voiceless context, and vowel space was more commonly compressed in the backness dimension than in the height dimension. Many speakers, in particular, demonstrated fronting of the high back vowel [u] in the voiceless context. It was interpreted that due to a relatively large number of English vowels in the narrower low vowel space, the raising of low vowels may give rise to confusion in vowel contrasts, and therefore the degree of phonetic vowel reduction is restricted in that region. On the other hand, the high vowel region, being relatively spacious in English, allows a certain degree of phonetic vowel reduction in the F2 dimension. It is possible that heavy requirements for maintaining vowel contrasts may cause speakers to overachieve vowel target values, especially when faced with vowels which are difficult to distinguish due to shortened vowel duration, leading to an over-lowering of the low vowels.

Phonetic Question Set Generation Algorithm (음소 질의어 집합 생성 알고리즘)

김성아;육동석;권오일
- The Journal of the Acoustical Society of Korea / v.23 no.2 / pp.173-179 / 2004
Because training data are insufficient in large vocabulary continuous speech recognition, similar context-dependent phones can be clustered by decision trees to share the data. Building the decision trees and using them to predict unseen triphones requires a phonetic question set. The phonetic question set, which contains categories of phones with similar co-articulation effects, is usually generated by phonetic or linguistic experts. This knowledge-based approach to generating the phonetic question set, however, may reduce the homogeneity of the clusters. Moreover, the experts must adjust the question set whenever the language or the PLU (phone-like unit) of a recognition system is changed. Therefore, we propose a data-driven method to generate the phonetic question set automatically. Since the proposed method generates the phone categories from the distribution of the speech data, it does not depend on the language or the PLU, and may enhance the homogeneity of the clusters. In large vocabulary speech recognition experiments, the proposed algorithm was found to reduce the error rate by 14.3%.

SEGMENTAL COARTICULATION STUDY IN DISYLLABIC CONTEXT IN STANDARD CHINESE

Chen, Xiao-xia
- Proceedings of the KSPS conference / 1996.10a / pp.515-520 / 1996

Prosodic Modifications of the Internal Phonetic Structure of Monosyllabic CVC Words in Conversational Speech

Mo, Yoonsook
- Phonetics and Speech Sciences / v.5 no.1 / pp.99-108 / 2013
Previous laboratory studies have shown that prosodic structures are encoded in the modulations of phonetic patterns of speech including suprasegmental as well as segmental features. In particular, effects of prosodic context on duration and intensity of syllables and words have been widely reported. Drawing on prosodically annotated large-scale speech data from the Buckeye corpus of conversational speech of American English, the current study attempted to examine whether and how prosodic prominence and phrase boundary of everyday conversational speech, as determined by a large group of ordinary listeners, are related to the phonetic realization of duration and intensity. The results showed that the patterns of word durations and intensities are influenced by prosodic structure. Closer examinations revealed, however, that the effects of prosodic prominence are not the same as those of prosodic phrase boundary. With regard to intensity measures, the results revealed the systematic changes in the patterns of overall RMS intensity near prosodic phrase boundary but the prominence effects are restricted to the nucleus. In terms of duration measures, both prosodic prominence and phrase boundary are the most closely related to the lengthening of the nucleus. Yet, prosodic prominence is more closely related to the lengthening of the onset while phrase boundary lengthens the coda duration more. The findings from the current study suggest that the phonetic realizations of prosodic prominence are different from those of prosodic phrase boundary, and speakers signal different prosodic structures through deliberate modulations of the internal phonetic structure of words and listeners attend to such phonetic variations.
https://doi.org/10.13064/KSSS.2013.5.1.099

Normalization in Collection Procedures of Emotional Speech by Scriptual Context (대본 내용에 의한 정서음성 수집과정의 정규화에 대하여)

Jo Cheol-Woo
- Proceedings of the KSPS conference / 2006.05a / pp.123-125 / 2006
One of the biggest unsolved problems in emotional speech acquisition is how to create or find a situation that is close to a natural or desired human state. We propose a method of collecting emotional speech data by script context. Several contexts from drama scripts were chosen by experts in the area. The contexts were divided into 6 classes according to their contents. Two actors, one male and one female, read the text after recognizing the emotional situations in the script.

Modified Phonetic Decision Tree For Continuous Speech Recognition

Kim, Sung-Ill;Kitazoe, Tetsuro;Chung, Hyun-Yeol
- The Journal of the Acoustical Society of Korea / v.17 no.4E / pp.11-16 / 1998
For large vocabulary speech recognition using HMMs, context-dependent subword units have often been employed. However, context-dependent phone models result in a system with too many parameters to train. The problem of too many parameters and too little training data is crucial in the design of a statistical speech recognizer. Furthermore, when building large vocabulary speech recognition systems, the unseen triphone problem is unavoidable. In this paper, we propose a modified phonetic decision tree algorithm for the automatic prediction of unseen triphones, and show its advantages in solving these problems through two experiments in Japanese contexts. The baseline experimental results show that the modified tree-based clustering algorithm is effective for clustering and for reducing the number of states without any degradation in performance. The task experimental results show that the proposed algorithm also has the advantage of providing automatic prediction of unseen triphones.

The Acoustic Characteristics of Focus Associated with the Korean Particle '-man' (한국어 특수조사 ‘-만’에 연계된 초점의 음향음성학적 특성)

Choe, J.W.;Jeon, Y.S.;C., Y.;Park, S.B.;Kim, K.H.
- Speech Sciences / v.5 no.2 / pp.77-91 / 1999
The purpose of this paper is to investigate the phonetic characteristics of the 'focus' phrases associated with the particle '-man' in Korean. The particle '-man' is a bound morpheme which, like other postpositions such as the subject marker '-ka' and the object marker '-lil', the so-called 'case markers' in Korean, typically attaches to a noun (phrase). The semantics of '-man' roughly corresponds to that of only, its counterpart in English, and it is thus classified as a 'delimiter' (Yang 1973). It is assumed in this paper that '-man', like only in English, should have a 'focus' associated with it (von Stechow 1991, Rooth 1992). In general, '-man'-attached phrases get the focus, but sometimes the association is not clear-cut, especially in cases of the emphatic use of '-man' or when the context strongly favors another phrase as the focus (Choe 1996). In this paper, we compare the phonetic characteristics of the '-man'-marked phrases with those to which '-ka'/'-lil' is attached, and conclude that the focused '-man' phrases show higher fundamental frequencies than their equally focused 'case'-marked counterparts. However, when the context clearly forces the focus to fall on phrases other than the '-man'- or '-ka'/'-lil'-attached ones, there is no meaningful difference in fundamental frequency between the '-man'- and '-ka'/'-lil'-attached phrases. We also compare the phonetic characteristics of the regular use of '-man' with those of the emphatic '-man'. According to our experiments, the emphatic '-man' does not produce its phonetic effects, namely higher fundamental frequencies, on the '-man'-attached words or phrases, but rather in various other ways, such as higher fundamental frequencies in '-man' itself, lengthening of the following word-initial syllable, or the inclusion of the following word in the same accentual phrase. Finally, it is claimed that '-man'-associated focus phenomena, especially the emphatic use of '-man', show some typical acoustic characteristics of another well-known focus phenomenon, namely wh-interrogatives.
