• Title/Summary/Keyword: Confusability

Search results: 7

Optimizing Multiple Pronunciation Dictionary Based on a Confusability Measure for Non-native Speech Recognition (타언어권 화자 음성 인식을 위한 혼잡도에 기반한 다중발음사전의 최적화 기법)

  • Kim, Min-A; Oh, Yoo-Rhee; Kim, Hong-Kook; Lee, Yeon-Woo; Cho, Sung-Eui; Lee, Seong-Ro
    • MALSORI / no.65 / pp.93-103 / 2008
  • In this paper, we propose a method for optimizing a multiple pronunciation dictionary used for modeling pronunciation variations of non-native speech. The proposed method removes some confusable pronunciation variants in the dictionary, resulting in a reduced dictionary size and less decoding time for automatic speech recognition (ASR). To this end, a confusability measure is first defined based on the Levenshtein distance between two different pronunciation variants. Then, the number of phonemes for each pronunciation variant is incorporated into the confusability measure to compensate for ASR errors due to words of a shorter length. We investigate the effect of the proposed method on ASR performance, where Korean is selected as the target language and Korean utterances spoken by Chinese native speakers are considered as non-native speech. It is shown from the experiments that an ASR system using the multiple pronunciation dictionary optimized by the proposed method can provide a relative average word error rate reduction of 6.25%, with 11.67% less ASR decoding time, as compared with that using a multiple pronunciation dictionary without the optimization.
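
The abstract above names two ingredients: the Levenshtein distance between pronunciation variants, and the number of phonemes in each variant. The sketch below illustrates how these could be combined. The edit distance is standard; the length normalization in `confusability` is an assumption made for illustration and is not the paper's formula, and the phoneme symbols are hypothetical.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two phoneme sequences with unit costs."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def confusability(a: Sequence[str], b: Sequence[str]) -> float:
    """ASSUMED measure: 1 - distance / length of the longer variant.

    Returns a value in [0, 1]; identical variants score 1.0.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical phoneme strings, not taken from the paper's dictionary.
short_a, short_b = ["k", "a", "m"], ["k", "a", "n"]
long_a, long_b = ["h", "a", "k", "k", "o"], ["h", "a", "k", "k", "u"]
print(levenshtein(short_a, short_b), confusability(short_a, short_b))
print(levenshtein(long_a, long_b), confusability(long_a, long_b))
```

Under this assumed normalization, a single substituted phoneme counts for more in a three-phoneme variant than in a five-phoneme one, which shows why variant length has to enter the measure in some form.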

Optimal Cognitive System Modeling Using the Stimulus-Response Matrix (자극-반응 행렬을 이용한 인지 시스템 최적화 모델)

  • Choe, Gyeong-Hyeon; Park, Min-Yong; Im, Eun-Yeong
    • Journal of the Ergonomics Society of Korea / v.19 no.1 / pp.11-22 / 2000
  • In this research report, we present several optimization models for cognitive systems using the stimulus-response matrix (S-R matrix). Stimulus-response matrices are widely used for tabulating results from various experiments and in cognitive systems design in which the recognition and confusability of stimuli are of concern. This paper analyzes the optimization/mathematical programming models. The weaknesses and restrictions of the existing models are resolved by a generalization that considers the average confusion of each subset of stimuli. Also, clustering strategies are used in the extended model to obtain cluster centers in terms of minimal confusion, as well as the character of each cluster.
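
To make "average confusion of each subset of stimuli" concrete, the sketch below computes one plausible version of it from a small S-R matrix. The definition used here (mean off-diagonal response probability within the subset) and the counts are assumptions for illustration; the abstract does not give the paper's exact formulation.

```python
from itertools import permutations


def row_normalize(counts):
    """Turn response counts into per-stimulus response probabilities."""
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


def average_confusion(probs, subset):
    """ASSUMED definition: mean probability of answering j when stimulus i
    was shown, over all ordered pairs (i, j) of distinct subset members."""
    pairs = list(permutations(subset, 2))
    if not pairs:
        return 0.0
    return sum(probs[i][j] for i, j in pairs) / len(pairs)


# Hypothetical S-R matrix: rows are stimuli, columns are responses.
counts = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
probs = row_normalize(counts)
print(average_confusion(probs, [0, 1]))  # stimuli 0 and 1 are confused with each other
print(average_confusion(probs, [0, 2]))  # stimuli 0 and 2 are never confused
```

A subset-selection model of the kind the abstract describes would then search for the subset of a given size that minimizes this quantity.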

Building a Morpheme-Based Pronunciation Lexicon for Korean Large Vocabulary Continuous Speech Recognition (한국어 대어휘 연속음성 인식용 발음사전 자동 생성 및 최적화)

  • Lee Kyong-Nim; Chung Minhwa
    • MALSORI / v.55 / pp.103-118 / 2005
  • In this paper, we describe a morpheme-based pronunciation lexicon useful for Korean LVCSR. The phonemic-context-dependent multiple pronunciation lexicon improves recognition accuracy when cross-morpheme pronunciation variations are distinguished from within-morpheme pronunciation variations. Since adding all possible pronunciation variants to the lexicon increases both the lexicon size and the confusability between lexical entries, we have developed a lexicon pruning scheme for optimal selection of pronunciation variants to improve the performance of Korean LVCSR. With the proposed pronunciation lexicon, the cross-morpheme pronunciation variation model with a phonemic-context-dependent multiple pronunciation lexicon achieves an absolute WER reduction of 0.56% from the baseline WER of 27.39%. At the best performance, an additional reduction of the lexicon size by 5.36% is achieved from the same lexical entries.
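
The abstract reports an absolute WER reduction, while other entries in this list report relative reductions. The short calculation below converts between the two using only the figures stated above; the function names are hypothetical helpers introduced for this illustration.

```python
def wer_after(baseline_wer: float, absolute_reduction: float) -> float:
    """WER (in percent) after subtracting an absolute reduction."""
    return baseline_wer - absolute_reduction


def relative_reduction(baseline_wer: float, absolute_reduction: float) -> float:
    """Absolute reduction expressed as a percentage of the baseline WER."""
    return 100.0 * absolute_reduction / baseline_wer


baseline = 27.39   # baseline WER in percent, from the abstract
absolute = 0.56    # absolute WER reduction in percent, from the abstract

print(f"WER after reduction: {wer_after(baseline, absolute):.2f}%")            # 26.83%
print(f"Relative reduction:  {relative_reduction(baseline, absolute):.2f}%")   # 2.04%
```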

Performance Improvement of Korean Connected Digit Recognition Using Various Discriminant Analyses (다양한 변별분석을 통한 한국어 연결숫자 인식 성능향상에 관한 연구)

  • Song Hwa Jeon; Kim Hyung Soon
    • MALSORI / no.44 / pp.105-113 / 2002
  • In Korean, each digit is monosyllabic and some pairs are known to have high confusability, causing performance degradation in connected digit recognition systems. To improve the performance, in this paper we employ various discriminant analyses (DA), including Linear DA (LDA), Weighted Pairwise Scatter LDA (WPS-LDA), Heteroscedastic Discriminant Analysis (HDA), and Maximum Likelihood Linear Transformation (MLLT). We also examine several combinations of these DA methods for additional performance improvement. Experimental results show that applying any of the DA methods mentioned above improves the string accuracy, but the amount of improvement of each DA method varies according to the model complexity or the number of mixtures per state. In particular, a string error reduction of more than 20% is achieved by applying MLLT after WPS-LDA, compared with the baseline system, when the class level of DA is defined as a tied state and 1 mixture per state is used.

Place Perception in Korean Consonants

  • Oh, Mi-Ra
    • Speech Sciences / v.9 no.4 / pp.131-142 / 2002
  • Place assimilation in Korean has been argued to reflect the consonantal strength hierarchy in which velar is stronger than labial, which is in turn stronger than coronal. The strength relationship has been manifested in two ways in the literature. One is through phonological representation, as shown in Iverson and Lee (1994). The other is through perceptual salience ranking, as suggested by Jun (1995). The goal of this study is to examine the perceptual salience of consonant place through an identification experiment. The experiment conducted in this study reveals four facts. First, place identification of a prevocalic consonant is higher than that of a postvocalic one. Second, place identification of a stop in coda is more confusable than that of its nasal counterpart in Korean, contrary to previous studies. Third, velar is the most confusable in place identification, in contrast to Jun (1995) and Hume et al. (1999). Finally, place perception of consonants can vary depending on the adjacent vocalic context. These results suggest that perceptual salience is one of possibly several factors affecting a phonological process.

A Novel Integration Scheme for Audio Visual Speech Recognition

  • Pham, Than Trung; Kim, Jin-Young; Na, Seung-You
    • The Journal of the Acoustical Society of Korea / v.28 no.8 / pp.832-842 / 2009
  • Automatic speech recognition (ASR) has been successfully applied to many real human-computer interaction (HCI) applications; however, its performance tends to degrade significantly in noisy environments. Audio-visual speech recognition (AVSR), which uses an acoustic signal and lip motion, has recently attracted more attention because of its robustness to noise. In this paper, we describe our novel integration scheme for AVSR based on a late integration approach. First, we introduce a robust reliability measurement for the audio and visual modalities using model-based information and signal-based information. The model-based sources measure the confusability of the vocabulary, while the signal is used to estimate the noise level. Second, the output probabilities of the audio and visual speech recognizers are each normalized before the final integration step, which uses the normalized output space and the estimated weights. We evaluate the performance of our proposed method on a Korean isolated word recognition system. The experimental results demonstrate the effectiveness and feasibility of our proposed system compared to conventional systems.
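
A generic late-integration step of the kind outlined above can be sketched as follows. This is not the paper's scheme: the softmax normalization, the single `audio_weight` parameter, and the example scores are all assumptions chosen to show how normalized recognizer outputs and a reliability weight interact.

```python
import math


def normalize(log_likelihoods):
    """ASSUMED normalization: softmax over the vocabulary, so each
    recognizer's scores become positive values that sum to 1."""
    peak = max(log_likelihoods.values())
    exps = {w: math.exp(v - peak) for w, v in log_likelihoods.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


def late_fusion(audio_ll, visual_ll, audio_weight):
    """Weighted sum of normalized log-scores from the two recognizers.

    audio_weight in [0, 1] stands in for the estimated audio reliability;
    the visual stream receives the remaining weight.
    """
    pa, pv = normalize(audio_ll), normalize(visual_ll)
    return {
        w: audio_weight * math.log(pa[w]) + (1.0 - audio_weight) * math.log(pv[w])
        for w in pa
    }


def recognize(audio_ll, visual_ll, audio_weight):
    """Return the word with the highest fused score."""
    scores = late_fusion(audio_ll, visual_ll, audio_weight)
    return max(scores, key=scores.get)


# Hypothetical per-word log-likelihoods from each recognizer.
audio = {"yes": -10.0, "no": -10.5}   # audio slightly prefers "yes"
visual = {"yes": -8.0, "no": -6.0}    # video clearly prefers "no"

print(recognize(audio, visual, audio_weight=0.9))  # clean audio: trust audio -> yes
print(recognize(audio, visual, audio_weight=0.2))  # noisy audio: trust video -> no
```

In the scheme described by the abstract, the weight would come from the reliability measurement (vocabulary confusability and estimated noise level) rather than being set by hand as it is here.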

Development of ITS sequence based SCAR marker and multiplex-SCAR assay for the rapid authentication of Tetrapanacis Medulla and Akebiae Caulis (통초(通草), 목통(木通) 신속 감별용 ITS 염기서열 기반 SCAR 마커 및 Multiplex-SCAR 분석법 개발)

  • Noh, Pureum; Kim, Wook Jin; Park, Inkyu; Yang, Sungyu; Choi, Goya; Moon, Byeong Cheol
    • The Korea Journal of Herbology / v.36 no.1 / pp.9-17 / 2021
  • Objectives: Tetrapanacis Medulla and Akebiae Caulis are among the most frequently adulterated herbal medicines because of the confusability of their names in ancient writings and the morphological similarity of the dried herbal products. The major adulterant is Aristolochia manshuriensis (Guanmutong), which raises a serious safety concern because of its toxicity. To ensure the safety and quality of the two herbal medicines, it is necessary to discriminate the toxic adulterant from the authentic species. The aim of this study is to develop SCAR markers and to establish a multiplex-SCAR assay for discriminating four plant species related to Tetrapanacis Medulla and Akebiae Caulis. Methods: ITS regions of fifteen samples of four species (Tetrapanax papyrifer, Fatsia japonica, Aristolochia manshuriensis, and Akebia quinata) collected from different sites were amplified and sequenced. The fifteen ITS sequences obtained were aligned and analysed to detect species-specific sequence variations. The SCAR markers were designed based on the sequence alignments, and a multiplex-SCAR assay that enhances rapidity was then optimized. Results: ITS sequences clearly distinguished the four species at the species level. The developed SCAR markers and multiplex-SCAR assay successfully discriminated the four species and detected adulteration in commercial product samples by comparison of the amplified DNA fragment sizes. Conclusions: These SCAR markers and the multiplex-SCAR assay provide a rapid, simple, and reliable method to distinguish authentic Tetrapanacis Medulla and Akebiae Caulis from adulterants. These genetic tools will be useful for ensuring the safety and standardizing the quality of the two herbal medicines.