• Title/Summary/Keyword: Utterance verification

New Postprocessing Methods for Rejecting Out-of-Vocabulary Words

  • Song, Myung-Gyu
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.3E
    • /
    • pp.19-23
    • /
    • 1997
  • The goal of postprocessing in automatic speech recognition is to improve recognition performance through utterance verification at the output of the recognition stage. This work focuses on effectively rejecting out-of-vocabulary words based on the confidence score of the hypothesized candidate word. We present two methods for computing confidence scores. Both are based on the distance between each observation vector and the representative code vector, defined as the most likely code vector at each state. The first method employs simple time normalization, whereas the second uses a normalization technique based on the concept of an on-line garbage model [1]. In a speaker-independent isolated-word recognition experiment with discrete-density HMMs, the second method outperforms both the first and the conventional likelihood-ratio scoring method [2].
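
As a rough illustration of the first method described in this abstract, the following Python sketch computes a time-normalized confidence score from the distance between each frame and the representative code vector of its aligned state. The function names, the Euclidean distance, the sign convention, and the threshold are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def time_normalized_confidence(observations, state_path, code_vectors):
    """Return the negative mean frame-to-code-vector distance.

    observations: one feature vector per frame
    state_path:   aligned state index per frame (e.g. from a Viterbi pass)
    code_vectors: representative code vector per state (hypothetical input)
    Higher (closer to zero) means the word hypothesis fits the frames better.
    """
    if not observations or len(observations) != len(state_path):
        raise ValueError("need exactly one aligned state per frame")
    total = sum(
        euclidean(obs, code_vectors[state])
        for obs, state in zip(observations, state_path)
    )
    # Dividing by the number of frames is the "simple time normalization".
    return -total / len(observations)


def accept(score, threshold):
    """Accept the hypothesis when the confidence reaches the threshold."""
    return score >= threshold


frames = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
path = [0, 1, 1]
codes = [[0.0, 0.0], [1.0, 1.0]]
score = time_normalized_confidence(frames, path, codes)  # -1/3
```

With these toy inputs only the middle frame is one unit away from its code vector, so the score is -1/3; a threshold of -0.5 would accept the word and a threshold of -0.2 would reject it.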

Development of a Speech Recognizer on PDAs (PDA 기반 음성 인식기 개발)

  • Koo Myoung-Wan;Park Sung-Joon;Son Dan-Young;Han Ki-Soo
    • Proceedings of the KSPS conference
    • /
    • 2006.05a
    • /
    • pp.33-36
    • /
    • 2006
  • This paper describes a speech recognizer implemented on PDAs. The recognizer consists of a feature extraction module, a search module, and an utterance verification module. It can recognize 37 words for use in telematics applications, and it uses fixed-point arithmetic for real-time processing. Simulation results show a recognition accuracy of 94.5% for in-vocabulary words and 56.8% for out-of-task words.

Enhancement of Rejection Performance using the PSO-NCM in Noisy Environment (잡음 환경하에서의 PSO-NCM을 이용한 거절기능 성능 향상)

  • Kim, Byoung-Don;Song, Min-Gyu;Choi, Seung-Ho;Kim, Jin-Young
    • Speech Sciences
    • /
    • v.15 no.4
    • /
    • pp.85-96
    • /
    • 2008
  • Automatic speech recognition suffers severe performance degradation in noisy environments. Many methods have been proposed to cope with noise, most of them focusing on noise-robust features or model adaptation. However, utterance verification (UV) in noisy environments has been overlooked. In this paper we discuss UV problems based on the normalized confidence measure (NCM). First, we show through isolated-word recognition experiments that UV performance also degrades in noisy environments. We then examine how this degradation arises. Based on the UV experiments, we propose a method for modeling the statistics of phone confidences using sigmoid functions. The parameters of the sigmoidal models are obtained with particle swarm optimization (PSO). The proposed method improves rejection performance by 20%. Our experimental results show that the PSO-NCM can be applied successfully to noisy speech recognition.
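
The idea of fitting sigmoid parameters with particle swarm optimization can be sketched in a few lines of Python. This is a generic textbook PSO applied to a two-parameter logistic curve, not the authors' PSO-NCM: the loss (mean squared error against 0/1 labels), the swarm size, the inertia and acceleration constants, and all function names are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(x, slope, center):
    """Logistic curve; the exponent is clamped to avoid overflow."""
    z = max(-60.0, min(60.0, slope * (x - center)))
    return 1.0 / (1.0 + math.exp(-z))


def mse(params, data):
    """Mean squared error between the sigmoid output and 0/1 labels."""
    slope, center = params
    return sum((sigmoid(x, slope, center) - y) ** 2 for x, y in data) / len(data)


def pso_fit(data, n_particles=20, n_iters=60, seed=0):
    """Fit (slope, center) with a basic PSO; returns (params, loss, initial_loss)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(0.1, 10.0), rng.uniform(-5.0, 5.0)] for _ in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # each particle's best position so far
    best_val = [mse(p, data) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best
    initial_best = g_val
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(2):
                r1, r2 = rng.random(), rng.random()
                # Inertia 0.7 and acceleration 1.5 are common illustrative values.
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (best_pos[i][d] - pos[i][d])
                             + 1.5 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = mse(pos[i], data)
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val, initial_best


# Toy data: (raw phone confidence, 1 if correctly recognized else 0).
samples = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
params, final_loss, initial_loss = pso_fit(samples)
```

Because the swarm-wide best is only ever replaced by a better value, `final_loss` can never exceed `initial_loss`; how close it gets to zero depends on the data and the random seed.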

Utterance Verification and Substitution Error Correction in Korean Connected Digit Recognition (한국어 연결숫자 인식에서의 발화검증과 대체오류수정)

  • Jung Du Kyung;Kim Hyung Soon
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.111-114
    • /
    • 2002
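  • In speech recognition, utterance verification is the technique of rejecting out-of-vocabulary (OOV) input, and also rejecting in-vocabulary results that are likely to be misrecognized. This paper addresses Korean connected digit recognition, where highly confusable digit pairs exist. Rather than simply discarding a digit string that utterance verification flags as likely misrecognized, we aim to improve recognition performance by correcting substitution errors. N-best decoding results show that most of the correct recognition results are contained within the 2nd-best and 3rd-best hypotheses. Therefore, when a digit string is rejected, we use N-best decoding and hypothesize that the correct string was displaced by a substitution error and is the 2nd-best digit string. If a confidence measure that jointly considers the individual-digit log likelihood ratio (LLR) and the N-best-based digit-string LLR [3] judges this hypothesis to be correct, the result is replaced with the 2nd-best digit string, which partially corrects the errors.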
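
The accept, substitute, or reject decision described in this abstract can be sketched as follows. The sketch collapses the paper's combined confidence measure (individual-digit LLR plus N-best digit-string LLR) into a single hypothetical score per hypothesis, and both thresholds are illustrative assumptions rather than values from the paper.

```python
def llr(target_loglik, anti_loglik, n_frames):
    """Frame-normalized log likelihood ratio of target model vs. anti-model."""
    return (target_loglik - anti_loglik) / n_frames


def verify_with_substitution(nbest, accept_threshold, substitute_threshold):
    """Decide what to do with an N-best list.

    nbest: list of (digit_string, confidence) pairs, best hypothesis first.
    Returns ("accept" | "substitute" | "reject", digit_string or None).
    """
    best_string, best_score = nbest[0]
    if best_score >= accept_threshold:
        return "accept", best_string
    # The top hypothesis failed verification: test the assumption that the
    # 2nd-best string is the correct one before rejecting outright.
    if len(nbest) > 1:
        second_string, second_score = nbest[1]
        if second_score >= substitute_threshold:
            return "substitute", second_string
    return "reject", None


hypotheses = [("1234", -0.4), ("7234", 0.9)]
decision = verify_with_substitution(hypotheses, accept_threshold=0.0,
                                    substitute_threshold=0.5)
# decision == ("substitute", "7234")
```

Here the top hypothesis falls below the acceptance threshold, while the 2nd-best string clears the substitution threshold, so the 2nd-best string is returned instead of a plain rejection.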

Korean Native Speakers Auditory Cognitive Reactions to Chinese Korean-learners' Pronunciation: Centered on the utterance of consonants in the Korean Language (중국인 학습자의 한국어 발음에 대한 한국인 모어 화자의 청각 인지 반응 -중국인 학습자의 자음 발음을 중심으로-)

  • Kim, Ji-hyung
    • Journal of Korean language education
    • /
    • v.28 no.2
    • /
    • pp.37-60
    • /
    • 2017
  • This study examines how native Korean speakers perceive the pronunciation of Chinese learners of Korean. Its objective is to lay the groundwork for effective teaching and learning strategies for the Korean phonetic system. We present the results of an experiment showing how native speakers of Korean identify consonants pronounced by Chinese learners of Korean. First, stimulus sounds were created from the original utterances of Chinese learners of Korean, and seven scripts were made with the Praat program. The subjects were then asked to choose what the phonetic materials sounded like. The results are expressed as the ratio of the frequency of each response by native Korean speakers to the total number of responses for each utterance. In addition, a paired t-test was conducted to explore any relation to changes in proficiency in the Korean phonetic system, from beginners to advanced learners. The outcome shows that the mistakes Chinese learners make in pronouncing Korean consonants are relatively well reflected in the auditory cognitive reactions of native Korean speakers. Concretely, for plosives and affricates there is some difficulty in differentiating lax consonants from aspirates, but relatively little trouble with fortes. However, slight differences by place of articulation also appear in the details. To provide an effective teaching method for the Korean phonetic system, it is essential to understand learners' phonetic mistakes through precise analysis of data in terms of 'production.' A more meticulous observation of 'phenomena' must also be made through verification from the viewpoint of 'reception,' as attempted in this study. A more thorough diagnosis based on this methodology makes it possible to lay the foundation for developing effective teaching and learning strategies for the Korean phonetic system, and this study is significant as such an attempt.

A Study for Complexity Improvement of Automatic Speaker Verification in PDA Environment (PDA 환경에서 자동화자 확인의 계산량 개선을 위한 연구)

  • Seo, Chang-Woo;Lim, Young-Hwan;Jeon, Sung-Chae;Jang, Nam-Young
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.10 no.3
    • /
    • pp.170-175
    • /
    • 2009
  • In this paper, we propose a real-time automatic speaker verification (ASV) system to protect personal information on personal digital assistant (PDA) devices. PDAs have recently grown in capacity and popularity, especially in mobile settings such as mobile commerce (M-commerce). However, practical use of ASV on PDAs remains difficult because of its high computational complexity. To address this problem, we reduce the computational burden by performing preprocessing, such as spectral subtraction and speech detection, while the utterance is being spoken. In addition, applying hidden Markov model (HMM) optimal state alignment and the sequential probability ratio test (SPRT) yields much faster processing. The whole system implementation is simple and compact enough to fit within a PDA's limited memory and low CPU speed.
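
The sequential probability ratio test mentioned in this abstract lets a verifier stop as soon as enough evidence has accumulated, which is where the speed-up comes from. The sketch below is Wald's standard SPRT over per-frame log likelihood ratios; how the paper combines it with HMM state alignment is not described in the abstract, so the input format, function name, and default error rates are assumptions.

```python
import math


def sprt(frame_llrs, alpha=0.01, beta=0.01):
    """Wald's sequential probability ratio test on per-frame LLRs.

    frame_llrs: log p(frame | claimed speaker) - log p(frame | background)
    alpha:      tolerated false-acceptance rate
    beta:       tolerated false-rejection rate
    Returns (decision, number_of_frames_used).
    """
    upper = math.log((1.0 - beta) / alpha)  # accept once the sum reaches this
    lower = math.log(beta / (1.0 - alpha))  # reject once the sum falls to this
    total = 0.0
    for n, value in enumerate(frame_llrs, start=1):
        total += value
        if total >= upper:
            return "accept", n
        if total <= lower:
            return "reject", n
    # Ran out of frames before either boundary was crossed.
    return "undecided", len(frame_llrs)


result = sprt([1.0] * 10)  # ("accept", 5): the sum 5.0 first exceeds log(99)
```

With the default error rates the boundaries are about +4.6 and -4.6, so a steady stream of frames with an LLR of 1.0 is accepted after five frames instead of all ten.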

Performance Enhancement for Speaker Verification Using Incremental Robust Adaptation in GMM (가무시안 혼합모델에서 점진적 강인적응을 통한 화자확인 성능개선)

  • Kim, Eun-Young;Seo, Chang-Woo;Lim, Yong-Hwan;Jeon, Seong-Chae
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.3
    • /
    • pp.268-272
    • /
    • 2009
  • In this paper, we propose a Gaussian mixture model (GMM) based incremental robust adaptation with a forgetting factor for speaker verification. Speaker recognition systems use speaker model adaptation with small amounts of data to obtain good performance. However, conventional adaptation is vulnerable to outliers caused by irregular utterance variations and the presence of noise, which results in an inaccurate speaker model. Moreover, the rate at which new data are adapted into the model decreases over time. The proposed algorithm uses incremental robust adaptation to reduce the effect of outliers and a forgetting factor to maintain the adaptation rate of new data in the GMM-based speaker model. The incremental robust adaptation registers a small amount of data in the speaker recognition model and then adapts the model to the new data to be tested. Experimental results on a data set gathered over seven months show that the proposed algorithm is robust against outliers and maintains the adaptation rate of new data.
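
The role of the forgetting factor can be illustrated with a single running mean, which stands in here for one Gaussian component's mean. This is a simplified sketch, not the paper's update rule: the `weight` argument is a hypothetical stand-in for whatever robust down-weighting of outliers the authors use, and a full GMM update would also involve component posteriors, variances, and mixture weights.

```python
def update_mean(mean, count, x, forgetting=1.0, weight=1.0):
    """Apply one incremental update to a running mean.

    mean:       current mean vector
    count:      effective number of samples behind the current mean
    x:          new observation vector
    forgetting: factor in (0, 1] that discounts old data before adding x
    weight:     factor in [0, 1]; values below 1 down-weight a suspected
                outlier (hypothetical robust weight, an assumption here)
    Returns (new_mean, new_count).
    """
    discounted = forgetting * count
    new_count = discounted + weight
    if new_count <= 0.0:
        raise ValueError("effective sample count must stay positive")
    new_mean = [
        (discounted * m + weight * xi) / new_count
        for m, xi in zip(mean, x)
    ]
    return new_mean, new_count


mean, count = update_mean([0.0], 0.0, [2.0])          # ([2.0], 1.0)
plain = update_mean(mean, count, [4.0])               # ([3.0], 2.0)
forget = update_mean(mean, count, [4.0], forgetting=0.5)
```

With `forgetting=1.0` this is the ordinary running mean, so the influence of each new sample shrinks as the count grows. With `forgetting` below 1 the effective count stays bounded, so new data keep a minimum share of the update, which corresponds to the "maintained adaptation rate" the abstract describes.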

Automatic Coarticulation Detection for Continuous Sign Language Recognition (연속된 수화 인식을 위한 자동화된 Coarticulation 검출)

  • Yang, Hee-Deok;Lee, Seong-Whan
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.1
    • /
    • pp.82-91
    • /
    • 2009
  • Sign language spotting is the task of detecting and recognizing the signs in a signed utterance. The difficulty of sign language spotting is that the occurrences of signs vary in both motion and shape. Moreover, the signs appear within a continuous gesture stream, interspersed with transitional movements between signs in a vocabulary and non-sign patterns (which include out-of-vocabulary signs, epentheses, and other movements that do not correspond to signs). In this paper, a novel method for designing a threshold model in a conditional random field (CRF) model is proposed. The proposed model provides an adaptive threshold for distinguishing between signs in the vocabulary and non-sign patterns. A hand appearance-based sign verification method, a short-sign detector, and a subsign reasoning method are included to further improve sign language spotting accuracy. Experimental results show that the proposed method can detect signs from continuous data with an 88% spotting rate and can recognize signs from isolated data with a 94% recognition rate, versus 74% and 90% respectively for CRFs without a threshold model, short-sign detector, subsign reasoning, and hand appearance-based sign verification.

Multi channel far field speaker verification using teacher student deep neural networks (교사 학생 심층신경망을 활용한 다채널 원거리 화자 인증)

  • Jung, Jee-weon;Heo, Hee-Soo;Shim, Hye-jin;Yu, Ha-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.37 no.6
    • /
    • pp.483-488
    • /
    • 2018
  • Far-field input utterances are one of the major causes of performance degradation in speaker verification systems. In this study, we use a teacher-student learning framework to compensate for the degradation caused by far-field utterances. Teacher-student learning trains a student deep neural network under a condition that may degrade performance, using a teacher deep neural network trained without that condition. Here, the teacher network trained on near-distance utterances is used to train the student network on far-distance utterances. However, experiments showed that performance on near-distance utterances deteriorated. To avoid this, we propose using the trained teacher network to initialize the student network and training the student network on both near-field and far-field utterances. Experiments were conducted with deep neural networks that take as input the raw waveforms of 4-channel utterances recorded at both near and far distances. The equal error rates for near-field and far-field utterances, respectively, are 2.55 % / 2.8 % without teacher-student learning, 9.75 % / 1.8 % with conventional teacher-student learning, and 2.5 % / 2.7 % with the proposed techniques.
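
The results in this abstract are reported as equal error rates (EER). For readers unfamiliar with the metric, the following sketch shows one simple way to estimate it from verification scores: sweep a threshold over the observed scores and take the point where the false-acceptance and false-rejection rates are closest. The function name and the tie-handling are assumptions, evaluation toolkits often interpolate between thresholds instead, and the scores below are invented toy values, not data from the paper.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection: genuine trials scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: impostor trials scoring at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


targets = [0.6, 0.4, 0.9, 0.8]
impostors = [0.5, 0.1, 0.2, 0.3]
eer = equal_error_rate(targets, impostors)  # 0.25
```

In this toy example a threshold of 0.5 rejects one of four genuine trials and accepts one of four impostor trials, so both error rates are 25% and the estimated EER is 0.25.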

Educational Implications about Online Debates on a Socio-Scientific Issue from a Postmodernist Perspective: Focus on the Mad Cow Disease (포스트모더니즘의 관점에서 본 과학 관련 사회적 쟁점에 대한 온라인 토론의 과학교육적 함의: 광우병 사례를 중심으로)

  • Jho, Hun-Koog;Song, Jin-Woong
    • Journal of The Korean Association For Science Education
    • /
    • v.30 no.8
    • /
    • pp.933-952
    • /
    • 2010
  • This study aims to characterize debate on a socio-scientific issue on the Internet and to draw implications from a postmodernist perspective. It concentrates on disentangling the complex relationship among society, economy, politics, and science within an issue, and on characterizing the given texts in terms of originality, the relationship between writer and reader, and the purpose of utterance. The sixty-six most-read articles on a web message board were chosen and analyzed as a typical case of a socio-scientific issue on the Internet. Five scientific disputes were identified in them: the cause of mad cow disease (MCD), specified risk material and the incubation period, the cause of new variant Creutzfeldt-Jakob disease (vCJD), vulnerability to vCJD, and the relation between Alzheimer's disease and vCJD in American patients. Each argument is intertwined with social, economic, and political problems such as its impact on the domestic beef market, the feeding environment of imported cattle, and retaliation against the denial of importation. With regard to originality, the originality of an individual author is weakened but becomes communal through repeated quotation ('Peom'), cutting and pasting, and readers' engagement through comments. Furthermore, to close the gap between writer and reader, the writers' identities and personal narratives are often introduced into their writing. In terms of purpose, the utterances are intended to convey feelings or prompt action rather than to inform through verification of a principle.