• Title/Summary/Keyword: Continuous Speech Recognition


N-gram Based Robust Spoken Document Retrievals for Phoneme Recognition Errors (음소인식 오류에 강인한 N-gram 기반 음성 문서 검색)

  • Lee, Su-Jang;Park, Kyung-Mi;Oh, Yung-Hwan
    • MALSORI / no.67 / pp.149-166 / 2008
  • In spoken document retrieval (SDR), subword (typically phoneme) indexing terms are used to avoid the out-of-vocabulary (OOV) problem. This makes the indexing and retrieval process independent of any vocabulary, and it requires only a small corpus to train the acoustic model. However, the subword indexing approach has a major drawback: it shows higher word error rates than a large vocabulary continuous speech recognition (LVCSR) system. In this paper, we propose a probabilistic slot detection and n-gram based string matching method for phone-based spoken document retrieval to overcome the high error rates of the phone recognizer. Experimental results show a 9.25% relative improvement in mean average precision (mAP) with a 1.7-times speedup compared with the baseline system.

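
As a rough illustration of n-gram based matching on phone strings, the sketch below scores a document by the fraction of query phone n-grams it contains. It is not the authors' method: the probabilistic slot detection step is omitted, and the phone symbols, the function names `phone_ngrams` and `ngram_score`, and the default n = 3 are all assumptions made for the example.

```python
from collections import Counter

def phone_ngrams(phones, n=3):
    """Return a multiset of the phone n-grams in a phone sequence."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))

def ngram_score(query, doc, n=3):
    """Fraction of query n-grams that also occur in the document (0.0 to 1.0)."""
    q = phone_ngrams(query, n)
    d = phone_ngrams(doc, n)
    total = sum(q.values())
    if total == 0:
        return 0.0
    # Count each query n-gram at most as often as it appears in the document.
    matched = sum(min(count, d[gram]) for gram, count in q.items())
    return matched / total

# Hypothetical phone strings; doc_noisy contains one substitution error (p -> b).
query = ["s", "p", "iy", "ch"]
doc_exact = ["dh", "ax", "s", "p", "iy", "ch", "w", "aa", "z"]
doc_noisy = ["dh", "ax", "s", "b", "iy", "ch", "w", "aa", "z"]

print(ngram_score(query, doc_exact))       # every query trigram is found
print(ngram_score(query, doc_noisy, n=2))  # one bigram survives the error
```

In this toy case a single phone substitution removes every matching trigram, while one bigram still matches, which hints at why the n-gram order matters when the recognizer output is noisy.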

Emotion recognition in speech using hidden Markov model (은닉 마르코프 모델을 이용한 음성에서의 감정인식)

  • 김성일;정현열
    • Journal of the Institute of Convergence Signal Processing / v.3 no.3 / pp.21-26 / 2002
  • This paper presents a new approach to identifying human emotional states such as anger, happiness, normal, sadness, or surprise. This is accomplished by using discrete duration continuous hidden Markov models (DDCHMM). For this, the emotional feature parameters are first defined from input speech signals. In this study, we used prosodic parameters such as pitch signals, energy, and their respective derivatives, which were then used to train the HMMs for recognition. Speaker-adapted emotional models based on maximum a posteriori (MAP) estimation were also considered for speaker adaptation. The simulation results showed that the recognition rates of vocal emotion gradually increased as the number of adaptation samples increased.

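
The abstract does not give the adaptation formula, so the following is only a sketch of one common form of MAP adaptation for a Gaussian mean, in which the adapted mean interpolates between a prior (speaker-independent) mean and the adaptation data. The scalar feature, the prior weight `tau`, and the numeric values are assumptions for illustration.

```python
def map_adapt_mean(prior_mean, samples, tau=10.0):
    """MAP estimate of a Gaussian mean.

    tau is an assumed prior weight: larger values keep the estimate
    closer to prior_mean until more adaptation samples are available.
    """
    n = len(samples)
    return (tau * prior_mean + sum(samples)) / (tau + n)

# Hypothetical pitch feature: prior mean 100, the new speaker averages 140.
prior = 100.0
few = map_adapt_mean(prior, [140.0] * 5)    # few samples: stays near the prior
many = map_adapt_mean(prior, [140.0] * 90)  # many samples: near the speaker mean
print(few, many)
```

With more adaptation samples the estimate moves from the prior toward the speaker's own statistics, which is consistent with recognition rates rising as the adaptation set grows.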

Plosive consonants recognition using acoustic properties with the frames representing each phoneme (조음 특성과 음소 대표 구간을 이용한 우리말 파열음의 인식)

  • 박찬응;이쾌희
    • Journal of the Korean Institute of Telematics and Electronics S / v.34S no.4 / pp.33-41 / 1997
  • Korean unvoiced phonemes consist of nonstationary parts, whereas vowels and nasal consonants consist of quasi-stationary parts. Some phonemes that have the same point of articulation but a different manner of articulation have similar characteristics, which makes them hard to distinguish from each other. A new method that uses the changes and characteristics of the acoustic properties of these phonemes to improve the recognition rate is proposed. Because these changes and characteristics occur evidently in continuous speech, except that some unvoiced consonants are articulated as voiced phonemes when they occur medially between voiced phonemes, this method can be applied easily. The features of the frames extracted to represent each phoneme are used as inputs to a hierarchical neural network. The final decision for phoneme recognition is then made from these results through post-processing to which the new method is applied. In recognition experiments on 9 unvoiced consonants belonging to the bilabial, alveolar, and velar phoneme series, an 89.4% recognition rate is obtained for distinguishing phonemes within the same series, and an 85.6% recognition rate is obtained when distinguishing between phoneme series is also included.


A Study on the Continuous Speech Recognition for the Automatic Creation of International Phonetics (국제 음소의 자동 생성을 활용한 연속음성인식에 관한 연구)

  • Kim, Suk-Dong;Hong, Seong-Soo;Shin, Chwa-Cheul;Woo, In-Sung;Kang, Heung-Soon
    • Journal of Korea Game Society / v.7 no.2 / pp.83-90 / 2007
  • One result of the trend towards globalization is an increased number of projects that focus on natural language processing. Automatic speech recognition (ASR) technologies, for example, hold great promise in facilitating global communication and collaboration. Unfortunately, to date, most research projects have focused on single, widely spoken languages. As a result, the cost of adapting a particular ASR tool for use with other languages is often prohibitive. This work takes a more general approach. We propose an International Phoneticizing Engine (IPE) that interprets input files supplied in our Phonetic Language Identity (PLI) format to build a dictionary. IPE is language independent and rule based. It operates by decomposing the dictionary creation process into a set of well-defined steps. These steps reduce rule conflicts, allow rules to be created by people without linguistics training, and optimize run-time efficiency. Dictionaries created by the IPE can be used with a speech recognition system. IPE defines an easy-to-use, systematic approach that obtained recognition rates of 92.55% for Korean speech and 89.93% for English.


A Study on Speech Recognition Using the HM-Net Topology Design Algorithm Based on Decision Tree State-clustering (결정트리 상태 클러스터링에 의한 HM-Net 구조결정 알고리즘을 이용한 음성인식에 관한 연구)

  • 정현열;정호열;오세진;황철준;김범국
    • The Journal of the Acoustical Society of Korea / v.21 no.2 / pp.199-210 / 2002
  • In this paper, we study speech recognition using the HM-Net topology design algorithm based on decision tree state-clustering to improve the performance of acoustic models. Korean has many allophonic and grammatical rules compared to other languages, so we investigate the allophonic variations defined in Korean phonetics and construct the phoneme question set for the phonetic decision tree. The basic idea of the HM-Net topology design algorithm is that it keeps the basic structure of the SSS (Successive State Splitting) algorithm and splits again the states of pre-constructed context-dependent acoustic models. That is, it generates the phonetic decision tree using the phoneme question sets for each state of the models, and iteratively trains the state sequences of the context-dependent acoustic models using the PDT-SSS (Phonetic Decision Tree-based SSS) algorithm. To verify the effectiveness of the algorithm, we carried out speech recognition experiments on 452 words from the Center for Korean Language Engineering (KLE452) and 200 sentences from an air flight reservation task (YNU200). Experimental results show that the recognition accuracy improved progressively with the number of states after state splitting in the phoneme, word, and continuous speech recognition experiments. We obtained average phoneme and word recognition accuracies of 71.5% and 99.2%, respectively, when the number of states is 2,000, and an average continuous speech recognition accuracy of 91.6% when the number of states is 800. We also carried out word recognition experiments using HTK (HMM Toolkit), which performs state tying, for comparison with the parameter sharing of the HM-Net topology design algorithm. In the word recognition experiments, the HM-Net topology design algorithm achieved an average of 4.0% higher recognition accuracy than the context-dependent acoustic models generated by HTK, which indicates its effectiveness.

Continuous Speech Recognition Using N-gram Language Models Constructed by Iterative Learning (반복학습법에 의해 작성한 N-gram 언어모델을 이용한 연속음성인식에 관한 연구)

  • 오세진;황철준;김범국;정호열;정현열
    • The Journal of the Acoustical Society of Korea / v.19 no.6 / pp.62-70 / 2000
  • In conventional language models (LMs), probabilities are estimated by selecting high-frequency words from a large text database. However, when an LM is adopted for a specific task, the general method of constructing it from a large text corpus is unnecessary given the various costs involved. In this paper, we propose a method of constructing LMs from a small text database for use in specific tasks. The proposed method effectively increases the counts of low-frequency words by applying the same sentences iteratively, which also makes the word occurrence probabilities more robust. We carried out continuous speech recognition (CSR) experiments on 200 sentences uttered by 3 speakers using LMs built by iterative learning (IL) in an air flight reservation task. The results indicated that CSR using IL-based LMs achieves 20.4% higher recognition accuracy than CSR without IL. The system using the IL method also shows an average of 13.4% higher recognition accuracy than the previous system, which uses a context-free grammar (CFG), indicating its effectiveness.


A Study on the Neural Networks for Korean Phoneme Recognition (한국어 음소 인식을 위한 신경회로망에 관한 연구)

  • Choi, Young-Bae;Yang, Jin-Woo;Lee, Hyung-Jun;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea / v.13 no.1 / pp.5-13 / 1994
  • This paper presents a study on neural networks for phoneme recognition and performs phoneme recognition using a TDNN (Time Delay Neural Network). It also proposes a training algorithm for speech recognition with neural networks that is suited to large-scale TDNNs. Because phoneme recognition is indispensable for continuous speech recognition, this paper uses a TDNN to obtain accurate phoneme recognition results, and proposes a new training algorithm that can make the TDNN converge to an optimal state regardless of the number of phonemes to be recognized. The recognition experiment was performed with the new TDNN training algorithm, which combines backpropagation with the Cauchy algorithm using a stochastic approach. The recognition experiments on three phoneme classes for two speakers show a recognition rate of 98.1%. The results indicate that the proposed algorithm is an efficient method that yields higher recognition performance and a shorter convergence time than the standard TDNN.


Pattern Recognition of Rotor Fault Signal Using Hidden Markov Model (은닉 마르코프 모형을 이용한 회전체 결함신호의 패턴 인식)

  • Lee, Jong-Min;Kim, Seung-Jong;Hwang, Yo-Ha;Song, Chang-Seop
    • Transactions of the Korean Society of Mechanical Engineers A / v.27 no.11 / pp.1864-1872 / 2003
  • Hidden Markov models (HMMs) have been widely used in speech recognition; however, their use in machine condition monitoring has been very limited despite their good potential. In this paper, an HMM is used to recognize rotor fault patterns. First, we set up a rotor kit under unbalance and oil whirl conditions. Time signals of the two failure conditions were sampled and converted to auto power spectra. Using a filter bank, feature vectors were calculated from these auto power spectra. Next, continuous and discrete HMMs were trained with scaled forward/backward variables and diagonal covariance matrices. Finally, each HMM was applied to all sampled data to evaluate its fault recognition ability. The HMMs were found to have good recognition ability in rotor fault pattern recognition despite the small number of training data sets.
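
The scaled forward variables mentioned in the abstract can be sketched for a discrete HMM as follows. This is a generic textbook-style scaled forward pass, not the authors' code; the two-state model, its probabilities, and the function name `scaled_forward_loglik` are assumptions chosen only to make the example runnable.

```python
import math

def scaled_forward_loglik(pi, A, B, obs):
    """Log-likelihood of obs under a discrete HMM using the scaled forward pass.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    """
    n_states = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)                  # probability mass at time t
        alpha = [a / scale for a in alpha]  # rescale so the variables sum to 1
        loglik += math.log(scale)           # log-likelihood is the sum of log scales
    return loglik

# Hypothetical two-state, two-symbol model.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]

print(scaled_forward_loglik(pi, A, B, [0, 1]))
print(scaled_forward_loglik(pi, A, B, [0, 1] * 2000))  # long input stays finite
```

Rescaling at every frame keeps the forward variables in a safe numeric range, so long observation sequences do not underflow as they would with unscaled probabilities.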

1-Pass Semi-Dynamic Network Decoding Using a Subnetwork-Based Representation for Large Vocabulary Continuous Speech Recognition (대어휘 연속음성인식을 위한 서브네트워크 기반의 1-패스 세미다이나믹 네트워크 디코딩)

  • Chung Minhwa;Ahn Dong-Hoon
    • MALSORI / no.50 / pp.51-69 / 2004
  • In this paper, we present a one-pass semi-dynamic network decoding framework that inherits both the fast decoding speed of static network decoders and the memory efficiency of dynamic network decoders. Our method is based on a novel language model network representation that is essentially a finite state machine (FSM). The static network derived from the language model network [1][2] is partitioned into smaller subnetworks, which are static by nature or self-structured. The whole network is managed dynamically so that the subnetworks required for decoding are cached in memory. The network is near-minimized by applying the tail-sharing algorithm. Our decoder is evaluated on a 25k-word Korean broadcast news transcription task. The search network itself is reduced by 73.4% by the tail-sharing algorithm. Compared with the equivalent static network decoder, the semi-dynamic network decoder increases decoding time by at most 6%, while it can be flexibly adapted to various memory configurations, with a minimal memory usage of 37.6% of the complete network size.


HMM-based Music Identification System for Copyright Protection (저작권 보호를 위한 HMM기반의 음악 식별 시스템)

  • Kim, Hee-Dong;Kim, Do-Hyun;Kim, Ji-Hwan
    • Phonetics and Speech Sciences / v.1 no.1 / pp.63-67 / 2009
  • In this paper, in order to protect music copyrights, we propose a music identification system that is scalable in the number of registered pieces of music and robust to signal-level variations of the registered music. For its implementation, we define the new concepts of 'music word' and 'music phoneme' as recognition units to construct 'music acoustic models'. With these concepts, we apply the HMM-based framework used in continuous speech recognition to identify the music. Each music file is transformed into a sequence of 39-dimensional vectors. This sequence of vectors is represented as ordered states with Gaussian mixtures. These ordered states are trained using the Baum-Welch re-estimation method. Music files whose copyright status is suspect are also transformed into sequences of vectors. Then, the most probable music file is identified using the Viterbi algorithm through the music identification network. We implemented a music identification system for 1,000 MP3 music files and tested it with variations in MP3 bit rate and music speed. The proposed system demonstrates performance that is robust to signal variations. In addition, the scalability of the system is independent of the number of registered music files, since the system is based on the HMM method.

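
The Viterbi step named in the abstract can be sketched in the log domain as below. This is a generic illustration, not the system described: the paper scores 39-dimensional vectors with Gaussian mixtures, whereas this sketch assumes the per-frame log emission scores are already computed, and the two-state model and the function name `viterbi` are hypothetical.

```python
import math

def viterbi(log_pi, log_A, log_B):
    """Most probable state path given per-frame log emission scores.

    log_pi[i]   : log initial probability of state i
    log_A[i][j] : log transition probability from state i to state j
    log_B[t][j] : log likelihood of frame t under state j
    """
    n = len(log_pi)
    delta = [log_pi[j] + log_B[0][j] for j in range(n)]
    backptr = []
    for t in range(1, len(log_B)):
        prev = delta
        step, delta = [], []
        for j in range(n):
            # Best predecessor of state j at time t.
            best = max(range(n), key=lambda i: prev[i] + log_A[i][j])
            step.append(best)
            delta.append(prev[best] + log_A[best][j] + log_B[t][j])
        backptr.append(step)
    # Trace back from the best final state.
    state = max(range(n), key=lambda j: delta[j])
    best_score = delta[state]
    path = [state]
    for step in reversed(backptr):
        state = step[state]
        path.append(state)
    path.reverse()
    return path, best_score

# Hypothetical two-state model with discrete emissions for frames [0, 1].
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
frames = [0, 1]
log_pi = [math.log(p) for p in pi]
log_A = [[math.log(p) for p in row] for row in A]
log_B = [[math.log(B[j][o]) for j in range(2)] for o in frames]

path, score = viterbi(log_pi, log_A, log_B)
print(path, math.exp(score))
```

Working with log scores turns products of small probabilities into sums, which avoids underflow on long frame sequences such as whole music files.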