• Title/Summary/Keyword: Vocabulary recognition


Automatic Generation of Concatenate Morphemes for Korean LVCSR (대어휘 연속음성 인식을 위한 결합형태소 자동생성)

  • 박영희;정민화
    • The Journal of the Acoustical Society of Korea / v.21 no.4 / pp.407-414 / 2002
  • In this paper, we present a method that automatically generates concatenate-morpheme-based language models to improve the performance of Korean large vocabulary continuous speech recognition. The focus is on reducing recognition errors for monosyllabic morphemes, which make up 54% of the training text corpus and are misrecognized more frequently. A knowledge-based method using POS patterns has two disadvantages: the rules are difficult to write, and it produces many low-frequency concatenate morphemes. The proposed method automatically selects morpheme pairs from the training text based on measures such as frequency, mutual information, and unigram log likelihood. Experiments were performed using a 7M-morpheme text corpus and a 20K-morpheme lexicon. The frequency measure, with a constraint on the number of morphemes used for concatenation, produces the best result, reducing monosyllables from 54% to 30%, bigram perplexity from 117.9 to 97.3, and MER from 21.3% to 17.6%.
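
The pair-selection idea in this abstract can be illustrated with a minimal sketch. It ranks adjacent morpheme pairs by raw frequency and by pointwise mutual information. The function name, the `min_count` threshold, and the romanized toy tokens are assumptions made for illustration; the abstract does not give the paper's exact measures or constraints.

```python
import math
from collections import Counter

def rank_morpheme_pairs(sentences, min_count=2):
    """Rank adjacent morpheme pairs by frequency, then by pointwise mutual information.

    `sentences` is a list of morpheme lists. `min_count` is a hypothetical
    cutoff that drops rare pairs; it is not taken from the paper.
    """
    unigrams = Counter()
    bigrams = Counter()
    for morphemes in sentences:
        unigrams.update(morphemes)
        bigrams.update(zip(morphemes, morphemes[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    ranked = []
    for (left, right), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_left = unigrams[left] / n_uni
        p_right = unigrams[right] / n_uni
        pmi = math.log2(p_pair / (p_left * p_right))
        ranked.append(((left, right), count, pmi))

    # Most frequent pairs first; PMI breaks ties.
    ranked.sort(key=lambda item: (-item[1], -item[2]))
    return ranked

# Hypothetical romanized morpheme sequences, for illustration only.
toy_corpus = [
    ["hak", "gyo", "e", "ga", "n", "da"],
    ["hak", "gyo", "e", "seo", "o", "n", "da"],
    ["jip", "e", "ga", "n", "da"],
]
top_pairs = rank_morpheme_pairs(toy_corpus)
```

In this toy corpus the pair ("n", "da") occurs most often, so it would be the first candidate for merging into a single lexicon entry.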

Isolated Word Recognition using Modified Dynamic Averaging Method (변형된 Dynamic Averaging 방법을 이용한 단독어인식)

  • Jeoung, Eui-Bung;Ko, Young-Hyuk;Lee, Jong-Arc
    • The Journal of the Acoustical Society of Korea / v.10 no.2 / pp.23-28 / 1991
  • This paper studies speaker-independent isolated word recognition. We propose a DTW speech recognition system that uses the modified dynamic averaging method to build reference patterns. Fifty-seven city names are selected as the recognition vocabulary, and 2th LPC cepstrum coefficients are used as the feature parameter. Besides the recognition experiment using the modified dynamic averaging method, we perform recognition experiments using the causal method, the dynamic averaging method, the linear averaging method, and the clustering method with the same data under the same conditions for comparison. The experimental results show that DTW with the modified dynamic averaging method achieves the best recognition rate, 97.6 percent.

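
As background for the DTW matching step mentioned in this abstract, here is a minimal sketch of classic dynamic time warping with a nearest-template decision. It does not implement the modified dynamic averaging method itself, whose details the abstract does not give. The function names, the Euclidean local distance, and the toy templates are assumptions.

```python
import numpy as np

def dtw_distance(reference, test):
    """Accumulated DTW cost between two feature sequences.

    Each input is a 2-D array-like of shape (frames, features).
    The local distance is Euclidean (an assumption).
    """
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    n, m = len(reference), len(test)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(reference[i - 1] - test[j - 1])
            # Allow a match, an insertion, or a deletion step.
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def recognize(templates, utterance):
    """Return the label of the reference pattern with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(templates[label], utterance))

# Hypothetical one-dimensional "feature" sequences, for illustration only.
templates = {
    "rising": [[0.0], [1.0], [2.0]],
    "falling": [[2.0], [1.0], [0.0]],
}
label = recognize(templates, [[0.0], [0.0], [1.0], [2.0], [2.0]])
```

The time-stretched test sequence aligns with the "rising" template at zero cost, which is the property that makes DTW tolerant of speaking-rate differences.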

Gesture Recognition by Analyzing a Trajectory on Spatio-Temporal Space (시공간상의 궤적 분석에 의한 제스쳐 인식)

  • 민병우;윤호섭;소정;에지마 도시야끼
    • Journal of KIISE:Software and Applications / v.26 no.1 / pp.157-157 / 1999
  • Gesture recognition has become a very interesting topic in computer vision. Gesture recognition from visual images has a number of potential applications, such as HCI (Human Computer Interaction), VR (Virtual Reality), and machine vision. To overcome the technical barriers in visual processing, conventional approaches have employed cumbersome devices such as datagloves or color-marked gloves. In this research, we capture gesture images without using external devices and generate a gesture trajectory composed of point-tokens. The trajectory is spotted using phase-based velocity constraints and recognized using a discrete left-right HMM. Input vectors to the HMM are obtained by applying the LBG clustering algorithm in a polar-coordinate space, into which the point-tokens in Cartesian space are converted. The gesture vocabulary is composed of twenty-two dynamic hand gestures for editing drawing elements. In our experiment, one hundred samples per gesture are collected from twenty persons; fifty are used for training and the other fifty for the recognition experiment. The results show a recognition rate of about 95% and suggest that the approach can be applied to several potential systems operated by gestures. The developed system runs in real time for editing basic graphic primitives on a Pentium Pro (200 MHz) with a Matrox Meteor graphic board and a CCD camera, in a Windows 95 and Visual C++ software environment.
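
The Cartesian-to-polar conversion of point-tokens mentioned in this abstract can be sketched as follows. The abstract does not say which origin the authors used, so taking the trajectory centroid as the default origin is an assumption, as is the function name.

```python
import numpy as np

def to_polar(points, origin=None):
    """Convert (x, y) point-tokens to (radius, angle) pairs.

    `origin` defaults to the trajectory centroid, which is a hypothetical
    choice and not taken from the paper. Angles are in radians.
    """
    points = np.asarray(points, dtype=float)
    if origin is None:
        origin = points.mean(axis=0)
    offsets = points - np.asarray(origin, dtype=float)
    radius = np.hypot(offsets[:, 0], offsets[:, 1])
    angle = np.arctan2(offsets[:, 1], offsets[:, 0])
    return np.column_stack([radius, angle])

# Three points on the unit circle, converted relative to (0, 0).
polar = to_polar([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], origin=(0.0, 0.0))
```

The resulting (radius, angle) vectors are the kind of input a vector quantizer such as LBG would then map to discrete HMM symbols.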

Improving Phoneme Recognition based on Gaussian Model using Bhattacharyya Distance Measurement Method (바타챠랴 거리 측정 기법을 사용한 가우시안 모델 기반 음소 인식 향상)

  • Oh, Sang-Yeob
    • Journal of Korea Multimedia Society / v.14 no.1 / pp.85-93 / 2011
  • Existing vocabulary recognition programs calculate general vector values from a database, so they cannot process phonemes that form during a search. Because they cannot create a model for phoneme data, the accuracy of the Gaussian model cannot be secured. Therefore, in this paper, we propose using the Bhattacharyya distance measurement method based on phoneme features, which improves the recognition rate by picking up accurate phonemes and minimizing the recognition of similar and erroneous phonemes. We test Gaussian model optimization through shared continuous probability distributions and confirm the heightened recognition rate. The Bhattacharyya distance measurement method suggested in this paper shows an average 1.9% improvement in performance compared to previous methods, and an average 2.9% improvement in recognition rate based on reliability.
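
For reference, the standard closed-form Bhattacharyya distance between two multivariate Gaussians can be computed as in the sketch below. This is the textbook formula, not necessarily the exact procedure of the paper; the function name and the example values are assumptions.

```python
import numpy as np

def bhattacharyya_gaussian(mean1, cov1, mean2, cov2):
    """Bhattacharyya distance between N(mean1, cov1) and N(mean2, cov2).

    D = 1/8 (m1 - m2)^T S^-1 (m1 - m2) + 1/2 ln(det S / sqrt(det S1 det S2)),
    where S = (S1 + S2) / 2.
    """
    mean1 = np.asarray(mean1, dtype=float)
    mean2 = np.asarray(mean2, dtype=float)
    cov1 = np.asarray(cov1, dtype=float)
    cov2 = np.asarray(cov2, dtype=float)

    cov = 0.5 * (cov1 + cov2)
    diff = mean1 - mean2
    mean_term = 0.125 * diff @ np.linalg.solve(cov, diff)

    # Log-determinants are numerically safer than raw determinants.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    cov_term = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return float(mean_term + cov_term)

# Two unit-covariance Gaussians whose means are 2 apart along one axis.
identity = np.eye(2)
distance = bhattacharyya_gaussian([0.0, 0.0], identity, [2.0, 0.0], identity)
```

A distance of zero means the two phoneme models are identical; larger values indicate models that are easier to tell apart.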

Monophone and Biphone Compound Unit for Korean Vocabulary Speech Recognition (한국어 어휘 인식을 위한 혼합형 음성 인식 단위)

  • 이기정;이상운;홍재근
    • Journal of the Korea Computer Industry Society / v.2 no.6 / pp.867-874 / 2001
  • In this paper, considering the pronunciation characteristics of Korean, we suggest recognition units that shorten the recognition time while still reflecting the coarticulation effect. These units combine monophones and biphones. Monophone units are applied to vowels, which show stable characteristics. Biphone units are used for consonants, which vary according to the adjacent vowel. In word recognition experiments on the PBW445 database, the compound units give recognition accuracy comparable to triphone units with a 57% speedup, and better recognition accuracy at similar speed. In addition, the memory size can be reduced because there are fewer units.


An Arabic Script Recognition System

  • Alginahi, Yasser M.;Mudassar, Mohammed;Nomani Kabir, Muhammad
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.9 / pp.3701-3720 / 2015
  • A system for the recognition of machine-printed Arabic script is proposed. The Arabic script is shared by three languages: Arabic, Urdu, and Farsi. The three languages have a decent amount of vocabulary in common, which compounds the identification problem. Therefore, in an ideal scenario, not only must the script be differentiated from other scripts, but the language of the script must also be recognized. The recognition process involves separating Arabic-script documents from Latin, Han, and other script documents using horizontal and vertical projection profiles, and then identifying the language. Identification mainly involves extracting connected components, which are subjected to a Principal Component Analysis (PCA) transformation to extract uncorrelated features. The traditional K-Nearest Neighbours (KNN) algorithm is then used for recognition. Experiments were carried out by varying the number of principal components and the number of connected components extracted per document to find the combination that gives the best accuracy. An accuracy of 100% is achieved for connected components >= 18 and principal components equal to 15. The proposed system would play a vital role in the automatic archiving of multilingual documents and in selecting the appropriate Arabic-script language in multilingual Optical Character Recognition (OCR) systems.
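
The PCA-then-KNN pipeline described in this abstract can be sketched generically as below. The feature vectors here are synthetic stand-ins for connected-component features, and the function names, class labels, and parameter values are assumptions; none of them come from the paper.

```python
import numpy as np
from collections import Counter

def fit_pca(features, n_components):
    """Return the feature mean and the top principal directions (via SVD)."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(features, mean, components):
    """Project features onto the principal directions."""
    return (features - mean) @ components.T

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    distances = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Synthetic 4-D features for two hypothetical classes "A" and "B".
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 0.1, size=(10, 4)),
    rng.normal(5.0, 0.1, size=(10, 4)),
])
labels = ["A"] * 10 + ["B"] * 10

mean, components = fit_pca(features, n_components=2)
reduced = project(features, mean, components)
query = project(np.full((1, 4), 4.9), mean, components)[0]
predicted = knn_predict(reduced, labels, query, k=3)
```

Sweeping `n_components` and the number of feature vectors per document, as the abstract describes, would amount to repeating this fit and prediction over a grid of settings.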

A Study on the Triphone Replacement in a Speech Recognition System with DMS Phoneme Models

  • Lee, Gang-Seong
    • The Journal of the Acoustical Society of Korea / v.18 no.3E / pp.21-25 / 1999
  • This paper proposes methods that replace a missing triphone with a new one selected or created from existing triphones, and compares the results. The recognition system uses the DMS (Dynamic Multisection) model for acoustic modeling. DMS is a statistical recognition technique suited to small- or mid-size vocabulary systems, while HMM (Hidden Markov Model) is a probabilistic technique suited to mid-size or large systems. Accordingly, it is reasonable to use an effective algorithm suited to DMS rather than a complicated method such as the polyphone clustering technique employed in HMM-based systems. In this paper, four methods of filling in missing triphones are presented. The results show that the proposed replacement algorithm works almost as well as if all the necessary triphones existed. The experiments are performed on a 500+ word DMS speech recognizer.


A Low-Power LSI Design of Japanese Word Recognition System

  • Yoshizawa, Shingo;Miyanaga, Yoshikazu;Wada, Naoya;Yoshida, Norinobu
    • Proceedings of the IEEK Conference / 2002.07a / pp.98-101 / 2002
  • This paper reports a parallel architecture for an HMM-based speech recognition system aimed at a low-power LSI design. The proposed architecture calculates the output probability of a continuous HMM (CHMM) using concurrent and pipeline processing, which reduces memory access and gives high computing efficiency. The novel point is the efficient use of register arrays, which reduces memory access considerably compared with conventional methods. Using this technique, the implemented system can achieve a real-time response at a lower clock rate in a mid-size vocabulary recognition task (100-1000 words).

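
The CHMM output probability that this architecture accelerates is, in its common form, the log-likelihood of an observation under a Gaussian mixture. The sketch below computes it in software. Diagonal covariances and the function name are assumptions; the abstract does not specify the model configuration, and this sketch says nothing about the hardware design.

```python
import numpy as np

def log_output_probability(observation, weights, means, variances):
    """Log-likelihood of one observation under a diagonal-covariance Gaussian mixture.

    weights: shape (M,), means and variances: shape (M, D), observation: shape (D,).
    Diagonal covariances are an assumption made for this illustration.
    """
    observation = np.asarray(observation, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    # Per-mixture log normalizer and log exponent, summed over dimensions.
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_exp = -0.5 * np.sum((observation - means) ** 2 / variances, axis=1)
    log_components = np.log(weights) + log_norm + log_exp

    # Log-sum-exp over mixtures for numerical stability.
    peak = np.max(log_components)
    return float(peak + np.log(np.sum(np.exp(log_components - peak))))

# One standard-normal mixture component in two dimensions, evaluated at its mean.
log_prob = log_output_probability([0.0, 0.0], [1.0], [[0.0, 0.0]], [[1.0, 1.0]])
```

The per-dimension multiply-accumulate in `log_exp` is repeated for every state, mixture, and frame, which is why this step dominates the computation in such systems.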

Building a Morpheme-Based Pronunciation Lexicon for Korean Large Vocabulary Continuous Speech Recognition (한국어 대어휘 연속음성 인식용 발음사전 자동 생성 및 최적화)

  • Lee Kyong-Nim;Chung Minhwa
    • MALSORI / v.55 / pp.103-118 / 2005
  • In this paper, we describe a morpheme-based pronunciation lexicon useful for Korean LVCSR. The phonemic-context-dependent multiple pronunciation lexicon improves recognition accuracy when cross-morpheme pronunciation variations are distinguished from within-morpheme pronunciation variations. Since adding all possible pronunciation variants to the lexicon increases the lexicon size and the confusability between lexical entries, we have developed a lexicon pruning scheme that selects pronunciation variants to improve the performance of Korean LVCSR. With the proposed pronunciation lexicon, the cross-morpheme pronunciation variation model with a phonemic-context-dependent multiple pronunciation lexicon achieves an absolute reduction of 0.56% in WER from the baseline performance of 27.39% WER. At the best performance, an additional reduction of the lexicon size by 5.36% is achieved from the same lexical entries.

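
The WER figures in this abstract imply the following arithmetic. Only the two input numbers come from the abstract; the variable names are illustrative, and the relative reduction is derived here rather than reported by the authors.

```python
baseline_wer = 27.39        # percent, from the abstract
absolute_reduction = 0.56   # percentage points, from the abstract

# WER after applying the proposed lexicon.
new_wer = baseline_wer - absolute_reduction

# The same improvement expressed relative to the baseline.
relative_reduction = 100.0 * absolute_reduction / baseline_wer

print(f"WER after the change: {new_wer:.2f}%")
print(f"Relative WER reduction: {relative_reduction:.2f}%")
```

This gives a WER of 26.83%, or a relative reduction of about 2%.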

Korean LVCSR for Broadcast News Speech

  • Lee, Gang-Seong
    • The Journal of the Acoustical Society of Korea / v.20 no.2E / pp.3-8 / 2001
  • In this paper, we examine a Korean large vocabulary continuous speech recognition (LVCSR) system for broadcast news speech. A combined vowel and implosive unit is included in the phone set together with other short phone units in order to obtain a longer-unit acoustic model. The effect of this unit is compared with conventional phone units. The dictionary units for language processing are automatically extracted from eojeols appearing in transcriptions. Triphone models are used for acoustic modeling and a trigram model is used for language modeling. News broadcasts have three major speaker groups: anchors, journalists, and other people (those who are being interviewed). Of these, the speech of anchors and journalists, which has a lot of noise, was used for testing and recognition.
