• Title/Abstract/Keywords: recognition task


Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems

  • Shekofteh, Yasser; Almasganj, Farshad
    • ETRI Journal / Vol. 35, No. 1 / pp.100-108 / 2013
  • In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike the conventional spectral-based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. In the first step, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for a new input speech frame, a posterior-probability-based feature vector is evaluated, which represents the similarity between the embedded frame and the learned speech attractors. We conduct experiments on a speech recognition task using a toolkit based on hidden Markov models, over FARSDAT, a well-known Persian speech corpus. With the proposed FE method, we gain a 3.11% absolute improvement in phoneme error rate over the baseline system, which uses the mel-frequency cepstral coefficient FE method.
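
The two steps in this abstract (embedding a frame in the RPS, then scoring it against attractor models) can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding dimension, the lag, the equal priors over models, the independence assumption across embedded points, and the hand-set diagonal-covariance GMM parameters are all assumptions made for illustration, and the function names are hypothetical.

```python
import math

def embed(frame, dim=2, lag=1):
    """Time-delay embedding: map a 1-D frame to points in a dim-dimensional RPS."""
    span = (dim - 1) * lag
    return [tuple(frame[i + k * lag] for k in range(dim))
            for i in range(len(frame) - span)]

def log_gmm(point, gmm):
    """Log density of one point under a diagonal-covariance GMM.

    gmm is a list of (weight, mean, variance) components (assumed layout).
    """
    logs = []
    for weight, mean, var in gmm:
        ll = math.log(weight)
        for x, m, v in zip(point, mean, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        logs.append(ll)
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))

def posterior_features(frame, attractor_gmms, dim=2, lag=1):
    """Posterior of each attractor model given the embedded frame.

    Assumes equal priors and treats embedded points as independent.
    """
    points = embed(frame, dim, lag)
    scores = [sum(log_gmm(p, g) for p in points) for g in attractor_gmms]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]
```

Calling `posterior_features(frame, gmms)` returns one value per attractor model, summing to 1. In a real system the GMM parameters would be trained on embedded speech rather than set by hand.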

Malay Syllables Speech Recognition Using Hybrid Neural Network

  • Ahmad, Abdul Manan; Eng, Goh Kia
    • Institute of Control, Robotics and Systems: Conference Proceedings / Institute of Control, Robotics and Systems, 2005 ICCAS / pp.287-289 / 2005
  • This paper presents a hybrid neural network system that uses a Self-Organizing Map and a Multilayer Perceptron (MLP) for Malay syllable speech recognition. The novel idea in this system is the use of a two-dimensional self-organizing feature map as a sequential mapping function that transforms the phonetic similarities or acoustic vector sequences of the speech frames into trajectories in a square matrix whose elements take binary values. This property simplifies the classification task. An MLP is then used to classify the trajectories corresponding to each syllable in the vocabulary. The system was evaluated on the recognition of 15 common Malay syllables. The overall performance of the recognizer was 91.8%.


HMnet Evaluation for Phonetic Environment Variations of Training Data in Speech Recognition

  • Kim, Hoi-Rin
    • The Journal of the Acoustical Society of Korea / Vol. 15, No. 4E / pp.28-36 / 1996
  • In this paper, we propose a new evaluation methodology that shows more clearly the performance of the allophone modeling algorithm generally used in large vocabulary speech recognition. The proposed evaluation method shows the running characteristics and limitations of the modeling algorithm by testing how the variation of phonetic environments in the training data affects the recognition performance and the desirable number of free parameters to be estimated. From the experimental results obtained with this method, we conclude that, in a vocabulary-independent recognition task, the phonetic diversity of the training data greatly affects the robustness of the model, and that it is necessary to develop a proper measure that can determine the number of states, balancing the robustness and the precision of the HMnet, better than the conventional modeling efficiency does.


The Effect of the Number of Training Data on Speech Recognition

  • Lee, Chang-Young
    • The Journal of the Acoustical Society of Korea / Vol. 28, No. 2E / pp.66-71 / 2009
  • In practical applications of speech recognition, one of the fundamental questions is how much training data should be provided for a specific task. Although plenty of training data would undoubtedly enhance system performance, it comes at a heavy cost. It is therefore crucial to determine the smallest amount of training data that will afford a certain level of accuracy. For this purpose, we investigate the effect of the amount of training data on speaker-independent recognition of isolated words using FVQ/HMM. The results show that the error rate is roughly inversely proportional to the amount of training data and grows linearly with the vocabulary size.
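
The scaling reported in this abstract can be written as E ≈ c · V / N, where E is the error rate, V the vocabulary size, and N the amount of training data. The sketch below shows how such a relation could be used to estimate the least amount of training data for a target error rate. The constant `c`, the function names, and the idea of calibrating `c` from a single measured operating point are assumptions for illustration. The abstract gives no numeric constant, and the relation is only described as rough.

```python
import math

def fit_constant(error_rate, n_train, vocab_size):
    """Solve E = c * V / N for c from one measured operating point (assumed model)."""
    return error_rate * n_train / vocab_size

def min_training_size(target_error, vocab_size, c):
    """Smallest N with c * V / N <= target_error under the assumed model."""
    return math.ceil(c * vocab_size / target_error)
```

Under this model, halving the target error rate doubles the required amount of training data, and so does doubling the vocabulary size at a fixed target.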

Human Activities Recognition Based on Skeleton Information via Sparse Representation

  • Liu, Suolan; Kong, Lizhi; Wang, Hongyuan
    • Journal of Computing Science and Engineering / Vol. 12, No. 1 / pp.1-11 / 2018
  • Human activity recognition is a challenging task due to the complexity of human movements and the variety with which different subjects perform the same action. This paper presents a recognition algorithm that uses skeleton information generated from depth maps. The feature vector is produced by concatenating motion features and a temporal constraint feature. An improved fast classifier based on sparse representation is proposed by reducing the dictionary scale. The developed method is shown to be effective at recognizing different activities on the UTD-MHAD dataset. Comparison results indicate that our method outperforms some existing methods.

A Korean Flight Reservation System Using Continuous Speech Recognition

  • Choi, Jong-Ryong; Kim, Bum-Koog; Chung, Hyun-Yeol; Nakagawa, Seiichi
    • The Journal of the Acoustical Society of Korea / Vol. 15, No. 3E / pp.60-65 / 1996
  • This paper describes a Korean continuous speech recognition system for flight reservation. It adopts a frame-synchronous One-Pass DP search algorithm driven by the syntactic constraints of a context-free grammar (CFG). For recognition, 48 phoneme-like units (PLU) were defined and used as the basic units for acoustic modeling of Korean. This modeling was conducted using an HMM technique, where each model has 4 states, 3 continuous output probability distributions, and 3 discrete duration distributions. Language modeling by CFG was also applied to the task domain of flight reservation, which consisted of 346 words and 422 rewriting rules. In the tests, a sentence recognition rate of 62.6% was obtained after speaker adaptation.


Performance Evaluation of Nonkeyword Modeling and Postprocessing for Vocabulary-independent Keyword Spotting

  • 김형순; 김영국; 신영욱
    • Speech Sciences / Vol. 10, No. 3 / pp.225-239 / 2003
  • In this paper, we develop a keyword spotting system using a vocabulary-independent speech recognition technique, and investigate several non-keyword modeling and post-processing methods to improve its performance. To model non-keyword speech segments, monophone clustering and a Gaussian Mixture Model (GMM) are considered. For post-processing, we employ a likelihood ratio scoring method to verify the recognition results, with filler models, anti-subword models and N-best decoding results considered as alternative hypotheses for the likelihood ratio scoring. We also examine different methods of constructing anti-subword models. We evaluate the performance of our system on an automatic telephone exchange service task. The results show that GMM-based non-keyword modeling performs better than monophone clustering. In the post-processing experiment, the method using an anti-keyword model based on Kullback-Leibler distance and the N-best decoding method perform better than the other methods, and more than 50% of keyword recognition errors could be eliminated at a keyword rejection rate of 5%.


Reference String Recognition based on Word Sequence Tagging and Post-processing: Evaluation with English and German Datasets

  • Kang, In-Su
    • Journal of the Korea Society of Computer and Information / Vol. 23, No. 5 / pp.1-7 / 2018
  • Reference string recognition extracts individual reference strings from the reference section of an academic article, which consists of a sequence of reference lines. This task has been addressed by heuristic-based, clustering-based, and classification-based approaches that exploit the lexical and layout characteristics of reference lines. Most classification-based methods have used sequence labeling to assign labels either to a sequence of tokens within reference lines or to a sequence of reference lines. Unlike the previous token-level sequence labeling approach, this study assigns different labels to the beginning, intermediate and terminating tokens of a reference string. Post-processing is then applied to identify reference strings by predicting their beginning and/or terminating tokens. Experimental evaluation on English and German reference string recognition datasets shows that the proposed method achieves a macro-averaged F1 above 94%.
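
The post-processing idea in this abstract, recovering reference strings from beginning and/or terminating token labels, can be sketched as follows. This is an illustrative guess at one simple rule, not the paper's procedure: the label names "B", "I" and "E", the function name, and the example tokens are all hypothetical.

```python
def split_references(tokens, labels):
    """Group tokens into reference strings using B/I/E labels.

    A new string starts at a "B" token or right after an "E" token, so a
    boundary is still recovered when only one of the two labels is predicted.
    """
    references, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B" and current:
            # A beginning token closes any string still open.
            references.append(" ".join(current))
            current = []
        current.append(token)
        if label == "E":
            # A terminating token closes the current string.
            references.append(" ".join(current))
            current = []
    if current:
        references.append(" ".join(current))
    return references
```

Because either label alone is enough to place a boundary, a tagger error on a beginning token can be compensated by a correct terminating token on the previous string, and vice versa.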

PCA vs. ICA for Face Recognition

  • Lee, Oyoung; Park, Hyeyoung; Park, Seung-Jin
    • The Institute of Electronics Engineers of Korea: Conference Proceedings / The Institute of Electronics Engineers of Korea, 2000 ITC-CSCC -2 / pp.873-876 / 2000
  • The information-theoretic approach to face recognition is based on compact coding, in which face images are decomposed into a small set of basis images. The most popular method for compact coding is arguably principal component analysis (PCA), on which eigenface methods are based. PCA-based methods exploit only the second-order statistical structure of the data, so higher-order statistical dependencies among pixels are not considered. Independent component analysis (ICA) is a signal processing technique whose goal is to express a set of random variables as linear combinations of statistically independent component variables. ICA exploits the high-order statistical structure of the data, which contains important information. In this paper we employ ICA for efficient feature extraction from face images and show that ICA outperforms PCA in the task of face recognition. Experimental results using a simple nearest classifier and a multilayer perceptron (MLP) are presented to illustrate the performance of the proposed method.


Selective pole filtering based feature normalization for performance improvement of short utterance recognition in noisy environments

  • 최보경; 반성민; 김형순
    • Phonetics and Speech Sciences / Vol. 9, No. 2 / pp.103-110 / 2017
  • The pole filtering concept has been successfully applied to cepstral feature normalization techniques for noise-robust speech recognition. In this paper, we propose applying pole filtering selectively, only to the speech intervals, in order to further improve recognition performance for short utterances in noisy environments. Experimental results on the AURORA 2 task with clean-condition training show that the proposed selectively pole-filtered cepstral mean normalization (SPFCMN) and selectively pole-filtered cepstral mean and variance normalization (SPFCMVN) yield error rate reductions of 38.6% and 45.8%, respectively, compared to the baseline system.
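
For orientation, the sketch below shows plain per-utterance cepstral mean normalization (CMN) and cepstral mean and variance normalization (CMVN), the standard techniques that the proposed SPFCMN and SPFCMVN build on. It does not implement pole filtering or the selective application to speech intervals described in the abstract. The function name and the list-of-lists input layout are assumptions for illustration.

```python
import math

def cmvn(frames, normalize_variance=True):
    """Per-utterance cepstral mean (and optionally variance) normalization.

    frames: list of equal-length cepstral vectors, one per frame.
    With normalize_variance=False this reduces to plain CMN.
    """
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    if normalize_variance:
        # Fall back to 1.0 when a coefficient is constant, to avoid dividing by zero.
        stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n) or 1.0
                for d in range(dim)]
    else:
        stds = [1.0] * dim
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]
```

For short utterances the mean and variance are estimated from few frames, which is the setting in which the abstract reports gains from its selective pole filtering.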