• Title/Abstract/Keywords: Recognition ratio

A Study on Utterance Verification Using Accumulation of Negative Log-likelihood Ratio

  • 한명희;이호준;김순협
    • The Journal of the Acoustical Society of Korea / Vol. 22, No. 3 / pp. 194-201 / 2003
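  • In speech recognition, confidence measurement decides whether a recognized result can be trusted. Confidence is measured by integrating frame-level scores up to the phoneme and word levels. In word recognition, confidence is used to verify recognition results and out-of-vocabulary words, so this post-processing improves performance by refusing to accept such inputs as recognition results. In this paper, confidence is measured with a modified version of the conventional log-likelihood ratio. When confidence is integrated from the frame level to the phoneme level, the proposed method accumulates only the log-likelihood ratios that are negative. Compared with the conventional method on the verification of word recognizer outputs, at the point where CAR (Correct Acceptance Ratio) is 90%, FAR (False Acceptance Ratio) was reduced by about 3.49% for out-of-vocabulary words and by 15.25% for misrecognitions.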
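
The accumulation rule described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the normalization by segment length, the averaging over phonemes, and the threshold test are all assumptions, since the abstract only states that negative frame-level log-likelihood ratios are the ones accumulated.

```python
from typing import List, Sequence, Tuple


def phoneme_confidence(frame_llrs: Sequence[float], negative_only: bool = True) -> float:
    """Combine the frame-level log-likelihood ratios of one phoneme segment."""
    if not frame_llrs:
        raise ValueError("a phoneme segment needs at least one frame")
    if negative_only:
        # Rule from the abstract: only negative LLRs are accumulated.
        kept = [llr for llr in frame_llrs if llr < 0.0]
    else:
        # Baseline for comparison: accumulate every frame.
        kept = list(frame_llrs)
    # Assumption: divide by the segment length so long phonemes do not dominate.
    return sum(kept) / len(frame_llrs)


def word_confidence(
    frame_llrs: Sequence[float],
    segments: List[Tuple[int, int]],
    negative_only: bool = True,
) -> float:
    """Average the phoneme scores of a word (assumed word-level integration).

    `segments` holds (start, end) frame indices per phoneme, end exclusive.
    """
    scores = [phoneme_confidence(frame_llrs[s:e], negative_only) for s, e in segments]
    return sum(scores) / len(scores)


def accept(word_score: float, threshold: float) -> bool:
    """Accept the recognition result when the confidence reaches the threshold."""
    return word_score >= threshold


if __name__ == "__main__":
    llrs = [0.5, -0.25, 1.0, -0.75, 0.5, 0.25]  # made-up frame scores
    phonemes = [(0, 3), (3, 6)]                  # two phonemes, three frames each
    print(word_confidence(llrs, phonemes))                       # negative-only score
    print(word_confidence(llrs, phonemes, negative_only=False))  # baseline score
```

With negative-only accumulation, a few strongly positive frames can no longer mask frames where the anti-model scores higher, which is one plausible reading of why the modification helps reject misrecognitions.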

Voice Recognition-Based on Adaptive MFCC and Deep Learning for Embedded Systems

  • 배현수;이호진;이석규
    • Journal of Institute of Control, Robotics and Systems / Vol. 22, No. 10 / pp. 797-802 / 2016
  • This paper proposes a novel voice recognition method based on an adaptive MFCC and deep learning for embedded systems. To enhance the recognition ratio of the proposed voice recognizer, ambient noise mixed into the voice signal has to be eliminated. However, noise filtering processes, which may damage voice data, diminish the recognition ratio. In this paper, a filter has been designed for the frequency range within a voice signal, and imposed weights are used to reduce data deterioration. In addition, a deep learning algorithm, which does not require a database in the recognition algorithm, has been adapted for embedded systems, which inherently have little memory. The experimental results suggest that the proposed deep learning algorithm and HMM voice recognizer, utilizing the proposed adaptive MFCC algorithm, achieve a better recognition ratio than conventional MFCC algorithms in a noisy environment.

Low-Quality Banknote Serial Number Recognition Based on Deep Neural Network

  • Jang, Unsoo;Suh, Kun Ha;Lee, Eui Chul
    • Journal of Information Processing Systems / Vol. 16, No. 1 / pp. 224-237 / 2020
  • Recognition of banknote serial numbers is an important function for implementing an intelligent banknote counter and can be used for various purposes. However, previous character recognition methods are of limited use because of the font type of the banknote serial number, the variation caused by soiling, and recognition speed. In this paper, we propose an aspect-ratio-based character region segmentation and a convolutional neural network (CNN) based banknote serial number recognition method. To detect the character region, the character area is determined from the aspect ratio of each character in the serial number candidate area after banknote area detection and de-skewing are performed. We then designed and compared four types of CNN models and determined the best model for serial number recognition. Experimental results showed that the recognition accuracy of each character was 99.85%. In addition, data augmentation was confirmed to improve recognition performance. The banknotes used in the experiment are Indian rupee notes, which are badly soiled and printed in an unusual font, so the result can be regarded as good performance. Recognition speed was also sufficient to run in real time on a device that counts 800 banknotes per minute.
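
The aspect-ratio test mentioned in this abstract might look like the sketch below. It assumes that candidate bounding boxes have already been extracted from the de-skewed serial number area; the `Box` type, the function name, and the ratio limits 0.4 and 0.9 are hypothetical values chosen for illustration and are not taken from the paper.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box of one candidate blob, in pixels."""
    x: int
    y: int
    w: int
    h: int


def select_character_boxes(
    boxes: List[Box],
    min_ratio: float = 0.4,  # hypothetical lower bound on width / height
    max_ratio: float = 0.9,  # hypothetical upper bound on width / height
) -> List[Box]:
    """Keep boxes whose aspect ratio fits a single character, ordered left to right."""
    kept = [b for b in boxes if b.h > 0 and min_ratio <= b.w / b.h <= max_ratio]
    return sorted(kept, key=lambda b: b.x)


if __name__ == "__main__":
    candidates = [
        Box(40, 10, 12, 20),  # ratio 0.60: plausible character
        Box(10, 10, 14, 20),  # ratio 0.70: plausible character
        Box(70, 12, 60, 18),  # ratio 3.33: too wide, e.g. merged blobs or a line
        Box(25, 10, 3, 20),   # ratio 0.15: too narrow, e.g. a stain or a stroke
    ]
    for box in select_character_boxes(candidates):
        print(box)
```

Each surviving box would then be cropped and passed to the character classifier; in practice the ratio limits would be tuned to the font of the serial number.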

Snorer-Dependent Snore Recognition Using LPC Cepstral Coefficients

  • 최호선;장원규;이경중
    • The Transactions of the Korean Institute of Electrical Engineers D (Systems and Control) / Vol. 52, No. 9 / pp. 554-559 / 2003
  • This paper suggests the possibility of snorer-dependent snore recognition using cepstral coefficients. We assumed that snore and speech sounds have some similarities, and we used cepstral coefficients, which are widely used for speech recognition. Snoring data were acquired from 18 persons, including 5 patients diagnosed with snoring. To evaluate the performance of the proposed method, the distance ratio based on LPC cepstral coefficients was selected as an index for snorer-dependent snore recognition. A distance ratio of 3 was selected as the optimal value, giving the most efficient snorer-dependent snore recognition with an average accuracy of 95.05%. In conclusion, the proposed method shows potential for clinical application to snorer-dependent snore recognition.

Real-Time Object Recognition for Children Education Applications based on Augmented Reality

  • 박강규;이강
    • Journal of Korea Multimedia Society / Vol. 20, No. 1 / pp. 17-31 / 2017
  • The aim of this paper is to present an object recognition method for an augmented reality system that uses existing education instruments, which were designed without any consideration of image processing and recognition. The light reflection, sizes, shapes, and color range of the existing target education instruments are major hurdles for object recognition. In addition, the real-time performance requirements on embedded devices and the user experience constraints for child users are challenging issues for our image processing and object recognition approach. To meet these requirements, we cascaded lightweight weak classification methods that are complementary to each other, yielding a composite, highly accurate object classifier with a practically reasonable precision ratio. We implemented the proposed method and tested its performance on video with more than 11,700 frames of an actual playing scenario. The experimental results showed a 0.54% miss ratio and a 1.35% false hit ratio.

An Improvement of Korean Speech Recognition Using a Compensation of the Speaking Rate by the Ratio of a Vowel length

  • 박준배;김태준;최성용;이정현
    • The Institute of Electronics Engineers of Korea: Conference Proceedings / Proceedings of the IEEK 2003 Computer Society Fall Conference / pp. 195-198 / 2003
  • The accuracy of an automatic speech recognition system depends on background noise and on speaker variability such as sex, intonation, and speaking rate. In particular, inter-speaker and intra-speaker variation in speaking rate is a serious cause of misrecognition. In this paper, we propose a method that compensates for speaking rate using the ratio of each vowel's length in a phrase. First, the number of feature vectors in a phrase is estimated from the speaking rate. Second, the estimated number of feature vectors is distributed among the syllables of the phrase according to the ratio of their vowel lengths. Finally, feature vectors are extracted using the number assigned to each syllable in the phrase. As a result, the accuracy of automatic speech recognition improved with the proposed speaking rate compensation method.
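
The second step of this abstract, distributing a fixed number of feature vectors among syllables in proportion to vowel length, can be sketched as below. The function name and the largest-remainder rounding are assumptions; the abstract does not say how fractional shares are rounded.

```python
from typing import List, Sequence


def allocate_frames(total_frames: int, vowel_lengths: Sequence[float]) -> List[int]:
    """Split `total_frames` feature vectors among syllables by vowel-length ratio.

    Rounding uses the largest-remainder rule (an assumption) so that the
    per-syllable counts always add up to `total_frames`.
    """
    if total_frames < 0:
        raise ValueError("total_frames must be non-negative")
    if any(v < 0 for v in vowel_lengths):
        raise ValueError("vowel lengths must be non-negative")
    total_length = sum(vowel_lengths)
    if total_length <= 0:
        raise ValueError("at least one vowel must have a positive length")

    # Ideal (fractional) share of each syllable.
    shares = [total_frames * v / total_length for v in vowel_lengths]
    counts = [int(s) for s in shares]  # floor, since every share is non-negative
    leftover = total_frames - sum(counts)

    # Hand the remaining frames to the syllables with the largest fractional parts.
    order = sorted(range(len(shares)), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Ten feature vectors for a three-syllable phrase with vowel lengths in ms.
    print(allocate_frames(10, [80, 120, 100]))
```

A slower utterance of the same phrase would receive the same total but stretched vowels would keep their relative share, which is the normalization effect the abstract describes.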

Sentence Rejection using Word Spotting Ratio in the Phoneme-based Recognition Network

  • 김형태;하진영
    • The Korean Society of Phonetic Sciences: Conference Proceedings / Proceedings of the KSPS 2005 Spring Conference / pp. 99-102 / 2005
  • Research efforts have been made on out-of-vocabulary word rejection to improve the confidence of speech recognition systems. However, little attention has been paid to non-recognition sentence rejection. With the appearance of pronunciation correction systems that use speech recognition technology, non-recognition sentences need to be rejected to provide users with more accurate and robust results. In this paper, we introduce a standard phoneme-based sentence rejection system that needs no special filler models. Instead, we use the word spotting ratio to determine whether an input sentence is accepted or rejected. Experimental results show that comparable performance, in terms of the average of FRR and FAR, can be achieved using only a standard phoneme-based recognition network.
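
One plausible reading of the word spotting ratio is the fraction of target-sentence words that the recognizer detected. The sketch below uses that reading; the definition, the function names, and the 0.6 threshold are assumptions, since the abstract does not define the ratio or report a threshold.

```python
from collections import Counter
from typing import Sequence


def word_spotting_ratio(target_words: Sequence[str], spotted_words: Sequence[str]) -> float:
    """Fraction of target words found in the recognizer output (assumed definition)."""
    if not target_words:
        raise ValueError("the target sentence must contain at least one word")
    target = Counter(target_words)
    spotted = Counter(spotted_words)
    # A repeated target word only counts as often as it was actually spotted.
    hits = sum(min(count, spotted[word]) for word, count in target.items())
    return hits / len(target_words)


def accept_sentence(
    target_words: Sequence[str],
    spotted_words: Sequence[str],
    threshold: float = 0.6,  # hypothetical operating point
) -> bool:
    """Accept the utterance as the target sentence when enough words were spotted."""
    return word_spotting_ratio(target_words, spotted_words) >= threshold


if __name__ == "__main__":
    target = ["i", "would", "like", "some", "coffee"]
    heard = ["i", "like", "coffee", "please"]
    print(word_spotting_ratio(target, heard))
    print(accept_sentence(target, heard, threshold=0.5))
```

This version ignores word order; a stricter variant could require the spotted words to appear in the target order.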

Comparison Research of Non-Target Sentence Rejection on Phoneme-Based Recognition Networks

  • 김형태;하진영
    • Journal of the Korean Society of Phonetic Sciences: Malsori / No. 59 / pp. 27-51 / 2006
  • For speech recognition systems, a rejection function as well as a decoding function is necessary to improve reliability. There have been many research efforts on out-of-vocabulary word rejection; however, little attention has been paid to non-target sentence rejection. Recently, pronunciation approaches using speech recognition have increased the need for non-target sentence rejection to provide more accurate and robust results. In this paper, we propose a filler model method and a word/phoneme detection ratio method to implement a non-target sentence rejection system. We evaluated filler models at the word level, phoneme level, and sentence level. We also performed a similar experiment using the word-level and phoneme-level word/phoneme detection ratio methods. For the performance evaluation, the minimized average of FAR and FRR was used to compare the effectiveness of each method as a function of the number of words in the given sentences. The experimental results show that the word-level methods outperform the other methods, and that the word-level filler model gives slightly better results than the word detection ratio method.
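
The evaluation metric named in this abstract, the minimized average of FAR and FRR, can be computed by sweeping a decision threshold over the observed scores. The sketch below assumes that higher scores mean "more likely a target sentence" and that a sentence is accepted when its score is at or above the threshold; the function names and the example scores are made up.

```python
from typing import Sequence, Tuple


def far_frr(
    target_scores: Sequence[float],
    nontarget_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (FAR, FRR) when scores >= threshold are accepted."""
    # FRR: target sentences wrongly rejected.
    frr = sum(s < threshold for s in target_scores) / len(target_scores)
    # FAR: non-target sentences wrongly accepted.
    far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return far, frr


def min_average_error(
    target_scores: Sequence[float],
    nontarget_scores: Sequence[float],
) -> Tuple[float, float]:
    """Return (threshold, (FAR + FRR) / 2) at the threshold minimizing the average."""
    candidates = sorted(set(target_scores) | set(nontarget_scores))
    # One extra threshold above every score, i.e. "reject everything".
    candidates.append(candidates[-1] + 1.0)

    def average_error(t: float) -> float:
        far, frr = far_frr(target_scores, nontarget_scores, t)
        return (far + frr) / 2.0

    best = min(candidates, key=average_error)  # first minimum on ties
    return best, average_error(best)


if __name__ == "__main__":
    targets = [0.9, 0.8, 0.7, 0.4]      # made-up scores of target sentences
    nontargets = [0.6, 0.3, 0.2, 0.1]   # made-up scores of non-target sentences
    print(min_average_error(targets, nontargets))
```

Only the observed scores need to be tried as thresholds, because FAR and FRR change only when the threshold crosses one of them.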

Voice Recognition Based on Adaptive MFCC and Neural Network

  • 배현수;이석규
    • IEMEK Journal of Embedded Systems and Applications / Vol. 5, No. 2 / pp. 57-66 / 2010
  • In this paper, we propose an enhanced voice recognition algorithm using adaptive MFCC (Mel Frequency Cepstral Coefficients) and a neural network. Although extracting voice data from the raw data is very important for enhancing the voice recognition ratio, conventional algorithms tend to deteriorate voice data when they eliminate noise within a particular frequency band. Unlike conventional MFCC, the proposed algorithm imposes larger weights on specified frequency regions and uses a non-overlapping filterbank to enhance the recognition ratio without deteriorating voice data. In simulation results, the proposed algorithm performs better than MFCC because it is robust to variation in the environment.

The Effect of Signal-to-Noise Ratio on Sentence Recognition Performance in Pre-school Age Children with Hearing Impairment

  • 이미숙
    • Phonetics and Speech Sciences / Vol. 3, No. 1 / pp. 117-123 / 2011
  • Most individuals with hearing impairment have difficulty understanding speech in noisy situations. This study investigated sentence recognition ability using the Korean Standard-Sentence Lists for Preschoolers (KS-SL-P2) in pre-school age children with cochlear implants and hearing aids. The subjects were 10 pre-school age children with hearing aids, 12 pre-school age children with cochlear implants, and 10 pre-school age children with normal hearing. Three signal-to-noise ratio (SNR) conditions (+10 dB, +5 dB, 0 dB) were applied. All pre-school age children with cochlear implants and hearing aids showed a significant increase in sentence recognition score as SNR increased. The highest sentence recognition scores in speech noise were obtained at an SNR of +10 dB. Significant differences existed between groups in sentence recognition ability, with the cochlear implant group performing better than the hearing aid group. These findings suggest that a sentence recognition test using speech noise is useful for evaluating pre-school age children's listening skills.
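
For readers unfamiliar with how stimuli at a given SNR are prepared, the sketch below scales a noise signal so that the speech-to-noise power ratio equals a target value in dB. It uses the standard definition SNR = 10 log10(P_speech / P_noise); the function names and sample values are illustrative and do not describe the procedure used in the study.

```python
import math
from typing import List, Sequence, Tuple


def mean_power(signal: Sequence[float]) -> float:
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in signal) / len(signal)


def mix_at_snr(
    speech: Sequence[float],
    noise: Sequence[float],
    snr_db: float,
) -> Tuple[List[float], List[float]]:
    """Scale `noise` to reach `snr_db` relative to `speech`; return (mixture, scaled noise)."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech, p_noise = mean_power(speech), mean_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise must not be silent")
    # Solve 10 * log10(p_speech / (gain**2 * p_noise)) = snr_db for gain.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


if __name__ == "__main__":
    speech = [0.5, -0.5, 0.25, -0.25]  # toy samples
    noise = [0.1, -0.2, 0.3, -0.1]
    for snr in (10.0, 5.0, 0.0):       # the three conditions named in the abstract
        _, scaled = mix_at_snr(speech, noise, snr)
        print(snr, 10.0 * math.log10(mean_power(speech) / mean_power(scaled)))
```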
