• Title/Abstract/Keywords: Recognition Improvement

Search results: 1,491 items (processing time: 0.023 s)

Intra- and Inter-frame Features for Automatic Speech Recognition

  • Lee, Sung Joo;Kang, Byung Ok;Chung, Hoon;Lee, Yunkeun
    • ETRI Journal / Vol. 36, No. 3 / pp.514-517 / 2014
  • In this paper, alternative dynamic features for speech recognition are proposed. The goal of this work is to improve speech recognition accuracy by deriving a representation of distinctive dynamic characteristics from the speech spectrum. This work was inspired by two temporal dynamics of a speech signal. One is the highly non-stationary nature of speech, and the other is the inter-frame change of the speech spectrum. We adopt a sub-frame spectrum analyzer to capture very rapid spectral changes within a speech analysis frame. In addition, we attempt to measure spectral fluctuations in a more complex manner than traditional dynamic features such as delta or double-delta do. To evaluate the proposed features, speech recognition tests in smartphone environments were conducted. The experimental results show that feature streams simply combined with the proposed features are effective in improving the recognition accuracy of a hidden Markov model-based speech recognizer.
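
For context, the traditional delta and double-delta features that the abstract uses as its baseline are commonly computed with a regression over neighboring frames. The following is a minimal sketch of that conventional computation only, not of the proposed intra- and inter-frame features. The function name `delta`, the window of 2 frames, and the edge-padding choice are assumptions made for illustration.

```python
def delta(frames, window=2):
    """Regression-based delta over a sequence of feature vectors.

    frames: list of equal-length lists, one feature vector per frame.
    window: number of frames on each side used in the regression
            (assumed value; the abstract does not state one).
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                # Clamp indices so the first and last frames are repeated
                # at the edges (edge padding).
                ahead = frames[min(t + n, num_frames - 1)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


# A one-dimensional linear ramp: the delta is 1.0 away from the edges.
static = [[0.0], [1.0], [2.0], [3.0], [4.0]]
first_order = delta(static)        # delta features
second_order = delta(first_order)  # double-delta features
```

Applying `delta` twice gives the double-delta stream; in a conventional front end the static, delta, and double-delta vectors are concatenated frame by frame.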

유전자 알고리즘을 이용한 화자인식 시스템 성능 향상 (Performance Improvement of Speaker Recognition System Using Genetic Algorithm)

  • 문인섭;김종교
    • 한국음향학회지 (The Journal of the Acoustical Society of Korea) / Vol. 19, No. 8 / pp.63-67 / 2000
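  • This paper studies text-prompted speaker recognition based on dynamic time warping (DTW) as a way to improve speaker recognition performance. A genetic algorithm is applied to generate reference patterns that reflect speaker characteristics well, which is an important factor in speaker recognition. In addition, text-prompted speaker recognition is used to overcome the drawbacks of text-dependent and text-independent speaker recognition. Speaker identification experiments on a closed set and speaker verification experiments on an open set show that reference patterns produced by the genetic algorithm outperform those of the conventional method in both recognition rate and recognition speed.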
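
The DTW matching named in this abstract can be sketched as follows. This is a generic textbook DTW over one-dimensional sequences, assuming an absolute-difference local cost; it does not reproduce the genetic-algorithm step that generates the reference patterns, and the names `dtw_distance`, `identify`, and the toy speaker labels are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays on the same frame
                cost[i][j - 1],      # b advances, a stays on the same frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def identify(test, references):
    """Return the label of the reference pattern closest to the test pattern."""
    return min(references, key=lambda name: dtw_distance(test, references[name]))


# Toy reference patterns (illustrative values only).
references = {"speaker_a": [1, 2, 3, 4], "speaker_b": [4, 3, 2, 1]}
best = identify([1, 1, 2, 3, 4], references)
```

Closed-set identification picks the nearest reference, as above; open-set verification would additionally compare the best distance against a threshold.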

Improvement of Recognition Performance for Limabeam Algorithm by using MLLR Adaptation

  • Nguyen, Dinh Cuong;Choi, Suk-Nam;Chung, Hyun-Yeol
    • 대한임베디드공학회논문지 (IEMEK Journal of Embedded Systems and Applications) / Vol. 8, No. 4 / pp.219-225 / 2013
  • This paper presents a method that uses Maximum-Likelihood Linear Regression (MLLR) adaptation to improve the recognition performance of the Limabeam algorithm for speech recognition with a microphone array. Our investigation of the Limabeam algorithm shows that the performance of filter optimization depends strongly on the supporting optimal state sequence, which is produced by the Viterbi algorithm using a trained HMM. We therefore propose an approach that applies MLLR adaptation when recognizing speech uttered in a new environment, in order to obtain a better optimal state sequence to support the optimization step for the filter parameters. Experimental results show that the system with MLLR adaptation achieves a word correct rate 2% higher than that of the original calibrate Limabeam and 7% higher than that of the Delay and Sum algorithm. The best recognition accuracy of 89.4% is obtained when 4 microphones and 5 adaptation utterances are used.
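
The Delay and Sum baseline mentioned in this abstract can be illustrated with a minimal sketch. It assumes that the per-microphone delays are already known and are whole numbers of samples; the function name `delay_and_sum` and the toy signals are hypothetical, and nothing here reproduces Limabeam or MLLR adaptation.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay, then average.

    channels: list of sample lists, one per microphone.
    delays:   arrival delay of the source at each microphone, in samples
              (assumed known and non-negative).
    """
    # Only keep the span where every delayed channel still has samples.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    output = []
    for t in range(length):
        total = sum(ch[d + t] for ch, d in zip(channels, delays))
        output.append(total / len(channels))
    return output


# The same source reaches the second microphone one sample later.
mic0 = [1.0, 2.0, 3.0, 4.0, 0.0]
mic1 = [0.0, 1.0, 2.0, 3.0, 4.0]
enhanced = delay_and_sum([mic0, mic1], delays=[0, 1])
```

After alignment the source adds coherently while uncorrelated noise tends to average out, which is why this simple beamformer is a common baseline for microphone-array front ends.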

고차 반사계수 특성을 이용한 화자인식의 성능 향상에 관한 연구 (On a Study of the Improvement of Speaker Recognition with Characteristics of High Order Reflection Coefficients)

  • 이윤주;오세영;함명규;배명진
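    • 대한전자공학회:학술대회논문집 (Institute of Electronics Engineers of Korea: Conference Proceedings) / 대한전자공학회 1999년도 하계종합학술대회 논문집 (Proceedings of the 1999 Summer Conference) / pp.667-670 / 1999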
  • As the number of reference patterns increases in text-dependent speaker recognition, the recognition performance of the system degrades. Hence, if the number of reference patterns is reduced, a high recognition rate can be obtained, because the speaker recognizer gains higher discrimination. In this paper, to decrease the number of reference patterns, we choose candidate reference patterns for pattern matching with the test pattern by using the high-order components of the reflection coefficients of the uttered speech signal. Consequently, the total recognition rate of the proposed method is about 2% higher than that of the conventional method.

The Performance Improvement of Speech Recognition System based on Stochastic Distance Measure

  • Jeon, B.S.;Lee, D.J.;Song, C.K.;Lee, S.H.;Ryu, J.W.
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 4, No. 2 / pp.254-258 / 2004
  • In this paper, we propose a speech recognition system that is robust in noisy environments. Since the presence of noise severely degrades the performance of a speech recognition system, it is important to design a speech recognition method that is robust against noise. The proposed method adopts a new distance measure based on stochastic probability instead of the conventional method using minimum error. To evaluate the proposed method, we compared it with the conventional distance measure on 10 isolated Korean digits with car noise. The proposed method showed a better recognition rate than the conventional distance measure across the various car-noise environments.

색 및 패턴 정보 다중화를 이용한 칼라 QR코드의 비트 인식률 개선 (Improvement of Bit Recognition Rate for Color QR Codes By Multiplexing Color and Pattern Information)

  • 김진수
    • 한국멀티미디어학회논문지 (Journal of Korea Multimedia Society) / Vol. 24, No. 8 / pp.1012-1019 / 2021
  • Because black-and-white QR (Quick Response) codes have limited storage capacity, color QR codes are being actively studied. By multiplexing three colors, color QR codes can triple the code capacity; however, color multiplexing introduces the possibility of crosstalk and noise when the final image is acquired, which lowers the bit recognition rate. To improve the bit recognition rate while keeping the storage capacity high, this paper proposes a new type of color QR code that uses pattern information as well as color information, and then analyzes how to increase the bit recognition rate. To this end, the paper presents an efficient system that extracts the embedded information from a color QR code; practical experiments show that the proposed color QR codes improve the bit recognition rate and are useful for commercial applications compared with conventional color codes.

A Study on the Performance Analysis of Entity Name Recognition Techniques Using Korean Patent Literature

  • Gim, Jangwon
    • 한국정보기술학회 영문논문지 (English-language journal of the Korean Institute of Information Technology) / Vol. 10, No. 2 / pp.139-151 / 2020
  • Entity name recognition is a part of information extraction that extracts entity names from documents and classifies the types of the extracted entity names. Entity name recognition technologies are widely used in natural language processing, such as information retrieval, machine translation, and query response systems. Various deep learning-based models exist to improve entity name recognition performance, but studies that compare and analyze these models on Korean data are scarce. In this paper, we compare and analyze the performance of CRF, LSTM-CRF, BiLSTM-CRF, and BERT, which are actively used to identify entity names, using Korean data. We also compare and evaluate whether embedding models, which are widely used in recent natural language processing tasks, affect the performance of the entity name recognition model. Experiments on patent data and a Korean corpus confirmed that BiLSTM-CRF with the FastText method showed the highest performance.

KMSAV: Korean multi-speaker spontaneous audiovisual dataset

  • Kiyoung Park;Changhan Oh;Sunghee Dong
    • ETRI Journal / Vol. 46, No. 1 / pp.71-81 / 2024
  • Recent advances in deep learning for speech and visual recognition have accelerated the development of multimodal speech recognition, yielding many innovative results. We introduce a Korean audiovisual speech recognition corpus. This dataset comprises approximately 150 h of manually transcribed and annotated audiovisual data, supplemented with an additional 2000 h of untranscribed videos collected from YouTube under the Creative Commons License. The dataset is intended to be freely accessible for unrestricted research purposes. Along with the corpus, we propose an open-source framework for automatic speech recognition (ASR) and audiovisual speech recognition (AVSR). We validate the effectiveness of the corpus with evaluations using state-of-the-art ASR and AVSR techniques, capitalizing on both pretrained models and fine-tuning processes. After fine-tuning, ASR and AVSR achieve character error rates of 11.1% and 18.9%, respectively. This error difference highlights the need for improvement in AVSR techniques. We expect that our corpus will be an instrumental resource to support improvements in AVSR.
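
The character error rate reported in this abstract is conventionally defined as the character-level edit distance between the hypothesis and the reference, divided by the reference length. The sketch below shows that conventional definition; whether the authors count spaces or apply text normalization is not stated in the abstract, so this version simply compares the strings as given. The names `edit_distance` and `character_error_rate` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: substitutions, deletions, and insertions cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference character
                curr[j - 1] + 1,         # insert a hypothesis character
                prev[j - 1] + (r != h),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """Edit distance normalized by the number of reference characters."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# One substituted character out of four reference characters.
example_cer = character_error_rate("abcd", "abxd")
```

Because the function iterates over Python strings, it works unchanged on Hangul text, where each precomposed syllable block counts as one character.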

동작 인식 게임의 융합 발전 방향 (A Study on Convergence Development Direction of Gesture Recognition Game)

  • 이면재
    • 한국융합학회논문지 (Journal of the Korea Convergence Society) / Vol. 5, No. 4 / pp.1-7 / 2014
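  • Gesture recognition is a technology that recognizes and processes motions, offering users convenience and intuitiveness. Because of these advantages, gesture recognition technology is being converged with and applied to many fields, including the military, medicine, and education. In games in particular, gesture recognition lets players play with movements similar to real ones, and for this reason gesture recognition games are being converged with fields such as medicine, the military, and education. Against this background, this paper discusses the direction of convergence development for gesture recognition games. To this end, it reviews the current state of gesture recognition technology and games, and describes the problems of gesture recognition games and ways to improve them. This paper can help improve the convergence competitiveness of Korean gesture recognition games.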

음향학적 및 언어적 탐색을 이용한 어휘 인식 최적화 (The Vocabulary Recognition Optimize using Acoustic and Lexical Search)

  • 안찬식;오상엽
    • 한국멀티미디어학회논문지 (Journal of Korea Multimedia Society) / Vol. 13, No. 4 / pp.496-503 / 2010
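  • Vocabulary recognition systems are being developed as standalone systems, and when they are used on portable terminals their recognition rate is low because of limited memory and audio compression. To improve performance and recognition rate on portable terminals, this study proposes a system that improves vocabulary recognition speed by separating the acoustic search from the lexical search. The acoustic search runs on the portable terminal and the more complex lexical search is handled by a server: feature vectors are extracted from the speech signal, phoneme recognition is performed using a GMM, and the recognized phoneme sequence is sent to the server, where vocabulary recognition is performed in the lexical search stage using a lexical tree search algorithm. In the system evaluation, the vocabulary-dependent recognition rate was 98.01%, the vocabulary-independent recognition rate was 97.71%, and the recognition time was 1.58 seconds.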
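
The server-side lexical search over a recognized phoneme sequence can be pictured as a walk through a lexical tree (a prefix tree keyed by phonemes). The sketch below is a simplified illustration that assumes exact phoneme matching; a real system would also have to tolerate phoneme recognition errors, which the abstract does not detail. The toy lexicon, the phoneme symbols, and the names `build_lexical_tree` and `lookup` are hypothetical.

```python
END = "<end>"  # marker key for "a word ends here" (assumed not to be a phoneme)


def build_lexical_tree(lexicon):
    """Build a prefix tree from {word: [phoneme, ...]} entries."""
    root = {}
    for word, phonemes in lexicon.items():
        node = root
        for phoneme in phonemes:
            # Words that share a phoneme prefix share the same branch.
            node = node.setdefault(phoneme, {})
        node[END] = word
    return root


def lookup(tree, phonemes):
    """Follow the phoneme sequence; return the word it spells, or None."""
    node = tree
    for phoneme in phonemes:
        node = node.get(phoneme)
        if node is None:
            return None  # no word in the lexicon starts with this prefix
    return node.get(END)


# Toy lexicon (illustrative entries only).
lexicon = {
    "cat": ["k", "ae", "t"],
    "cap": ["k", "ae", "p"],
    "dog": ["d", "ao", "g"],
}
tree = build_lexical_tree(lexicon)
recognized = lookup(tree, ["k", "ae", "t"])
```

Sharing prefixes keeps the tree compact and lets the search discard whole groups of words as soon as one phoneme fails to match, which suits the split described above, where the terminal only sends a phoneme sequence and the server does the vocabulary lookup.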