Search | Korea Science

The Effect of the Number of Training Data on Speech Recognition

Lee, Chang-Young
- The Journal of the Acoustical Society of Korea / v.28 no.2E / pp.66-71 / 2009
In practical applications of speech recognition, a fundamental question is how much training data should be provided for a specific task. Although more training data would undoubtedly improve system performance, it comes at a heavy cost. It is therefore crucial to determine the smallest amount of training data that affords a given level of accuracy. To this end, we investigate the effect of the number of training data on speaker-independent recognition of isolated words using FVQ/HMM. The results show that the error rate is roughly inversely proportional to the number of training data and grows linearly with the vocabulary size.
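The reported scaling can be turned into a rough sizing rule. The sketch below assumes the relation holds exactly as E = k * V / N, where E is the error rate, V the vocabulary size, N the number of training data, and k a task-dependent constant. The constant k, the function names, and the example numbers are illustrative assumptions and are not taken from the paper.

```python
import math


def fit_constant(error_rate, n_train, vocab_size):
    """Solve E = k * V / N for k from one measured operating point (assumed model)."""
    return error_rate * n_train / vocab_size


def min_training_size(target_error, vocab_size, k):
    """Smallest integer N for which k * V / N does not exceed target_error."""
    return math.ceil(k * vocab_size / target_error)


# Hypothetical calibration point: 12.5% error with 64 training data, 32 words.
k = fit_constant(error_rate=0.125, n_train=64, vocab_size=32)
needed = min_training_size(target_error=0.03125, vocab_size=32, k=k)
print(f"k = {k}, training data needed = {needed}")
```

Because the abstract states the relation only as approximate, an estimate obtained this way is a starting point for planning data collection, not a guarantee of accuracy.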

A Study of Preprocessing in the Speech Recognition System Using HMM Algorithm (HMM을 이용한 음성인식 시스템의 전처리에 관한 연구)

이윤주;오세영;이순규;배명진
- Proceedings of the IEEK Conference / 1999.11a / pp.668-671 / 1999
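The population of computer users in modern society keeps growing in both range and number, and this trend is expected to continue. Many people therefore want computers to be more convenient and easier to learn, and want to use them more in everyday life. Speech, the medium most familiar to humans, can meet these needs and make computers more accessible. The aim of this paper is thus to let users work faster by using speech, the basic means of human communication, as the human-machine interface. In existing recognition algorithms, higher complexity raises the recognition rate but has the drawback of long computation time. This added computation time can interfere with other programs running on a computer in a Windows environment. Methods are therefore needed that raise the recognition rate while reducing recognition time. Based on the commands used when operating a computer, this paper proposes a method of selecting candidate reference patterns to speed up recognition.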

Isolated Word Recognition Using a Speaker-Adaptive Neural Network (화자적응 신경망을 이용한 고립단어 인식)

이기희;임인칠
- Journal of the Korean Institute of Telematics and Electronics B / v.32B no.5 / pp.765-776 / 1995
This paper describes a speaker adaptation method to improve the recognition performance of an MLP (multilayer perceptron) based HMM (hidden Markov model) speech recognizer. In this method, we use a 1st-order linear transformation network to fit the data of a new speaker to the MLP. The transformation parameters are adjusted by back-propagating the classification error to the transformation network while leaving the MLP classifier fixed. The recognition system is based on semicontinuous HMMs that use the MLP as a fuzzy vector quantizer. The experimental results show that this method achieves rapid speaker adaptation with high recognition performance. Specifically, for supervised adaptation, the error rate is significantly reduced from 9.2% for the baseline system to 5.6% after speaker adaptation. For unsupervised adaptation, the error rate is reduced to 5.1%, without any information from new speakers.

Off-line Recognition of Handwritten Korean and Alphanumeric Characters Using Hidden Markov Models (Hidden Markov Model을 이용한 필기체 한글 및 영.숫자 오프라인 인식)

김우성;박래홍
- Journal of the Korean Institute of Telematics and Electronics B / v.31B no.9 / pp.85-100 / 1994
This paper proposes a recognition system for constrained handwritten Hangul and alphanumeric characters using discrete hidden Markov models (HMMs). The HMM process encodes the distortion and similarity among patterns of a class through a doubly stochastic approach. By characterizing the statistical properties of characters using selected features, a recognition system can be implemented that absorbs possible variations in form. Hangul shapes are classified into six types by fuzzy inference, and they are recognized from quantized features, with the features optimally ordered according to their effectiveness in each class. Constrained alphanumeric recognition is performed using the same features as in Hangul recognition. The forward-backward, Viterbi, and Baum-Welch reestimation algorithms are used for training and recognition of handwritten Hangul and alphanumeric characters. Simulation results show that the proposed method recognizes handwritten Korean characters and alphanumerics effectively.
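As an illustration of the recognition step with discrete HMMs, the following is a minimal Viterbi sketch in plain Python. It is a generic textbook formulation, not the paper's implementation: the model topology, the feature quantization, and all names and probabilities here are hypothetical. A quantized feature sequence is scored against one HMM per class, and the class with the highest best-path score is chosen.

```python
import math


def _log(p):
    """Logarithm that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, start_p, trans_p, emit_p):
    """Return (best state path, its log-probability) for a discrete HMM."""
    n = len(start_p)
    # delta[s]: best log-probability of any path that ends in state s
    delta = [_log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in range(n)]
    backpointers = []
    for symbol in obs[1:]:
        prev, delta, step = delta, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + _log(trans_p[r][s]))
            step.append(best)
            delta.append(prev[best] + _log(trans_p[best][s]) + _log(emit_p[s][symbol]))
        backpointers.append(step)
    last = max(range(n), key=lambda s: delta[s])
    path = [last]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    path.reverse()
    return path, delta[last]


def classify(obs, models):
    """Pick the class whose HMM gives the highest Viterbi score."""
    return max(models, key=lambda name: viterbi(obs, *models[name])[1])


# Hypothetical two-state, left-to-right models over a two-symbol codebook.
trans = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "A": ([1.0, 0.0], trans, [[0.9, 0.1], [0.1, 0.9]]),
    "B": ([1.0, 0.0], trans, [[0.1, 0.9], [0.9, 0.1]]),
}
print(classify([0, 0, 1, 1], models))
```

Working in the log domain avoids numerical underflow on long observation sequences, which is why the probabilities are summed as logarithms instead of multiplied.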

Speech Feature Extraction Based on the Human Hearing Model

Chung, Kwang-Woo;Kim, Paul;Hong, Kwang-Seok
- Proceedings of the KSPS conference / 1996.10a / pp.435-447 / 1996
In this paper, we propose a method that extracts speech features using a hearing model implemented through signal processing techniques. The proposed method consists of the following steps: normalization of the short-time speech block by its maximum value, multi-resolution analysis using the discrete wavelet transform and re-synthesis using the inverse discrete wavelet transform, differentiation after analysis and synthesis, full-wave rectification, and integration. To verify the performance of the proposed speech feature in a speech recognition task, Korean digit recognition experiments were carried out using both DTW and VQ-HMM. The results showed that, with DTW, the recognition rates were 99.79% and 90.33% for the speaker-dependent and speaker-independent tasks respectively, and, with VQ-HMM, the rates were 96.5% and 81.5% respectively. This indicates that the proposed speech feature has potential as a simple and efficient feature for recognition tasks.
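The processing chain listed in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not name the wavelet, the number of decomposition levels, or the block length, so the Haar wavelet, `levels=2`, and all function names are hypothetical choices made only to keep the example self-contained.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(x):
    """One level of Haar analysis: (approximation, detail) coefficients."""
    pairs = [(x[i], x[i + 1]) for i in range(0, len(x) - 1, 2)]
    return [(a + b) / SQRT2 for a, b in pairs], [(a - b) / SQRT2 for a, b in pairs]


def haar_idwt(approx, detail):
    """One level of Haar synthesis, the inverse of haar_dwt."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def band_signals(x, levels):
    """Analyze x, then re-synthesize each band alone as a time signal."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    bands = []
    for keep in range(levels + 1):  # index `levels` selects the approximation
        sig = approx if keep == levels else [0.0] * len(approx)
        for lvl in reversed(range(levels)):
            d = details[lvl] if lvl == keep else [0.0] * len(details[lvl])
            sig = haar_idwt(sig, d)
        bands.append(sig)
    return bands


def block_features(block, levels=2):
    """One feature per band; block length must be a multiple of 2**levels."""
    peak = max(abs(v) for v in block)
    x = [v / peak for v in block] if peak > 0 else list(block)  # normalization
    feats = []
    for sig in band_signals(x, levels):
        diff = [b - a for a, b in zip(sig, sig[1:])]   # differentiation
        feats.append(sum(abs(v) for v in diff))        # rectify, then integrate
    return feats


print(block_features([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]))
```

Each band is re-synthesized on its own so that the later steps operate on time-domain signals, matching the order of steps given in the abstract (analysis, synthesis, differentiation, rectification, integration).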

Analysis of Table Tennis Swing using Action Recognition (동작인식을 이용한 탁구 스윙 분석)

Heo, Geon;Ha, Jong-Eun
- Journal of Institute of Control, Robotics and Systems / v.21 no.1 / pp.40-45 / 2015
In this paper, we present an algorithm for analyzing poses in table tennis using action recognition. We use Kinect as the 3D sensor and the 3D skeleton data it provides for further processing. We adopt a spherical coordinate system and select features using k-means clustering. We automatically detect the starting and ending frames and classify table tennis actions into two groups, forehand and backhand swings. Each swing is modeled using an HMM (Hidden Markov Model), and we use a dataset composed of 200 sequences from two players. The two types of table tennis swing can be discriminated in real time. The algorithm can also provide analysis according to similarities found in good poses.
https://doi.org/10.5302/J.ICROS.2015.14.0078

A Study on the Non-keyword Models in the Keyword Spotting System using the Phone-Based Hidden Markov Models (음소 HMM을 이용한 Keyword Spotting 시스템에서의 Non-Keyword 모델에 관한 연구)

이활림
- Proceedings of the Acoustical Society of Korea Conference / 1995.06a / pp.83-87 / 1995
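Keyword spotting is a branch of speech recognition that determines whether input speech contains a predefined word, or any of several predefined words, and identifies that word. When a keyword spotting system is built from phone models, a keyword that must be added or changed can be modeled simply by concatenating phone models according to its pronunciation dictionary entry, which is an advantage over word-model-based approaches. In this paper, keyword models are built from HMMs with the triphone as the basic unit, and a keyword spotting system is constructed that uses them together with non-keyword models and a silence model. In such a system, the non-keyword model separates keywords from non-keyword speech, so an appropriate non-keyword model must be chosen to improve recognition performance. We compare three kinds of non-keyword model: a single model with 10 states, models obtained by clustering phones according to manner of articulation, and models obtained by clustering phones statistically. In speaker-independent keyword spotting experiments on 6 keywords, statistically clustering the phones into 6 or 7 groups gave the best recognition performance.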

On Codebook Design to Improve Speaker Adaptation (화자 적응 성능 향상을 위한 코드북 설계)

양태영
- Proceedings of the Acoustical Society of Korea Conference / 1995.06a / pp.228-231 / 1995
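A codebook transformation algorithm is proposed to improve the speaker adaptation performance of semi-continuous HMM speech recognition systems. Existing speaker adaptation algorithms include Bayesian speaker adaptation, which jointly considers the distribution of the new speaker's adaptation data features and the prior density of the HMM parameters. However, when the new speaker's feature distribution differs greatly from the prior density of the codebook, incorrect correspondences between the adaptation data and the codebook may be obtained. In addition, when the reference codebook contains more codewords than necessary, unneeded codewords remain in the adapted codebook and can confuse recognition. To solve these problems, the proposed codebook transformation algorithm uses formant information in the frequency domain. Before speaker adaptation, formants are extracted from the cepstra of the codebook, and their distribution is transformed to match the formant distribution of the adaptation speaker. Cepstra are then recomputed from the transformed formants to obtain a transformed codebook, which is used as the initial codebook for speaker adaptation. Experiments confirmed that, with the proposed algorithm, accurate correspondences between the codebook and the adaptation speaker's speech were found, and that unneeded codewords were transformed so that they were not used during recognition, which improved the recognition rate.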

A Study on the Speaker Adaptation in HMM Using Variable Number of Branches in Each State (상태당 가지수를 가변시킨 HMM을 이용한 화자적응화에 관한 연구)

김광태;서정일;한유수;홍재근
- The Journal of the Acoustical Society of Korea / v.17 no.3 / pp.90-95 / 1998
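This paper studies speaker adaptation methods for two kinds of CHMM, the CDHMM and the ARHMM. In the CDHMM, the model is adapted to a speaker by maximum a posteriori estimation using one branch per state. To represent the diverse acoustic characteristics of speech, this paper proposes using multiple branches per state. The appropriate number of branches for each state is determined from the number of frames belonging to the state and the determinant of the covariance matrix of its feature vectors. In the ARHMM, maximum a posteriori estimation cannot be used because linear prediction coefficients serve as the feature vectors. We therefore propose segmenting the adaptation speaker's speech by state with the Viterbi algorithm using the speaker-independent model, and then using the k-means algorithm to adapt to a model with one branch per state.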

Emotion Recognition using Speech Recognition Information (음성 인식 정보를 사용한 감정 인식)

Kim, Won-Gu
- Proceedings of the Korean Institute of Intelligent Systems Conference / 2008.04a / pp.425-428 / 2008
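To improve the performance of speech-based human emotion recognition, this paper studies an emotion recognition system combined with a speech recognition system that is robust to emotional variation. To this end, using a speech database containing various emotions, we first studied how emotional variation affects the performance of a speech recognition system and implemented a speech recognition system that is less affected by it. Emotion recognition is performed by comparing the emotion models for the input sentence according to the speech recognition result, which yields the final emotion decision for the input speech. In the experiments, the robust speech recognition system was an HMM-based speaker-independent word recognizer that used RASTA mel-cepstrum and delta cepstrum as speech parameters and CMS for signal bias removal. Emotion recognition combined with this speech recognizer performed better than the emotion recognizer alone.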

Search Result 963, Processing Time 0.03 seconds
