Search | Korea Science

On-Line Blind Channel Normalization for Noise-Robust Speech Recognition

Jung, Ho-Young
IEIE Transactions on Smart Processing and Computing, Vol. 1, No. 3, pp. 143-151, 2012
A new data-driven method for the design of a blind modulation frequency filter that suppresses slow-varying noise components is proposed. The proposed method is based on the temporal local decorrelation of the feature vector sequence, and is performed on an utterance-by-utterance basis. Although conventional modulation frequency filtering approaches take the same form regardless of the task and environment conditions, the proposed method can provide an adaptive modulation frequency filter that outperforms conventional methods for each utterance. In addition, the method ultimately performs channel normalization in a feature domain with applications to log-spectral parameters. The performance was evaluated by speaker-independent isolated-word recognition experiments under additive noise environments. The proposed method achieved outstanding improvement for speech recognition in environments with significant noise and was also effective in a range of feature representations.

Adaptive Channel Normalization Based on Infomax Algorithm for Robust Speech Recognition

Jung, Ho-Young
ETRI Journal, Vol. 29, No. 3, pp. 300-304, 2007
This paper proposes a new data-driven method for high-pass approaches, which suppress slow-varying noise components. Conventional high-pass approaches are based on the idea of decorrelating the feature vector sequence, and aim for adaptability to various conditions. The proposed method is based on temporal local decorrelation using the information-maximization theory for each utterance. This is performed on an utterance-by-utterance basis, which provides an adaptive channel normalization filter for each condition. The performance of the proposed method is evaluated by isolated-word recognition experiments with channel distortion. Experimental results show that the proposed method yields outstanding improvement for channel-distorted speech recognition.

Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems

Shekofteh, Yasser;Almasganj, Farshad
ETRI Journal, Vol. 35, No. 1, pp. 100-108, 2013
In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike the conventional spectral-based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. In the first step, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for a new input speech frame, a posterior-probability-based feature vector is evaluated, which represents the similarity between the embedded frame and the learned speech attractors. We conduct experiments for a speech recognition task utilizing a toolkit based on hidden Markov models, over FARSDAT, a well-known Persian speech corpus. Through the proposed FE method, we gain 3.11% absolute phoneme error rate improvement in comparison to the baseline system, which exploits the mel-frequency cepstral coefficient FE method.
https://doi.org/10.4218/etrij.13.0112.0074

Effects of JPEG Compression on Joint Transform Correlator

Widjaja, Joewono;Suripon, Ubon
Institute of Control, Robotics and Systems (제어로봇시스템학회) Conference Proceedings, ICCAS 2004, pp. 1662-1665, 2004
A real-time joint transform correlator using JPEG-compressed reference images is proposed as a practical solution to the storage problem and as a way to improve the processing time of automatic target recognition systems [1]. The effects of compression on the recognition performance of the joint transform correlator are quantitatively investigated for situations where the target suffers from noise and differs in contrast from the reference. Two images with different spatial-frequency contents and contrast were used as the test scenes. The simulation results show that the recognition performance of the joint transform correlator using compressed reference images with high spatial-frequency components is more sensitive to noise and contrast difference than with the low spatial-frequency image.

Vector space based augmented structural kinematic feature descriptor for human activity recognition in videos

Dharmalingam, Sowmiya;Palanisamy, Anandhakumar
ETRI Journal, Vol. 40, No. 4, pp. 499-510, 2018
A vector space based augmented structural kinematic (VSASK) feature descriptor is proposed for human activity recognition. An action descriptor is built by integrating the structural and kinematic properties of the actor using vector space based augmented matrix representation. Using the local or global information separately may not provide sufficient action characteristics. The proposed action descriptor combines both the local (pose) and global (position and velocity) features using augmented matrix schema and thereby increases the robustness of the descriptor. A multiclass support vector machine (SVM) is used to learn each action descriptor for the corresponding activity classification and understanding. The performance of the proposed descriptor is experimentally analyzed using the Weizmann and KTH datasets. The average recognition rate for the Weizmann and KTH datasets is 100% and 99.89%, respectively. The computational time for the proposed descriptor learning is 0.003 seconds, which is an improvement of approximately 1.4% over the existing methods.
https://doi.org/10.4218/etrij.2018-0102

Language Model Adaptation for Broadcast News Recognition (방송 뉴스 인식을 위한 언어 모델 적응)

Kim Hyun Suk;Jeon Hyung Bae;Kim Sanghun;Choi Joon Ki;Yun Seung
MALSORI, No. 51, pp. 99-115, 2004
In this paper, we propose LM adaptation for broadcast news recognition. We collect recent articles from the internet in real time, build a small recent LM, and then interpolate it with an existing LM built from a large existing broadcast news corpus. Because the collected recent corpus contains articles both related and unrelated to the test set, we performed interpolation experiments to find the best type of articles from the recent corpus. When we built an adapted LM from the existing LM and a recent LM of articles selected as similar to the test set by the TF-IDF method, we obtained the best result: a 17.2% improvement in ERR for pseudo-morpheme-based recognition and a reduction in the number of OOVs from 70 to 27.
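The interpolation step described in this abstract can be illustrated with a minimal sketch. It assumes simple linear interpolation of unigram probability tables; the abstract does not state the interpolation formula or the n-gram order, so the function name `interpolate_lm`, the weight `lam`, and the toy probabilities are all hypothetical.

```python
def interpolate_lm(recent_lm, base_lm, lam):
    """Linearly interpolate two unigram probability tables.

    recent_lm, base_lm: dicts mapping word -> probability (assumed format).
    lam: weight given to the recent LM, between 0 and 1 (hypothetical parameter).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    vocab = set(recent_lm) | set(base_lm)
    # Words missing from one model contribute probability 0 from that model.
    return {
        w: lam * recent_lm.get(w, 0.0) + (1.0 - lam) * base_lm.get(w, 0.0)
        for w in vocab
    }


# Toy data, invented for illustration only.
recent_lm = {"election": 0.5, "storm": 0.5}
base_lm = {"election": 0.25, "economy": 0.75}
adapted = interpolate_lm(recent_lm, base_lm, lam=0.5)
```

In this sketch, a word that appears only in the recent LM (such as "storm") receives nonzero probability in the adapted model, which is one way such adaptation could reduce the number of OOV words.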

Improvement of Confidence Measure Performance in Keyword Spotting using Background Model Set Algorithm (BMS 알고리즘을 이용한 핵심어 검출기 거절기능 성능 향상 실험)

Kim Byoung-Don;Kim Jin-Young;Choi Seung-Ho
MALSORI, No. 46, pp. 103-115, 2003
In this paper, we propose applying the Background Model Set (BMS) algorithm, used in speaker verification, to improve the calculation of the confidence measure (CM) in speech recognition. The CM expresses the relative likelihood between recognized models and antiphone models. In the previous method of calculating the CM, the probability and standard deviation were calculated using all phonemes that make up the antiphone models. In this process, the antiphone CM produced poor recognition results, and recognition time also increased. To solve this problem, we studied a method that recomputes the average and standard deviation using the BMS algorithm in the CM calculation.

State-Dependent Weighting of Multiple Feature Parameters in HMM Recognizer (HMM 인식기에서 상태별 다중 특징 파라미터 가중)

손종목;배건성
The Journal of the Acoustical Society of Korea, Vol. 18, No. 4, pp. 47-52, 1999
In this paper, we propose a new approach to weight each feature parameter by considering the dispersion of feature parameters and its degree of contribution to recognition rate. We determined the total distribution factor that is proportional to the recognition rate of each feature parameter and the dispersion factor according to the dispersion of each feature parameter. Then, we determined the state-dependent weighting using the total distribution factor and the dispersion factor. To verify the validity of the proposed approach, recognition experiments were performed using the PLU (Phoneme-Like Unit)-based HMM. Experimental results showed an improvement of 7.7% in the recognition rate using the proposed method.

A study for improvement of Recognition velocity of Korean Character using Neural Oscillator (신경 진동자를 이용한 한글 문자의 인식 속도의 개선에 관한 연구)

Kwon, Yong-Bum;Lee, Joon-Tark
Proceedings of the Korean Institute of Intelligent Systems Conference, 2004 Spring Conference, Vol. 14, No. 1, pp. 491-494, 2004
Neural oscillators can be applied to oscillatory systems such as image recognition, voice recognition, weather fluctuation estimation, and the analysis of geological fluctuations in nature, and they are mainly used for pattern recognition of image information. Conventional BPL (Back-Propagation Learning) and MLNN (Multi-Layer Neural Network) approaches are not well suited to oscillatory systems because these algorithms complicate the learning structure, involve tedious procedures, and converge slowly. However, these problems can be easily solved by using the synchrony characteristic of a neural oscillator with a PLL (Phase-Locked Loop) function and a simple Hebbian learning rule. In addition, the recognition velocity of Korean characters can be improved by using the neural oscillator's learning accelerator factor $\eta_{ij}$.

An Improvement of Korean Speech Recognition Using a Compensation of the Speaking Rate by the Ratio of a Vowel length (모음길이 비율에 따른 발화속도 보상을 이용한 한국어 음성인식 성능향상)

박준배;김태준;최성용;이정현
Proceedings of the IEEK Conference, IEEK (대한전자공학회) 2003 Computer Society Fall Conference, pp. 195-198, 2003
The accuracy of an automatic speech recognition system depends on the presence of background noise and on speaker variability such as sex, intonation, and speaking rate. In particular, inter-speaker and intra-speaker variation in speaking rate is a serious cause of misrecognition. In this paper, we propose a method that compensates for the speaking rate using the ratio of each vowel's length in a phrase. First, the number of feature vectors in a phrase is estimated from the speaking rate information. Second, the estimated number of feature vectors is assigned to each syllable of the phrase according to the ratio of its vowel length. Finally, feature vector extraction is carried out using the number assigned to each syllable in the phrase. As a result, the accuracy of automatic speech recognition was improved using the proposed speaking rate compensation method.
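The second step of this abstract, assigning a phrase's feature vectors to syllables in proportion to vowel length, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how fractional counts are rounded, so the largest-remainder rounding used here, the function name `allocate_frames`, and the example vowel lengths are all hypothetical.

```python
def allocate_frames(total_frames, vowel_lengths):
    """Split total_frames across syllables in proportion to vowel length.

    total_frames: estimated number of feature vectors for the phrase.
    vowel_lengths: one vowel duration per syllable (any consistent unit).
    Rounding uses the largest-remainder rule, which is an assumption.
    """
    total = sum(vowel_lengths)
    if total <= 0:
        raise ValueError("vowel lengths must sum to a positive value")
    raw = [total_frames * v / total for v in vowel_lengths]
    counts = [int(r) for r in raw]  # floor of each proportional share
    leftover = total_frames - sum(counts)
    # Hand the leftover frames to the syllables with the largest fractions.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# Invented example: three syllables with vowel lengths of 80, 120, and 100 ms.
counts = allocate_frames(30, [80, 120, 100])
uneven = allocate_frames(10, [1, 1, 1])
```

The rounding rule keeps the per-syllable counts summing exactly to the estimated total, so no feature vectors are lost or duplicated when the shares are not whole numbers.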
