• 제목/요약/키워드: Feature normalization

검색결과 155건 처리시간 0.025초

강인한 음성인식을 위한 켑스트럼 거리와 로그 에너지 기반 묵음 특징 정규화 (Cepstral Distance and Log-Energy Based Silence Feature Normalization for Robust Speech Recognition)

  • 신광호;정현열
    • 한국음향학회지
    • /
    • 제29권4호
    • /
    • pp.278-285
    • /
    • 2010
  • 훈련 환경과 인식 환경의 차이가 음성인식 성능저하의 주요요인이다. 이러한 환경의 불일치를 줄이기 위한 방법으로 다양한 묵음특징 정규화 방법이 제안되고 있다. 기존의 묵음특징 정규화 방법은 낮은 SNR (Signal-to-Noise Ratio)에서 묵음구간의 에너지 레벨이 증가하여 음성/묵음 분류의 정확도가 떨어짐으로 인해 인식성능이 저하되는 문제점이 있었다. 본 논문에서는 로그 에너지와 음성/묵음(또는잡음)의 켑스트럼 특징의 분포 특성의 차이를 나타내는 켑스트럼 유클리디언(Euclidean) 거리를 결합하여 음성/묵음을 분류하는 묵음특징 정규화 방법 (Cepstral distance and Log-energy based Silence Feature Normalization)을 제안하였다. 제안한 방법은 높은 SNR에서는 로그 에너지 특징이 잡음의 영향을 적게 받는 특성을 반영하여 기존의 묵음 특징 정규화 (Silence Feature Normalization)방법의 우수성을 그대로 유지하는 반면, 낮은 SNR에서는 로그 에너지 대신 음성/묵음 분류의 분별력이 우수한 켑스트럼 거리 정보를 이용함으로써 인식성능을 향상시킬 수 있다. 인식실험결과 기존의 SFN-I/II, CSFN 방법에 비해 전반적으로 향상된 인식성능을 얻을 수 있어 그 유효성을 확인할 수 있었다.

Harmonics-based Spectral Subtraction and Feature Vector Normalization for Robust Speech Recognition

  • Beh, Joung-Hoon;Lee, Heung-Kyu;Kwon, Oh-Il;Ko, Han-Seok
    • 음성과학
    • /
    • 제11권1호
    • /
    • pp.7-20
    • /
    • 2004
  • In this paper, we propose a two-step noise compensation algorithm in feature extraction for achieving robust speech recognition. The proposed method frees us from requiring a priori information on noisy environments and is simple to implement. First, in frequency domain, the Harmonics-based Spectral Subtraction (HSS) is applied so that it reduces the additive background noise and makes the shape of harmonics in speech spectrum more pronounced. We then apply a judiciously weighted variance Feature Vector Normalization (FVN) to compensate for both the channel distortion and additive noise. The weighted variance FVN compensates for the variance mismatch in both the speech and the non-speech regions respectively. Representative performance evaluation using Aurora 2 database shows that the proposed method yields 27.18% relative improvement in accuracy under a multi-noise training task and 57.94% relative improvement under a clean training task.

  • PDF

실시간 객체 검출을 위한 개선된 Haar-like Feature 정규화 방법 (An Improved Normalization Method for Haar-like Features for Real-time Object Detection)

  • 박기영;황선영
    • 한국통신학회논문지
    • /
    • 제36권8C호
    • /
    • pp.505-515
    • /
    • 2011
  • 본 논문에서는 객체 검출에 사용되는 Haar-like feature의 정규화 방법에 대해 다룬다. 기존의 Haar-like feature의 분산 정규화는 후보 윈도우 픽셀들에 대한 표준편차 계산에 사용되는 별도의 적분 영상 생성을 위해 많은 연산을 필요로 했으며 밝기 변화가 작은 영역에서 오검출이 증가하는 문제를 가지고 있으나, 제안하는 정규화 방법은 별도의 적분 영상을 사용하지 않아 처리 속도가 빠르며 제안하는 방법을 사용하여 학습시킨 분류기는 밝기 변화에 대해 강건한 성능을 보인다. 실험 결과 제안한 방법을 사용했을 때 객체 검출기의 처리 속도는 26% 향상 되었으며, 제안한 방법을 사용하여 학습시킨 분류기들은 5% 이상 향상된 검출률을 보였으며, 밝기 변화가 심한 경우는 45% 향상된 검출률을 보였다.

SVM에 기반한 음악 장르 분류를 위한 특징벡터 정규화 방법 (Feature-Vector Normalization for SVM-based Music Genre Classification)

  • 임신철;장세진;이석필;김무영
    • 전자공학회논문지SC
    • /
    • 제48권5호
    • /
    • pp.31-36
    • /
    • 2011
  • 본 논문에서는 Mel-Frequency Cepstral Coefficient (MFCC), Decorrelated Filter Bank (DFB), Octave-based Spectral Contrast (OSC), Zero-Crossing Rate (ZCR), 그리고 Spectral Contract/Roll-Off를 복합 특징벡터로 결합하여 Support Vector Machine (SVM)을 이용한 음악 장르 분류 시스템을 설계하였다. 기존 방식에서는 전체 학습 데이터에 대한 특징벡터를 정규화를 한 후 SVM 모델을 생성하여 분류를 시행하였다. 본 논문에서는 비교 대상이 되는 한 쌍의 클래스에 대해서 One-Against-One (OAO) SVM으로 모델을 생성할 때 선택된 두 클래스의 특징벡터에 대해서만 정규화를 시행하는 방식을 제안한다. 기존 정규화 방식을 이용하면 단일 특징벡터로 OSC를 사용할 경우에는 60.8%, 복합 특징벡터를 모두 이용하는 경우에는 77.4%의 인식율을 얻을 수 있었다. 또한, 제안된 정규화 방식을 이용하면 OSC와 복합 특징벡터에 대해서 각각 8.2%와 3.3%의 추가적인 성능 향상을 얻을 수 있었다.

Representative Batch Normalization for Scene Text Recognition

  • Sun, Yajie;Cao, Xiaoling;Sun, Yingying
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제16권7호
    • /
    • pp.2390-2406
    • /
    • 2022
  • Scene text recognition has important application value and attracted the interest of plenty of researchers. At present, many methods have achieved good results, but most of the existing approaches attempt to improve the performance of scene text recognition from the image level. They have a good effect on reading regular scene texts. However, there are still many obstacles to recognizing text on low-quality images such as curved, occlusion, and blur. This exacerbates the difficulty of feature extraction because the image quality is uneven. In addition, the results of model testing are highly dependent on training data, so there is still room for improvement in scene text recognition methods. In this work, we present a natural scene text recognizer to improve the recognition performance from the feature level, which contains feature representation and feature enhancement. In terms of feature representation, we propose an efficient feature extractor combined with Representative Batch Normalization and ResNet. It reduces the dependence of the model on training data and improves the feature representation ability of different instances. In terms of feature enhancement, we use a feature enhancement network to expand the receptive field of feature maps, so that feature maps contain rich feature information. Enhanced feature representation capability helps to improve the recognition performance of the model. We conducted experiments on 7 benchmarks, which shows that this method is highly competitive in recognizing both regular and irregular texts. The method achieved top1 recognition accuracy on four benchmarks of IC03, IC13, IC15, and SVTP.

On-Line Blind Channel Normalization for Noise-Robust Speech Recognition

  • Jung, Ho-Young
    • IEIE Transactions on Smart Processing and Computing
    • /
    • 제1권3호
    • /
    • pp.143-151
    • /
    • 2012
  • A new data-driven method for the design of a blind modulation frequency filter that suppresses the slow-varying noise components is proposed. The proposed method is based on the temporal local decorrelation of the feature vector sequence, and is done on an utterance-by-utterance basis. Although the conventional modulation frequency filtering approaches the same form regardless of the task and environment conditions, the proposed method can provide an adaptive modulation frequency filter that outperforms conventional methods for each utterance. In addition, the method ultimately performs channel normalization in a feature domain with applications to log-spectral parameters. The performance was evaluated by speaker-independent isolated-word recognition experiments under additive noise environments. The proposed method achieved outstanding improvement for speech recognition in environments with significant noise and was also effective in a range of feature representations.

  • PDF

Line feature extraction in a noisy image

  • Lee, Joon-Woong;Oh, Hak-Seo;Kweon, In-So
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 제어로봇시스템학회 1996년도 Proceedings of the Korea Automatic Control Conference, 11th (KACC); Pohang, Korea; 24-26 Oct. 1996
    • /
    • pp.137-140
    • /
    • 1996
  • Finding line segments in an intensity image has been one of the most fundamental issues in computer vision. In complex scenes, it is hard to detect the locations of point features. Line features are more robust in providing greater positional accuracy. In this paper we present a robust "line features extraction" algorithm which extracts line feature in a single pass without using any assumptions and constraints. Our algorithm consists of five steps: (1) edge scanning, (2) edge normalization, (3) line-blob extraction, (4) line-feature computation, and (5) line linking. By using edge scanning, the computational complexity due to too many edge pixels is drastically reduced. Edge normalization improves the local quantization error induced from the gradient space partitioning and minimizes perturbations on edge orientation. We also analyze the effects of edge processing, and the least squares-based method and the principal axis-based method on the computation of line orientation. We show its efficiency with some real images.al images.

  • PDF

Energy Feature Normalization for Robust Speech Recognition in Noisy Environments

  • Lee, Yoon-Jae;Ko, Han-Seok
    • 음성과학
    • /
    • 제13권1호
    • /
    • pp.129-139
    • /
    • 2006
  • In this paper, we propose two effective energy feature normalization methods for robust speech recognition in noisy environments. In the first method, we estimate the noise energy and remove it from the noisy speech energy. In the second method, we propose a modified algorithm for the Log-energy Dynamic Range Normalization (ERN) method. In the ERN method, the log energy of the training data in a clean environment is transformed into the log energy in noisy environments. If the minimum log energy of the test data is outside of a pre-defined range, the log energy of the test data is also transformed. Since the ERN method has several weaknesses, we propose a modified transform scheme designed to reduce the residual mismatch that it produces. In the evaluation conducted on the Aurora2.0 database, we obtained a significant performance improvement.

  • PDF

심층신경망을 이용한 짧은 발화 음성인식에서 극점 필터링 기반의 특징 정규화 적용 (Applying feature normalization based on pole filtering to short-utterance speech recognition using deep neural network)

  • 한재민;김민식;김형순
    • 한국음향학회지
    • /
    • 제39권1호
    • /
    • pp.64-68
    • /
    • 2020
  • 가우스 혼합 모델-은닉 마코프 모델(Gaussian Mixture Model-Hidden Markov Model, GMM-HMM)을 이용하는 전통적인 음성인식 시스템에서는, 극점 필터링 기반의 켑스트럼 특징 정규화 방식이 잡음 환경에서 짧은 발화의 인식 성능을 향상시키는데 효과적이었다. 본 논문에서는 심층신경망(Deep Neural Network, DNN)을 이용하는 최신의 음성인식 시스템에서도 이 방식의 유용성이 있는지 검토한다. AURORA 2 DB에 대한 실험 결과, 특히 훈련 및 테스트 환경 사이의 불일치가 클 때에, 극점 필터링 기반의 켑스트럼 평균 분산 정규화 방식이 극점 필터링을 사용하지 않는 방식에 비해 매우 짧은 발화의 인식 성능을 개선시킴을 보여 준다.

파워 스펙트럼 warping을 이용한 성도 정규화 (Vocal Tract Normalization Using The Power Spectrum Warping)

  • 유일수;김동주;노용완;홍광석
    • 대한전기학회:학술대회논문집
    • /
    • 대한전기학회 2003년도 학술회의 논문집 정보 및 제어부문 A
    • /
    • pp.215-218
    • /
    • 2003
  • The method of vocal tract normalization has been known as a successful method for improving the accuracy of speech recognition. A frequency warping procedure based low complexity and maximum likelihood has been generally applied for vocal tract normalization. In this paper, we propose a new power spectrum warping procedure that can be improve on vocal tract normalization performance than a frequency warping procedure. A mechanism for implementing this method can be simply achieved by modifying the power spectrum of filter bank in Mel-frequency cepstrum feature(MFCC) analysis. Experimental study compared our Proposal method with the well-known frequency warping method. The results have shown that the power spectrum warping is better 50% about the recognition performance than the frequency warping.

  • PDF