Search | Korea Science

An Improved Digit Recognition using Normalized mel-cepstrum (정규화된 Mel-cepstrum을 이용한 숫자음 인식성능 향상에 관한 연구)

이기철
- Proceedings of the Acoustical Society of Korea Conference / 1994.06c / pp.403-406 / 1994
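The characteristics of speech vary widely with the speaker's state and the surrounding environment. This paper aims to improve recognition performance by normalizing the within-word variation of the mel-cepstrum, a widely used feature parameter for speech signals. The normalized mel-cepstrum is the mel-cepstrum normalized by its mean value over the whole word. In recognition experiments on Korean spoken digits, the proposed normalized mel-cepstrum outperformed the unnormalized mel-cepstrum. It also achieved a relatively higher recognition rate in comparative experiments under noisy conditions.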
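
The word-level normalization described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes that "normalizing by the mean" means subtracting the per-coefficient mean over all frames of the word (as in conventional cepstral mean normalization), and the function name and toy data are hypothetical.

```python
from typing import List


def normalize_by_word_mean(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean taken over all frames of one word.

    Assumption: normalization is mean subtraction; the abstract does not
    say whether the mean is subtracted or used as a divisor.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each mel-cepstral coefficient across the whole word.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Toy "word": three frames with two mel-cepstral coefficients each.
word = [[1.0, 4.0], [2.0, 6.0], [3.0, 8.0]]
normalized = normalize_by_word_mean(word)
```

After normalization each coefficient track has zero mean within the word, so a constant offset that affects every frame of the word equally is removed.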

Compromised feature normalization method for deep neural network based speech recognition (심층신경망 기반의 음성인식을 위한 절충된 특징 정규화 방식)

Kim, Min Sik;Kim, Hyung Soon
- Phonetics and Speech Sciences / v.12 no.3 / pp.65-71 / 2020
Feature normalization reduces the effect of environmental mismatch between the training and test conditions by normalizing the statistical characteristics of acoustic feature parameters. It yields excellent performance improvement in the traditional Gaussian mixture model-hidden Markov model (GMM-HMM)-based speech recognition system. However, in a deep neural network (DNN)-based speech recognition system, minimizing the effects of environmental mismatch does not necessarily lead to the best performance improvement. In this paper, we attribute this phenomenon to information loss caused by excessive feature normalization. We investigate whether there is a feature normalization method that maximizes speech recognition performance by properly reducing the impact of environmental mismatch while preserving information useful for training acoustic models. To this end, we introduce the mean and exponentiated variance normalization (MEVN), which is a compromise between the mean normalization (MN) and the mean and variance normalization (MVN), and compare the performance of a DNN-based speech recognition system in noisy and reverberant environments according to the degree of variance normalization. Experimental results reveal that the MEVN yields a slight performance improvement over the MN and the MVN, depending on the degree of variance normalization.
https://doi.org/10.13064/KSSS.2020.12.3.065
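
One way to read MEVN as a compromise between MN and MVN is sketched below. The formula is an assumption, not taken from the paper: each feature is mean-subtracted and divided by the standard deviation raised to a hypothetical exponent `alpha`, so that `alpha = 0` reduces to MN and `alpha = 1` reduces to MVN.

```python
import math
from typing import List


def mevn(frames: List[List[float]], alpha: float) -> List[List[float]]:
    """Mean and exponentiated variance normalization (assumed form).

    y = (x - mean) / std ** alpha, computed per feature dimension.
    alpha = 0 gives mean normalization, alpha = 1 gives MVN.
    """
    n = len(frames)
    dim = len(frames[0])
    out = [[0.0] * dim for _ in range(n)]
    for k in range(dim):
        column = [f[k] for f in frames]
        mean = sum(column) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
        # Guard against constant features, where std is zero.
        scale = std ** alpha if std > 0 else 1.0
        for t in range(n):
            out[t][k] = (frames[t][k] - mean) / scale
    return out


utterance = [[0.0], [4.0]]          # mean 2, standard deviation 2
mn_like = mevn(utterance, 0.0)      # mean normalization only
mvn_like = mevn(utterance, 1.0)     # full mean and variance normalization
compromise = mevn(utterance, 0.5)   # partial variance normalization
```

Under this assumed form, the exponent is the single knob that sets the "degree of variance normalization" mentioned in the abstract.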

Moving Object Tracking Method Using Feature Vector (특징 벡터를 이용한 이동 물체 추적)

Kim, Se-Jin;Jeon, Hyung-Suk;Joo, Young-Hoon;Park, Jin-Bae
- Proceedings of the KIEE Conference / 2009.07a / pp.1845-1846 / 2009
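This paper proposes a robust object tracking method using feature vectors. First, the motion region of the initial moving object is extracted, and the KLT algorithm is applied to the input image to extract feature vectors. The extracted feature vectors are applied to the initially extracted motion region to perform a first-stage normalization. Next, a blob region for the moving object is set using the RGB and HSI color models, and the first-stage feature vectors within the blob region are identified with the Snake algorithm, which completes the second-stage normalization. The final normalized feature vectors are fed to a particle filter as input data to track the moving object. Finally, experiments in complex environments demonstrate the applicability of the method.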

Voice Activity Detection in Noisy Environment using Speech Energy Maximization and Silence Feature Normalization (음성 에너지 최대화와 묵음 특징 정규화를 이용한 잡음 환경에 강인한 음성 검출)

Ahn, Chan-Shik;Choi, Ki-Ho
- Journal of Digital Convergence / v.11 no.6 / pp.169-174 / 2013
In speech recognition, performance degrades when the model training environment differs from the recognition environment. Silence feature normalization is used as a way to reduce this mismatch. Existing silence feature normalization methods have a problem at low signal-to-noise ratios: the energy level of the silence interval rises, the accuracy of voice/non-voice classification falls, and recognition performance degrades. This paper proposes a speech detection method that is robust in noisy environments, using silence feature normalization and speech energy maximization. At high signal-to-noise ratios, the proposed method maximizes the speech energy, a feature that is less affected by noise. At low signal-to-noise ratios, it uses the cepstral feature distribution of voice/non-voice characteristics to improve recognition performance. In recognition experiments, recognition performance improved compared with the conventional method.
https://doi.org/10.14400/JDPM.2013.11.6.169

Feature-Vector Normalization for SVM-based Music Genre Classification (SVM에 기반한 음악 장르 분류를 위한 특징벡터 정규화 방법)

Lim, Shin-Cheol;Jang, Sei-Jin;Lee, Seok-Pil;Kim, Moo-Young
- Journal of the Institute of Electronics Engineers of Korea SC / v.48 no.5 / pp.31-36 / 2011
In this paper, Mel-Frequency Cepstral Coefficient (MFCC), Decorrelated Filter Bank (DFB), Octave-based Spectral Contrast (OSC), Zero-Crossing Rate (ZCR), and Spectral Contract/Roll-Off are combined as a set of multiple feature-vectors for a music genre classification system based on the Support Vector Machine (SVM) classifier. In the conventional system, feature vectors are normalized over all genre classes for SVM model training and classification. In this paper, however, only the feature vectors of the classes being compared by the One-Against-One (OAO) SVM classifier are used for normalization. Using OSC as a single feature-vector and the multiple feature-vectors, we obtain genre classification rates of 60.8% and 77.4%, respectively, with the conventional normalization method. With the proposed normalization method, the classification rates increase by 8.2% and 3.3% for OSC and the multiple feature-vectors, respectively.
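
The per-pair normalization idea in this abstract can be sketched as below. This is an illustration under assumptions: the abstract does not state which normalization is used, so z-score normalization is assumed here, and the function names and toy data are hypothetical. Each one-against-one classifier gets statistics computed only from the two genres it compares.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Stats = Tuple[List[float], List[float]]  # (means, standard deviations)


def mean_std(vectors: List[List[float]]) -> Stats:
    """Per-dimension mean and standard deviation of a set of vectors."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[k] for v in vectors) / n for k in range(dim)]
    # Fall back to 1.0 when a dimension is constant, to avoid dividing by zero.
    stds = [
        math.sqrt(sum((v[k] - means[k]) ** 2 for v in vectors) / n) or 1.0
        for k in range(dim)
    ]
    return means, stds


def pairwise_stats(data: Dict[str, List[List[float]]]) -> Dict[Tuple[str, str], Stats]:
    """One set of statistics per genre pair, from those two genres only."""
    return {
        (a, b): mean_std(data[a] + data[b])
        for a, b in combinations(sorted(data), 2)
    }


def normalize_for_pair(vec: List[float], stats: Stats) -> List[float]:
    """Z-score a feature vector with the statistics of one genre pair."""
    means, stds = stats
    return [(x - m) / s for x, m, s in zip(vec, means, stds)]


# Toy one-dimensional features for three genres.
data = {
    "rock": [[0.0], [2.0]],
    "jazz": [[4.0], [6.0]],
    "pop": [[100.0], [102.0]],
}
stats = pairwise_stats(data)
```

In this toy data the "pop" values would dominate a global mean and variance, but they do not affect the statistics used by the jazz-versus-rock classifier.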

Cepstral Normalization using Non-Linear Transform for Speech Recognition in Additive Noise Environments (부가 잡음 환경에서의 음성인식을 위한 비선형 변환을 이용한 캡스트럼 정규화 기법)

석용호
- Proceedings of the Acoustical Society of Korea Conference / 1998.06c / pp.115-118 / 1998
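In this study, the first-order, second-order, and higher-order statistics of speech features were normalized by applying linear and nonlinear transforms to the input speech feature parameters. This normalization technique improved speech recognition performance in additive noise environments.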

Face Recognition with Normalized Wavelet Features (정규화된 웨이블렛 특징에 의한 얼굴 인식)

Lee, Chan-Ho;Park, Ju-Chul;Choi, Hyung-Il
- Journal of KIISE:Software and Applications / v.27 no.10 / pp.1046-1053 / 2000
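This paper proposes a face recognition method based on normalized wavelet features. The size of the extracted face region is normalized, and a binarized Gaussian window is used to remove the background region. To reduce the effect of illumination, histogram specification is applied to the face region, and the brightness of the left and right halves is averaged. The normalized face region is represented in polar coordinates so that it takes a rectangular form. Gabor wavelet coefficients are used as features. The Gabor wavelet transform is applied several times while varying its parameters and the resolution of the normalized face region. FD analysis is performed to select the coefficients that are useful for recognition. The selected features are used for recognition together with their FD values. The experimental results show that the proposed method is very promising.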

Loss-adjusted Regularization based on Prediction for Improving Robustness in Less Reliable FAQ Datasets (신뢰성이 부족한 FAQ 데이터셋에서의 강건성 개선을 위한 모델의 예측 강도 기반 손실 조정 정규화)

Park, Yewon;Yang, Dongil;Kim, Soofeel;Lee, Kangwook
- Annual Conference on Human and Language Technology / 2019.10a / pp.18-22 / 2019
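FAQ classification works by categorizing frequently asked questions and inferring the most similar class for a user query. Because an FAQ dataset contains many classes, inclusion and association relationships exist between classes, and a given data item can belong to different classes at the same time. However, recent FAQ classification work has gone no further than applying multi-class classification methodology, and little research has reflected these characteristics of FAQ datasets in the model. Because current classification methodology does not take these characteristics into account, predictions that could be interpreted as correct are sometimes treated as wrong. This paper introduces a regularization technique that adjusts the loss function so that classification works well even on less reliable FAQ datasets. To reflect inclusion and association relationships between classes, the technique reduces the loss in proportion to the prediction strength even when the model predicts a wrong answer. The rationale is that the higher the probability with which a wrong answer is predicted, the more likely the data is unreliable, so the model should not be trained strongly on it. The experiments use BERT, the model with the best performance in multi-class classification, and adopt the commonly used label smoothing as the regularization method for comparison. The results confirm that the proposed method improves performance over the existing method and trains more stably, and that it classifies effectively when data reliability is lacking.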
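
The loss adjustment described in this abstract could take a form like the sketch below. The exact rule is not given in the abstract, so everything here is an assumption: when the top prediction differs from the label, the cross-entropy is scaled by `1 - strength * p_pred`, where `p_pred` is the probability of the predicted class and `strength` is a hypothetical hyperparameter.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_adjusted_ce(logits: List[float], label: int, strength: float = 1.0) -> float:
    """Cross-entropy that is shrunk for confident wrong predictions.

    Assumed rule: if the predicted class differs from the label, scale
    the loss by (1 - strength * p_pred). A confident "wrong" prediction
    is treated as a sign that the label may be unreliable.
    """
    probs = softmax(logits)
    ce = -math.log(probs[label])
    pred = max(range(len(probs)), key=probs.__getitem__)
    if pred == label:
        return ce  # correct prediction: plain cross-entropy
    return ce * (1.0 - strength * probs[pred])


confident_wrong = [5.0, 0.0, 0.0]   # model strongly prefers class 0
adjusted = loss_adjusted_ce(confident_wrong, label=1)
unadjusted = loss_adjusted_ce(confident_wrong, label=1, strength=0.0)
```

With `strength=0.0` the function falls back to ordinary cross-entropy, which makes it easy to compare the adjusted and unadjusted losses on the same example.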

Cepstral Distance and Log-Energy Based Silence Feature Normalization for Robust Speech Recognition (강인한 음성인식을 위한 켑스트럼 거리와 로그 에너지 기반 묵음 특징 정규화)

Shen, Guang-Hu;Chung, Hyun-Yeol
- The Journal of the Acoustical Society of Korea / v.29 no.4 / pp.278-285 / 2010
The difference between training and test environments is one of the major causes of performance degradation in noisy speech recognition, and many silence feature normalization methods have been proposed to resolve this mismatch. The conventional silence feature normalization method classifies well at high SNR, but its performance degrades at low SNR because of the low accuracy of speech/silence classification. Cepstral distance, on the other hand, represents the characteristic distribution of speech/silence (or noise) well at low SNR. In this paper, we propose a Cepstral distance and Log-energy based Silence Feature Normalization (CLSFN) method, which uses both log-energy and cepstral Euclidean distance to classify speech/silence. Because the proposed method combines the merit of log-energy, which is less affected by noise at high SNR, with the merit of cepstral distance, which discriminates speech from silence accurately at low SNR, the classification accuracy is expected to improve. The experimental results show that the proposed CLSFN achieves better recognition performance than the conventional SFN-I/II and CSFN methods in all of the tested noisy environments.
https://doi.org/10.7776/ASK.2010.29.4.278
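
A rough sketch of combining log-energy and cepstral Euclidean distance for speech/silence classification is given below. It is not the CLSFN algorithm itself: the thresholds, the OR rule for combining the two cues, the noise reference cepstrum, and the choice to replace the log-energy of silence frames with a fixed floor are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Frame = Tuple[float, List[float]]  # (log-energy, cepstral vector)


def cepstral_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two cepstral vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize_silence(
    frames: List[Frame],
    noise_cepstrum: List[float],
    energy_thr: float,
    dist_thr: float,
    floor: float,
) -> List[Frame]:
    """Replace the log-energy of frames judged to be silence with a floor.

    Assumed decision rule: a frame counts as speech if its log-energy is
    above energy_thr OR its cepstrum is far from the noise reference.
    """
    out = []
    for log_e, cep in frames:
        is_speech = (
            log_e > energy_thr
            or cepstral_distance(cep, noise_cepstrum) > dist_thr
        )
        out.append((log_e if is_speech else floor, cep))
    return out


frames = [
    (1.0, [0.0, 0.0]),  # low energy, noise-like cepstrum: silence
    (9.0, [3.0, 4.0]),  # high energy: speech
    (2.0, [3.0, 4.0]),  # low energy but speech-like cepstrum: speech
    (1.5, [0.1, 0.0]),  # low energy, noise-like cepstrum: silence
]
result = normalize_silence(
    frames, noise_cepstrum=[0.0, 0.0], energy_thr=5.0, dist_thr=1.0, floor=0.0
)
```

The third frame shows why a second cue helps: its energy alone would mark it as silence, while its cepstral distance from the noise reference marks it as speech.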

Character Recognition of Vehicle Number Plate Using Feature Based Neural Network (특징 추출에 기반한 신경망 시스템을 이용한 차량 번호판 문자인식)

이현숙;김희승
- Proceedings of the Korean Information Science Society Conference / 2000.10b / pp.383-385 / 2000
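Input features are constructed by combining several feature extraction methods applied to vehicle license plate character images, and the characters are recognized with a neural network. To improve speed, only binarization and size normalization are performed, with no special preprocessing, after which the mesh method, the BLT method, and the normalized projection feature method are combined to form the input features. In this study, for digit recognition, the mesh and BLT methods resolve the misrecognition of similar characters caused by noise. For character recognition, normalized projection features are used to classify the character type, after which the graphemes are recognized individually. Applying the BLT method to the regions of small strokes, which play an important role in vowel recognition, resolves the vowel misrecognition problem of previous work.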
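
Two of the features named in this abstract, mesh features and normalized projections, can be sketched on a binarized, size-normalized glyph as follows. This is a generic illustration, not the paper's implementation: the grid size, the assumption that the image dimensions divide evenly by the grid, and the toy glyph are hypothetical, and the BLT method is omitted because the abstract does not describe it.

```python
from typing import List

Image = List[List[int]]  # binarized image: 1 = stroke pixel, 0 = background


def mesh_features(img: Image, rows: int, cols: int) -> List[float]:
    """Fraction of stroke pixels in each cell of a rows x cols grid.

    Assumes the image height and width are divisible by rows and cols.
    """
    h, w = len(img), len(img[0])
    cell_h, cell_w = h // rows, w // cols
    feats = []
    for r in range(rows):
        for c in range(cols):
            total = sum(
                img[y][x]
                for y in range(r * cell_h, (r + 1) * cell_h)
                for x in range(c * cell_w, (c + 1) * cell_w)
            )
            feats.append(total / (cell_h * cell_w))
    return feats


def normalized_projections(img: Image) -> List[float]:
    """Row sums divided by width, followed by column sums divided by height."""
    h, w = len(img), len(img[0])
    horizontal = [sum(row) / w for row in img]
    vertical = [sum(img[y][x] for y in range(h)) / h for x in range(w)]
    return horizontal + vertical


glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
# Concatenated feature vector, as would be fed to a classifier.
features = mesh_features(glyph, 2, 2) + normalized_projections(glyph)
```

Both feature types are scaled to the range 0 to 1, so they can be concatenated into one input vector without one dominating the other by magnitude.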

