• Title/Summary/Keyword: delta cepstrum

Parameter Considering Variance Property for Speech Recognition in Noisy Environment (잡음환경에서의 음성인식을 위한 변이특성을 고려한 파라메터)

  • Park, Jin-Young;Lee, Kwang-Seok;Koh, Si-Young;Hur, Kang-In
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • v.9 no.2
    • /
    • pp.469-472
    • /
    • 2005
  • This paper proposes speech feature parameters that are robust to noise for use in speech recognition systems. The baseline parameters are MFCC, the standard features for ASR (Automatic Speech Recognition), and DCTCs, which apply the DCT to the baseline features. In addition, delta-cepstrum and delta-delta-cepstrum parameters are proposed, which reconstruct the cepstrum to carry information about the temporal variation of speech. Recognition performance is compared using HMMs, and the LDA algorithm is applied to each parameter set for dimensionality reduction. The results show that the dimension-reduced delta-delta-cepstrum parameters obtained with LDA improve recognition performance over the existing parameters under various noise conditions (see the illustrative sketch below).

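The delta and delta-delta cepstra referred to above are usually computed as regression coefficients over a sliding window of cepstral frames. The abstract does not give its exact formulation, so the following is a minimal sketch using the standard regression formula; the window width N, the MFCC dimensionality, and the random stand-in data are assumptions.

```python
import numpy as np

def delta(feats, N=2):
    """Regression-based delta coefficients over a +/-N frame window."""
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")   # repeat edge frames
    return np.stack([
        sum(n * (padded[t + N + n] - padded[t + N - n]) for n in range(1, N + 1)) / denom
        for t in range(feats.shape[0])
    ])

# Stand-in for real MFCC frames: 100 frames x 13 coefficients.
mfcc = np.random.randn(100, 13)
d1 = delta(mfcc)                      # delta cepstrum
d2 = delta(d1)                        # delta-delta cepstrum
features = np.hstack([mfcc, d1, d2])  # 39-dimensional combined parameter
```

The LDA dimensionality-reduction step used in the paper (for example, scikit-learn's LinearDiscriminantAnalysis) could then be applied to `features`, but that step is omitted here.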

New Data Extraction Method using the Difference in Speaker Recognition (화자인식에서 차분을 이용한 새로운 데이터 추출 방법)

  • Seo, Chang-Woo;Ko, Hee-Ae;Lim, Yong-Hwan;Choi, Min-Jung;Lee, Youn-Jeong
    • Speech Sciences
    • /
    • v.15 no.3
    • /
    • pp.7-15
    • /
    • 2008
  • This paper proposes a method for extracting new feature vectors in speaker recognition (SR) from the difference between the cepstrum, which captures static characteristics, and the delta cepstrum, which captures dynamic characteristics. The proposed difference vector (DV) is an intermediate feature that carries both static and dynamic information simultaneously and can therefore serve as a new feature vector. Compared with the conventional method, the proposed method obtains the new feature vector without introducing additional parameters; it only requires computing the difference between the cepstrum and the delta cepstrum. Experimental results show that the proposed method improves speaker identification (SI) performance by more than 2.03% on average over the conventional method (see the illustrative sketch below).

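A minimal sketch of the difference-vector (DV) idea described above, assuming the delta cepstrum is computed with the standard regression formula; the window width and the stand-in data are assumptions, not taken from the paper.

```python
import numpy as np

def delta(feats, N=2):
    """Regression-based delta coefficients over a +/-N frame window."""
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    return np.stack([
        sum(n * (padded[t + N + n] - padded[t + N - n]) for n in range(1, N + 1)) / denom
        for t in range(feats.shape[0])
    ])

def difference_vector(cepstrum):
    """DV: cepstrum minus its delta; same dimension, no extra model parameters."""
    return cepstrum - delta(cepstrum)

cep = np.random.randn(200, 12)   # stand-in for real cepstral frames
dv = difference_vector(cep)      # new feature stream combining static and dynamic cues
```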

EFFICIENCY OF SPEECH FEATURES (음성 특징의 효율성)

  • 황규웅
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1995.06a
    • /
    • pp.225-227
    • /
    • 1995
  • This paper compares waveform, cepstrum, and spline wavelet features using nonlinear discriminant analysis. This measure reflects the efficiency of a speech parametrization better than older linear separability criteria and can also be used to assess the efficiency of each layer of a given system. The spline wavelet transform yields larger gaps between classes, while the cepstrum is more tightly clustered than the spline wavelet feature. Neither feature has good properties for classification, and future work will compare the Gabor wavelet transform, Mel cepstrum, delta cepstrum, and other features (see the illustrative sketch below).

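The abstract contrasts its nonlinear measure with older linear separability criteria. For context, the sketch below computes the classical linear (Fisher-style) criterion, trace(Sw^-1 Sb), for two candidate feature sets; it is not the paper's nonlinear discriminant measure, and the synthetic data are placeholders.

```python
import numpy as np

def linear_separability(features, labels):
    """Classical criterion trace(Sw^-1 Sb) from within/between-class scatter."""
    classes = np.unique(labels)
    overall_mean = features.mean(axis=0)
    dim = features.shape[1]
    Sw = np.zeros((dim, dim))
    Sb = np.zeros((dim, dim))
    for c in classes:
        X = features[labels == c]
        mean_c = X.mean(axis=0)
        Sw += (X - mean_c).T @ (X - mean_c)
        diff = (mean_c - overall_mean)[:, None]
        Sb += len(X) * diff @ diff.T
    return np.trace(np.linalg.pinv(Sw) @ Sb)

# Compare two hypothetical feature sets for the same labelled frames.
labels = np.repeat(np.arange(10), 50)
cepstral_feats = np.random.randn(500, 12)
wavelet_feats = np.random.randn(500, 12)
print(linear_separability(cepstral_feats, labels),
      linear_separability(wavelet_feats, labels))
```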

A Study on the Spoken Korean-Digit Recognition Using the Neural Network (神經網을 利用한 韓國語 數字音 認識에 관한 硏究)

  • Park, Hyun-Hwa;Gahang, Hae Dong;Bae, Keun Sung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.11 no.3
    • /
    • pp.5-13
    • /
    • 1992
  • Taking advantage of the fact that Korean digits are monosyllabic words, we propose a spoken Korean-digit recognition scheme using a multi-layer perceptron. Each spoken digit is divided into three segments (initial sound, medial vowel, and final consonant) based on the voice starting and ending points and a peak point in the middle of the vowel. Feature vectors such as the cepstrum, reflection coefficients, Δcepstrum, and Δenergy are extracted from each segment. The cepstrum, used as the input vector to the neural network, gives a higher recognition rate than the reflection coefficients. Regression coefficients of the cepstrum did not affect the recognition rate as much as expected, presumably because the features were extracted from selected stationary segments of the input speech signal. With 150 cepstral coefficients obtained from each spoken digit, a correct recognition rate of 97.8% was achieved (see the illustrative sketch below).

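As a rough illustration of the recognition setup described above (150 cepstral coefficients per digit fed to a multi-layer perceptron), here is a sketch using scikit-learn's MLPClassifier on synthetic data; the hidden-layer size, training setup, and data are assumptions, and the original 1992 network is not reproduced.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# 150 cepstral coefficients per spoken digit, 10 digit classes (synthetic stand-in).
num_digits, per_digit = 10, 40
X = np.random.randn(num_digits * per_digit, 150)
y = np.repeat(np.arange(num_digits), per_digit)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```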

Speech Emotion Recognition using Feature Selection and Fusion Method (특징 선택과 융합 방법을 이용한 음성 감정 인식)

  • Kim, Weon-Goo
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.66 no.8
    • /
    • pp.1265-1271
    • /
    • 2017
  • In this paper, a speech parameter fusion method is studied to improve the performance of a conventional emotion recognition system. To this end, the combination of cepstrum parameters and the various pitch parameters used in conventional emotion recognition systems that yields the best performance is selected. The pitch parameters were generated from the pitch of speech using numerical and statistical methods. Performance was evaluated on an emotion recognition system based on Gaussian mixture models (GMM) to find the pitch parameters that perform best in combination with the cepstrum parameters, using sequential feature selection as the selection method. In an experiment distinguishing four emotions (normal, joy, sadness, and anger), fifteen of the 56 pitch parameters were selected and showed the best recognition performance when fused with cepstrum and delta cepstrum coefficients, a 48.9% reduction in error relative to an emotion recognition system using only pitch parameters.
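
A minimal sketch of the sequential (forward) feature selection loop with a GMM back-end described above; the number of mixture components, the data split, and the synthetic pitch parameters are assumptions, and the fusion with real cepstral features is omitted.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_accuracy(X_tr, y_tr, X_te, y_te, n_components=4):
    """Train one diagonal-covariance GMM per emotion class and classify
    test samples by the class with the highest log-likelihood."""
    classes = np.unique(y_tr)
    models = {c: GaussianMixture(n_components, covariance_type="diag",
                                 random_state=0).fit(X_tr[y_tr == c])
              for c in classes}
    scores = np.stack([models[c].score_samples(X_te) for c in classes], axis=1)
    return np.mean(classes[np.argmax(scores, axis=1)] == y_te)

def sequential_forward_selection(X_tr, y_tr, X_va, y_va, k):
    """Greedily add the candidate parameter that most improves accuracy."""
    selected, remaining = [], list(range(X_tr.shape[1]))
    while len(selected) < k:
        best_f, best_acc = None, -1.0
        for f in remaining:
            cols = selected + [f]
            acc = gmm_accuracy(X_tr[:, cols], y_tr, X_va[:, cols], y_va)
            if acc > best_acc:
                best_f, best_acc = f, acc
        selected.append(best_f)
        remaining.remove(best_f)
    return selected

# Synthetic stand-in: 56 candidate pitch parameters, 4 emotion classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 56))
y = rng.integers(0, 4, size=400)
chosen = sequential_forward_selection(X[:300], y[:300], X[300:], y[300:], k=15)
print("selected parameter indices:", chosen)
```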

A Comparative Study of Speech Parameters for Speech Recognition Neural Network (음성 인식 신경망을 위한 음성 파라키터들의 성능 비교)

  • Kim, Ki-Seok;Im, Eun-Jin;Hwang, Hee-Yung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.11 no.3
    • /
    • pp.61-66
    • /
    • 1992
  • Much research has used neural network models for automatic speech recognition, but the main focus has been on finding network models and learning rules suited to the task. However, the choice of input speech parameters for the neural network is as important as the network model itself for improving the performance of a neural-network-based speech recognition system. In this paper, we select six speech parameters from a survey of speech recognition papers that use neural networks and analyze their performance on the same data with the same neural network model. We use 8 sets of 9 Korean plosives and 18 sets of 8 Korean vowels, and compare the six speech parameters with a recurrent neural network while keeping the number of nodes constant. The delta cepstrum of the linear predictive coefficients gave the best results, with recognition rates of 95.1% for the vowels and 100.0% for the plosives (see the illustrative sketch below).

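The best-performing parameter above is the delta cepstrum derived from linear predictive coefficients. The sketch below converts LPC coefficients to LPC cepstral coefficients with the standard recursion (using the convention A(z) = 1 - sum_k a_k z^-k) and then takes regression deltas; the model order, window width, and random frame data are assumptions.

```python
import numpy as np

def lpc_to_cepstrum(a, num_ceps):
    """Standard recursion for the cepstrum of an all-pole model 1/A(z),
    with A(z) = 1 - sum_k a[k-1] z^{-k} (gain term ignored)."""
    p = len(a)
    c = np.zeros(num_ceps)
    for n in range(1, num_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c

def delta(feats, N=2):
    """Regression-based delta coefficients over a +/-N frame window."""
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    return np.stack([
        sum(n * (padded[t + N + n] - padded[t + N - n]) for n in range(1, N + 1)) / denom
        for t in range(feats.shape[0])
    ])

# Stand-in LPC frames: 100 frames of order-12 coefficients.
lpc_frames = 0.1 * np.random.randn(100, 12)
cep_frames = np.stack([lpc_to_cepstrum(a, 12) for a in lpc_frames])
delta_lpc_cepstrum = delta(cep_frames)   # the "delta cepstrum of LPC coefficients"
```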

Effective Combination of Temporal Information and Linear Transformation of Feature Vector in Speaker Verification (화자확인에서 특징벡터의 순시 정보와 선형 변환의 효과적인 적용)

  • Seo, Chang-Woo;Zhao, Mei-Hua;Lim, Young-Hwan;Jeon, Sung-Chae
    • Phonetics and Speech Sciences
    • /
    • v.1 no.4
    • /
    • pp.127-132
    • /
    • 2009
  • The feature vectors used in conventional speaker recognition (SR) systems may be highly correlated with their neighbors. To improve SR performance, many researchers have adopted linear transformation methods such as principal component analysis (PCA). The linear transformation is generally applied to the concatenation of the static features and their dynamic features, but this is more complex than transforming the static features alone because of the higher feature dimension. To overcome this, we propose an efficient method that applies the linear transformation together with the temporal information of the features to reduce complexity and improve speaker verification (SV) performance. The proposed method first applies the PCA transformation, and the delta parameters carrying temporal information are then obtained from the transformed features. It requires a covariance matrix only 1/4 the size of that needed when the static and dynamic features are concatenated for PCA, since the delta parameters are extracted from the linearly transformed features after the dimensionality of the static features has been reduced. In terms of equal error rate (EER) in SV, the proposed method outperforms the PCA-based and conventional methods while requiring less storage and computation (see the illustrative sketch below).

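A minimal sketch of the ordering described above: apply PCA to the static features first, then compute the delta (temporal) parameters from the reduced features, so the covariance estimated for PCA stays at the static dimension (d x d rather than 2d x 2d, i.e. 1/4 the size). The dimensions, window width, and random data are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def delta(feats, N=2):
    """Regression-based delta coefficients over a +/-N frame window."""
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    return np.stack([
        sum(n * (padded[t + N + n] - padded[t + N - n]) for n in range(1, N + 1)) / denom
        for t in range(feats.shape[0])
    ])

static = np.random.randn(500, 20)   # stand-in static cepstral frames (d = 20)

# Conventional ordering: concatenate static + delta (2d dims), then transform.
conventional = PCA(n_components=20).fit_transform(np.hstack([static, delta(static)]))

# Proposed ordering: transform (and reduce) the static features first,
# then take deltas of the transformed stream.
reduced = PCA(n_components=10).fit_transform(static)
proposed = np.hstack([reduced, delta(reduced)])
```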