Search | Korea Science

PCA-based Variational Model Composition Method for Robust Speech Recognition with Time-Varying Background Noise (시변 잡음에 강인한 음성 인식을 위한 PCA 기반의 Variational 모델 생성 기법)

Kim, Wooil
- Journal of the Korea Institute of Information and Communication Engineering / v.17 no.12 / pp.2793-2799 / 2013
This paper proposes an effective feature compensation method to improve speech recognition performance under time-varying background noise. The proposed method uses principal component analysis (PCA) to improve the variational model composition method, and is used to generate multiple environmental models for the PCGMM-based feature compensation scheme. Experimental results show that the proposed scheme improves speech recognition accuracy more effectively than conventional front-end methods across various SNR conditions of background music. It achieves an average relative improvement of 12.14% in WER over the previous variational model composition method.
https://doi.org/10.6109/jkiice.2013.17.12.2793
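
The relative WER improvement quoted in the abstract above is commonly computed as the WER reduction divided by the baseline WER. The following is a minimal sketch of that arithmetic; the function name and the example WER values are hypothetical and are not taken from the paper, which reports an average over several SNR conditions.

```python
def relative_improvement(baseline_wer: float, proposed_wer: float) -> float:
    """Relative WER reduction, in percent, of a proposed system over a baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - proposed_wer) / baseline_wer


# Hypothetical WERs (percent), chosen only to illustrate the formula.
print(relative_improvement(20.0, 17.5))  # 12.5
```

An average relative improvement would then be the mean of this value over the test conditions, assuming each condition is weighted equally.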

Recognition experiment of Korean connected digit telephone speech using the temporal filter based on training speech data (훈련데이터 기반의 temporal filter를 적용한 한국어 4연숫자 전화음성의 인식실험)

Jung Sung Yun;Kim Min Sung;Son Jong Mok;Bae Keun Sung;Kang Jeom Ja
- Proceedings of the KSPS conference / 2003.10a / pp.149-152 / 2003
In this paper, data-driven temporal filter methods [1] are investigated for robust feature extraction. Principal component analysis is applied to the time trajectories of feature sequences from training speech data to obtain appropriate temporal filters. Recognition experiments with the data-driven temporal filters were carried out on the Korean connected digit telephone speech database released by SITEC. Experimental results are discussed along with our findings.
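
The idea of deriving a temporal filter from PCA of feature time trajectories, as described in the abstract above, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the window length, the use of power iteration to find the first principal component, and all names are assumptions.

```python
import math


def temporal_filter_from_trajectory(trajectory, window=5, iterations=200):
    """Return the first principal component of sliding windows taken over one
    feature's time trajectory. Its coefficients can serve as FIR filter taps."""
    # Collect overlapping windows of the trajectory (assumed window length).
    segments = [trajectory[i:i + window]
                for i in range(len(trajectory) - window + 1)]
    n = len(segments)
    if n < 2:
        raise ValueError("trajectory too short for the chosen window")
    # Mean-centre each window position across all segments.
    means = [sum(seg[k] for seg in segments) / n for k in range(window)]
    centred = [[seg[k] - means[k] for k in range(window)] for seg in segments]
    # Sample covariance matrix of the windows (window x window).
    cov = [[sum(row[a] * row[b] for row in centred) / (n - 1)
            for b in range(window)] for a in range(window)]
    # Power iteration: converges to the eigenvector with the largest eigenvalue.
    vec = [1.0 / math.sqrt(window)] * window
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(window))
               for a in range(window)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break  # degenerate (constant) trajectory; keep the current vector
        vec = [x / norm for x in nxt]
    return vec


# Synthetic trajectory standing in for one cepstral coefficient over time.
trajectory = [math.sin(0.3 * t) for t in range(200)]
taps = temporal_filter_from_trajectory(trajectory)
```

In practice the windows would be pooled over all training utterances, and one filter would be derived per feature dimension; both of those choices are assumptions here.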

Speaker Identification Using an Ensemble of Feature Enhancement Methods (특징 강화 방법의 앙상블을 이용한 화자 식별)

Yang, IL-Ho;Kim, Min-Seok;So, Byung-Min;Kim, Myung-Jae;Yu, Ha-Jin
- Phonetics and Speech Sciences / v.3 no.2 / pp.71-78 / 2011
In this paper, we propose an approach that constructs classifier ensembles from various channel compensation and feature enhancement methods. CMN and CMVN are used as channel compensation methods. PCA, kernel PCA, greedy kernel PCA, and kernel multimodal discriminant analysis are used as feature enhancement methods. The proposed ensemble system combines 15 classifiers, formed from three channel compensation options (including 'without compensation') and five feature enhancement options (including 'without enhancement'). Experimental results show that the proposed ensemble system gives the highest average speaker identification rate in various environments (channels, noises, and sessions).
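
The count of 15 classifiers in the abstract above follows from pairing each of the three channel compensation options with each of the five feature enhancement options. A short sketch of that enumeration is given below; the variable names are illustrative, and how the member outputs are fused is not stated in the abstract, so it is not shown.

```python
from itertools import product

# Options named in the abstract; "none" stands for the 'without ...' cases.
channel_compensation = ["none", "CMN", "CMVN"]
feature_enhancement = [
    "none",
    "PCA",
    "kernel PCA",
    "greedy kernel PCA",
    "kernel multimodal discriminant analysis",
]

# Each ensemble member is one (compensation, enhancement) pair.
ensemble_members = list(product(channel_compensation, feature_enhancement))
print(len(ensemble_members))  # 15
```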

An acoustical analysis of speech of different speaking rates and genders using intonation curve stylization of English (영어의 억양 유형화를 이용한 발화 속도와 남녀 화자에 따른 음향 분석)

Yi, So Pae
- Phonetics and Speech Sciences / v.6 no.4 / pp.79-90 / 2014
An intonation curve stylization was used for an acoustical analysis of English speech. For the analysis, acoustical feature values were extracted from 1,848 utterances produced at normal and fast speech rates by 28 native speakers of English (12 women and 16 men). Men are found to speak faster than women at the normal speech rate, but no difference is found between genders at the fast speech rate. Analysis of pitch point features shows that fast speech has greater Pt (pitch point movement time), Pr (pitch point pitch range), and Pd (pitch point distance) but smaller Ps (pitch point slope) than normal speech. Men show greater Pt, Pr, and Pd than women. Analysis of sentence level features reveals that fast speech has smaller Sr (sentence level pitch range), Sd (sentence duration), and Max (maximum pitch) but greater Ss (sentence slope) than normal speech. Women show greater Sr, Ss, Sp (pitch difference between the first pitch point and the last), Sd, MaxNr (normalized Max), and MinNr (normalized Min) than men. As speech rate increases, women speak with greater Ss and Sr than men.
https://doi.org/10.13064/KSSS.2014.6.4.079

Features for Figure Speech Recognition in Noise Environment (잡음환경에서의 숫자음 인식을 위한 특징파라메타)

Lee, Jae-Ki;Koh, Si-Young;Lee, Kwang-Suk;Hur, Kang-In
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / v.9 no.2 / pp.473-476 / 2005
This paper proposes feature parameters that are robust in noise. The MFCC (Mel Frequency Cepstral Coefficient) feature parameter used in conventional speech recognition shows good performance. To obtain more robust performance in noise, the MFCC feature space is transformed using PCA (Principal Component Analysis) and ICA (Independent Component Analysis), and the transformed parameters are compared with the conventional MFCC parameter. The results show that the feature parameters transformed by ICA outperform both those transformed by PCA and the conventional MFCC.

3D Lip Feature Analysis for Visual Speech Recognition (시각적 음성 인식을 위한 입술의 3차원 특징 분석)

Koh, H.S.;Youn, I.;Lee, Y.J.;Hwang, D.;Choi, K.
- Proceedings of the Korean Society of Precision Engineering Conference / 2013.05a / pp.1443-1444 / 2013

Performance Comparison of Deep Feature Based Speaker Verification Systems (깊은 신경망 특징 기반 화자 검증 시스템의 성능 비교)

Kim, Dae Hyun;Seong, Woo Kyeong;Kim, Hong Kook
- Phonetics and Speech Sciences / v.7 no.4 / pp.9-16 / 2015
In this paper, several experiments are performed with deep neural network (DNN) based features to compare the performance of speaker verification (SV) systems. To this end, input features for a DNN, such as mel-frequency cepstral coefficients (MFCC), linear-frequency cepstral coefficients (LFCC), and perceptual linear prediction (PLP), are first compared in terms of SV performance. After that, the effect of the DNN training method and the structure of the DNN hidden layers on SV performance is investigated for each type of feature. The performance of an SV system is then evaluated using an I-vector or probabilistic linear discriminant analysis (PLDA) scoring method. The SV experiments show that a tandem feature combining the DNN bottleneck feature and the MFCC feature gives the best performance when the DNNs are configured with rectangular hidden layers and trained with a supervised training method.
https://doi.org/10.13064/KSSS.2015.7.4.009

Correlation analysis of antipsychotic dose and speech characteristics according to extrapyramidal symptoms (추체외로 증상에 따른 항정신병 약물 복용량과 음성 특성의 상관관계 분석)

Lee, Subin;Kim, Seoyoung;Kim, Hye Yoon;Kim, Euitae;Yu, Kyung-Sang;Lee, Ho-Young;Lee, Kyogu
- The Journal of the Acoustical Society of Korea / v.41 no.3 / pp.367-374 / 2022
In this paper, a correlation analysis between speech characteristics and the dose of antipsychotic drugs was performed. To investigate the speech characteristics of extrapyramidal symptoms (EPS) related to voice change, a common side effect of antipsychotic drugs, a Korean EPS speech corpus was constructed through sentence development. Using this corpus, the speech patterns of EPS and non-EPS groups were investigated, and a particularly strong speech feature correlation was found in the EPS group. In addition, it was confirmed that the type of spoken sentence affects the speech feature pattern. These results suggest the possibility of early detection of antipsychotic-induced EPS based on speech features.
https://doi.org/10.7776/ASK.2022.41.3.367

Classification of pathological and normal voice based on dimension reduction of feature vectors (피처벡터 축소방법에 기반한 장애음성 분류)

Lee, Ji-Yeoun;Jeong, Sang-Bae;Choi, Hong-Shik;Hahn, Min-Soo
- Proceedings of the KSPS conference / 2007.05a / pp.123-126 / 2007
This paper suggests a method to improve the performance of pathological/normal voice classification. The effectiveness of mel frequency-based filter bank energies is analyzed using the Fisher discriminant ratio (FDR). Mel frequency cepstral coefficients (MFCCs) and feature vectors obtained by a linear discriminant analysis (LDA) transformation of the filter bank energies (FBE) are then implemented. The results show that the FBE LDA-based GMM is a more effective method for pathological/normal voice classification than the MFCC-based GMM.
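
The Fisher discriminant ratio mentioned in the abstract above scores how well a single feature separates two classes. The sketch below uses one common two-class definition, the squared difference of class means divided by the sum of class variances; the abstract does not give the paper's exact formulation, so this definition, the function name, and the sample values are all assumptions.

```python
from statistics import mean, pvariance


def fisher_discriminant_ratio(class_a, class_b):
    """Two-class FDR for one feature: (mu_a - mu_b)^2 / (var_a + var_b)."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    denominator = pvariance(class_a) + pvariance(class_b)
    if denominator == 0:
        raise ValueError("both classes have zero variance")
    return (mu_a - mu_b) ** 2 / denominator


# Hypothetical energies of one filter bank channel for each class.
normal_voice = [1.0, 2.0, 3.0]
pathological_voice = [4.0, 5.0, 6.0]
fdr = fisher_discriminant_ratio(normal_voice, pathological_voice)
print(fdr)  # approximately 6.75
```

A larger ratio indicates that the channel separates the two classes more clearly, so computing it per filter bank channel gives a simple ranking of channels.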

Speech/Music Discrimination Using Spectral Peak Track Analysis (스펙트럴 피크 트랙 분석을 이용한 음성/음악 분류)

Keum, Ji-Soo;Lee, Hyon-Soo
- Proceedings of the IEEK Conference / 2006.06a / pp.243-244 / 2006
In this study, we propose a speech/music discrimination method using spectral peak track analysis. The proposed method uses the duration of a spectral peak track within the same frequency channel as the feature parameter, and applies a duration threshold to discriminate speech from music. In experiments, the correct discrimination ratio varied with the threshold, but the method achieved performance comparable to other methods and is computationally efficient.
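
The duration-threshold idea in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' method: it tracks only one spectral peak per frame, summarizes the tracks by their mean duration, and assumes that longer tracks indicate music (sustained tones). All names and the threshold value are hypothetical.

```python
def peak_track_durations(peak_channels):
    """Lengths, in frames, of runs where the peak stays in the same channel."""
    durations = []
    run_length = 0
    previous = object()  # sentinel that never equals a real channel index
    for channel in peak_channels:
        if channel == previous:
            run_length += 1
        else:
            if run_length:
                durations.append(run_length)
            run_length = 1
            previous = channel
    if run_length:
        durations.append(run_length)
    return durations


def classify_segment(peak_channels, duration_threshold):
    """Label a segment by comparing mean track duration with a threshold.
    Assumption: long-lived peaks suggest music, short-lived ones speech."""
    durations = peak_track_durations(peak_channels)
    if not durations:
        raise ValueError("segment has no frames")
    mean_duration = sum(durations) / len(durations)
    return "music" if mean_duration >= duration_threshold else "speech"


# Hypothetical per-frame peak channel indices.
print(peak_track_durations([3, 3, 3, 5, 5, 2]))  # [3, 2, 1]
```

Because the feature is a simple run length, the per-frame cost is a single comparison, which is consistent with the computational efficiency the abstract mentions.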
