Search | Korea Science

Filtering of Filter-Bank Energies for Robust Speech Recognition

Jung, Ho-Young
- ETRI Journal / v.26 no.3 / pp.273-276 / 2004
We propose a novel feature processing technique that provides a cepstral liftering effect in the log-spectral domain. Cepstral liftering aims to equalize the variance of cepstral coefficients for distance-based speech recognizers and, as a result, provides robustness to additive noise and speaker variability. However, in the popular hidden Markov model based framework, cepstral liftering has no effect on recognition performance. We derive a filtering method in the log-spectral domain that corresponds to cepstral liftering. The proposed method performs high-pass filtering based on the decorrelation of filter-bank energies. We show that in noisy speech recognition, the proposed method reduces the error rate by 52.7% relative to the conventional feature.
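
The statement that liftering has no effect in an HMM recognizer can be checked numerically under one common assumption: the state output densities are diagonal-covariance Gaussians whose parameters are re-estimated on the liftered features. The sketch below is illustrative only; the function names, lifter weights, and model values are hypothetical and not taken from the paper. It shows that the log-likelihood difference between two competing models is unchanged by per-coefficient scaling.

```python
import math

def diag_gauss_loglik(x, mean, var):
    """Log-likelihood of x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def lifter(vec, weights):
    """Scale each cepstral coefficient by its lifter weight."""
    return [w * v for w, v in zip(weights, vec)]

# Hypothetical cepstral vector, two competing models, and lifter weights.
x = [1.0, -0.5, 0.2]
mean_a, var_a = [0.8, -0.4, 0.0], [0.5, 0.2, 0.1]
mean_b, var_b = [0.0, 0.3, 0.4], [0.6, 0.3, 0.2]
weights = [1.0, 2.5, 4.0]

diff_before = diag_gauss_loglik(x, mean_a, var_a) - diag_gauss_loglik(x, mean_b, var_b)

# Liftering scales the features, so re-estimated means scale by w
# and re-estimated variances scale by w squared.
sq = [w * w for w in weights]
diff_after = (
    diag_gauss_loglik(lifter(x, weights), lifter(mean_a, weights), lifter(var_a, sq))
    - diag_gauss_loglik(lifter(x, weights), lifter(mean_b, weights), lifter(var_b, sq))
)
```

Each model's log-likelihood shifts by the same constant (the sum of the log weights), so the comparison between models, and hence the decoding result, is the same before and after liftering.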

Modeling feature inference in causal categories (인과적 범주의 속성추론 모델링)

Kim, ShinWoo;Li, Hyung-Chul O.
- Korean Journal of Cognitive Science / v.28 no.4 / pp.329-347 / 2017
Early research into category-based feature inference reported various phenomena in human thinking, including typicality, diversity, and similarity effects. Later research discovered that participants' prior knowledge has an extensive influence on these sorts of reasoning. The current research tested the effects of causal knowledge on feature inference and modeled the results. Participants performed feature inference for categories consisting of four features, where the features were connected in either a common-cause or a common-effect structure. The results showed typicality effects along with violations of the causal Markov condition in the common-cause structure and causal discounting in the common-effect structure. To model the results, it was assumed that participants perform feature inference based on the difference between the probabilities of an exemplar with the target feature and an exemplar without the target feature (that is, $p(E_{F(X)} \mid Cat) - p(E_{F(\sim X)} \mid Cat)$). Exemplar probabilities were computed based on causal model theory (Rehder, 2003) and applied to inference for target features. The results showed that the model predicts not only typicality effects but also the violations of the causal Markov condition and the causal discounting observed in participants' data.
https://doi.org/10.19066/cogsci.2017.28.4.007
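
The difference score in the abstract can be sketched for a common-cause category as follows. This is a minimal illustration, assuming a noisy-OR parameterization with a cause base rate `c`, causal strength `m`, and background rate `b`; the parameter values and function names are hypothetical and are not the fitted values from the paper.

```python
from itertools import product

def effect_prob(cause_present, m, b):
    """Noisy-OR probability that an effect feature is present."""
    return 1 - (1 - m) * (1 - b) if cause_present else b

def exemplar_prob(exemplar, c=0.6, m=0.8, b=0.2):
    """Probability of an exemplar (cause, e1, e2, e3), each 0 or 1,
    in a common-cause structure where the cause generates three effects."""
    cause, *effects = exemplar
    p = c if cause else 1 - c
    pe = effect_prob(cause, m, b)
    for e in effects:
        p *= pe if e else 1 - pe
    return p

def inference_score(known, target_index, **params):
    """p(exemplar with target | Cat) - p(exemplar without target | Cat).
    `known` holds 0/1 for observed features and None at the target position."""
    with_x = tuple(1 if i == target_index else v for i, v in enumerate(known))
    without_x = tuple(0 if i == target_index else v for i, v in enumerate(known))
    return exemplar_prob(with_x, **params) - exemplar_prob(without_x, **params)

# Infer the third effect when all other features are present vs. absent.
score_typical = inference_score((1, 1, 1, None), 3)
score_atypical = inference_score((0, 0, 0, None), 3)
total = sum(exemplar_prob(e) for e in product((0, 1), repeat=4))
```

A positive score favors the target feature being present and a negative score favors its absence, so the score is larger when the other features are typical of the category.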

Statistical Speech Feature Selection for Emotion Recognition

Kwon Oh-Wook;Chan Kwokleung;Lee Te-Won
- The Journal of the Acoustical Society of Korea / v.24 no.4E / pp.144-151 / 2005
We evaluate the performance of emotion recognition from speech signals when an ordinary speaker talks to an entertainment robot. For each frame of a speech utterance, we extract frame-based features: pitch, energy, formants, band energies, mel-frequency cepstral coefficients (MFCCs), and the velocity/acceleration of pitch and MFCCs. For discriminative classifiers, a fixed-length utterance-based feature vector is computed from the statistics of the frame-based features. Using a speaker-independent database, we evaluate the performance of two promising classifiers: the support vector machine (SVM) and the hidden Markov model (HMM). For angry/bored/happy/neutral/sad emotion classification, the SVM and HMM classifiers yield 42.3% and 40.8% accuracy, respectively. We show that this accuracy is significant when compared with the performance of foreign human listeners.
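
A fixed-length utterance vector can be built from variable-length frame features by taking per-dimension statistics. The sketch below assumes mean, standard deviation, minimum, and maximum as the statistics; the abstract does not say which statistics were used, so this choice and the toy values are illustrative assumptions.

```python
import math

def utterance_vector(frames):
    """Map a list of frame feature vectors (any number of frames) to a
    fixed-length vector: [mean, std, min, max] for each feature dimension."""
    n = len(frames)
    dim = len(frames[0])
    out = []
    for d in range(dim):
        column = [f[d] for f in frames]
        mean = sum(column) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
        out.extend([mean, std, min(column), max(column)])
    return out

# Two toy utterances with (pitch, energy) frames of different lengths.
short_utt = [[100.0, 0.5], [110.0, 0.7]]
long_utt = [[100.0, 0.5], [120.0, 0.9], [110.0, 0.7], [90.0, 0.3]]
vec_short = utterance_vector(short_utt)
vec_long = utterance_vector(long_utt)
```

Both utterances map to vectors of the same length, which is what a classifier such as an SVM requires.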

Content-based Image Retrieval using an Improved Chain Code and Hidden Markov Model (개선된 chain code와 HMM을 이용한 내용기반 영상검색)

조완현;이승희;박순영;박종현
- Proceedings of the IEEK Conference / 2000.09a / pp.375-378 / 2000
In this paper, we propose a novel content-based image retrieval system using both a Hidden Markov Model (HMM) and an improved chain code. A Gaussian Mixture Model (GMM) is applied to statistically model the color information of the image, and the Deterministic Annealing EM (DAEM) algorithm is employed to estimate the parameters of the GMM. This result is used to segment the given image. We use an improved chain code, which is invariant to rotation, translation, and scale, to extract the shape feature vectors for each image in the database. These are stored in the database together with an HMM for each image, whose parameters (A, B, $\pi$) are estimated by the Baum-Welch algorithm. For a feature vector obtained in the same way from the query image, the occurrence probability under each image's HMM is computed using the forward algorithm. We use these probabilities for image retrieval and present the images with the highest similarity.
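
The forward algorithm used for scoring can be sketched for a discrete-observation HMM with parameters (A, B, $\pi$). The model values and the observation sequence below are toy assumptions; in the retrieval setting, the observations would be chain-code symbols from the query image and one such likelihood would be computed per database image.

```python
def forward_likelihood(obs, A, B, pi):
    """Probability of a non-empty observation sequence under a discrete HMM.
    A[i][j]: transition probability from state i to state j
    B[i][k]: probability of emitting symbol k in state i
    pi[i]:   initial probability of state i"""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    return sum(alpha)

# Toy two-state model over three symbols (illustrative values only).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]

query = [0, 1, 2]
likelihood = forward_likelihood(query, A, B, pi)
```

Ranking database images by this likelihood, highest first, gives the retrieval order described in the abstract. For long sequences, a scaled or log-domain version avoids numerical underflow.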

HMM-based missing feature reconstruction for robust speech recognition in additive noise environments (가산잡음환경에서 강인음성인식을 위한 은닉 마르코프 모델 기반 손실 특징 복원)

Cho, Ji-Won;Park, Hyung-Min
- Phonetics and Speech Sciences / v.6 no.4 / pp.127-132 / 2014
This paper describes a robust speech recognition technique that reconstructs spectral components mismatched with the training environment. The cluster-based reconstruction method compensates for unreliable components using reliable components in the same spectral vector, assuming an independent, identically distributed Gaussian-mixture process for the training spectral vectors. In contrast, the presented method exploits the temporal dependency of speech by introducing a hidden-Markov-model prior that incorporates internal state transitions plausible for the observed spectral vector sequence. The experimental results indicate that the described method provides temporally consistent reconstruction and, on average, further improves recognition performance compared to the conventional method.
https://doi.org/10.13064/KSSS.2014.6.4.127

HMM-Based Automatic Speech Recognition using EMG Signal

Lee Ki-Seung
- Journal of Biomedical Engineering Research / v.27 no.3 / pp.101-109 / 2006
It is known that there is a strong relationship between human voices and the movements of the articulatory facial muscles. In this paper, we use this knowledge to implement an automatic speech recognition scheme that relies solely on surface electromyogram (EMG) signals. The EMG signals were acquired from three articulatory facial muscles. As a preliminary study, 10 Korean digits were used as the recognition targets. Various feature parameters, including filter bank outputs, linear predictive coefficients, and cepstrum coefficients, were evaluated to find appropriate parameters for EMG-based speech recognition. The sequence of EMG signals for each word is modelled within a hidden Markov model (HMM) framework. A continuous word recognition approach was investigated in this work. Hence, the model for each word is obtained by concatenating subword models, and embedded re-estimation techniques were employed in the training stage. The findings indicate that such a system may be able to recognize speech with an accuracy of up to 90% when mel-filter bank outputs are used as the feature parameters.
https://doi.org/10.9718/JBER.2006.27.3.101

A Study on the Criteria to Decide the Number of Aircrafts Considering Operational Characteristics (항공기 운용 특성을 고려한 적정 운용 대수 산정 기준 연구)

Son, Young-Su;Kim, Seong-Woo;Yoon, Bong-Kyoo
- Journal of the Korea Institute of Military Science and Technology / v.17 no.1 / pp.41-49 / 2014
In this paper, we consider a method to assess the number of aircraft required, which is a strategic variable in national security. This problem has become more important in light of the ROKAF's F-X and KF-X projects. Traditionally, the ATO (Air Tasking Order) and the fighting power index have been used to evaluate the number of aircraft required by the ROKAF. However, those methods consider only the static aspect of the aircraft requirement. This paper presents a model that accommodates the dynamic features of the aircraft requirement using an absorbing Markov chain. In conclusion, we suggest a dynamic model to evaluate the number of aircraft required, with key decision variables such as the destruction rate, failure rate, and repair rate.
https://doi.org/10.9766/KIMST.2014.17.1.041
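
An absorbing Markov chain of this kind can be sketched with two transient states (operational, under repair) and one absorbing state (destroyed). The state layout and the per-step rates below are hypothetical and not taken from the paper; the sketch only shows how the expected number of steps before absorption follows from solving (I - Q) t = 1, where Q holds the transitions among transient states.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def expected_steps_to_absorption(Q):
    """Expected steps until absorption from each transient state."""
    n = len(Q)
    i_minus_q = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)] for i in range(n)]
    return solve(i_minus_q, [1.0] * n)

# Hypothetical per-step rates: destruction d, failure f, repair r.
d, f, r = 0.01, 0.1, 0.5
Q = [
    [1 - d - f, f],      # operational -> operational, under repair
    [r,         1 - r],  # under repair -> operational, under repair
]
steps_operational, steps_repair = expected_steps_to_absorption(Q)
```

With these rates, the expected lifetime from the operational state works out to (1 + f/r)/d = 120 steps, which shows how the failure and repair rates change the time an aircraft remains available.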

Applying feature normalization based on pole filtering to short-utterance speech recognition using deep neural network (심층신경망을 이용한 짧은 발화 음성인식에서 극점 필터링 기반의 특징 정규화 적용)

Han, Jaemin;Kim, Min Sik;Kim, Hyung Soon
- The Journal of the Acoustical Society of Korea / v.39 no.1 / pp.64-68 / 2020
In a conventional speech recognition system using Gaussian Mixture Model-Hidden Markov Model (GMM-HMM), the cepstral feature normalization method based on pole filtering was effective in improving the performance of recognition of short utterances in noisy environments. In this paper, the usefulness of this method for the state-of-the-art speech recognition system using Deep Neural Network (DNN) is examined. Experimental results on AURORA 2 DB show that the cepstral mean and variance normalization based on pole filtering improves the recognition performance of very short utterances compared to that without pole filtering, especially when there is a large mismatch between the training and test conditions.
https://doi.org/10.7776/ASK.2020.39.1.064
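
Plain per-utterance cepstral mean and variance normalization, the baseline that pole filtering modifies, can be sketched as follows. The sketch omits the pole-filtering step itself, and the function name, the `eps` guard, and the toy frames are illustrative assumptions.

```python
import math

def cmvn(frames, eps=1e-8):
    """Normalize each cepstral dimension to zero mean and unit variance
    over the frames of one utterance."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    # eps avoids division by zero when a dimension is constant.
    return [[(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)] for f in frames]

# Toy utterance of four frames with two cepstral coefficients each.
frames = [[1.0, 10.0], [2.0, 14.0], [3.0, 12.0], [2.0, 8.0]]
normalized = cmvn(frames)
```

With very short utterances, the mean and variance are estimated from few frames and can be dominated by the speech content rather than the channel, which is the situation the abstract addresses.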

Analyzing performance of time series classification using STFT and time series imaging algorithms

Sung-Kyu Hong;Sang-Chul Kim
- Journal of the Korea Society of Computer and Information / v.28 no.4 / pp.1-11 / 2023
In this paper, instead of using a recurrent neural network, we compare the classification performance of time series imaging algorithms using a convolutional neural network (CNN). The TSC (Time Series Classification) community has traditional algorithms that convert time series data into images, e.g., GAF (Gramian Angular Field), MTF (Markov Transition Field), and RP (Recurrence Plot). In addition, we compare the STFT (Short-Time Fourier Transform) algorithm, which produces a spectrogram that visualizes the features of voice data. We evaluate the CNN's performance while adjusting the hyperparameters of the imaging algorithms. When evaluated on the GunPoint dataset in the UCR archive, STFT achieves higher accuracy than the other algorithms. GAF also achieves 98 to 99% accuracy, but has the disadvantage that the image size is massive.
https://doi.org/10.9708/jksci.2023.28.04.001
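
The image-size drawback of GAF follows from its construction: a series of length n becomes an n x n image. Below is a minimal sketch of the summation variant (Gramian Angular Summation Field), assuming min-max rescaling to [-1, 1]; the function name and the toy series are illustrative, and the paper's exact settings are not given in the abstract.

```python
import math

def gramian_angular_summation_field(series):
    """Encode a 1-D series as an n x n image with entries cos(phi_i + phi_j),
    where phi = arccos of the series rescaled to [-1, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must not be constant")
    # Clamp to guard against floating-point values just outside [-1, 1].
    scaled = [max(-1.0, min(1.0, 2 * (v - lo) / (hi - lo) - 1)) for v in series]
    phi = [math.acos(v) for v in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]

series = [0.0, 1.0, 3.0, 2.0]
image = gramian_angular_summation_field(series)
```

Because the image has n squared entries, long series are usually downsampled or aggregated before this step.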

Video-based fall detection algorithm combining simple threshold method and Hidden Markov Model (단순 임계치와 은닉마르코프 모델을 혼합한 영상 기반 낙상 알고리즘)

Park, Culho;Yu, Yun Seop
- Journal of the Korea Institute of Information and Communication Engineering / v.18 no.9 / pp.2101-2108 / 2014
Automatic fall-detection algorithms using video data are proposed. Six types of fall-feature parameters are defined by applying principal component analysis (PCA) to the optical flows extracted from differential images. The first fall-detection algorithm is a simple threshold method, in which a fall is detected when a fall-feature parameter exceeds a threshold; the second uses an HMM; and the third combines the simple threshold and the HMM. A comparison of the three algorithms shows that the combined algorithm requires fewer computational resources than the HMM alone and achieves higher accuracy than the simple threshold method.
https://doi.org/10.6109/jkiice.2014.18.9.2101

