Search | Korea Science

Voice Recognition-Based on Adaptive MFCC and Deep Learning for Embedded Systems (임베디드 시스템에서 사용 가능한 적응형 MFCC 와 Deep Learning 기반의 음성인식)

Bae, Hyun Soo;Lee, Ho Jin;Lee, Suk Gyu
- Journal of Institute of Control, Robotics and Systems / v.22 no.10 / pp.797-802 / 2016
This paper proposes a novel voice recognition method based on an adaptive MFCC and deep learning for embedded systems. To enhance the recognition ratio of the proposed voice recognizer, ambient noise mixed into the voice signal has to be eliminated. However, noise filtering processes, which may damage voice data, diminish the recognition ratio. In this paper, a filter has been designed for the frequency range within a voice signal, and imposed weights are used to reduce data deterioration. In addition, a deep learning algorithm, which does not require a database in the recognition algorithm, has been adapted for embedded systems, which inherently have limited memory. The experimental results suggest that the proposed deep learning algorithm and HMM voice recognizer, utilizing the proposed adaptive MFCC algorithm, perform better than conventional MFCC algorithms in recognition ratio within a noisy environment.
https://doi.org/10.5302/J.ICROS.2016.16.0136

Implementation of Speech Recognizer using Relevance Vector Machine (RVM을 이용한 음성인식기의 구현)

Kim, Chang-Keun;Koh, Si-Young;Hur, Kang-In;Lee, Kwang-Seok
- Journal of the Korea Institute of Information and Communication Engineering / v.11 no.8 / pp.1596-1603 / 2007
In this paper, we experimented with three kinds of methods covering feature parameters, training methods, and recognition algorithms in order to find the most suitable configuration for a speech recognition system. After building a speech recognizer, we selected the most suitable system through two experiments. First, we evaluated recognition performance with three kinds of feature parameters: the existing MFCC, and new MFCC-derived feature parameters whose feature space is transformed using PCA and ICA. Second, we compared the recognition performance of HMM, SVM, and RVM as a function of the amount of training data. In these experiments, the ICA-based feature parameters improved performance by an average of 1.5% over MFCC, owing to high linear discrimination in the feature space. RVM improved performance by up to 3.25% over HMM when the amount of training data was reduced. Based on these results, the effective method for speech recognition proposed in this paper derives feature parameters using ICA and performs recognition using RVM.
https://doi.org/10.6109/jkiice.2007.11.8.1596

The Effect of FIR Filtering and Spectral Tilt on Speech Recognition with MFCC (FIR 필터링과 스펙트럼 기울이기가 MFCC를 사용하는 음성인식에 미치는 효과)

Lee, Chang-Young
- The Journal of the Korea institute of electronic communication sciences / v.5 no.4 / pp.363-371 / 2010
In an effort to enhance the quality of feature vector classification and thereby reduce the recognition error rate for speaker-independent speech recognition, we study the effect of spectral tilt on the Fourier magnitude spectrum en route to the extraction of MFCC. The effect of FIR filtering of the speech signal on speech recognition is also investigated in parallel. The proposed methods are evaluated in two independent ways: the Fisher discriminant objective function and a speech recognition test using a hidden Markov model with fuzzy vector quantization. In the experiments, an appropriate choice of the tilt factor yields about a 10% relative improvement in recognition error rate over the conventional method.
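As a rough illustration of how an FIR filter can tilt the spectrum of a speech signal, the sketch below applies a first-order pre-emphasis filter, y[n] = x[n] - alpha * x[n-1]. This is a common textbook filter and is only an assumption here: the abstract does not state which FIR filter or tilt factor the author used, and the function name `pre_emphasis` and the default `alpha=0.97` are hypothetical.

```python
def pre_emphasis(signal, alpha=0.97):
    """Apply the first-order FIR filter y[n] = x[n] - alpha * x[n-1].

    alpha is a hypothetical tilt parameter; larger values attenuate
    low frequencies more strongly relative to high frequencies.
    """
    if not signal:
        return []
    out = [signal[0]]  # no previous sample exists for n = 0, so pass it through
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


# A constant (0 Hz) input is almost cancelled after the first sample,
# while a sign-alternating input (the highest frequency) is amplified.
dc = pre_emphasis([1.0] * 5)
alternating = pre_emphasis([1.0, -1.0, 1.0, -1.0, 1.0])
```

Comparing `dc` and `alternating` shows the high-pass character of the filter, which is the sense in which it tilts the magnitude spectrum upward toward high frequencies.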

GMM-based Emotion Recognition Using Speech Signal (음성 신호를 사용한 GMM기반의 감정 인식)

서정태;김원구;강면구
- The Journal of the Acoustical Society of Korea / v.23 no.3 / pp.235-241 / 2004
This paper studied pattern recognition algorithms and feature parameters for speaker- and context-independent emotion recognition. The KNN algorithm was used as the pattern matching technique for comparison, and VQ and GMM were used for speaker- and context-independent recognition. The speech parameters used as features are pitch, energy, MFCC, and their first and second derivatives. Experimental results showed that the emotion recognizer using MFCC and its derivatives performed better than the one using the pitch and energy parameters. Among the pattern recognition algorithms, the GMM-based emotion recognizer was superior to the KNN- and VQ-based recognizers.

A Study on the Signal Processing Techniques for Pattern Classification of Electrical Loads (전기부하 패턴분류를 위한 신호처리 기법에 관한 연구)

Lim, Young Bae;Kim, Dong Woo;Jin, Sangmin;Cho, Seongwon
- Journal of the Korean Institute of Intelligent Systems / v.26 no.5 / pp.409-415 / 2016
Recently, several techniques for disaster prevention based on IoT (Internet of Things) have been under development. In this paper, a new smart pattern classification method for electric loads is proposed. CT (Current Transformer) data are extracted from electric loads, and the sampled CT data are then converted using FFT and MFCC. The FFT and MFCC data are used as the input data of neural networks. Experiments were conducted using FFT and MFCC data for 7 kinds of electric loads. Experimental results indicate the superiority of MFCC in comparison to FFT.
https://doi.org/10.5391/JKIIS.2016.26.5.409

Performance Improvement of EMG-Pattern Recognition Using MFCC-HMM-GMM (MFCC-HMM-GMM을 이용한 근전도(EMG)신호 패턴인식의 성능 개선)

Choi, Heung-Ho;Kim, Jung-Ho;Kwon, Jang-Woo
- Journal of Biomedical Engineering Research / v.27 no.5 / pp.237-244 / 2006
This study proposes an approach to improving the performance of EMG (electromyogram) pattern recognition. The MFCC (Mel-Frequency Cepstral Coefficients) approach is modeled after the characteristics of the human hearing organ. While it supplies the most typical feature in the frequency domain, it should be reorganized to detect the features in EMG signals. The dynamic aspects of EMG are also important for tasks such as continuous prosthetic control or recognition of EMG signals of various time lengths, which most approaches have not successfully mastered. Thus, this paper proposes a reorganized MFCC and an HMM-GMM, which is adaptable to the dynamic features of the signal. Moreover, an analysis of the most suitable system settings for EMG pattern recognition is required. To meet this requirement, this study balanced the recognition rate against the error rates produced by the various settings when learning from the EMG data for each motion.
https://doi.org/10.9718/JBER.2006.27.5.237

Discrimination Between Natural and Artificial Seismic Sounds by Using 20 MSVQ Algorithm (20 MSVQ 알고리즘을 이용한 자연 및 인공 지진음 식별)

Yoon, Sang-Hoon;Song, Young-Hwan;Bae, Myung-Jin
- The Journal of the Acoustical Society of Korea / v.28 no.3 / pp.251-259 / 2009
This paper proposes an identification technique to discriminate between natural and artificial seismic sounds by applying the 20 MSVQ algorithm to data measured with a hydrophone. Spectrum band energy and MFCC were used as representative parameters for discriminating natural and artificial seismic sounds, and the orders of the characteristic parameters were determined through experiments. When the 20 MSVQ algorithm was used with these 2 parameters, MFCC achieved a success rate of 99.9% and the spectrum energy parameter 83.9%. The results verify that the method suggested in this paper discriminates seismic sounds with very high accuracy.
https://doi.org/10.7776/ASK.2009.28.3.251

Improving Speech/Music Discrimination Parameter Using Time-Averaged MFCC (MFCC의 단구간 시간 평균을 이용한 음성/음악 판별 파라미터 성능 향상)

Choi, Mu-Yeol;Kim, Hyung-Soon
- MALSORI / no.64 / pp.155-169 / 2007
Discrimination between speech and music is important in many multimedia applications. In our previous work, focusing on the spectral change characteristics of speech and music, we presented a method using the mean of minimum cepstral distances (MMCD), and it showed a very high discrimination performance. In this paper, to further improve the performance, we propose to employ time-averaged MFCC in computing the MMCD. Our experimental results show that the proposed method enhances the discrimination between speech and music. Moreover, the proposed method overcomes the weakness of the conventional MMCD method whose performance is relatively sensitive to the choice of the frame interval to compute the MMCD.
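The idea of averaging MFCC frames over a short time span before measuring cepstral distances can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the averaging window or the exact MMCD definition, so the helper names `time_average` and `cepstral_distance`, the causal window, and the default `width=3` are all hypothetical, and the MMCD computation itself is not reproduced.

```python
def time_average(frames, width=3):
    """Average each frame with up to width - 1 preceding frames.

    frames is a list of equal-length feature vectors (e.g. MFCC frames).
    The causal window and its width are assumptions for illustration.
    """
    averaged = []
    for t in range(len(frames)):
        window = frames[max(0, t - width + 1): t + 1]
        dim = len(frames[t])
        averaged.append(
            [sum(frame[d] for frame in window) / len(window) for d in range(dim)]
        )
    return averaged


def cepstral_distance(a, b):
    """Euclidean distance between two cepstral vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Averaging smooths frame-to-frame jumps before distances are measured.
raw = [[0.0], [3.0], [6.0]]
smoothed = time_average(raw, width=2)
```

In this toy input the distance between the last two frames drops from 3.0 in `raw` to the same step measured on smoothed values, which makes any distance-based statistic less sensitive to single-frame fluctuations.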

Digital Isolated Word Recognition System based on MFCC and DTW Algorithm (MFCC와 DTW에 알고리즘을 기반으로 한 디지털 고립단어 인식 시스템)

Zang, Xian;Chong, Kil-To
- Proceedings of the KIEE Conference / 2008.10b / pp.290-291 / 2008
The most popular speech feature used in speech recognition today is the Mel-Frequency Cepstral Coefficients (MFCC), which reflect the perceptual characteristics of the human ear more accurately than other parameters. This paper adopts MFCC and its first-order difference, which reflects the dynamic character of the speech signal, as a combined parametric representation. Furthermore, we use the Dynamic Time Warping (DTW) algorithm to search for matching paths in the pattern recognition process. We used the software "GoldWave" to record English digits in a lab environment, and the simulation results indicate that, in the experiment for the Digital Isolated Word Recognition (DIWR) system, the algorithm achieves higher recognition accuracy than others that use LPCC and similar features as characteristic parameters.
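The two ingredients named in this abstract, a first-order difference of the feature frames and DTW matching, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `delta`, `euclidean`, and `dtw_distance` are hypothetical, the local distance is assumed to be Euclidean, and the step pattern is the basic three-neighbor one with no path constraints.

```python
def delta(frames):
    """First-order difference between consecutive feature frames."""
    return [
        [cur_v - prev_v for prev_v, cur_v in zip(prev, cur)]
        for prev, cur in zip(frames, frames[1:])
    ]


def euclidean(a, b):
    """Euclidean distance between two feature vectors (assumed local cost)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost using steps (i-1, j), (i, j-1), (i-1, j-1)."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# The same pattern spoken at two speeds aligns with zero cost.
template = [[0.0], [1.0], [2.0]]
slower = [[0.0], [1.0], [1.0], [2.0]]
warped_cost = dtw_distance(template, slower)
```

An isolated-word recognizer built this way would compare an input utterance against one stored template per word and pick the template with the smallest accumulated cost.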

Improvements on MFCC by Elaboration of the Filter Banks and Windows

Lee, Chang-Young
- Speech Sciences / v.14 no.4 / pp.131-144 / 2007
In an effort to improve the performance of mel frequency cepstral coefficients (MFCC), we investigate the effects of varying the parameters for the filter banks and their associated windows on speech recognition rates. Specifically, the mel and bark scales are combined with various types of filter bank windows. The suggested methods are compared and evaluated in two independent ways: speech recognition tests and the Fisher discriminant objective function. It is shown that the Hanning window based on the bark scale yields a 28.1% relative improvement in speech recognition error rate over the triangular window with the mel scale. Further work on incorporating PCA and/or LDA as a postprocessor to MFCC extraction would be desirable.
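For readers unfamiliar with the two frequency scales and the window shape compared here, the sketch below gives commonly used conversion formulas and a Hanning window. These are assumptions: the abstract does not say which mel or bark approximation the author used. The mel formula is the widely used 2595 * log10(1 + f/700) form, and the bark formula is the Zwicker and Terhardt approximation; all function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Common mel-scale approximation (assumed variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_bark(f_hz):
    """Zwicker and Terhardt bark-scale approximation (assumed variant)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def hanning(n_points):
    """Hanning window weights over n_points samples (n_points >= 2)."""
    return [
        0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n_points - 1))
        for k in range(n_points)
    ]


# Filter-bank centers would be spaced evenly on the chosen scale, and each
# filter shaped by a window such as this one instead of a triangle.
mel_1k = hz_to_mel(1000.0)
bark_1k = hz_to_bark(1000.0)
weights = hanning(5)
```

Both scales compress high frequencies, so filters spaced evenly in mel or bark are narrow at low frequencies and wide at high frequencies; the window only changes how each filter weights the spectrum inside its band.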

Search Result 271, Processing Time 0.036 seconds
