• Title/Summary/Keyword: MFCC

Search Results: 271

Frame Based Classification of Underwater Transient Signal Using MFCC Feature Vector and Neural Network (MFCC 특징벡터와 신경회로망을 이용한 프레임 기반의 수중 천이신호 식별)

  • Lim, Tae-Gyun;Kim, Il-Hwan;Kim, Tae-Hwan;Bae, Keun-Sung
    • Proceedings of the IEEK Conference / 2008.06a / pp.883-884 / 2008
  • This paper presents a method for the classification of underwater transient signals which employs a binary image pattern of the mel-frequency cepstral coefficients (MFCC) as a feature vector and a neural network as a classifier. A feature vector is obtained by applying a DCT and 1-bit quantization to the square matrix of the MFCC sequences. The classifier is a feed-forward neural network with one hidden layer and one output layer, and a backpropagation algorithm is used to update the weight vector of each layer. Experimental results with several underwater transient signals demonstrate that the proposed method is very promising for the classification of underwater transient signals.

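
The binary-pattern feature described in this abstract (a 2-D DCT of a square MFCC matrix followed by 1-bit quantization) can be sketched in a few lines of NumPy. This is a hedged reconstruction: the matrix size and the zero-threshold quantization rule are assumptions, since the abstract does not specify them.

```python
import numpy as np

def dct2(x):
    """Orthonormal DCT-II applied along both axes (no SciPy needed)."""
    n = x.shape[0]
    k = np.arange(n)
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    basis[0] *= 1.0 / np.sqrt(2.0)
    basis *= np.sqrt(2.0 / n)
    return basis @ x @ basis.T

def binary_pattern(mfcc_matrix):
    """1-bit quantization of the 2-D DCT of a square MFCC matrix.
    Thresholding at zero is an assumption; the paper's rule is unspecified."""
    return (dct2(mfcc_matrix) > 0).astype(np.uint8)

# toy example: 20 frames x 20 MFCCs (square, as the feature requires)
rng = np.random.default_rng(0)
feat = binary_pattern(rng.standard_normal((20, 20)))
```

The resulting 0/1 matrix would then be flattened and fed to the feed-forward network described in the abstract.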

Feature Extraction and Classification of Underwater Transient Signal using MFCC and Wavelet Packet Based on Entropy (MFCC과 엔트로피 기반의 웨이블릿 패킷 기법을 이용한 수중 천이신호의 특징추출 및 식별)

  • Jung, Jae-Gun;Park, Jeong-Hyun;Kim, Dong-Wook;Hwang, Chan-Sik
    • Proceedings of the KAIS Fall Conference / 2009.05a / pp.781-784 / 2009
  • In this paper, we propose a feature-vector extraction scheme for discriminating man-made transient signals, generated by ships or submarines in a real underwater environment, from transient signals produced by marine life such as dolphins and shrimp. Features are extracted using MFCC and an entropy-based wavelet packet method, and the two feature sets are applied jointly to classify underwater transient signals. The classification rates of the conventional MFCC and wavelet packet methods are compared with those obtained when the two are applied together, and a feed-forward neural network is used as the classifier to evaluate the performance of the feature vectors.

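
A minimal sketch of the entropy-based wavelet packet feature, assuming a Haar basis and Shannon entropy of the normalized subband energies (the abstract names neither the wavelet nor the entropy definition):

```python
import numpy as np

def haar_step(x):
    """One Haar analysis step: approximation and detail at half length."""
    x = x[: len(x) // 2 * 2]
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def shannon_entropy(c):
    """Shannon entropy of the normalized squared coefficients."""
    e = np.sum(c ** 2)
    if e == 0:
        return 0.0
    p = c ** 2 / e
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def wp_entropy_features(signal, levels=2):
    """Entropy of every wavelet packet node down to `levels` levels."""
    nodes = [np.asarray(signal, dtype=float)]
    feats = []
    for _ in range(levels):
        nodes = [half for n in nodes for half in haar_step(n)]
        feats += [shannon_entropy(n) for n in nodes]
    return np.array(feats)

x = np.sin(2 * np.pi * 5 * np.arange(64) / 64)
f = wp_entropy_features(x)  # 2 + 4 = 6 entropy values
```

These entropies would be concatenated with the MFCC vector to form the combined feature the abstract evaluates.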

EMG Pattern Recognition based on MFCC-HMM-GMM for Prosthetic Arm Control (의수 제어를 위한 MFCC-HMM-GMM 기반의 근전도(EMG) 신호 패턴 인식)

  • Kim, Jung-Ho;Hong, Joon-Eui;Lee, Dong-Hoon;Choi, Heung-Ho;Kwon, Jang-Woo
    • Proceedings of the IEEK Conference / 2006.06a / pp.245-246 / 2006
  • In this paper, we propose using mel-scaled cepstral coefficients (MFCC) and a simple but efficient classification method. Many other features, such as IAV, zero crossings, and LPCC, together with their derivatives, are also tested and compared with the MFCC features in order to find the best combination. GMM and HMM (discrete and continuous hidden Markov models) are studied as well, in the hope that the use of continuous distributions and the temporal evolution of this feature set will improve the recognition quality.

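
Two of the competing time-domain EMG features the abstract mentions, IAV (integrated absolute value) and zero crossings, have standard textbook definitions; the snippet below is a generic formulation, not the authors' code:

```python
import numpy as np

def iav(frame):
    """Integrated Absolute Value of an EMG frame."""
    return float(np.sum(np.abs(frame)))

def zero_crossings(frame):
    """Number of sign changes in the frame."""
    s = np.sign(frame)
    s[s == 0] = 1  # treat exact zeros as positive
    return int(np.sum(s[:-1] != s[1:]))

x = np.array([1.0, -0.5, 0.3, 0.2, -0.1])
```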

Context Recognition Using Environmental Sound for Client Monitoring System (피보호자 모니터링 시스템을 위한 환경음 기반 상황 인식)

  • Ji, Seung-Eun;Jo, Jun-Yeong;Lee, Chung-Keun;Oh, Siwon;Kim, Wooil
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.2 / pp.343-350 / 2015
  • This paper presents a context recognition method using environmental sound signals, which is applied to a mobile-based client monitoring system. Seven acoustic contexts are defined and the corresponding environmental sound signals are collected for the experiments. To evaluate the context recognition performance, MFCC and LPCC are employed for feature extraction, and statistical pattern recognition methods using GMM and HMM as acoustic models are compared. The experimental results show that LPCC and HMM are more effective at improving context recognition accuracy than MFCC and GMM, respectively. The recognition system using LPCC and HMM obtains 96.03% recognition accuracy. These results demonstrate that LPCC is effective for representing environmental sounds, which contain more varied frequency components than human speech. They also show that HMM is more effective than GMM for modeling time-varying environmental sounds.
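
The LPCC features compared in this study are derived from LPC analysis. As an illustration (not the paper's implementation), LPC coefficients are commonly obtained with the autocorrelation method and the Levinson-Durbin recursion:

```python
import numpy as np

def lpc(signal, order):
    """LPC prediction-error filter [1, a1, ..., ap] via Levinson-Durbin."""
    n = len(signal)
    r = np.array([signal[: n - k] @ signal[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:i] @ r[i - 1:0:-1]) / e  # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        e *= 1.0 - k * k  # prediction error shrinks at each order
    return a

# synthetic AR(2) signal whose true error filter is [1, -0.5, 0.3]
rng = np.random.default_rng(0)
x = np.zeros(4096)
noise = rng.standard_normal(4096)
for t in range(2, 4096):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + noise[t]
a = lpc(x, 2)
```

Cepstral coefficients (LPCC) then follow from the LPC coefficients via the standard recursion.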

Dialect classification based on the speed and the pause of speech utterances (발화 속도와 휴지 구간 길이를 사용한 방언 분류)

  • Jonghwan Na;Bowon Lee
    • Phonetics and Speech Sciences / v.15 no.2 / pp.43-51 / 2023
  • In this paper, we propose an approach to dialect classification based on the speed and pauses of speech utterances as well as the age and gender of the speakers. Dialect classification is one of the important techniques for speech analysis; for example, an accurate dialect classification model can potentially improve the performance of speaker or speech recognition. According to previous studies, deep learning based on Mel-Frequency Cepstral Coefficient (MFCC) features has been the dominant approach. We focus on the acoustic differences between regions and conduct dialect classification based on features derived from those differences. Specifically, we extract underexplored additional features, namely the speed and pauses of speech utterances, along with metadata including the age and gender of the speakers. Experimental results show that the proposed approach yields higher accuracy than the method using only MFCC features, especially with the speech-rate feature: by incorporating all the proposed features, accuracy improved from 91.02% to 97.02%.
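
A pause feature of the kind this abstract describes can be approximated by thresholding frame energies; the 25 ms / 10 ms framing and the -35 dB threshold below are assumptions, not values from the paper:

```python
import numpy as np

def pause_ratio(signal, sr, frame_len=0.025, hop=0.010, thresh_db=-35.0):
    """Fraction of frames whose energy is below `thresh_db` relative to
    the loudest frame: a simple stand-in for a pause-length measure."""
    fl, hp = int(sr * frame_len), int(sr * hop)
    starts = range(0, len(signal) - fl + 1, hp)
    e = np.array([np.sum(signal[i:i + fl] ** 2) + 1e-12 for i in starts])
    db = 10.0 * np.log10(e / e.max())
    return float(np.mean(db < thresh_db))

# half a second of tone followed by half a second of silence
sr = 16000
t = np.arange(sr // 2) / sr
utterance = np.concatenate([0.5 * np.sin(2 * np.pi * 220 * t), np.zeros(sr // 2)])
ratio = pause_ratio(utterance, sr)  # roughly one half
```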

Voice-Based Gender Identification Employing Support Vector Machines (음성신호 기반의 성별인식을 위한 Support Vector Machines의 적용)

  • Lee, Kye-Hwan;Kang, Sang-Ick;Kim, Deok-Hwan;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.26 no.2 / pp.75-79 / 2007
  • We propose an effective voice-based gender identification method using a support vector machine (SVM). The SVM is a binary classification algorithm that separates two groups by finding an optimal nonlinear decision boundary in a feature space and is known to yield high classification performance. In the present work, we compare the identification performance of the SVM with that of a Gaussian mixture model (GMM) using the mel-frequency cepstral coefficients (MFCC). A novel feature fusion scheme based on a combination of the MFCC and pitch is proposed with the aim of improving the performance of gender identification with the SVM. Experimental results indicate that the gender identification performance of the SVM is significantly better than that of the GMM, and that the performance is substantially improved when the proposed feature fusion technique is applied.
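
The pitch half of the proposed MFCC-plus-pitch fusion can be illustrated with a standard autocorrelation pitch estimator; the abstract does not say which extractor the authors used, so this is only a plausible choice:

```python
import numpy as np

def pitch_autocorr(frame, sr, fmin=60.0, fmax=400.0):
    """Fundamental frequency estimate by autocorrelation peak picking."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)  # search the plausible lag range
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

sr = 16000
t = np.arange(int(0.04 * sr)) / sr
f0 = pitch_autocorr(np.sin(2 * np.pi * 200.0 * t), sr)
```

The fused feature would then concatenate the per-frame MFCC vector with the pitch value, e.g. `np.concatenate([mfcc_vec, [f0]])` (`mfcc_vec` is a hypothetical name here).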

Speech/Music Signal Classification Based on Spectrum Flux and MFCC For Audio Coder (오디오 부호화기를 위한 스펙트럼 변화 및 MFCC 기반 음성/음악 신호 분류)

  • Sangkil Lee;In-Sung Lee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.5 / pp.239-246 / 2023
  • In this paper, we propose an open-loop algorithm that classifies speech and music signals using spectral flux parameters and Mel-Frequency Cepstral Coefficient (MFCC) parameters for an audio coder. The MFCC is used as a short-term feature parameter to increase responsiveness, and spectral fluxes are used as long-term feature parameters to improve accuracy. The overall speech/music classification decision is made by combining the short-term and long-term classification methods. A Gaussian Mixture Model (GMM) was used for pattern recognition, and the optimal GMM parameters were estimated using the Expectation-Maximization (EM) algorithm. The proposed combined long-term and short-term speech/music classification method showed an average classification error rate of 1.5% on various audio sources, improving the classification error rate by 0.9% compared to the short-term-only method and by 0.6% compared to the long-term-only method. Compared to the Unified Speech and Audio Coding (USAC) audio classification method, the proposed method improved the classification error rate by 9.1% on percussive music signals with attacks and by 5.8% on speech signals.
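
The long-term spectral flux parameter measures how much the short-time spectrum changes between frames; a common formulation (an assumption, since the paper may define it differently) is:

```python
import numpy as np

def spectral_flux(frames):
    """L2 distance between successive normalized magnitude spectra.
    `frames` has shape (n_frames, frame_len)."""
    mag = np.abs(np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1))
    mag = mag / (mag.sum(axis=1, keepdims=True) + 1e-12)
    return np.sqrt(np.sum(np.diff(mag, axis=0) ** 2, axis=1))

# a steady tone yields near-zero flux; a switch to noise yields a large value
rng = np.random.default_rng(0)
n = 512
tone = np.sin(2 * np.pi * 200 * np.arange(n) / 16000)
flux = spectral_flux(np.stack([tone, tone, rng.standard_normal(n)]))
```

Music tends to have a more stationary spectrum than speech, which is why low flux over a long window points toward music.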

A New Feature for Speech Segments Extraction with Hidden Markov Models (숨은마코프모형을 이용하는 음성구간 추출을 위한 특징벡터)

  • Hong, Jeong-Woo;Oh, Chang-Hyuck
    • Communications for Statistical Applications and Methods / v.15 no.2 / pp.293-302 / 2008
  • In this paper we propose a new feature for speech segment extraction with hidden Markov models, the average power, which is based on the mel frequencies of speech signals. The average power is compared with the mel-frequency cepstral coefficients (MFCC) and the power coefficient. To compare the performance of the three types of features, speech data were collected for words containing plosives, which are generally known to be hard to detect. Experiments show that the average power is more accurate and efficient than the MFCC and the power coefficient for speech segment extraction in environments with various levels of noise.
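
One plausible reading of the proposed average-power feature (the abstract does not give its exact definition, so both the filterbank and the averaging rule below are assumptions) is the mean energy of a mel filterbank output:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the mel scale."""
    pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):          # rising slope
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):          # falling slope
            fb[i, k] = (r - k) / max(r - c, 1)
    return fb

def average_power(frame, fb):
    """Mean of the mel-band energies of one frame."""
    return float(np.mean(fb @ (np.abs(np.fft.rfft(frame)) ** 2)))

fb = mel_filterbank(26, 512, 16000)
ap = average_power(np.sin(2 * np.pi * 1000 * np.arange(512) / 16000), fb)
```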

Speech Recognition Error Compensation using MFCC and LPC Feature Extraction Method (MFCC와 LPC 특징 추출 방법을 이용한 음성 인식 오류 보정)

  • Oh, Sang-Yeob
    • Journal of Digital Convergence / v.11 no.6 / pp.137-142 / 2013
  • When an inaccurately spoken word is input to a speech recognition system, feature extraction may produce an unrecognized result, or a similar phoneme may be recognized instead. Therefore, in this paper, we propose a speech recognition error correction method using phoneme similarity rates and reliability measures based on the characteristics of the phonemes. Phoneme similarity rates were obtained from models trained with the MFCC and LPC feature extraction methods and were measured together with reliability rates. By measuring the similarity rates and reliability of similar phonemes, the number of unrecognized errors is minimized, and error compensation is performed on misrecognized speech during the recognition process. Applying the proposed system yielded a recognition rate of 98.3% and an error compensation rate of 95.5%.

FPGA-Based Hardware Accelerator for Feature Extraction in Automatic Speech Recognition

  • Choo, Chang;Chang, Young-Uk;Moon, Il-Young
    • Journal of information and communication convergence engineering / v.13 no.3 / pp.145-151 / 2015
  • We describe in this paper a hardware-based scheme for improving the speed of a real-time automatic speech recognition (ASR) system by designing a parallel feature extraction algorithm on a Field-Programmable Gate Array (FPGA). A computationally intensive block in the algorithm is identified and implemented in hardware logic on the FPGA. One such block is the mel-frequency cepstral coefficient (MFCC) algorithm used in the feature extraction process. We demonstrate that the FPGA platform can perform the feature extraction computation of the speech recognition system more efficiently than a general-purpose CPU, including the ARM processor. The Xilinx Zynq-7000 System on Chip (SoC) platform is used for the MFCC implementation. From the implementation described in this paper, we confirmed that the FPGA platform is approximately 500× faster than a sequential CPU implementation and 60× faster than a sequential ARM implementation. We thus verified that a parallelized and optimized MFCC architecture on the FPGA platform can significantly improve the execution time of an ASR system compared to the CPU and ARM platforms.
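
The per-frame MFCC dataflow that the paper moves into FPGA logic (window, FFT, mel filtering, log, DCT) can be written down as a software reference. The filterbank is passed in precomputed, mirroring the fact that on an FPGA such coefficients would live in ROM; the toy random filterbank in the demo is a stand-in, not real mel filters:

```python
import numpy as np

def mfcc_frame(frame, mel_fb, n_ceps=13):
    """One frame of MFCC: window -> |FFT|^2 -> filterbank -> log -> DCT."""
    w = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(w)) ** 2
    energies = np.log(mel_fb @ power + 1e-12)
    n = len(energies)
    k, m = np.meshgrid(np.arange(n), np.arange(n_ceps))
    dct = np.cos(np.pi * m * (2 * k + 1) / (2 * n))  # DCT-II basis rows
    return dct @ energies

# demo with a toy non-negative filterbank standing in for mel filters
rng = np.random.default_rng(1)
fb = rng.random((26, 257))  # 26 bands, 512-point FFT -> 257 bins
ceps = mfcc_frame(rng.standard_normal(512), fb)
```

Each stage is a fixed-size matrix or butterfly computation, which is what makes this block attractive for parallel hardware.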