• Title/Summary/Keyword: MFCCs

Search Result 27, Processing Time 0.017 seconds

An Efficient Voice Activity Detection Method using Bi-Level HMM (Bi-Level HMM을 이용한 효율적인 음성구간 검출 방법)

  • Jang, Guang-Woo;Jeong, Mun-Ho
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.10 no.8
    • /
    • pp.901-906
    • /
    • 2015
  • We presented a method for Vad(Voice Activity Detection) using Bi-level HMM. Conventional methods need to do an additional post processing or set rule-based delayed frames. To cope with the problem, we applied to VAD a Bi-level HMM that has an inserted state layer into a typical HMM. And we used posterior ratio of voice states to detect voice period. Considering MFCCs(: Mel-Frequency Cepstral Coefficients) as observation vectors, we performed some experiments with voice data of different SNRs and achieved satisfactory results compared with well-known methods.

A Study on Serendipity-Oriented Music Recommendation Based on Play Information (재생 정보 기반 우연성 지향적 음악 추천에 관한 연구)

  • Ha, Taehyun;Lee, Sangwon
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.41 no.2
    • /
    • pp.128-136
    • /
    • 2015
  • With the recent interests with culture technologies, many studies for recommendation systems have been done. In this vein, various music recommendation systems have been developed. However, they have often focused on the technical aspects such as feature extraction and similarity comparison, and have not sufficiently addressed them in user-centered perspectives. For users' high satisfaction with recommended music items, it is necessary to study how the items are connected to the users' actual desires. For this, our study proposes a novel music recommendation method based on serendipity, which means the freshness users feel for their familiar items. The serendipity is measured through the comparison of users' past and recent listening tendencies. We utilize neural networks to apply these tendencies to the recommendation process and to extract the features of music items as MFCCs (Mel-frequency cepstral coefficients). In that the recommendation method is developed based on the characteristics of user behaviors, it is expected that user satisfaction for the recommended items can be actually increased.

Evaluation of Frequency Warping Based Features and Spectro-Temporal Features for Speaker Recognition (화자인식을 위한 주파수 워핑 기반 특징 및 주파수-시간 특징 평가)

  • Choi, Young Ho;Ban, Sung Min;Kim, Kyung-Wha;Kim, Hyung Soon
    • Phonetics and Speech Sciences
    • /
    • v.7 no.1
    • /
    • pp.3-10
    • /
    • 2015
  • In this paper, different frequency scales in cepstral feature extraction are evaluated for the text-independent speaker recognition. To this end, mel-frequency cepstral coefficients (MFCCs), linear frequency cepstral coefficients (LFCCs), and bilinear warped frequency cepstral coefficients (BWFCCs) are applied to the speaker recognition experiment. In addition, the spectro-temporal features extracted by the cepstral-time matrix (CTM) are examined as an alternative to the delta and delta-delta features. Experiments on the NIST speaker recognition evaluation (SRE) 2004 task are carried out using the Gaussian mixture model-universal background model (GMM-UBM) method and the joint factor analysis (JFA) method, both based on the ALIZE 3.0 toolkit. Experimental results using both the methods show that BWFCC with appropriate warping factor yields better performance than MFCC and LFCC. It is also shown that the feature set including the spectro-temporal information based on the CTM outperforms the conventional feature set including the delta and delta-delta features.

A MFCC-based CELP Speech Coder for Server-based Speech Recognition in Network Environments (네트워크 환경에서 서버용 음성 인식을 위한 MFCC 기반 음성 부호화기 설계)

  • Lee, Gil-Ho;Yoon, Jae-Sam;Oh, Yoo-Rhee;Kim, Hong-Kook
    • MALSORI
    • /
    • no.54
    • /
    • pp.27-43
    • /
    • 2005
  • Existing standard speech coders can provide speech communication of high quality while they degrade the performance of speech recognition systems that use the reconstructed speech by the coders. The main cause of the degradation is that the spectral envelope parameters in speech coding are optimized to speech quality rather than to the performance of speech recognition. For example, mel-frequency cepstral coefficient (MFCC) is generally known to provide better speech recognition performance than linear prediction coefficient (LPC) that is a typical parameter set in speech coding. In this paper, we propose a speech coder using MFCC instead of LPC to improve the performance of a server-based speech recognition system in network environments. However, the main drawback of using MFCC is to develop the efficient MFCC quantization with a low-bit rate. First, we explore the interframe correlation of MFCCs, which results in the predictive quantization of MFCC. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel error. As a result, we propose a 8.7 kbps MFCC-based CELP coder. It is shown from a PESQ test that the proposed speech coder has a comparable speech quality to 8 kbps G.729 while it is shown that the performance of speech recognition using the proposed speech coder is better than that using G.729.

  • PDF

A cable tension identification technology using percussion sound

  • Wang, Guowei;Lu, Wensheng;Yuan, Cheng;Kong, Qingzhao
    • Smart Structures and Systems
    • /
    • v.29 no.3
    • /
    • pp.475-484
    • /
    • 2022
  • The loss of cable tension for civil infrastructure reduces structural bearing capacity and causes harmful deformation of structures. Currently, most of the structural health monitoring (SHM) approaches for cables rely on contact transducers. This paper proposes a cable tension identification technology using percussion sound, which provides a fast determination of steel cable tension without physical contact between cables and sensors. Notably, inspired by the concept of tensioning strings for piano tuning, this proposed technology predicts cable tension value by deep learning assisted classification of "percussion" sound from tapping a steel cable. To simulate the non-linear mapping of human ears to sound and to better quantify the minor changes in the high-frequency bands of the sound spectrum generated by percussions, Mel-frequency cepstral coefficients (MFCCs) were extracted as acoustic features to train the deep learning network. A convolutional neural network (CNN) with four convolutional layers and two global pooling layers was employed to identify the cable tension in a certain designed range. Moreover, theoretical and finite element methods (FEM) were conducted to prove the feasibility of the proposed technology. Finally, the identification performance of the proposed technology was experimentally investigated. Overall, results show that the proposed percussion-based technology has great potentials for estimating cable tension for in-situ structural safety assessment.

RoutingConvNet: A Light-weight Speech Emotion Recognition Model Based on Bidirectional MFCC (RoutingConvNet: 양방향 MFCC 기반 경량 음성감정인식 모델)

  • Hyun Taek Lim;Soo Hyung Kim;Guee Sang Lee;Hyung Jeong Yang
    • Smart Media Journal
    • /
    • v.12 no.5
    • /
    • pp.28-35
    • /
    • 2023
  • In this study, we propose a new light-weight model RoutingConvNet with fewer parameters to improve the applicability and practicality of speech emotion recognition. To reduce the number of learnable parameters, the proposed model connects bidirectional MFCCs on a channel-by-channel basis to learn long-term emotion dependence and extract contextual features. A light-weight deep CNN is constructed for low-level feature extraction, and self-attention is used to obtain information about channel and spatial signals in speech signals. In addition, we apply dynamic routing to improve the accuracy and construct a model that is robust to feature variations. The proposed model shows parameter reduction and accuracy improvement in the overall experiments of speech emotion datasets (EMO-DB, RAVDESS, and IEMOCAP), achieving 87.86%, 83.44%, and 66.06% accuracy respectively with about 156,000 parameters. In this study, we proposed a metric to calculate the trade-off between the number of parameters and accuracy for performance evaluation against light-weight.

An Effective Feature Extraction Method for Fault Diagnosis of Induction Motors (유도전동기의 고장 진단을 위한 효과적인 특징 추출 방법)

  • Nguyen, Hung N.;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.7
    • /
    • pp.23-35
    • /
    • 2013
  • This paper proposes an effective technique that is used to automatically extract feature vectors from vibration signals for fault classification systems. Conventional mel-frequency cepstral coefficients (MFCCs) are sensitive to noise of vibration signals, degrading classification accuracy. To solve this problem, this paper proposes spectral envelope cepstral coefficients (SECC) analysis, where a 4-step filter bank based on spectral envelopes of vibration signals is used: (1) a linear predictive coding (LPC) algorithm is used to specify spectral envelopes of all faulty vibration signals, (2) all envelopes are averaged to get general spectral shape, (3) a gradient descent method is used to find extremes of the average envelope and its frequencies, (4) a non-overlapped filter is used to have centers calculated from distances between valley frequencies of the envelope. This 4-step filter bank is then used in cepstral coefficients computation to extract feature vectors. Finally, a multi-layer support vector machine (MLSVM) with various sigma values uses these special parameters to identify faulty types of induction motors. Experimental results indicate that the proposed extraction method outperforms other feature extraction algorithms, yielding more than about 99.65% of classification accuracy.