• Title/Summary/Keyword: MFCC

Search Result 272, Processing Time 0.03 seconds

New Temporal Features for Cardiac Disorder Classification by Heart Sound (심음 기반의 심장질환 분류를 위한 새로운 시간영역 특징)

  • Kwak, Chul;Kwon, Oh-Wook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.2
    • /
    • pp.133-140
    • /
    • 2010
  • We improve the performance of cardiac disorder classification by adding new temporal features extracted from continuous heart sound signals. We add three kinds of novel temporal features to a conventional feature based on mel-frequency cepstral coefficients (MFCC): Heart sound envelope, murmur probabilities, and murmur amplitude variation. In cardiac disorder classification and detection experiments, we evaluate the contribution of the proposed features to classification accuracy and select proper temporal features using the sequential feature selection method. The selected features are shown to improve classification accuracy significantly and consistently for neural network-based pattern classifiers such as multi-layer perceptron (MLP), support vector machine (SVM), and extreme learning machine (ELM).

A COVID-19 Diagnosis Model based on Various Transformations of Cough Sounds (기침 소리의 다양한 변환을 통한 코로나19 진단 모델)

  • Minkyung Kim;Gunwoo Kim;Keunho Choi
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.3
    • /
    • pp.57-78
    • /
    • 2023
  • COVID-19, which started in Wuhan, China in November 2019, spread beyond China in 2020 and spread worldwide in March 2020. It is important to prevent a highly contagious virus like COVID-19 in advance and to actively treat it when confirmed, but it is more important to identify the confirmed fact quickly and prevent its spread since it is a virus that spreads quickly. However, PCR test to check for infection is costly and time consuming, and self-kit test is also easy to access, but the cost of the kit is not easy to receive every time. Therefore, if it is possible to determine whether or not a person is positive for COVID-19 based on the sound of a cough so that anyone can use it easily, anyone can easily check whether or not they are confirmed at anytime, anywhere, and it can have great economic advantages. In this study, an experiment was conducted on a method to identify whether or not COVID-19 was confirmed based on a cough sound. Cough sound features were extracted through MFCC, Mel-Spectrogram, and spectral contrast. For the quality of cough sound, noisy data was deleted through SNR, and only the cough sound was extracted from the voice file through chunk. Since the objective is COVID-19 positive and negative classification, learning was performed through XGBoost, LightGBM, and FCNN algorithms, which are often used for classification, and the results were compared. Additionally, we conducted a comparative experiment on the performance of the model using multidimensional vectors obtained by converting cough sounds into both images and vectors. The experimental results showed that the LightGBM model utilizing features obtained by converting basic information about health status and cough sounds into multidimensional vectors through MFCC, Mel-Spectogram, Spectral contrast, and Spectrogram achieved the highest accuracy of 0.74.

Laryngeal Cancer Screening using Cepstral Parameters (켑스트럼 파라미터를 이용한 후두암 검진)

  • 이원범;전경명;권순복;전계록;김수미;김형순;양병곤;조철우;왕수건
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.14 no.2
    • /
    • pp.110-116
    • /
    • 2003
  • Background and Objectives : Laryngeal cancer discrimination using voice signals is a non-invasive method that can carry out the examination rapidly and simply without giving discomfort to the patients. n appropriate analysis parameters and classifiers are developed, this method can be used effectively in various applications including telemedicine. This study examines voice analysis parameters used for laryngeal disease discrimination to help discriminate laryngeal diseases by voice signal analysis. The study also estimates the laryngeal cancer discrimination activity of the Gaussian mixture model (GMM) classifier based on the statistical modelling of voice analysis parameters. Materials and Methods : The Multi-dimensional voice program (MDVP) parameters, which have been widely used for the analysis of laryngeal cancer voice, sometimes fail to analyze the voice of a laryngeal cancer patient whose cycle is seriously damaged. Accordingly, it is necessary to develop a new method that enables an analysis of high reliability for the voice signals that cannot be analyzed by the MDVP. To conduct the experiments of laryngeal cancer discrimination, the authors used three types of voices collected at the Department of Otorhinorlaryngology, Pusan National University Hospital. 50 normal males voice data, 50 voices of males with benign laryngeal diseases and 105 voices of males laryngeal cancer. In addition, the experiment also included 11 voices data of males with laryngeal cancer that cannot be analyzed by the MDVP, Only monosyllabic vowel /a/ was used as voice data. Since there were only 11 voices of laryngeal cancer patients that cannot be analyzed by the MDVP, those voices were used only for discrimination. This study examined the linear predictive cepstral coefficients (LPCC) and the met-frequency cepstral coefficients (MFCC) that are the two major cepstrum analysis methods in the area of acoustic recognition. Results : The results showed that this met frequency scaling process was effective in acoustic recognition but not useful for laryngeal cancer discrimination. Accordingly, the linear frequency cepstral coefficients (LFCC) that excluded the met frequency scaling from the MFCC was introduced. The LFCC showed more excellent discrimination activity rather than the MFCC in predictability of laryngeal cancer. Conclusion : In conclusion, the parameters applied in this study could discriminate accurately even the terminal laryngeal cancer whose periodicity is disturbed. Also it is thought that future studies on various classification algorithms and parameters representing pathophysiology of vocal cords will make it possible to discriminate benign laryngeal diseases as well, in addition to laryngeal cancer.

  • PDF

Motion Study of Treatment Robot for Autistic Children Using Speech Data Classification Based on Artificial Neural Network (음성 분류 인공신경망을 활용한 자폐아 치료용 로봇의 지능화 동작 연구)

  • Lee, Jin-Gyu;Lee, Bo-Hee
    • Journal of IKEEE
    • /
    • v.23 no.4
    • /
    • pp.1440-1447
    • /
    • 2019
  • Currently, the prevalence of autism spectrum disorders in children is reported to be higher and shows various types of disorders. In particular, they are having difficulty in communication due to communication impairment in the area of social communication and need to be improved through training. Thus, this study proposes a method of acquiring voice information through a microphone mounted on a robot designed through preliminary research and using this information to make intelligent motions. An ANN(Artificial Neural Network) was used to classify the speech data into robot motions, and we tried to improve the accuracy by combining the Recurrent Neural Network based on Convolutional Neural Network. The preprocessing of input speech data was analyzed using MFCC(Mel-Frequency Cepstral Coefficient), and the motion of the robot was estimated using various data normalization and neural network optimization techniques. In addition, the designed ANN showed a high accuracy by conducting an experiment comparing the accuracy with the existing architecture and the method of human intervention. In order to design robot motions with higher accuracy in the future and to apply them in the treatment and education environment of children with autism.

A study on user authentication method using speaker authentication mechanism in login process (로그인 과정에서의 화자인증 메커니즘을 이용한 사용자인증 방안 연구)

  • Kim, Nam-Ho;Choi, Ji-Young
    • Smart Media Journal
    • /
    • v.8 no.3
    • /
    • pp.23-30
    • /
    • 2019
  • With the popularization of the Internet and smartphone uses, people in the modern era are living in a multi-channel environment in which they access the information system freely through various methods and media. In the process of utilizing such services, users must authenticate themselves, the typical of which is ID & password authentication. It is considered the most convenient method as it can be authenticated only through the keyboard after remembering its own credentials. On the other hand, modern web services only allow passwords to be set with high complexity by different combinations. Passwords consisting of these complex strings also increase proportionally, since the more services users want to use, the more user authentication information they need to remember is recommended periodically to prevent personal information leakage. It is difficult for the blind, the disabled, or the elderly to remember the authentication information of users with such high entropy values and to use it through keyboard input. Therefore, this paper proposes a user authentication method using Google Assistant, MFCC and DTW algorithms and speaker authentication to provide the handicapped users with an easy user authentication method in the login process.

A Study on the Extraction of Specific Audio Feature In Basketball Video (농구 비디오에서 특정 음성 특징 추출에 관한 연구)

  • 공현장;김원필;김판구
    • Proceedings of the Korea Multimedia Society Conference
    • /
    • 2002.05d
    • /
    • pp.1075-1080
    • /
    • 2002
  • 최근 멀티미디어 정보 시스템에서의 음성 핀 시각적 내용의 분류에 관한 연구가 활발히 진행되고 있다. 이에 본 논문에서는 농구 경기의 비디오 데이터로부터 특정 음성 정보를 추출하는 방법과 이를 농구 게임의 중요 이벤트 검출에 이용하는 방법을 제안한다. MFCC 특징들과 LPC 엔트로피의 조합을 이용하여 검출된 관중들의 환호 소리로부터 중요한 이벤트의 위치를 예측할 수 있다. 농구 경기의 다양한 소리들 중에서 관중들의 환호 소리를 분류하여 이를 농구 비디오 데이터에서 중요한 이벤트들을 검출하는데 사용함으로써 매우 효과적 결과를 얻을 수 있었다.

  • PDF

RECOGNITION SYSTEM USING VOCAL-CORD SIGNAL (성대 신호를 이용한 인식 시스템)

  • Cho, Kwan-Hyun;Han, Mun-Sung;Park, Jun-Seok;Jeong, Young-Gyu
    • Proceedings of the KIEE Conference
    • /
    • 2005.10b
    • /
    • pp.216-218
    • /
    • 2005
  • This paper present a new approach to a noise robust recognizer for WPS interface. In noisy environments, performance of speech recognition is decreased rapidly. To solve this problem, We propose the recognition system using vocal-cord signal instead of speech. Vocal-cord signal has low quality but it is more robust to environment noise than speech signal. As a result, we obtained 75.21% accuracy using MFCC with CMS and 83.72% accuracy using ZCPA with RASTA.

  • PDF

Adoption of Support Vector Machine and Independent Component Analysis for Implementation of Speech Recognizer (음성인식기 구현을 위한 SVM과 독립성분분석 기법의 적용)

  • 박정원;김평환;김창근;허강인
    • Proceedings of the IEEK Conference
    • /
    • 2003.07e
    • /
    • pp.2164-2167
    • /
    • 2003
  • In this paper we propose effective speech recognizer through recognition experiments for three feature parameters(PCA, ICA and MFCC) using SVM(Support Vector Machine) classifier In general, SVM is classification method which classify two class set by finding voluntary nonlinear boundary in vector space and possesses high classification performance under few training data number. In this paper we compare recognition result for each feature parameter and propose ICA feature as the most effective parameter

  • PDF

Discriminative Feature Vector Selection for Emotion Classification Based on Speech. (음성신호기반의 감정분석을 위한 특징벡터 선택)

  • Choi, Ha-Na;Byun, Sung-Woo;Lee, Seok-Pil
    • Proceedings of the KIEE Conference
    • /
    • 2015.07a
    • /
    • pp.1391-1392
    • /
    • 2015
  • 최근 컴퓨터 기술이 발전하고, 컴퓨터의 형태가 다양해지면서 여러 wearable device들이 생겨났다. 이에 따라 휴먼 인터페이스 기술에서 사람의 감정정보가 중요해졌고, 감정인식에 대한 연구들이 많이 진행 되어 왔다. 본 논문에서는 감정분석에 적합한 특징벡터를 제시하고자 한다. 이를 위해 사람의 감정을 보통, 기쁨, 슬픔, 화남 4가지로 분류하고 방송매체를 통하여 잡음 없이 녹음하였다. 특징벡터는 MFCC, LPC, LPCC 3가지를 추출하였고 Bhattacharyya거리 측정을 통하여 분리도를 비교하였다.

  • PDF

Speaker and Context Independent Emotion Recognition using Speech Signal (음성을 이용한 화자 및 문장독립 감정인식)

  • 강면구;김원구
    • Proceedings of the IEEK Conference
    • /
    • 2002.06d
    • /
    • pp.377-380
    • /
    • 2002
  • In this paper, speaker and context independent emotion recognition using speech signal is studied. For this purpose, a corpus of emotional speech data recorded and classified according to the emotion using the subjective evaluation were used to make statical feature vectors such as average, standard deviation and maximum value of pitch and energy and to evaluate the performance of the conventional pattern matching algorithms. The vector quantization based emotion recognition system is proposed for speaker and context independent emotion recognition. Experimental results showed that vector quantization based emotion recognizer using MFCC parameters showed better performance than that using the Pitch and energy Parameters.

  • PDF