• Title/Summary/Keyword: voice classification

Hand Gesture Recognition using Multivariate Fuzzy Decision Tree and User Adaptation (다변량 퍼지 의사결정트리와 사용자 적응을 이용한 손동작 인식)

  • Jeon, Moon-Jin;Do, Jun-Hyeong;Lee, Sang-Wan;Park, Kwang-Hyun;Bien, Zeung-Nam
    • The Journal of Korea Robotics Society / v.3 no.2 / pp.81-90 / 2008
  • As demand grows for services for the disabled and the elderly, assistive technologies have developed rapidly. Natural human signals such as voice and gesture have been applied to systems that assist the disabled and the elderly. As an example of such a human-robot interface, the Soft Remote Control System has been developed by HWRS-ERC at KAIST [1]. This is a vision-based hand gesture recognition system for controlling home appliances such as televisions, lamps, and curtains. One of the most important technologies in the system is the hand gesture recognition algorithm. The problems that most often lower the hand gesture recognition rate are inter-person variation and intra-person variation. Intra-person variation can be handled by introducing fuzzy concepts. In this paper, we propose a multivariate fuzzy decision tree (MFDT) learning and classification algorithm for hand motion recognition. To recognize the hand gestures of a new user, the most suitable recognition model among several well-trained models is selected using a model selection algorithm and incrementally adapted to the user's hand gestures. To assess the general performance of MFDT as a classifier, we report classification rates on benchmark data from the UCI repository. To assess hand gesture recognition performance, we tested on hand gesture data collected from 10 people over 15 days. The experimental results show that the classification and user adaptation performance of the proposed algorithm is better than that of a general fuzzy decision tree.

A Train Ticket Reservation Aid System Using Automated Call Routing Technology Based on Speech Recognition (음성인식을 이용한 자동 호 분류 철도 예약 시스템)

  • Shim Yu-Jin;Kim Jae-In;Koo Myung-Wan
    • MALSORI / no.52 / pp.161-169 / 2004
  • This paper describes automated call routing for a train ticket reservation aid system based on speech recognition. We focus on the task of automatically routing telephone calls based on the user's fluently spoken response, instead of touch-tone menus, in an interactive voice response system. A vector-based call routing algorithm is investigated, and a mapping table for key terms is proposed. The Korail database collected by KT is used for the call routing experiments. We run call-classification experiments on transcribed text from the Korail database. With small training data, an average call routing error reduction of 14% is observed when the mapping table is used.
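
The abstract above does not give the details of the vector-based routing algorithm or of the key-term mapping table, so the following is only a minimal sketch of the general idea under stated assumptions: each destination is represented by a bag-of-words vector summed over its training transcripts, a call goes to the destination with the highest cosine similarity, and a mapping table folds surface variants into one canonical key term. The mapping entries, destination names, and training sentences are all hypothetical and are not taken from the Korail database.

```python
import math
from collections import Counter

# Hypothetical key-term mapping table: surface variant -> canonical key term.
KEY_TERM_MAP = {"tickets": "ticket", "booking": "reserve", "book": "reserve",
                "refund": "cancel"}

# Hypothetical transcribed training utterances with their destinations.
TRAINING = [
    ("i want to reserve a ticket to busan", "reservation"),
    ("reserve two seats for tomorrow", "reservation"),
    ("i need to cancel my ticket", "cancellation"),
    ("what time does the train to seoul leave", "timetable"),
]

def tokenize(text, mapping=None):
    """Lowercase, split on whitespace, and normalize terms via the mapping table."""
    mapping = mapping or {}
    return [mapping.get(tok, tok) for tok in text.lower().split()]

def build_route_vectors(training, mapping=None):
    """Sum the term counts of all training utterances for each destination."""
    vectors = {}
    for text, route in training:
        vectors.setdefault(route, Counter()).update(tokenize(text, mapping))
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def route_call(text, vectors, mapping=None):
    """Return the destination whose vector is most similar to the utterance."""
    query = Counter(tokenize(text, mapping))
    return max(vectors, key=lambda route: cosine(query, vectors[route]))

with_map = build_route_vectors(TRAINING, KEY_TERM_MAP)
without_map = build_route_vectors(TRAINING)
print(route_call("book tickets for seoul", with_map, KEY_TERM_MAP))  # reservation
print(route_call("book tickets for seoul", without_map))             # timetable
```

In this toy setting the mapping table matters because "book" and "tickets" never occur in the small training set; once they are normalized to "reserve" and "ticket", the utterance matches the reservation vector instead of being pulled toward the timetable by the word "seoul".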

Multi-resolution DenseNet based acoustic models for reverberant speech recognition (잔향 환경 음성인식을 위한 다중 해상도 DenseNet 기반 음향 모델)

  • Park, Sunchan;Jeong, Yongwon;Kim, Hyung Soon
    • Phonetics and Speech Sciences / v.10 no.1 / pp.33-38 / 2018
  • Although deep neural network-based acoustic models have greatly improved the performance of automatic speech recognition (ASR), reverberation still degrades the performance of distant speech recognition in indoor environments. In this paper, we adopt DenseNet, which has performed strongly in image classification tasks, to improve reverberant speech recognition. DenseNet enables a deep convolutional neural network (CNN) to be trained effectively by concatenating feature maps in each convolutional layer. In addition, we extend the concept of the multi-resolution CNN to a multi-resolution DenseNet for robust speech recognition in reverberant environments. We evaluate reverberant speech recognition performance on the single-channel ASR task of the REverberant Voice Enhancement and Recognition Benchmark (REVERB) challenge 2014. According to the experimental results, the DenseNet-based acoustic models perform better than the conventional CNN-based ones, and the multi-resolution DenseNet provides a further improvement.

Collection, Analysis and Classification of Pathological Voice from ARS using Neural Network (ARS와 신경회로망을 이용한 장애음성의 수집, 분석 및 식별에 관한 연구)

  • 김광인;조철우;김대현;왕수건;전계록;안시훈;김기련;김용주
    • Proceedings of the IEEK Conference / 2000.09a / pp.955-958 / 2000
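  • This paper is part of a study to develop an automatic diagnosis system that uses voice signals to diagnose patients with vocal cord disorders and to help identify the disease; specifically, it reports experiments on collecting, analyzing, and classifying patients' voices through an ARS. Our research team has previously published results on collecting and classifying pathological voice data using CSL. However, because that earlier work used voices recorded with a digital recorder in a soundproof room, the data characteristics, such as sampling frequency, bandwidth, and noise components, differ considerably from those of voices recorded through the ARS. For this reason, we wrote a parameter analysis program better suited to voices recorded through the ARS and used it to compute the parameters. These parameters were based on Kay's MDVP, and most showed a reliability of about 80%. The collected voices were classified into two cases, normal voices and benign voices. A neural network was used as the classifier; with six parameters selected from those computed, the classification rate was about 90%.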

The Aerodynamic Comparisons between Pathologic Whispers and Phonation in Patients with Muscle Misuse Dysphonia (병리적 속삭임과 발성의 공기역학적 비교 -근오용성음성장애를 가진 동일 환자를 대상으로-)

  • Seo, Inhyo;Hwang, Youngjin;Seong, Cheoljae
    • Phonetics and Speech Sciences / v.5 no.1 / pp.55-62 / 2013
  • This study compared aerodynamic parameters of whispers and phonation in patients with muscle misuse dysphonia (MMD) to evaluate whether aerodynamic voice analysis can discriminate between whispers and phonation. Eleven patients with muscle misuse dysphonia were examined. Compared with phonation, whispers had a shorter maximum phonation time (MPT; p<.01), a lower phonatory sound pressure level (SPLp; p<.01), a higher phonatory flow rate (PFR; p<.01), lower phonatory efficiency (PE; p<.01), and lower phonatory resistance (PR; p<.05). The subglottal pressure level (Psub) was not significantly different between whispers and phonation (p>.05). The ROC analysis showed that a threshold of 23.83 ppm for PE classified whispers with perfect sensitivity (100%) and specificity (100%). These results indicate that PE reliably distinguished between whispers and phonation. The results also suggest that PE may be a useful tool for studying the laryngeal source.

A Comparison of Effective Feature Vectors for Speech Emotion Recognition (음성신호기반의 감정인식의 특징 벡터 비교)

  • Shin, Bo-Ra;Lee, Soek-Pil
    • The Transactions of The Korean Institute of Electrical Engineers / v.67 no.10 / pp.1364-1369 / 2018
  • Speech emotion recognition (SER), which aims to classify a speaker's emotional state from speech signals, is one of the essential tasks for making human-machine interaction (HMI) more natural and realistic. Vocal expression is one of the main information channels in interpersonal communication. However, existing speech emotion recognition technology has not achieved satisfactory performance, probably because of a lack of effective emotion-related features. This paper surveys various features used for speech emotion recognition and discusses which features, or which combinations of features, are valuable and meaningful for emotion classification. Its main aim is to discuss and compare approaches to feature extraction and to propose a basis for extracting useful features in order to improve SER performance.

Voice Classification Algorithm for Sasang Constitution (음성을 이용한 사상체질 분류 보조 알고리즘)

  • Kang, Jae-Hwan;Lee, Hae-Jung
    • Proceedings of the KIEE Conference / 2009.07a / pp.1982-1983 / 2009
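  • In this study, to move beyond the conventional parametric statistical approach to specific voice variables and to develop a new voice-based algorithm for Sasang constitution classification, we first collected voice data, consisting of five vowels and two sentences, from a total of 120 women. We then extracted a total of 134 voice variables using various speech signal analysis methods and tools. For each variable, we created four new condition variables from the minimum of the per-constitution maxima and the maximum of the per-constitution minima, and introduced a memory for managing them together with the concept of a constitution score, yielding a classification algorithm based on nonparametric statistics. To test the algorithm, we ran 10-fold cross-validation; in binary classification it achieved a diagnosis rate of 41.5% and an accuracy of 79.5%.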

Quantitative Measure of Speaker Specific Information in Human Voice: From the Perspective of Information Theoretic Approach (정보이론 관점에서 음성 신호의 화자 특징 정보를 정량적으로 측정하는 방법에 관한 연구)

  • Kim Samuel;Seo Jung Tae;Kang Hong Goo
    • The Journal of the Acoustical Society of Korea / v.24 no.1E / pp.16-20 / 2005
  • A novel scheme for measuring the speaker information in a speech signal is proposed. We develop a theory of quantitative measurement of speaker characteristics from an information-theoretic point of view and connect it to the classification error rate. Features based on homomorphic analysis, such as mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), and linear frequency cepstral coefficients (LFCC), are studied, and the speaker-specific information contained in each feature set is measured by computing mutual information. The theory and experimental results provide a quantitative measure of speaker information in speech signals.
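
As a rough illustration of measuring speaker information with mutual information, the sketch below estimates I(S; X) in bits from paired observations of a speaker label S and a discrete feature symbol X. It assumes the continuous cepstral features have already been quantized into symbols (for example, codebook indices); that quantization step and the plug-in estimator are assumptions made here, not the estimator described in the paper.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Plug-in estimate of I(S; X) in bits from (speaker, symbol) pairs."""
    n = len(pairs)
    joint = Counter(pairs)                    # counts of (speaker, symbol)
    speakers = Counter(s for s, _ in pairs)   # marginal counts of speakers
    symbols = Counter(x for _, x in pairs)    # marginal counts of symbols
    mi = 0.0
    for (s, x), c in joint.items():
        # p(s,x) * log2( p(s,x) / (p(s) * p(x)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (speakers[s] * symbols[x]))
    return mi

# Hypothetical toy data: the symbol identifies the speaker perfectly ...
informative = [("spk_a", 0), ("spk_a", 0), ("spk_b", 1), ("spk_b", 1)]
# ... or is statistically independent of the speaker.
uninformative = [("spk_a", 0), ("spk_a", 1), ("spk_b", 0), ("spk_b", 1)]

print(mutual_information(informative))    # 1.0
print(mutual_information(uninformative))  # 0.0
```

With two equally likely speakers, one bit is the most a feature can carry about speaker identity, so the two toy cases mark the ends of the scale on which different feature sets could be compared.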

A Simple Speech/Non-speech Classifier Using Adaptive Boosting

  • Kwon, Oh-Wook;Lee, Te-Won
    • The Journal of the Acoustical Society of Korea / v.22 no.3E / pp.124-132 / 2003
  • We propose a new method for speech/non-speech classification based on the adaptive boosting (AdaBoost) algorithm in order to detect speech for robust speech recognition. The method combines simple base classifiers through the AdaBoost algorithm and uses a set of optimized speech features combined with spectral subtraction. Its key benefits are simple implementation, low computational complexity, and avoidance of over-fitting. We checked the validity of the method by comparing its performance with that of the speech/non-speech classifier used in a standard voice activity detector. For speech recognition purposes, additional performance improvements were achieved by adopting new features, including speech band energies and MFCC-based spectral distortion. At the same false alarm rate, the method reduced miss errors by 20-50%.
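
The idea of combining simple base classifiers through AdaBoost can be sketched as follows. This is a generic discrete AdaBoost with single-feature threshold rules (decision stumps) as the base classifiers; the choice of stumps, the two frame features (normalized energy and zero-crossing rate), and the toy frames are all assumptions made for illustration and are not the features or classifiers used in the paper.

```python
import math

def train_stump(X, y, w):
    """Find the single-feature threshold rule with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                preds = [sign if x[j] >= thr else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, x):
    """Apply a stump (err, feature index, threshold, sign) to one frame."""
    _, j, thr, sign = stump
    return sign if x[j] >= thr else -sign

def adaboost(X, y, rounds=3):
    """Train a weighted ensemble of stumps; labels are +1 (speech) or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified frames, lower the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def classify(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical frames: (normalized energy, zero-crossing rate).
X = [(0.9, 0.2), (0.8, 0.3), (0.7, 0.1),   # speech
     (0.2, 0.2), (0.1, 0.3),               # silence
     (0.85, 0.9), (0.3, 0.8)]              # noise with a high zero-crossing rate
y = [1, 1, 1, -1, -1, -1, -1]

ensemble = adaboost(X, y, rounds=3)
print([classify(ensemble, x) for x in X])  # [1, 1, 1, -1, -1, -1, -1]
```

On this toy data the first stump alone (an energy threshold) labels the loud noise frame `(0.85, 0.9)` as speech; after three rounds the weighted vote classifies every training frame correctly, which is the behavior boosting is meant to provide.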

A Study on Text Choice for Web-Based Speaker Verification System (웹 기반의 화자확인시스템을 위한 문장선정에 관한 연구)

  • 안기모;이재희;강철호
    • The Journal of the Acoustical Society of Korea / v.19 no.6 / pp.34-40 / 2000
  • In a text-dependent speaker verification system, the choice of text for the speaker to utter is a very important factor in improving performance. This paper proposes building a consonant mixture system using a classification of Korean phonetic values. When applied to a web-based speaker verification system, it can cope with abrupt changes in the speaker's voice information and achieve optimal speaker verification performance.
