• Title/Summary/Keyword: Vocal Detection


A Study on A Multi-Pulse Linear Predictive Filtering And Likelihood Ratio Test with Adaptive Threshold (멀티 펄스에 의한 선형 예측 필터링과 적응 임계값을 갖는 LRT의 연구)

  • Lee, Ki-Yong; Lee, Joo-Hun; Song, Iick-Ho; Ann, Sou-Guil
    • The Journal of the Acoustical Society of Korea, v.10 no.1, pp.20-29, 1991
  • A fundamental assumption in conventional linear predictive coding (LPC) analysis is that the input to the all-pole vocal tract filter is a white process. For periodic inputs, however, a pitch bias error is introduced into the conventional LP coefficients. Multi-pulse (MP) LP analysis can reduce this bias, provided that an estimate of the excitation is available. Since the prediction error of conventional LP analysis can be modeled as the sum of an MP excitation sequence and a random noise sequence, extracting the MP sequence from the prediction error can be viewed as a classical detection and estimation problem. In this paper, we propose an algorithm in which the locations and amplitudes of the MP sequence are first obtained by applying a likelihood ratio test (LRT) to the prediction error, and LP coefficients free of pitch bias are then obtained from the MP sequence. To improve performance further, we iterate this procedure with an adaptive threshold at each step.

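The detect-then-re-estimate loop described in the first abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' algorithm: the LP fit uses least squares (the covariance method); the LRT is replaced by a generalized LRT for a single pulse of unknown amplitude in white Gaussian noise, which reduces to the test |e[n]| > γσ with amplitude estimate e[n]; and the "adaptive" threshold simply re-estimates σ from the median absolute deviation of the current residual on every pass. The function names and the defaults for `gamma` and `n_iter` are hypothetical.

```python
import numpy as np


def _lagged(s, order):
    """Regression matrix whose row for sample n holds s[n-1], ..., s[n-order]."""
    return np.column_stack([s[order - k : len(s) - k] for k in range(1, order + 1)])


def lp_fit(s, order, excitation=None):
    """Least-squares LP fit of s[n] - u[n] ~ sum_k a[k] s[n-k].

    With excitation=None this is conventional (covariance-method) LP analysis;
    passing a multi-pulse excitation u removes it from the regression target.
    """
    s = np.asarray(s, dtype=float)
    u = np.zeros_like(s) if excitation is None else np.asarray(excitation, dtype=float)
    a, *_ = np.linalg.lstsq(_lagged(s, order), s[order:] - u[order:], rcond=None)
    return a


def prediction_error(s, a):
    """Residual e[n] = s[n] - sum_k a[k] s[n-k]; the first len(a) samples are left at 0."""
    s = np.asarray(s, dtype=float)
    e = np.zeros_like(s)
    e[len(a):] = s[len(a):] - _lagged(s, len(a)) @ a
    return e


def detect_pulses(e, gamma):
    """GLRT-style test for a pulse of unknown amplitude in white Gaussian noise.

    The noise standard deviation is re-estimated from the residual itself
    (median absolute deviation), so the threshold gamma * sigma adapts to the data.
    Detected pulses keep their ML amplitude estimate e[n]; all other samples are 0.
    """
    sigma = 1.4826 * np.median(np.abs(e - np.median(e)))
    return np.where(np.abs(e) > gamma * sigma, e, 0.0)


def mp_lp(s, order, gamma=10.0, n_iter=3):
    """Alternate pulse detection and LP re-estimation, starting from conventional LP."""
    a = lp_fit(s, order)
    u = np.zeros(len(s))
    for _ in range(n_iter):
        u = detect_pulses(prediction_error(s, a), gamma)
        a = lp_fit(s, order, excitation=u)
    return a, u
```

Calling `mp_lp(s, order)` on a voiced frame `s` returns the re-estimated coefficients and the detected pulse train. A large `gamma` keeps false alarms rare at the cost of missing weak pulses.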

A New EGG System Design and Speech Analysis for Quantitative Analysis of Human Glottal Vibration Patterns (성문진동 패턴의 정량적인 해석을 위한 새로운 시스템 설계와 음성분석)

  • 김종찬; 이재천; 김덕원; 오명환; 윤대희; 차일환
    • Journal of Biomedical Engineering Research, v.20 no.4, pp.427-433, 1999
  • The purpose of this study is to develop an improved pitch extraction method that can be used in a variety of speech applications, such as high-quality compression, vocoding, and speech recognition and synthesis. To this end, we develop a new electroglottograph (EGG) measurement system that detects the EGG signals with four spot electrodes of the modulation-demodulation type. The glottal closure instant (GCI) is then determined from the EGG signals in real time, and the pitch contour is obtained from the GCI information. The new pitch contour algorithm (PCA) proves more reliable than the conventional algorithm based on speech alone. In addition, we study speech source models and glottal vibratory patterns of Korean speakers by measuring and analyzing the diverse vibration patterns of the vocal folds from the EGG signals.

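Obtaining a pitch contour from GCIs, as in the second abstract, can be illustrated with a common EGG heuristic: closures are taken as sharp positive peaks of the differentiated EGG (DEGG), and the instantaneous F0 is the reciprocal of the interval between consecutive closures. This sketch assumes that the EGG increases with vocal-fold contact and uses a simple relative threshold; it is not the real-time algorithm from the paper, and `detect_gci`, `pitch_contour`, `rel_threshold`, and `min_gap` are hypothetical names.

```python
import numpy as np


def detect_gci(egg, rel_threshold=0.3, min_gap=1):
    """Estimate glottal closure instants (GCIs) as peaks of the differentiated EGG.

    Assumes the convention that the EGG rises as vocal-fold contact increases,
    so each closure appears as a sharp positive peak in the derivative (DEGG).
    """
    d = np.diff(np.asarray(egg, dtype=float))  # d[i] = egg[i+1] - egg[i]
    thr = rel_threshold * d.max()
    mid = d[1:-1]
    # Interior local maxima of the DEGG that exceed the relative threshold
    peaks = np.flatnonzero((mid > thr) & (mid >= d[:-2]) & (mid > d[2:])) + 1
    gci = []
    for p in peaks + 1:  # report the sample just after the jump
        if not gci or p - gci[-1] >= min_gap:  # refractory gap suppresses duplicates
            gci.append(int(p))
    return np.array(gci, dtype=int)


def pitch_contour(gci, fs):
    """Instantaneous F0 (Hz) from successive GCIs, time-stamped at the later GCI (s)."""
    gci = np.asarray(gci)
    return gci[1:] / fs, fs / np.diff(gci)
```

`min_gap` (in samples) should be set below the shortest expected pitch period; for example, 40 samples at 8 kHz rejects any closure within 5 ms of the previous one.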

Toward an integrated model of emotion recognition methods based on reviews of previous work (정서 재인 방법 고찰을 통한 통합적 모델 모색에 관한 연구)

  • Park, Mi-Sook; Park, Ji-Eun; Sohn, Jin-Hun
    • Science of Emotion and Sensibility, v.14 no.1, pp.101-116, 2011
  • Current research on emotion detection classifies emotions using information from facial, vocal, and bodily expressions or from physiological responses. This study reviews three representative emotion recognition methods, each grounded in a psychological theory of emotion. First, we reviewed the literature on emotion recognition methods based on facial expressions, which are supported by Darwin's theory. Second, we reviewed methods based on physiological changes, which rely on James' theory. Last, we reviewed multimodal emotion recognition (i.e., combining signals from the face, dialogue, posture, or peripheral nervous system), which is supported by both Darwin's and James' theories. For each approach, we examined both the research findings and the theoretical background on which the method relies. The review argues for an integrated model of emotion recognition methods to advance the field. The integrated model suggests that emotion recognition methods should incorporate additional physiological signals, such as brain responses or facial temperature. It further proposes that emotion recognition methods be based on a multidimensional model and take into account cognitive appraisal factors during emotional experience.


Automatic detection and severity prediction of chronic kidney disease using machine learning classifiers (머신러닝 분류기를 사용한 만성콩팥병 자동 진단 및 중증도 예측 연구)

  • Jihyun Mun; Sunhee Kim; Myeong Ju Kim; Jiwon Ryu; Sejoong Kim; Minhwa Chung
    • Phonetics and Speech Sciences, v.14 no.4, pp.45-56, 2022
  • This paper proposes an optimal methodology for automatically diagnosing chronic kidney disease (CKD) and predicting its severity from patients' utterances. In patients with CKD, the voice changes because of weakened respiratory and laryngeal muscles and vocal fold edema. Previous studies have analyzed the voices of patients with CKD phonetically, but none has attempted to classify them. In this paper, the utterances of patients with CKD were classified using a variety of utterance types (sustained vowel, sentence, general sentence), feature sets [handcrafted features, the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS), CNN-extracted features], and classifiers (SVM, XGBoost). A total of 1,523 utterances (3 hours, 26 minutes, and 25 seconds of speech) were used. F1-scores of 0.93 for automatic diagnosis, 0.89 for the 3-class problem, and 0.84 for the 5-class problem were achieved. The highest performance was obtained with the combination of general sentence utterances, the handcrafted feature set, and XGBoost. The results suggest that general sentence utterances, which can reflect all speakers' speech characteristics, together with an appropriate feature set extracted from them, are well suited to the automatic classification of CKD patients' utterances.
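The last abstract reports F1-scores for a binary task and for 3- and 5-class severity tasks, but it does not say how per-class scores are averaged in the multi-class cases. The sketch below assumes macro averaging (the unweighted mean of per-class F1) and is only meant to make the metric concrete; `per_class_f1` and `macro_f1` are hypothetical NumPy helpers, not code from the paper.

```python
import numpy as np


def per_class_f1(y_true, y_pred, labels):
    """F1 = 2PR / (P + R) for each label, treating that label as 'positive'."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        # 2TP / (2TP + FP + FN) equals 2PR / (P + R) and avoids a 0/0 precision
        # when a class is never predicted
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return np.array(scores, dtype=float)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = np.union1d(y_true, y_pred)
    return float(per_class_f1(y_true, y_pred, labels).mean())
```

Macro averaging weights every severity class equally, so a rare class that is classified poorly pulls the score down as much as a common one would.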