• Title/Summary/Keyword: voice extract

Search Results: 69

Speaker Separation Based on Directional Filter and Harmonic Filter (Directional Filter와 Harmonic Filter 기반 화자 분리)

  • Baek, Seung-Eun;Kim, Jin-Young;Na, Seung-You;Choi, Seung-Ho
    • Speech Sciences
    • /
    • v.12 no.3
    • /
    • pp.125-136
    • /
    • 2005
  • Automatic speech recognition is much more difficult in the real world. Recognition performance degrades with the SIR (Signal to Interference Ratio) when environmental noise and multiple speakers are present. Extracting the main speaker's voice from binaural sound is therefore an important topic in speech signal processing. In this paper, we use a directional filter and a harmonic filter, among existing methods, to extract the main speaker's information from binaural sound. The main speaker's voice is extracted with the directional filter, and the remaining speakers' information is removed with a harmonic filter driven by the main speaker's detected pitch. As a result, the main speaker's voice is enhanced.

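The abstract names the technique but gives no implementation details. As an illustration only, the sketch below shows one common way a pitch-driven harmonic (comb-style) filter can be realized with librosa, keeping spectrogram bins near harmonics of the detected pitch; all parameter values and function choices are assumptions, not the authors' method.

```python
# Illustrative sketch only: a pitch-driven harmonic mask, one possible reading of
# "harmonic filter through main speaker's pitch detection". Not the authors' code.
import numpy as np
import librosa

def harmonic_mask(y, sr, n_fft=1024, hop=256, width_hz=40.0):
    """Keep spectrogram bins near harmonics of the estimated main-speaker pitch."""
    # Frame-wise pitch of the (assumed dominant) main speaker.
    f0, voiced, _ = librosa.pyin(y, fmin=80, fmax=400, sr=sr,
                                 frame_length=n_fft, hop_length=hop)
    S = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    mask = np.zeros_like(S, dtype=float)
    for t in range(min(S.shape[1], len(f0))):
        if not voiced[t] or np.isnan(f0[t]):
            continue
        harmonics = np.arange(1, int(freqs[-1] // f0[t]) + 1) * f0[t]
        # Pass bins within +-width_hz of any harmonic of the detected pitch.
        dist = np.abs(freqs[:, None] - harmonics[None, :]).min(axis=1)
        mask[:, t] = (dist < width_hz).astype(float)
    return librosa.istft(S * mask, hop_length=hop, length=len(y))
```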

Diagnosing Vocal Disorders using Cobweb Clustering of the Jitter, Shimmer, and Harmonics-to-Noise Ratio

  • Lee, Keonsoo;Moon, Chanki;Nam, Yunyoung
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.11
    • /
    • pp.5541-5554
    • /
    • 2018
  • The voice is one of the most significant non-verbal elements of communication. Disorders of the vocal organs, or habitual muscular settings for articulation, cause vocal disorders; by analyzing these disorders, it is possible to predict vocal diseases. In this paper, a method for predicting vocal disorders using the jitter, shimmer, and harmonics-to-noise ratio (HNR) extracted from voice recordings is proposed. To extract jitter, shimmer, and HNR, one-second voice signals are recorded at 44.1 kHz. In the experiment, 151 voice recordings were collected and clustered with the Cobweb clustering method, yielding 21 classes with 12 leaves. Given the semantics of jitter, shimmer, and HNR, the class whose centroid has the lowest jitter and shimmer and the highest HNR is taken as the normal vocal group. The risk of vocal disorders can then be predicted by measuring the distance and direction between centroids.
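
As an illustration of how these three measures are commonly obtained (not the authors' pipeline), the sketch below uses the parselmouth wrapper around Praat; the threshold values are Praat-style defaults and are assumptions.

```python
# Illustrative sketch: extracting jitter, shimmer, and HNR with parselmouth (a Praat
# wrapper), one way to obtain the three features used in the paper. The thresholds are
# Praat-style defaults and are assumptions, not the authors' settings.
import parselmouth
from parselmouth.praat import call

def voice_features(wav_path, f0_min=75, f0_max=500):
    sound = parselmouth.Sound(wav_path)                      # e.g. a 1-second 44.1 kHz clip
    points = call(sound, "To PointProcess (periodic, cc)", f0_min, f0_max)
    jitter = call(points, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
    shimmer = call([sound, points], "Get shimmer (local)",
                   0, 0, 0.0001, 0.02, 1.3, 1.6)
    harmonicity = call(sound, "To Harmonicity (cc)", 0.01, f0_min, 0.1, 1.0)
    hnr = call(harmonicity, "Get mean", 0, 0)                # in dB
    return jitter, shimmer, hnr
```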

Chest Girth Prediction Method Using Voice Signals Analysis Technology : Focusing on Men in the 20's (음성신호 분석 기술을 이용한 흉위 예측 기법 : 20대 남성을 대상으로)

  • Kim, Bong-Hyun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.16 no.9
    • /
    • pp.2031-2036
    • /
    • 2012
  • Body type is a classification of physique based on the apparent shape of the human body. Chest girth and body type have been examined statistically for correlations with disposition, character, and other personal traits. In this paper, we study the prediction of chest girth from the voice, drawing on the interrelationship between voice and such personal characteristics. To this end, we grouped subjects by chest girth, measured the intensity and spectrum of their laughter, and derived experimental results for predicting chest girth by comparing the groups.
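
A minimal sketch of the kind of intensity and spectral measurements described, using librosa; the specific summaries chosen here (mean RMS level in dB, spectral centroid) are assumptions, not the paper's exact features.

```python
# Illustrative sketch: measuring intensity and a spectral summary of a laughter clip,
# one possible reading of the features compared across chest-girth groups.
import numpy as np
import librosa

def laughter_features(wav_path):
    y, sr = librosa.load(wav_path, sr=None)
    rms = librosa.feature.rms(y=y)[0]                     # frame-wise intensity
    intensity_db = librosa.amplitude_to_db(rms).mean()    # mean intensity in dB
    S = np.abs(librosa.stft(y))
    centroid = librosa.feature.spectral_centroid(S=S, sr=sr).mean()  # spectral summary
    return intensity_db, centroid
```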

A Proposal of Sasang Constitution Classification in Middle-aged Women Using Image and Voice Signals Process (영상 및 음성 신호 처리를 이용한 장년기 여성의 사상체질 분류 방법의 제안)

  • Lee, Se-Hwan;Kim, Bong-Hyun;Ka, Min-Kyoung;Cho, Dong-Uk;Kwak, Ji-Hyun;Oh, Sang-Young;J.Bae, Young-Lae
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.9 no.5
    • /
    • pp.1210-1217
    • /
    • 2008
  • Sasang medicine is Korea's unique traditional medicine, based on the classification of individual physical constitutions. In Sasang medicine, the most important task is to categorize the Sasang constitution exactly, so securing objective elements and diagnostic indices remains an unresolved problem in constitution classification. To this end, this paper derives results toward the objectification, visualization, and quantification of the Sasang constitution by analyzing facial image signals and voice signals, and aims to develop a constitution classification system by comparing the differences between constitutions. In particular, because image and voice signals differ with gender, age, and region, the study group was composed of women in their 40s and 50s living in Seoul. Image and voice features extracted from this group were compared and analyzed across constitutions, and the significance of the results was verified through experiments.

Application of Vocal Properties and Vocal Independent Features to Classifying Sasang Constitution (음성 특성 및 음성 독립 변수의 사상체질 분류로의 적용 방법)

  • Kim, Keun-Ho;Kang, Nam-Sik;Ku, Bon-Cho;Kim, Jong-Yeol
    • Journal of Sasang Constitutional Medicine
    • /
    • v.23 no.4
    • /
    • pp.458-470
    • /
    • 2011
  • 1. Objectives: Vocal characteristics are commonly considered an important factor in determining the Sasang constitution and health condition. We sought a classification procedure that distinguishes the constitution objectively and quantitatively by analyzing the characteristics of subjects' voices while excluding noise and error. 2. Methods: We extract vocal features from voice segments selected with prior information, remove outliers, minimize correlated features, correct the features by normalizing for gender and age, and build discriminant functions adapted to gender and age to improve diagnostic accuracy. 3. Results and Conclusions: The discriminant functions achieved about 45% accuracy in classifying the constitution for every age interval and gender, which is a meaningful result given that only the voice was used.
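
One way the described pipeline of normalization plus gender/age-adaptive discriminant functions could look, sketched with scikit-learn's LDA; the column names and grouping scheme are assumptions, not the paper's actual procedure.

```python
# Illustrative sketch: one LDA per (gender, age-band) group on normalized vocal features,
# a possible reading of "discriminant functions adaptive to gender and age".
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fit_group_discriminants(df, feature_cols, label_col="constitution"):
    """Fit one discriminant model per (gender, age_band) group on z-normalized features."""
    models = {}
    for (gender, age_band), group in df.groupby(["gender", "age_band"]):
        X = (group[feature_cols] - group[feature_cols].mean()) / group[feature_cols].std()
        models[(gender, age_band)] = LinearDiscriminantAnalysis().fit(X, group[label_col])
    return models
```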

Evaluation of Mental Fatigue Using Vowel Formant Analysis (모음 포먼트 분석을 통한 정신적 피로 평가)

  • Ha, Wook Hyun;Park, Sung Ha
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.37 no.1
    • /
    • pp.26-32
    • /
    • 2014
  • Mental fatigue is inevitable in the workplace. Since mental fatigue can lead to decreased efficiency and critical accidents, it is important to manage it from the viewpoint of accident prevention. An experiment was performed to evaluate mental fatigue using formant frequency analysis of human voices. The experimental task was to mentally add or subtract two one-digit numbers. After completing tasks at four different levels of mental fatigue, subjects were asked to read Korean vowels and their voices were recorded. The five vowel sounds "아", "어", "오", "우", and "이" were then used to extract the first formant (F1) frequency. Separate ANOVAs showed significant main effects of mental fatigue on the F1 frequencies of all five vowels. However, post-hoc comparisons revealed that the F1 frequencies of "아" and "어" were most sensitive to the fatigue levels employed in this experiment, decreasing significantly as mental fatigue accumulated. The formant frequency extracted from the human voice is therefore potentially applicable to detecting mental fatigue induced during industrial tasks.
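
A minimal sketch of F1 extraction from a recorded vowel using parselmouth's Burg formant analysis; averaging over the middle of the vowel is an assumption, not the paper's procedure.

```python
# Illustrative sketch: estimating the first formant (F1) of a recorded vowel with
# parselmouth's Burg method. Sampling the middle 60% of the vowel is an assumption.
import numpy as np
import parselmouth

def mean_f1(wav_path, max_formant_hz=5000):
    sound = parselmouth.Sound(wav_path)
    formants = sound.to_formant_burg(maximum_formant=max_formant_hz)
    # Sample F1 across the middle of the vowel to avoid onset/offset effects.
    t0, t1 = 0.2 * sound.duration, 0.8 * sound.duration
    times = np.linspace(t0, t1, 50)
    f1 = [formants.get_value_at_time(1, t) for t in times]
    return float(np.nanmean(f1))
```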

Diagnosis of Parkinson's disease based on audio voice using wav2vec (Wav2vec을 이용한 오디오 음성 기반의 파킨슨병 진단)

  • Yoon, Hee-Jin
    • Journal of Digital Convergence
    • /
    • v.19 no.12
    • /
    • pp.353-358
    • /
    • 2021
  • Parkinson's disease is the second most common degenerative brain disease of old age after Alzheimer's. Its symptoms, such as hand tremor, slowed movement, and declining cognitive function, reduce the quality of daily life. The progression of Parkinson's disease can be slowed through early diagnosis. For early diagnosis, an algorithm was implemented that extracts features using wav2vec and detects the presence or absence of Parkinson's disease with a deep learning model (ANN). In the experiment, the accuracy was 97.47%, better than the results of diagnosing Parkinson's disease with existing neural networks. Working directly from audio files simplified the experimental process and yielded improved results.
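
A hedged sketch of a wav2vec-plus-ANN pipeline like the one described above, using the Hugging Face transformers library; the checkpoint, mean pooling, and network size are assumptions, not the paper's configuration.

```python
# Illustrative sketch: extracting wav2vec 2.0 embeddings and classifying with a small ANN.
# The checkpoint name, pooling, and layer sizes are assumptions, not the paper's setup.
import torch
import torch.nn as nn
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

def embed(waveform_16k):
    """Mean-pool the wav2vec hidden states into a single utterance vector."""
    inputs = extractor(waveform_16k, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state      # (1, frames, 768)
    return hidden.mean(dim=1)                             # (1, 768)

# A small ANN head for binary Parkinson's / non-Parkinson's classification.
classifier = nn.Sequential(nn.Linear(768, 128), nn.ReLU(), nn.Linear(128, 2))
```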

Discriminative Feature Vector Selection for Emotion Classification Based on Speech (음성신호기반의 감정분석을 위한 특징벡터 선택)

  • Choi, Ha-Na;Byun, Sung-Woo;Lee, Seok-Pil
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.64 no.9
    • /
    • pp.1363-1368
    • /
    • 2015
  • Recently, computers have become smaller thanks to advances in computing technology, and many wearable devices have appeared. Machine recognition of human emotion has therefore become important, and research on analyzing emotional states is increasing. The human voice carries much information about emotion. This paper proposes a discriminative feature vector selection method for speech-based emotion classification. We extract feature vectors such as pitch, MFCC, LPC, and LPCC from voice signals divided into four emotion classes (happy, normal, sad, angry) and compare the separability of the extracted feature vectors using the Bhattacharyya distance. More effective feature vectors are then recommended for emotion classification.
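
A small sketch of the Bhattacharyya distance between two emotion classes modeled as multivariate Gaussians, which is a standard way to compare feature separability; it is an illustration, not the authors' code.

```python
# Illustrative sketch: Bhattacharyya distance between two classes of feature vectors,
# modeling each class as a multivariate Gaussian.
import numpy as np

def bhattacharyya_distance(X1, X2):
    """X1, X2: (n_samples, n_features) feature matrices for two emotion classes."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    c1, c2 = np.cov(X1, rowvar=False), np.cov(X2, rowvar=False)
    c = (c1 + c2) / 2
    diff = m1 - m2
    term1 = 0.125 * diff @ np.linalg.solve(c, diff)
    # Log-determinant term, computed with slogdet for numerical stability.
    logdet = lambda a: np.linalg.slogdet(a)[1]
    term2 = 0.5 * (logdet(c) - 0.5 * (logdet(c1) + logdet(c2)))
    return term1 + term2
```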

Personal Information Extraction Using A Microphone Array (마이크로폰어레이를 이용한 사용자 정보추출)

  • Kim, Hye-Jin;Yoon, Ho-Sub
    • The Journal of Korea Robotics Society
    • /
    • v.3 no.2
    • /
    • pp.131-136
    • /
    • 2008
  • This paper proposes a method for extracting personal information using a microphone array. Useful personal information, particularly about customers, includes age and gender. On the basis of this information, robot service applications can satisfy users by offering services adapted to the needs of specific user groups, including adults and children as well as females and males. We applied a Gaussian Mixture Model (GMM) as the classifier and Mel Frequency Cepstral Coefficients (MFCCs) as the voice feature. The major aim of this paper is to discover voice source parameters related to age and gender and to classify these two characteristics simultaneously. In ubiquitous environments, voices obtained from selected channels of a microphone array help reduce background noise.

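A minimal sketch of a GMM-over-MFCC classifier of the kind described above, using librosa and scikit-learn; the class labels and mixture size are assumptions, not the authors' configuration.

```python
# Illustrative sketch: MFCC frames scored against per-class GMMs for age/gender grouping.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def train_gmms(class_to_waves, sr=16000, n_components=16):
    """Fit one GMM per class (e.g. 'adult_male', 'child_female') on pooled MFCC frames."""
    gmms = {}
    for label, waves in class_to_waves.items():
        frames = np.vstack([librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T for y in waves])
        gmms[label] = GaussianMixture(n_components=n_components).fit(frames)
    return gmms

def classify(y, gmms, sr=16000):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T
    # Pick the class whose GMM gives the highest mean log-likelihood.
    return max(gmms, key=lambda label: gmms[label].score(mfcc))
```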

Voice Coding Using Mouth Shape Features (입술형태 특성을 이용한 음성코딩)

  • Jang, Jong-Hwan
    • The Journal of Engineering Research
    • /
    • v.1 no.1
    • /
    • pp.65-70
    • /
    • 1997
  • To transmit voice signals degraded by various surrounding acoustic noises, we extract the lip region from the face and compare lip edge features with a prestored database of features such as mouth height, width, area, and ratio. The approach provides high security and is not affected by acoustic noise because the actual utterance does not need to be transmitted.

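A minimal sketch of the mouth-shape features named in the abstract (height, width, area, ratio), computed from lip contour points; the landmark source is left abstract and "ratio" is interpreted as height over width, both of which are assumptions.

```python
# Illustrative sketch: mouth-shape features from an ordered outer-lip contour,
# one reading of the features matched against the prestored database.
import numpy as np

def mouth_features(lip_points):
    """lip_points: (N, 2) array of (x, y) coordinates along the outer lip contour."""
    pts = np.asarray(lip_points, dtype=float)
    width = pts[:, 0].max() - pts[:, 0].min()
    height = pts[:, 1].max() - pts[:, 1].min()
    # Polygon area via the shoelace formula over the ordered contour.
    x, y = pts[:, 0], pts[:, 1]
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    ratio = height / width if width else 0.0
    return {"height": height, "width": width, "area": area, "ratio": ratio}
```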