• Title/Summary/Keyword: Vocal Detection

Vocal Separation Using Selective Frequency Subtraction Considering with Energies and Phases (에너지와 위상을 고려한 선택적 주파수 차감법을 이용한 보컬 분리)

  • Kim, Hyuntae;Park, Jangsik
    • Journal of Broadcast Engineering, v.20 no.3, pp.408-413, 2015
  • With growing interest in original-sound karaoke machines, manufacturers of MIDI-based karaoke systems are looking for cheaper alternatives to the original recording method. One such approach is to produce the original-sound accompaniment by removing only the singer's voice from the singer's music album. This paper proposes a system that separates vocal components from the music accompaniment in stereo recordings. The proposed system consists of two stages. The first stage is vocal detection, which classifies the input into vocal and non-vocal portions using an SVM with MFCC features. In the second stage, selective frequency subtraction is performed at each frequency bin in the vocal portions. The decision for each bin takes into account not only its energy but also its phase in each channel signal. A listening test on music with the vocals removed by the proposed system showed a relatively high level of satisfaction.
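
The second stage described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the actual decision rule, so everything here is an assumption made for illustration: the function name `selective_subtract`, the thresholds `energy_ratio_max` and `phase_diff_max`, and the idea of treating a bin as vocal when both channels have similar energy and similar phase (as a center-panned voice would). It is not the authors' method.

```python
import cmath
import math


def selective_subtract(left, right, is_vocal_frame=True,
                       energy_ratio_max=1.5, phase_diff_max=0.2):
    """Attenuate bins that look center-panned in one stereo spectrum frame.

    left, right: sequences of complex spectrum values, one per frequency bin.
    is_vocal_frame: output of a separate vocal detector (stage one).
    The two thresholds are hypothetical tuning parameters.
    Returns a (left, right) pair of new lists.
    """
    if not is_vocal_frame:
        # Non-vocal frames pass through untouched.
        return list(left), list(right)

    out_left, out_right = [], []
    for l, r in zip(left, right):
        l, r = complex(l), complex(r)
        e_l, e_r = abs(l) ** 2, abs(r) ** 2
        low, high = min(e_l, e_r), max(e_l, e_r)

        # Energy criterion: both channels carry comparable energy.
        similar_energy = low > 0 and high / low <= energy_ratio_max
        # Phase criterion: wrapped inter-channel phase difference is small.
        phase_diff = cmath.phase(l * r.conjugate()) if low > 0 else math.pi
        similar_phase = abs(phase_diff) <= phase_diff_max

        if similar_energy and similar_phase:
            # Subtract the component common to both channels.
            common = (l + r) / 2
            out_left.append(l - common)
            out_right.append(r - common)
        else:
            out_left.append(l)
            out_right.append(r)
    return out_left, out_right
```

In this sketch a bin that is identical in both channels is removed entirely, while bins that differ strongly in energy or phase (typically side-panned instruments) are left unchanged.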

Electroglottographic Measurements of Glottal Function in Voice according to Gender and Age

  • Ko, Do-Heung
    • Phonetics and Speech Sciences, v.3 no.1, pp.97-102, 2011
  • Electroglottography (EGG) is a common method for non-invasive measurement of glottal activity. EGG has been used in vocal pathology as a clinical and research tool to measure vocal fold contact. This paper presents pitch, jitter, and closed quotient (CQ) measurements from the electroglottographic signals of young (mean = 22.7 years) and elderly (mean = 74.3 years) male and female subjects. The sustained corner vowels /i/, /a/, and /u/ were measured at around 70 dB SPL, since phonation intensity is the most notable of the EGG variables and showed a positive correlation with the closed phase. The aim of this paper was to measure EGG data according to age and gender. In CQ, there was a significant difference between young and elderly female subjects, while there was no significant difference between young and elderly male subjects. The mean value for young males was higher than that for elderly males, while the mean value for young females was lower than that for elderly females. Thus, in mean values, higher CQ was associated with younger age for males and with older age for females. Although laryngeal degeneration due to increased age seems to occur to a lesser extent in females, the significant increase of CQ in elderly female voices could not be explained in terms of age-related physiological changes. In the standard deviation of pitch and in jitter, the mean values for young and elderly males were higher than those for young and elderly females. That is, male subjects showed higher mean values of these voice variables than female subjects. This result could be considered a sign of vocal instability in males. The authors suggest that these results may provide insights into the control and regulation of normal phonation and into the detection and characterization of pathology.
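
For readers unfamiliar with the two measures named in the abstract above, the sketch below shows one common way to compute them. It assumes the widely used "local jitter" definition (mean absolute difference between consecutive periods divided by the mean period) and takes CQ as the fraction of a glottal cycle during which the vocal folds are in contact. The abstract does not say which definitions or tools the author used, and the function names are hypothetical.

```python
def local_jitter(periods):
    """Local jitter: mean |T[i+1] - T[i]| divided by the mean period.

    periods: consecutive glottal cycle durations, all in the same unit.
    Returns a dimensionless ratio (multiply by 100 for percent).
    """
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period


def closed_quotient(closed_duration, period):
    """Closed quotient: closed-phase duration divided by the cycle duration."""
    if period <= 0:
        raise ValueError("period must be positive")
    return closed_duration / period
```

For example, a cycle of 10 ms with the folds in contact for 4 ms gives a closed quotient of about 0.4.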

Region-of-Interest Detection using the Energy from Vocal Fold Image (성대 영상에서 에너지를 이용한 관심 영역 추출)

  • Kim, Eom-Jun;Sung, Mee-Young
    • Journal of KIISE:Software and Applications, v.27 no.8, pp.804-814, 2000
  • In this paper, we propose an effective method for detecting the regions of interest in a Videostrobokymography system. A Videostrobokymography system is a medical image processing system that automatically extracts diagnostic parameters from the irregular vibratory movements of the vocal folds. We detect the regions of interest in three steps. In the first step, we remove the noise in the input image and find the minimum energy value in each frame. In the second step, we compute the edges using the average value of a single line. In the third step, the regions of interest are extracted using the Merge Algorithm, which uses the variance of luminance as its feature points. We tested this method on the vocal fold images of nineteen patients, and the regions of interest were detected in most of them. The proposed method is efficient enough to extract the regions of interest from vocal fold images with a frame rate of 40 frames/second and a resolution of 200 × 280 pixels.

The Value of I-Scan Image-Enhanced Endoscopy in the Diagnosis of Vocal Cord Leukoplakia

  • Lee, Young Chan;Eun, Young-Gyu;Park, Il-Seok
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics, v.29 no.2, pp.98-102, 2018
  • Background and Objectives: Detection of vascular abnormalities in vocal cord (VC) leukoplakia is important for the diagnosis of neoplastic change of the mucosa. The aim of this study was to investigate the value of i-scan in the differential diagnosis of VC leukoplakia based on visualization of abnormal vascular features. Material and Methods: Fifty-two patients with leukoplakia were enrolled in the study. Images of their larynx obtained using conventional white light endoscopy and i-scan-enhanced endoscopy (Pentax DEFINA EPK-3000 Video Processors, with Pentax VNLJ10) were reviewed. The microvascular features of the lesions and vascular changes were analyzed, and the results were compared with the histopathologic diagnosis. Results: Among the 52 leukoplakia patients, 7 (13.5%) patients had squamous hyperplasia, 10 (19.3%) mild dysplasia, 2 (3.8%) moderate dysplasia, 14 (26.9%) severe dysplasia, 4 (7.7%) carcinoma in situ, and 15 (28.8%) invasive squamous cell carcinoma on histopathologic examination. Using i-scan-enhanced endoscopy, abnormal vascular change with neoplastic neoangiogenesis was detected in most cases of malignant VC lesion [severe dysplasia: 9/14 (64.3%), carcinoma in situ: 2/4 (50.0%), and invasive squamous cell carcinoma: 11/15 (73.4%)]. Conclusion: i-scan-enhanced endoscopy is a useful optical technique for the diagnosis of VC leukoplakia. Our results suggest that i-scan may be a promising diagnostic tool in the early detection of laryngeal cancer.

Discrimination of Pathological Speech Using Hidden Markov Models

  • Wang, Jianglin;Jo, Cheol-Woo
    • Speech Sciences, v.13 no.3, pp.7-18, 2006
  • Diagnosis of pathological voice is one of the important issues in biomedical applications of speech technology. This study focuses on discriminating voice disorders using an HMM (Hidden Markov Model) to distinguish automatically between normal voice and vocal fold disorder voice. The method is non-intrusive, inexpensive, and fully automated, and uses only a speech sample from the subject. Speech data were collected from normal speakers and from patients. Mel-frequency cepstral coefficients (MFCCs) were modeled with an HMM classifier. Left-to-right HMMs with 3, 5, and 7 states and 3 mixtures were built. The method gives an accuracy of 93.8% on training data and 91.7% on test data in discriminating normal from vocal fold disorder voice for sustained /a/.

Usefullness of the Vibration Pick-Up in Detection of Pitch for Synchronization of Laryngeal Stroboscopy (후두 스트로보스코프 검사의 신호 동기화를 위한 진동 검출기의 유용성)

  • Lee, Jin-Choon;Lee, Byung-Joo;Wang, Soo-Geun;Roh, Jung-Hoon;Kwon, Sun-Bok;Jo, Cheol-Woo
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics, v.18 no.1, pp.26-32, 2007
  • Objective and Background: The laryngeal stroboscope is a useful instrument for evaluating vocal cord vibration and for early detection of mucosal lesions, including invasive cancer of the vocal cord. Recently, Lee et al. (2006) developed a portable stroboscope that uses the voice as the synchronization signal. However, its ability to synchronize the flashes was frequently impaired, even in normal female subjects. The authors investigated various methods, including the vibration pick-up, microphone, laryngeal microphone, and contact microphone, in order to develop a method that is simple and as accurate as the electroglottograph (EGG) signal. The purpose of this study was to assess whether the vibration pick-up is usable and consistent with the EGG signal. Subjects and Methods: The authors compared the EGG signal with signals from a noncontact method (voice) and from contact methods (vibration pick-up, laryngeal microphone, and contact microphone) in twenty normal adults (10 male and 10 female). The number of peaks in one cycle was compared with the number of peaks in the EGG, and the percentage phase difference of the peak was compared with the EGG. The authors also investigated which placement of the vibration pick-up was most effective for synchronizing the strobo flashes. Three sites were analysed: the anterior neck below the cricoid cartilage, the thyroid ala, and the suprahyoid region. Results: Among the various methods for synchronizing strobo flashes, the vibration pick-up was the most effective in peak detection, and the anterior neck below the cricoid cartilage was the most suitable site for it. Conclusion: The authors suggest that the vibration pick-up is the most suitable and effective method for synchronizing strobo flashes.

A Study on Number sounds Speaker recognition using the Pitch detection and the Fuzzified pattern (피치 검출과 퍼지화 패턴을 이용한 숫자음 화자 인식에 관한 연구)

  • 김연숙;김희주;김경재
    • Journal of the Korea Society of Computer and Information, v.8 no.3, pp.73-79, 2003
  • This paper proposes a speaker recognition algorithm that combines pitch detection with fuzzified pattern matching. The study uses a pitch pattern derived from the pitch, and a binary spectrum as the speech parameter. Reference patterns are built with a fuzzy membership function so as to accommodate the range of time variation in non-utterance time, and vocal tract recognition of common characters is performed by fuzzified pattern matching.

A Study on Korean, English and Japanese Speaker Recognitions Using the Peak and Valley Pitch Detection and the Fuzzy Theory (PVPF방법과 퍼지 이론을 이용한 한국어, 영어 및 일본어 화자 인식에 관한 연구)

  • Kim, Yeon-Suk
    • The Transactions of the Korea Information Processing Society, v.6 no.2, pp.522-533, 1999
  • This paper proposes a speaker recognition algorithm that combines a pitch parameter with fuzzy inference. The study proposes a pitch detection method, PVPF (peak and valley pitch detection function), which compares spectra by exploiting the transform characteristics between the time and frequency domains. Reference patterns are built with a membership function so as to accommodate the range of time variation in non-linear utterance time, and vocal tract recognition of common characters is performed by fuzzy pattern matching.

Pitch Detection by the Analysis of Speech and EGG Signals (2-채널 (음성 및 EGG) 신호 분석에 의한 피치검출)

  • Shin, Mu-Yong;Kim, Jeong-Cheol;Bae, Keun-Sung
    • The Journal of the Acoustical Society of Korea, v.15 no.5, pp.5-12, 1996
  • We propose a two-channel (speech and EGG) pitch detection algorithm. The EGG signal tracks the vibratory motion of the vocal folds very well. Therefore, by using the EGG signal together with the speech signal, we obtain a reliable and robust pitch detection algorithm that minimizes the problems occurring in pitch detection from speech alone. The proposed algorithm gives precise pitch markers that are synchronized to the speech in the time domain. Experimental results demonstrate the superiority of the two-channel pitch detection algorithm over the conventional method, and it can be used to obtain reference pitch for evaluating other pitch detection algorithms.
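
To make the idea of EGG-derived pitch markers concrete, here is a minimal sketch under stated assumptions. It is not the algorithm of the paper above, which the abstract does not describe in detail. The sketch marks a pitch event wherever the first difference of the EGG signal has a strong local maximum, assuming a signal polarity in which vocal fold contact appears as a sharp rise. The function names and the `threshold_ratio` parameter are hypothetical.

```python
def egg_pitch_marks(egg, threshold_ratio=0.5):
    """Return sample indices of strong local maxima in the EGG first difference.

    egg: sequence of EGG samples (assumed polarity: contact = rising signal).
    threshold_ratio: fraction of the largest rise a peak must reach to count.
    """
    diff = [egg[i + 1] - egg[i] for i in range(len(egg) - 1)]
    if not diff:
        return []
    peak = max(diff)
    if peak <= 0:
        return []  # no rising edge anywhere in the signal
    marks = []
    for i in range(1, len(diff) - 1):
        is_strong = diff[i] >= threshold_ratio * peak
        is_local_max = diff[i] > diff[i - 1] and diff[i] >= diff[i + 1]
        if is_strong and is_local_max:
            marks.append(i)
    return marks


def f0_from_marks(marks, sample_rate):
    """Estimate F0 in Hz from a middle value of the mark-to-mark intervals."""
    intervals = sorted(b - a for a, b in zip(marks, marks[1:]))
    if not intervals:
        return None
    # Upper-middle element; equals the median for an odd number of intervals.
    typical = intervals[len(intervals) // 2]
    return sample_rate / typical
```

Because the marks come from the EGG channel rather than from the acoustic signal, they could then be aligned with the speech waveform in the time domain, which is the kind of synchronization the abstract describes.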

Quantifying and Analyzing Vocal Emotion of COVID-19 News Speech Across Broadcasters in South Korea and the United States Based on CNN (한국과 미국 방송사의 코로나19 뉴스에 대해 CNN 기반 정량적 음성 감정 양상 비교 분석)

  • Nam, Youngja;Chae, SunGeu
    • Journal of the Korea Institute of Information and Communication Engineering, v.26 no.2, pp.306-312, 2022
  • During the unprecedented COVID-19 outbreak, the public's information needs created an environment in which people overwhelmingly consume information on the chronic disease. Given that news media affect the public's emotional well-being, the pandemic highlights the importance of paying particular attention to how news stories frame their coverage. In this study, the emotion in COVID-19 news speech from mainstream broadcasters in South Korea and the United States (US) was analyzed using convolutional neural networks. Results showed that neutrality was detected across broadcasters. However, emotions such as sadness and anger were also detected. This was evident for the Korean broadcasters, whereas those emotions were not detected for the US broadcasters. This is the first quantitative vocal emotion analysis of COVID-19 news speech. Overall, our findings provide new insight into news emotion analysis and have broad implications for better understanding of the COVID-19 pandemic.