• Title/Summary/Keyword: Vocal Extraction

A Comparative Study on Formant Frequency Extraction Performances (포먼트 주파수 추출 알고리즘들의 성능 비교평가 연구)

  • Son Sungyung;Kim Sang-Jin;Kim YoungMin;Hahn Minsoo
    • Proceedings of the KSPS conference / 2003.05a / pp.141-144 / 2003
  • In this paper, we compare formant frequency extraction algorithms under various conditions and report their performances. A formant frequency is a resonance frequency determined by the vocal tract characteristics; it is related to the phonemes being produced and to the physical condition of the vocal tract. Since the speech signal is shaped by both the sound source and the vocal tract, it is difficult to calculate the exact formant frequencies. Many studies on formant frequency extraction have already been carried out, yet few new formant frequency extraction algorithms have appeared recently.
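
As a rough illustration of the kind of algorithm such comparison studies evaluate, the sketch below estimates formants from one voiced frame by LPC root-finding. It is an assumed baseline (standard autocorrelation LPC), not the authors' code; the model order and the candidate-filtering thresholds are assumptions.

```python
# Hypothetical LPC root-finding formant estimator for a single voiced frame.
import numpy as np
from scipy.linalg import solve_toeplitz

def estimate_formants(frame, fs, order=12):
    """Estimate formant frequency candidates (Hz) of one voiced frame via LPC roots."""
    frame = frame * np.hamming(len(frame))                        # taper frame edges
    frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])    # fixed pre-emphasis

    # Autocorrelation method: solve the Toeplitz normal equations for the LPC coefficients.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz(r[:order], r[1:order + 1])                 # prediction coefficients

    # Roots of the prediction polynomial A(z) = 1 - sum_k a_k z^-k.
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]                             # keep one of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)                    # pole angles -> frequencies in Hz
    bws = -fs / np.pi * np.log(np.abs(roots))                     # pole radii -> bandwidths in Hz

    # Keep sharp, sufficiently low resonances as formant candidates (heuristic thresholds).
    return sorted(f for f, b in zip(freqs, bws) if f > 90 and b < 400)
```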

RECOGNITION SYSTEM USING VOCAL-CORD SIGNAL (성대 신호를 이용한 인식 시스템)

  • Cho, Kwan-Hyun;Han, Mun-Sung;Park, Jun-Seok;Jeong, Young-Gyu
    • Proceedings of the KIEE Conference / 2005.10b / pp.216-218 / 2005
  • This paper presents a new approach to a noise-robust recognizer for a WPS interface. In noisy environments, speech recognition performance degrades rapidly. To solve this problem, we propose a recognition system that uses the vocal-cord signal instead of speech. The vocal-cord signal has low quality, but it is more robust to environmental noise than the speech signal. As a result, we obtained 75.21% accuracy using MFCC with CMS and 83.72% accuracy using ZCPA with RASTA.
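
One of the two compared front-ends, MFCC with cepstral mean subtraction (CMS), is commonly computed as in the sketch below. The librosa toolchain, parameter values, and input file name are assumptions; the ZCPA/RASTA front-end is not shown.

```python
# Minimal sketch of an MFCC + CMS front-end (assumed toolchain, not the authors' code).
import numpy as np
import librosa

def mfcc_cms(signal, sr, n_mfcc=13):
    """Return MFCC features normalized by cepstral mean subtraction."""
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)   # shape (n_mfcc, frames)
    return mfcc - mfcc.mean(axis=1, keepdims=True)                # remove per-coefficient channel bias

y, sr = librosa.load("vocal_cord_signal.wav", sr=None)            # hypothetical input recording
features = mfcc_cms(y, sr)
```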

Vocal Enhancement for Improving the Performance of Vocal Pitch Detection (보컬 피치 검출의 성능 향상을 위한 보컬 강화 기술)

  • Lee, Se-Won;Song, Chai-Jong;Lee, Seok-Pil;Park, Ho-Chong
    • The Journal of the Acoustical Society of Korea / v.30 no.6 / pp.353-359 / 2011
  • This paper proposes a vocal enhancement technique for improving the performance of vocal pitch detection in polyphonic music signals. The proposed technique predicts the accompaniment from the input signal and generates an accompaniment replica scaled according to the vocal power. It then removes the accompaniment replica from the input signal, resulting in a vocal-enhanced signal. The performance of the proposed method was measured by applying the same vocal pitch extraction method to the original and the vocal-enhanced signals, and the vocal pitch detection accuracy increased by 7.1 percentage points on average.
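
The core idea, removing an accompaniment estimate from the mixture to leave a vocal-enhanced signal, can be illustrated with plain magnitude spectral subtraction, as in the sketch below. This is a generic stand-in, not the paper's replica-generation method; the STFT sizes and spectral floor are assumptions.

```python
# Generic spectral-subtraction sketch: remove an accompaniment magnitude estimate from the mixture.
import numpy as np
import librosa

def subtract_accompaniment(mix, accomp_est, n_fft=2048, hop=512, floor=0.05):
    """Subtract an accompaniment magnitude estimate from the mixture STFT and resynthesize."""
    M = librosa.stft(mix, n_fft=n_fft, hop_length=hop)
    A = librosa.stft(accomp_est, n_fft=n_fft, hop_length=hop)
    mag = np.maximum(np.abs(M) - np.abs(A), floor * np.abs(M))    # subtract with a spectral floor
    enhanced = mag * np.exp(1j * np.angle(M))                     # keep the mixture phase
    return librosa.istft(enhanced, hop_length=hop)
```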

A Study on Vocal Separation from Mixtured Music

  • Kim, Hyun-Tae;Park, Jang-Sik
    • Journal of information and communication convergence engineering / v.9 no.2 / pp.161-165 / 2011
  • Recently, with the growing interest in original-sound karaoke machines, MIDI-type karaoke manufacturers have been looking for cheaper alternatives to re-recording the original accompaniment. A technique for separating the singing voice from the music accompaniment is very useful in such equipment. We propose a system that separates the singing voice from the accompaniment in stereo recordings. Our system consists of three stages: the first is a spectral change detector, the second classifies the input into vocal and non-vocal portions using a GMM classifier, and the last performs selective frequency separation. In a computer-based extraction simulation, the spectrogram results show that the separation task succeeds, and a listening test on the accompaniment (MR) extracted by the proposed system confirms that the vocals are separated and removed successfully.
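
The second stage, vocal/non-vocal classification with GMMs, might look roughly like the scikit-learn sketch below. The feature type (e.g. MFCC frames), number of mixture components, and covariance type are assumptions, and the spectral change detector and selective frequency separation stages are not shown.

```python
# Sketch of a two-class GMM vocal / non-vocal frame classifier (assumed configuration).
import numpy as np
from sklearn.mixture import GaussianMixture

def train_vocal_nonvocal(vocal_feats, nonvocal_feats, n_components=8):
    """Fit one GMM per class on (frames, dims) feature matrices."""
    gmm_v = GaussianMixture(n_components, covariance_type="diag").fit(vocal_feats)
    gmm_n = GaussianMixture(n_components, covariance_type="diag").fit(nonvocal_feats)
    return gmm_v, gmm_n

def classify_frames(feats, gmm_v, gmm_n):
    """Return True for frames whose vocal log-likelihood exceeds the non-vocal one."""
    return gmm_v.score_samples(feats) > gmm_n.score_samples(feats)
```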

Prediction of Closed Quotient During Vocal Phonation using GRU-type Neural Network with Audio Signals

  • Hyeonbin Han;Keun Young Lee;Seong-Yoon Shin;Yoseup Kim;Gwanghyun Jo;Jihoon Park;Young-Min Kim
    • Journal of information and communication convergence engineering / v.22 no.2 / pp.145-152 / 2024
  • Closed quotient (CQ) represents the time ratio for which the vocal folds remain in contact during voice production. Because CQ values serve as an important reference in vocal training for professional singers, they have traditionally been measured mechanically or electrically, either by inverse filtering of airflows captured by a circumferentially vented mask or by post-processing of electroglottography waveforms. In this study, we introduce a novel algorithm that predicts CQ values from audio signals alone, eliminating the need for mechanical or electrical measurement. Our algorithm is based on a gated recurrent unit (GRU)-type neural network. To enhance efficiency, the audio signal is pre-processed with a pitch feature extraction algorithm; GRU-type layers then extract features, followed by a dense layer for the final prediction. The Results section reports the mean squared error between the predicted and measured CQ values, demonstrating the capability of the proposed algorithm to predict CQ.
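
A minimal PyTorch sketch of the described pipeline (a pitch-feature sequence fed into a GRU, followed by a dense regression head for the CQ value) is given below. The layer sizes, depth, and input dimensionality are assumptions; the paper's exact configuration and pitch pre-processing are not reproduced.

```python
# Assumed GRU-based CQ regressor: pitch-feature sequence -> GRU -> dense head.
import torch
import torch.nn as nn

class CQRegressor(nn.Module):
    def __init__(self, in_dim=1, hidden=64, layers=2):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, 1)             # dense layer producing the CQ value

    def forward(self, x):                            # x: (batch, time, in_dim)
        _, h = self.gru(x)                            # h: (layers, batch, hidden)
        return self.head(h[-1]).squeeze(-1)           # one predicted CQ per utterance

model = CQRegressor()
pitch_seq = torch.randn(4, 200, 1)                    # 4 dummy pitch contours, 200 frames each
loss = nn.MSELoss()(model(pitch_seq), torch.rand(4))  # MSE objective, as reported in the paper
```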

Correlation Analysis Between Vocal Fold Vibration and Voice Signal Analysis Parameter by Water Temperature (수온에 따른 성대 진동과 음성신호 분석 요소간의 상관성 분석)

  • Kim, Bong-Hyun;Cho, Dong-Uk
    • The Journal of Korean Institute of Communications and Information Sciences / v.37 no.4C / pp.347-353 / 2012
  • In this paper, we carried out experiments to analyze the influence of water temperature on the vocal cords. In particular, we aim to design a voice measurement system that extracts meaningful information about the vibration patterns of the vocal cords according to the temperature of the water a subject drinks. To this end, we measured voice analysis parameters related to vocal cord vibration after subjects drank water at eight temperature steps, from 0°C to 70°C in 10°C intervals. Our experiments showed that drinking water at 30°C~40°C stabilized the vocal cord vibration and improved the accuracy of pronunciation, so we conclude that water at 30°C~40°C has a beneficial effect on the vocal cords.

Modified Mel Frequency Cepstral Coefficient for Korean Children's Speech Recognition (한국어 유아 음성인식을 위한 수정된 Mel 주파수 캡스트럼)

  • Yoo, Jae-Kwon;Lee, Kyoung-Mi
    • The Journal of the Korea Contents Association / v.13 no.3 / pp.1-8 / 2013
  • This paper proposes a new feature extraction algorithm to improve children's speech recognition in Korean. The proposed feature extraction algorithm combines three methods. The first is vocal tract length normalization, which compensates the acoustic features because the vocal tract is shorter in children than in adults. The second uses uniform bandwidths because children's voices are concentrated in high spectral regions. Finally, the proposed algorithm applies a smoothing filter for robust speech recognition in real environments. The paper shows that the new feature extraction algorithm improves children's speech recognition performance.
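
The vocal tract length normalization step can be illustrated by warping the frequency axis before the filter bank is built, as in the sketch below; the piecewise-linear form, warping factor, and cut-off frequency are assumptions rather than the paper's exact design.

```python
# Assumed piecewise-linear VTLN warp: compress children's higher frequencies toward adult-like positions.
import numpy as np

def vtln_warp(freqs_hz, alpha=1.2, f_cut=4800.0, f_max=8000.0):
    """Warp frequencies by 1/alpha below f_cut, then map [f_cut, f_max] linearly onto [f_cut/alpha, f_max]."""
    freqs = np.asarray(freqs_hz, dtype=float)
    warped = np.where(
        freqs <= f_cut,
        freqs / alpha,                                                        # compress the low band
        f_max - (f_max - f_cut / alpha) * (f_max - freqs) / (f_max - f_cut),  # keep the top end anchored at f_max
    )
    return warped
```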

A Study on Extraction of Vocal Tract Characteristic After Canceling the Vocal Cord Property Using the Line Spectrum Pairs (선형 스펙트럼쌍을 이용한 성문특성이 제거된 성도특성 추출법에 관한 연구)

  • 민소연;장경아;배명진
    • The Journal of the Acoustical Society of Korea / v.21 no.7 / pp.665-670 / 2002
  • The most common form of pre-emphasis is y(n) = s(n) - A·s(n-1), where A typically lies between 0.9 and 1.0 for voiced signals. This value reflects the degree of pre-emphasis and, in the conventional method, equals R(1)/R(0). This paper proposes a new flattening method to compensate for the weakened high-frequency components caused by the vocal cord characteristic. We use the interval information of the line spectrum pairs (LSP) to estimate the formant frequencies; after obtaining the slope and inverse slope by linear interpolation among the formant frequencies, the flattening process follows. Experimental results show that the proposed method flattens the weakened high-frequency components effectively. That is, the flattening characteristic is improved by using the LSP interval information as the flattening factor in the process that compensates the weakened high-frequency components.
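
The conventional baseline named in the abstract, first-order pre-emphasis with the coefficient set per frame to R(1)/R(0), can be written as the short sketch below; the LSP-interval flattening that the paper proposes in its place is not reproduced here.

```python
# Conventional pre-emphasis y(n) = s(n) - A*s(n-1) with A = R(1)/R(0), estimated per frame.
import numpy as np

def preemphasis_r1_r0(frame):
    """Apply first-order pre-emphasis with the coefficient set to R(1)/R(0)."""
    r0 = np.dot(frame, frame)                 # autocorrelation at lag 0
    r1 = np.dot(frame[:-1], frame[1:])        # autocorrelation at lag 1
    a = r1 / r0 if r0 > 0 else 0.0            # typically 0.9..1.0 for voiced frames
    out = np.copy(frame)
    out[1:] = frame[1:] - a * frame[:-1]
    return out
```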

A system for recommending audio devices based on frequency band analysis of vocal component in sound source (음원 내 보컬 주파수 대역 분석에 기반한 음향기기 추천시스템)

  • Jeong-Hyun, Kim;Cheol-Min, Seok;Min-Ju, Kim;Su-Yeon, Kim
    • Journal of Korea Society of Industrial Information Systems / v.27 no.6 / pp.1-12 / 2022
  • As the music streaming service and Hi-Fi markets grow, various audio devices are being released. As a result, consumers have a wider range of product choices, but it has become harder to find products that match their musical tastes. In this study, we propose a system that extracts the vocal component from the user's preferred sound source and recommends the most suitable audio device based on this information. To achieve this, the original sound source is first separated using Python's Spleeter library and the vocal track is extracted, while the frequency response data of manufacturers' audio devices are collected and plotted on a grid graph. The Matching Gap Index (MGI) is proposed as an indicator for comparing the frequency band of the extracted vocal track with the measured frequency response of the audio devices. Based on the calculated MGI value, the audio device with the highest similarity to the user's preference is recommended. The recommendation results were verified using the per-genre equalizer data provided by professional audio companies.
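
The first steps described above (Spleeter separation of the vocal track and a summary of its band energies) might look like the sketch below. The file names and band edges are assumptions, and the MGI formula itself is the authors' proposal and is not reproduced.

```python
# Sketch: extract vocals with Spleeter's 2-stems model, then summarize the vocal's band levels.
import numpy as np
import librosa
from spleeter.separator import Separator

separator = Separator("spleeter:2stems")                       # vocals / accompaniment model
separator.separate_to_file("favorite_track.mp3", "stems/")     # hypothetical input track

y, sr = librosa.load("stems/favorite_track/vocals.wav", sr=None)
spec = np.abs(librosa.stft(y, n_fft=4096))
freqs = librosa.fft_frequencies(sr=sr, n_fft=4096)

bands = [(20, 250), (250, 2000), (2000, 8000)]                 # assumed low/mid/high split in Hz
band_levels = [spec[(freqs >= lo) & (freqs < hi)].mean() for lo, hi in bands]
```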

Improvement of Vocal Detection Accuracy Using Convolutional Neural Networks

  • You, Shingchern D.;Liu, Chien-Hung;Lin, Jia-Wei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.2 / pp.729-748 / 2021
  • Vocal detection is one of the fundamental steps in music information retrieval. Typically, the detection process consists of feature extraction and classification steps. Recently, neural networks have been shown to outperform traditional classifiers. In this paper, we report our study on how to further improve detection accuracy by carefully choosing the parameters of the deep network model. Through experiments, we conclude that a feature-plus-classifier model is still better than an end-to-end model. The recommended model uses a spectrogram as the input plane, and the classifier is an 18-layer convolutional neural network (CNN). With this arrangement, the proposed model improves the accuracy on the Jamendo dataset from 91.8% in the existing literature to 94.1%. Since the baseline accuracy is already above 90%, this 2.3-point improvement is hard-won and valuable. If even higher accuracy is required, ensemble learning may be used; the recommended setting is a majority vote over seven of the proposed models, which increases the accuracy on the Jamendo dataset by about another 1.1%.
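
The majority-vote ensemble mentioned at the end can be sketched as below; `models` is assumed to hold seven independently trained classifiers that return a 0/1 vocal label per spectrogram segment, and the 18-layer CNN itself is not reproduced.

```python
# Sketch of a majority-vote ensemble over per-segment vocal/non-vocal predictions.
import numpy as np

def majority_vote(models, spectrogram_segments):
    """Label each segment as vocal (1) when more than half of the models agree."""
    votes = np.stack([m.predict(spectrogram_segments) for m in models])  # shape (n_models, n_segments)
    return (votes.sum(axis=0) > len(models) // 2).astype(int)
```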