• Title/Summary/Keyword: Voice Training

GMM Based Voice Conversion Using Kernel PCA (Kernel PCA를 이용한 GMM 기반의 음성변환)

  • Han, Joon-Hee;Bae, Jae-Hyun;Oh, Yung-Hwan
    • MALSORI / no.67 / pp.167-180 / 2008
  • This paper describes a novel spectral envelope conversion method based on the Gaussian mixture model (GMM). The core idea is to map source feature vectors from the input space to transformed feature vectors in a feature space, so that the GMM can better model the source and target features. The quality of statistical modeling depends on the distribution and the dimension of the data. The proposed method transforms both the distribution and the dimension of the data, making it possible to model the same data under a different configuration. Because the converted feature vectors must lie in the input space, only the source feature vectors are mapped into the feature space using kernel PCA (KPCA); the target feature vectors remain unchanged when the joint pdf of source and target features is modeled. The experimental results show that the proposed method outperforms the conventional GMM-based conversion method in various training environments.
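
For orientation, the conventional joint-density GMM conversion that the abstract compares against is commonly written as the regression below. The symbols are assumptions introduced here, not taken from the abstract: x is a source feature vector, y a target feature vector, M the number of mixture components, and α_i, μ_i, Σ_i the component weights, means, and covariance blocks of the joint model. In the method the abstract describes, x would presumably be replaced by its KPCA-transformed counterpart while y stays in the input space.

```latex
\hat{y} = F(x) = \sum_{i=1}^{M} p(i \mid x)\,
  \bigl[\mu_i^{y} + \Sigma_i^{yx}\,(\Sigma_i^{xx})^{-1}\,(x - \mu_i^{x})\bigr],
\qquad
p(i \mid x) = \frac{\alpha_i\,\mathcal{N}(x;\,\mu_i^{x},\,\Sigma_i^{xx})}
                   {\sum_{j=1}^{M} \alpha_j\,\mathcal{N}(x;\,\mu_j^{x},\,\Sigma_j^{xx})}
```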

Korean Voice Phishing Text Classification Performance Analysis Using Machine Learning Techniques (머신러닝 기법을 이용한 한국어 보이스피싱 텍스트 분류 성능 분석)

  • Boussougou, Milandu Keith Moussavou;Jin, Sangyoon;Chang, Daeho;Park, Dong-Joo
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.297-299 / 2021
  • Text classification is a popular task in Natural Language Processing (NLP), used to classify texts or documents in applications such as sentiment analysis and email filtering. Nowadays, state-of-the-art (SOTA) Machine Learning (ML) and Deep Learning (DL) algorithms are the core engines for these classification tasks, and they achieve high accuracy. This paper benchmarks the performance of multiple SOTA algorithms on KorCCVi, the first known labeled Korean voice phishing dataset. Experimental results on a test set of 366 samples reveal which algorithm performs best in terms of training time and metrics such as accuracy and F1 score.
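
The two metrics named in the abstract can be sketched in a few lines. This is a minimal illustration assuming binary labels (1 for phishing, 0 for non-phishing); the function names and the toy labels are hypothetical and are not taken from the paper or from KorCCVi.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels for illustration only (assumption, not real data).
    reference = [1, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 1, 1, 0]
    print(accuracy(reference, predicted), f1_score(reference, predicted))
```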

Voice Command Web Browser Using Variable Vocabulary Word Recognizer (가변어휘 단어 인식기를 사용한 음성 명령 웹 브라우저)

  • 이항섭
    • The Journal of the Acoustical Society of Korea / v.18 no.2 / pp.48-52 / 1999
  • In this paper, we describe a Voice Command Web Browser that uses a variable vocabulary word recognizer to support Internet surfing through Korean speech recognition. The distinguishing feature of this browser is that its links and menus can be operated by speech, so the speech interface can be used together with the mouse for web browsing. Because the set of recognition candidates changes dynamically from one Web page to the next, we use a variable vocabulary word recognizer. The recognizer was trained on 3,848 POW (Phonetically Optimized Words) words so that it can recognize new words that did not appear in the training data. Preliminary tests showed a speaker-independent, vocabulary-independent recognition rate of 93.8% for 32 Korean words. The Voice Command Web Browser was developed on Windows 95/NT using Netscape Navigator, and it incorporates usability test results in order to offer an easy interface to users unfamiliar with speech interfaces. In an on-line experiment under speaker-independent and environment-independent conditions, the Voice Command Web Browser achieved a recognition accuracy of 90%.
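
One way to picture a vocabulary that changes with each page is to rebuild the candidate word list from the links of the page currently displayed. The sketch below does this with Python's standard `html.parser`; the class and function names are hypothetical, and it only illustrates the idea of a per-page vocabulary, not the recognizer or browser integration described in the paper.

```python
from html.parser import HTMLParser


class LinkVocabularyParser(HTMLParser):
    """Collects the visible text of each <a href=...> link on a page."""

    def __init__(self):
        super().__init__()
        self._href = None      # href of the link currently being read
        self._text = []        # text fragments inside that link
        self.vocabulary = {}   # spoken label -> link target

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so the label is a clean candidate phrase.
            label = " ".join("".join(self._text).split())
            if label:
                self.vocabulary[label] = self._href
            self._href = None


def build_vocabulary(html):
    """Return the recognition candidates (link labels) for one page."""
    parser = LinkVocabularyParser()
    parser.feed(html)
    parser.close()
    return parser.vocabulary
```

Calling `build_vocabulary` again after every page load would give the recognizer a fresh candidate list, which is the behavior a variable vocabulary recognizer is meant to support.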

Application of Machine Learning on Voice Signals to Classify Body Mass Index - Based on Korean Adults in the Korean Medicine Data Center (머신러닝 기반 음성분석을 통한 체질량지수 분류 예측 - 한국 성인을 중심으로)

  • Kim, Junho;Park, Ki-Hyun;Kim, Ho-Seok;Lee, Siwoo;Kim, Sang-Hyuk
    • Journal of Sasang Constitutional Medicine / v.33 no.4 / pp.1-9 / 2021
  • Objectives: The purpose of this study was to check whether an individual's Body Mass Index (BMI) class could be predicted by applying machine learning to the voice data collected at the Korean Medicine Data Center (KDC). Methods: We proposed a convolutional neural network (CNN)-based BMI classification model. The subjects were Korean adults in the KDC data who had completed voice recording and BMI measurement between 2006 and 2015. Of these, 2,825 records were used to train the model and 566 records were used to assess its performance. Mel-frequency cepstral coefficients (MFCC) extracted from vowel utterances were used as the input features of the CNN. The model was built to predict four groups defined by gender and BMI criteria: overweight male, normal male, overweight female, and normal female. Results & Conclusions: Performance was evaluated using F1-score and accuracy. For the four-group prediction, the average accuracy was 0.6016 and the average F1-score was 0.5922. Although the model performed well at gender discrimination, follow-up studies are needed to improve its ability to distinguish BMI within each gender. As research on deep learning is active, performance improvement is expected through future research.

A Situational Training System for the food serving in the restaurant based on the Argumented Reality (증강 현실 기반 음식점 서빙 상황훈련 시스템)

  • Jung, Kwang-Il;Kim, Sung-Jin;Kim, Boo-Nyon;Kim, Tae-Young;Lim, Cheol-Su
    • Journal of Korea Game Society / v.9 no.1 / pp.135-142 / 2009
  • With recent developments in IT, many interface devices and training systems for people with disabilities are being developed and introduced, but only a few training systems exist for people with developmental disabilities. In this paper, we present a situational training system based on augmented reality that helps people with developmental disabilities improve their ability to handle specific situations. Our system focuses on food serving in a restaurant. This marker-based system lets trainees safely experience a variety of situations and take the training session under any circumstances. Trainees can look around while wearing the HMD, follow the voice instructions to complete the training easily, and try out situational scenarios.

Recognition of the Korean alphabet Using Neural Oscillator Phase model Synchronization

  • Kwon, Yong-Bum;Lee, Jun-Tak
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.315-317 / 2003
  • Neural oscillators are applied to oscillatory systems (analysis of image information, voice recognition, etc.). If the established EBPA (Error Back-Propagation Algorithm) is applied to an oscillatory system, it is difficult to infer complicated input patterns. It therefore requires more training data, and its convergence speed is difficult to estimate. In this paper, I studied the neural oscillator in synchronized states with an appropriate phase relation between neurons and recognized the Korean alphabet using Neural Oscillator Phase model Synchronization.

The Effect of Visual Feedback Intervention on Voice Pitch of Adult with Hearing Impairment (선천성 청각장애성인의 시각적피드백 이용 음도치료 효과)

  • Euh, Su-Ji;Yoon, Mi-Sun
    • Speech Sciences / v.12 no.4 / pp.215-226 / 2005
  • This study investigates the effect of a pitch treatment program using visual feedback for profoundly deaf adults. The Dr. Speech program was used as the training tool. The subjects were three profoundly deaf adults. The speech samples used for evaluation were vowel prolongations and connected speech. The analysis followed a single-subject research design. All subjects showed treatment effects, reflected in a lowered fundamental frequency and a lowered speaking fundamental frequency.

A Proposal on the Development of Bioterrorism education for Public health personnel (보건관련학과의 생물테러교육 필요성에 대한 조사 및 교육현황)

  • Kim, Jee-Hee
    • 한국방재학회:학술대회논문집 / 2008.02a / pp.393-394 / 2008
  • In keeping with globalization, many international conferences and athletic events have recently been held in Korea. After the 9/11 terror attacks in New York in 2001, the Korean government dispatched the Zaytun Division to Iraq, which has also raised concerns that Korea should prepare to protect itself against biological terrorism as soon as possible. It is important to develop bioterrorism emergency medical training for public health students, including paramedic students, in Korea. I therefore propose the development of a bioterrorism education curriculum.

Accurate Speech Detection based on Sub-band Selection for Robust Keyword Recognition (강인한 핵심어 인식을 위해 유용한 주파수 대역을 이용한 음성 검출기)

  • Ji Mikyong;Kim Hoirin
    • Proceedings of the KSPS conference / 2002.11a / pp.183-186 / 2002
  • Speech detection is one of the important problems in real-time speech recognition, and accurate detection of speech boundaries is crucial to the performance of a speech recognizer. In this paper, we propose a speech detector based on Mel-band selection through training. To demonstrate the merit of the proposed algorithm, we compare it with a conventional one, the so-called EPD-VAA (EndPoint Detector based on Voice Activity Detection). The proposed speech detector is trained to extract keyword speech better than other speech. EPD-VAA usually works well at high SNR but no longer works well at low SNR. The proposed algorithm, in contrast, pre-selects useful bands through keyword training and decides the speech boundary according to the energy level of the previously selected sub-bands. The experimental results show that the proposed algorithm outperforms EPD-VAA.
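
The boundary decision described above, which uses only the energy of pre-selected sub-bands, can be sketched as follows. This is a simplified illustration under stated assumptions: the band indices and the threshold are supplied by hand here, whereas the paper selects bands through keyword training, and the function name and toy energies are hypothetical.

```python
def detect_speech_boundaries(band_energies, selected_bands, threshold):
    """Return (start, end) frame indices of detected speech, or None.

    band_energies  : list of frames, each a list of per-band energies
    selected_bands : indices of the sub-bands used for the decision
    threshold      : a frame counts as speech when the summed energy of
                     the selected bands exceeds this value
    """
    active = [
        sum(frame[b] for b in selected_bands) > threshold
        for frame in band_energies
    ]
    if True not in active:
        return None
    start = active.index(True)
    end = len(active) - 1 - active[::-1].index(True)
    return start, end


if __name__ == "__main__":
    # Toy energies (assumption): band 0 carries constant noise,
    # bands 2 and 3 carry speech in frames 2 to 4.
    frames = [
        [5.0, 0.1, 0.1, 0.1],
        [5.0, 0.2, 0.1, 0.2],
        [5.0, 0.3, 2.0, 1.5],
        [5.0, 0.2, 3.0, 2.5],
        [5.0, 0.1, 1.8, 1.2],
        [5.0, 0.1, 0.2, 0.1],
    ]
    print(detect_speech_boundaries(frames, [2, 3], 1.0))
    print(detect_speech_boundaries(frames, [0, 1, 2, 3], 1.0))
```

In the toy data, using all bands lets the constant noise in band 0 mark every frame as speech, while restricting the decision to bands 2 and 3 isolates the speech frames. That is the intuition behind selecting bands when the SNR is low.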

A study on speech training aids for Deafs (청각장애자용 발음훈련기기 개발에 관한 연구)

  • Ahn, Sang-Pil;Lee, Jae-Hyuk;Yoon, Tae-Sung;Park, Sang-Hui
    • Proceedings of the KIEE Conference / 1990.07a / pp.47-50 / 1990
  • Deaf people cannot speak as clearly as hearing people because they lack auditory feedback on their own pronunciation; speech training is therefore required. In this study, the fundamental frequency, intensity, formant frequencies, vocal tract graphic, and vocal tract area function, all extracted from the speech signal, are used as feature parameters. An AR model, whose coefficients are extracted using inverse filtering, is used as the speech generation model. To connect the vocal tract graphic with the speech parameters, articulation distances and articulation distance functions in 15 selected intervals are determined from the extracted vocal tract areas and formant frequencies.
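
As a rough illustration of AR modeling with inverse filtering, the sketch below estimates AR (linear prediction) coefficients with the autocorrelation method and the Levinson-Durbin recursion, then applies the inverse filter to obtain the prediction residual. The abstract does not say which estimation procedure the authors used, so the choice of Levinson-Durbin is an assumption, and all function names are hypothetical.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the AR normal equations for the given autocorrelation.

    Returns (coeffs, err), where coeffs[k-1] is a_k in the predictor
    x[n] ~ sum_k a_k * x[n-k] and err is the final prediction error.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break  # signal is perfectly predictable or empty
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def inverse_filter(signal, coeffs):
    """Residual e[n] = x[n] - sum_k a_k * x[n-k] (filter A(z) applied)."""
    residual = []
    for n, x in enumerate(signal):
        pred = sum(
            a * signal[n - k]
            for k, a in enumerate(coeffs, start=1)
            if n - k >= 0
        )
        residual.append(x - pred)
    return residual
```

For a signal generated by a first-order AR process, such as the decaying sequence x[n] = 0.5^n, the first estimated coefficient comes out close to 0.5 and the residual is close to a single impulse, which is what inverse filtering is meant to recover.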
