• Title/Summary/Keyword: Voice Training

Search results: 177

Classification of muscle tension dysphonia (MTD) female speech and normal speech using cepstrum variables and random forest algorithm (켑스트럼 변수와 랜덤포레스트 알고리듬을 이용한 MTD(근긴장성 발성장애) 여성화자 음성과 정상음성 분류)

  • Yun, Joowon;Shim, Heejeong;Seong, Cheoljae
    • Phonetics and Speech Sciences, v.12 no.4, pp.91-98, 2020
  • This study investigated the acoustic characteristics of the sustained vowel /a/ and sentence utterances produced by patients with muscle tension dysphonia (MTD), using cepstrum-based acoustic variables. Thirty-six women diagnosed with MTD and the same number of women with normal voices participated in the study, and the data were recorded and measured with ADSV™. The results showed that, among all the variables, cepstral peak prominence (CPP) and CPP_F0 were statistically significantly lower in the MTD group than in the control group. On the GRBAS scale, overall severity (G) was most prominent in the voice quality of the MTD patients, followed in order by roughness (R), breathiness (B), and strain (S). As these indices increased, CPP showed a statistically significant negative correlation with them. We then classified the MTD and control groups using the CPP and CPP_F0 variables. Statistical modeling with a Random Forest machine learning algorithm yielded much higher classification accuracy for the sentence reading task (100% on training data and 83.3% on test data), and CPP proved to play the more crucial role in both the vowel and sentence reading tasks.
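A minimal sketch of how such a two-feature random-forest classifier could be set up with scikit-learn. The synthetic CPP/CPP_F0 values and group means below are placeholders for illustration only; the study's actual measurements came from ADSV recordings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
# Placeholder assumption: MTD speakers show lower CPP and CPP_F0 on average.
normal = np.column_stack([rng.normal(12, 1.5, 36), rng.normal(10, 1.5, 36)])
mtd = np.column_stack([rng.normal(9, 1.5, 36), rng.normal(8, 1.5, 36)])
X = np.vstack([normal, mtd])                  # columns: CPP, CPP_F0
y = np.array([0] * 36 + [1] * 36)             # 1 = MTD, 0 = normal voice

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))
print("importances (CPP, CPP_F0):", clf.feature_importances_)
```

The feature importances give a quick check of which cepstral variable drives the split, mirroring the paper's observation that CPP carries most of the discriminative power.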

DNN based Robust Speech Feature Extraction and Signal Noise Removal Method Using Improved Average Prediction LMS Filter for Speech Recognition (음성 인식을 위한 개선된 평균 예측 LMS 필터를 이용한 DNN 기반의 강인한 음성 특징 추출 및 신호 잡음 제거 기법)

  • Oh, SangYeob
    • Journal of Convergence for Information Technology, v.11 no.6, pp.1-6, 2021
  • In the field of speech recognition, the adoption of DNNs has expanded the use of speech recognition, but DNNs require more computation for parallel training than conventional GMMs, and they overfit when the amount of data is small. To solve this problem, we propose an efficient method for robust speech feature extraction and speech signal denoising even when data are scarce. The feature extraction efficiently captures speech energy by using the frame-energy difference of speech together with the zero-crossing and level-crossing rates, both of which are affected by the speech signal. In addition, noise is removed from the speech signal with an improved average-prediction LMS filter that loses little speech information while preserving the intrinsic characteristics of speech during speech detection. The improved LMS filter handles noise in the input speech signal by adjusting an active parameter threshold for the input signal. Compared with the conventional frame-energy method, the proposed method improved the error rate at the speech start point by 7% and at the end point by 11%.
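For context, a plain LMS adaptive noise canceller is sketched below; the average-prediction and active-threshold modifications described in the abstract are not reproduced, and the tap count and step size are illustrative assumptions.

```python
import numpy as np

def lms_denoise(noisy, noise_ref, taps=32, mu=0.01):
    """Basic LMS: predict the noise component from a reference signal and
    subtract it; the error signal is the cleaned speech estimate."""
    w = np.zeros(taps)
    cleaned = np.zeros(len(noisy))
    for n in range(taps, len(noisy)):
        x = noise_ref[n - taps:n][::-1]   # most recent reference samples
        noise_est = w @ x                 # filter's current noise prediction
        e = noisy[n] - noise_est          # error = speech estimate
        w += mu * e * x                   # LMS weight update
        cleaned[n] = e
    return cleaned
```

In practice the step size `mu` is often normalized by the reference signal power (NLMS) to keep the adaptation stable.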

A Study of VR Interaction for Non-contact Hair Styling (비대면 헤어 스타일링 재현을 위한 VR 인터렉션 연구)

  • Park, Sungjun;Yoo, Sangwook;Chin, Seongah
    • The Journal of the Convergence on Culture Technology, v.8 no.2, pp.367-372, 2022
  • With the recent advent of the New Normal era, immersive and non-contact technologies are receiving social attention. However, work in the hair styling field has concentrated on hair simulation, focusing on the direction of the hair itself, individual strand movements, and modeling. To create an improved practice environment and meet the demands of the times, this study proposes a non-contact hair styling VR system. In the theoretical review, we examined existing haircut research, which has tended to focus mainly on force-based feedback; interactive haircut work in a virtual environment, as addressed in this paper, has not yet been studied. VR controllers that finger-track the movements required for hairdressing enable selection, cutting, and rotation of styling tools, and we built a non-contact collaboration environment around them. We then conducted two experiments on interactive hair cutting in VR. The first is a haircut operation synchronized through finger tracking and a holding-hook animation, with position correction applied for accurate motion. The second is a real-time interactive cutting operation in a multi-user virtual collaboration environment, which lets instructors and learners communicate with each other in non-contact situations through the VR HMD's built-in microphone and Photon Voice.

A study on deep neural speech enhancement in drone noise environment (드론 소음 환경에서 심층 신경망 기반 음성 향상 기법 적용에 관한 연구)

  • Kim, Jimin;Jung, Jaehee;Yeo, Chaneun;Kim, Wooil
    • The Journal of the Acoustical Society of Korea, v.41 no.3, pp.342-350, 2022
  • In this paper, actual drone noise samples are collected for speech processing in disaster environments to build a noise-corrupted speech database, and speech enhancement performance is evaluated by applying spectral subtraction and mask-based speech enhancement techniques. To improve the performance of VoiceFilter (VF), an existing deep neural network-based speech enhancement model, we apply a Self-Attention operation and feed the estimated noise information into the Attention model. Compared with the existing VF model, the experimental results show improvements of 3.77%, 1.66%, and 0.32% in Source-to-Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI), respectively. When the model is trained with a 75% mix of speech data corrupted by drone sounds collected from the Internet, the relative performance drops for SDR, PESQ, and STOI are 3.18%, 2.79%, and 0.96%, respectively, compared with using only actual drone noise. This confirms that data similar to real data can be collected and used effectively for training speech enhancement models in environments where real data are difficult to obtain.
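As a baseline reference, the spectral subtraction technique mentioned above can be sketched roughly as follows; the frame sizes, the noise-only leading frames, and the spectral floor are assumptions for illustration, not the paper's settings.

```python
import numpy as np
import librosa

def spectral_subtraction(wav, n_fft=512, hop=128, noise_frames=10):
    """Subtract an average noise magnitude spectrum estimated from the
    first few (assumed noise-only) frames, keeping the noisy phase."""
    S = librosa.stft(wav, n_fft=n_fft, hop_length=hop)
    mag, phase = np.abs(S), np.angle(S)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    clean_mag = np.maximum(mag - noise_mag, 0.05 * mag)   # spectral floor
    return librosa.istft(clean_mag * np.exp(1j * phase), hop_length=hop)
```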

Accelerometer-based Gesture Recognition for Robot Interface (로봇 인터페이스 활용을 위한 가속도 센서 기반 제스처 인식)

  • Jang, Min-Su;Cho, Yong-Suk;Kim, Jae-Hong;Sohn, Joo-Chan
    • Journal of Intelligence and Information Systems, v.17 no.1, pp.53-69, 2011
  • Vision- and voice-based technologies are commonly used for human-robot interaction, but it is widely recognized that their performance deteriorates by a large margin in real-world situations due to environmental and user variance. Human users need to be very cooperative to obtain reasonable performance, which significantly limits the usability of vision- and voice-based human-robot interaction technologies. As a result, touch screens remain the major medium of human-robot interaction in real-world applications. To improve the usability of robots for various services, alternative interaction technologies should be developed to complement the weaknesses of vision- and voice-based technologies. In this paper, we propose an accelerometer-based gesture interface as one such alternative, because accelerometers are effective at detecting the movements of the human body and their performance is not limited by environmental context such as lighting conditions or a camera's field of view. Moreover, accelerometers are now widely available in many mobile devices. We tackle the problem of classifying acceleration signal patterns of the 26 English alphabet letters, one of the essential repertoires for robot-based education services. Recognizing 26 English handwriting patterns from accelerometers is a very difficult task to take on because of the large number of pattern classes and the complexity of each pattern. The most difficult comparable problem previously undertaken was recognizing acceleration signal patterns of 10 handwritten digits; most earlier studies dealt with sets of 8~10 simple, easily distinguishable gestures useful for controlling home appliances, computer applications, robots, etc. Good features are essential for successful pattern recognition. To improve discriminative power over complex alphabet patterns, we extracted 'motion trajectories' from the input acceleration signal and used them as the main feature. Preliminary experiments showed that trajectory-based classifiers performed 3%~5% better than those using raw features, e.g., the acceleration signal itself or statistical figures. To minimize trajectory distortion, we applied a simple but effective set of smoothing and band-pass filters. It is well known that acceleration patterns for the same gesture differ greatly among performers. To handle this, online incremental learning is applied so that the system adapts to each user's distinctive motion properties. Our system is based on instance-based learning (IBL), where each training sample is memorized as a reference pattern. Brute-force incremental learning in IBL continuously accumulates reference patterns, which is a problem because it not only slows down classification but also degrades recall performance. Regarding the latter, we observed that as the number of reference patterns grows, some reference patterns contribute more to false positive classifications. We therefore devised an algorithm for optimizing the reference pattern set based on the positive and negative contribution of each reference pattern; it runs periodically to remove reference patterns with a very low positive contribution or a high negative contribution (see the sketch after this entry).
Experiments were performed on 6,500 gesture patterns collected from 50 adults aged 30~50. Each letter was performed 5 times per participant using a Nintendo® Wii™ remote, and the acceleration signal was sampled at 100 Hz on 3 axes. The mean recall rate over all letters was 95.48%. Some letters recorded a very low recall rate and a very high pairwise confusion rate; the major confusion pairs were D (88%) and P (74%), I (81%) and U (75%), and N (88%) and W (100%). Although W was recalled perfectly, it contributed heavily to the false positive classification of N. By comparison with major previous results from VTT (96% for 8 control gestures), CMU (97% for 10 control gestures) and Samsung Electronics (97% for 10 digits and a control gesture), our system's performance is superior given the number of pattern classes and the complexity of the patterns. Using our gesture interaction system, we conducted two case studies of robot-based edutainment services, implemented on various robot platforms and mobile devices including the iPhone™. The participating children showed improved concentration and more active reactions with our gesture interface. To verify its effectiveness, the children took a test after experiencing an English teaching service; those who used the gesture-interface-based robot content scored 10% better than those taught conventionally. We conclude that the accelerometer-based gesture interface is a promising technology for fostering real-world robot-based services and content by complementing the limits of today's conventional interfaces, e.g., touch screens, vision, and voice.
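The reference-pattern pruning step can be illustrated with a small sketch. The scoring rule below (counting correct vs. incorrect nearest-neighbour matches on held-out gestures) and the thresholds are stand-in assumptions, not the paper's exact contribution measure.

```python
import numpy as np

def prune_references(refs, ref_labels, val_feats, val_labels,
                     min_pos=1, max_neg=3):
    """Keep only reference patterns whose positive contribution is high
    enough and whose negative contribution is low enough (1-NN setting)."""
    refs = np.asarray(refs, dtype=float)
    ref_labels = np.asarray(ref_labels)
    pos = np.zeros(len(refs))   # times a reference supported a correct call
    neg = np.zeros(len(refs))   # times it caused a false positive
    for x, y in zip(val_feats, val_labels):
        d = np.linalg.norm(refs - x, axis=1)   # distance to every reference
        i = int(np.argmin(d))                  # nearest reference pattern
        if ref_labels[i] == y:
            pos[i] += 1
        else:
            neg[i] += 1
    keep = (pos >= min_pos) & (neg <= max_neg)
    return refs[keep], ref_labels[keep]
```

Run periodically during incremental learning, this keeps the reference set small enough for fast classification while discarding patterns that mostly trigger false positives.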

Convergence Development of Video and E-learning System for Education Disabled Students (장애학생의 학습을 위한 화상과 이러닝 시스템의 융합 개발)

  • Son, Yeob-Myeong;Jung, Byeong-Soo
    • Journal of the Korea Convergence Society, v.6 no.4, pp.113-119, 2015
  • The current school system offers an educational environment designed around non-disabled students, leaving students who cannot follow it without a suitable alternative. This study targets students with disabilities and is designed in particular so that the system can be used by students who have difficulty using their hands. The development objective of the video e-learning system for persons with disabilities is to enable self-directed learning by disabled students. The e-learning system is configured as a Web-based multimedia system that combines a video conferencing system with speech-to-text conversion, so that hearing-impaired students can communicate 1:1 with teachers through the chat system and two-way communication is possible. In the e-learning system developed in this paper, 1:1 instruction between teachers and students with disabilities is conducted using a two-way communication algorithm.

Development of Case-based Multimedia Learning Contents for Preventing Malpractice in Operating Room (수술실 간호오류 예방을 위한 사례중심 멀티미디어 학습콘텐츠 개발)

  • Park, Ji Myung;Hwang, Seon Young
    • The Journal of the Korea Contents Association, v.16 no.10, pp.522-532, 2016
  • Purpose: This study was conducted to develop case-based self-learning multimedia content for preventing malpractice that frequently occurs among nurses working in the operating room. Methods: Based on the learning needs of operating room nurses, real case reports, and literature reviews, the case-based multimedia learning content was developed according to an instructional design procedure. Learning needs were assessed through a combination of structured questionnaire surveys and interviews with 40 operating room nurses. Results: The learning content comprised four modules built around real malpractice cases in operation preparation, nursing skills during operation, environmental management of the operating room, and patient safety and observation. The 80-minute case-based multimedia learning content was finalized after content validity tests by clinical experts. Each module contained photos, sounds, and Flash animations with voice recordings covering nursing error cases and standardized protocols. Conclusion: The multimedia learning content developed in this study from real error cases can be used as hands-on educational training material to help nurses prevent malpractice in the operating room.

An Explorative Study on Development Direction of a Mobile Fitness App Game Associated with Smart Fitness Wear (스마트 피트니스 웨어 연동형 모바일 피트니스 앱 게임의 개발 방향 탐색)

  • Park, Su Youn;Lee, Joo Hyeon
    • Journal of Digital Contents Society, v.19 no.7, pp.1225-1235, 2018
  • In this study, as part of planning research on practical, customized smart content for smart wear that can monitor physical activity, we investigated the potential needs for smart fitness content. As a result, 'accessibility of use', 'inducement of interest', and 'diverse story lines' were derived as potential needs at the 'before exercise' stage; 'real-time voice coaching', 'accurate exercise posture monitoring', and 'personalized exercise prescription' at the 'during exercise' stage; 'substantial reward system', 'grading system', 'body figure change monitoring', and 'everyday life monitoring' at the 'after exercise' stage; and 'triggering exercise motivation' and 'high sustainability' at the 'connection to the next exercise' stage.

A Machine Learning Approach for Stress Status Identification of Early Childhood by Using Bio-Signals (생체신호를 활용한 학습기반 영유아 스트레스 상태 식별 모델 연구)

  • Jeon, Yu-Mi;Han, Tae Seong;Kim, Kwanho
    • The Journal of Society for e-Business Studies, v.22 no.2, pp.1-18, 2017
  • Identifying an extremely stressed condition in young children is essential for real-time recognition of dangerous situations, as incidents involving children have increased dramatically. In this paper, therefore, we present a machine learning model that identifies a child's stress status from bio-signals such as voice and heart rate, which are major indicators of a child's emotional state. In addition, a smart band for collecting these bio-signals and a mobile application for monitoring the child's stress status are proposed. Specifically, the proposed method uses children's stress patterns collected in advance to train the stress status identification model, which is built on conventional machine learning algorithms and is then used to predict a child's current stress status. Experiments on a real-world dataset showed that a child's stress status can be detected automatically with a satisfactory level of accuracy. Furthermore, the research results are expected to help prevent dangerous situations involving children.
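A hedged sketch of the training step follows; the window summary features, the synthetic calm/stressed distributions, and the choice of gradient boosting are illustrative assumptions, since the paper only states that conventional machine learning algorithms were applied to voice and heart-rate signals.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def window_features(heart_rate, voice_rms):
    """Summarize one observation window of the two bio-signals."""
    return [np.mean(heart_rate), np.std(heart_rate),
            np.mean(voice_rms), np.max(voice_rms)]

# Synthetic stand-in data: stressed windows get a higher, more variable heart
# rate and a louder voice signal than calm ones (not real measurements).
calm = [window_features(rng.normal(95, 5, 60), rng.normal(0.2, 0.05, 60))
        for _ in range(200)]
stressed = [window_features(rng.normal(120, 12, 60), rng.normal(0.5, 0.1, 60))
            for _ in range(200)]
X = np.array(calm + stressed)
y = np.array([0] * 200 + [1] * 200)          # 1 = stressed, 0 = calm

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```

In a deployment like the one described, the smart band would stream windows of these signals to the mobile application, which applies the trained model and raises an alert when the predicted status is 'stressed'.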

Lip-reading System based on Bayesian Classifier (베이지안 분류를 이용한 립 리딩 시스템)

  • Kim, Seong-Woo;Cha, Kyung-Ae;Park, Se-Hyun
    • Journal of Korea Society of Industrial Information Systems, v.25 no.4, pp.9-16, 2020
  • Pronunciation recognition systems that use only video information and ignore voice information can be applied to various customized services. In this paper, we develop a system that applies a Bayesian classifier to distinguish Korean vowels via lip shapes in images. We extract feature vectors from the lip shapes of facial images and apply them to the designed machine learning model. Our experiments show that the system's recognition rate is 94% for the pronunciation of 'A', and the system's average recognition rate is approximately 84%, which is higher than that of the CNN tested for comparison. Our results show that our Bayesian classification method with feature values from lip region landmarks is efficient on a small training set. Therefore, it can be used for application development on limited hardware such as mobile devices.
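A rough sketch of such a Bayesian vowel classifier using Gaussian naive Bayes is shown below; the lip-shape feature values are synthetic placeholders, and the five-vowel set and feature choice (mouth width, height, opening area) are assumptions rather than the paper's exact setup.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
vowels = ["a", "i", "u", "e", "o"]
X, y = [], []
for k, v in enumerate(vowels):
    # Placeholder lip-shape features per class: width, height, opening area.
    X.append(rng.normal(loc=[1.0 + 0.2 * k, 0.5 + 0.1 * k, 0.4 + 0.05 * k],
                        scale=0.05, size=(40, 3)))
    y += [v] * 40
X = np.vstack(X)

clf = GaussianNB()
print("cross-validated accuracy:",
      cross_val_score(clf, X, np.array(y), cv=5).mean())
```

Because a Gaussian naive Bayes model only estimates per-class means and variances, it needs far fewer training samples than a CNN, which matches the paper's point about efficiency on small training sets and limited hardware.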