• Title/Summary/Keyword: Voice training method


Motion Study of Treatment Robot for Autistic Children Using Speech Data Classification Based on Artificial Neural Network (음성 분류 인공신경망을 활용한 자폐아 치료용 로봇의 지능화 동작 연구)

  • Lee, Jin-Gyu;Lee, Bo-Hee
    • Journal of IKEEE / v.23 no.4 / pp.1440-1447 / 2019
  • The prevalence of autism spectrum disorders in children is reported to be rising, with a wide range of presentations. In particular, impairments in the area of social communication make it difficult for these children to communicate, and improvement requires training. This study therefore proposes a method of acquiring voice information through a microphone mounted on a robot designed in preliminary research and using that information to drive intelligent motions. An ANN (Artificial Neural Network) was used to classify the speech data into robot motions, and accuracy was improved by combining a Recurrent Neural Network with a Convolutional Neural Network. The input speech data were preprocessed with MFCC (Mel-Frequency Cepstral Coefficient) analysis, and the robot's motion was estimated using various data normalization and neural network optimization techniques. An experiment comparing the designed ANN against an existing architecture and a human-intervention method showed high accuracy. Future work aims to design robot motions with higher accuracy and to apply them in the treatment and education environment of children with autism.
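
A minimal sketch of the pipeline this abstract describes: MFCC features feeding a convolutional layer followed by a recurrent layer that maps speech to a motion class. The clip length, MFCC count, layer sizes, and the five motion classes are illustrative assumptions, not details from the paper.

```python
# Sketch of a CNN + RNN speech-to-motion classifier over MFCC features.
# Assumptions (not from the paper): 13 MFCCs, 1-second 16 kHz clips,
# 5 robot motion classes; layer sizes are illustrative only.
import numpy as np
import librosa
import tensorflow as tf

N_MFCC, N_FRAMES, N_MOTIONS = 13, 32, 5

def extract_mfcc(wav_path, sr=16000):
    """Load a clip and return a fixed-length (frames, n_mfcc) MFCC matrix."""
    y, _ = librosa.load(wav_path, sr=sr, duration=1.0)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC).T        # (frames, 13)
    mfcc = (mfcc - mfcc.mean(axis=0)) / (mfcc.std(axis=0) + 1e-8)   # normalize
    out = np.zeros((N_FRAMES, N_MFCC), dtype=np.float32)            # pad/truncate
    out[: min(N_FRAMES, len(mfcc))] = mfcc[:N_FRAMES]
    return out

def build_model():
    """Convolutional front end followed by a recurrent layer, as in the abstract."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(N_FRAMES, N_MFCC)),
        tf.keras.layers.Conv1D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(N_MOTIONS, activation="softmax"),
    ])

model = build_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```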

Laryngeal Findings and Phonetic Characteristics in Prelingually Deaf Patients (언어습득기 이전 청각장애인의 후두소견 및 음성학적 특성)

  • Kim, Seong-Tae;Yoon, Tae-Hyun;Kim, Sang-Yoon;Choi, Seung-Ho;Nam, Soon-Yuhl
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.20 no.1 / pp.57-62 / 2009
  • Background and Objectives: Few studies specifically examine laryngeal function in patients with profound hearing loss or deafness. This study was designed to examine videostroboscopic findings and phonetic characteristics in prelingually deaf adult patients. Materials and Method: Sixteen prelingually deaf patients (seven males, nine females) aged 19 to 54 years were compared with a control group of 20 subjects with normal hearing and no laryngeal pathology. Videostroboscopic evaluations were rated by experienced judges on various parameters describing the structure and function of the laryngeal mechanism during phonation at comfortable pitch and loudness. Acoustic analysis was performed, and a nasalance test was administered using the rabbit, baby, and mother passages. CSL measurements were used to determine the first and second formant frequencies of the vowels /a/, /i/, and /u/. Statistical analysis was done using the Mann-Whitney U test or the Wilcoxon signed-rank test. Results: Videostroboscopic findings showed phase symmetry but significantly more frequent decrements in the amplitude of vibration and mucosal wave, irregularity of vibration, and an increased glottal gap during the closed phase of phonation. In addition, the prelingually deaf group showed significantly more frequent abnormal supraglottic activity during phonation. Shimmer percentages in the prelingually deaf group were higher than in the control group. Among the vowel characteristics, the second formant of the vowel /i/ was lower. Nasalance in prelingually deaf patients was within normal limits for all passages. Conclusion: Prelingually deaf patients show abnormal stroboscopic findings without any mucosal lesion, suggesting a considerable functional voice disorder. We suggest that prelingually deaf adults undergo vocal training to normalize laryngeal function after cochlear implantation.
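
The group comparisons reported here rely on nonparametric tests. As a minimal illustration of the Mann-Whitney U comparison named in the abstract, the sketch below uses made-up shimmer values; the study's actual measurements are not given here.

```python
# Nonparametric group comparison as named in the abstract (Mann-Whitney U).
# The shimmer values below are made-up placeholders, not data from the study.
from scipy.stats import mannwhitneyu

shimmer_deaf = [4.1, 5.3, 3.8, 6.0, 4.7, 5.9, 4.4, 5.1]      # hypothetical (%)
shimmer_control = [2.9, 3.1, 2.7, 3.4, 3.0, 2.8, 3.2, 3.3]   # hypothetical (%)

stat, p = mannwhitneyu(shimmer_deaf, shimmer_control, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4f}")  # p < 0.05 would indicate a group difference
```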


A Machine Learning Approach for Stress Status Identification of Early Childhood by Using Bio-Signals (생체신호를 활용한 학습기반 영유아 스트레스 상태 식별 모델 연구)

  • Jeon, Yu-Mi;Han, Tae Seong;Kim, Kwanho
    • The Journal of Society for e-Business Studies / v.22 no.2 / pp.1-18 / 2017
  • Identifying when a child is under extreme stress is essential for recognizing dangerous situations in real time, as incidents involving children have increased dramatically. In this paper, we therefore present a machine-learning model for identifying a child's stress status from bio-signals such as voice and heart rate, which are major indicators of a child's emotional state. A smart band for collecting these bio-signals and a mobile application for monitoring the child's stress status are also presented. Specifically, the proposed method uses stress patterns of children, collected in advance, to train the stress status identification model; the model, built on conventional machine learning algorithms, is then used to predict the child's current stress status. Experiments on a real-world dataset showed that automated detection of a child's stress status is possible with a satisfactory level of accuracy. The results are expected to be useful for preventing dangerous situations involving children.
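
A minimal sketch of the kind of "conventional machine learning" stress classifier the abstract describes, trained on voice and heart-rate features. The feature set, the synthetic labels, and the random-forest choice are illustrative assumptions rather than the paper's exact design.

```python
# Minimal sketch: binary stress classifier on bio-signal features.
# Feature names (mean pitch, voice energy, heart rate) and the random-forest
# model are illustrative assumptions, not the paper's setup.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Placeholder dataset: [mean_pitch_hz, voice_energy, heart_rate_bpm] per sample.
X = rng.normal(loc=[220.0, 0.5, 95.0], scale=[40.0, 0.2, 15.0], size=(200, 3))
y = (X[:, 2] + 30 * X[:, 1] > 110).astype(int)   # synthetic stress label (0/1)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```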

Lip-reading System based on Bayesian Classifier (베이지안 분류를 이용한 립 리딩 시스템)

  • Kim, Seong-Woo;Cha, Kyung-Ae;Park, Se-Hyun
    • Journal of Korea Society of Industrial Information Systems / v.25 no.4 / pp.9-16 / 2020
  • Pronunciation recognition systems that use only video information, without voice information, can be applied to various customized services. In this paper, we develop a system that applies a Bayesian classifier to distinguish Korean vowels from lip shapes in images. We extract feature vectors from the lip shapes in facial images and feed them to the designed machine learning model. Our experiments show that the system's recognition rate is 94% for the pronunciation of 'A', and its average recognition rate is approximately 84%, which is higher than that of the CNN tested for comparison. The results show that our Bayesian classification method, using feature values from lip-region landmarks, is efficient on a small training set. It can therefore be used for application development on limited hardware such as mobile devices.
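
A minimal sketch of a Bayesian (Gaussian naive Bayes) vowel classifier over lip-shape feature vectors, in the spirit of the system described above. The landmark count, feature layout, and five-vowel label set are assumptions for illustration.

```python
# Sketch of a Bayesian (Gaussian naive Bayes) vowel classifier over flattened
# lip-landmark coordinates. Feature layout and label set are assumptions.
import numpy as np
from sklearn.naive_bayes import GaussianNB

N_LANDMARKS = 20                      # lip-region landmark points
VOWELS = ["a", "e", "i", "o", "u"]    # hypothetical class set

rng = np.random.default_rng(1)
X = rng.normal(size=(150, N_LANDMARKS * 2))    # (x, y) per landmark, flattened
y = rng.integers(0, len(VOWELS), size=150)     # placeholder labels

clf = GaussianNB().fit(X, y)
pred = clf.predict(X[:5])
print([VOWELS[i] for i in pred])
```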

A Study on the Weather Support Service for Winter Sports (동계스포츠 맞춤형 기상지원 서비스를 위한 연구)

  • Back, Jin-Ho;Panday, Siddhartha Bikram;Lee, Ju-Sung;Kang, Hyo-Min
    • Journal of Korea Entertainment Industry Association / v.13 no.1 / pp.139-156 / 2019
  • The purpose of this study was to provide a method for delivering customized weather and environmental information services to support the successful operation of winter sporting events. First, to analyze their needs, individual in-depth interviews and surveys were conducted with athletes, coaching staff, and experts involved in competitions across 10 winter sports; the surveys were administered face to face and scheduled around the experts' training situations. The recorded voice files were transcribed into text, and the weather and environmental information elements embedded in the participants' opinions were extracted with reference to literature reviews and related data. The findings are expected to provide basic data on the weather conditions required to support specialized weather information for future large winter sports events, including the PyeongChang Winter Olympics.

RPCA-GMM for Speaker Identification (화자식별을 위한 강인한 주성분 분석 가우시안 혼합 모델)

  • 이윤정;서창우;강상기;이기용
    • The Journal of the Acoustical Society of Korea / v.22 no.7 / pp.519-527 / 2003
  • Speech is strongly affected by outliers introduced by unexpected events such as additive background noise, changes in the speaker's utterance pattern, and voice detection errors. Such outliers can severely degrade speaker recognition performance. In this paper, we propose a GMM based on robust principal component analysis (RPCA-GMM) using M-estimation to address both outliers and the high dimensionality of training feature vectors in speaker identification. First, a new feature vector with reduced dimension is obtained by robust PCA derived from M-estimation; the robust PCA projects the original feature vector onto the reduced-dimensional linear subspace spanned by the leading eigenvectors of the feature vectors' covariance matrix. Second, a GMM with diagonal covariance matrices is trained on these transformed feature vectors. We performed speaker identification experiments to show the effectiveness of the proposed method, comparing RPCA-GMM on transformed feature vectors with standard PCA and with a conventional diagonal-covariance GMM. Each time the proportion of outliers increases by 2%, the proposed method maintains almost the same speaker identification rate, degrading by only 0.03%, while the conventional GMM and PCA degrade by 0.65% and 0.55%, respectively. This shows that our method is more robust to the presence of outliers.
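
A minimal sketch of the projection-then-GMM structure described above, using standard PCA and a diagonal-covariance GMM from scikit-learn. The paper's robust, M-estimation-based PCA is not reproduced here, and the dimensions and component counts are illustrative assumptions.

```python
# Sketch: project features onto leading eigenvectors, then fit a
# diagonal-covariance GMM. Standard PCA stands in for the paper's robust PCA.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
features = rng.normal(size=(500, 39))   # placeholder cepstral feature vectors

# Step 1: dimensionality reduction (the paper uses M-estimation-based robust PCA).
pca = PCA(n_components=12).fit(features)
reduced = pca.transform(features)

# Step 2: one diagonal-covariance GMM per speaker on the reduced features.
speaker_gmm = GaussianMixture(n_components=8, covariance_type="diag",
                              random_state=0).fit(reduced)

# Identification scores a test utterance against each speaker's GMM and
# picks the model with the highest average log-likelihood.
test = pca.transform(rng.normal(size=(50, 39)))
print("avg log-likelihood:", speaker_gmm.score(test))
```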

Speech Recognition Using Linear Discriminant Analysis and Common Vector Extraction (선형 판별분석과 공통벡터 추출방법을 이용한 음성인식)

  • 남명우;노승용
    • The Journal of the Acoustical Society of Korea / v.20 no.4 / pp.35-41 / 2001
  • This paper describes Linear Discriminant Analysis and common vector extraction for speech recognition. A voice signal contains psychological and physiological properties of the speaker as well as dialect differences, acoustical environment effects, and phase differences. For these reasons, the same word spoken by different speakers can sound very different, which makes it difficult to extract common properties within the same speech class (word or phoneme). Linear algebraic methods such as the KLT (Karhunen-Loeve Transform) are generally used to extract common properties from speech signals, but this paper uses the common vector extraction method suggested by M. Bilginer et al. That method extracts the optimized common vector from the speech signals used for training and achieves 100% recognition accuracy on the training data used for common vector extraction. Despite these characteristics, the method has some drawbacks: the number of speech signals that can be used for training is limited, and the discriminant information among common vectors is not defined. This paper proposes an improved method that reduces the error rate by maximizing the discriminant information among common vectors, and also adds a novel method for normalizing the size of the common vector. The results show improved performance of the algorithm and a recognition accuracy 2% higher than the conventional method.
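
A minimal sketch of the discriminant-analysis step named in the title, applied to placeholder feature vectors with scikit-learn's LDA. The common vector extraction of Bilginer et al. is not reproduced, and the class counts and feature dimension are assumptions.

```python
# Sketch of LDA maximizing between-class separation relative to within-class
# scatter, i.e., the "discriminant information" discussed in the abstract.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
n_classes, per_class, dim = 10, 20, 64          # e.g., 10 spoken words (assumed)
X = np.vstack([rng.normal(loc=c, size=(per_class, dim)) for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), per_class)

lda = LinearDiscriminantAnalysis(n_components=n_classes - 1).fit(X, y)
X_proj = lda.transform(X)
print("projected shape:", X_proj.shape, "train accuracy:", lda.score(X, y))
```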


Comparative study of data augmentation methods for fake audio detection (음성위조 탐지에 있어서 데이터 증강 기법의 성능에 관한 비교 연구)

  • KwanYeol Park;Il-Youp Kwak
    • The Korean Journal of Applied Statistics / v.36 no.2 / pp.101-114 / 2023
  • Data augmentation is an effective way to mitigate model overfitting by letting the training dataset be viewed from various perspectives. In addition to image augmentation techniques such as rotation, cropping, horizontal flip, and vertical flip, occlusion-based methods such as Cutmix and Cutout have been proposed. For models based on speech data, an occlusion-based augmentation technique can be applied after converting the 1D speech signal into a 2D spectrogram; SpecAugment, in particular, is an occlusion-based augmentation technique for speech spectrograms. In this study, we compare data augmentation techniques applicable to fake audio detection. Using data from the ASVspoof2017 and ASVspoof2019 competitions, which were held to detect fake audio, datasets augmented with the occlusion-based methods Cutout, Cutmix, and SpecAugment were used to train an LCNN model. All three augmentation techniques generally improved the model's performance: Cutmix performed best on ASVspoof2017, Mixup on ASVspoof2019 LA, and SpecAugment on ASVspoof2019 PA. In addition, increasing the number of masks for SpecAugment helps improve performance. In conclusion, the appropriate augmentation technique differs depending on the situation and the data.
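
A minimal sketch of SpecAugment-style time and frequency masking on a spectrogram, the occlusion-based augmentation named above. Mask counts and widths are illustrative, not the settings used in the paper.

```python
# SpecAugment-style masking: zero out random frequency bands and time spans.
# Mask counts and maximum widths are illustrative assumptions.
import numpy as np

def spec_augment(spec, n_freq_masks=2, n_time_masks=2, max_width=8, rng=None):
    """Apply frequency and time masks to a (freq, time) spectrogram array."""
    rng = rng or np.random.default_rng()
    spec = spec.copy()
    n_freq, n_time = spec.shape
    for _ in range(n_freq_masks):
        w = rng.integers(1, max_width + 1)
        f0 = rng.integers(0, max(1, n_freq - w))
        spec[f0:f0 + w, :] = 0.0
    for _ in range(n_time_masks):
        w = rng.integers(1, max_width + 1)
        t0 = rng.integers(0, max(1, n_time - w))
        spec[:, t0:t0 + w] = 0.0
    return spec

spectrogram = np.random.rand(80, 200)            # placeholder mel spectrogram
augmented = spec_augment(spectrogram, rng=np.random.default_rng(4))
print("masked bins:", int((augmented == 0).sum()))
```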

A Study on The Adoption of Drama for Improving Early Childhood Teacher's Artistic Competence (유아교사의 예술적 역량 함양을 위한 교육연극 활용에 관한 고찰)

  • Kim, Ji-Youn;Kim, Su-youn
    • (The) Research of the performance art and culture / no.41 / pp.69-92 / 2020
  • This study describes the impact of early childhood teachers' artistic competence on art education pedagogy and improved curriculum design, and explains the effect of drama as a way of improving that competence. Many researchers have noted that early childhood is a period of sensitivity and potential, so children benefit from meeting a teacher who understands them and inspires their innate artistic sense at their own level. The study also discusses which aspects of artistic competence should be emphasized in teacher training. There are many approaches to developing early childhood teachers' artistic competence, and adopting drama is one of them. Its strengths are as follows. First, human movement and voice are the main artistic channels in drama, and what we do in daily life is also found in the drama world, so early childhood teachers who experience drama activities will feel more comfortable and familiar with them; they also tend to be familiar with dramatic play, which eases their access to drama. Second, drama helps participants understand different feelings and broadens and deepens their understanding of others' standpoints. For early childhood teachers, drama activities help them understand how the dramatic art form works and lead children's play in diversified and sincere ways. Drama activities are also useful for building horizontal, democratic relationships between children and teachers, one of the main emphases of the 2019 revised Nori national curriculum. In sum, drama can be an excellent method for developing artistic competence in early childhood teachers, and it is hoped that they will have more opportunities to experience drama as an art form.

Increasing Accuracy of Stock Price Pattern Prediction through Data Augmentation for Deep Learning (데이터 증강을 통한 딥러닝 기반 주가 패턴 예측 정확도 향상 방안)

  • Kim, Youngjun;Kim, Yeojeong;Lee, Insun;Lee, Hong Joo
    • The Journal of Bigdata / v.4 no.2 / pp.1-12 / 2019
  • As Artificial Intelligence (AI) technology develops, it is being applied to fields such as image, voice, and text, and has shown fine results in certain areas. Researchers have also tried to predict the stock market using AI. Predicting the stock market is known to be a difficult problem, since the market is affected by various factors such as the economy and politics. In the AI field, there are attempts to predict the ups and downs of stock prices by studying stock price patterns with various machine learning techniques. This study suggests a way of predicting stock price patterns with a Convolutional Neural Network (CNN), which classifies images by extracting features through convolutional layers; accordingly, it classifies candlestick images generated from stock data in order to predict patterns. The study has two objectives. The first, referred to as Case 1, is to predict patterns from images made from the same day's stock price data. The second, referred to as Case 2, is to predict the next day's stock price patterns from images produced from daily stock price data. In Case 1, two data augmentation methods, random modification and Gaussian noise, are applied to generate more training data, and the generated images are used to fit the model. Given that deep learning requires a large amount of data, this study suggests a data augmentation method for candlestick images, and compares the accuracies obtained with Gaussian noise and with different classification problems. All data in this study were collected through the OpenAPI provided by DaiShin Securities. Case 1 has five labels depending on the pattern: up with up closing, up with down closing, down with up closing, down with down closing, and staying. The Case 1 images are created by removing the last candle (-1 candle), the last two candles (-2 candles), or the last three candles (-3 candles) from 60-minute, 30-minute, 10-minute, and 5-minute candle charts; in a 60-minute chart, each candle carries 60 minutes of information (open, high, low, and close prices). Case 2 has two labels, up and down, and its images were generated from 60-minute, 30-minute, 10-minute, and 5-minute candle charts without removing any candle. Considering the nature of stock data, moving the candles in the images is suggested instead of existing data augmentation techniques; how far the candles are moved is defined as the modified value. The average difference in closing prices between candles was 0.0029, so modified values of 0.003, 0.002, 0.001, and 0.00025 are used, and the number of images was doubled after augmentation. For Gaussian noise, the mean was 0 and the variance 0.01. For both Case 1 and Case 2, the model is based on VGGNet-16, which has 16 layers. As a result, the 10-minute -1 candle setting showed the best accuracy among the 60-minute, 30-minute, 10-minute, and 5-minute charts, so 10-minute images were used for the rest of the Case 1 experiments. The -3 candle images were selected for data augmentation and for applying Gaussian noise: the 10-minute -3 candle setting yielded 79.72% accuracy, the images with a 0.00025 modified value and 100% of candles changed reached 79.92%, and applying Gaussian noise raised the accuracy to 80.98%. According to the outcomes of Case 2, 60-minute candle charts could predict the next day's patterns with 82.60% accuracy. In sum, this study is expected to contribute to further research on predicting stock price patterns from images, and it provides a possible method of data augmentation for stock data.
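
A minimal sketch of the two augmentation ideas described in this abstract: shifting candles by a small modified value and adding zero-mean Gaussian noise with variance 0.01. The array layout (rows = candles, columns = open/high/low/close) and the fraction of candles changed are illustrative assumptions.

```python
# Sketch of candle-shift and Gaussian-noise augmentation for candlestick data.
# Array layout and the share of candles changed are assumptions for illustration.
import numpy as np

def shift_candles(ohlc, modified_value=0.00025, change_ratio=1.0, rng=None):
    """Move a random subset of candles up or down by the modified value."""
    rng = rng or np.random.default_rng()
    out = ohlc.copy()
    n = len(out)
    idx = rng.choice(n, size=int(n * change_ratio), replace=False)
    signs = rng.choice([-1.0, 1.0], size=len(idx))
    out[idx] += (signs * modified_value)[:, None]   # shift open/high/low/close together
    return out

def add_gaussian_noise(ohlc, variance=0.01, rng=None):
    """Add zero-mean Gaussian noise to every price in the chart."""
    rng = rng or np.random.default_rng()
    return ohlc + rng.normal(0.0, np.sqrt(variance), size=ohlc.shape)

candles = np.cumsum(np.random.default_rng(5).normal(size=(60, 4)), axis=0) + 100
augmented = add_gaussian_noise(shift_candles(candles), rng=np.random.default_rng(6))
print(augmented.shape)
```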
