• Title/Summary/Keyword: Voice Training

Search Result 179

Voice Recognition Performance Improvement using a convergence of Voice Energy Distribution Process and Parameter (음성 에너지 분포 처리와 에너지 파라미터를 융합한 음성 인식 성능 향상)

  • Oh, Sang-Yeob
    • Journal of Digital Convergence / v.13 no.10 / pp.313-318 / 2015
  • Traditional speech enhancement methods distort the sound spectrum when the remaining noise is estimated inaccurately, which lowers speech recognition performance. In this paper, we propose a speech detection method that combines sound energy distribution processing with sound energy parameters. The proposed method maximizes voice energy so that the features are less affected by noise. In addition, among the feature parameters of the speech signal, the log energy features of intervals with smaller log energy values are made similar, relative to regions with large energy, to the log energy features of the noisy voice signal, which reduces the mismatch between the training and recognition environments. Recognition experiments confirmed improved recognition performance compared to the conventional method. In a car noise environment, the pause hit rate showed an accuracy of 97.1% and 97.3% in the low-SNR region (0 dB and 5 dB) and 98.3% and 98.6% in the high-SNR region (10 dB and 15 dB).
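
As a rough illustration of detecting speech from log energy, the following minimal sketch marks frames whose log energy lies above a threshold placed between the quietest and loudest frame. It is not the method proposed in the paper: the frame length, hop size, `margin` parameter, and the synthetic test signal are all assumptions chosen for the example.

```python
import math
import random


def frame_log_energies(samples, frame_len=400, hop=160, floor=1e-10):
    """Return the log energy of every full frame of `samples`."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # The small floor avoids log(0) on all-zero frames.
        energies.append(math.log(sum(x * x for x in frame) + floor))
    return energies


def detect_speech(samples, margin=0.5, **frame_args):
    """Flag frames whose log energy exceeds a threshold between the
    quietest and loudest frame (margin=0.5 is the midpoint)."""
    energies = frame_log_energies(samples, **frame_args)
    if not energies:
        return []
    low, high = min(energies), max(energies)
    threshold = low + margin * (high - low)
    return [e > threshold for e in energies]


def weak_noise(rng, n):
    """Hypothetical background noise: zero-mean Gaussian, small amplitude."""
    return [rng.gauss(0.0, 0.01) for _ in range(n)]


# Synthetic signal at an assumed 16 kHz rate:
# 0.5 s of weak noise, 0.5 s of a 440 Hz tone, 0.5 s of weak noise.
rng = random.Random(0)
fs = 16000
tone = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs // 2)]
signal = weak_noise(rng, fs // 2) + tone + weak_noise(rng, fs // 2)
mask = detect_speech(signal)
```

A fixed midpoint threshold like this only works when the signal contains both silence and speech and the noise level is low; the abstracts in this listing address exactly the low-SNR cases where such a simple rule breaks down.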

Voice Activity Detection in Noisy Environment using Speech Energy Maximization and Silence Feature Normalization (음성 에너지 최대화와 묵음 특징 정규화를 이용한 잡음 환경에 강인한 음성 검출)

  • Ahn, Chan-Shik;Choi, Ki-Ho
    • Journal of Digital Convergence / v.11 no.6 / pp.169-174 / 2013
  • In speech recognition, performance degrades because of the difference between the model training and recognition environments. Silence feature normalization is used as a way to reduce this environmental mismatch. At low signal-to-noise ratios, however, existing silence feature normalization raises the energy level of the silence interval, so the accuracy of voice/non-voice classification falls and recognition performance degrades. This paper proposes a speech detection method that is robust in noisy environments, using silence feature normalization and voice energy maximization. At high signal-to-noise ratios, the proposed method maximizes voice energy so that the features are less affected by noise. At low signal-to-noise ratios, it improves the cepstral feature distribution of voice/non-voice characteristics and thereby the recognition performance. In recognition experiments, performance improved compared to the conventional method.

Channel Compensation for Cepstrum-Based Detection of Laryngeal Diseases (켑스트럼 기반의 후두암 감별을 위한 채널보상)

  • Kim Young Kuk;Kim Su Mi;Kim Hyung Soon;Wang Soo-Geun;Jo Cheol-Woo;Yang Byung-Gon
    • MALSORI / no.50 / pp.111-122 / 2004
  • Automatic detection of laryngeal diseases by voice is attractive because of its non-intrusive nature. A cepstrum-based approach to detecting laryngeal cancer shows reliable performance even when the periodicity of voice signals is severely lost, but it has the drawback that it is not robust to channel mismatch caused by different microphone characteristics. In this paper, to deal with mismatched training and test microphone conditions, we investigate channel compensation techniques such as Cepstral Mean Subtraction (CMS) and Pole Filtered CMS (PFCMS). According to our experiments, PFCMS yields better performance than CMS. By using PFCMS, we obtained 12% and 40% error reduction over baseline and CMS, respectively.
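
To make the idea behind CMS concrete: a fixed linear channel (such as a microphone) appears approximately as a constant additive offset in the cepstral domain, so subtracting the per-utterance mean of each cepstral coefficient cancels it. The sketch below shows plain CMS only, on made-up toy values. PFCMS, which estimates the mean from pole-filtered cepstra, is not implemented here, and the function name is an assumption.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean of each cepstral coefficient.

    frames: list of equal-length lists, one list of coefficients per frame.
    """
    n = len(frames)
    means = [sum(column) / n for column in zip(*frames)]
    return [[c - m for c, m in zip(frame, means)] for frame in frames]


# Toy cepstra for one utterance (3 frames, 2 coefficients); values are made up.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]

# A hypothetical channel adds the same offset to every frame.
channel = [0.5, -1.5]
recorded = [[c + h for c, h in zip(frame, channel)] for frame in clean]

# After CMS the clean and recorded utterances yield the same features.
compensated_clean = cepstral_mean_subtraction(clean)
compensated_recorded = cepstral_mean_subtraction(recorded)
```

Because the mean is computed over the whole utterance, CMS also removes any constant component of the speech itself, which is one reason refinements such as PFCMS were explored.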

Performance of GMM and ANN as a Classifier for Pathological Voice

  • Wang, Jianglin;Jo, Cheol-Woo
    • Speech Sciences / v.14 no.1 / pp.151-162 / 2007
  • This study focuses on the classification of pathological voice using a GMM (Gaussian Mixture Model) and compares the results with previous work based on an ANN (Artificial Neural Network). Speech data from normal speakers and patients were collected, then diagnosed and classified into two categories. Six characteristic parameters (Jitter, Shimmer, NHR, SPI, APQ and RAP) were chosen. Classification methods based on the artificial neural network and the Gaussian mixture model were then employed to discriminate the data into normal and pathological speech. The GMM method attained a 98.4% average correct classification rate on training data and a 95.2% average correct classification rate on test data. Different numbers of mixtures (3 to 15) were used in the GMM in order to find an optimal condition for classification. We also compared the average classification rates of GMM, ANN and HMM. The proper number of mixtures in the Gaussian model needs to be investigated in future work.
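
The decision rule of a GMM classifier can be sketched as follows: each class has its own mixture, and a feature vector is assigned to the class whose mixture gives it the highest log-likelihood. This is an illustration under stated assumptions, not the authors' setup: it uses two features instead of six, two mixture components per class, diagonal covariances, and hand-picked parameter values, and it omits training (normally done with EM).

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, components):
    """Log-likelihood of x under a mixture given as (weight, mean, var) tuples.

    Uses the log-sum-exp trick for numerical stability.
    """
    terms = [math.log(w) + log_gaussian(x, m, v) for w, m, v in components]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def classify(x, models):
    """Return the label whose GMM assigns x the highest log-likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(x, models[label]))


# Hypothetical two-feature models (think jitter % and shimmer %).
# All weights, means, and variances are made up for illustration.
models = {
    "normal": [
        (0.6, [0.4, 2.0], [0.04, 0.25]),
        (0.4, [0.6, 3.0], [0.04, 0.25]),
    ],
    "pathological": [
        (0.5, [2.0, 6.0], [0.5, 2.0]),
        (0.5, [3.5, 9.0], [1.0, 4.0]),
    ],
}

label_low = classify([0.5, 2.5], models)
label_high = classify([3.0, 8.0], models)
```

Varying the number of components per class, as the study does with 3 to 15 mixtures, changes only the length of each component list; the decision rule stays the same.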

Use of Pansori for Developing Actor's Aesthetic Voice (배우의 미학적 발성을 위한 판소리의 활용방안)

  • Lee, Ki-Ho
    • The Journal of the Korea Contents Association / v.9 no.12 / pp.181-192 / 2009
  • The purpose of this research is to investigate how pansori's methods of breathing, sound production, and resonance can be used to develop an actor's aesthetic voice. Today's theatre no longer sees the intercultural approach as new or experimental, but as part of a global current. Actors are required to bring a global outlook into their acting. It is not enough, however, for actors to acquire a cosmopolitan sensibility. More importantly, they should be able to integrate their own culture and aesthetics into their performance. Only after acquiring their own cultural identity can they step into intercultural work. It is therefore fundamental for actors to assimilate traditional movement and an aesthetic voice. Traditional Korean voice traits are known to be well preserved in pansori. In this paper, building on well-known theories and practices of Western voice training, pansori's principles and practices are used to develop a new aesthetic voice.

Performance Improvement of Voice Dialing System using Post-Processing (후처리를 이용한 음성 다이얼링 시스템의 성능향상)

  • 김원구
    • The Journal of the Acoustical Society of Korea / v.19 no.5 / pp.9-12 / 2000
  • A voice dialing system recognizes the speaker's command and dials the destination phone number automatically. Such a system is useful for wireless handsets and portable communication devices. In a personal voice dialing system, the HMMs for speech recognition are trained for all commands on owner-selected phrases. Its implementation requires much less memory and computation than a speaker-independent system. Since only two or three training utterances per command are used in this system, it is difficult to estimate the exact state duration distribution that would improve recognition performance. Therefore, a post-processor is presented to improve performance. Experiments using a database collected over the telephone line showed that the proposed post-processor improves the performance of the recognition system.

Diction Problem of Student Singers Based on the Vocal Tract Resonance (성도 공명을 중심으로 한 성악 전공 대학생의 발음법 연구)

  • Kim, Sun-Suk
    • Speech Sciences / v.7 no.4 / pp.59-72 / 2000
  • Vocal tract resonances are of paramount importance to voice sounds. Resonance frequencies determine vowel quality and personal voice timbre. The aim of this study was to create an effective diction program for professional voice users, based on tuning formant frequencies by adjusting the vocal tract shape. Twelve male and eleven female student singers participated in this study. The subjects repeated five simple vowels /a, e, i, o, u/ in normal speech and in singing. The formant frequencies and singer's formant frequencies of the spoken and sung vowels were measured using CSL and DSP Sona-Graph. Separately, the Plot formants program was used to draw the vowel chart. The results were as follows. (1) Total formant frequencies of female singers were 11% higher than those of male singers in singing. (2) The F1 and F3 of sung vowels increased compared to the F1 and F3 of spoken vowels, whereas the F2 of sung vowels decreased compared to the F2 of spoken vowels. (3) The posterior vowel /u/ was moved anteriorly. This phenomenon seemed to be due to head voice singing training. (4) Singer's formant frequencies in student singers varied by voice part: 2560 Hz for baritone, 2760 Hz for tenor, 2821 Hz for mezzo-soprano and 3420 Hz for soprano.

Design and Implementation of Procedural Self-Instructional Contents and Application on Smart Glasses

  • Yoon, Hyoseok;Kim, Seong Beom;Kim, Nahyun
    • Journal of Multimedia Information System / v.8 no.4 / pp.243-250 / 2021
  • Instructional content is used to demonstrate a technical process, teaching and walking through the procedures needed to carry out a task. This type of informational content is widely used for teaching and lectures in the form of tutorial videos and training videos. Because it remains uncertain what the killer application for novel wearables will be, we propose a self-instruction training application on smart glasses that uses already-available instruction videos as well as public open data in creative ways. We design and implement a prototype application that helps users train while wearing smart glasses, designed specifically for two concrete, hand-constrained use cases in which the user's hands need to be free to operate. To increase the efficiency and feasibility of self-instruction training, we contribute to the development of a wearable killer application by integrating a voice-based user interface using a speech recognizer, public open data APIs, and a timestamp-based procedural content navigation structure into our proof-of-concept application.
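
One way to picture timestamp-based procedural navigation driven by voice commands is a sorted list of step start times plus a lookup from the current playback position. The sketch below is hypothetical: the class name, the command vocabulary ("next", "previous", "repeat"), and the timestamps are assumptions, not details of the authors' application.

```python
from bisect import bisect_right


class StepNavigator:
    """Map recognized voice commands to seek targets in an instruction video."""

    def __init__(self, step_starts):
        # Start time of each procedural step, in seconds.
        self.step_starts = sorted(step_starts)

    def current_step(self, position):
        """Index of the step that contains playback position `position`."""
        return max(0, bisect_right(self.step_starts, position) - 1)

    def seek(self, command, position):
        """Return the timestamp to jump to for a recognized command."""
        i = self.current_step(position)
        if command == "next":
            i = min(i + 1, len(self.step_starts) - 1)
        elif command == "previous":
            i = max(i - 1, 0)
        elif command != "repeat":
            raise ValueError(f"unknown command: {command}")
        return self.step_starts[i]


# Hypothetical video with four steps.
nav = StepNavigator([0.0, 12.5, 40.0, 75.0])
target = nav.seek("next", 13.0)  # playback is inside step 1, so jump to step 2
```

Keeping the step boundaries as plain timestamps means existing videos can be reused without re-editing; only the list of start times has to be authored.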

Design of Speech-Training System for Voice Disorders Using Visual Effect (발성장애아동을 위한 발성훈련시스템 설계)

  • 정은순;김봉완;이용주
    • Proceedings of the Korean Information Science Society Conference / 2000.04b / pp.696-698 / 2000
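  • This study aims to develop a tool for voice therapy and education for children with voice disorders, using visual effects. Accordingly, the system was designed to support repetitive learning suited to the characteristics of voice disorders in children with special needs. In addition, a GUI and game elements were incorporated to stimulate children's interest in vocalization and to enable self-directed learning.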

Application and Practice of Estill Vocal Training (EVT) Through Theatrical and Musical Analysis of Musical Songs (뮤지컬 노래의 극과 음악 분석을 통한 조 에스틸 보컬 기법(EVT)의 적용과 실제)

  • Lee, Eun-Hye;Kim, Yu-Jeong
    • Journal of Korea Entertainment Industry Association / v.14 no.8 / pp.91-102 / 2020
  • The purpose of this study is to analyze songs from musicals from an academic perspective and to apply vocal techniques that can express the songs in depth and in three dimensions. Singing a song from a musical cannot be completed with the musical part alone; rather, it should be accompanied by analysis of various aspects such as the emotional state of the scenes and the characters. To this end, this study performed a multi-dimensional analysis covering theatrical structure, lyrics, musical structure, and dynamics. In addition, the study explored and applied Estill Voice Training (EVT) so that actors can best express songs with the emotions of the drama and the music. EVT categorizes voice into six tones: speech, sob/cry, falsetto, twang, opera, and belting. In this study, in addition to these six sounds, the positions of the vocal cords and larynx were also applied to seek ways to express songs effectively, using "Gar Nichts" from the musical "Elisabeth" as a case study. "Gar Nichts" is a song sung by the protagonist Elisabeth, which expresses the self and the conflict at the peak of pain. Musically, this song requires various sound and voice-changing techniques to cover the range of "G#3-Gb5." As a result, it was confirmed that in order to embody the emotions of the characters and the songs in depth, the analysis of scenes and characters and various singing techniques need to be applied in harmony.