• Title/Summary/Keyword: voice training


Singing Voice Synthesis Using HMM Based TTS and MusicXML (HMM 기반 TTS와 MusicXML을 이용한 노래음 합성)

  • Khan, Najeeb Ullah; Lee, Jung-Chul
    • Journal of the Korea Society of Computer and Information, v.20 no.5, pp.53-63, 2015
  • Singing voice synthesis is the computer generation of a song from its lyrics and musical notes. Hidden Markov models (HMMs) have proven to be the models of choice for text-to-speech synthesis. HMMs have also been used in singing voice synthesis research; however, training HMMs for singing voice synthesis requires a huge database. Moreover, commercially available singing voice synthesis systems use piano-roll notation; to be suitable for singing-learning applications, they need to adopt easy-to-read standard music notation. To overcome these problems, we train context-dependent HMMs on a speech database and use them for singing voice synthesis. Pitch and duration control methods were devised to modify the parameters of the HMMs trained on speech so that they can serve as synthesis units for the singing voice. This work describes a singing voice synthesis system that uses a MusicXML-based music score editor as the front-end interface for entering the notes and lyrics to be synthesized, and an HMM-based text-to-speech synthesis system as the back-end synthesizer. A perceptual test shows the feasibility of the proposed system.
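
The abstract says the pitch and duration of speech-trained HMMs are controlled from the score. A plausible first step, sketched here under stated assumptions and not taken from the paper, is converting each MusicXML note's `<step>`, `<alter>`, `<octave>` and `<duration>` into a target F0 and a target length in seconds. The function names, the A4 = 440 Hz equal-tempered tuning, and reading the tempo as quarter notes per minute are all assumptions.

```python
import math

# Semitone offsets of the natural pitch steps within an octave (C = 0).
STEP_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_midi(step: str, octave: int, alter: int = 0) -> int:
    """Map a MusicXML-style pitch (step, alter, octave) to a MIDI note number.

    Uses the common convention in which C4 (middle C) is MIDI 60.
    """
    return 12 * (octave + 1) + STEP_SEMITONES[step] + alter


def midi_to_f0(midi: int, a4_hz: float = 440.0) -> float:
    """Equal-tempered fundamental frequency in Hz, tuned so that A4 = a4_hz."""
    return a4_hz * 2.0 ** ((midi - 69) / 12.0)


def note_duration_seconds(duration: int, divisions: int, tempo_bpm: float) -> float:
    """Convert a MusicXML <duration> (counted in <divisions> per quarter note) to seconds.

    Assumes tempo_bpm counts quarter notes per minute.
    """
    quarters = duration / divisions
    return quarters * 60.0 / tempo_bpm


def target_log_f0(step: str, octave: int, alter: int = 0) -> float:
    """Natural-log F0 target, the domain in which HMM-based TTS typically models pitch."""
    return math.log(midi_to_f0(note_to_midi(step, octave, alter)))
```

For example, `midi_to_f0(note_to_midi("A", 4))` gives 440.0 Hz, and a note with `<duration>2</duration>` under `<divisions>1</divisions>` at 120 quarter notes per minute lasts 1.0 s. How such targets are then imposed on the speech-trained model states is the paper's contribution and is not reproduced here.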

A comparative study on accuracy and fatigue in hands-only CPR and traditional CPR by voice instruction (음성지시에 따른 전통적 심폐소생술과 가슴압박소생술시 흉부압박 정확도와 피로도 비교)

  • Yoon, Byoung-Gil; Baek, Mi-Lye
    • The Korean Journal of Emergency Medical Services, v.16 no.2, pp.31-41, 2012
  • Purpose: The purpose of this study is to analyze the chest compression accuracy and the fatigue felt by lay persons receiving CPR training when they perform hands-only CPR (HOCPR) and traditional CPR (TCPR). The CPR performance data will provide criteria for dispatcher guidelines for the general public. Methods: Over a 2-minute period, HOCPR was performed by 51 subjects and TCPR by 48 subjects. Chest compression accuracy was measured according to the 2010 AHA guidelines, and subjective fatigue before and after the experiment was measured with a self-administered questionnaire. Results: There were no significant differences between the two groups in average depth, chest compression depth, or chest compression location. However, there were significant differences between the two groups in the accuracy of average speed and chest compression speed. Subjective fatigue showed no significant difference. Conclusion: The HOCPR group showed more accurate compression speed and a lower fatigue level. These results suggest that HOCPR would be more effective for training lay persons in voice-instructed CPR.
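
To make "accuracy based on the 2010 AHA guideline" concrete, the sketch below scores one recorded session against the adult thresholds in those guidelines (depth of at least 5 cm, rate of at least 100 compressions per minute). The record format, field names, and the summary metrics are assumptions for illustration; the manikin or scoring device used in the study may compute accuracy differently.

```python
from dataclasses import dataclass
from typing import List

# Assumed adult thresholds (2010 AHA Guidelines): depth of at least 5 cm and a
# rate of at least 100 compressions per minute.
MIN_DEPTH_CM = 5.0
MIN_RATE_PER_MIN = 100.0


@dataclass
class Compression:
    time_s: float    # hypothetical time stamp of the compression peak, in seconds
    depth_cm: float  # measured compression depth, in centimetres


def accuracy_summary(compressions: List[Compression]) -> dict:
    """Summarize depth and rate accuracy for one CPR session."""
    n = len(compressions)
    if n < 2:
        raise ValueError("need at least two compressions to estimate a rate")
    elapsed = compressions[-1].time_s - compressions[0].time_s
    if elapsed <= 0:
        raise ValueError("time stamps must increase")
    mean_depth = sum(c.depth_cm for c in compressions) / n
    # Average rate from the elapsed time between the first and last compression.
    mean_rate = 60.0 * (n - 1) / elapsed
    depth_ok = sum(1 for c in compressions if c.depth_cm >= MIN_DEPTH_CM) / n
    return {
        "mean_depth_cm": mean_depth,
        "mean_rate_per_min": mean_rate,
        "fraction_adequate_depth": depth_ok,
        "rate_adequate": mean_rate >= MIN_RATE_PER_MIN,
    }
```

Four compressions 0.5 s apart correspond to a rate of 120 per minute, which meets the rate threshold; any compression shallower than 5 cm lowers the adequate-depth fraction.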

A Case Study on Vocal Aerobic Treatment Voice Therapy Development and Application for Classical Singers (성악가를 위한 VAT 음성치료 개발 및 적용 사례연구)

  • Yoo, Jae-Yeon; Lee, Ha-Na
    • 재활복지 (Rehabilitation Welfare), v.22 no.1, pp.157-168, 2018
  • The purpose of this study is to investigate the effect of Vocal Aerobic Treatment, based on semi-closed vocal training, on improving a soprano's voice. The subject was a soprano who complained of voice problems caused by vocal cord nodules. Pre- and post-treatment acoustic evaluations and subjective voice evaluations were conducted and compared; Vocal Aerobic Treatment was carried out twice a week for a total of 32 sessions. In the acoustic evaluation, MDVP (Multi-Dimensional Voice Program) and VRP (voice range profile) were used to evaluate pitch, voice quality, and voice range; in the subjective voice evaluation, the SVHI (Singing Voice Handicap Index) was used to assess voice satisfaction. In the pitch evaluation, the soprano maintained an appropriate F0. In the voice quality evaluation, jitter, shimmer, and noise-to-harmonic ratio values decreased compared with the pre-treatment values. In the voice range evaluation, the range broadened from 30 to 35 semitones. In the subjective voice evaluation, the total survey score divided by the number of items decreased from 3.6 to 0.6, and the soprano herself reported having only a minor voice problem. These results suggest that Vocal Aerobic Treatment is useful for improving vocalists' voices. However, because this is a case study of the treatment's effect on a single soprano, further research covering many other vocalists is necessary. Follow-up studies are also needed on voice management and voice treatment programs, not only for vocalists but also for voice users in many other professions.
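
The voice-quality and range results above rest on a few simple measures. The sketch below computes local (cycle-to-cycle) jitter and shimmer percentages and the width of a pitch range in semitones. MDVP's jitter percent and shimmer percent are closely related to these local measures, but MDVP's own period extraction and exact definitions are not reproduced here, so treat this as an illustration under that assumption.

```python
import math
from typing import Sequence


def local_perturbation_percent(values: Sequence[float]) -> float:
    """Mean absolute difference between consecutive cycles, as % of the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


def jitter_percent(periods_s: Sequence[float]) -> float:
    """Cycle-to-cycle variation of glottal periods (in seconds)."""
    return local_perturbation_percent(periods_s)


def shimmer_percent(peak_amplitudes: Sequence[float]) -> float:
    """Cycle-to-cycle variation of per-cycle peak amplitudes."""
    return local_perturbation_percent(peak_amplitudes)


def range_in_semitones(f_low_hz: float, f_high_hz: float) -> float:
    """Width of a pitch range in semitones: 12 * log2(f_high / f_low)."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)
```

On this scale, a 30-semitone range means the highest note is about 2^(30/12) ≈ 5.66 times the frequency of the lowest, and a 35-semitone range about 7.55 times.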

Development of the Guidelines on the VTS English Competency Test

  • 최승희; 장은규
    • Proceedings of the Korean Institute of Navigation and Port Research Conference, 2022.06a, pp.249-250, 2022
  • The purpose of this paper is to propose the development of Guidelines on a VTS English language competency test in accordance with IALA Guideline 1132 (VTS Voice Communications and Phraseology). Developing a VTS-specific language testing system with explicit evaluation criteria is becoming increasingly critical as a foundation for improving VTS operators' (VTSOs') communication capabilities throughout their career lifecycle, in terms of training, accreditation, and revalidation. To facilitate discussion, the paper makes a range of suggestions to be considered in developing the Guidelines on the VTS English competency test.

Audiobook Text Shaping for Synesthesia Voice Training - Focusing on Paralanguages - (오디오북 텍스트 형상화를 위한 공감각적 음성 훈련 연구 - 유사언어를 활용하여 -)

  • Cho, Ye-Shin; Choi, Jae-Oh
    • Journal of Korea Entertainment Industry Association, v.13 no.8, pp.167-180, 2019
  • The purpose of this study is to examine the results of synesthetic voice training using paralanguage for shaping audiobook text. The training used a work by Tolstoy as the audiobook text, together with paralinguistic features: tone, pause, speed, intonation, accent, and emotional expression. Ten visually impaired trainees at H library were selected as participants for qualitative research. With respect to the research questions raised in this study, the results are as follows. First, synesthetic training, in which two or more of the five senses work simultaneously during voice training for audiobook text shaping, produced results by visualizing the original purpose, meaning, and background of the text. Second, the use of paralanguage was helpful throughout the process of expressing the meaning of sentences and dialogue in audiobook text shaping. In addition, although there were some differences among the participants, they shared the view that tone, pause, and intonation are important. Third, the visually impaired participants have highly developed sensory perception and memory, which led to rapid acquisition of their lines and acceptance of delivery during training. In addition, the teacher's friendly behavior was a key mediator in the training process.

ETRI small-sized dialog style TTS system (ETRI 소용량 대화체 음성합성시스템)

  • Kim, Jong-Jin; Kim, Jeong-Se; Kim, Sang-Hun; Park, Jun; Lee, Yun-Keun; Hahn, Min-Soo
    • Proceedings of the KSPS conference, 2007.05a, pp.217-220, 2007
  • This study outlines a small-sized, dialog-style ETRI Korean TTS system that applies HMM-based speech synthesis techniques. To build the VoiceFont, 500 dialog-style sentences were used to train the HMMs. Context information about phonemes, syllables, words, phrases, and sentences was extracted fully automatically to build context-dependent HMMs. To train the acoustic model, acoustic features such as mel-cepstra, and log F0 with its delta and delta-delta, were used. The VoiceFont built through training is 0.93 Mb. The developed HMM-based TTS system was installed on an ARM720T processor running at 60 MHz. To reduce computation time, the MLSA inverse filtering module was implemented in assembly language. The fully implemented system runs 1.73 times faster than real time.
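
The delta and delta-delta features mentioned above are regression coefficients computed from neighbouring frames of a static trajectory such as log F0. The sketch below uses the three-point windows [-0.5, 0, 0.5] and [1, -2, 1], a common choice in HMM-based synthesis; the exact windows used by the ETRI system are not stated, and repeating the first and last frames at the edges is an assumption. Unvoiced frames, which HMM-based systems usually handle separately for log F0, are ignored here.

```python
from typing import List, Sequence, Tuple

DELTA_WIN = [-0.5, 0.0, 0.5]   # first-order regression window
DELTA2_WIN = [1.0, -2.0, 1.0]  # second-order regression window


def apply_window(frames: Sequence[float], window: Sequence[float]) -> List[float]:
    """Apply a centred regression window along time, repeating edge frames."""
    T = len(frames)
    half = len(window) // 2
    out = []
    for t in range(T):
        acc = 0.0
        for k, w in enumerate(window):
            idx = min(max(t + k - half, 0), T - 1)  # clamp at the sequence edges
            acc += w * frames[idx]
        out.append(acc)
    return out


def static_delta_deltadelta(frames: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (static, delta, delta-delta) for each frame of a 1-D trajectory."""
    return list(zip(frames, apply_window(frames, DELTA_WIN), apply_window(frames, DELTA2_WIN)))
```

For a multi-dimensional feature such as a mel-cepstral vector, the same windows would be applied to each dimension independently.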

Voice Activity Detection Based on Real-Time Discriminative Weight Training (실시간 변별적 가중치 학습에 기반한 음성 검출기)

  • Chang, Sang-Ick; Jo, Q-Haing; Chang, Joon-Hyuk
    • Journal of the Institute of Electronics Engineers of Korea SP, v.45 no.4, pp.100-106, 2008
  • In this paper, we apply discriminative weight training employing the power spectral flatness measure (PSFM) to statistical model-based voice activity detection (VAD) in various noise environments. In our approach, the VAD decision rule is expressed as the geometric mean of optimally weighted likelihood ratio tests (LRTs) based on the minimum classification error (MCE) method; it differs from previous work in that different weights are assigned to each frequency bin and noise environment depending on the PSFM. Experimental results show that the proposed approach is effective for statistical model-based VAD using the LRT.
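
The decision rule described above can be sketched as follows. The per-bin log likelihood ratio uses the complex Gaussian speech and noise models commonly assumed in statistical model-based VAD, and the weighted geometric mean becomes a weighted average in the log domain. In the paper the weights are trained with MCE and depend on the PSFM and the noise environment; here they are simply supplied by the caller, and estimating the a priori SNR is left out. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def bin_log_lr(gamma: float, xi: float) -> float:
    """log LR (speech vs. noise) of one frequency bin under complex Gaussian models.

    gamma: a posteriori SNR (observed power / noise power); xi: a priori SNR.
    """
    return gamma * xi / (1.0 + xi) - math.log1p(xi)


def spectral_flatness(power: Sequence[float]) -> float:
    """PSFM: geometric mean / arithmetic mean of a strictly positive power spectrum."""
    geo = math.exp(sum(math.log(p) for p in power) / len(power))
    return geo / (sum(power) / len(power))


def weighted_vad(gammas: Sequence[float], xis: Sequence[float],
                 weights: Sequence[float], threshold: float = 0.0) -> Tuple[bool, float]:
    """Speech/non-speech decision from the weighted geometric mean of per-bin LRs.

    In the log domain this is (1/K) * sum_k w_k * log LR_k, compared with a threshold.
    """
    K = len(gammas)
    score = sum(w * bin_log_lr(g, x) for w, g, x in zip(weights, gammas, xis)) / K
    return score > threshold, score
```

A flat (noise-like) spectrum gives a PSFM close to 1 and a peaky, harmonic spectrum a value closer to 0, which is why the measure can serve as a cue for choosing weight sets.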

Voice-Phishing Detection Algorithm Based on Minimum Classification Error Technique (최소 분류 오차 기법을 이용한 보이스 피싱 검출 알고리즘)

  • Lee, Kye-Hwan; Chang, Joon-Hyuk
    • Journal of the Institute of Electronics Engineers of Korea SP, v.46 no.3, pp.138-142, 2009
  • We propose an effective voice-phishing detection algorithm based on discriminative weight training. Voice phishing is detected using a Gaussian mixture model (GMM) incorporating the minimum classification error (MCE) technique. Specifically, the MCE technique operates on log-likelihoods computed from the decoding parameters of the SMV (Selectable Mode Vocoder), which are extracted directly during the decoding process in the mobile phone. Experimental results show that the proposed approach is effective for voice-phishing detection.
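
A GMM classifier with an MCE criterion can be sketched as below, assuming a two-class setup (phishing vs. normal calls) with diagonal-covariance GMMs. The loss is the usual sigmoid of the misclassification measure, the competing class's log-likelihood minus the correct class's. The feature vectors stand in for the SMV decoding parameters, the gradient-based MCE update of the model weights is omitted, and every name here is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A diagonal-covariance GMM: (mixture weights, component means, component variances).
GMM = Tuple[List[float], List[List[float]], List[List[float]]]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x: Sequence[float], gmm: GMM) -> float:
    """log sum_m w_m N(x; mu_m, var_m), computed stably with log-sum-exp."""
    weights, means, variances = gmm
    terms = [math.log(w) + log_gauss_diag(x, mu, var)
             for w, mu, var in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def mce_loss(x: Sequence[float], correct: GMM, competing: GMM, alpha: float = 1.0) -> float:
    """Smoothed 0/1 loss: sigmoid of (competing log-likelihood - correct log-likelihood)."""
    d = gmm_log_likelihood(x, competing) - gmm_log_likelihood(x, correct)
    return sigmoid(alpha * d)


def is_phishing(x: Sequence[float], phishing: GMM, normal: GMM, threshold: float = 0.0) -> bool:
    """Flag a call when the log-likelihood ratio favours the phishing model."""
    return gmm_log_likelihood(x, phishing) - gmm_log_likelihood(x, normal) > threshold
```

The loss is below 0.5 when the sample is classified correctly and approaches 0 as the margin grows, which is what makes it differentiable with respect to the model parameters during MCE training.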

A Survey on Participants' Satisfaction of Vocal Hygiene Education: A Preliminary Study (음성위생교육 만족도에 대한 예비 연구)

  • Yoon, Ji Hye; Kim, Sun Woo
    • Phonetics and Speech Sciences, v.5 no.3, pp.83-93, 2013
  • Vocal hygiene education is an indirect training approach that improves vocal function by teaching all facets of optimal vocal health. Participant satisfaction may be an important component of this indirect therapy for voice disorders. The authors aimed to investigate satisfaction with vocal hygiene education in 51 patients with voice problems. We classified the participants' voice disorders into three etiological categories (subgroups): organic, neurogenic, and functional. The survey consisted of three parts: 1) the conditions of vocal hygiene education, 2) the degree of satisfaction with the current education, and 3) requests for future education. Participants responded to each item on a five-point Likert scale (1 = not at all, 5 = extremely). They also wrote personal comments on possible improvements. Participants rated the vocal hygiene education offered by their speech-language pathologists between 3 and 4. In particular, they were highly satisfied with the specific and comprehensible explanations and instructions given by their speech-language pathologists, but less satisfied with the fees for the therapy sessions. Vocal hygiene education is offered individually in a clinical setting. Our results support the notion that vocal hygiene education can be an integral part of the treatment of voice problems in most cases.

Creation of a Voice Recognition-Based English Aided Learning Platform

  • Hui Xu
    • Journal of Information Processing Systems, v.20 no.4, pp.491-500, 2024
  • To address the poor quality of information input in online spoken-English teaching, this study builds an English teaching assistance model based on the dynamic time warping (DTW) recognition algorithm and automatic speech recognition technology. To improve the algorithm's efficiency, the study modifies the time-domain properties of the speech signal during preprocessing, which improves the algorithm's performance in terms of computational effort and storage space. Finally, a simulation experiment is used to evaluate the model's efficacy. The findings show that the revised DTW model achieves recognition rates above 95% for all phonetic symbols and ranks first for voiced consonant recognition, with rates of 98.5%, 98.8%, and 98.7% in the three tests, respectively. The enhanced DTW model is also more efficient and requires less time for training and testing. In the KS value analysis, the DTW model has the highest KS value among the models analyzed, 0.63, and it also shows the lowest curve position among the comparative models for both test functions. These results indicate that the upgraded DTW model has superior voice recognition capabilities, which could significantly improve online English education and lead to better teaching outcomes.
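
The recognizer above is built on dynamic time warping. A minimal template-matching sketch is shown below: DTW accumulates the cheapest alignment cost between a query and each stored template, and the label with the smallest distance wins. Scalar frames and absolute difference are simplifications (real systems compare feature vectors such as MFCCs), and the optional Sakoe-Chiba-style band is one common way to cut computation; the paper's own time-domain modification is not specified, so the band is not presented as the authors' method.

```python
import math
from typing import Callable, Dict, Optional, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float],
                 dist: Callable[[float, float], float] = lambda x, y: abs(x - y),
                 band: Optional[int] = None) -> float:
    """Accumulated DTW cost between sequences a and b."""
    n, m = len(a), len(b)
    # Widen the band if needed so that a path from (1, 1) to (n, m) always exists.
    w = max(n, m) if band is None else max(band, abs(n - m))
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = dist(a[i - 1], b[j - 1])
            # Best of insertion, deletion, and match steps.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def recognize(query: Sequence[float], templates: Dict[str, Sequence[float]],
              band: Optional[int] = None) -> str:
    """Return the template label with the smallest DTW distance to the query."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label], band=band))
```

For instance, `[1, 2, 3]` and `[1, 2, 2, 3]` have a DTW distance of 0 because the repeated frame is absorbed by the warping path, which is exactly the tolerance to speaking-rate differences that makes DTW suitable for matching spoken phonetic symbols.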