• Title/Summary/Keyword: Voice training


Accurate Speech Detection based on Sub-band Selection for Robust Keyword Recognition (강인한 핵심어 인식을 위해 유용한 주파수 대역을 이용한 음성 검출기)

  • Ji Mikyong;Kim Hoirin
    • Proceedings of the KSPS conference / 2002.11a / pp.183-186 / 2002
  • Speech detection is one of the important problems in real-time speech recognition, and accurate detection of speech boundaries is crucial to the performance of a speech recognizer. In this paper, we propose a speech detector based on Mel-band selection through training. To demonstrate the merit of the proposed algorithm, we compare it with a conventional one, the so-called EPD-VAA (EndPoint Detector based on Voice Activity Detection). The proposed speech detector is trained to extract keyword speech better than other speech. EPD-VAA usually works well at high SNR but no longer works well at low SNR. The proposed algorithm, in contrast, pre-selects useful bands through keyword training and decides the speech boundary according to the energy level of the previously selected sub-bands. The experimental results show that the proposed algorithm outperforms EPD-VAA.

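The boundary decision described in this abstract (energy measured only in pre-selected sub-bands) can be illustrated with a minimal sketch. It is not the authors' implementation: the band indices, the threshold, the `min_run` smoothing rule, and the function name `detect_speech_boundaries` are all assumptions made for illustration, and the training step that selects the bands is not shown.

```python
from typing import List, Optional, Sequence, Tuple


def detect_speech_boundaries(
    band_energies: Sequence[Sequence[float]],
    selected_bands: Sequence[int],
    threshold: float,
    min_run: int = 3,
) -> Optional[Tuple[int, int]]:
    """Return (first_frame, last_frame) of detected speech, or None.

    band_energies[t][b] is the energy of sub-band b in frame t.
    selected_bands, threshold and min_run are illustrative assumptions,
    not values taken from the paper.
    """
    if not selected_bands:
        raise ValueError("selected_bands must not be empty")

    # Per-frame decision: mean energy over the pre-selected bands only.
    active = [
        sum(frame[b] for b in selected_bands) / len(selected_bands) > threshold
        for frame in band_energies
    ]

    def first_run(flags: List[bool]) -> Optional[int]:
        # Start index of the first run of min_run consecutive active frames.
        run = 0
        for i, flag in enumerate(flags):
            run = run + 1 if flag else 0
            if run == min_run:
                return i - min_run + 1
        return None

    start = first_run(active)
    if start is None:
        return None
    # Search the reversed flags for the last run, then map back to a frame index.
    rev_start = first_run(active[::-1])
    end = len(active) - 1 - rev_start
    return start, end
```

Restricting the decision to bands that carry keyword energy means that stationary noise concentrated in other bands does not trigger the detector, which is the intuition behind the low-SNR advantage claimed in the abstract.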

A study on speech training aids for Deafs (청각장애자용 발음훈련기기 개발에 관한 연구)

  • Ahn, Sang-Pil;Lee, Jae-Hyuk;Yoon, Tae-Sung;Park, Sang-Hui
    • Proceedings of the KIEE Conference / 1990.07a / pp.47-50 / 1990
  • Deaf people cannot speak as clearly as hearing people because they lack auditory feedback on their own pronunciation; speech training is therefore required. In this study, fundamental frequency, intensity, formant frequencies, a vocal tract graphic, and the vocal tract area function, all extracted from the speech signal, are used as feature parameters. An AR model, whose coefficients are extracted using inverse filtering, is used as the speech generation model. To connect the vocal tract graphic with the speech parameters, articulation distances and articulation distance functions in 15 selected intervals are determined from the extracted vocal tract areas and formant frequencies.


A Proposal on the Development of Chemical-Biological-Radiological-Nuclear-Explosive (CBRNE) Emergency Medical Training Program for Fire Officers (소방공무원의 화생방테러 응급의료훈련 교육과목 개설에 대한 제언)

  • Kim, Jee-Hee
    • Fire Science and Engineering / v.21 no.4 / pp.99-104 / 2007
  • Recently, keeping pace with globalization, many international conferences and athletic events have been held in Korea. After the 9/11 terrorist attacks in New York in 2001, the Korean government dispatched the Zaytun Division to Iraq, which has also raised concerns that Korea should prepare as soon as possible to protect itself against chemical, biological, radiological, nuclear, and explosive (CBRNE) terrorism. It is important to develop CBRNE emergency medical training for fire officers in Korea, so I propose a curriculum.

Long Term Average Spectrum Characteristics of Speaking Voice of Western Operatic Singers (Long Term Average Spectrum을 이용한 성악가들의 Speaking Voice 분석)

  • Lee, Kyung-Chul;Hong, Seok-Jin;Jin, Sung-Min
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.15 no.2 / pp.122-127 / 2004
  • Background and Objectives: Many studies have described and analyzed the singer's formant, and it has been shown that the epilaryngeal tube in the human airway is responsible for vocal ring, or the singer's formant. A similar phenomenon produced by trained singers in their speech led some authors to examine the speaker's ring. This study was designed to analyze the speaking voice of singers and the speaker's ring. Materials and Methods: Ten tenors, fifteen baritones, fifteen sopranos, and ten mezzo-sopranos attending the department of vocal music at a music college were chosen for this study. Fifteen male and fifteen female untrained normal speakers were chosen as the control group. Each subject was asked to produce a sustained spoken vowel /ah/ for at least five seconds and to read the sentence 'Kaeul'. The sound data were analyzed using the Fast Fourier Transform (FFT)-based power spectrum and the long-term average (LTA) power spectrum, computed with the FFT algorithm of the Computerized Speech Lab (CSL, Kay Elemetrics, Model 4300B, USA). Statistical analysis was performed using the Mann-Whitney test in the Statistical Package for the Social Sciences (SPSS). Results: For the LTA power spectrum of the /ah/ sound, a significant increase was seen in the 2,500-3,500 Hz region (p<0.01) in all four trained singer groups compared with the untrained speaker group, and a significant increase in the 9,000-10,000 Hz region (p<0.01) in the soprano group. Similarly, for the sentence 'Kaeul', there was a significant increase in energy in the 2,500-3,500 Hz region (p<0.01) in the tenor, baritone, and mezzo-soprano groups compared with the untrained speaker group, and a significant increase in all frequency regions (p<0.01) in the soprano group. Conclusions: The LTA power spectrum suggests that trained singers show more energy concentration in the 'singer's formant' region in the speaking voice, and the authors believe this region to be the 'speaker's ring'. Further research is needed on the effect of singing training on the resonance of the speaking voice.

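As a rough illustration of the long-term average spectrum used in this study, the sketch below averages Hann-windowed power spectra over overlapping frames and reports the mean level of a frequency band in dB. The study itself used the Computerized Speech Lab; the frame length, hop size, naive DFT, and function names here are assumptions chosen only to keep the example self-contained.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Power of a Hann-windowed frame at DFT bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed)
        )
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def long_term_average_spectrum(
    signal: Sequence[float], frame_len: int = 64, hop: int = 32
) -> List[float]:
    """Average the per-frame power spectra over the whole recording."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    spectra = [power_spectrum(signal[s:s + frame_len]) for s in starts]
    if not spectra:
        raise ValueError("signal is shorter than one frame")
    return [sum(column) / len(spectra) for column in zip(*spectra)]


def band_level_db(
    ltas: Sequence[float], sample_rate: float, frame_len: int,
    lo_hz: float, hi_hz: float,
) -> float:
    """Mean power (in dB) of the LTAS bins whose centre lies in [lo_hz, hi_hz]."""
    bin_hz = sample_rate / frame_len
    band = [p for k, p in enumerate(ltas) if lo_hz <= k * bin_hz <= hi_hz]
    if not band:
        raise ValueError("no DFT bin falls inside the requested band")
    # Floor the mean power to avoid log10(0) for an empty-energy band.
    return 10 * math.log10(max(sum(band) / len(band), 1e-12))
```

Comparing `band_level_db(ltas, fs, frame_len, 2500, 3500)` between two groups of recordings mirrors the kind of band comparison reported in the Results, although the statistical test itself is not reproduced here.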

Effects of Voice Therapy Using Gliding and Humming in Dysphonic Patients With Glottal Gap (활창과 허밍을 이용한 음성치료가 성문틈 환자의 음성 개선에 미치는 효과)

  • Jung, Dae-Yong;Shim, Mi-Ran;Hwang, Yeon-Shin;Kim, Geun-Jeon;Sun, Dong-Il
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.32 no.2 / pp.81-86 / 2021
  • Background and Objectives: Therapies to treat glottal gap have been reported previously. However, these voice therapies were limited because many techniques focused on only one of breathing, resonance, and phonation. In addition, patients often have difficulty visiting the hospital frequently. 'Gliding and humming' is a vocal training technique that readjusts overall vocal patterns such as breathing, resonance, and phonation, and it can be easily applied in short-term sessions. The purpose of this study is to evaluate the efficacy of voice therapy with 'gliding and humming' for patients with glottal gap during short-term treatment sessions. Materials and Method: Twenty-three patients with glottal gap were selected. Of these, 14 patients had sulcus vocalis and 12 patients had muscle tension dysphonia (MTD). Voice therapy was performed for 1.9 sessions on average. GRBAS, jitter, shimmer, noise-to-harmonic ratio, semitone range, closed quotient_vowel, and maximum phonation time were compared before and after therapy. In addition, changes in glottal gap and MTD severity were evaluated. Results: Statistically significant improvement was observed. MTD improvement was observed only among the patients whose glottal gap improved. The sulcus vocalis group also showed statistically significant improvement. Conclusion: 'Gliding and humming' was effective for patients with glottal gap and sulcus vocalis. In addition, among patients who have both glottal gap and MTD, the data suggest that voice therapy for glottal gap also improves MTD.
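Two of the acoustic measures listed above, jitter and shimmer, are commonly defined as the cycle-to-cycle variation of period and peak amplitude relative to their mean. The abstract does not say which variant or software was used, so the following is only a sketch of the widely used "local" definition; the function names are hypothetical, and extraction of periods and amplitudes from the recording is assumed to have been done already.

```python
from typing import Sequence


def relative_perturbation(values: Sequence[float]) -> float:
    """Mean absolute difference between consecutive values, as % of their mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    mean_abs_diff = sum(
        abs(b - a) for a, b in zip(values, values[1:])
    ) / (len(values) - 1)
    return 100.0 * mean_abs_diff / (sum(values) / len(values))


def local_jitter_percent(periods_s: Sequence[float]) -> float:
    """Local jitter: perturbation of consecutive glottal cycle durations."""
    return relative_perturbation(periods_s)


def local_shimmer_percent(peak_amplitudes: Sequence[float]) -> float:
    """Local shimmer: perturbation of consecutive cycle peak amplitudes."""
    return relative_perturbation(peak_amplitudes)
```

Computing both values before and after therapy for each patient would give the paired data that a pre/post comparison like the one in this study requires.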

Tube phonation in water for patients with hyperfunctional voice disorders: The effect of tube diameter and water immersion depth on bubble height and maximum phonation time (과기능적 음성장애 환자의 물저항발성: 튜브 직경과 물 깊이가 물거품 높이 및 최대발성지속시간에 미치는 영향)

  • Min Gyeong Kim;Seong Hee Choi;Jong-In Youn
    • Phonetics and Speech Sciences / v.15 no.2 / pp.31-40 / 2023
  • Tube phonation in water, in which the patient phonates while bubbling through a tube kept submerged in water, has been widely used for voice training among semi-occluded vocal tract (SOVT) exercises. This study aims to investigate the effect of tube diameter and water depth on bubble height and maximum phonation time (MPT) for patients with hyperfunctional voice disorders. Seventeen patients with hyperfunctional voice disorders were asked to bubble with a sustained /u/ using tubes of different inner diameters (5, 7, and 10 mm) at different water depths (4, 7, and 10 cm). A water resistance phonation biofeedback system using a water height sensor was used to record bubble height and MPT. Bubble height changed significantly with tube diameter, while MPT changed significantly with both tube diameter and water depth. Although the wider tube produced significantly lower bubble height for a given depth, a relatively consistent bubble height was maintained. Bubble height did not differ significantly with water depth for a given tube diameter. In addition, MPT decreased significantly with water depth, and a wider tube led to significantly shorter MPT. A water level-driven water resistance biofeedback system provided useful information on bubble characteristics and vocal fold vibration depending on tube diameter and water depth. It can be useful for monitoring breath support during water resistance phonation in patients with hyperfunctional voice disorders.

Extending StarGAN-VC to Unseen Speakers Using RawNet3 Speaker Representation (RawNet3 화자 표현을 활용한 임의의 화자 간 음성 변환을 위한 StarGAN의 확장)

  • Bogyung Park;Somin Park;Hyunki Hong
    • KIPS Transactions on Software and Data Engineering / v.12 no.7 / pp.303-314 / 2023
  • Voice conversion, a technology that allows an individual's speech data to be regenerated with the acoustic properties (tone, cadence, gender) of another, has countless applications in education, communication, and entertainment. This paper proposes an approach based on the StarGAN-VC model that generates realistic-sounding speech without requiring parallel utterances. To overcome the constraints of the existing StarGAN-VC model, which uses one-hot vectors for source and target speaker information, this paper extracts feature vectors of target speakers using a pre-trained version of RawNet3. This results in a latent space where voice conversion can be performed without direct speaker-to-speaker mappings, enabling an any-to-any structure. In addition to the loss terms used in the original StarGAN-VC model, the Wasserstein distance is used as a loss term to ensure that generated voice segments match the acoustic properties of the target voice. The Two Time-Scale Update Rule (TTUR) is also used to facilitate stable training. Experimental results show that the proposed method outperforms previous methods, including the StarGAN-VC network on which it is based.

An end-to-end synthesis method for Korean text-to-speech systems (한국어 text-to-speech(TTS) 시스템을 위한 엔드투엔드 합성 방식 연구)

  • Choi, Yeunju;Jung, Youngmoon;Kim, Younggwan;Suh, Youngjoo;Kim, Hoirin
    • Phonetics and Speech Sciences / v.10 no.1 / pp.39-48 / 2018
  • A typical statistical parametric speech synthesis (text-to-speech, TTS) system consists of separate modules, such as a text analysis module, an acoustic modeling module, and a speech synthesis module. This causes two problems: 1) expert knowledge of each module is required, and 2) errors generated in each module accumulate as they pass through subsequent modules. An end-to-end TTS system could avoid such problems by synthesizing voice signals directly from an input string. In this study, we implemented an end-to-end Korean TTS system using Google's Tacotron, which is an end-to-end TTS system based on a sequence-to-sequence model with an attention mechanism. We used 4392 utterances spoken by a Korean female speaker, an amount that corresponds to 37% of the dataset Google used for training Tacotron. Our system obtained a mean opinion score (MOS) of 2.98 and a degradation mean opinion score (DMOS) of 3.25. We discuss the factors that affected training of the system. Experiments demonstrate that the post-processing network needs to be designed with the output language and input characters in mind, and that, depending on the amount of training data, the maximum value of n for the n-grams modeled by the encoder should be small enough.

A Phonetic Analysis of Yodel Singing by the Electroglottographic(EGG) Measurement (요들송에 대한 전기성문파형검사(EGG)를 이용한 발성학적 접근)

  • Suh, D.;Choi, H.S.
    • Speech Sciences / v.7 no.2 / pp.113-126 / 2000
  • A comparative phonetic analysis of Yodel singing and Belcanto singing by electroglottographic (EGG) measurement was carried out in three singers: a professional tenor (SDI) who is also well trained in Yodel singing, a yodeler (KWS) who is less trained in Belcanto singing, and a tenor in training (CSK) who is well trained in neither Yodel nor Belcanto singing. Closed quotient (CQ), speed quotient (SQ), and fundamental frequency (F0) at the initial modal part (I), middle falsetto part (M), and final modal part (F) of the same phrase were measured with an EGG machine and program (Kay model 4338). In the middle part, both CQ and SQ of Yodel singing were much smaller than those of Belcanto singing in all three singers. However, the accuracy of the parameters in the Belcanto singing of the yodeler (KWS), and in both the Yodel and Belcanto singing of the tenor in training (CSK), was inferior to that of the trained tenor (SDI). Possible advantages of Yodel singing training under EGG-guided feedback control for hyperfunctional voice disorders such as vocal nodules are discussed.

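The closed quotient measured in this study is the fraction of each vibratory cycle during which the vocal folds are in contact. One common way to estimate it from an EGG cycle is a criterion-level threshold on the waveform, sketched below. The abstract does not state which criterion the Kay program applies, so the 50% default, the signal polarity (higher value means more contact), and the function name `closed_quotient` are assumptions.

```python
from typing import Sequence


def closed_quotient(cycle: Sequence[float], level: float = 0.5) -> float:
    """Estimate CQ for one EGG cycle with a criterion-level threshold.

    cycle holds the EGG samples of a single glottal period, with larger
    values meaning more vocal fold contact (assumed polarity). A sample
    counts as "closed" when it lies above min + level * (max - min).
    """
    if not cycle:
        raise ValueError("cycle must contain at least one sample")
    lo, hi = min(cycle), max(cycle)
    if hi == lo:
        raise ValueError("flat cycle: no peak-to-peak amplitude")
    threshold = lo + level * (hi - lo)
    closed_samples = sum(1 for x in cycle if x > threshold)
    return closed_samples / len(cycle)
```

A lower value in the falsetto part than in the modal parts of a phrase would be consistent with the smaller CQ reported here for the middle part, though segmenting the recording into cycles is outside this sketch.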

Recent Trends in Education and Training for Information Professionals in the U. S. and Their Impact on Library Education Programs in Korea (최근 미국의 정보전문가 교육의 동향과 한국 사서교육과정 개정의 기본방향)

  • Hahn Soon-chung
    • Journal of the Korean Society for Library and Information Science / v.12 / pp.149-163 / 1985
  • This short survey article examines the current curricula for library and information science education in the U. S. in order to implement them for our professional education in the field in Korea so as to produce qualified and competent graduates. Some of the prevailing trends in education and training for information professionals in the U. S. are as follows: 1. Library schools tend to incorporate information science into library school curricula to reflect their emphasis on this area, and attempt to develop close ties with all segments of the information industry; 2. Library schools actively participate in cooperative research with other agencies to explore ways of solving problems; 3. There is a diversity of education and training programs to meet the needs of a wide variety of information professionals, with library school faculty members being drawn from a wide range of scholarly disciplines; 4. New methods of teaching are being developed to support research and instructional activities; 5. There has been a significant change in the composition of the student body, now given a strong voice in the administration of the library school.
