• Title/Summary/Keyword: Simulated speech

Effective Recognition of Velopharyngeal Insufficiency (VPI) Patient's Speech Using Simulated Speech Model (모의 음성 모델을 이용한 효과적인 구개인두부전증 환자 음성 인식)

  • Sung, Mee Young;Kwon, Tack-Kyun;Sung, Myung-Whun;Kim, Wooil
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.5 / pp.1243-1250 / 2015
  • This paper presents an effective method for recognizing VPI patients' speech in a VPI speech reconstruction system. Speaker adaptation is employed to improve VPI speech recognition, and the paper proposes using simulated speech to generate the initial model for speaker adaptation, in order to make effective use of the small amount of VPI speech available for model adaptation. Applying MLLR for speaker adaptation yields 83.60% average word accuracy, and the proposed simulated-speech initial model brings a 6.38% improvement in average accuracy. The experimental results demonstrate that the proposed speaker adaptation method is highly effective for developing a recognition system for VPI speech, for which a large speech database cannot practically be constructed.
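
The speaker adaptation above centers on MLLR, which re-estimates the acoustic model's Gaussian means through an affine transform fitted to the adaptation data. A hedged one-dimensional sketch of that idea (the paper adapts full multivariate HMM means; `mllr_1d`, the toy means, and the frames below are illustrative, not from the paper):

```python
# Minimal 1-D sketch of MLLR mean adaptation: fit mu' = a*mu + b by
# least squares over (model mean, adaptation frame) pairs, then apply
# the transform to every model mean.
def mllr_1d(frames, aligned_means):
    n = len(frames)
    sx = sum(aligned_means)
    sy = sum(frames)
    sxx = sum(m * m for m in aligned_means)
    sxy = sum(m * f for m, f in zip(aligned_means, frames))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Prior-model means (e.g. trained on simulated speech) and a few
# adaptation frames aligned to those means:
means = [1.0, 2.0, 3.0, 1.0, 3.0]
frames = [1.4, 2.6, 3.8, 1.5, 3.7]   # patient's speech, shifted/scaled
a, b = mllr_1d(frames, means)
adapted = [a * m + b for m in [1.0, 2.0, 3.0]]
```

Because one transform is shared across all means, even a small adaptation set gives a usable estimate, which is the point of the simulated-speech initial model.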

The role of prosody in dialect authentication: Simulating Masan dialect with Seoul speech segments

  • Yoon, Kyu-Chul
    • Proceedings of the KSPS conference / 2007.05a / pp.234-239 / 2007
  • The purpose of this paper is to examine the viability of simulating one dialect with the speech segments of another through prosody cloning. The hypothesis is that, among Korean regional dialects, it is the prosodic rather than the segmental differences that play the major role in authentic dialect perception. This work supports the hypothesis by simulating Masan dialect with speech segments from Seoul dialect. The simulation was performed by transplanting the prosodic features of Masan utterances onto the same utterances produced by a Seoul speaker, so that the simulated Masan utterances consisted of Seoul speech segments while their prosody came from the original Masan utterances. The prosodic features involved were the fundamental frequency contour, the segmental durations, and the intensity contour. The simulated Masan utterances were evaluated by four native Masan speakers, and the role of prosody in dialect authentication and speech synthesis is discussed.
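
Transplanting a prosodic contour between utterances of different lengths amounts to interpolation over normalized time. A minimal sketch assuming per-frame F0 values (the actual work used Praat-style resynthesis of F0, duration, and intensity; the function name and values here are made up for illustration):

```python
# Map a source (Masan) F0 contour onto a target (Seoul) utterance of a
# different length by linear interpolation over normalized time [0, 1].
def transplant_contour(source_f0, target_len):
    out = []
    for i in range(target_len):
        t = i / (target_len - 1) if target_len > 1 else 0.0
        pos = t * (len(source_f0) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(source_f0) - 1)
        frac = pos - lo
        out.append(source_f0[lo] * (1 - frac) + source_f0[hi] * frac)
    return out

masan_f0 = [180.0, 200.0, 240.0, 210.0, 170.0]   # Hz, per frame
seoul_f0 = transplant_contour(masan_f0, 9)        # stretch to 9 frames
```

The same time-normalized mapping applies to the duration and intensity features; the resynthesis step (e.g. PSOLA) then imposes the transplanted contour on the target waveform.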

Microphone Array Based Speech Enhancement Using Independent Vector Analysis (마이크로폰 배열에서 독립벡터분석 기법을 이용한 잡음음성의 음질 개선)

  • Wang, Xingyang;Quan, Xingri;Bae, Keunsung
    • Phonetics and Speech Sciences / v.4 no.4 / pp.87-92 / 2012
  • Speech enhancement aims to improve speech quality by removing background noise from noisy speech. Independent vector analysis (IVA) is a frequency-domain independent component analysis method known to be free from the frequency-bin permutation problem when blindly separating sources from multi-channel inputs. This paper proposes a new microphone-array speech enhancement method that combines independent vector analysis with beamforming: IVA separates the speech and noise components of the multi-channel noisy speech, and delay-and-sum beamforming determines which of the separated signals is the enhanced speech. To verify the effectiveness of the proposed method, experiments were carried out on computer-simulated multi-channel noisy speech at various signal-to-noise ratios, with PESQ and output signal-to-noise ratio as objective speech quality measures. The results show that the proposed method is superior to conventional microphone-array noise removal approaches such as GSC beamforming.
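
The delay-and-sum beamformer used to select the enhanced signal can be sketched in a few lines: each channel is time-aligned toward the target source and the channels are averaged, so coherent speech adds constructively while uncorrelated noise partially cancels. A minimal integer-delay sketch (real arrays use fractional delays, often in the frequency domain; the two-mic data here are illustrative):

```python
# Delay-and-sum beamforming: compensate each channel's lag relative to
# the reference microphone, then average across channels.
def delay_and_sum(channels, lags):
    """channels: equal-length sample lists; lags[k]: samples by which
    channel k lags the reference for the target source."""
    n = len(channels[0])
    out = []
    for i in range(n):
        acc = 0.0
        for ch, d in zip(channels, lags):
            j = i + d                      # advance lagging channels
            acc += ch[j] if 0 <= j < n else 0.0
        out.append(acc / len(channels))
    return out

# The same 'speech' arrives one sample later on mic 2:
mic1 = [0.0, 1.0, -1.0, 0.5, 0.0]
mic2 = [0.0, 0.0, 1.0, -1.0, 0.5]
enhanced = delay_and_sum([mic1, mic2], lags=[0, 1])
```

After alignment the speech samples average to their full amplitude, whereas noise that differs across microphones is attenuated by the averaging.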

Real-time implementation and performance evaluation of speech classifiers in speech analysis-synthesis

  • Kumar, Sandeep
    • ETRI Journal / v.43 no.1 / pp.82-94 / 2021
  • In this work, six voiced/unvoiced speech classifiers based on the autocorrelation function (ACF), average magnitude difference function (AMDF), cepstrum, weighted ACF (WACF), zero-crossing rate and energy of the signal (ZCR-E), and neural networks (NNs) have been simulated and implemented in real time using the TMS320C6713 DSP starter kit. These classifiers have been integrated into a linear-predictive-coding-based speech analysis-synthesis system, and their performance has been compared in terms of voiced/unvoiced classification accuracy, speech quality, and computation time. The classification accuracy and speech quality results show that the NN-based classifier outperforms the ACF-, AMDF-, cepstrum-, WACF- and ZCR-E-based classifiers in both clean and noisy environments. The computation time results show that the AMDF-based classifier is computationally the simplest, and thus the fastest, while the NN-based classifier requires more computation than the others.
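
Of the six classifiers, the ZCR-E decision rule is the simplest to illustrate: voiced frames tend to have high energy and a low zero-crossing rate, unvoiced frames the opposite. A hedged sketch (the thresholds and test frames below are illustrative, not the paper's):

```python
import math

# ZCR-E style voiced/unvoiced decision for one analysis frame.
def classify_frame(frame, zcr_thresh=0.25, energy_thresh=0.01):
    n = len(frame)
    # fraction of adjacent sample pairs with a sign change
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / (n - 1)
    energy = sum(s * s for s in frame) / n        # mean squared amplitude
    return "voiced" if energy > energy_thresh and zcr < zcr_thresh else "unvoiced"

# Low-frequency, high-energy frame (voiced-like) vs a weak,
# rapidly alternating frame (unvoiced-like):
voiced = [math.sin(2 * math.pi * 2 * i / 80) for i in range(80)]
unvoiced = [0.05 * (-1) ** i for i in range(80)]
```

The ACF and AMDF classifiers refine the same decision by looking for a strong periodicity peak (or valley) at the pitch lag instead of relying on thresholds alone.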

Effective Recognition of Velopharyngeal Insufficiency (VPI) Patient's Speech Using DNN-HMM-based System (DNN-HMM 기반 시스템을 이용한 효과적인 구개인두부전증 환자 음성 인식)

  • Yoon, Ki-mu;Kim, Wooil
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.1 / pp.33-38 / 2019
  • This paper proposes an effective method for recognizing VPI patients' speech using a DNN-HMM-based speech recognition system and evaluates its performance against a GMM-HMM-based system. The method employs speaker adaptation to improve VPI speech recognition: simulated VPI speech is used to generate a prior model for speaker adaptation and for selective learning of the DNN weight matrices, in order to make effective use of the small amount of VPI speech available for model adaptation. A Linear Input Network (LIN) based model adaptation technique is also applied to the DNN model. The proposed speaker adaptation method brings a 2.35% improvement in average accuracy over the GMM-HMM-based ASR system. The experimental results demonstrate that the proposed DNN-HMM-based system is effective for VPI speech with small-sized speech data, compared to the conventional GMM-HMM system.
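
LIN adaptation prepends a small trainable linear layer to a frozen DNN, so only the input mapping is learned from the patient's limited data. A one-dimensional sketch of that idea, fitting y = w*x + b by gradient descent (toy data; the real technique trains a full feature-dimension matrix with the recognizer's training objective, and the DNN weights stay fixed):

```python
# Fit a linear input transform by gradient descent on mean squared
# error; this stands in for training the LIN layer while the DNN
# behind it is frozen.
def train_lin(xs, targets, lr=0.1, epochs=500):
    w, b = 1.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = sum((w * x + b - t) * x for x, t in zip(xs, targets)) * 2 / n
        gb = sum((w * x + b - t) for x, t in zip(xs, targets)) * 2 / n
        w -= lr * gw
        b -= lr * gb
    return w, b

# Patient features are shifted/scaled relative to the canonical
# features the frozen recognizer expects (true mapping: y = 2x + 0.5):
patient = [0.0, 1.0, 2.0, 3.0]
canonical = [0.5, 2.5, 4.5, 6.5]
w, b = train_lin(patient, canonical)
```

Because only the tiny input layer is trained, the approach needs far less adaptation data than retraining the whole network, which suits the small VPI corpora discussed above.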

Research on Construction of the Korean Speech Corpus in Patient with Velopharyngeal Insufficiency (구개인두부전증 환자의 한국어 음성 코퍼스 구축 방안 연구)

  • Lee, Ji-Eun;Kim, Wook-Eun;Kim, Kwang Hyun;Sung, Myung-Whun;Kwon, Tack-Kyun
    • Korean Journal of Otorhinolaryngology-Head and Neck Surgery / v.55 no.8 / pp.498-507 / 2012
  • Background and Objectives: We aimed to develop a Korean version of the velopharyngeal insufficiency (VPI) speech corpus system. Subjects and Method: After developing a 3-channel simultaneous recording device capable of separately recording nasal, oral, and combined speech, voice data were collected from VPI patients aged over 10 years, with or without a history of surgery or prior speech therapy. These were compared with a control group in which VPI was simulated by inserting a French-3 Nelaton tube through both nostrils into the nasopharynx and pulling the soft palate anteriorly to varying degrees. Three transcriptors took part: a speech therapist transcribed the voice files into text, a second transcriptor graded speech intelligibility and severity, and a third tagged the types and onset times of misarticulations. The database was composed of three main tables covering (1) speaker demographics, (2) the condition of the recording system, and (3) transcripts, all interfaced with the Praat voice analysis program, which enables the user to extract exact transcribed phrases for analysis. Results: In the simulated VPI group, the higher the severity of VPI, the higher the nasalance score obtained. In addition, we could verify the vocal energy that characterizes hypernasality and compensation in the nasal, oral, and combined sounds spoken by VPI patients, as opposed to that of the normal control group. Conclusion: With the Korean version of the VPI speech corpus system, patients' common difficulties and speech tendencies in articulation can be objectively evaluated. By comparing these data with those of normal voices, the mispronunciations and dysarticulations of patients with VPI can be corrected.

Simulation of speech processing and coding strategy for cochlear implants (인공 청각 장치의 음성신호 처리와 자극방법의 시뮬레이션)

  • Kim, Young-Hoon;Park, Kwang-Suk
    • Proceedings of the KOSOMBE Conference / v.1991 no.11 / pp.30-33 / 1991
  • The objective of a speech processor for cochlear implants is to deliver speech information to the central nervous system. In this study we present a method for simulating the speech processing and coding strategies of cochlear implants, and simulated two different processing methods for 12 adults with normal hearing. Formant sinusoidal coding performed better than formant pulse coding in the consonant perception test, and learning effects were observed (p < 0.05).
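
The two coding strategies compared can be illustrated by generating their stimulus waveforms: a sinusoid at the formant frequency versus a pulse train at the same rate. A sketch with made-up parameter values (the study's actual stimulation parameters are not given in the abstract):

```python
import math

# "Formant sinusoidal coding": a sinusoid at the formant frequency.
def sinusoidal_stimulus(freq_hz, fs, n):
    return [math.sin(2 * math.pi * freq_hz * i / fs) for i in range(n)]

# "Formant pulse coding": a pulse train at the same rate.
def pulse_stimulus(freq_hz, fs, n):
    period = int(fs / freq_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

fs = 8000                 # assumed sampling rate, Hz
f1 = 500.0                # an assumed first-formant frequency, Hz
sine = sinusoidal_stimulus(f1, fs, 64)
pulses = pulse_stimulus(f1, fs, 64)
```

The sinusoid carries its energy at a single frequency, while the pulse train spreads energy across harmonics, which is one plausible reason the two stimuli are perceived differently.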

Speech Perception and Gap Detection Performance of Single-Sided Deafness under Noisy Conditions

  • Kwak, Chanbeom;Kim, Saea;Lee, Jihyeon;Seo, Youngjoon;Kong, Taehoon;Han, Woojae
    • Journal of Audiology & Otology / v.23 no.4 / pp.197-203 / 2019
  • Background and Objectives: Many studies have reported no benefit in sound localization, but improved speech understanding in noise, after treating patients with single-sided deafness (SSD); moreover, performance varies widely between individuals. The present study aimed to measure SSD patients' speech perception and gap detection in noise to better understand the nature of their hearing. Subjects and Methods: Nine SSD patients with differing onsets and periods of hearing deprivation, and 20 young adults with normal hearing and simulated conductive hearing loss as control groups, completed speech-perception-in-noise (SPIN) and Gap-In-Noise (GIN) tests. The SPIN test measured how many presented sentences were understood at +5 and -5 dB signal-to-noise ratios; the GIN test asked listeners to find the shortest detectable gap in white noise containing gaps of different lengths. Results: Compared with the normal-hearing and simulated-hearing-loss groups, the SSD group performed much worse on both the SPIN and GIN tests, supporting central auditory plasticity in SSD patients. Rather than length of deafness, the large individual variance indicated that congenital SSD patients outperformed acquired SSD patients on both measurements. Conclusions: The results suggest that comprehensive assessment should precede any treatment of SSD patients, considering onset time and etiology, although these findings need to be generalized with a larger sample.
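
A Gap-In-Noise stimulus is simply broadband noise with a silent interval of controlled duration. A hedged sketch of generating one such stimulus (sampling rate and durations here are illustrative, not the study's):

```python
import random

# White noise of noise_ms milliseconds with a silent gap of gap_ms
# milliseconds inserted at the midpoint.
def gin_stimulus(fs, noise_ms, gap_ms, seed=0):
    rng = random.Random(seed)
    n = int(fs * noise_ms / 1000)
    gap = int(fs * gap_ms / 1000)
    samples = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = (n - gap) // 2
    for i in range(start, start + gap):
        samples[i] = 0.0
    return samples

stim = gin_stimulus(fs=8000, noise_ms=100, gap_ms=5)
```

Varying `gap_ms` across trials and asking for the shortest gap the listener still detects gives the temporal-resolution threshold the GIN test measures.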

Speech Emotion Recognition by Speech Signals on a Simulated Intelligent Robot (모의 지능로봇에서 음성신호에 의한 감정인식)

  • Jang, Kwang-Dong;Kwon, Oh-Wook
    • Proceedings of the KSPS conference / 2005.11a / pp.163-166 / 2005
  • We propose a speech emotion recognition method for a natural human-robot interface. In the proposed method, emotion is classified into six classes: angry, bored, happy, neutral, sad, and surprised. Features for an input utterance are extracted from statistics of phonetic and prosodic information: phonetic information includes log energy, shimmer, formant frequencies, and Teager energy, while prosodic information includes pitch, jitter, duration, and rate of speech. Finally, a pattern classifier based on Gaussian support vector machines decides the emotion class of the utterance. We recorded speech commands and dialogs uttered 2 m away from the microphones in 5 different directions. Experimental results show that the proposed method yields 59% classification accuracy while human classifiers achieve about 50%, confirming that the proposed method performs comparably to humans.
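
Two of the features listed, jitter and shimmer, are the cycle-to-cycle relative variability of the pitch period and of the amplitude, respectively. A minimal sketch using the common mean-relative-difference definition (toy cycle values; real extraction first requires detecting the glottal cycles):

```python
# Jitter: mean absolute difference of consecutive pitch periods,
# relative to the mean period.
def jitter(periods_ms):
    diffs = [abs(a - b) for a, b in zip(periods_ms, periods_ms[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods_ms) / len(periods_ms))

# Shimmer: same statistic over the per-cycle peak amplitudes.
def shimmer(amplitudes):
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    return (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))

periods = [5.0, 5.1, 4.9, 5.0]    # ms, per glottal cycle
amps = [0.8, 0.82, 0.78, 0.8]
```

Elevated jitter and shimmer indicate an unsteady voice source, which is why they serve as emotion cues alongside pitch and energy statistics.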
