• Title/Summary/Keyword: Speech Training


KOREAN DIGIT RECOGNITION IN NOISE ENVIRONMENT USING SPECTRAL MAPPING TRAINING

  • Ki Young Lee
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1994.06a
    • /
    • pp.1015-1020
    • /
    • 1994
  • This paper presents a Korean digit recognition method for noisy environments using spectral mapping training based on a static supervised adaptation algorithm. By mapping spectra from the noisy-speech space to the clean-speech space, the presented method reduces the spectral distortion of noisy speech. Its recognition rate is higher than that of the conventional method using VQ and DTW without noise processing, and even at an SNR of 0 dB it reaches ten times the conventional rate. These results confirm that spectral mapping training can improve recognition performance for speech in noisy environments.
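The core idea above, learning a mapping from the noisy-speech spectral space to the clean-speech spectral space from paired training data, can be sketched with a simple least-squares linear map. This is an illustrative stand-in, not the paper's exact supervised adaptation algorithm, and the toy data below are invented:

```python
import numpy as np

def train_spectral_map(noisy, clean):
    """Learn a linear map W such that noisy @ W approximates clean.

    noisy, clean: (n_frames, n_bins) paired training spectra.
    Solved in the least-squares sense; a stand-in for the paper's
    supervised spectral mapping training.
    """
    W, *_ = np.linalg.lstsq(noisy, clean, rcond=None)
    return W

# Toy demo: clean spectra corrupted by additive noise.
rng = np.random.default_rng(0)
clean = rng.random((200, 8))
noisy = clean + 0.3 * rng.random((200, 8))

W = train_spectral_map(noisy, clean)
restored = noisy @ W          # map noisy spectra into the clean space

# Mapped spectra should be closer to the clean ones than the raw noisy input.
err_before = np.mean((noisy - clean) ** 2)
err_after = np.mean((restored - clean) ** 2)
```

Since the identity map is itself a linear map, the least-squares solution can only lower the training-set distortion, which mirrors the paper's claim that spectral distortion is reduced after mapping.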


Performance Improvement in the Multi-Model Based Speech Recognizer for Continuous Noisy Speech Recognition (연속 잡음 음성 인식을 위한 다 모델 기반 인식기의 성능 향상에 대한 연구)

  • Chung, Yong-Joo
    • Speech Sciences
    • /
    • v.15 no.2
    • /
    • pp.55-65
    • /
    • 2008
  • Recently, the multi-model based speech recognizer has been used quite successfully for noisy speech recognition. For selecting the reference HMM (hidden Markov model) that best matches the noise type and SNR (signal-to-noise ratio) of the input testing speech, the estimation of the SNR value using a VAD (voice activity detection) algorithm and the classification of the noise type based on a GMM (Gaussian mixture model) have been done separately in the multi-model framework. As the SNR estimation process is vulnerable to errors, we propose an efficient method that can simultaneously classify the SNR value and noise type. The KL (Kullback-Leibler) distance between the single Gaussian distributions of the noise signal during training and testing is utilized for the classification. Recognition experiments on the Aurora 2 database show the usefulness of the model compensation method in the multi-model based speech recognizer. We also found that further performance improvement was achievable by combining the probability density function of MCT (multi-condition training) with that of the reference HMM compensated by D-JA (data-driven Jacobian adaptation).
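The KL distance between two univariate Gaussians has a closed form, and joint noise-type/SNR selection then reduces to a nearest-reference search. The sketch below illustrates that mechanism; the reference statistics and labels are hypothetical, and a real system would estimate the Gaussians from noise-only frames:

```python
import math

def kl_gauss(m0, s0, m1, s1):
    """Closed-form KL divergence KL( N(m0, s0^2) || N(m1, s1^2) )."""
    return math.log(s1 / s0) + (s0**2 + (m0 - m1)**2) / (2 * s1**2) - 0.5

def classify_noise(test_gauss, reference_gausses):
    """Pick the (noise type, SNR) label whose training-noise Gaussian
    is closest in KL divergence to the test-noise Gaussian."""
    return min(reference_gausses,
               key=lambda label: kl_gauss(*test_gauss, *reference_gausses[label]))

# Hypothetical reference (mean, std) pairs, indexed by (noise type, SNR in dB).
refs = {("babble", 10): (0.0, 1.0),
        ("babble", 0): (0.0, 3.0),
        ("car", 10): (2.0, 1.0)}

# A test-noise Gaussian close to 0 dB babble should select that reference.
label = classify_noise((0.1, 2.8), refs)
```

Because both SNR and noise type index the reference set, one KL search replaces the two separate (and error-prone) VAD-based SNR and GMM-based type decisions.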


Minimum Classification Error Training to Improve Discriminability of PCMM-Based Feature Compensation (PCMM 기반 특징 보상 기법에서 변별력 향상을 위한 Minimum Classification Error 훈련의 적용)

  • Kim Wooil;Ko Hanseok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.24 no.1
    • /
    • pp.58-68
    • /
    • 2005
  • In this paper, we propose a scheme to improve the discriminative property of feature compensation methods for robust speech recognition in noisy environments. The estimation of the noisy speech model used in existing feature compensation methods does not guarantee the computation of posterior probabilities that discriminate reliably among the Gaussian components. Estimation of posterior probabilities is a crucial step in determining the discriminative factor of the Gaussian models, which in turn determines the intelligibility of the restored speech signals. The proposed scheme employs minimum classification error (MCE) training to estimate the parameters of the noisy speech model. For applying MCE training, we propose to identify and determine the 'competing components' that are expected to affect discriminative ability. The proposed method is applied to feature compensation based on the parallel combined mixture model (PCMM). Performance is examined on the Aurora 2.0 database and on speech recorded inside a car under real driving conditions. The experimental results show improved recognition performance in both simulated environments and real-life conditions, verifying the effectiveness of the proposed scheme for increasing the performance of robust speech recognition systems.
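A minimal sketch of the standard MCE criterion with explicit 'competing components' is shown below: the misclassification measure contrasts the target score against a soft-max over the competitors, and a sigmoid smooths it into a differentiable loss. The scores and the choice of competitors are illustrative, not the paper's PCMM-specific formulation:

```python
import math

def mce_loss(g_target, g_competitors, gamma=1.0):
    """Sigmoid-smoothed minimum classification error loss.

    g_target: discriminant score of the correct class/component.
    g_competitors: scores of the 'competing components' expected
    to hurt discrimination.
    """
    # Soft-max (log-sum-exp) of competitor scores: the usual MCE anti-discriminant.
    anti = math.log(sum(math.exp(gamma * g) for g in g_competitors)
                    / len(g_competitors)) / gamma
    d = -g_target + anti                        # misclassification measure
    return 1.0 / (1.0 + math.exp(-gamma * d))   # smoothed 0/1 loss

# A well-separated target gives a loss near 0; a confusable one nears 0.5.
low = mce_loss(5.0, [0.0, -1.0])
high = mce_loss(1.0, [1.0, 0.9])
```

Minimizing this loss over the noisy-speech model parameters pushes the target component's score away from its competitors, which is exactly the discriminability the abstract says plain ML estimation does not guarantee.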

HMM-based missing feature reconstruction for robust speech recognition in additive noise environments (가산잡음환경에서 강인음성인식을 위한 은닉 마르코프 모델 기반 손실 특징 복원)

  • Cho, Ji-Won;Park, Hyung-Min
    • Phonetics and Speech Sciences
    • /
    • v.6 no.4
    • /
    • pp.127-132
    • /
    • 2014
  • This paper describes a robust speech recognition technique that reconstructs spectral components mismatched with the training environment. Although the cluster-based reconstruction method can compensate for unreliable components using the reliable components in the same spectral vector, by assuming an independent, identically distributed Gaussian-mixture process of training spectral vectors, the presented method exploits the temporal dependency of speech to reconstruct the components by introducing a hidden-Markov-model prior, which incorporates internal state transitions plausible for an observed spectral vector sequence. The experimental results indicate that the described method provides temporally consistent reconstruction and further improves recognition performance on average compared to the conventional method.
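The per-frame building block of such reconstruction, an MMSE estimate of the unreliable components given the reliable ones under a single Gaussian prior, can be sketched as below. The HMM chaining over time that the paper adds is omitted, and the toy statistics are invented:

```python
import numpy as np

def reconstruct_missing(x, reliable, mean, cov):
    """MMSE reconstruction of unreliable spectral components from the
    reliable ones under one Gaussian prior (one cluster/state):

        x_u = mu_u + C_ur C_rr^{-1} (x_r - mu_r)

    x: spectral vector with unreliable entries masked (values unused).
    reliable: boolean mask of reliable components.
    """
    r = np.asarray(reliable)
    u = ~r
    C_rr = cov[np.ix_(r, r)]
    C_ur = cov[np.ix_(u, r)]
    x_hat = x.copy()
    x_hat[u] = mean[u] + C_ur @ np.linalg.solve(C_rr, x[r] - mean[r])
    return x_hat

# Toy check: strongly correlated dimensions let the masked one be inferred.
mean = np.zeros(2)
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
x = np.array([1.0, np.nan])            # second component masked by noise
x_hat = reconstruct_missing(x, [True, False], mean, cov)
```

The HMM prior in the paper replaces the single Gaussian with state-conditional Gaussians whose posteriors are propagated through state transitions, yielding the temporally consistent estimates the abstract describes.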

Three-Stage Framework for Unsupervised Acoustic Modeling Using Untranscribed Spoken Content

  • Zgank, Andrej
    • ETRI Journal
    • /
    • v.32 no.5
    • /
    • pp.810-818
    • /
    • 2010
  • This paper presents a new framework for integrating untranscribed spoken content into the acoustic training of an automatic speech recognition system. Untranscribed spoken content plays a very important role for under-resourced languages because producing manually transcribed speech databases is still a very expensive and time-consuming task. We propose two new methods as part of the training framework. The first focuses on combining initial acoustic models using a data-driven metric. The second is an improved acoustic training procedure based on unsupervised transcriptions, in which word endings are modified by broad phonetic classes. The training framework was applied to baseline acoustic models using untranscribed spoken content from parliamentary debates. Three types of acoustic models were included in the evaluation: baseline, reference content, and framework content models. The best overall result, an 18.02% word error rate, was achieved with the third type, a statistically significant improvement over the baseline and reference acoustic models.

Inter-rater Reliability and Training Effect of the Differential Diagnosis of Speech and Language Disorder for Stroke Patients (뇌졸중 환자의 말, 언어장애 선별에 대한 검사자간 신뢰도 및 훈련효과)

  • Kim, Jung-Wan
    • The Journal of the Korea Contents Association
    • /
    • v.11 no.9
    • /
    • pp.407-413
    • /
    • 2011
  • Distinguishing aphasia in stroke patients and observing the subtle linguistic characteristics associated with it requires, above all, instruments that provide reliable assessment results; examiners should also be fully aware of how to use those instruments. This study tested 46 stroke patients for aphasia and degree of speech disorder and assessed the inter-rater reliability of the diagnoses across examiners from different medical fields, comparing reliability before and after training. The ratings were made by 3 groups of professionals (3 SLPs, 3 neurologists, and 3 nurses). In the results, a rating of 'acceptable' was obtained for the speech intelligibility tasks and the voice quality of /ah-/ prolongation, while the other sub-tests were rated 'good-excellent' by the experts from the different areas of medical expertise. For the tasks rated 'acceptable', the examiners were video-trained for 3 weeks and their ratings were compared before and after training. Consequently, the differences among examiners' ratings on the speech intelligibility tasks decreased significantly, and the accuracy of their voice quality ratings increased significantly. An analysis of the correlation between sub-test rating accuracy and amount of clinical experience showed that speech therapists rated the picture description and speech intelligibility tasks more accurately as their experience accumulated, while doctors and nurses rated the picture description tasks more accurately with greater clinical experience. These results suggest that assessing the neurologic-communicative disorders of stroke patients requires ongoing training and experience, especially for speech disorders, and that rating reliability can be improved by training.

A Training Method for Emotion Recognition using Emotional Adaptation (감정 적응을 이용한 감정 인식 학습 방법)

  • Kim, Weon-Goo
    • Journal of IKEEE
    • /
    • v.24 no.4
    • /
    • pp.998-1003
    • /
    • 2020
  • In this paper, an emotion recognition training method using emotional adaptation is proposed to improve the performance of existing emotion recognition systems. For emotion adaptation, an emotional speech model was created from an emotion-neutral speech model using a small number of emotional training utterances and emotion adaptation methods. This method showed superior performance even when using fewer emotional utterances than the existing method. Since it is not easy to obtain enough emotional speech for training, being able to work with a small amount of emotional speech is very practical in real situations. In experiments on a Korean database containing four emotions, the proposed method using emotional adaptation outperformed the existing method.
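One common way to adapt a neutral model with little emotional data is MAP-style interpolation of the model means between the neutral prior and the adaptation data. The sketch below illustrates that general idea with invented data; it is not the authors' specific adaptation method:

```python
import numpy as np

def map_adapt_mean(prior_mean, adapt_frames, tau=10.0):
    """MAP-style adaptation of a neutral-speech Gaussian mean toward a
    small set of emotional-speech feature frames.

    tau controls how strongly the neutral prior is trusted: with few
    adaptation frames the prior dominates, so the adapted model stays
    well-behaved even when emotional data is scarce.
    """
    n = len(adapt_frames)
    sample_mean = np.mean(adapt_frames, axis=0)
    w = n / (n + tau)                       # data weight grows with n
    return w * sample_mean + (1 - w) * prior_mean

neutral = np.zeros(3)                       # hypothetical neutral-model mean
emo_frames = np.ones((5, 3))                # 5 invented emotional frames
adapted = map_adapt_mean(neutral, emo_frames)
```

With 5 frames and tau=10, the adapted mean moves only a third of the way toward the emotional data, which captures why adaptation remains stable with small training sets, the practical point the abstract emphasizes.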

Analysis and Implementation of Speech/Music Classification for 3GPP2 SMV Codec Employing SVM Based on Discriminative Weight Training (SMV코덱의 음성/음악 분류 성능 향상을 위한 최적화된 가중치를 적용한 입력벡터 기반의 SVM 구현)

  • Kim, Sang-Kyun;Chang, Joon-Hyuk;Cho, Ki-Ho;Kim, Nam-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.5
    • /
    • pp.471-476
    • /
    • 2009
  • In this paper, we apply discriminative weight training to a support vector machine (SVM) based speech/music classification for the selectable mode vocoder (SMV) of 3GPP2. In our approach, the speech/music decision rule is expressed as the SVM discriminant function by incorporating optimally weighted features of the SMV based on a minimum classification error (MCE) method; this differs from previous work in that a different weight is assigned to each feature of the SMV. The performance of the proposed approach is evaluated under various conditions and yields better results than the conventional SVM scheme.
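The idea of an SVM discriminant evaluated over per-feature-weighted inputs can be sketched as follows. The support vectors, coefficients, bias, and weights are invented for illustration; in the paper the weights are learned with an MCE criterion rather than fixed by hand:

```python
import numpy as np

def svm_decision(x, sv_coeffs, sv_vectors, bias, feature_weights):
    """Linear-kernel SVM discriminant with per-feature weights.

    Each feature dimension gets its own weight (feature_weights),
    unlike an unweighted SVM where all dimensions count equally.
    Returns the signed discriminant score f(x).
    """
    xw = x * feature_weights                # element-wise feature weighting
    return sum(a * np.dot(sv, xw) for a, sv in zip(sv_coeffs, sv_vectors)) + bias

# Two hypothetical support vectors; weights emphasize feature 0 over feature 1.
score = svm_decision(np.array([1.0, 2.0]),
                     sv_coeffs=[0.5, -0.25],
                     sv_vectors=[np.array([2.0, 0.0]), np.array([0.0, 4.0])],
                     bias=0.1,
                     feature_weights=np.array([2.0, 0.5]))
is_speech = score > 0                       # sign of f(x) gives the class
```

Because the weights enter the kernel evaluation directly, tuning them with a discriminative (MCE) objective changes which features dominate the speech/music boundary without retraining the SVM itself.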

Remote Articulation Training System for the Deaf (청각장애자를 위한 원격조음훈련시스템의 개발)

  • Shin, T.K.;Shin, C.H.;Lee, J.H.;Yoo, S.K.;Park, S.H.
    • Proceedings of the KOSOMBE Conference
    • /
    • v.1996 no.11
    • /
    • pp.114-117
    • /
    • 1996
  • In this study, a remote articulation training system that connects a hearing-disabled trainee and a speech therapist via B-ISDN is introduced. The hearing disabled lack auditory feedback on their own pronunciation, so the chance to watch the movement trajectory of their speech organs offers them self-training of articulation. The system thus has two purposes: self articulation training, and on-line checking by a trainer at a remote site. We estimate vocal tract articulatory movements from the speech signal using inverse modelling and graphically display the movement trajectory on a side view of the human face. The trainee's articulation trajectories are displayed along with reference trajectories, so the trainee can adjust his articulation to make the two trajectories overlap. For on-line communication and checking of training records, the system provides video conferencing and transfer of articulatory data.


Quality of Life in Older Adults with Cochlear Implantation: Can It Be Equal to That of Healthy Older Adults?

  • Tokat, Taskin;Muderris, Togay;Bozkurt, Ergul Basaran;Ergun, Ugurtan;Aysel, Abdulhalim;Catli, Tolgahan
    • Korean Journal of Audiology
    • /
    • v.25 no.3
    • /
    • pp.138-145
    • /
    • 2021
  • Background and Objectives: This study aimed to evaluate the audiologic results after cochlear implantation (CI) in older patients and the degree of improvement in their quality of life (QoL). Subjects and Methods: Patients over 65 years old who underwent CI at the implant center of Bozyaka Training and Research Hospital were included in this study (n=54; 34 males and 20 females). The control group consisted of patients over 65 years old with normal hearing (n=54; 34 males and 20 females). We administered three questionnaires [World Health Organization Quality of Life-BREF (WHOQOL-BREF), World Health Organization Quality of Life-OLD (WHOQOL-OLD), and the Geriatric Depression Scale (GDS)] to evaluate QoL, CI-related effects on activities of daily life, and social activities in all subjects. Moreover, correlations between speech recognition and the QoL scores were evaluated. The duration of implant use and comorbidities were also examined as potential factors affecting QoL. Results: The patients showed remarkable improvements in speech perception after CI (mean postoperative speech perception score, 75.7%). The scores for the WHOQOL-OLD and WHOQOL-BREF questionnaires were similar in the study and control groups, except those for two subdomains (social relations and social participation). Patients with longer-term CI use had higher scores than those with short-term CI use. In general, the changes in GDS scores were not significant (p>0.05). Conclusions: The treatment of hearing loss with CI conferred significant improvement in patients' QoL (p<0.01). The evaluation of QoL can provide multidimensional insights into a geriatric patient's progress and should therefore be considered by audiologists.