• Title/Summary/Keyword: Non-speech

Discrimination of Pathological Speech Using Hidden Markov Models

  • Wang, Jianglin;Jo, Cheol-Woo
    • Speech Sciences / v.13 no.3 / pp.7-18 / 2006
  • Diagnosis of pathological voice is an important issue in biomedical applications of speech technology. This study uses hidden Markov models (HMMs) to automatically discriminate between normal voice and voice affected by vocal fold disorder. The method is non-intrusive, inexpensive, and fully automated, and it requires only a speech sample from the subject. Speech data were collected from normal speakers and patients. Mel-frequency cepstral coefficients (MFCCs) were modeled by an HMM classifier. Left-to-right HMMs with 3, 5, and 7 states and 3 mixtures were built. The method achieves an accuracy of 93.8% on training data and 91.7% on test data in discriminating normal voice from vocal fold disorder voice for sustained /a/.
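
The left-to-right topology mentioned in this abstract can be illustrated with a minimal sketch. It builds an initial transition matrix for 3, 5, and 7 states in which each state may only stay where it is or advance to the next state. The function name `left_to_right_transitions`, the `stay_prob` value, and the absence of skip transitions are assumptions made for illustration; the abstract does not specify them.

```python
def left_to_right_transitions(n_states, stay_prob=0.5):
    """Build an initial left-to-right HMM transition matrix.

    Each state either repeats (probability stay_prob) or moves to the
    next state. The last state is absorbing. No skip transitions are
    allowed; this is an illustrative assumption.
    """
    matrix = []
    for i in range(n_states):
        row = [0.0] * n_states
        if i == n_states - 1:
            row[i] = 1.0  # final state can only loop on itself
        else:
            row[i] = stay_prob
            row[i + 1] = 1.0 - stay_prob
        matrix.append(row)
    return matrix


# The three model sizes named in the abstract.
for n in (3, 5, 7):
    print(n, left_to_right_transitions(n)[0])
```

In a full system, one such model would typically be trained per class and a test utterance assigned to the class whose model gives the higher likelihood. That training step is not shown here.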

AI-based language tutoring systems with end-to-end automatic speech recognition and proficiency evaluation

  • Byung Ok Kang;Hyung-Bae Jeon;Yun Kyung Lee
    • ETRI Journal / v.46 no.1 / pp.48-58 / 2024
  • This paper presents the development of language tutoring systems for non-native speakers by leveraging advanced end-to-end automatic speech recognition (ASR) and proficiency evaluation. Given the frequent errors in non-native speech, high-performance spontaneous speech recognition must be applied. Our systems accurately evaluate pronunciation and speaking fluency and provide feedback on errors by relying on precise transcriptions. End-to-end ASR is implemented and enhanced by using diverse non-native speaker speech data for model training. For performance enhancement, we combine semi-supervised and transfer learning techniques using labeled and unlabeled speech data. Automatic proficiency evaluation is performed by a model trained to maximize the statistical correlation between the fluency score manually determined by a human expert and a calculated fluency score. We developed an English tutoring system for Korean elementary students called EBS AI Peng-Talk and a Korean tutoring system for foreigners called KSI Korean AI Tutor. Both systems were deployed by South Korean government agencies.
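
The correlation between human and calculated fluency scores described in this abstract can be sketched as follows. The abstract does not say which correlation measure is used, so the Pearson coefficient here is an assumption, and the score lists are hypothetical values chosen only to make the example run.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical fluency scores: expert ratings vs. model outputs.
human_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
model_scores = [1.2, 1.9, 3.4, 3.8, 5.1]

print(round(pearson(human_scores, model_scores), 3))
```

A value close to 1 would indicate that the calculated scores rank and scale speakers much as the human expert does.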

End-to-end non-autoregressive fast text-to-speech (End-to-end 비자기회귀식 가속 음성합성기)

  • Kim, Wiback;Nam, Hosung
    • Phonetics and Speech Sciences / v.13 no.4 / pp.47-53 / 2021
  • Autoregressive Text-to-Speech (TTS) models suffer from inference instability and slow inference speed. Inference instability occurs when a poorly predicted sample at time step t affects all subsequent predictions. Slow inference speed arises from a model structure that requires the samples predicted at time steps 1 to t-1 in order to predict the sample at time step t. In this study, an end-to-end non-autoregressive fast text-to-speech model is proposed as a solution to these problems. The results show that this model's Mean Opinion Score (MOS) is close to that of Tacotron 2 - WaveNet, while its inference speed and stability are higher than those of Tacotron 2 - WaveNet. Further, this study aims to offer insight into the improvement of non-autoregressive models.

Dimension Reduction Method of Speech Feature Vector for Real-Time Adaptation of Voice Activity Detection (음성구간 검출기의 실시간 적응화를 위한 음성 특징벡터의 차원 축소 방법)

  • Park Jin-Young;Lee Kwang-Seok;Hur Kang-In
    • Journal of the Institute of Convergence Signal Processing / v.7 no.3 / pp.116-121 / 2006
  • In this paper, we propose a dimension reduction method for multi-dimensional speech feature vectors, intended for real-time adaptation in various noisy environments. The method reduces dimensions non-linearly by mapping the likelihoods of the speech feature vector and the noise feature vector. The LRT (Likelihood Ratio Test) is used to classify speech and non-speech. The detection results are similar to those obtained with the multi-dimensional speech feature vector. Speech recognition results on the detected speech data are also similar to those obtained with the multi-dimensional (10th-order MFCC, Mel-Frequency Cepstral Coefficient) speech feature vector.
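
The likelihood ratio test named in this abstract can be sketched as below. The sketch assumes a single diagonal-covariance Gaussian for each of the speech and noise classes and a zero log-ratio threshold. The abstract states none of these, and the model parameters are hypothetical.

```python
import math


def gaussian_log_likelihood(x, mean, var):
    """Log-likelihood of feature vector x under a diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def is_speech(x, speech_model, noise_model, threshold=0.0):
    """Likelihood ratio test: 'speech' if the log-ratio exceeds threshold."""
    log_ratio = (gaussian_log_likelihood(x, *speech_model)
                 - gaussian_log_likelihood(x, *noise_model))
    return log_ratio > threshold


# Hypothetical 2-dimensional models given as (mean, variance) pairs.
speech_model = ([5.0, 3.0], [1.0, 1.0])
noise_model = ([0.0, 0.0], [1.0, 1.0])

print(is_speech([4.5, 2.8], speech_model, noise_model))   # frame near the speech mean
print(is_speech([0.2, -0.1], speech_model, noise_model))  # frame near the noise mean
```

The two log-likelihoods computed inside `is_speech` form a two-value summary of the frame regardless of the input dimension, which is one way to read the abstract's mapping to likelihoods; that reading is an interpretation, not a detail confirmed by the source.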

Acoustic analysis of English lexical stress produced by Korean, Japanese and Taiwanese-Chinese speakers

  • Jung, Ye-Jee;Rhee, Seok-Chae
    • Phonetics and Speech Sciences / v.10 no.1 / pp.15-22 / 2018
  • Stressed vowels in English are usually produced using longer duration, higher pitch, and greater intensity than unstressed vowels. However, many English as a foreign language (EFL) learners have difficulty producing English lexical stress because their mother tongues do not have such features. In order to investigate if certain non-native English speakers (Korean, Japanese, and Taiwanese-Chinese native speakers) are able to produce English lexical stress in a native-like manner, speech samples were extracted from the L2 learners' corpus known as AESOP (the Asian English Speech cOrpus Project). Sixteen disyllabic words were analyzed in terms of the ratio of duration, pitch, and intensity. The results demonstrate that non-native English speakers are able to produce English stress in a similar way to native English speakers, and all speakers (both native and non-native) show a tendency to use duration as the strongest cue in producing stress. The results also show that the duration ratio of native English speakers was significantly higher than that of non-native speakers, indicating that native speakers produce a bigger difference in duration between stressed and unstressed vowels.

Automated Speech Analysis Applied to Sasang Constitution Classification (음성을 이용한 사상체질 분류 알고리즘)

  • Kang, Jae-Hwan;Yoo, Jong-Hyang;Lee, Hae-Jung;Kim, Jong-Yeol
    • Phonetics and Speech Sciences / v.1 no.3 / pp.155-163 / 2009
  • This paper introduces an automatic voice classification system for the diagnosis of individual constitution based on Sasang Constitutional Medicine (SCM) in Traditional Korean Medicine (TKM). To develop this algorithm, we used the voices of 473 speakers and extracted a total of 144 speech features from speech data consisting of five sustained vowels and one sentence. The classification system, based on a rule-based algorithm derived from a non-parametric statistical method, presents binary negative decisions. In conclusion, 55.7% of the speech data were diagnosed by this system, of which 72.8% were correct negative decisions.

Comparisons of Awareness of Health Care Services and Characteristics in Persons with Speech-Language Disorder Related to Speech Therapy Use for Life Care : From National Survey of the Disabled Person of 2017 (라이프 케어를 위한 언어장애인의 언어치료 이용여부에 따른 특성 및 보건의료서비스 인식 비교 : 2017년 장애인 실태조사를 이용하여)

  • Kang, So-La;Moon, Jong-Hoon
    • Journal of Korea Entertainment Industry Association / v.13 no.3 / pp.249-258 / 2019
  • Health care services are the most basic social institutions provided to citizens, including disabled persons, for the improvement of health. However, there has been insufficient study of how health care services differ according to speech therapy use among people with speech-language disorders. The aim of this investigation was to compare the awareness of health care services and the characteristics of people with speech-language disorders according to speech therapy use. The researchers selected 229 people with language disorder using raw data from the National Survey of the Disabled Person (2017). We compared the characteristics and health care services of people with speech-language disorders by distinguishing between speech therapy non-users and speech therapy users. Among the 229 people with language disorder, 37 (16.2%) were speech therapy users. Compared with non-users, users were younger, were more often preschoolers, and had higher family incomes, and intellectual disabilities and autistic disorder were their most common registered disability types. Users had a lower proportion of unmet medical needs than non-users. Among the reasons for unmet medical needs, "economic reasons" and "communication difficulties" accounted for 6.8% and 6.3%, respectively. Both users and non-users responded that "disability management services" need to be strengthened by the government. In conclusion, we suggest that access to health care services needs to be increased to lower the barriers to speech therapy use.

Voice Recognition Performance Improvement using the Convergence of Voice signal Feature and Silence Feature Normalization in Cepstrum Feature Distribution (음성 신호 특징과 셉스트럽 특징 분포에서 묵음 특징 정규화를 융합한 음성 인식 성능 향상)

  • Hwang, Jae-Cheon
    • Journal of the Korea Convergence Society / v.8 no.5 / pp.13-17 / 2017
  • Existing methods for extracting speech features from a speech signal produce recognition errors because the threshold value used to identify speech is not clear. This article proposes a modeling method that improves speech recognition performance by combining speech feature extraction with silence feature normalization applied to non-speech. The proposed method minimizes the effect of noise, and the speech recognition model combines speech signal feature extraction for each speech frame with silence feature normalization. The method also creates the original speech signal with an energy spectrum similar to entropy, so the speech is less affected by noise. Silence feature normalization improves performance in terms of signal-to-noise ratio. We fixed the reference value for classifying speech and non-speech in the cepstrum. In a performance analysis comparing the proposed method with CHMM and HMM, the recognition rate improved by 2.7%p in the speech-dependent case and by 0.7%p in the speech-independent case.

Non-Stationary/Mixed Noise Estimation Algorithm Based on Minimum Statistics and Codebook Driven Short-Term Predictor Parameter Estimation (최소 통계법과 Short-Term 예측계수 코드북을 이용한 Non-Stationary/Mixed 배경잡음 추정 기법)

  • Lee, Myeong-Seok;Noh, Myung-Hoon;Park, Sung-Joo;Lee, Seok-Pil;Kim, Moo-Young
    • The Journal of the Acoustical Society of Korea / v.29 no.3 / pp.200-208 / 2010
  • In this work, the minimum statistics (MS) algorithm is combined with codebook driven short-term predictor parameter estimation (CDSTP) to design a speech enhancement algorithm that is robust against various background noise environments. The MS algorithm works well for stationary noise but less well for non-stationary noise. The CDSTP works efficiently for non-stationary noise, but not for noise that was not considered in the training stage. Thus, we propose to combine CDSTP and MS. Compared with using MS or CDSTP alone, the proposed method produces a better perceptual evaluation of speech quality (PESQ) score and works especially well for background noise that mixes stationary and non-stationary noises.
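
The core idea behind minimum statistics can be sketched for a single frequency bin: smooth the noisy power over time, then take the minimum of the smoothed values over a sliding window as the noise estimate. This is a simplified illustration with a fixed smoothing factor and no bias compensation. The function name, the `alpha` and `window` values, and the frame data are assumptions, and the CDSTP part of the proposed method is not shown.

```python
from collections import deque


def minimum_statistics(power, alpha=0.85, window=8):
    """Estimate noise power per frame for one frequency bin.

    power  : per-frame noisy power values
    alpha  : recursive smoothing factor (illustrative value)
    window : number of recent frames searched for the minimum
    """
    smoothed = None
    recent = deque(maxlen=window)  # keeps only the last `window` values
    estimates = []
    for p in power:
        # First-order recursive smoothing of the noisy power.
        smoothed = p if smoothed is None else alpha * smoothed + (1.0 - alpha) * p
        recent.append(smoothed)
        # Speech raises the power only briefly, so the window minimum
        # tends to track the underlying noise floor.
        estimates.append(min(recent))
    return estimates


# Hypothetical bin: noise floor of 1.0 with a short, louder speech burst.
frames = [1.0] * 5 + [10.0] * 4 + [1.0] * 5
noise = minimum_statistics(frames)
print([round(n, 2) for n in noise])
```

Because the estimate can only rise after the whole window has moved past the older, lower values, it reacts slowly when the noise level jumps, which is consistent with the abstract's remark that MS handles non-stationary noise less well.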

The Lombard effect on the speech of children with intellectual disability (지적장애 아동의 롬바드 효과에 따른 말산출 특성)

  • Lee, Hyunju;Lee, Jiyun;Kim, Yukyung
    • Phonetics and Speech Sciences / v.8 no.4 / pp.115-122 / 2016
  • This study investigates the acoustic-phonetic features and speech intelligibility of Lombard speech in children with intellectual disability by examining the Lombard effect at three noise levels: no noise, 55 dB, and 65 dB. Eight children with intellectual disability read sentences and played speaking games, and their speech was analyzed in terms of intensity, pitch, vowel space of /a/, /i/, and /u/, VAI(3), articulation rate, and speech intelligibility. Results showed, first, that intensity and pitch increased as the noise level increased; second, that VAI(3) increased as the noise level increased; third, that articulation rate decreased as noise intensity increased; and finally, that speech intelligibility increased as noise intensity increased. Lombard speech also changed the VAI(3), vowel space, articulation rate, and speech intelligibility of the children with intellectual disability. This study suggests that Lombard speech will be clinically useful for persons who have intellectual disability and difficulties in self-control.