• Title/Summary/Keyword: speech error

Search Result 581, Processing Time 0.023 seconds

Phonetic Contrasts of One-syllable Words and Speech Intelligibility in Adults with Hearing Impairments (청각장애 성인의 일음절 낱말대조 명료도 특성)

  • Kim Soo-Jin;Do Yeon-Ji
    • MALSORI / no.56 / pp.1-13 / 2005
  • This study examined the speech intelligibility of one-syllable words with phonetic contrasts and analyzed segmental factors that can predict overall speech intelligibility in hearing-impaired adults. To identify speech error characteristics, a Korean word list was audio-recorded by 7 hearing-impaired adults, and 35 listeners selected the word they heard from 5 choices. Based in part on previous studies of the speech of the hearing impaired, the word list consisted of monosyllabic consonant-vowel-consonant (CVC) real-word pairs. The stimulus words included 77 phonetic contrast pairs. The results showed that the percentage of errors for contrasts in final position (coda) was higher than in any other position in the syllable. The factors underlying the intelligibility deficit in phonetic contrasts were then analyzed through stepwise regression. Overall intelligibility was predicted by the error rates of the manner contrast at coda, the voicing contrast (homorganic triplets) at onset, and the high-low contrast at nucleus.

Error Analysis of the Exponential RLS Algorithms Applied to Speech Signal Processing

  • Yoo, Kyung-Yul
    • The Journal of the Acoustical Society of Korea / v.15 no.3E / pp.78-85 / 1996
  • The set of admissible time variations in the input signal can be separated into two categories: slow parameter changes and large parameter changes that occur infrequently. A common approach to tracking slowly time-varying parameters is the exponential recursive least-squares (RLS) algorithm, and a variety of studies have analyzed its error for slowly time-varying parameters. This paper focuses on the error analysis of exponential RLS algorithms for input data with abrupt property changes, with the voiced speech signal as the principal application. To analyze the error performance, the deterministic properties of the exponential RLS algorithm are first analyzed for the case of abrupt parameter changes. The analysis indicates that an impulsive input (or error variance) synchronous with the abrupt change of the parameter vector actually enhances the convergence of the exponential RLS algorithm. The analysis has also been verified through simulations on synthetic speech signals.
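
The exponential RLS recursion mentioned in this abstract can be sketched as follows. This is a minimal pure-Python illustration of the textbook algorithm, not the paper's analysis; the function name, the forgetting factor 0.95, the initialization constant `delta`, and the synthetic two-tap signal with one abrupt parameter change are all assumptions made for the example.

```python
import random


def exponential_rls(inputs, targets, order, forgetting=0.95, delta=100.0):
    """Track filter weights with exponentially weighted RLS (textbook form)."""
    w = [0.0] * order
    # P approximates the inverse of the exponentially weighted input correlation matrix.
    P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
    errors = []
    for x, d in zip(inputs, targets):
        # A priori error: target minus the prediction from the current weights.
        e = d - sum(wi * xi for wi, xi in zip(w, x))
        Px = [sum(P[i][j] * x[j] for j in range(order)) for i in range(order)]
        denom = forgetting + sum(x[i] * Px[i] for i in range(order))
        k = [v / denom for v in Px]  # gain vector
        w = [wi + ki * e for wi, ki in zip(w, k)]
        # P stays symmetric, so x^T P equals the transpose of Px.
        P = [[(P[i][j] - k[i] * Px[j]) / forgetting for j in range(order)]
             for i in range(order)]
        errors.append(e)
    return w, errors


# Hypothetical test signal: the true weights jump abruptly at sample 200.
rng = random.Random(0)
xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(400)]
ds = [
    sum(a * b for a, b in zip([1.0, -0.5] if n < 200 else [-0.3, 0.8], x))
    for n, x in enumerate(xs)
]
w, errors = exponential_rls(xs, ds, order=2)
print(w)  # should end up close to the post-change weights [-0.3, 0.8]
```

A forgetting factor below 1 discounts old samples geometrically, which is what lets the estimate move to the new weights after the jump; values closer to 1 track more slowly but average out more noise.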

Convergent Analysis on the Speech Sound of Typically Developing Children Aged 3 to 5 : Focused on Word Level and Connected Speech Level (3-5세 일반아동의 말소리에 대한 융합적 분석: 단어와 자발화를 중심으로)

  • Kim, Yun-Joo;Park, Hyun-Ju
    • Journal of the Korea Convergence Society / v.9 no.6 / pp.125-132 / 2018
  • This study investigated the speech sound production characteristics of preschool children, and how they are evaluated, through a word test and a connected speech test. The authors administered the Assessment of Phonology and Articulation for Children (APAC) to 72 typically developing children (24 each of three-, four-, and five-year-olds) and analyzed differences in percentage of consonants correct (PCC) and intelligibility according to age and sex, the correlation between PCC and intelligibility, and speech sound error patterns. PCC and intelligibility increased with age, but there was no difference according to sex. The correlation was statistically significant in the 5-year-old group. Speech sound error patterns differed between the two tests. The study showed that children's speech sound production varies according to language unit, so both types of tests should be administered to assess speech sound production ability properly. This suggests that the current standard of identifying language impairment only by word-level PCC requires review and further study.
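
As a rough illustration of the PCC measure used in this study, the sketch below scores a produced consonant sequence against its target. It assumes the two sequences are already aligned one-to-one, which is a simplification: clinical scoring such as APAC involves transcription and judgment that this code does not model. The function name and the sample phonemes are hypothetical.

```python
def percent_consonants_correct(target, produced):
    """Return PCC: the share of target consonants produced correctly, in percent."""
    if not target:
        raise ValueError("target must contain at least one consonant")
    if len(target) != len(produced):
        raise ValueError("sequences must be aligned one-to-one")
    correct = sum(1 for t, p in zip(target, produced) if t == p)
    return 100.0 * correct / len(target)


# Hypothetical sample: /s/ is stopped to /t/, the other three consonants are correct.
pcc = percent_consonants_correct(["s", "k", "p", "t"], ["t", "k", "p", "t"])
print(pcc)  # 75.0
```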

Error Correction for Korean Speech Recognition using a LSTM-based Sequence-to-Sequence Model

  • Jin, Hye-won;Lee, A-Hyeon;Chae, Ye-Jin;Park, Su-Hyun;Kang, Yu-Jin;Lee, Soowon
    • Journal of the Korea Society of Computer and Information / v.26 no.10 / pp.1-7 / 2021
  • Most research on correcting speech recognition errors is based on English, and there is not enough research on Korean speech recognition. Compared to English, however, Korean speech recognition produces many errors due to linguistic characteristics of Korean such as fortis and liaison, so research on Korean speech recognition is needed. Furthermore, earlier works primarily focused on edit distance algorithms and syllable restoration rules, which makes it difficult to correct fortis and liaison error types. In this paper, we propose a context-sensitive post-processing model for speech recognition that uses an LSTM-based sequence-to-sequence model with the Bahdanau attention mechanism to correct Korean speech recognition errors caused by pronunciation. Experiments showed that the model improved speech recognition performance from 64% to 77% for fortis, from 74% to 90% for liaison, and from 69% to 84% on average. Based on these results, it seems possible to apply the proposed model to real-world applications based on speech recognition.
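
For context, the edit distance measure that the abstract attributes to earlier work can be sketched as below. This is the standard Levenshtein dynamic program, shown only to illustrate that baseline idea; it is not the proposed LSTM model, and the function names and the character-level error rate are assumptions for the example. Python iterates strings by code point, so precomposed Hangul syllables would each count as one unit.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    return edit_distance(ref, hyp) / len(ref)


print(edit_distance("kitten", "sitting"))         # 3
print(character_error_rate("kitten", "sitting"))  # 0.5
```

A distance like this scores surface mismatches only; it carries no notion of why a fortis or liaison pronunciation produced the mismatch, which is the gap the abstract says the sequence-to-sequence model is meant to address.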

Alveolar Fricative Sound Errors by the Type of Morpheme in the Spontaneous Speech of 3- and 4-Year-Old Children (자발화에 나타난 형태소 유형에 따른 3-4세 아동의 치경마찰음 오류)

  • Kim, Soo-Jin;Kim, Jung-Mee;Yoon, Mi-Sun;Chang, Moon-Soo;Cha, Jae-Eun
    • Phonetics and Speech Sciences / v.4 no.3 / pp.129-136 / 2012
  • Korean alveolar fricatives are late-developing speech sounds. Most previous research on phonemes elicited sounds with individual words or pseudo-words, but word-level phonological analysis does not always reflect a child's practical articulation ability. There has also been limited research on articulation development that looks at speech production by grammatical morpheme, despite its importance in the Korean language. This research therefore examines the articulation development and phonological patterns of the /s/ phoneme by morphological type in children's spontaneous conversational speech. The subjects were twenty-two typically developing 3- and 4-year-old Korean children. All children showed normal levels in three screening tests: hearing, vocabulary, and articulation. Spontaneous conversational samples were recorded at the children's homes. The results are as follows. Error rates decreased with increasing age in all morphological contexts. Within each age group, error percentages were significantly lower in lexical morphemes than in grammatical morphemes. Stopping of fricative sounds was the main error pattern in all morphological contexts and decreased with age. This research shows that articulation performance can differ significantly by morphological context. The study provides data that can be used to identify difficult contexts for the articulatory evaluation and therapy of alveolar fricative sounds.

An Automatic Post-processing Method for Speech Recognition using CRFs and TBL (CRFs와 TBL을 이용한 자동화된 음성인식 후처리 방법)

  • Seon, Choong-Nyoung;Jeong, Hyoung-Il;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications / v.37 no.9 / pp.706-711 / 2010
  • In applications of human speech interfaces, reducing the recognition error rate is one of the main research issues. Many previous studies attempted to correct errors through post-processing that depends on a manually constructed corpus and correction patterns. We propose an automatically learnable post-processing method that is independent of the characteristics of both the domain and the speech recognizer. We divide the post-processing task into two steps: error detection and error correction. We treat error detection as a classification problem and apply a conditional random fields (CRFs) classifier to it. We then apply transformation-based learning (TBL) to the error correction step. Our experimental results indicate that the proposed method corrects a speech recognizer's insertion, deletion, and substitution errors by 25.85%, 3.57%, and 7.42%, respectively.

A New Stylization Method using Least-Square Error Minimization on Segmental Pitch Contour (최소 자승오차 방식을 이용한 세그먼트 피치패턴의 정형화)

  • 이정철
    • Proceedings of the Acoustical Society of Korea Conference / 1994.06c / pp.107-110 / 1994
  • In this paper, we describe the features of the fundamental frequency (F0) contour of Korean read speech and propose a new stylization method to characterize the F0 pattern of segments. Our algorithm consists of three stylization processes: the segment level, the syllable level, and the word level. To stylize the F0 contour at the segment level, we applied a least-squares error minimization method to determine F0 values at the initial, medial, and final positions in a segment. At the syllable level, we determine the stylized F0 pattern of a syllable. Using the mean F0 value of each word and the style information for each word, syllable, and segment, we reconstruct the F0 contour of sentences. The simulation results show that the error is less than 10% of the actual F0 contour for each sentence. In a perception test, there was little difference between the speech synthesized with the original F0 contour and the speech synthesized with the stylized F0 contour.
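
One way to picture the segment-level step is a least-squares fit of three F0 values (initial, medial, final) to the measured contour of a segment. The sketch below is an assumption-laden illustration, not the paper's procedure: it places the medial point at the segment midpoint, uses linear interpolation between the three points, and all function names and sample values are hypothetical.

```python
def hat_basis(t):
    """Interpolation weights of the initial, medial, and final points at time t in [0, 1]."""
    if t <= 0.5:
        return [1.0 - 2.0 * t, 2.0 * t, 0.0]
    return [0.0, 2.0 - 2.0 * t, 2.0 * t - 1.0]


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def stylize_segment(times, f0):
    """Least-squares F0 values at the initial, medial, and final positions."""
    rows = [hat_basis(t) for t in times]
    # Normal equations: (A^T A) v = A^T y, with A built from the interpolation weights.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, f0)) for i in range(3)]
    return solve3(ata, aty)


# Hypothetical contour in Hz that rises from 100 to 140, then falls to 120.
times = [i / 10 for i in range(11)]
f0 = [100 + 80 * t if t <= 0.5 else 160 - 40 * t for t in times]
knots = stylize_segment(times, f0)
print(knots)  # approximately [100.0, 140.0, 120.0]
```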

Semantic-oriented Error Correction for Spoken Query Processing (음성 질의 처리를 위한 의미 기반 오류 수정)

  • Jeong Minwoo;Kim Byeongchang;Lee Gary Geunbae
    • Proceedings of the KSPS conference / 2003.10a / pp.153-156 / 2003
  • Voice input is often required in new application environments such as telephone-based information retrieval, car navigation systems, and user-friendly interfaces, but the low success rate of speech recognition makes it difficult to extend its application to new fields. Popular approaches to increasing recognition accuracy post-process the recognition results, but previous approaches to post error correction were mainly lexical-oriented. We suggest a new semantic-oriented approach that corrects both semantic-level and lexical errors and is also more accurate, especially for domain-specific speech error correction. Through extensive experiments using a speech-driven in-vehicle telematics information application, we demonstrate the superior performance of our approach and some of its advantages over previous lexical-oriented approaches.

Performance Improvement of Speech Recognition Based on SPLICE in Noisy Environments (SPLICE 방법에 기반한 잡음 환경에서의 음성 인식 성능 향상)

  • Kim, Jong-Hyeon;Song, Hwa-Jeon;Lee, Jong-Seok;Kim, Hyung-Soon
    • MALSORI / no.53 / pp.103-118 / 2005
  • The performance of a speech recognition system is degraded by mismatch between training and test environments. Recently, Stereo-based Piecewise LInear Compensation for Environments (SPLICE) was introduced to overcome environmental mismatch using stereo data. In this paper, we propose several methods to improve conventional SPLICE and evaluate them on the Aurora2 task. We generalize SPLICE to compensate for the covariance matrix as well as the mean vector in the feature space, thereby yielding an error rate reduction of 48.93%. We also employ the weighted sum of correction vectors using the posterior probabilities of all Gaussians, which achieves an error rate reduction of 48.62%. With the combination of these two methods, the error rate is reduced by 49.61% relative to the Aurora2 baseline system.
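
The abstract does not spell out how its error rate reductions are computed. A common convention, assumed here, is the relative reduction with respect to the baseline error rate; the function name and the sample rates below are hypothetical and are not the paper's figures.

```python
def relative_error_rate_reduction(baseline_error, new_error):
    """Relative reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical example: a 40.0% baseline word error rate lowered to 20.0%.
reduction = relative_error_rate_reduction(40.0, 20.0)
print(reduction)  # 50.0
```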

Performance Analysis of a Class of Single Channel Speech Enhancement Algorithms for Automatic Speech Recognition (자동 음성 인식기를 위한 단채널 음질 향상 알고리즘의 성능 분석)

  • Song, Myung-Suk;Lee, Chang-Heon;Lee, Seok-Pil;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea / v.29 no.2E / pp.86-99 / 2010
  • This paper analyzes the performance of various single-channel speech enhancement algorithms when they are applied to automatic speech recognition (ASR) systems as a preprocessor. The functional modules of speech enhancement systems are first divided into four major modules: a gain estimator, a noise power spectrum estimator, an a priori signal-to-noise ratio (SNR) estimator, and a speech absence probability (SAP) estimator. We investigate the relationship between speech recognition accuracy and the role of each module. Simulation results show that the Wiener filter outperforms other gain functions, such as the minimum mean square error short-time spectral amplitude (MMSE-STSA) and minimum mean square error log-spectral amplitude (MMSE-LSA) estimators, when a perfect noise estimator is applied. When the performance of the noise estimator degrades, however, MMSE methods that include the decision-directed module for estimating the a priori SNR and the SAP estimation module help to improve the performance of the enhancement algorithm for speech recognition systems.
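
Two of the modules named in this abstract have compact textbook forms, sketched below for a single frequency bin: the Wiener gain as a function of the a priori SNR, and the decision-directed a priori SNR estimate. The function names, the smoothing constant 0.98, and the sample values are assumptions for illustration; the paper's own configuration is not given in the abstract.

```python
def wiener_gain(xi):
    """Wiener gain for one frequency bin, given the a priori SNR xi (linear scale)."""
    return xi / (1.0 + xi)


def decision_directed_xi(gamma, prev_gain, prev_gamma, alpha=0.98):
    """Decision-directed a priori SNR estimate.

    gamma is the current a posteriori SNR; prev_gain and prev_gamma are the
    gain and a posteriori SNR of the previous frame in the same bin.
    """
    previous_term = prev_gain ** 2 * prev_gamma  # previous clean-speech power over noise power
    current_term = max(gamma - 1.0, 0.0)         # instantaneous SNR estimate, floored at 0
    return alpha * previous_term + (1.0 - alpha) * current_term


# Hypothetical values for one bin across two consecutive frames.
xi = decision_directed_xi(gamma=5.0, prev_gain=0.5, prev_gamma=4.0)
gain = wiener_gain(xi)
print(xi, gain)  # xi is about 1.06, gain is about 0.515
```

With `alpha` close to 1 the estimate leans heavily on the previous frame, which smooths the gain over time at the cost of reacting more slowly to speech onsets.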