• Title/Summary/Keyword: speech error

Word Recognition using Fuzzy Inference based on LPC (선형예측계수에 기초한 퍼지추론 단어 인식)

  • Choi, Seung-Ho;Kim, Hyeong-Geun
    • The Journal of the Acoustical Society of Korea / v.13 no.1 / pp.32-41 / 1994
  • To handle the frequency variation of speech patterns that consist of LPC sequences, a new membership function is proposed, derived from the LPC, the spectrum, and the relation between the LPC order and the spectrum. To handle the time variation, a multi-section equi-segmentation method, which divides the speech section equally into several sections, is applied. False recognition occurs mainly when the same syllable is placed in the same utterance. To reduce this error, fuzzy inference is executed using the proposed membership function, weights are assigned to the sectional certainty, and a decision method is then applied to the recognized sections up to the third candidate. To validate the method, we ran a recognition test on 28 DDD area names. The recognition rate of fuzzy inference with the triangular membership function is $92\%$, that of the combined fuzzy inference and decision method is $92.9\%$, and that of fuzzy inference with the proposed membership function is $93.8\%$.
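
The abstract does not give the proposed membership function, so the following is only a minimal sketch of the baseline it is compared against: a triangular membership function, followed by a weighted combination of per-section certainties. The function names, the parameters `left`, `peak`, `right`, and the weighting scheme are assumptions made for illustration, not the authors' method.

```python
def triangular_membership(x, left, peak, right):
    """Degree of membership (0.0 to 1.0) of x in a triangular fuzzy set."""
    if not (left < peak < right):
        raise ValueError("expected left < peak < right")
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        # Rising edge of the triangle.
        return (x - left) / (peak - left)
    # Falling edge of the triangle.
    return (right - x) / (right - peak)


def weighted_certainty(section_scores, weights):
    """Combine per-section membership degrees into one weighted certainty.

    Hypothetical helper: each speech section contributes its membership
    degree scaled by a weight, and the result is normalized by the total weight.
    """
    if len(section_scores) != len(weights):
        raise ValueError("one weight per section is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(section_scores, weights)) / total
```

A candidate word would then be scored by evaluating the membership of each section's features and passing the per-section degrees to `weighted_certainty`.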

Speech Recognition Using Formant Bandwidth Normalization (포만트 밴드폭 정규화를 이용한 음성인식)

  • 홍종진;강석건;박군작;박규태
    • The Journal of Korean Institute of Communications and Information Sciences / v.16 no.5 / pp.458-467 / 1991
  • In this paper, the cause of linear prediction error is analysed, the theoretical basis for normalizing the formant bandwidth to 0 is given, and its validity is verified. The formant and bandwidth are measured in relation to the position of the poles of the AR filter in order to analyse the relation between the pole position and the formant bandwidth. By changing the glottis reflection coefficient to 1, the effect of the glottis is eliminated, and as a result new linear prediction coefficients are obtained by normalizing the formant bandwidth of the signal to 0. Since these coefficients are symmetrical, the standard deviation is larger than that of the coefficients with a fixed glottis reflection coefficient. The bit rate for speech coding can be reduced by a factor of 2 without any loss of information. In computer simulation, a recognition rate of 96.7% is obtained using the proposed algorithm to recognize 5 Korean vowels in a noisy environment.

Robust Blind Source Separation to Noisy Environment For Speech Recognition in Car (차량용 음성인식을 위한 주변잡음에 강건한 브라인드 음원분리)

  • Kim, Hyun-Tae;Park, Jang-Sik
    • The Journal of the Korea Contents Association / v.6 no.12 / pp.89-95 / 2006
  • The performance of blind source separation (BSS) using independent component analysis (ICA) declines significantly in a reverberant environment. The post-processing method proposed in this paper is designed to remove the residual component precisely. The proposed method uses a modified NLMS (normalized least mean square) filter in the frequency domain to estimate the cross-talk path that causes residual cross-talk components. Residual cross-talk components in one channel correspond to direct components in another channel. Therefore, the cross-talk path can be estimated with an adaptive filter that uses the other channel's input signals. In the conventional NLMS filter the step size is normalized by the input signal power, whereas in the modified NLMS filter it is normalized by the sum of the input signal power and the error signal power. This prevents misadjustment of the filter weights. The estimated residual cross-talk components are subtracted by non-stationary spectral subtraction. Computer simulation results using speech signals show that the proposed method improves the noise reduction ratio (NRR) by approximately 3 dB over conventional FDICA.
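
The step-size rule described above can be sketched as follows. This is a time-domain illustration only (the paper works in the frequency domain), and it assumes the "error signal power" is the instantaneous squared error. The function name `modified_nlms`, the parameters `num_taps`, `mu`, and `eps`, and the 2-tap toy path are all hypothetical.

```python
import random


def modified_nlms(x, d, num_taps=4, mu=0.5, eps=1e-8):
    """Adapt FIR weights so the filtered input x tracks d.

    Returns (weights, errors). Assumption: error power is taken as the
    instantaneous squared error e_n ** 2.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input samples, newest first
    errors = []
    for x_n, d_n in zip(x, d):
        buf = [x_n] + buf[:-1]
        y_n = sum(wi * bi for wi, bi in zip(w, buf))
        e_n = d_n - y_n
        input_power = sum(b * b for b in buf)
        # Conventional NLMS would divide by input_power alone; the modified
        # rule adds the error power, shrinking the step when the error is large.
        norm = input_power + e_n * e_n + eps
        w = [wi + mu * e_n * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e_n)
    return w, errors


# Toy check: identify a hypothetical 2-tap cross-talk path from noise-free data.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
true_path = [0.5, -0.3]
d = [
    true_path[0] * x[n] + (true_path[1] * x[n - 1] if n > 0 else 0.0)
    for n in range(len(x))
]
weights, errors = modified_nlms(x, d, num_taps=2)
```

With noise-free data the estimated `weights` should approach `true_path`; with real speech the extra error term mainly limits large weight jumps when the error is momentarily high.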

Effects of auditory and visual presentation on phonemic awareness in 5- to 6- year-old children (청각적 말소리 자극과 시각적 글자 자극 제시방법에 따른 5, 6세 일반아동의 음소인식 수행력 비교)

  • Kim, Myung-Heon;Ha, Ji-Wan
    • Phonetics and Speech Sciences / v.8 no.1 / pp.71-80 / 2016
  • Phonemic awareness tasks (phonemic synthesis, phonemic elision, and phonemic segmentation) were administered under auditory and visual presentation to 40 children aged 5 and 6. Scores and error types in the sub-tasks were compared between the two presentation conditions, and the correlations between performance on the sub-tasks in the two conditions were examined. The 6-year-old group showed significantly higher phonemic awareness scores than the 5-year-old group. Both groups scored significantly higher under visual presentation than under auditory presentation. Under visual presentation, performance was significantly lower in segmentation than in the other two tasks, whereas under auditory presentation there was no significant difference among the sub-tasks. The 5-year-old group made significantly more 'no response' errors than the 6-year-old group, and the 6-year-old group made significantly more 'phoneme substitution' and 'phoneme omission' errors than the 5-year-old group. Significantly more 'phoneme omission' errors were observed in the segmentation task than in the elision task, and significantly more 'phoneme addition' errors were observed in the elision task than in the synthesis task. Lastly, there were positive correlations between the auditory and visual synthesis tasks, between the auditory and visual elision tasks, and between the auditory and visual segmentation tasks. In summary, children tend to depend on orthographic knowledge when first acquiring phonemic awareness. The results therefore support the position that orthographic knowledge affects the improvement of phonemic awareness.

The influence of Chinese high and level tone and rising tone on the pitch of Sino-Korean words pronounced by Chinese learners: Focusing on synonym with the same letters (중국인의 한국어 한자어 발음에서 보이는 중국어 음평과 양평의 영향: 동형동의어를 중심으로)

  • Liu, Si-Yang;Kim, Young-Joo
    • Phonetics and Speech Sciences / v.3 no.3 / pp.35-47 / 2011
  • The purpose of this study is to examine the influence of the Chinese high and level tone and the rising tone on the pitch pattern of corresponding Sino-Korean words produced by Chinese learners of Korean, and to examine how these two tones of the corresponding Chinese words affect the pitch patterns of Sino-Korean words. The scope of this research is limited to Chinese learners of Korean pronouncing same-form-same-meaning Sino-Korean words. In this study, Chinese learners pronounced both Chinese words and the corresponding Sino-Korean words. The learners' pitch patterns were recorded and analyzed using software and compared with the tones of the corresponding Chinese words. Experimental results showed that Sino-Korean words starting with lenis sounds were affected by the Chinese 'high and level tone - high and level tone', 'high and level tone - rising tone', 'high and level tone - falling-rising tone', 'high and level tone - falling tone', and 'rising tone - falling tone' patterns. Sino-Korean words starting with aspirated sounds, on the other hand, were affected by the Chinese 'rising tone - high and level tone', 'rising tone - rising tone', 'rising tone - falling-rising tone', and 'rising tone - falling tone' patterns. In conclusion, the Chinese learners' pitch patterns of Sino-Korean words are affected by both the Chinese high and level tone and the rising tone: words starting with lenis sounds are affected more by the high and level tone, whereas words starting with aspirated sounds are affected more by the rising tone.

The Influence of Chinese Falling-Rising Tone on the Pitch of Sino-Korean Words Pronounced by Chinese Learners: Focusing on the Partly-Different-Form-Same-Meaning Words (중국어 상성이 중국인의 한자어 발음에 미치는 영향 연구: 부분이형동의어를 중심으로)

  • Liu, Si Yang;Kim, Young-Joo
    • Phonetics and Speech Sciences / v.4 no.2 / pp.21-31 / 2012
  • The purpose of this study is to find the influence of the Chinese falling-rising tone on the pitch pattern of corresponding partly-different-form-same-meaning Sino-Korean words produced by Chinese learners of Korean, and to examine how the falling-rising tone of the corresponding Chinese words affects the pitch patterns of Sino-Korean words. The scope of this research is limited to Chinese learners of Korean and to two groups of Sino-Korean words, the AB:CB type and the AB:AC type, which are the second-most frequently occurring different-form-same-meaning Sino-Korean words. In this study, Chinese learners pronounced both Chinese words and the corresponding Sino-Korean words. The learners' pitch patterns were recorded and analyzed using software and compared with the tones of the corresponding Chinese words. Experimental results showed that AB:CB type Sino-Korean words were not affected by the Chinese 'falling-rising tone - high and level tone' pattern. Likewise, the Chinese falling-rising tone had no significant influence on the pitch pattern of AB:AC type Sino-Korean words. It was clear, however, that the Chinese learners made pitch errors on both AB:CB type and AB:AC type Sino-Korean words. In conclusion, the Chinese learners' pitch patterns of partly-different-form-same-meaning Sino-Korean words differ from those of native Korean speakers, but their pitch errors cannot be attributed to the Chinese falling-rising tone.

Performance Evaluation of Reverse Link for Speech and Data Traffic in CDMA-Based IMT-2000 System (CDMA 방식의 IMT-2000 시스템에서 음성 및 데이터 트래픽에 대한 역방향링크의 성능 평가)

  • Lee, Hyun;Kang, Bob-Joo;You, Young-Gap;Cho, Kyoung-Rok
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.11 no.4 / pp.657-665 / 2000
  • In this study, the bit error rate (BER) performance for speech and data traffic is evaluated through reverse link simulation of a CDMA-based IMT-2000 system. The reverse link is simulated for the indoor, pedestrian, and vehicular environments provided by ITU-R. In these simulations, fast power control at a rate of 1.6 kHz is applied. The amplitude and phase of the fading signal are estimated using a 5-tap FIR filter, and soft-decision Viterbi and Reed-Solomon (RS) decoding are applied. The simulation results provide the optimum ratio of pilot power to traffic power, the BER performance according to the number of fingers, and a performance comparison between the convolutional code and the concatenated code at a BER of $10^{-6}$ in a 5 MHz system.

Automatic Generation of Concatenate Morphemes for Korean LVCSR (대어휘 연속음성 인식을 위한 결합형태소 자동생성)

  • 박영희;정민화
    • The Journal of the Acoustical Society of Korea / v.21 no.4 / pp.407-414 / 2002
  • In this paper, we present a method that automatically generates concatenated-morpheme-based language models to improve the performance of Korean large vocabulary continuous speech recognition. The focus is on reducing recognition errors of monosyllable morphemes, which occupy 54% of the training text corpus and are more frequently misrecognized. The knowledge-based method using POS patterns has disadvantages such as the difficulty of writing rules and the production of many low-frequency concatenated morphemes. The proposed method automatically selects morpheme pairs from the training text data based on measures such as frequency, mutual information, and unigram log likelihood. Experiments were performed using a 7M-morpheme text corpus and a 20K-morpheme lexicon. The frequency measure with a constraint on the number of morphemes used for concatenation produces the best result, reducing monosyllables from 54% to 30%, bigram perplexity from 117.9 to 97.3, and MER from 21.3% to 17.6%.
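
Two of the selection measures named above, pair frequency and mutual information, can be sketched as follows. The abstract does not give the exact formulas or thresholds, so this assumes pointwise mutual information computed from adjacent-pair counts. The function name `score_pairs` and the toy input are hypothetical.

```python
import math
from collections import Counter


def score_pairs(morphemes):
    """Score adjacent morpheme pairs by frequency and pointwise mutual information.

    Returns a dict mapping (first, second) to (pair_count, pmi_in_bits).
    """
    if len(morphemes) < 2:
        return {}
    unigrams = Counter(morphemes)
    bigrams = Counter(zip(morphemes, morphemes[1:]))
    n_uni = len(morphemes)
    n_bi = len(morphemes) - 1
    scores = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        # PMI > 0 means the pair co-occurs more often than independence predicts.
        scores[(a, b)] = (count, math.log2(p_ab / (p_a * p_b)))
    return scores
```

Candidate pairs for concatenation could then be ranked by either element of the returned tuple, for example by count with a cap on how many morphemes may be merged.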

Normalization of Spectral Magnitude and Cepstral Transformation for Compensation of Lombard Effect (롬바드 효과의 보정을 위한 스펙트럼 크기의 정규화와 켑스트럼 변환)

  • Chi, Sang-Mun;Oh, Yung-Hwan
    • The Journal of the Acoustical Society of Korea / v.15 no.4 / pp.83-92 / 1996
  • This paper describes Lombard effect compensation and noise suppression for reducing speech recognition errors in noisy environments. The Lombard effect is represented by the variation of the spectral envelope of the energy-normalized word and the variation of overall vocal intensity. The variation of the spectral envelope can be compensated by a linear transformation in the cepstral domain. The variation of vocal intensity is canceled by spectral magnitude normalization. Spectral subtraction is used to suppress noise contamination, and band-pass filtering is used to emphasize dynamic features. To understand the Lombard effect and verify the effectiveness of the proposed method, speech data were collected in simulated noisy environments. Recognition experiments were conducted with contamination by noise from automobile cabins, an exhibition hall, downtown telephone booths, crowded streets, and computer rooms. The experiments confirmed the effectiveness of the proposed method.
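
The noise-suppression step can be illustrated with a basic magnitude-domain spectral subtraction. This is a generic textbook form, not necessarily the variant used in the paper. The function name `spectral_subtract`, the over-subtraction factor `alpha`, and the spectral `floor` are assumptions, and the inputs are taken to be magnitude spectra already computed for one frame.

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, floor=0.01):
    """Subtract a noise magnitude estimate from a noisy magnitude spectrum.

    noisy_mag and noise_mag are per-bin magnitudes for one frame.
    alpha scales the noise estimate; floor keeps a small fraction of the
    noisy magnitude so that no bin becomes negative.
    """
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    cleaned = []
    for y, n in zip(noisy_mag, noise_mag):
        cleaned.append(max(y - alpha * n, floor * y))
    return cleaned
```

In practice the noise estimate would come from frames judged to contain no speech, and the cleaned magnitudes would be recombined with the noisy phase before feature extraction.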

Part-Of-Speech Tagging using multiple sources of statistical data (이종의 통계정보를 이용한 품사 부착 기법)

  • Cho, Seh-Yeong
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.4 / pp.501-506 / 2008
  • Statistical POS tagging is prone to error because of the inherent limitations of statistical data, especially data from a single source. It is therefore widely agreed that the possibility of further enhancement lies in exploiting various knowledge sources. However, these data sources are bound to be inconsistent with each other. This paper shows the possibility of applying a maximum entropy model to Korean POS tagging. We use n-gram data and trigger pair data as the knowledge sources, and we show how the perplexity measure varies when the two knowledge sources are combined using the maximum entropy method. The experiment used a trigram model that produced 94.9% accuracy with a Hidden Markov Model, and accuracy increased to 95.6% when the model was combined with trigger pair data using the maximum entropy method. This shows the possibility of further enhancement when various knowledge sources are developed and combined using the ME method.