Search | Korea Science

Fractal Dimension Method for Connected-digit Recognition (연속음 처리를 위한 프랙탈 차원 방법 고찰)

Kim, Tae-Sik
Speech Sciences / v.10 no.2 / pp.45-55 / 2003
A strange attractor can be used as a representation method for signal processing, and the fractal dimension is a well-known method for extracting features from an attractor. Although the method provides powerful capabilities for speech processing, it has a drawback that must be resolved first. Normally, the raw signal must be long enough for the fractal dimension to be computed. In connected-digit recognition, however, processing is usually based on syllables or semi-syllables, and in that case it is unclear whether there is enough data to extract the characteristics of the attractor. This paper discusses the relationship between the length of the signal data and the computed fractal dimension, and also discusses an efficient way to apply the method to connected-digit recognition.

A Study on Spoken Digits Analysis and Recognition (숫자음 분석과 인식에 관한 연구)

김득수;황철준
Journal of Korea Society of Industrial Information Systems / v.6 no.3 / pp.107-114 / 2001
This paper describes connected digit recognition for Korean that takes acoustic features into account. The recognition rate for connected digits is usually lower than for word recognition. Therefore, speech feature parameters and acoustic features are employed to build a robust digit model, and we confirmed the effect of considering acoustic features through recognition experiments. We used the KLE 4 connected digit data as the database and 19 continuous-distribution HMMs as PLUs (Phoneme Like Units) based on phonetic rules. For the recognition experiments, we tested two cases. In the first case, we used the usual method of constructing phoneme models with Mel-Cepstrum and Regressive Coefficients. In the second case, we used expanded feature parameters and acoustic features to construct the phoneme models. In both cases, we employed OPDP (One Pass Dynamic Programming) and FSA (Finite State Automata) for the recognition tests. When applying FSN for recognition, we applied various acoustic features. As a result, we obtained a recognition rate of 55.4% for Mel-Cepstrum and 67.4% for Mel-Cepstrum with Regressive Coefficients. We also obtained a recognition rate of 74.3% for the expanded feature parameters and 75.4% when applying acoustic features. Since applying acoustic features gave better results than the former method, we conclude that the suggested method is effective for connected digit recognition in Korean.

An Isolated Word Recognition Using the Mellin Transform (Mellin 변환을 이용한 격리 단어 인식)

김진만;이상욱;고세문
Journal of the Korean Institute of Telematics and Electronics / v.24 no.5 / pp.905-913 / 1987
This paper presents a speaker-dependent isolated digit recognition algorithm using the Mellin transform. Since the Mellin transform converts scale information into phase information, attempts have been made to use this scale-invariance property to alleviate the time-normalization procedure required for speech recognition. It has been found that good results can be obtained by applying the Mellin transform to features such as ZCR, log energy, normalized autocorrelation coefficients, the first predictor coefficient, and normalized prediction error. We employed a difference function to evaluate the similarity between two patterns. When the proposed algorithm was tested on Korean digit words, a recognition rate of 83.3% was obtained. The recognition accuracy is not comparable to that of other techniques such as LPC distance; however, it is believed that the Mellin transform can effectively perform the time-normalization processing for speech recognition.

Self-Adaptation Algorithm Based on Maximum A Posteriori Eigenvoice for Korean Connected Digit Recognition (한국어 연결 숫자음 인식을 일한 최대 사후 Eigenvoice에 근거한 자기적응 기법)

Kim Dong Kook;Jeon Hyung Bae
The Journal of the Acoustical Society of Korea / v.23 no.8 / pp.590-596 / 2004
This paper presents a new self-adaptation algorithm based on maximum a posteriori (MAP) eigenvoice for Korean connected digit recognition. The proposed MAP eigenvoice is developed by introducing a probability density model for the eigenvoice coefficients. The proposed approach provides a unified framework that incorporates the prior model into conventional eigenvoice estimation. Because the self-adaptation system uses only the single utterance to be recognized as adaptation data, we use MAP eigenvoice, which is the most robust adaptation method. In a series of self-adaptation experiments on the Korean connected digit recognition task, we demonstrate that the proposed approach performs better than the conventional eigenvoice algorithm for a small amount of adaptation data.

KOREAN DIGIT RECOGNITION IN NOISE ENVIRONMENT USING SPECTRAL MAPPING TRAINING

Ki Young Lee
Proceedings of the Acoustical Society of Korea Conference / 1994.06a / pp.1015-1020 / 1994
This paper presents a Korean digit recognition method for noisy environments using spectral mapping training based on a static supervised adaptation algorithm. In the presented method, mapping from the space of noisy speech spectra to the space of noise-free speech spectra reduces the spectral distortion of noisy speech, and the recognition rate is higher than that of the conventional method using VQ and DTW without noise processing. Even when the SNR is 0 dB, the recognition rate is 10 times that of the conventional method. It has been confirmed that spectral mapping training can improve recognition performance for speech in noisy environments.

Recognition of Korean Connected Digit Telephone Speech Using the Training Data Based Temporal Filter (훈련데이터 기반의 temporal filter를 적용한 4연숫자 전화음성 인식)

Jung, Sung-Yun;Bae, Keun-Sung
MALSORI / no.53 / pp.93-102 / 2005
The performance of a speech recognition system generally degrades in a telephone environment because of distortions caused by background noise and varying channel characteristics. In this paper, data-driven temporal filters are investigated to improve performance on a specific recognition task, telephone speech. Three different temporal filtering methods are presented, with recognition results for Korean connected-digit telephone speech. Filter coefficients are derived from cepstral-domain feature vectors using principal component analysis. According to the experimental results, the proposed temporal filtering method performed slightly better than the previous ones.

The Relationship between Neurocognitive Functioning and Emotional Recognition in Chronic Schizophrenic Patients (만성 정신분열병 환자들의 인지 기능과 정서 인식 능력의 관련성)

Hwang, Hye-Li;Hwang, Tae-Yeon;Lee, Woo-Kyung;Han, Eun-Sun
Korean Journal of Biological Psychiatry / v.11 no.2 / pp.155-164 / 2004
Objective: The present study examined the association between basic neurocognitive functions and emotional recognition in chronic schizophrenia. It also investigated the cognitive variables related to emotion recognition in schizophrenia. Methods: Forty-eight patients from the Yongin Psychiatric Rehabilitation Center were evaluated for neurocognitive function and with the Emotional Recognition Test, which has four subscales: finding emotional clues, discriminating emotions, understanding emotional context, and emotional capacity. Measures of neurocognitive functioning were selected based on hypothesized relationships to the perception of emotion. These measures included: 1) Letter Number Sequencing Test, a measure of working memory; 2) Word Fluency and Block Design, measures of executive function; 3) Hopkins Verbal Learning Test-Korean version, a measure of verbal memory; 4) Digit Span, a measure of immediate memory; 5) Span of Apprehension Task, a measure of early visual processing and visual scanning; 6) Continuous Performance Test, a measure of sustained attention. Correlation analyses were performed between the specific neurocognitive measures and the Emotional Recognition Test. To examine the degree to which neurocognitive performance predicts emotional recognition, hierarchical regression analyses were also performed. Results: Working memory and verbal memory were closely related to emotional discrimination. Working memory, Span of Apprehension, and Digit Span were closely related to contextual recognition. Among the cognitive measures, Span of Apprehension, working memory, and Digit Span were the most important variables in predicting emotional capacity. Conclusion: These results are relevant considering that emotional information processing depends, in part, on the ability to scan the context and to use immediate working memory. These results indicate that a multifaceted cognitive training program combined with the Emotional Recognition Task (Cognitive Behavioral Rehabilitation Therapy combined with an Emotional Management Program) is promising.

Auditory Recognition of Digit-in-Noise under Unaided and Aided Conditions in Moderate and Severe Sensorineural Hearing Loss

Aghasoleimani, Mina;Jalilvand, Hamid;Mahdavi, Mohammad Ebrahim;Ahmadi, Roghayeh
Journal of Audiology & Otology / v.25 no.2 / pp.72-79 / 2021
Background and Objectives: The speech-in-noise test is typically performed using an audiometer. The results of the digit-in-noise recognition (DIN) test may be influenced by the flat frequency response of free-field audiometry and frequency of the hearing aid fit based on fitting rationale. This study aims to investigate the DIN test in unaided and aided conditions. Subjects and Methods: Thirty-four adults with moderate and severe sensorineural hearing loss (SNHL) participated in the study. The signal-to-noise ratio (SNR) for 50% of the DIN test was obtained in the following two conditions: 1) the unaided condition, performed using an audiometer in a free field; and 2) the aided condition, performed using a hearing aid with an unvented individual earmold that was fitted based on NAL-NL2. Results: There was a statistically significant elevation in the mean SNR for the severe SNHL group in both test conditions when compared with that of the moderate SNHL group. In both groups, the SNR for the aided condition was significantly lower than that of the unaided condition. Conclusions: Speech recognition in hearing-impaired patients can be realized by fitting hearing aids based on evidence-based fitting rationale rather than by measuring it using free-field audiometry measurement that is utilized in a routine clinic setup.
https://doi.org/10.7874/jao.2020.00094

Korean Digit Speech Recognition Dialing System using Filter Bank (필터뱅크를 이용한 한국어 숫자음 인식 다이얼링 시스템)

박기영;최형기;김종교
Journal of the Institute of Electronics Engineers of Korea TE / v.37 no.5 / pp.62-70 / 2000
In this study, speech recognition for Korean digits is performed using a filter bank together with discrete HMM and DTW. Spectral analysis reveals speech signal features that are mainly due to the shape of the vocal tract. Spectral features of speech are generally obtained as the outputs of filter banks, which integrate the spectrum over defined frequency ranges. A set of 8 band-pass filters is generally used since it simulates human ear processing. The defined frequency ranges are 320-330, 450-460, 640-650, 840-850, 900-1000, 1100-1200, 2000-2100, and 3900-4000 Hz, and the signal is sampled at a rate of 8 kHz. The frame width is 20 ms and the frame period is 10 ms. In the experimental results, the recognition rate of DTW was better than that of HMM for Korean digit speech. The recognition accuracy of Korean digit speech using the filter bank is 93.3% for the 24th BPF, 89.1% for the 16th BPF, and 88.9% for the 8th BPF in the hardware realization of the voice dialing system.
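
As a rough illustration of the framing parameters quoted in this abstract (8 kHz sampling, 20 ms frame width, 10 ms period), the following minimal Python sketch splits a signal into overlapping frames. The function name `frame_signal` and the choice to drop a trailing partial frame are assumptions made for illustration; the paper's own implementation is not described in the abstract.

```python
def frame_signal(samples, sample_rate_hz=8000, frame_ms=20, hop_ms=10):
    """Split a sequence of samples into overlapping frames.

    Hypothetical helper: a trailing partial frame is dropped,
    which is an assumption, not something stated in the abstract.
    """
    frame_len = sample_rate_hz * frame_ms // 1000  # 160 samples at 8 kHz
    hop_len = sample_rate_hz * hop_ms // 1000      # 80 samples at 8 kHz
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


# One second of silence at 8 kHz, used only to show the frame count.
one_second = [0.0] * 8000
frames = frame_signal(one_second)
print(len(frames), len(frames[0]))
```

With these settings each frame holds 160 samples and consecutive frames overlap by 80 samples, so filter-bank outputs would be computed once every 10 ms.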

Performance Improvement of Korean Connected Digit Recognition Based on Acoustic Parameters (음향학적 파라메터를 이용한 한국어 연결숫자인식의 성능개선)

김승희;김형순
The Journal of the Acoustical Society of Korea / v.18 no.5 / pp.58-62 / 1999
This paper proposes the use of acoustic parameters to improve the discriminability among digit models in Korean connected digit recognition. Based on acoustic-phonetic knowledge, the proposed method uses the logarithmic values of the energy ratios between predetermined frequency bands as additional feature parameters. The experimental results show that the proposed method reduced the error rate by 46% compared with the baseline system. Incorporating a channel compensation technique into the proposed method yielded an error reduction of about 69%.
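
The kind of feature described in this abstract, a logarithmic energy ratio between two frequency bands, can be sketched as follows. This is only an illustration under stated assumptions: the band edges, the natural logarithm, the small `floor` constant, and the function names are hypothetical, since the abstract does not specify the predetermined bands or the exact formula.

```python
import cmath
import math


def band_energy(frame, sample_rate_hz, low_hz, high_hz):
    """Sum of squared DFT magnitudes for bins with low_hz <= f < high_hz."""
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate_hz / n
        if low_hz <= freq < high_hz:
            # Naive DFT coefficient for bin k (fine for short frames).
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(frame))
            energy += abs(coeff) ** 2
    return energy


def log_energy_ratio(frame, sample_rate_hz, band_a, band_b, floor=1e-12):
    """Natural log of the energy in band_a over the energy in band_b.

    `floor` (an assumption) avoids taking the log of zero.
    """
    energy_a = band_energy(frame, sample_rate_hz, *band_a)
    energy_b = band_energy(frame, sample_rate_hz, *band_b)
    return math.log((energy_a + floor) / (energy_b + floor))


# A 500 Hz tone sampled at 8 kHz: energy is concentrated below 1 kHz.
tone = [math.sin(2 * math.pi * 500 * t / 8000) for t in range(160)]
low_band, high_band = (0, 1000), (1000, 4000)  # hypothetical band edges
print(log_energy_ratio(tone, 8000, low_band, high_band) > 0)
```

A positive value indicates that the first band carries more energy than the second; such values could be appended to a conventional feature vector as additional parameters.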

