Search | Korea Science

Robust Speech Recognition Using Missing Data Theory (손실 데이터 이론을 이용한 강인한 음성 인식)

김락용;조훈영;오영환
- The Journal of the Acoustical Society of Korea, v.20 no.3, pp.56-62, 2001
In this paper, we apply missing data theory to speech recognition. It can be used to maintain high recognizer performance when parts of the input are missing. In general, a hidden Markov model (HMM) is used as the stochastic classifier for speech recognition tasks. In a continuous density HMM (CDHMM), acoustic events are represented by continuous probability density functions, and missing data theory has the advantage of being easy to apply to such a model. A marginalization method is used to process missing data because it has low complexity and is easy to apply to automatic speech recognition (ASR). Spectral subtraction is used to detect missing data: if the difference between the energy of the speech and that of the background noise is below a given threshold, the data are judged to be missing. We propose a new method that examines the reliability of the detected missing data using a voicing probability. The voicing probability is used to find voiced frames, and missing data are then processed in voiced regions, which carry more redundant information than consonants. The experimental results show that our method outperforms a baseline system that uses spectral subtraction only. In a 452-word isolated word recognition experiment, the proposed method using the voicing probability reduced the average word error rate by 12% in a typical noise situation.
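The detection and marginalization steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the per-channel energy representation, and the use of a single diagonal-covariance Gaussian (rather than a full CDHMM mixture) are assumptions made for illustration. The voicing-probability check is not modeled here.

```python
import math

def reliability_mask(noisy_energy, noise_energy, threshold):
    """Flag each spectral channel as reliable (True) or missing (False).

    Assumption: speech energy is estimated by spectral subtraction
    (noisy minus noise, floored at zero). A channel is treated as missing
    when speech energy minus noise energy falls below the threshold.
    """
    mask = []
    for y, n in zip(noisy_energy, noise_energy):
        speech = max(y - n, 0.0)          # spectral subtraction estimate
        mask.append(speech - n >= threshold)
    return mask

def marginal_log_likelihood(x, mean, var, mask):
    """Log-likelihood of a diagonal-covariance Gaussian with the missing
    dimensions marginalized out.

    With a diagonal covariance, integrating over a missing dimension
    contributes a factor of 1, so only reliable dimensions are summed.
    """
    total = 0.0
    for xi, mu, v, reliable in zip(x, mean, var, mask):
        if reliable:
            total += -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
    return total

# Hypothetical two-channel frame: the second channel is dominated by noise.
mask = reliability_mask([10.0, 3.0], [2.0, 2.0], threshold=1.0)
score = marginal_log_likelihood([1.0, 5.0], [1.0, 0.0], [1.0, 1.0], mask)
```

In this sketch the second channel is ignored during scoring, so its large deviation from the model mean does not penalize the frame.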

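Estimation of Posterior Probabilities and Probability Density Functions and Their Evaluation by Vowel Recognition Using a Probabilistic Neural Network (사후 확률·확률 밀도 함수의 추정과 Probabilistic neural network을 이용한 모음 인식에 의한 평가)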

허강인;이광석;김명기
- The Journal of the Acoustical Society of Korea, v.12 no.6, pp.21-27, 1993
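Layered neural networks have been used for pattern classification because they can learn a desired input-output mapping from given teacher patterns. A neural network can approximate the posterior probability by learning target patterns that match the category of the input pattern. A network trained with target vectors built from the proportion of training data falling in each subspace of a partitioned input space can also represent the probability density function. In this study, we propose methods for estimating the posterior probability and the probability density function using a layered neural network (NN) trained by backpropagation and using a codebook. Using the posterior probability and probability density function estimated by vector quantization (VQ), vowel recognition was performed with a probabilistic neural network (PNN), a kind of RBF network that requires no training. In the recognition experiments, the PNN results were compared with the average recognition rates of a three-layer neural network trained by backpropagation and of VQ. The recognition rate of the VQ-PNN was higher than that of the other methods.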
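As a rough illustration of why a PNN needs no training, the following Python sketch scores an input against stored prototypes with Gaussian kernels and picks the class with the highest average kernel response. It is a generic PNN, not the VQ-PNN evaluated in the paper: the prototype vectors, the labels, and the smoothing parameter `sigma` are hypothetical, and in a VQ-based variant the prototypes would presumably be codebook vectors.

```python
import math

def pnn_classify(x, prototypes, sigma=1.0):
    """Classify x with a basic probabilistic neural network.

    prototypes maps each class label to a list of stored vectors.
    Each class score is the mean Gaussian kernel response, which acts
    as a Parzen-window estimate of the class-conditional density
    (up to a constant factor shared by all classes).
    """
    scores = {}
    for label, vectors in prototypes.items():
        response = 0.0
        for v in vectors:
            dist2 = sum((a - b) ** 2 for a, b in zip(x, v))
            response += math.exp(-dist2 / (2.0 * sigma ** 2))
        scores[label] = response / len(vectors)
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical two-dimensional features for two vowel classes.
prototypes = {
    "a": [[0.0, 0.0], [0.0, 1.0]],
    "i": [[5.0, 5.0]],
}
label, scores = pnn_classify([0.2, 0.3], prototypes)
```

The only stored state is the prototype set itself, so adding a class means adding vectors rather than retraining weights.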

Visualization of Korean Speech Based on the Distance of Acoustic Features (음성특징의 거리에 기반한 한국어 발음의 시각화)

Pok, Gou-Chol
- The Journal of Korea Institute of Information, Electronics, and Communication Technology, v.13 no.3, pp.197-205, 2020
In Korean, the pronunciation of phoneme units such as vowels and consonants is fixed, and the pronunciation associated with a written symbol does not change, so foreign learners can approach the language rather easily. However, when words, phrases, or sentences are pronounced, the pronunciation changes in varied and complex ways at syllable boundaries, and the association between notation and pronunciation no longer holds. Consequently, it is very difficult for foreign learners to master standard Korean pronunciation. Despite these difficulties, a systematic analysis of pronunciation errors in Korean words is believed to be possible, because the relationship between Korean notation and pronunciation can be described as a set of firm rules without exceptions, unlike in other languages such as English. In this paper, we propose a visualization framework that shows the differences between standard and erroneous pronunciations as quantitative measures on the computer screen. Previous studies show only color representations and 3D graphics of speech properties, or an animated view of the changing shapes of the lips and mouth cavity. Moreover, the features used in those analyses are only point data, such as the average over a speech range. In this study, we propose a method that uses the time-series data directly instead of summarized or distorted data. This was realized with a deep learning-based technique that combines a self-organizing map, a variational autoencoder, and a Markov model, and it achieved better performance than the method using point-based data.
https://doi.org/10.17661/jkiiect.2020.13.3.197

Statistical Analysis for Path Break-Up Time of Mobile Wireless Networks (이동 무선망의 경로 붕괴시간에 대한 통계적 분석)

Ahn, Hong-Young
- The Journal of the Institute of Internet, Broadcasting and Communication, v.15 no.5, pp.113-118, 2015
Mobile wireless networks have received a lot of attention as future wireless networks because they can be deployed rapidly without communication infrastructure. In these networks, the communication path between two arbitrary nodes breaks down when some links in the path exceed the transmission range ($r_0$) due to the mobility of the nodes. The total path break-down time ($\bigcup T_i$), which is the union of the path break-down times of every node pair, can be a good measure of the connectivity of a dynamic mobile wireless network. In this paper, we show that the distribution of the total path break-down time can be approximated by an exponential probability density function and confirm this with experimental data. Statistical knowledge of the break-down time enables quantitative prediction of delay and packet loss between two nodes, and thus provides confidence in the simulation results of mobile wireless networks.
https://doi.org/10.7236/JIIBC.2015.15.5.113
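The claim that break-down times follow an exponential density, $f(t) = \lambda e^{-\lambda t}$, can be checked on data along the following lines. This Python sketch is only an assumption about how such a check might look: it fits $\lambda$ by maximum likelihood (the reciprocal of the sample mean) and measures the largest gap between the empirical and fitted cumulative distributions. The samples are synthetic, not the paper's measurements.

```python
import math
import random

def fit_rate(samples):
    """Maximum-likelihood rate of an exponential distribution: 1 / mean."""
    return len(samples) / sum(samples)

def exponential_pdf(t, rate):
    """Exponential probability density at t >= 0."""
    return rate * math.exp(-rate * t)

def ks_distance(samples, rate):
    """Largest gap between the empirical CDF and the fitted exponential CDF."""
    xs = sorted(samples)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * x)
        # Compare against the empirical CDF just before and just after x.
        worst = max(worst, abs(cdf - i / n), abs(cdf - (i + 1) / n))
    return worst

# Synthetic break-down times (hypothetical, drawn with true rate 0.5).
rng = random.Random(0)
times = [rng.expovariate(0.5) for _ in range(5000)]
rate = fit_rate(times)
gap = ks_distance(times, rate)
```

A small gap suggests the exponential model describes the samples well; a large gap would argue against the approximation for that data set.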
