Search | Korea Science

Robust Speech Recognition Using Missing Data Theory (손실 데이터 이론을 이용한 강인한 음성 인식)

김락용;조훈영;오영환
- The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.56-62 / 2001
In this paper, we apply missing data theory to speech recognition. The theory can be used to maintain high recognizer performance when part of the speech data is missing. In general, the hidden Markov model (HMM) is used as a stochastic classifier for speech recognition tasks, and in a continuous density HMM (CDHMM) acoustic events are represented by continuous probability density functions. An advantage of missing data theory is that it can be applied easily to the CDHMM. A marginalization method is used to process the missing data because it has low complexity and is easy to apply to automatic speech recognition (ASR). Spectral subtraction is used to detect missing data: if the difference between the energy of the speech and that of the background noise is below a given threshold, we decide that the data is missing. We propose a new method that examines the reliability of the detected missing data using the voicing probability. The voicing probability is used to find voiced frames, so that missing data is processed in voiced regions, which carry more redundant information than consonants. The experimental results show that our method improves on a baseline system that uses spectral subtraction only. In a 452-word isolated word recognition experiment, the proposed method using the voicing probability reduced the average word error rate by 12% in a typical noise situation.
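
The detection and marginalization steps described in this abstract can be illustrated with a minimal Python sketch. It assumes a single diagonal-covariance Gaussian as the state density and a per-component energy threshold; the function names `reliability_mask` and `marginal_log_likelihood` are hypothetical, and the voicing-probability check from the paper is not modeled here.

```python
import math

def reliability_mask(noisy_energy, noise_energy, threshold):
    """Mark a component as reliable when the spectral-subtraction estimate
    (noisy energy minus estimated noise energy) reaches the threshold.
    Components below the threshold are treated as missing."""
    return [(y - n) >= threshold for y, n in zip(noisy_energy, noise_energy)]

def marginal_log_likelihood(x, mean, var, mask):
    """Log-likelihood of x under a diagonal-covariance Gaussian using only
    the reliable components. For a diagonal Gaussian, integrating out the
    missing components simply drops their terms from the sum."""
    total = 0.0
    for xi, mu, v, reliable in zip(x, mean, var, mask):
        if reliable:
            total += -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
    return total

# Illustrative values only (not taken from the paper).
mask = reliability_mask([5.0, 1.2, 4.0], [1.0, 1.0, 1.0], threshold=1.0)
score = marginal_log_likelihood([1.0, 9.0, 2.0], [1.0, 0.0, 2.0], [1.0, 1.0, 1.0], mask)
```

In a CDHMM with Gaussian mixtures, the same per-component masking would be applied inside each mixture component before the weighted sum, which is why marginalization adds little computational cost.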

Study of Speech Recognition System Using the Java (자바를 이용한 음성인식 시스템에 관한 연구)

Choi, Kwang-Kook;Kim, Cheol;Choi, Seung-Ho;Kim, Jin-Young
- The Journal of the Acoustical Society of Korea / v.19 no.6 / pp.41-46 / 2000
In this paper, we implement a speech recognition system based on the continuous-distribution HMM and a browser-embedded model using Java. It is developed for speech analysis, processing, and recognition on the Web. Using a Java applet, the client performs end-point detection and extracts MFCC, energy, and delta coefficients, then sends this speech information to the server through a socket. The server consists of an HMM recognizer and a trained DB; it recognizes the speech and returns the recognized text to the client for display. Although the Java-based speech recognition system has a high error rate, it is platform-independent on the network. The significance of the implemented system is that it can be merged into multimedia applications and shows the possibility of new information and communication services in the future.

Implementation of Hands-Free Phone in a Car Using DSP (DSP를 이용한 차량용 핸즈프리 전화기의 구현)

Hong, Ki-Jun;Roh, Yi-Ju;Jeong, Kyung-Hoon;Kang, Dong-Wook;Yun, Kee-Bang;Kim, Ki-Doo
- Journal of the Institute of Electronics Engineers of Korea IE (전자공학회논문지 IE) / v.44 no.4 / pp.1-10 / 2007
In this paper, we study the implementation of a hands-free car phone that uses an acoustic echo canceller to remove acoustic echo effectively. A conventional acoustic echo canceller that relies only on adaptive filtering has great difficulty handling both the echo and the double-talk problem. To tackle this problem, we propose an acoustic echo canceller consisting of four parts: an adaptive filter using a modified NLMS algorithm; a VAD that captures the exact voice activity duration using two independent forgetting factors; a double-talk detector that detects double-talk periods quickly and precisely using the cross-correlation between the microphone signal and the residual echo; and an output controller driven by the VAD and the double-talk detector. The proposed hands-free phone with this acoustic echo canceller removes the acoustic echo and guarantees full-duplex operation.
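
As background for the adaptive-filter component, here is a minimal sketch of the standard NLMS update in Python. This is the textbook algorithm, not the modified NLMS proposed in the paper, and it omits the VAD, double-talk detector, and output controller. The function name `nlms_echo_cancel` and the parameters `taps`, `mu`, and `eps` are assumptions chosen for illustration.

```python
def nlms_echo_cancel(far_end, mic, taps=4, mu=0.5, eps=1e-8):
    """Standard NLMS echo canceller.
    far_end: loudspeaker (reference) samples.
    mic: microphone samples containing the echo.
    Returns the residual signal and the final filter weights."""
    weights = [0.0] * taps
    history = [0.0] * taps          # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate   # residual echo after cancellation
        norm = sum(h * h for h in history) + eps
        # Step size is normalized by the input power in the filter window.
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights
```

During double talk the error signal also contains near-end speech, so a plain NLMS update would diverge; this is the situation the double-talk detector in the abstract is meant to catch, typically by freezing or slowing adaptation.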

A Study of Keyword Spotting System Based on the Weight of Non-Keyword Model (비핵심어 모델의 가중치 기반 핵심어 검출 성능 향상에 관한 연구)

Kim, Hack-Jin;Kim, Soon-Hyub
- The KIPS Transactions:PartB / v.10B no.4 / pp.381-388 / 2003
This paper presents a method that assigns weights to garbage class clustering and the filler model to improve the performance of a keyword spotting system, together with a time-saving method for a dialogue speech processing system that calculates keyword transition probabilities through speech analysis of task-domain users. The core of the method is grouping phonemes by phonetic similarity, which is more effective for detecting groups of similar phonemes than individual phonemes; the paper proposes five phoneme groups obtained from an analysis of Korean morphology and of the spoken sentences used in a stock-trading speech processing system. In addition, task-specific filler model weights are added to the phoneme groups, and the keyword transition probability in consecutive spoken sentences is calculated and applied to the system to reduce processing time. To evaluate the proposed system, a corpus of 4,970 sentences was built for the task domain, and a test was conducted with five subjects in their twenties and thirties. As a result, FOM with the weights on the proposed five phoneme groups accounts for 85%, which has better performance than the seven phoneme groups of Yapanel [1] with 88.5% and slightly poorer performance than LVCSR with 89.8%. In calculation time as well, the proposed method takes 0.70 seconds, compared with 0.72 seconds for the seven phoneme groups. Lastly, a time-saving test confirms that 0.04 to 0.07 seconds are saved when the keyword transition probability is applied.
https://doi.org/10.3745/KIPSTB.2003.10B.4.381

Speech detection using the probability of spectral occupancy (주파수 차지확률을 이용한 음성검출기 제안)

Hong, Seong-Bong;Ki, Tae-Young;Kim, Nam-Soo;Kim, Taejeong
- Proceedings of the IEEK Conference / 2000.09a / pp.171-174 / 2000
In this paper, we improve a statistical-model-based speech detector using the probability that speech occupies a frequency bin. Whereas the previous method assumes that speech energy occupies all frequency components and uses them with equal weights in the likelihood ratio test for speech detection, the proposed method assumes that speech energy occupies only some frequency components and uses them with different weights, according to their probabilities of spectral occupancy, in the test. The probability is updated iteratively over speech frames so that it contributes to the likelihood ratio test. The proposed method reflects the characteristic distribution of the speech spectrum well and yields better detection performance.
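
The contrast between equal and occupancy-based weights can be sketched in Python as follows. The sketch assumes the common complex-Gaussian model for each frequency bin, under which the per-bin log likelihood ratio is γξ/(1+ξ) − log(1+ξ), with γ the a posteriori SNR and ξ the a priori SNR. The abstract does not give the paper's exact statistic or its iterative update rule, so the function names and the fixed `occupancy` weights are hypothetical.

```python
import math

def bin_log_lr(power, noise_var, speech_var):
    """Per-bin log likelihood ratio (speech present vs. absent) under a
    complex-Gaussian model of the spectral coefficient."""
    gamma = power / noise_var        # a posteriori SNR
    xi = speech_var / noise_var      # a priori SNR
    return gamma * xi / (1.0 + xi) - math.log(1.0 + xi)

def weighted_llr(powers, noise_vars, speech_vars, occupancy):
    """Frame-level score: per-bin log likelihood ratios averaged with
    weights proportional to each bin's spectral occupancy probability.
    Equal occupancy values reduce this to the unweighted mean."""
    total_weight = sum(occupancy)
    score = sum(
        w * bin_log_lr(p, n, s)
        for p, n, s, w in zip(powers, noise_vars, speech_vars, occupancy)
    )
    return score / total_weight

def is_speech(powers, noise_vars, speech_vars, occupancy, threshold=0.0):
    """Declare speech when the weighted score exceeds the threshold."""
    return weighted_llr(powers, noise_vars, speech_vars, occupancy) > threshold
```

When speech energy is concentrated in a few bins, weighting those bins more heavily keeps the noise-only bins from diluting the frame score, which matches the motivation stated in the abstract.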

Improvement of Confidence Measure Performance in Keyword Spotting using Background Model Set Algorithm (BMS 알고리즘을 이용한 핵심어 검출기 거절기능 성능 향상 실험)

Kim Byoung-Don;Kim Jin-Young;Choi Seung-Ho
- MALSORI / no.46 / pp.103-115 / 2003
In this paper, we propose applying the Background Model Set (BMS) algorithm, which is used in speaker verification, to improve the calculation of the confidence measure (CM) in speech recognition. The CM expresses the relative likelihood between the recognized models and the antiphone models. In the previous method of calculating the CM, the probability and standard deviation were computed using all phonemes in the composition of the antiphone models. In this process, the antiphone CM led to poor recognition results, and recognition time also increased. To solve this problem, we study a method that recomputes the mean and standard deviation using the BMS algorithm in the CM calculation.

Virtual Dialog System Based on Multimedia Signal Processing for Smart Home Environments (멀티미디어 신호처리에 기초한 스마트홈 가상대화 시스템)

Kim, Sung-Ill;Oh, Se-Jin
- Journal of the Korean Institute of Intelligent Systems / v.15 no.2 / pp.173-178 / 2005
This paper focuses on a virtual dialog system whose aim is to build more convenient living environments. To realize this, the paper mainly describes multimedia signal processing based on technologies such as speech recognition, speech synthesis, video, and sensor signal processing. As essential modules of the dialog system, we incorporated a real-time speech recognizer based on HM-Net (Hidden Markov Network) as well as speech synthesis into the overall system. In addition, we adopted a real-time motion detector based on changes in pixel brightness, as well as a touch sensor used to start the system. In the experimental evaluation, the results showed that the proposed system was relatively easy to use for controlling electric appliances while sitting on a sofa, even though the performance of the system was not better than the simulation results owing to the noisy environment.
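
A motion detector based on pixel brightness changes can be sketched as simple frame differencing. The following Python example is an illustration under assumptions, not the paper's implementation: frames are taken to be 2D lists of grayscale values, and the function name `motion_detected` and both thresholds are hypothetical.

```python
def motion_detected(prev_frame, curr_frame, pixel_threshold=20, ratio_threshold=0.05):
    """Return True when the fraction of pixels whose brightness changed by
    more than pixel_threshold reaches ratio_threshold.
    Frames are equal-sized 2D lists of grayscale values (0-255)."""
    changed = 0
    total = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for prev_pixel, curr_pixel in zip(prev_row, curr_row):
            total += 1
            if abs(curr_pixel - prev_pixel) > pixel_threshold:
                changed += 1
    return total > 0 and changed / total >= ratio_threshold
```

The per-pixel threshold suppresses sensor noise, while the ratio threshold keeps a few flickering pixels from triggering the detector.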
https://doi.org/10.5391/JKIIS.2005.15.2.173

A Gain Control Algorithm of Low Computational Complexity based on Voice Activity Detection (음성 검출 기반의 저연산 이득 제어 알고리즘)

Kim, Sang-Kuyn;Cho, Woo-Hyeong;Jeong, Min-A;Kwon, Jang-Woo;Lee, Sangmin
- The Journal of Korean Institute of Communications and Information Sciences / v.40 no.5 / pp.924-930 / 2015
In this paper, we propose a novel low-complexity approach to improve the speech quality of small acoustic equipment in noisy environments. The conventional gain control algorithm suppresses the noise in the input signal, and then the wide dynamic range compression (WDRC) stage amplifies the undesired signal. The proposed algorithm controls the gain of hearing aids according to the speech presence probability obtained from the output of voice activity detection (VAD). The performance of the proposed scheme is evaluated under various noise conditions using objective measures, and it yields superior results compared with the conventional algorithm.
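
One simple way to tie gain to a speech presence probability is to interpolate between a noise gain and a speech gain in decibels. The Python sketch below shows that idea only; the abstract does not specify the paper's gain rule, so the function names and the default gain values are hypothetical.

```python
def vad_gain(speech_prob, speech_gain_db=20.0, noise_gain_db=0.0):
    """Linear amplitude gain interpolated in dB between the noise gain
    (speech_prob = 0) and the speech gain (speech_prob = 1)."""
    gain_db = noise_gain_db + speech_prob * (speech_gain_db - noise_gain_db)
    return 10.0 ** (gain_db / 20.0)

def apply_gain(frames, speech_probs):
    """Scale each frame (a list of samples) by the gain derived from that
    frame's speech presence probability."""
    return [
        [sample * vad_gain(prob) for sample in frame]
        for frame, prob in zip(frames, speech_probs)
    ]
```

Because the gain needs only one multiplication per sample plus one probability per frame, a rule of this kind stays cheap enough for small devices, which is the constraint the abstract emphasizes.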
https://doi.org/10.7840/kics.2015.40.5.924

Multi-channel input-based non-stationary noise canceller for mobile devices (이동형 단말기를 위한 다채널 입력 기반 비정상성 잡음 제거기)

Jeong, Sang-Bae;Lee, Sung-Doke
- Journal of the Korean Institute of Intelligent Systems / v.17 no.7 / pp.945-951 / 2007
Noise cancellation is essential for devices that use speech as an interface. In real environments, speech quality and recognition rates are degraded by acoustic noise arriving near the microphone. In this paper, we propose a noise cancellation algorithm based on stereo microphones. The advantage of using multiple microphones is that direction information about the target source can be exploited. The proposed noise canceller is based on the Wiener filter. To estimate the filter, the frequency responses of the noise and the target speech must be known; they are estimated by spectral classification in the frequency domain. The performance of the proposed algorithm is compared with that of the well-known Frost algorithm and the generalized sidelobe canceller (GSC) with an adaptation mode controller (AMC). As performance measures, we adopt the perceptual evaluation of speech quality (PESQ), which is the most widely used of the objective speech quality methods, and speech recognition rates.
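
For reference, the frequency-domain Wiener filter gain is the ratio of the speech power spectrum to the sum of the speech and noise power spectra. The Python sketch below computes that standard gain per bin; it assumes the two power spectra are already available and does not reproduce the paper's stereo spectral classification. The function name `wiener_gain` and the `floor` parameter are hypothetical.

```python
def wiener_gain(speech_psd, noise_psd, floor=1e-12):
    """Per-bin Wiener gain H = S / (S + N), where S and N are the estimated
    speech and noise power spectral densities. The floor avoids division
    by zero when both estimates vanish."""
    return [s / max(s + n, floor) for s, n in zip(speech_psd, noise_psd)]

def apply_wiener(noisy_spectrum, speech_psd, noise_psd):
    """Scale each bin of the noisy spectrum by its Wiener gain."""
    gains = wiener_gain(speech_psd, noise_psd)
    return [g * x for g, x in zip(gains, noisy_spectrum)]
```

Bins dominated by speech keep a gain near 1, and bins dominated by noise are attenuated toward 0, so the quality of the result depends directly on how well the two spectra are estimated.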
https://doi.org/10.5391/JKIIS.2007.17.7.945

Audio-Visual Fusion for Sound Source Localization and Improved Attention (음성-영상 융합 음원 방향 추정 및 사람 찾기 기술)

Lee, Byoung-Gi;Choi, Jong-Suk;Yoon, Sang-Suk;Choi, Mun-Taek;Kim, Mun-Sang;Kim, Dai-Jin
- Transactions of the Korean Society of Mechanical Engineers A / v.35 no.7 / pp.737-743 / 2011
Service robots are equipped with various sensors such as vision cameras, sonar sensors, laser scanners, and microphones. Although these sensors have their own functions, some of them can be made to work together and perform more complicated functions. Audio-visual fusion is a typical and powerful combination of audio and video sensors, because audio information is complementary to visual information and vice versa. Human beings also mainly depend on visual and auditory information in their daily life. In this paper, we conduct two studies using audio-visual fusion: one is on enhancing the performance of sound localization, and the other is on improving robot attention through sound localization and face detection.
https://doi.org/10.3795/KSME-A.2011.35.7.737
