Search | Korea Science

Robust Speech Recognition Using Real-Time Higher Order Statistics Normalization (고차통계 정규화를 이용한 강인한 음성인식)

Jeong, Ju-Hyun;Song, Hwa-Jeon;Kim, Hyung-Soon
- MALSORI / no.54 / pp.63-72 / 2005
The performance of a speech recognition system is degraded by mismatch between the training and test environments. Many studies have proposed compensating for noise components in the cepstral domain. Recently, higher order cepstral moment normalization has been introduced to improve recognition accuracy. In this paper, we present a real-time higher order moment normalization method with a post-processing smoothing filter that reduces the parameter estimation error in the higher order moment computation. In experiments on the Aurora2 database, the proposed algorithm reduced the error rate by 44.7% relative to the baseline system.
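The abstract does not give the normalization or smoothing formulas, so the code below is only a minimal sketch of the general idea. It covers the second-order case: cepstral mean and variance normalization, with the statistics updated recursively frame by frame so that it can run in real time, followed by a moving-average smoothing filter. The higher order moment terms of the paper are not reproduced. The forgetting factor `alpha` and the filter `width` are assumed values.

```python
import numpy as np

def online_mvn(feats, alpha=0.995, eps=1e-8):
    """Real-time mean/variance normalization of a (T, D) cepstral matrix.

    Mean and variance are exponentially weighted running estimates
    (forgetting factor alpha, an assumed value), so each frame is
    normalized using only past and current frames.
    """
    feats = np.asarray(feats, dtype=float)
    mean = feats[0].copy()
    var = np.ones(feats.shape[1])
    out = np.empty_like(feats)
    for t, x in enumerate(feats):
        mean = alpha * mean + (1 - alpha) * x
        var = alpha * var + (1 - alpha) * (x - mean) ** 2
        out[t] = (x - mean) / np.sqrt(var + eps)
    return out

def smooth(feats, width=3):
    """Moving-average post-filter applied along time to each dimension.

    A stand-in for the paper's smoothing filter, whose form is not given.
    """
    kernel = np.ones(width) / width
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, feats)
```

A typical call is `smooth(online_mvn(cepstra))`. Because of the recursive update, the per-frame cost does not depend on utterance length.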

Design and Implementation of Multimodal Middleware for Mobile Environments (모바일 환경을 위한 멀티모달 미들웨어의 설계 및 구현)

Park, Seong-Soo;Ahn, Se-Yeol;Kim, Won-Woo;Koo, Myoung-Wan;Park, Sung-Chan
- MALSORI / no.60 / pp.125-144 / 2006
W3C announced a standard software architecture for multimodal context-aware middleware that emphasizes modularity and separates structure, content, and presentation. We implemented a distributed multimodal interface system, based on SCXML, that follows the W3C architecture. SCXML uses parallel states both to invoke XHTML and VoiceXML content and to gather composite or sequential multimodal inputs during man-machine interaction. We also employ a Delivery Context Interface (DCI) module and an external service bundle that enable the middleware to support context-aware services in real-world environments. Personalized user interfaces for mobile devices are expected to serve many different devices with a wide variety of capabilities and interaction modalities. Experiments demonstrated that the implemented middleware maintains multimodal scenarios in a clear, concise, and consistent manner.
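The sketch below illustrates what "parallel states invoke both XHTML and VoiceXML contents" can look like. It uses a small SCXML document in which one `<parallel>` element holds a visual region and a voice region, and each region invokes its own content. The state ids, the `src` file names and the `type` strings `"xhtml"` and `"vxml"` are hypothetical placeholders and do not come from the paper. The helper `parallel_regions` only parses the document with Python's standard `xml.etree.ElementTree` and lists which content each region invokes. It is not an SCXML interpreter.

```python
import xml.etree.ElementTree as ET

SCXML_NS = "http://www.w3.org/2005/07/scxml"

# Hypothetical document: ids, src files and invoke type strings are
# illustrative placeholders, not taken from the paper.
DOC = f"""<scxml xmlns="{SCXML_NS}" version="1.0" initial="session">
  <parallel id="session">
    <state id="visual">
      <invoke type="xhtml" src="form.xhtml"/>
    </state>
    <state id="voice">
      <invoke type="vxml" src="form.vxml"/>
    </state>
  </parallel>
</scxml>"""

def parallel_regions(doc):
    """Map each child state of every <parallel> to the src it invokes."""
    root = ET.fromstring(doc)
    ns = {"s": SCXML_NS}
    regions = {}
    for par in root.iter(f"{{{SCXML_NS}}}parallel"):
        for state in par.findall("s:state", ns):
            inv = state.find("s:invoke", ns)
            regions[state.get("id")] = inv.get("src") if inv is not None else None
    return regions
```

Both regions of a `<parallel>` are active at the same time. This is why one state machine can drive the visual and voice modalities together and collect inputs from either of them.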

On-line model compensation using noise masking effect for robust speech recognition (잡음 차폐를 이용한 온라인 모델 보상)

Jung Gue-Jun;Cho Hoon-Young;Oh Yung-Hwan
- Proceedings of the KSPS conference / 2003.05a / pp.215-218 / 2003
In this paper, we apply parallel model combination (PMC) online in a speech recognition system. PMC is a representative model-based noise compensation technique. It compensates for environmental mismatch by combining pretrained clean speech models with noise information estimated in real time. The approach is very effective for severe environmental mismatch, but its heavy computational cost makes it unsuitable for online systems. To reduce this cost and apply PMC online, we exploit a noise masking effect during model compensation: the energy in each frequency band is dominated either by clean speech energy or by noise energy. Experiments on artificially generated noisy speech data confirm that the proposed technique is fast and effective for online model compensation.
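The masking idea can be written as a replacement for the PMC combination step. This is a minimal sketch that works on log-spectral mean vectors only. Full PMC also maps variances through the log-normal approximation, and the paper's exact procedure is not given in the abstract. Standard log-add combines the two sources as log(exp(s) + g·exp(n)). Under the masking assumption, each band is taken from whichever source dominates, which gives max(s, log g + n). The noise gain `gain` is an assumed parameter.

```python
import numpy as np

def pmc_log_add(speech_log, noise_log, gain=1.0):
    """Log-add combination of log-spectral means (mean-only PMC approximation)."""
    return np.logaddexp(speech_log, np.log(gain) + noise_log)

def pmc_masking(speech_log, noise_log, gain=1.0):
    """Masking approximation: each band is dominated by speech or by noise."""
    return np.maximum(speech_log, np.log(gain) + noise_log)
```

In every band, the masked value differs from the log-add value by at most log 2. The largest gap occurs where speech and noise have equal energy. In exchange, the masking step avoids the exponentials and logarithms that the log-add step needs.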

A Corpus Selection Based Approach to Language Modeling for Large Vocabulary Continuous Speech Recognition (대용량 연속 음성 인식 시스템에서의 코퍼스 선별 방법에 의한 언어모델 설계)

Oh, Yoo-Rhee;Yoon, Jae-Sam;Kim, Hong-Kook
- Proceedings of the KSPS conference / 2005.11a / pp.103-106 / 2005
In this paper, we propose a language modeling approach that improves the performance of a large vocabulary continuous speech recognition system. The approach is based on an active learning framework that selects a text corpus from the large amount of text data required for language modeling. Perplexity is used as the corpus selection measure in the active learning. In recognition experiments on continuous Korean speech, a system using the language model built with the proposed approach reduced the word error rate by about 6.6%, with less computational complexity, compared with a system using a language model built from randomly selected texts.
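The code below shows only the selection criterion. It is a much simplified sketch, not the paper's active learning loop. A unigram model with add-one smoothing is trained on a small in-domain seed text, and the candidate sentences with the lowest perplexity under that model are kept. The unigram model, the smoothing and the function names are all illustrative assumptions. A real system would use a higher-order n-gram model and would re-estimate the model as sentences are selected.

```python
import math
from collections import Counter

def train_unigram(sentences):
    """Add-one smoothed unigram model; unseen words share one extra slot."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unknown words
    def prob(word):
        return (counts[word] + 1) / (total + vocab)
    return prob

def perplexity(prob, sentence):
    words = sentence.split()
    if not words:
        return math.inf
    log_p = sum(math.log(prob(w)) for w in words)
    return math.exp(-log_p / len(words))

def select_corpus(in_domain, candidates, k):
    """Keep the k candidate sentences the in-domain model finds least surprising."""
    prob = train_unigram(in_domain)
    return sorted(candidates, key=lambda s: perplexity(prob, s))[:k]
```

Sentences that resemble the target domain get low perplexity, so a model trained only on the selected subset can be smaller than one trained on all the text data.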

Robust speech recognition in car environment with echo canceller (반향제거기를 갖는 자동차 실내 환경에서의 음성인식)

Park, Chul-Ho;Heo, Won-Chul;Bae, Keun-Sung
- Proceedings of the KSPS conference / 2005.11a / pp.147-150 / 2005
Speech recognition performance in a car environment is severely degraded when music or news is playing from a radio or CD player. Because the reference signals are available from the car's audio unit, they can be removed with an adaptive filter. In this paper, we present experimental results of speech recognition in a car environment using an echo canceller. For the experiments, we generate test speech signals by adding music or news to the car-noise speech data of the Aurora2 DB. An HTK-based continuous HMM system is used as the recognizer. In addition, the MMSE-STSA method is applied to the output of the echo canceller to further remove residual noise.
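The reference-based cancellation can be sketched with a normalized LMS (NLMS) adaptive filter. This is a common choice for echo cancellation, but the abstract does not say which adaptation rule the authors used. The audio-unit signal `ref` feeds an FIR filter that learns the path to the microphone. The filter output is subtracted from `mic`, and the residual `e` is the enhanced signal. The tap count `taps` and the step size `mu` are assumed values. The MMSE-STSA post-processing stage is not included.

```python
import numpy as np

def nlms_echo_cancel(mic, ref, taps=64, mu=0.5, eps=1e-6):
    """Remove the component of `mic` that is linearly predictable from `ref`.

    Returns the residual (echo-cancelled) signal and the final filter taps.
    """
    mic = np.asarray(mic, dtype=float)
    ref = np.asarray(ref, dtype=float)
    w = np.zeros(taps)      # adaptive estimate of the audio-unit-to-mic path
    buf = np.zeros(taps)    # most recent reference samples, newest first
    out = np.empty_like(mic)
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = ref[n]
        y = w @ buf                             # estimated echo
        e = mic[n] - y                          # residual = speech + noise
        w += mu * e * buf / (buf @ buf + eps)   # normalized LMS update
        out[n] = e
    return out, w
```

Normalizing the step by the reference power keeps adaptation stable when music or news changes loudness abruptly.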

DTW based Utterance Rejection on Broadcasting News Keyword Spotting System (방송뉴스 핵심어 검출 시스템에서의 오인식 거부를 위한 DTW의 적용)

Park, Kyung-Mi;Park, Jeong-Sik;Oh, Yung-Hwan
- Proceedings of the KSPS conference / 2005.11a / pp.155-158 / 2005
Keyword spotting is an effective way to find keywords in continuously spoken speech. However, non-keywords may be accepted as keywords when environmental noise occurs or the speaker changes. To overcome this performance degradation, utterance rejection techniques that apply confidence measures to the recognition result have been developed. In this paper, we apply DTW to an HMM-based broadcast news keyword spotting system to reject non-keywords. Experimental results show that the false acceptance rate is reduced to 50%.
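The code below sketches how DTW can act as a rejection check on top of the HMM keyword spotter. The abstract does not describe the authors' templates, features, local distance or threshold, so all of them are assumptions here. The idea is to align the feature frames of a detected segment against a stored template of the hypothesized keyword and to reject the hypothesis when the length-normalized alignment cost is too high. The `threshold` value would have to be tuned on development data.

```python
import numpy as np

def dtw_distance(a, b):
    """Length-normalized DTW cost between two (frames, dims) feature sequences."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # Euclidean local distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def accept_keyword(segment, template, threshold):
    """Keep the HMM hypothesis only if the segment aligns closely to the template."""
    return dtw_distance(segment, template) <= threshold
```

DTW absorbs differences in speaking rate. A time-stretched copy of the template still scores zero, while an acoustically different segment does not.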

A Study on the Continuous Speech Recognition for the Automatic Creation of International Phonetics (국제 음소의 자동 생성을 활용한 연속음성인식에 관한 연구)

Kim, Suk-Dong;Hong, Seong-Soo;Shin, Chwa-Cheul;Woo, In-Sung;Kang, Heung-Soon
- Journal of Korea Game Society / v.7 no.2 / pp.83-90 / 2007
One result of the trend towards globalization is an increased number of projects that focus on natural language processing. Automatic speech recognition (ASR) technologies, for example, hold great promise for facilitating global communication and collaboration. Unfortunately, most research projects to date focus on a single widely spoken language, so the cost of adapting a particular ASR tool to other languages is often prohibitive. This work takes a more general approach. We propose an International Phoneticizing Engine (IPE) that interprets input files supplied in our Phonetic Language Identity (PLI) format to build a dictionary. IPE is language independent and rule based. It decomposes the dictionary creation process into a set of well-defined steps. These steps reduce rule conflicts, allow people without linguistics training to create rules, and optimize run-time efficiency. Dictionaries created by IPE can be used with the speech recognition system. IPE provides an easy-to-use, systematic approach that achieved recognition rates of 92.55% for Korean speech and 89.93% for English.
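The abstract does not describe the PLI file format or IPE's rule steps, so the code below is only a hypothetical sketch of rule-based dictionary building in general. An ordered table maps grapheme strings to phone symbols. At each position the longest matching grapheme is applied, so a multi-letter rule such as `"sh"` takes precedence over the single letters it contains. The rule table, the phone symbols and the function names are illustrative only.

```python
def phoneticize(word, rules):
    """Convert a word to phones by greedy longest-match grapheme rules."""
    max_len = max(len(g) for g in rules)
    phones, i = [], 0
    while i < len(word):
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                phones.append(rules[chunk])
                i += size
                break
        else:
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phones

def build_dictionary(words, rules):
    """Pronunciation dictionary: word -> space-separated phone string."""
    return {w: " ".join(phoneticize(w, rules)) for w in words}

# Hypothetical English-like rule table for illustration.
RULES = {"sh": "SH", "s": "S", "h": "HH", "i": "IH", "p": "P"}
```

Longest-match ordering is one simple way to resolve overlapping rules deterministically. Whether IPE resolves rule conflicts this way is not stated in the abstract.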

A study of English vowel system (영어의 모음체계 연구)

Lee Jae-Young
- MALSORI / no.38 / pp.71-97 / 1999
In this paper I survey vowel phonemes in a variety of English accents and propose vowel systems for English. The accents covered include General American English, Northeastern American English, Western American English, Southern British English, Northern British English, Scottish English, Southern Irish English, Northern Irish English, Australian English, and New Zealand English. The proposed vowel systems reflect both the acoustic properties of the vowels and the phonological patterns of English. The paper offers an Optimality Theory-based analysis of English vowel systems that appeals to independently motivated constraints. Following Flemming (1995), it assumes that a vowel system is selected in the output as the optimal candidate under a given constraint ranking. This assumption differs from the view that the vowel system is fixed in the input. The analysis explains why a specific vowel system is selected and why dialectal variation arises. It shows that the vowel system of a given dialect results from optimal satisfaction of a given constraint ranking, and that dialectal differences result from different rankings of the same constraints. This constraint-based analysis accounts well for the similarities and differences among dialects with respect to their vowel systems.
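The selection mechanism described above, with one constraint set whose ranking differs across dialects, can be illustrated with a sketch of Optimality Theory's evaluation step. Each candidate carries violation counts. Candidates are compared lexicographically in ranking order, so a single violation of a higher-ranked constraint outweighs any number of violations of lower-ranked ones. The two candidates, the constraint names `MaxContrasts` and `MinDist`, and the violation counts are hypothetical illustrations, not figures from the paper.

```python
def ot_eval(candidates, ranking):
    """Return the optimal candidate under a strict constraint ranking.

    candidates: {name: {constraint: violation count}}
    ranking:    constraint names, highest-ranked first
    """
    def profile(name):
        return tuple(candidates[name].get(c, 0) for c in ranking)
    return min(candidates, key=profile)

# Hypothetical violation profiles, for illustration only.
CANDIDATES = {
    "five-vowel system": {"MaxContrasts": 2, "MinDist": 0},
    "seven-vowel system": {"MaxContrasts": 0, "MinDist": 1},
}
```

Reordering the same two constraints changes the winner: `["MaxContrasts", "MinDist"]` selects the seven-vowel system and `["MinDist", "MaxContrasts"]` selects the five-vowel system. This mirrors the claim that dialectal differences come from reranking the same constraints.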

A Study on a Generation of a Syllable Restoration Candidate Set and a Candidate Decrease (음절 복원 후보 집합의 생성과 후보 감소에 관한 연구)

김규식;김경징;이상범
- Journal of the Korea Computer Industry Society / v.3 no.12 / pp.1679-1690 / 2002
This paper describes the generation of syllable restoration rules for speech recognition post-processing and the reduction of restoration candidates. To improve the performance of conversational continuous speech recognition, we created syllable restoration rules that, in the post-processing stage of a system that recognizes syllable-level phonetic values, generate restoration candidates whose pronunciation matches the recognized phonetic values. We also propose reducing the number of restoration candidates by removing restoration rules that generate spellings not used in real life. We designed and implemented a restoration candidate set generator to show that the syllable restoration rules produce an appropriate candidate set. Verification on standard pronunciation examples and on words randomly extracted from a pronunciation dictionary showed that the generated candidate sets included the original spelling of each utterance.
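The code below sketches candidate generation and reduction under strongly simplified assumptions. Each recognized (pronounced) syllable maps to a list of possible written syllables. The candidate set is the Cartesian product of those lists. Candidates are then reduced by keeping only spellings found in a lexicon. The paper instead removes the restoration rules that produce unused spellings, and lexicon filtering is used here as a simpler stand-in. The toy rules cover one case: [가치] can be written 가치 ("value") or 같이 ("together"). The rule table, the lexicon and the function names are all hypothetical.

```python
from itertools import product

# Hypothetical toy rules: pronounced syllable -> possible written syllables.
RULES = {"가": ["가", "갓", "같"], "치": ["치", "이"]}

def restoration_candidates(pronunciation, rules):
    """All spellings obtainable by restoring each syllable independently."""
    options = [rules.get(syl, [syl]) for syl in pronunciation]
    return ["".join(combo) for combo in product(*options)]

def reduce_candidates(candidates, lexicon):
    """Drop spellings not attested in real use (lexicon filter as a stand-in)."""
    return [c for c in candidates if c in lexicon]
```

The context-free rules overgenerate and yield six candidates for 가치. The reduction step brings this down to the attested spellings, and the original spelling of the utterance must survive that step.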

Context-adaptive Phoneme Segmentation for a TTS Database (문자-음성 합성기의 데이터 베이스를 위한 문맥 적응 음소 분할)

이기승;김정수
- The Journal of the Acoustical Society of Korea / v.22 no.2 / pp.135-144 / 2003
A method for the automatic segmentation of speech signals is described. The method is intended for building a large database for a text-to-speech (TTS) synthesis system. The main issue is refining the initial phone boundary estimates provided by an alignment based on a hidden Markov model (HMM). A multi-layer perceptron (MLP) is used as the phone boundary detector. To improve segmentation performance, we propose training a separate MLP for each class of phonetic transition. The partitioning of the entire phonetic transition space is optimized to minimize the overall deviation from hand-labelled boundary positions. On single-speaker data, more than 95% of all phone boundaries deviated from the reference position by less than 20 ms, and boundary refinement reduced the root mean square error by about 25%.
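The sketch below illustrates the refinement step and the two evaluation measures quoted above. Assume that the MLP chosen for a boundary's transition class produces a per-frame boundary score. The refined boundary is then the highest-scoring frame within a search window around the HMM estimate. The windowed argmax search, the window size and the function names are assumptions, since the abstract does not describe the search procedure. `boundary_stats` computes the fraction of boundaries within a tolerance of the hand labels and the RMS deviation.

```python
import numpy as np

def refine_boundary(scores, initial, window):
    """Move an HMM boundary (frame index) to the best MLP-scored frame nearby.

    scores: per-frame boundary scores from the transition-class-specific MLP
    """
    lo = max(0, initial - window)
    hi = min(len(scores), initial + window + 1)
    return lo + int(np.argmax(scores[lo:hi]))

def boundary_stats(predicted_ms, reference_ms, tol_ms=20.0):
    """Fraction of boundaries within tol_ms of the reference, and RMS deviation."""
    dev = np.abs(np.asarray(predicted_ms, float) - np.asarray(reference_ms, float))
    return float(np.mean(dev < tol_ms)), float(np.sqrt(np.mean(dev ** 2)))
```

Restricting the search to a window around the HMM estimate lets the MLP correct local errors without moving a boundary past a neighbouring phone.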
