Search | Korea Science

Development of English Speech Recognizer for Pronunciation Evaluation (발성 평가를 위한 영어 음성인식기의 개발)

Park Jeon Gue;Lee June-Jo;Kim Young-Chang;Hur Yongsoo;Rhee Seok-Chae;Lee Jong-Hyun
- Proceedings of the KSPS conference
- /
- 2003.10a
- /
- pp.37-40
- /
- 2003
This paper presents the preliminary result of the automatic pronunciation scoring for non-native English speakers, and shows the developmental process for an English speech recognizer for the educational and evaluational purposes. The proposed speech recognizer, featuring two refined acoustic model sets, implements the noise-robust data compensation, phonetic alignment, highly reliable rejection, key-word and phrase detection, easy-to-use language modeling toolkit, etc., The developed speech recognizer achieves 0.725 as the average correlation between the human raters and the machine scores, based on the speech database YOUTH for training and K-SEC for test.
PDF

Speaker Verification System with Hybrid Model Improved by Adapted Continuous Wavelet Transform

Kim, Hyoungsoo;Yang, Sung-il;Younghun Kwon;Kyungjoon Cha
- The Journal of the Acoustical Society of Korea
- /
- v.18 no.3E
- /
- pp.30-36
- /
- 1999
In this paper, we develop a hybrid speaker recognition system [1] enhanced by pre-recognizer and post-recognizer. The pre-recognizer consists of general speech recognition systems and the post-recognizer is a pitch detection system using adapted continuous wavelet transform (ACWT) to improve the performance of the hybrid speaker recognition system. Two schemes to design ACWT is considered. One is the scheme to search basis library covering the whole band of speech fundamental frequency (speech pitch). The other is the scheme to determine which one is the best basis. Information cost functional is used for the criterion for the latter. ACWT is robust enough to classify the pitch of speech very well, even though the speech signal is badly damaged by environmental noises.
PDF

Implementation of Vocabulary- Independent Speech Recognizer Using a DSP (DSP를 이용한 가변어휘 음성인식기 구현에 관한 연구)

Chung, Ik-Joo
- Speech Sciences
- /
- v.11 no.3
- /
- pp.143-156
- /
- 2004
In this paper, we implemented a vocabulary-independent speech recognizer using the TMS320VC33 DSP. For this implementation, we had developed very small-sized recognition engine based on diphone sub-word unit, which is especially suited for embedded applications where the system resources are severely limited. The recognition accuracy of the developed recognizer with 1 mixture per state and 4 states per diphone is 94.5% when tested on frequently-used 2000 words set. The design of the hardware was focused on minimal use of parts, which results in reduced material cost. The finally developed hardware only includes a DSP, 512 Kword flash ROM and a voice codec. In porting the recognition engine to the DSP, we introduced several methods of using data and program memory efficiently and developed the versatile software protocol for host interface. Finally, we also made an evaluation board for testing the developed hardware recognition module.
PDF

Implementation of a Speaker-independent Speech Recognizer Using the TMS320F28335 DSP (TMS320F28335 DSP를 이용한 화자독립 음성인식기 구현)

Chung, Ik-Joo
- Journal of Industrial Technology
- /
- v.29 no.A
- /
- pp.95-100
- /
- 2009
In this paper, we implemented a speaker-independent speech recognizer using the TMS320F28335 DSP which is optimized for control applications. For this implementation, we used a small-sized commercial DSP module and developed a peripheral board including a codec, signal conditioning circuits and I/O interfaces. The speech signal digitized by the TLV320AIC23 codec is analyzed based on MFCC feature extraction methed and recognized using the continuous-density HMM. Thanks to the internal SRAM and flash memory on the TMS320F28335 DSP, we did not need any external memory devices. The internal flash memory contains ADPCM data for voice response as well as HMM data. Since the TMS320F28335 DSP is optimized for control applications, the recognizer may play a good role in the voice-activated control areas in aspect that it can integrate speech recognition capability and inherent control functions into the single DSP.
PDF

Korean Broadcast News Transcription Using Morpheme-based Recognition Units

Kwon, Oh-Wook;Alex Waibel
- The Journal of the Acoustical Society of Korea
- /
- v.21 no.1E
- /
- pp.3-11
- /
- 2002
Broadcast news transcription is one of the hardest tasks in speech recognition because broadcast speech signals have much variability in speech quality, channel and background conditions. We developed a Korean broadcast news speech recognizer. We used a morpheme-based dictionary and a language model to reduce the out-of·vocabulary (OOV) rate. We concatenated the original morpheme pairs of short length or high frequency in order to reduce insertion and deletion errors due to short morphemes. We used a lexicon with multiple pronunciations to reflect inter-morpheme pronunciation variations without severe modification of the search tree. By using the merged morpheme as recognition units, we achieved the OOV rate of 1.7% comparable to European languages with 64k vocabulary. We implemented a hidden Markov model-based recognizer with vocal tract length normalization and online speaker adaptation by maximum likelihood linear regression. Experimental results showed that the recognizer yielded 21.8% morpheme error rate for anchor speech and 31.6% for mostly noisy reporter speech.
PDF KSCI

Spoken Document Retrieval Based on Phone Sequence Strings Decoded by PVDHMM (PVDHMM을 이용한 음소열 기반의 SDR 응용)

Choi, Dae-Lim;Kim, Bong-Wan;Kim, Chong-Kyo;Lee, Yong-Ju
- MALSORI
- /
- no.62
- /
- pp.133-147
- /
- 2007
In this paper, we introduce a phone vector discrete HMM(PVDHMM) that decodes a phone sequence string, and demonstrates the applicability to spoken document retrieval. The PVDHMM treats a phone recognizer or large vocabulary continuous speech recognizer (LVCSR) as a vector quantizer whose codebook size is equal to the size of its phone set. We apply the PVDHMM to decode the phone sequence strings and compare the outputs with those of a continuous speech recognizer(CSR). Also we carry out spoken document retrieval experiment through PVDHMM word spotter on the phone sequence strings which are generated by phone recognizer or LVCSR and compare its results with those of retrieval through the phone-based vector space model.
PDF

Implementation of HMM-Based Speech Recognizer Using TMS320C6711 DSP

Bae Hyojoon;Jung Sungyun;Bae Keunsung
- MALSORI
- /
- no.52
- /
- pp.111-120
- /
- 2004
This paper focuses on the DSP implementation of an HMM-based speech recognizer that can handle several hundred words of vocabulary size as well as speaker independency. First, we develop an HMM-based speech recognition system on the PC that operates on the frame basis with parallel processing of feature extraction and Viterbi decoding to make the processing delay as small as possible. Many techniques such as linear discriminant analysis, state-based Gaussian selection, and phonetic tied mixture model are employed for reduction of computational burden and memory size. The system is then properly optimized and compiled on the TMS320C6711 DSP for real-time operation. The implemented system uses 486kbytes of memory for data and acoustic models, and 24.5 kbytes for program code. Maximum required time of 29.2 ms for processing a frame of 32 ms of speech validates real-time operation of the implemented system.
PDF

Adaptive Korean Continuous Speech Recognizer to Speech Rate (발화속도 적응적인 한국어 연속음 인식기)

Kim, Jae-Beom;Park, Chan-Kyu;Han, Mi-Sung;Lee, Jung-Hyun
- The Transactions of the Korea Information Processing Society
- /
- v.4 no.6
- /
- pp.1531-1540
- /
- 1997
In this paper, we presents automatic Korean continuous speech recognizer which is improved by the speech rate estimation and the compensation methods. Automatic continuous speech recognition is significantly more difficult than isolated word recognition because of coarticulatory effects and variations in speech rate. In order to recognize continuous speech, modeling methods of coarticulatory effects and variations in speech rate are needed. In this paper, the speech rate is measured by change of format, and the compensation is peformed by extracting relatively many feature vectors in fast speech. Coarticulatory effects are modeled by defining 514 Korean diphone set, and ETRI's 445 word DB is used for training speech material. With combining above methods, we implement automatic Korean continuous speech recognizer, which shows improved recognition rate, based on DHMM(Discrete Hidden Markov Model).
PDF

Improvement of Speech Recognition System Using the Trained Model of Speech Feature (음성특성 학습 모델을 이용한 음성인식 시스템의 성능 향상)

송점동
- The Journal of Information Technology
- /
- v.3 no.4
- /
- pp.1-12
- /
- 2000
We can devide the speech into high frequency speech and low frequency speech according to the feature of the speech, However so far the construction of the recognizer without concerning this feature causes low recognition rate relatively and the needs of an amount of data in the research on the speech recognition. In this paper, we propose the method that can devide this feature of speaker's speech using the Formant frequency, and the method that can recognize the speech after constructing the recognizer model reflecting the feature of the high and low frequency of the speaker's speech, For the experiment we constructed the recognizer model using 47 mono-phone of Korean and trained the recognizer model using 20 women's and men's speech respectively. We divided the feature of speech using the Formant frequency Table, that had been consisted of the Formant frequency, and the value of pitch, and then We performed recognition using the trained model according to the feature of speech The proposed system outperformed the existing method in the recognition rate, as the result.
PDF

Semi-supervised learning of speech recognizers based on variational autoencoder and unsupervised data augmentation (변분 오토인코더와 비교사 데이터 증강을 이용한 음성인식기 준지도 학습)

Jo, Hyeon Ho;Kang, Byung Ok;Kwon, Oh-Wook
- The Journal of the Acoustical Society of Korea
- /
- v.40 no.6
- /
- pp.578-586
- /
- 2021
We propose a semi-supervised learning method based on Variational AutoEncoder (VAE) and Unsupervised Data Augmentation (UDA) to improve the performance of an end-to-end speech recognizer. In the proposed method, first, the VAE-based augmentation model and the baseline end-to-end speech recognizer are trained using the original speech data. Then, the baseline end-to-end speech recognizer is trained again using data augmented from the learned augmentation model. Finally, the learned augmentation model and end-to-end speech recognizer are re-learned using the UDA-based semi-supervised learning method. As a result of the computer simulation, the augmentation model is shown to improve the Word Error Rate (WER) of the baseline end-to-end speech recognizer, and further improve its performance by combining it with the UDA-based learning method.
https://doi.org/10.7776/ASK.2021.40.6.578 인용 PDF KSCI

Search Result 185, Processing Time 0.022 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)