• Title/Summary/Keyword: speech error

Performance Analysis of Automatic Mispronunciation Detection Using Speech Recognizer (음성인식기를 이용한 발음오류 자동분류 결과 분석)

  • Kang Hyowon;Lee Sangpil;Bae Minyoung;Lee Jaekang;Kwon Chulhong
    • Proceedings of the KSPS conference / 2003.10a / pp.29-32 / 2003
  • This paper proposes an automatic pronunciation correction system that provides users with correction guidelines for each pronunciation error. For this purpose, we develop an HMM-based speech recognizer that automatically classifies the pronunciation errors Korean speakers make when speaking a foreign language. We also collect a speech database of native and nonnative speakers using phonetically balanced word lists, and we analyze the types of mispronunciation observed in automatic mispronunciation detection experiments with the recognizer.
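
The bookkeeping step behind this kind of error classification is easy to illustrate: align the phone sequence returned by a recognizer against the canonical pronunciation and label the differences. The sketch below is a minimal, self-contained illustration of that alignment idea, not the paper's HMM system; the phone labels and example pronunciation are invented.

```python
# Toy sketch: classify mispronunciation types by aligning a recognizer's phone
# output against the canonical pronunciation with Levenshtein alignment.
def align(canonical, recognized):
    """Edit-distance alignment; returns (error_type, reference_phone, heard_phone) tuples."""
    n, m = len(canonical), len(recognized)
    dp = [[(0, None)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, "del")
    for j in range(1, m + 1):
        dp[0][j] = (j, "ins")
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dp[i - 1][j - 1][0] + (canonical[i - 1] != recognized[j - 1])
            dele = dp[i - 1][j][0] + 1
            ins = dp[i][j - 1][0] + 1
            best = min(sub, dele, ins)
            op = "sub" if best == sub else ("del" if best == dele else "ins")
            dp[i][j] = (best, op)
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        op = dp[i][j][1]
        if op == "sub":
            kind = "correct" if canonical[i - 1] == recognized[j - 1] else "substitution"
            ops.append((kind, canonical[i - 1], recognized[j - 1]))
            i, j = i - 1, j - 1
        elif op == "del":
            ops.append(("deletion", canonical[i - 1], None))
            i -= 1
        else:
            ops.append(("insertion", None, recognized[j - 1]))
            j -= 1
    return list(reversed(ops))

canonical = ["th", "ih", "ng", "k"]          # target phones for "think" (labels invented)
recognized = ["s", "ih", "ng", "k", "eu"]    # a hypothetical accented realization
for kind, ref, hyp in align(canonical, recognized):
    if kind != "correct":
        print(f"{kind}: expected {ref}, recognized {hyp}")
```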

Maximum Likelihood Training and Adaptation of Embedded Speech Recognizers for Mobile Environments

  • Cho, Young-Kyu;Yook, Dong-Suk
    • ETRI Journal / v.32 no.1 / pp.160-162 / 2010
  • For the acoustic models of embedded speech recognition systems, hidden Markov models (HMMs) are usually quantized and the original full space distributions are represented by combinations of a few quantized distribution prototypes. We propose a maximum likelihood objective function to train the quantized distribution prototypes. The experimental results show that the new training algorithm and the link structure adaptation scheme for the quantized HMMs reduce the word recognition error rate by 20.0%.
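
To make the quantization idea concrete, here is a small sketch under stated assumptions: a large set of Gaussian state distributions is replaced by a small codebook of prototype parameter vectors, and a frame is then scored with the prototype assigned to a state instead of the state's own parameters. Prototype training here is plain k-means rather than the maximum likelihood objective the paper proposes, and all sizes and data are synthetic.

```python
# Toy sketch of parameter quantization for embedded recognizers (not the ETRI algorithm).
import numpy as np

rng = np.random.default_rng(0)
n_states, dim, n_protos = 200, 13, 16

means = rng.normal(size=(n_states, dim))
log_vars = rng.normal(scale=0.3, size=(n_states, dim))
params = np.hstack([means, log_vars])          # one (mean, log-variance) vector per state

# k-means over the parameter vectors -> prototype table + index per state
protos = params[rng.choice(n_states, n_protos, replace=False)].copy()
for _ in range(20):
    d = ((params[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    assign = d.argmin(axis=1)
    for k in range(n_protos):
        if (assign == k).any():
            protos[k] = params[assign == k].mean(axis=0)

def log_gauss(x, mean, log_var):
    var = np.exp(log_var)
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

x = rng.normal(size=dim)                        # a dummy feature frame
full = log_gauss(x, means[0], log_vars[0])
q_mean, q_logvar = np.split(protos[assign[0]], 2)
quant = log_gauss(x, q_mean, q_logvar)
print(f"full-space log-likelihood {full:.2f} vs quantized {quant:.2f}")
```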

A Study on the Voiced, Unvoiced and Silence Classification (유.무성음 및 묵음 식별에 관한 연구)

  • 김명환
    • Proceedings of the Acoustical Society of Korea Conference / 1984.12a / pp.73-77 / 1984
  • This paper reports on voiced/unvoiced/silence classification of speech for Korean speech recognition. It describes a method that uses a pattern recognition technique to classify a given speech segment into one of the three classes. The best result is obtained with a feature combination of ZCR, P1, and Ep, yielding a classification error rate of less than 1%.
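
A minimal sketch of this kind of frame-level decision, assuming only two features (short-time log energy and zero-crossing rate) and a nearest-mean classifier; the per-class means, frame sizes, and synthetic test signal are invented, and the paper's P1 feature is not reproduced.

```python
# Toy voiced/unvoiced/silence classifier: energy + ZCR per frame, nearest class mean.
import numpy as np

def frame_features(x, frame_len=400, hop=200):
    """Short-time log energy and zero-crossing rate for each frame."""
    feats = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        energy = np.log10(np.mean(frame ** 2) + 1e-12)
        zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
        feats.append((energy, zcr))
    return np.array(feats)

# Per-class mean feature vectors a real system would estimate from labeled frames.
class_means = {
    "silence":  np.array([-8.0, 0.50]),   # very low energy
    "voiced":   np.array([-1.0, 0.02]),   # high energy, few zero crossings
    "unvoiced": np.array([-2.0, 0.50]),   # moderate energy, many zero crossings
}

def classify(feats):
    return [min(class_means, key=lambda c: np.linalg.norm(f - class_means[c]))
            for f in feats]

# Synthetic 1.5 s test signal: near-silence, then a 150 Hz tone, then noise.
fs = 16000
t = np.arange(fs // 2) / fs
signal = np.concatenate([
    1e-4 * np.random.randn(fs // 2),
    0.5 * np.sin(2 * np.pi * 150 * t),
    0.1 * np.random.randn(fs // 2),
])
print(classify(frame_features(signal))[::10])
```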

Speech Recognition Accuracy Measure using Deep Neural Network for Effective Evaluation of Speech Recognition Performance (효과적인 음성 인식 평가를 위한 심층 신경망 기반의 음성 인식 성능 지표)

  • Ji, Seung-eun;Kim, Wooil
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.12 / pp.2291-2297 / 2017
  • This paper describes an algorithm for extracting speech measures to evaluate a speech database and presents a method for generating a speech quality measure using a DNN (deep neural network). In our previous study, to produce an effective speech quality measure, we proposed a combination of various speech measures that are highly correlated with WER (word error rate); combining several types of quality measures predicts speech recognition performance more effectively than any single measure alone. In this paper, we describe the DNN-based measure extraction method and replace one of the combined measures, the GMM (Gaussian mixture model) score used in the previous study, with a DNN score. The combination with the DNN score shows a higher correlation with WER than the combination with the GMM score.
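
As an illustration of the evaluation idea only: the sketch below combines a few synthetic per-utterance quality measures with a small neural-network regressor and compares the correlation with WER of the combined score against a single measure. The measure names, the WER model, and the regressor are all assumptions, not the paper's measures or data.

```python
# Toy sketch: does a learned combination of quality measures track WER better
# than one measure alone? All data here is synthetic.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 500
snr = rng.uniform(0, 30, n)            # hypothetical measure 1: estimated SNR (dB)
acoustic = rng.normal(0, 5, n)         # hypothetical measure 2: acoustic-score-like measure
rate = rng.uniform(3, 8, n)            # hypothetical measure 3: speaking rate (syll/s)
wer = np.clip(50 - 1.0 * snr - 2.0 * acoustic + 4 * np.abs(rate - 5)
              + rng.normal(0, 5, n), 0, 100)   # synthetic per-utterance WER

X = np.column_stack([snr, acoustic, rate])
X_tr, X_te, y_tr, y_te = train_test_split(X, wer, random_state=0)

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=3000,
                                   random_state=0))
model.fit(X_tr, y_tr)

single = abs(np.corrcoef(X_te[:, 0], y_te)[0, 1])              # one measure on its own
combined = abs(np.corrcoef(model.predict(X_te), y_te)[0, 1])   # learned combination
print(f"|correlation with WER|: single measure {single:.2f}, combined {combined:.2f}")
```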

Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems

  • Shekofteh, Yasser;Almasganj, Farshad
    • ETRI Journal / v.35 no.1 / pp.100-108 / 2013
  • In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike the conventional spectral-based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. In the first step, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for a new input speech frame, a posterior-probability-based feature vector is evaluated, which represents the similarity between the embedded frame and the learned speech attractors. We conduct experiments for a speech recognition task utilizing a toolkit based on hidden Markov models, over FARSDAT, a well-known Persian speech corpus. Through the proposed FE method, we gain 3.11% absolute phoneme error rate improvement in comparison to the baseline system, which exploits the mel-frequency cepstral coefficient FE method.
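
The two building blocks named in the abstract, time-delay embedding and posterior-style scoring against Gaussian mixtures, can be sketched compactly. The toy example below, with invented settings (delay, dimension, two stand-in "attractor" classes), only shows the shape of such a feature extractor and is not the paper's FE method.

```python
# Toy sketch: RPS embedding + normalized per-model likelihoods as a feature vector.
import numpy as np
from sklearn.mixture import GaussianMixture

def rps_embed(frame, dim=3, delay=4):
    """Time-delay embedding: rows are [x[n], x[n+delay], ..., x[n+(dim-1)*delay]]."""
    n = len(frame) - (dim - 1) * delay
    return np.column_stack([frame[i * delay: i * delay + n] for i in range(dim)])

rng = np.random.default_rng(2)
t = np.arange(400) / 16000.0
train_frames = {                                    # stand-ins for attractor classes
    "voiced-like": 0.5 * np.sin(2 * np.pi * 200 * t) + 0.01 * rng.standard_normal(400),
    "noise-like": 0.2 * rng.standard_normal(400),
}

# One GMM per attractor class, trained on that class's RPS points.
models = {name: GaussianMixture(n_components=4, random_state=0).fit(rps_embed(x))
          for name, x in train_frames.items()}

def rps_posterior_features(frame):
    pts = rps_embed(frame)
    loglik = np.array([models[m].score(pts) for m in models])  # mean log-likelihood per model
    loglik -= loglik.max()                                     # softmax -> posterior-like values
    post = np.exp(loglik)
    return post / post.sum()

test = 0.5 * np.sin(2 * np.pi * 210 * t) + 0.01 * rng.standard_normal(400)
print(dict(zip(models, rps_posterior_features(test).round(3))))
```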

Harmonic Structure Features for Robust Speaker Diarization

  • Zhou, Yu;Suo, Hongbin;Li, Junfeng;Yan, Yonghong
    • ETRI Journal / v.34 no.4 / pp.583-590 / 2012
  • In this paper, we present a new approach for speaker diarization. First, we use the prosodic information calculated on the original speech to resynthesize new speech data using a spectrum modeling technique; the resynthesized data is modeled with sinusoids based on pitch, vibration amplitude, and phase bias. Then, we extract cepstral features from the resynthesized speech and integrate them with the cepstral features from the original speech for speaker diarization. Finally, we show how the two streams of cepstral features can be combined to improve the robustness of speaker diarization. Experiments carried out on standardized datasets (the US National Institute of Standards and Technology Rich Transcription 04-S multiple distant microphone conditions) show a significant improvement in diarization error rate compared to a system based only on the feature stream from the original speech.
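
A compact sketch of the two-stream feature idea, under heavy simplification: build a purely harmonic resynthesis of a frame from an assumed (not estimated) pitch value, take a small real cepstrum from both the original frame and the resynthesis, and concatenate the two vectors. Pitch tracking, amplitude/phase modeling, and the diarization back end are all left out.

```python
# Toy sketch: concatenate cepstral features from an original frame and its
# harmonic resynthesis. The pitch value and signal are assumptions.
import numpy as np

fs, n = 16000, 512

def real_cepstrum(frame, n_coef=12):
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) + 1e-10
    return np.fft.irfft(np.log(spec))[:n_coef]

def harmonic_resynthesis(pitch_hz, length, n_harm=10):
    t = np.arange(length) / fs
    return sum(np.sin(2 * np.pi * pitch_hz * (h + 1) * t) / (h + 1)
               for h in range(n_harm))

# Original frame: a noisy vowel-like signal; the 180 Hz pitch is assumed known here.
t = np.arange(n) / fs
frame = (np.sin(2 * np.pi * 180 * t) + 0.3 * np.sin(2 * np.pi * 360 * t)
         + 0.1 * np.random.randn(n))
resynth = harmonic_resynthesis(180.0, n)

feature = np.concatenate([real_cepstrum(frame), real_cepstrum(resynth)])
print(feature.shape)   # one 24-dimensional combined feature for this frame
```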

A Robust Speech Recognition Method Combining the Model Compensation Method with the Speech Enhancement Algorithm (음질향상 기법과 모델보상 방식을 결합한 강인한 음성인식 방식)

  • Kim, Hee-Keun;Chung, Yong-Joo;Bae, Keun-Seung
    • Speech Sciences / v.14 no.2 / pp.115-126 / 2007
  • There have been many research efforts to improve the performance of speech recognizers in noisy conditions. Among them, model compensation methods and speech enhancement approaches have been widely used. In this paper, we propose to combine the two approaches to further improve recognition rates in noisy speech recognition. For speech enhancement, the minimum mean square error short-time spectral amplitude (MMSE-STSA) estimator is adopted, and parallel model combination (PMC) and Jacobian adaptation (JA) are used as the model compensation approaches. The experimental results show that the hybrid approach, which applies the model compensation methods to the enhanced speech, produces better results than using either of the two approaches alone.
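
The structure of such a hybrid can be sketched, although the pieces below are deliberate simplifications: a Wiener-type spectral gain stands in for MMSE-STSA, and a log-add approximation stands in for full PMC; Jacobian adaptation is not shown, and every spectrum, noise estimate, and "model mean" is synthetic.

```python
# Toy sketch of the hybrid structure: enhance first, then compensate the model
# for the noise left after enhancement.
import numpy as np

rng = np.random.default_rng(3)
n_bins = 8

clean_psd = rng.uniform(0.5, 2.0, n_bins)                    # made-up clean power spectrum
noise_est = 0.4                                              # assumed stationary noise level
noisy_psd = clean_psd + 0.4 * rng.uniform(0.5, 1.5, n_bins)  # actual noise differs per bin

# Step 1: speech enhancement (Wiener gain as a stand-in for MMSE-STSA).
snr_prio = np.maximum(noisy_psd / noise_est - 1.0, 1e-3)
enhanced_psd = (snr_prio / (1.0 + snr_prio)) * noisy_psd

# Step 2: model compensation for residual noise (log-add approximation of PMC).
residual = 0.25 * noise_est                    # assumed residual noise after enhancement
clean_means_log = np.log(clean_psd)            # pretend: log-spectral HMM state means
compensated_log = np.log(np.exp(clean_means_log) + residual)

print(np.round(clean_means_log, 2))
print(np.round(compensated_log, 2))            # means nudged toward the noisy condition
```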

A Study on the Improvements of Security and Quality for Analog Speech Scrambler (아날로그 음성 비화기의 비도 및 음질 향상에 관한 연구)

  • 공병구;조동호
    • Journal of the Korean Institute of Telematics and Electronics B / v.30B no.9 / pp.27-35 / 1993
  • In this paper, a new algorithm for improving the security and speech quality of an analog speech scrambler is proposed. The algorithm is based on rearranging the fast Fourier transform (FFT) coefficients, combined with pre- and post-filtering, a Hamming window, and adaptive pseudo-spectrum insertion. The pre- and post-filters whiten the speech spectrum, and the adaptive pseudo spectrum is inserted so that silence and speech intervals cannot be distinguished. The Hamming window technique is applied for robustness to synchronization errors on the telephone line. Simulation results show that the security of the scrambled signal and the quality of the descrambled signal are considerably improved in both subjective and objective performance tests, and that the new FFT scrambler is robust to synchronization errors.
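
The core scrambling operation, permuting FFT coefficients with a key and inverting the permutation at the receiver, can be sketched in a few lines. The pre/post whitening filters, adaptive pseudo-spectrum insertion, and Hamming-window synchronization safeguards described in the abstract are not modeled; the block size, key, and test tone are arbitrary.

```python
# Toy sketch of frequency-domain scrambling with a key-seeded permutation.
import numpy as np

def scramble(block, key):
    spec = np.fft.rfft(block)
    # Permute only the interior bins; DC and Nyquist stay put so they remain real-valued.
    perm = np.random.default_rng(key).permutation(np.arange(1, len(spec) - 1))
    out = spec.copy()
    out[1:-1] = spec[perm]
    return np.fft.irfft(out, n=len(block)), perm

def descramble(block, perm):
    spec = np.fft.rfft(block)
    restored = spec.copy()
    restored[perm] = spec[1:-1]                 # invert the interior permutation
    return np.fft.irfft(restored, n=len(block))

fs, key = 8000, 20230917                        # arbitrary sample rate and key
t = np.arange(256) / fs
speech_block = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 950 * t)

scrambled, perm = scramble(speech_block, key)
recovered = descramble(scrambled, perm)
print("max reconstruction error:", np.max(np.abs(recovered - speech_block)))
```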
