Search | Korea Science

Selecting Good Speech Features for Recognition

Lee, Young-Jik;Hwang, Kyu-Woong
- ETRI Journal / v.18 no.1 / pp.29-41 / 1996
This paper describes a method for selecting suitable features for speech recognition using an information-theoretic measure. Conventional speech recognition systems heuristically choose some subset of frequency components, cepstrum, mel-cepstrum, energy, and their time differences as speech features. However, these systems cannot perform well if the selected features are unsuitable for speech recognition. Because the recognition rate is the only performance measure of a speech recognition system, it is hard to judge how suitable a selected feature is. To solve this problem, it is essential to analyze the feature itself and measure how good it is. Good speech features should contain all of the class-related information and as little class-irrelevant variation as possible. In this paper, we suggest a method, based on Shannon's information theory, for measuring the class-related information and the amount of class-irrelevant variation. Using this method, we compare the mel-scaled FFT, cepstrum, mel-cepstrum, and wavelet features of the TIMIT speech data. The results show that, among these features, the mel-scaled FFT is the best feature for speech recognition according to the proposed measure.
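
The abstract does not give the exact measure, so the following is only a minimal sketch under stated assumptions: features are taken to be already discretized, "class-related information" is read as the mutual information I(X; C) between feature X and class C, and "class-irrelevant variation" is read as the conditional entropy H(X | C). The function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def mutual_information(features, labels):
    """Estimate I(X; C) in bits from paired discrete samples."""
    n = len(features)
    joint = Counter(zip(features, labels))
    px = Counter(features)
    pc = Counter(labels)
    mi = 0.0
    for (x, c), count in joint.items():
        p_xc = count / n
        # p(x,c) / (p(x) p(c)) simplifies to count * n / (px[x] * pc[c])
        mi += p_xc * math.log2(count * n / (px[x] * pc[c]))
    return mi

def conditional_entropy(features, labels):
    """Estimate H(X | C) in bits: feature variation left once the class is known."""
    n = len(features)
    joint = Counter(zip(features, labels))
    pc = Counter(labels)
    h = 0.0
    for (x, c), count in joint.items():
        h -= (count / n) * math.log2(count / pc[c])
    return h

# Assumed toy data: one feature that tracks the class, one that ignores it.
labels = ["a", "a", "b", "b"]
informative = [0, 0, 1, 1]
uninformative = [0, 1, 0, 1]
print(mutual_information(informative, labels), conditional_entropy(informative, labels))
print(mutual_information(uninformative, labels), conditional_entropy(uninformative, labels))
```

Under this reading, a good feature would score high on the first quantity and low on the second.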

A study on Quality Management in Small and Medium Enterprises (강원도 중소기업 품질경영 운영 방안 사례)

Park Roh-Gook
- Journal of the Korea Safety Management & Science / v.8 no.1 / pp.131-144 / 2006
This study examines the quality management systems adopted by small and medium enterprises in Kangwon province to enhance their competitiveness. Analysis of variance was performed on several questionnaire answers. Motives for acquiring the accreditation, such as product export, adjustment to international trends, enhancement of brand/product recognition, a change in the CEO's mindset, and management innovation, differed significantly among business types. Changes in mindset after accreditation included setting quality as the company's first priority, greater awareness of compliance with in-house standards and regulations, and employees performing their work with an awareness of quality. Among the problems in maintaining the accreditation were difficulty in sustaining awareness of the company's quality management, the additional labor needed to maintain the ISO 9000 enforcement team, and the financial burden of keeping the accreditation. After accreditation, awareness of quality improved significantly in three respects: setting quality as the company's first priority, awareness of compliance with in-house standards and regulations, and employees' quality-conscious performance.

Performance comparison of Text-Independent Speaker Recognizer Using VQ and GMM (VQ와 GMM을 이용한 문맥독립 화자인식기의 성능 비교)

Kim, Seong-Jong;Chung, Hoon;Chung, Ik-Joo
- Speech Sciences / v.7 no.2 / pp.235-244 / 2000
This paper focuses on implementing a text-independent speaker recognizer using the VQ and GMM algorithms and on studying the characteristics of speaker recognizers that adopt these two algorithms. Because it is difficult to ascertain theoretically the effect that the two algorithms have on a speaker recognizer, we performed recognition experiments with various parameters. The experiments showed that the GMM algorithm had better recognition performance than the VQ algorithm, as follows. The GMM performed better with a small amount of training data, and its recognition rate varied only slightly as the kind of feature vector and the length of the input data changed. Overall, the GMM showed better recognition performance than the VQ.
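
To make the contrast between the two scoring rules concrete, here is a minimal sketch, not the authors' implementation. It assumes one-dimensional feature frames and already-trained models; the speaker names, codebooks, and mixture parameters are invented for illustration. VQ picks the speaker whose codebook gives the lowest average distortion, while GMM picks the speaker whose mixture gives the highest average log-likelihood.

```python
import math

def vq_distortion(frames, codebook):
    """Average squared distance from each frame to its nearest codeword (lower is better)."""
    return sum(min((f - c) ** 2 for c in codebook) for f in frames) / len(frames)

def gmm_avg_loglik(frames, weights, means, variances):
    """Average log-likelihood of frames under a 1-D Gaussian mixture (higher is better)."""
    total = 0.0
    for f in frames:
        p = sum(
            w * math.exp(-((f - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
        total += math.log(p)
    return total / len(frames)

def identify_vq(frames, codebooks):
    """Return the speaker whose codebook has the lowest distortion."""
    return min(codebooks, key=lambda spk: vq_distortion(frames, codebooks[spk]))

def identify_gmm(frames, models):
    """Return the speaker whose mixture has the highest average log-likelihood."""
    return max(models, key=lambda spk: gmm_avg_loglik(frames, *models[spk]))

# Hypothetical pre-trained models for two speakers.
codebooks = {"A": [0.0, 1.0], "B": [5.0, 6.0]}
models = {
    "A": ([0.5, 0.5], [0.0, 1.0], [1.0, 1.0]),
    "B": ([0.5, 0.5], [5.0, 6.0], [1.0, 1.0]),
}
test_frames = [0.1, 0.9, 0.4]
print(identify_vq(test_frames, codebooks), identify_gmm(test_frames, models))
```

A real system would use multidimensional feature vectors (for example cepstral coefficients) and would train the codebooks and mixtures from enrollment data.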

Constructing a Noise-Robust Speech Recognition System using Acoustic and Visual Information (청각 및 시가 정보를 이용한 강인한 음성 인식 시스템의 구현)

Lee, Jong-Seok;Park, Cheol-Hoon
- Journal of Institute of Control, Robotics and Systems / v.13 no.8 / pp.719-725 / 2007
In this paper, we present an audio-visual speech recognition system for noise-robust human-computer interaction. Unlike usual speech recognition systems, our system utilizes the visual signal containing speakers' lip movements along with the acoustic signal to obtain robust speech recognition performance against environmental noise. The procedures of acoustic speech processing, visual speech processing, and audio-visual integration are described in detail. Experimental results demonstrate the constructed system significantly enhances the recognition performance in noisy circumstances compared to acoustic-only recognition by using the complementary nature of the two signals.
https://doi.org/10.5302/J.ICROS.2007.13.8.719

License Plate Recognition System Using Artificial Neural Networks

Turkyilmaz, Ibrahim;Kacan, Kirami
- ETRI Journal / v.39 no.2 / pp.163-172 / 2017
A high performance license plate recognition system (LPRS) is proposed in this work. The proposed LPRS is composed of the following three main stages: (i) plate region determination, (ii) character segmentation, and (iii) character recognition. During the plate region determination stage, the image is enhanced by image processing algorithms to increase system performance. The rectangular license plate region is obtained using edge-based image processing methods on the binarized image. With the help of skew correction, the plate region is prepared for the character segmentation stage. Characters are separated from each other using vertical projections on the plate region. Segmented characters are prepared for the character recognition stage by a thinning process. At the character recognition stage, a three-layer feedforward artificial neural network using a backpropagation learning algorithm is constructed and the characters are determined.
https://doi.org/10.4218/etrij.17.0115.0766
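
The character segmentation step described in the abstract above (vertical projections on the plate region) can be sketched as follows. This is an illustrative sketch only: it assumes the plate region is already binarized and deskewed, represented as a list of rows with 1 for foreground pixels, and the function name is hypothetical.

```python
def segment_by_vertical_projection(binary_image):
    """Return (start, end) column ranges, end exclusive, that contain foreground pixels."""
    width = len(binary_image[0])
    # Vertical projection: number of foreground pixels in each column.
    projection = [sum(row[x] for row in binary_image) for x in range(width)]
    segments, start = [], None
    for x, count in enumerate(projection):
        if count > 0 and start is None:
            start = x                      # a character begins
        elif count == 0 and start is not None:
            segments.append((start, x))    # an empty column ends the character
            start = None
    if start is not None:
        segments.append((start, width))    # character touching the right edge
    return segments

# Assumed toy plate region with two "characters" separated by empty columns.
plate = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
]
print(segment_by_vertical_projection(plate))
```

Each returned column range would then be cropped and thinned before being passed to the classifier.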

Preprocessing Technique for Improving Action Recognition Performance in ERP Video with Multiple Objects (다중 객체가 존재하는 ERP 영상에서 행동 인식 모델 성능 향상을 위한 전처리 기법)

Park, Eun-Soo;Kim, Seunghwan;Ryu, Eun-Seok
- Journal of Broadcast Engineering / v.25 no.3 / pp.374-385 / 2020
In this paper, we propose a preprocessing technique to solve the problems of action recognition in Equirectangular Projection (ERP) video. The proposed technique treats the person object as the subject of the action, that is, the Object of Interest (OOI), and the area surrounding the OOI as the ROI. The technique consists of three modules: I) recognize person objects in the image with an object recognition model; II) create a saliency map from the input image; III) select the subject of the action using the recognized person objects and the saliency map. The bounding box of the selected subject is then input to the action recognition model in order to improve action recognition performance. Compared with feeding the original ERP image directly into the action recognition model, the proposed preprocessing method improves performance up to 99.6%, and the action is obtained when only the OOI is detected. Effects related to video summarization can also be seen.
https://doi.org/10.5909/JBE.2020.25.3.374

Joint streaming model for backchannel prediction and automatic speech recognition

Yong-Seok Choi;Jeong-Uk Bang;Seung Hi Kim
- ETRI Journal / v.46 no.1 / pp.118-126 / 2024
In human conversations, listeners often utilize brief backchannels such as "uh-huh" or "yeah." Timely backchannels are crucial to understanding and increasing trust among conversational partners. In human-machine conversation systems, users can engage in natural conversations when a conversational agent generates backchannels like a human listener. We propose a method that simultaneously predicts backchannels and recognizes speech in real time. We use a streaming transformer and adopt multitask learning for concurrent backchannel prediction and speech recognition. The experimental results demonstrate the superior performance of our method compared with previous works while maintaining a similar single-task speech recognition performance. Owing to the extremely imbalanced training data distribution, the single-task backchannel prediction model fails to predict any of the backchannel categories, and the proposed multitask approach substantially enhances the backchannel prediction performance. Notably, in the streaming prediction scenario, the performance of backchannel prediction improves by up to 18.7% compared with existing methods.
https://doi.org/10.4218/etrij.2023-0358

High-Frequency Interchange Network for Multispectral Object Detection (다중 스펙트럼 객체 감지를 위한 고주파 교환 네트워크)

Park, Seon-Hoo;Yun, Jun-Seok;Yoo, Seok Bong;Han, Seunghwoi
- Journal of the Korea Institute of Information and Communication Engineering / v.26 no.8 / pp.1121-1129 / 2022
Many object recognition studies carry out object recognition using RGB images. However, RGB images taken in dark environments, or in environments where target objects are occluded by other objects, lead to poor object recognition performance. IR images, on the other hand, provide strong object recognition performance in these environments because they detect infrared waves rather than visible light. In this paper, we propose an RGB-IR fusion model, the high-frequency interchange network (HINet), which improves object recognition performance by combining only the strengths of RGB-IR image pairs. HINet connects two object detection models using a mutual high-frequency transfer (MHT) to interchange advantages between the RGB and IR images. MHT converts each pair of RGB-IR images into the discrete cosine transform (DCT) spectrum domain to extract high-frequency information. The extracted high-frequency information is transmitted to the other network and used to improve object recognition performance. Experimental results show the superiority of the proposed network and demonstrate improved performance on the multispectral object recognition task.
https://doi.org/10.6109/jkiice.2022.26.8.1121
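
The idea of extracting high-frequency information in the DCT domain can be illustrated with a small sketch. It is not the MHT module from the paper: it works on a one-dimensional signal rather than an image, uses a plain orthonormal DCT-II with its inverse, and the `cutoff` parameter and function names are assumptions made for illustration.

```python
import math

def dct(signal):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(signal)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def idct(coeffs):
    """Inverse of the orthonormal DCT-II above (a DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = coeffs[0] * math.sqrt(1 / n)
        s += sum(
            c * math.sqrt(2 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs)
            if k > 0
        )
        out.append(s)
    return out

def high_frequency_part(signal, cutoff):
    """Zero every DCT coefficient with index below `cutoff`, then transform back."""
    coeffs = dct(signal)
    kept = [0.0 if k < cutoff else c for k, c in enumerate(coeffs)]
    return idct(kept)

# Assumed toy row of pixel intensities: flat brightness plus one sharp edge.
row = [1.0, 1.0, 1.0, 5.0]
print(high_frequency_part(row, cutoff=1))
```

With `cutoff=1` only the constant (DC) component is removed, so what remains is the variation around the mean; larger cutoffs keep progressively finer detail.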

Speech Recognition Performance Improvement using Gamma-tone Feature Extraction Acoustic Model (감마톤 특징 추출 음향 모델을 이용한 음성 인식 성능 향상)

Ahn, Chan-Shik;Choi, Ki-Ho
- Journal of Digital Convergence / v.11 no.7 / pp.209-214 / 2013
To improve the recognition performance of speech recognition systems, methods that incorporate human listening abilities into the system have been introduced. In noisy environments, the speech signal and the noise are separated so that the desired speech signal can be selected. In practice, however, several factors limit the performance of speech recognition systems: as the recognition environment changes, noise makes speech detection inaccurate, and the trained model no longer matches the input. In this paper, to improve speech recognition, we propose gammatone feature extraction and a learning model based on an acoustic model. The proposed method reflects human auditory perception, through feature extraction based on auditory scene analysis, in the process of training the recognition model. For performance evaluation in noisy environments, noise was removed from signals at -10 dB and -5 dB, and SNR improvements of 3.12 dB and 2.04 dB were confirmed.
https://doi.org/10.14400/JDPM.2013.11.7.209

The Vocabulary Recognition Optimize using Acoustic and Lexical Search (음향학적 및 언어적 탐색을 이용한 어휘 인식 최적화)

Ahn, Chan-Shik;Oh, Sang-Yeob
- Journal of Korea Multimedia Society / v.13 no.4 / pp.496-503 / 2010
Speech recognition systems have been developed as standalone systems, and when used on a mobile terminal they show a low recognition rate because of limited memory size and audio compression. This study proposes a system that improves vocabulary recognition performance by separating the acoustic search from the lexical search. The acoustic search is carried out on the mobile terminal, and the lexical search is carried out on a server. Feature vectors are extracted from the speech signal and phonemes are recognized using a GMM; the recognized phoneme list is then transmitted to the server, which performs the lexical search using a Lexical Tree Search algorithm. In the evaluation, the system achieved a vocabulary-dependent recognition rate of 98.01%, a vocabulary-independent recognition rate of 97.71%, and a recognition speed of 1.58 seconds.
