Search | Korea Science

The Optimal and Complete Prompts Lists Generation Algorithm for Connected Spoken Word Speech Corpus (연결 단어 음성 인식기 학습용 음성DB 녹음을 위한 최적의 대본 작성 알고리즘)

유하진
- The Journal of the Acoustical Society of Korea, v.23 no.2, pp.187-191, 2004
This paper describes an efficient algorithm to generate compact and complete prompt lists for a connected spoken word speech corpus. In building a connected spoken digit recognizer, we have to acquire speech data in various contexts. However, in many speech databases the lists are made by using random generators. We provide an efficient algorithm that can generate compact and complete lists of digits in various contexts. This paper includes the proof of optimality and completeness of the algorithm.
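The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the underlying idea, not the paper's method. One standard way to cover every digit context exactly once is a de Bruijn sequence: for context length `n`, a string of `10**n + n - 1` digits contains every possible run of `n` digits once. The function names `de_bruijn` and `covering_digit_string` are hypothetical, and the choice of a de Bruijn construction is an assumption.

```python
def de_bruijn(k, n):
    """Return a cyclic de Bruijn sequence B(k, n) over symbols 0..k-1.

    Standard recursive construction that concatenates Lyndon words
    whose length divides n.
    """
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def covering_digit_string(n=2):
    """Return a linear digit string containing every n-digit context once.

    The cyclic sequence is unrolled by appending its first n-1 symbols,
    so contexts that wrap around the end are also present.
    """
    cyclic = de_bruijn(10, n)
    linear = cyclic + cyclic[:n - 1]
    return "".join(str(d) for d in linear)


if __name__ == "__main__":
    prompt = covering_digit_string(2)
    pairs = {prompt[i:i + 2] for i in range(len(prompt) - 1)}
    print(len(prompt), len(pairs))
```

A random generator would typically need far more digits to hit every pair, which is the kind of inefficiency the abstract points to. Splitting such a string into shorter recording prompts would require overlapping adjacent prompts by `n - 1` digits so that no context is lost at the boundaries.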
Speech Recognition in Noisy Environments using Wiener Filtering (Wiener Filtering을 이용한 잡음환경에서의 음성인식)

Kim, Jin-Young;Eom, Ki-Wan;Choi, Hong-Sub
- Speech Sciences, v.1, pp.277-283, 1997
In this paper, we present a robust recognition algorithm based on the Wiener filtering method as a research tool for developing a Korean speech recognition system. We apply Wiener filtering in the cepstrum domain, because the frequency-domain method is computationally expensive and complex. The effectiveness of this method was evaluated in speaker-independent isolated Korean digit recognition tasks using discrete HMM speech recognition systems. In these tasks, we used 12th-order weighted cepstra as the feature vector and added computer-simulated white Gaussian noise at different levels to clean speech signals for recognition experiments under noisy conditions. Experimental results show that the presented algorithm improves recognition by 5% to 20% in comparison to the spectral subtraction method.
A study on the connected-digit recognition using MLP-VQ and Weighted DHMM (MLP-VQ와 가중 DHMM을 이용한 연결 숫자음 인식에 관한 연구)

Chung, Kwang-Woo;Hong, Kwang-Seok
- Journal of the Korean Institute of Telematics and Electronics S, v.35S no.8, pp.96-105, 1998
This paper proposes WDHMM (Weighted DHMM), which uses MLP-VQ, to improve speaker-independent connected-digit recognition. The output distribution of an MLP neural network is a probability distribution that expresses, through a non-linear mapping, the degree of similarity between each input pattern and the learned patterns. The proposed MLP-VQ generates codewords from the index of the output node with the highest value in the MLP output distribution. Unlike conventional VQ, MLP-VQ allows the degree of similarity between the current input pattern and each learned class pattern to be reflected in the recognition model. The proposed WDHMM uses the MLP output distribution to weight the symbol generation probabilities of the DHMMs. This method shortens HMM parameter estimation and recognition time because, unlike the conventional SCHMM, it does not need to model the symbol generation probability as a multi-dimensional normal distribution. It also improves recognition by 14.7% over DHMM with only a small increase in computation, because it reflects phone class relations in the recognition model. Experiments show a speaker-independent connected-digit recognition rate of 84.22% using MLP-VQ and WDHMM.
Segmentation and Recognition Methods for Touching Handwritten Digit String (접촉된 숫자열의 분할 및 인식 기법)

송성일;김황수
- Proceedings of the Korean Information Science Society Conference, 2002.10d, pp.481-483, 2002
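This paper introduces segmentation and recognition methods for unconstrained offline handwritten digit strings that contain touching digits. The system consists of a module that extracts touching components from a digit string, a module that segments the touching digits, and a module that combines the segmentation results. We apply the methods to NIST data to demonstrate the effectiveness of the proposed segmentation and recognition approach.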
Telephone Speech Recognition with Data-Driven Selective Temporal Filtering based on Principal Component Analysis

Jung Sun Gyun;Son Jong Mok;Bae Keun Sung
- Proceedings of the IEEK Conference, 2004.08c, pp.764-767, 2004
The performance of a speech recognition system is generally degraded in telephone environment because of distortions caused by background noise and various channel characteristics. In this paper, data-driven temporal filters are investigated to improve the performance of a specific recognition task such as telephone speech. Three different temporal filtering methods are presented with recognition results for Korean connected-digit telephone speech. Filter coefficients are derived from the cepstral domain feature vectors using the principal component analysis.
An Experiment of a Spoken Digits-Recognition System (숫자음성 자동 인식에 관한 일실험)

;安居院猛
- Journal of the Korean Institute of Telematics and Electronics, v.15 no.6, pp.23-28, 1978
This paper describes a speech recognition system for ten isolated spoken digits. In this system, acoustic parameters such as zero crossing rate, log energy and three formant frequencies estimated by linear prediction method were extracted for classification and/or recognition purpose(s). The former two parameters were used for the classification of unvoiced consonants and the latter one for the recognition of vowels and voiced consonants. Promising recognition results were obtained in this experiment for ten digit utterances spoken by a male speaker.
Robust Video-Based Barcode Recognition via Online Sequential Filtering

Kim, Minyoung
- International Journal of Fuzzy Logic and Intelligent Systems, v.14 no.1, pp.8-16, 2014
We consider the visual barcode recognition problem in a noisy video data setup. Unlike most existing single-frame recognizers that require considerable user effort to acquire clean, motionless and blur-free barcode signals, we eliminate such extra human efforts by proposing a robust video-based barcode recognition algorithm. We deal with a sequence of noisy blurred barcode image frames by posing it as an online filtering problem. In the proposed dynamic recognition model, at each frame we infer the blur level of the frame as well as the digit class label. In contrast to a frame-by-frame based approach with heuristic majority voting scheme, the class labels and frame-wise noise levels are propagated along the frame sequences in our model, and hence we exploit all cues from noisy frames that are potentially useful for predicting the barcode label in a probabilistically reasonable sense. We also suggest a visual barcode tracking approach that efficiently localizes barcode areas in video frames. The effectiveness of the proposed approaches is demonstrated empirically on both synthetic and real data setup.
https://doi.org/10.5391/IJFIS.2014.14.1.8

Design of Digits Recognition System Based on RBFNNs : A Comparative Study of Pre-processing Algorithms (방사형 기저함수 신경회로망 기반 숫자 인식 시스템의 설계 : 전처리 알고리즘을 이용한 인식성능의 비교연구)

Kim, Eun-Hu;Kim, Bong-Youn;Oh, Sung-Kwun
- The Transactions of The Korean Institute of Electrical Engineers, v.66 no.2, pp.416-424, 2017
In this study, we propose a design for a handwritten digit recognition system based on RBFNNs, developed through a comparative study of pre-processing algorithms. Histogram of Oriented Gradients (HOG) is used to extract digit features in the proposed system. In the pre-processing part, dimensionality reduction is carried out using Principal Component Analysis (PCA) and (2D)²PCA, which are widely adopted methods, in order to minimize the loss of information when reducing the feature space. The architecture of the radial basis function neural networks consists of three functional modules: the condition, conclusion, and inference parts. In the condition part, the input space is partitioned by fuzzy clustering realized with the Fuzzy C-Means algorithm, which is used instead of a Gaussian function to account for the characteristics of the input data. In the conclusion part, the connection weights take the extended form of polynomial expressions such as constant, linear, quadratic, and modified quadratic. Experimental results on the MNIST handwritten digit benchmark database show the effectiveness and efficiency of the proposed digit recognition system compared with other studies.
https://doi.org/10.5370/KIEE.2017.66.2.416

Real-time Speed Sign Recognition Method Using Virtual Environments and Camera Images (가상환경 및 카메라 이미지를 활용한 실시간 속도 표지판 인식 방법)

Eunji Song;Taeyun Kim;Hyobin Kim;Kyung-Ho Kim;Sung-Ho Hwang
- Journal of Drive and Control, v.20 no.4, pp.92-99, 2023
Autonomous vehicles should recognize and respond to the specified speed limit in order to drive in compliance with regulations. The most representative way to recognize the specified speed is to detect speed signs in the front camera image and read the numbers on them. This study proposes a method that utilizes YOLO-Labeling-Labeling-EfficientNet. The sign box is first detected with YOLO, and the numeric digits are extracted from the detected box according to pixel values through two labeling stages. After that, each digit is recognized using EfficientNet (a CNN) trained on a virtual environment dataset that we produced ourselves. In addition, we estimated depth information from the height of the recognized sign through regression analysis. We verified the proposed algorithm using the virtual racing environment and GTSRB, and demonstrated its real-time performance and efficient recognition performance.
https://doi.org/10.7839/ksfc.2023.20.4.092

Speech Feature Extraction Based on the Human Hearing Model

Chung, Kwang-Woo;Kim, Paul;Hong, Kwang-Seok
- Proceedings of the KSPS conference, 1996.10a, pp.435-447, 1996
In this paper, we propose a method that extracts speech features using a hearing model implemented through signal processing techniques. The proposed method consists of the following steps: normalization of the short-time speech block by its maximum value, multi-resolution analysis using the discrete wavelet transform and re-synthesis using the inverse discrete wavelet transform, differentiation after analysis and synthesis, and full-wave rectification and integration. To verify the performance of the proposed speech feature in a speech recognition task, Korean digit recognition experiments were carried out using both DTW and VQ-HMM. With DTW, the recognition rates were 99.79% and 90.33% for the speaker-dependent and speaker-independent tasks respectively; with VQ-HMM, the rates were 96.5% and 81.5% respectively. These results indicate that the proposed speech feature has potential as a simple and efficient feature for recognition tasks.