Search | Korea Science

A study on the speech feature extraction based on the hearing model (청각 모델에 기초한 음성 특징 추출에 관한 연구)

김바울;윤석현;홍광석;박병철
- Journal of the Korean Institute of Telematics and Electronics B, v.33B no.4, pp.131-140, 1996
In this paper, we propose a method that extracts speech features using a hearing model implemented with signal processing techniques. The proposed method consists of the following steps: normalization of each short-time speech block by its maximum value, multi-resolution analysis using the discrete wavelet transform and re-synthesis using the inverse discrete wavelet transform, differentiation after analysis and synthesis, full-wave rectification, and integration. To verify the performance of the proposed speech feature in a speech recognition task, Korean digit recognition experiments were carried out using both DTW and VQ-HMM. With DTW, the recognition rates were 99.79% and 90.33% for the speaker-dependent and speaker-independent tasks, respectively; with VQ-HMM, the rates were 96.5% and 81.5%, respectively. These results indicate that the proposed speech feature has potential as a simple and efficient feature for recognition tasks.
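The processing chain listed in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the Haar wavelet, the number of levels, the band-by-band re-synthesis, and the function names (`haar_dwt`, `band_signals`, `hearing_feature`) are all hypothetical choices, and the block length is assumed to be divisible by 2 to the power of `levels`.

```python
import math

def haar_dwt(x):
    """One-level Haar analysis: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """One-level Haar synthesis (inverse of haar_dwt)."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out

def reconstruct(coeffs, depth, is_detail):
    """Invert `depth` levels, keeping only `coeffs` and zeroing all other bands."""
    zeros = [0.0] * len(coeffs)
    sig = haar_idwt(zeros, coeffs) if is_detail else haar_idwt(coeffs, zeros)
    for _ in range(depth - 1):
        sig = haar_idwt(sig, [0.0] * len(sig))
    return sig

def band_signals(x, levels):
    """Multi-resolution analysis followed by per-band re-synthesis."""
    approx = list(x)
    bands = []
    for depth in range(1, levels + 1):
        approx, detail = haar_dwt(approx)
        bands.append(reconstruct(detail, depth, True))
    bands.append(reconstruct(approx, levels, False))
    return bands

def hearing_feature(block, levels=3):
    """Normalize, split into bands, differentiate, rectify, integrate."""
    peak = max(abs(v) for v in block) or 1.0          # avoid dividing by zero
    x = [v / peak for v in block]                      # normalization by maximum
    feats = []
    for sig in band_signals(x, levels):
        diff = [sig[i + 1] - sig[i] for i in range(len(sig) - 1)]  # differentiation
        feats.append(sum(abs(v) for v in diff))        # full-wave rectify + integrate
    return feats
```

Because each block is divided by its own peak, the resulting feature vector does not change when the input block is scaled by a constant gain.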

Off-line recognition of handwritten korean and alphanumeric characters using hidden markov models (Hidden Markov Model을 이용한 필기체 한글 및 영.숫자 오프라인 인식)

김우성;박래홍
- Journal of the Korean Institute of Telematics and Electronics B, v.31B no.9, pp.85-100, 1994
This paper proposes a recognition system for constrained handwritten Hangul and alphanumeric characters using discrete hidden Markov models (HMMs). The HMM process encodes the distortion and similarity among patterns of a class through a doubly stochastic approach. By characterizing the statistical properties of characters using selected features, a recognition system can be implemented that absorbs possible variations in form. Hangul shapes are classified into six types by fuzzy inference, and their recognition is performed based on quantized features by optimally ordering features according to their effectiveness in each class. Recognition of the constrained alphanumerics is performed using the same features as in Hangul recognition. The forward-backward, Viterbi, and Baum-Welch reestimation algorithms are used for training and recognition of handwritten Hangul and alphanumeric characters. Simulation results show that the proposed method recognizes handwritten Korean characters and alphanumerics effectively.
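As an illustration of one of the algorithms named in this abstract, the following is a minimal sketch of Viterbi decoding for a discrete HMM. The model parameters in the usage example are toy values chosen for illustration and have nothing to do with the paper's character models; the function name `viterbi` and its argument layout are assumptions.

```python
def viterbi(obs, pi, A, B):
    """Most likely state path for a discrete HMM.

    obs: list of observation symbol indices
    pi:  initial state probabilities, pi[i]
    A:   transition probabilities, A[i][j] = P(state j | state i)
    B:   emission probabilities, B[i][k] = P(symbol k | state i)
    Returns (best_path, probability_of_best_path).
    """
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backptr = []
    for symbol in obs[1:]:
        prev = delta
        delta, ptr = [], []
        for j in range(n):
            # Best predecessor state for landing in state j at this step.
            best_i = max(range(n), key=lambda i: prev[i] * A[i][j])
            ptr.append(best_i)
            delta.append(prev[best_i] * A[best_i][j] * B[j][symbol])
        backptr.append(ptr)
    # Trace back from the most probable final state.
    state = max(range(n), key=lambda i: delta[i])
    best_prob = delta[state]
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best_prob

# Toy two-state model with three output symbols (illustrative values only).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
path, prob = viterbi([0, 1, 2], pi, A, B)
```

For longer observation sequences a practical implementation would work with log probabilities to avoid numerical underflow; the sketch keeps plain products for readability.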

Endpoint Detection of Speech Signal Using Wavelet Transform (웨이브렛 변환을 이용한 음성신호의 끝점검출)

석종원;배건성
- The Journal of the Acoustical Society of Korea, v.18 no.6, pp.57-64, 1999
In this paper, we investigate a robust endpoint detection algorithm for noisy environments. A new feature parameter based on the discrete wavelet transform is proposed for word boundary detection of isolated utterances. The feature is defined as the sum of the standard deviations of the wavelet coefficients at the third coarse scale and at the weighted first detail scale. We then develop a new and robust endpoint detection algorithm using this wavelet-domain feature. For the performance evaluation, we measured the detection accuracy and the average recognition error rate due to endpoint detection in an HMM-based recognition system across several signal-to-noise ratios and noise conditions.
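The feature described in this abstract might be computed per frame along the following lines. This is a hedged sketch: the abstract does not state the wavelet, the weight, or the frame length, so the Haar wavelet, the `weight=0.5` default, and the names `haar_step` and `endpoint_feature` are assumptions made only for illustration. The frame length is assumed to be a multiple of 8.

```python
import math
import statistics

def haar_step(x):
    """One level of Haar analysis: returns (coarse, detail) coefficients."""
    s = math.sqrt(2.0)
    coarse = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return coarse, detail

def endpoint_feature(frame, weight=0.5):
    """Std. dev. of third-scale coarse coefficients plus weighted
    std. dev. of first-scale detail coefficients (weight is hypothetical)."""
    coarse1, detail1 = haar_step(frame)
    coarse2, _ = haar_step(coarse1)
    coarse3, _ = haar_step(coarse2)
    return statistics.pstdev(coarse3) + weight * statistics.pstdev(detail1)

# Example: a silent frame versus a frame containing a tone.
silence = [0.0] * 64
tone = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
silence_value = endpoint_feature(silence)
tone_value = endpoint_feature(tone)
```

An endpoint detector would then compare this value frame by frame against a threshold estimated from the background noise; how the paper sets that threshold is not stated in the abstract.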

Korean Digit Speech Recognition Dialing System using Filter Bank (필터뱅크를 이용한 한국어 숫자음 인식 다이얼링 시스템)

박기영;최형기;김종교
- Journal of the Institute of Electronics Engineers of Korea TE, v.37 no.5, pp.62-70, 2000
In this study, speech recognition of Korean digits is performed using a filter bank together with discrete HMM and DTW. Spectral analysis reveals speech signal features that are mainly due to the shape of the vocal tract. Spectral features of speech are generally obtained as the outputs of filter banks, which integrate the spectrum over defined frequency ranges. A set of 8 band-pass filters is generally used since it simulates human ear processing. The defined frequency ranges are 320-330, 450-460, 640-650, 840-850, 900-1000, 1100-1200, 2000-2100, and 3900-4000 Hz, and the signal is sampled at 8 kHz. The frame width is 20 ms and the frame period is 10 ms. In the experiments, the recognition rate of DTW was better than that of HMM for Korean digit speech. The recognition accuracy of Korean digit speech using the filter bank is 93.3% for the 24th BPF, 89.1% for the 16th BPF, and 88.9% for the 8th BPF in the hardware realization of the voice dialing system.
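The framing parameters and band layout given in this abstract translate into sample counts as sketched below. The sampling rate, frame width, frame period, and the eight frequency ranges come from the abstract; everything else is an assumption. In particular, the Goertzel recursion evaluated at each band's center frequency is a simple stand-in for the paper's band-pass filters, whose actual design is not described here, and the function names are hypothetical.

```python
import math

SAMPLE_RATE = 8000                          # Hz, from the abstract
FRAME_LEN = SAMPLE_RATE * 20 // 1000        # 20 ms frame  -> 160 samples
FRAME_HOP = SAMPLE_RATE * 10 // 1000        # 10 ms period -> 80 samples
BANDS_HZ = [(320, 330), (450, 460), (640, 650), (840, 850),
            (900, 1000), (1100, 1200), (2000, 2100), (3900, 4000)]

def split_frames(signal):
    """Cut the signal into overlapping 20 ms frames every 10 ms."""
    last_start = len(signal) - FRAME_LEN
    return [signal[s:s + FRAME_LEN] for s in range(0, last_start + 1, FRAME_HOP)]

def goertzel_power(frame, freq_hz):
    """Squared spectral magnitude of the frame at a single frequency."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / SAMPLE_RATE)
    s1 = s2 = 0.0
    for x in frame:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def band_features(frame):
    """One energy-like value per band, taken at the band center (assumption)."""
    return [goertzel_power(frame, (lo + hi) / 2.0) for lo, hi in BANDS_HZ]

# Example: one second of a 950 Hz tone falls in the 900-1000 Hz band.
tone = [math.sin(2 * math.pi * 950 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
tone_frames = split_frames(tone)
features = band_features(tone_frames[0])
strongest_band = features.index(max(features))
```

The per-frame feature vectors produced this way would be the input to either a DTW template matcher or a vector quantizer feeding a discrete HMM.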

Syllable Recognition of HMM using Segment Dimension Compression (세그먼트 차원압축을 이용한 HMM의 음절인식)

Kim, Joo-Sung;Lee, Yang-Woo;Hur, Kang-In;Ahn, Jum-Young
- The Journal of the Acoustical Society of Korea, v.15 no.2, pp.40-48, 1996
In this paper, 40-dimensional segment vectors with widths of 4 frames and 7 frames in every monosyllable interval were compressed into 10-, 14-, and 20-dimensional vectors using K-L expansion and neural networks, and these were used as speech recognition feature parameters for a CHMM. We also compared them with CHMMs to which the discrete duration time, the regression coefficients, and the mixture distribution were added as feature parameters. In a recognition test on 100 monosyllables, the recognition rates of CHMM + $\Delta$MCEP, CHMM + MIX, and CHMM + DD improve by 1.4%, 2.36%, and 2.78%, respectively, over the 85.19% of the CHMM. The rates using vectors compressed by K-L expansion are lower than that of MCEP + $\Delta$MCEP, but those using K-L + MCEP and K-L + $\Delta$MCEP are almost the same. Neural networks reflect the dynamic variation of speech better than K-L expansion because they use the sigmoid function for the non-linear transform. Recognition rates using vectors compressed by neural networks are higher than those using K-L expansion and the other methods.

Spoken Document Retrieval Based on Phone Sequence Strings Decoded by PVDHMM (PVDHMM을 이용한 음소열 기반의 SDR 응용)

Choi, Dae-Lim;Kim, Bong-Wan;Kim, Chong-Kyo;Lee, Yong-Ju
- MALSORI, no.62, pp.133-147, 2007
In this paper, we introduce a phone vector discrete HMM (PVDHMM) that decodes a phone sequence string, and demonstrate its applicability to spoken document retrieval. The PVDHMM treats a phone recognizer or large vocabulary continuous speech recognizer (LVCSR) as a vector quantizer whose codebook size is equal to the size of its phone set. We apply the PVDHMM to decode the phone sequence strings and compare the outputs with those of a continuous speech recognizer (CSR). We also carry out a spoken document retrieval experiment using a PVDHMM word spotter on the phone sequence strings generated by a phone recognizer or LVCSR, and compare its results with those of retrieval through the phone-based vector space model.

Feature Extraction Based on GRFs for Facial Expression Recognition

Yoon, Myoong-Young
- Journal of Korea Society of Industrial Information Systems, v.7 no.3, pp.23-31, 2002
In this paper, we propose a new feature vector for facial expression recognition based on Gibbs distributions, which are well suited for representing spatial continuity. The extracted feature vectors are invariant under translation, rotation, and scaling of a facial expression image. The algorithm for recognizing a facial expression contains two parts: the extraction of the feature vector and the recognition process. The feature vector is composed of modified 2-D conditional moments based on the estimated Gibbs distribution for a facial image. In the facial expression recognition phase, we use a discrete left-right HMM, which is widely used in pattern recognition. To evaluate the performance of the proposed scheme, experiments on the recognition of four universal expressions (anger, fear, happiness, surprise) were conducted with facial image sequences on a workstation. Experimental results reveal that the proposed scheme achieves a high recognition rate of over 95%.

Gesture Recognition by Analyzing a Trajectory on Spatio-Temporal Space (시공간상의 궤적 분석에 의한 제스쳐 인식)

민병우;윤호섭;소정;에지마 도시야끼
- Journal of KIISE:Software and Applications, v.26 no.1, pp.157-157, 1999
Gesture recognition has become a very interesting research topic in the computer vision area. Gesture recognition from visual images has a number of potential applications, such as HCI (Human Computer Interaction), VR (Virtual Reality), and machine vision. To overcome the technical barriers in visual processing, conventional approaches have employed cumbersome devices such as datagloves or color-marked gloves. In this research, we capture gesture images without using external devices and generate a gesture trajectory composed of point-tokens. The trajectory is spotted using phase-based velocity constraints and recognized using a discrete left-right HMM. Input vectors to the HMM are obtained by applying the LBG clustering algorithm in a polar-coordinate space into which the point-tokens in Cartesian space are converted. The gesture vocabulary is composed of twenty-two dynamic hand gestures for editing drawing elements. In our experiment, one hundred samples per gesture were collected from twenty persons; fifty were used for training and the other fifty for the recognition experiment. The results show a recognition rate of about 95% and suggest that the approach can be applied to several potential systems operated by gestures. The developed system runs in real time for editing basic graphic primitives in a hardware environment consisting of a Pentium Pro (200 MHz), a Matrox Meteor graphic board, and a CCD camera, with a Windows 95 and Visual C++ software environment.
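The step that turns point-tokens into discrete HMM input symbols could look roughly like the sketch below. It assumes a codebook that has already been trained (for example by LBG clustering, which is not implemented here), an arbitrary origin for the polar conversion, and a plain Euclidean distance in (r, theta) that ignores angle wraparound. The function names and the toy codebook are hypothetical.

```python
import math

def to_polar(point, origin=(0.0, 0.0)):
    """Convert a Cartesian point-token to (radius, angle) about `origin`."""
    dx = point[0] - origin[0]
    dy = point[1] - origin[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)

def quantize(vec, codebook):
    """Index of the nearest codeword (squared Euclidean distance)."""
    def dist(k):
        return (vec[0] - codebook[k][0]) ** 2 + (vec[1] - codebook[k][1]) ** 2
    return min(range(len(codebook)), key=dist)

def trajectory_to_symbols(points, codebook, origin=(0.0, 0.0)):
    """Map a trajectory of point-tokens to a discrete symbol sequence."""
    return [quantize(to_polar(p, origin), codebook) for p in points]

# Toy codebook of (radius, angle) codewords, for illustration only.
codebook = [(1.0, 0.0), (1.0, math.pi / 2), (2.0, 0.0)]
symbols = trajectory_to_symbols([(1.0, 0.0), (0.0, 1.0), (2.0, 0.1)], codebook)
```

The resulting symbol sequence is the kind of observation sequence a discrete left-right HMM would score for each gesture class.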

Real-time Multiple People Tracking using Competitive Condensation (경쟁적 조건부 밀도 전파를 이용한 실시간 다중 인물 추적)

강희구;김대진;방승양
- Journal of KIISE:Software and Applications, v.30 no.7_8, pp.713-718, 2003
The CONDENSATION (Conditional Density Propagation) algorithm offers robust tracking performance and is suitable for real-time implementation. However, the CONDENSATION tracker has some difficulties with real-time implementation for multiple people tracking, since it requires very complicated shape modeling and a large number of samples for precise tracking performance. Further, it shows poor tracking performance in the case of close or partially occluded people. To overcome these difficulties, we present three improvements. First, we construct effective templates of people's shapes using the SOM (Self-Organizing Map). Second, we use the discrete HMM (Hidden Markov Modeling) as an accurate dynamical model of the transitions between people's shapes. Third, we use a competition rule to separate close or partially occluded people effectively. Simulation results show that the proposed CONDENSATION algorithm can achieve robust and real-time tracking in image sequences of a crowd of people.

Gaussian Model Optimization using Configuration Thread Control In CHMM Vocabulary Recognition (CHMM 어휘 인식에서 형상 형성 제어를 이용한 가우시안 모델 최적화)

Ahn, Chan-Shik;Oh, Sang-Yeob
- Journal of Digital Convergence, v.10 no.7, pp.167-172, 2012
In vocabulary recognition using an HMM (Hidden Markov Model), modeling the observations with a discrete probability distribution has the advantage of low computational complexity, but it has the disadvantages of a relatively low recognition rate and the need for a sophisticated smoothing process. To improve on this, a CHMM (Continuous Hidden Markov Model) with continuous probability densities based on Gaussian mixtures is proposed for the optimization of the library system. This paper presents a system that uses configuration thread control to optimize the Gaussian mixture model in CHMM vocabulary recognition. Applying the proposed system yielded a recognition rate of 98.1% in vocabulary recognition.
https://doi.org/10.14400/JDPM.2012.10.7.167
