• Title/Summary/Keyword: recognition distance

A study on extraction of the frames representing each phoneme in continuous speech (연속음에서의 각 음소의 대표구간 추출에 관한 연구)

  • 박찬응;이쾌희
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.4 / pp.174-182 / 1996
  • In a continuous speech recognition system, a recognizer that handles an unlimited number of words can be implemented by using a limited number of phonetic units such as phonemes. Dividing continuous speech into a string of phonemes prior to recognition lowers the complexity of the system, but because of coarticulation between neighboring phonemes it is very difficult to extract their boundaries exactly. In this paper, we propose an algorithm that extracts short terms representing each phoneme instead of extracting phoneme boundaries. Short terms of lower spectral change and of higher spectral change are detected; phoneme changes are then detected by applying a distance measure to the lower-spectral-change terms, while the higher-spectral-change terms are regarded as transition terms or short phonemes. Finally, the lower-spectral-change terms and the mid-terms of the higher-spectral-change terms are taken to represent each phoneme. Cepstral coefficients and a weighted cepstral distance are used as the speech feature and the distance measure because of their low computational complexity, and the speech data used in this experiment were recorded in quiet and ordinary indoor environments. The experimental results show that the proposed algorithm achieves higher performance with less computational complexity than conventional segmentation algorithms and can be usefully applied to phoneme-based continuous speech recognition.

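The core measurement in the abstract above is a weighted cepstral distance between neighboring frames, whose low- and high-change stretches mark representative and transition terms. The sketch below shows one plausible way to compute such a frame-to-frame spectral-change curve; the frame sizes, the simple real cepstrum, and the linear index weighting are assumptions for illustration, not the authors' exact configuration.

```python
import numpy as np

def frame_cepstra(x, sr, frame_len=0.025, hop=0.010, n_ceps=12):
    """Real cepstra of overlapping frames (a stand-in for the paper's cepstral features).
    x: 1-D numpy array of samples, sr: sampling rate in Hz."""
    N, H = int(frame_len * sr), int(hop * sr)
    win = np.hamming(N)
    ceps = []
    for i in range(0, len(x) - N, H):
        spec = np.abs(np.fft.rfft(x[i:i + N] * win)) + 1e-10
        c = np.fft.irfft(np.log(spec))
        ceps.append(c[1:n_ceps + 1])          # drop c0 (overall energy)
    return np.array(ceps)

def spectral_change(ceps):
    """Weighted cepstral distance between neighboring frames.
    The linear index weighting is one common choice; the paper's exact
    weights and thresholds are not reproduced here."""
    w = np.arange(1, ceps.shape[1] + 1)
    return np.sqrt(((w * np.diff(ceps, axis=0)) ** 2).sum(axis=1))

# Stretches where this curve stays low are candidate representative terms;
# local maxima mark transitions between phonemes.
```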

Pattern Recognition with Rotation Invariant Multiresolution Features

  • Rodtook, S.;Makhanov, S.S.
    • Institute of Control, Robotics and Systems (ICROS) Conference Proceedings / 2004.08a / pp.1057-1060 / 2004
  • We propose new rotation moment invariants based on multiresolution filter bank techniques. The multiresolution pyramid motivates our simple but efficient feature selection procedure based on fuzzy C-means clustering combined with the Mahalanobis distance. The procedure accounts for the impact of random noise as well as the interesting and less well-known impact of noise due to spatial transformations. The recognition accuracy of the proposed technique has been tested against the preceding moment invariants as well as against some wavelet-based schemes. The numerical experiments, with more than 30,000 images, demonstrate a tangible accuracy increase of about 3% for low noise, 8% for average noise, and 15% for high-level noise.

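The feature selection and matching step described above combines fuzzy C-means clustering with the Mahalanobis distance. The fragment below sketches only the Mahalanobis part, a nearest-class rule over per-class means and covariances estimated from training feature vectors; the class statistics are placeholders and the clustering step is omitted, so this is not the authors' full procedure.

```python
import numpy as np

def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance of a feature vector x to one class model."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

def fit_class_stats(features_by_class):
    """Estimate (mean, inverse covariance) per class from training vectors,
    e.g. rotation-invariant multiresolution moments of training images."""
    return {c: (f.mean(axis=0), np.linalg.inv(np.cov(f, rowvar=False)))
            for c, f in features_by_class.items()}

def classify(x, class_stats):
    """Assign x to the class whose Mahalanobis distance is smallest."""
    return min(class_stats, key=lambda c: mahalanobis(x, *class_stats[c]))
```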

Realtime Face Recognition by Analysis of Feature Information (특징정보 분석을 통한 실시간 얼굴인식)

  • Chung, Jae-Mo;Bae, Hyun;Kim, Sung-Shin
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2001.12a / pp.299-302 / 2001
  • Statistical analysis of extracted features and neural networks are used to recognize a human face. In the preprocessing step, a normalized skin color map based on Gaussian functions is employed to extract the face candidate region. The feature information in the face candidate region is then used to detect the face region. In the recognition step, 120 images of 10 persons are used to train a network with the backpropagation algorithm; the images of each person are obtained from various directions, poses, and facial expressions. The input variables of the neural networks are the geometrical feature information and the feature information obtained from the eigenface space. The simulation results for the 10 persons show that the proposed method yields high recognition rates.

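The face-candidate step above scores pixels with a Gaussian skin-color model. The sketch below shows one common form of such a map, a Gaussian likelihood in normalized rg chromaticity space; the mean and covariance values are invented for illustration and would in practice be estimated from labeled skin pixels.

```python
import numpy as np

# Illustrative skin-color statistics in normalized (r, g) chromaticity space.
SKIN_MEAN = np.array([0.44, 0.31])
SKIN_COV_INV = np.linalg.inv(np.array([[0.0025, 0.0],
                                        [0.0,    0.0015]]))

def skin_likelihood_map(rgb):
    """Per-pixel Gaussian skin-color likelihood for an HxWx3 float RGB image.
    High values mark face-candidate pixels."""
    s = rgb.sum(axis=2) + 1e-6
    rg = np.stack([rgb[..., 0] / s, rgb[..., 1] / s], axis=-1)
    d = rg - SKIN_MEAN
    m = np.einsum('...i,ij,...j->...', d, SKIN_COV_INV, d)   # squared Mahalanobis form
    return np.exp(-0.5 * m)

# Thresholding this map yields the face-candidate region from which the
# geometric and eigenface features are then extracted.
```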

A Study on the Recognition of Korean(Consonant) Characters Using Rapid Transform (Rapid Transform에 의한 한글(자음) 인식에 관한 연구)

  • Song, In-Jun;Lee, Jong-Ha;Kwak, Hoon-Sung
    • Proceedings of the KIEE Conference / 1987.07b / pp.1081-1084 / 1987
  • The Rapid transform is used for the recognition of Korean consonant characters. The test pattern is represented by two gray levels (0 and 1), and a 2-dimensional Rapid transform of the test pattern is computed. Feature selection is carried out in the Rapid transform domain. These features are compared with the corresponding features of the template patterns by computing a Euclidean distance function, and the decision is made using the minimum-distance criterion. Experimental results show a recognition rate of 94%.

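The decision rule in this abstract is a minimum Euclidean distance match between Rapid-transform features of the test pattern and those of stored templates. Below is a hedged sketch: the 1-D Rapid transform is written as a Walsh-Hadamard-like butterfly with absolute differences (one common formulation, applied row- and column-wise for 2-D patterns), and the feature selection step is not reproduced.

```python
import numpy as np

def rapid_transform_1d(v):
    """Rapid (R) transform: sum / absolute-difference butterflies, which make
    the output invariant to cyclic shifts of the input."""
    v = np.asarray(v, dtype=float).copy()
    n = len(v)
    assert n & (n - 1) == 0, "length must be a power of two"
    step = n
    while step > 1:
        half = step // 2
        for s in range(0, n, step):
            a, b = v[s:s + half].copy(), v[s + half:s + step].copy()
            v[s:s + half] = a + b
            v[s + half:s + step] = np.abs(a - b)
        step = half
    return v

def classify(test_features, templates):
    """Minimum Euclidean distance decision over transform-domain features.
    templates: {character: feature_vector} built from reference patterns."""
    return min(templates, key=lambda c: np.linalg.norm(test_features - templates[c]))
```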

Realtime Face Recognition by Analysis of Feature Information (특징정보 분석을 통한 실시간 얼굴인식)

  • Chung, Jae-Mo;Bae, Hyun;Kim, Sung-Shin
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.9 / pp.822-826 / 2001
  • Statistical analysis of extracted features and neural networks are used to recognize a human face. In the preprocessing step, a normalized skin color map based on Gaussian functions is employed to extract the face candidate region. The feature information in the face candidate region is then used to detect the face region. In the recognition step, 120 images of 10 persons are used to train a network with the backpropagation algorithm; the images of each person are obtained from various directions, poses, and facial expressions. The input variables of the neural networks are the geometrical feature information and the feature information obtained from the eigenface space. The simulation results for the 10 persons show that the proposed method yields high recognition rates.

Road Lane and Vehicle Distance Recognition using Real-time Analysis of Camera Images (카메라 영상의 실시간 분석에 의한 차선 및 차간 인식)

  • Kang, Moon-Seol;Kim, Yu-Sin
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.12 / pp.2665-2674 / 2012
  • This paper proposes a method to recognize lanes and the distance between vehicles in real time, detecting dangerous situations and supporting safe driving in an actual road environment. First, the regions of interest corresponding to the road and vehicles are extracted from road images captured with a forward-looking camera. A Hough transform over the region of interest detects linear components, from which the lane is selected and filtered by calculating probabilities. Then, through shadow-threshold analysis of the vehicles ahead within the region of interest, the leading-vehicle objects are extracted and the distance to the vehicle ahead is calculated. When the proposed lane and inter-vehicle distance recognition technique was applied to test road conditions, it achieved a recognition rate of over 95%, showing that it can support safe driving.

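The lane-detection step above is a standard edge-plus-Hough pipeline. The OpenCV sketch below shows that step only; the region of interest, Canny thresholds, and Hough parameters are illustrative assumptions, and the shadow-based distance estimation is indicated only in the closing comment.

```python
import cv2
import numpy as np

def detect_lane_segments(bgr_frame):
    """Find candidate lane line segments in the lower (road) part of a frame
    using Canny edges and the probabilistic Hough transform."""
    gray = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2GRAY)
    h, _ = gray.shape
    roi = gray[h // 2:, :]                        # assume the road fills the lower half
    edges = cv2.Canny(roi, 50, 150)               # illustrative thresholds
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=40,
                            minLineLength=40, maxLineGap=20)
    return np.empty((0, 4)) if lines is None else lines[:, 0, :]   # rows of (x1, y1, x2, y2)

# Lane candidates would next be filtered by slope/position likelihood, and the
# inter-vehicle distance estimated from the image row of the leading vehicle's
# shadow, as described in the abstract.
```
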
Performance Evaluation of an Automatic Distance Speech Recognition System (원거리 음성명령어 인식시스템 설계)

  • Oh, Yoo-Rhee;Yoon, Jae-Sam;Park, Ji-Hoon;Kim, Min-A;Kim, Hong-Kook;Kong, Dong-Geon;Myung, Hyun;Bang, Seok-Won
    • Proceedings of the IEEK Conference / 2007.07a / pp.303-304 / 2007
  • In this paper, we implement an automatic distant-speech recognition system for voice-enabled services. We first construct a baseline automatic speech recognition (ASR) system in which the acoustic models are trained on speech utterances recorded with a close-talking microphone. To improve the performance of the baseline ASR on distant speech, the acoustic models are adapted so as to adjust the spectral characteristics of speech to the different microphones and to the environmental mismatches between close-talking and distant speech. Next, we develop a voice activity detection algorithm for distant speech. We compare the performance of the baseline system and the developed ASR system on a PBW (Phonetically Balanced Word) 452 task. The results show that the developed ASR system provides an average word error rate (WER) reduction of 30.6% compared to the baseline ASR system.

Cepstral Distance and Log-Energy Based Silence Feature Normalization for Robust Speech Recognition (강인한 음성인식을 위한 켑스트럼 거리와 로그 에너지 기반 묵음 특징 정규화)

  • Shen, Guang-Hu;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea / v.29 no.4 / pp.278-285 / 2010
  • The difference between training and test environments is one of the major causes of performance degradation in noisy speech recognition, and many silence feature normalization methods have been proposed to resolve this mismatch. The conventional silence feature normalization method shows good classification performance at high SNR, but its performance degrades at low SNR because of the low accuracy of speech/silence classification. The cepstral distance, on the other hand, represents the characteristic distribution of speech and silence (or noise) well at low SNR. In this paper, we propose a Cepstral distance and Log-energy based Silence Feature Normalization (CLSFN) method that uses both log energy and the cepstral Euclidean distance to classify speech and silence. Because the proposed method combines the merit of log energy, which is less affected by noise at high SNR, with the merit of the cepstral distance, which discriminates speech from silence accurately at low SNR, the classification accuracy is expected to improve. Experimental results showed that the proposed CLSFN achieves better recognition performance than the conventional SFN-I/II and CSFN methods in all noisy environments tested.

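The key decision in CLSFN is a per-frame speech/silence test that combines log energy (reliable at high SNR) with a cepstral distance to a noise reference (more discriminative at low SNR). The fragment below is a minimal sketch of such a combined test; the thresholds, the OR-style combination, and the use of the first frames as the noise estimate are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def speech_silence_mask(log_energy, ceps, e_thresh, d_thresh, n_init=10):
    """Label frames as speech (True) or silence (False).
    log_energy: (T,) per-frame log energy; ceps: (T, D) cepstral vectors.
    A frame is treated as speech if its log energy exceeds e_thresh OR its
    Euclidean cepstral distance to the initial noise cepstrum exceeds d_thresh.
    The first n_init frames are assumed to contain noise only."""
    noise_ceps = ceps[:n_init].mean(axis=0)
    cep_dist = np.linalg.norm(ceps - noise_ceps, axis=1)
    return (log_energy > e_thresh) | (cep_dist > d_thresh)

# Frames labeled as silence would then have their features normalized
# (e.g. replaced by a small constant or the noise mean) before recognition,
# which is the "silence feature normalization" part of the method.
```
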
Front-End Processing for Speech Recognition in the Telephone Network (전화망에서의 음성인식을 위한 전처리 연구)

  • Jun, Won-Suk;Shin, Won-Ho;Yang, Tae-Young;Kim, Weon-Goo;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea / v.16 no.4 / pp.57-63 / 1997
  • In this paper, we study efficient feature vector extraction methods and front-end processing to improve the performance of a speech recognition system using the KT (Korea Telecommunication) database collected over various telephone channels. First, we compare the recognition performance of feature vectors known to be robust to noise and environmental variation, and verify the performance gain obtained with weighted cepstral distance measures. The experimental results show that the recognition rate is increased by using both PLP (Perceptual Linear Prediction) and MFCC (Mel Frequency Cepstral Coefficient) features in comparison with the LPC cepstrum used in the KT recognition system. For the cepstral distance measure, weighted cepstral distance functions such as RPS (Root Power Sums) and BPL (Band-Pass Lifter) improve recognition. Applying spectral subtraction decreases the recognition rate because of distortion effects, whereas RASTA (RelAtive SpecTrAl) processing, CMS (Cepstral Mean Subtraction), and SBR (Signal Bias Removal) enhance recognition performance. In particular, CMS is simple but yields a large improvement. Finally, modified methods for the real-time implementation of CMS are compared, and an improved method is suggested to prevent performance degradation.

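Of the front-end methods compared above, CMS is singled out as simple but highly effective: it removes the per-utterance mean of each cepstral coefficient, which cancels stationary convolutional effects such as a telephone channel's frequency response. A minimal sketch, assuming a (num_frames, num_coeffs) feature layout:

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Cepstral Mean Subtraction: subtract the per-utterance mean of each
    cepstral coefficient, cancelling stationary channel effects.
    cepstra: (num_frames, num_coeffs) array of cepstral features."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# A common real-time variant replaces the full-utterance mean with a running
# (exponentially weighted) estimate; modifications of this kind are what the
# paper compares for on-line use.
```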

Automatic Clustering of Speech Data Using Modified MAP Adaptation Technique (수정된 MAP 적응 기법을 이용한 음성 데이터 자동 군집화)

  • Ban, Sung Min;Kang, Byung Ok;Kim, Hyung Soon
    • Phonetics and Speech Sciences / v.6 no.1 / pp.77-83 / 2014
  • This paper proposes a speaker and environment clustering method to overcome the degradation of speech recognition performance caused by diverse noise and speaker characteristics. Instead of using the distance between Gaussian mixture model (GMM) weight vectors, as in Google's approach, the distance between mean vectors adapted with a modified maximum a posteriori (MAP) adaptation is used as the distance measure for vector quantization (VQ) clustering. In experiments on simulation data generated by adding noise to clean speech, the proposed clustering method yields an error rate reduction of 10.6% compared with the baseline speaker-independent (SI) model, which is slightly better than Google's approach.
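The distance described in this abstract compares utterances by the means of a background GMM after a MAP-style adaptation toward each utterance, rather than by GMM weight vectors. The sketch below shows the generic relevance-MAP mean update and the resulting supervector distance; the relevance factor, the use of precomputed posteriors, and the specific modification introduced in the paper are assumptions or omissions for brevity.

```python
import numpy as np

def map_adapted_means(frames, ubm_means, responsibilities, r=16.0):
    """MAP-style adaptation of GMM/UBM mean vectors toward one utterance.
    frames: (T, D) features; ubm_means: (M, D) UBM means;
    responsibilities: (T, M) frame-to-mixture posteriors from the UBM;
    r: relevance factor (an assumed value)."""
    n = responsibilities.sum(axis=0)                                  # (M,) soft counts
    ex = responsibilities.T @ frames / np.maximum(n, 1e-8)[:, None]   # (M, D) per-mixture data means
    alpha = (n / (n + r))[:, None]                                    # adaptation weight per mixture
    return alpha * ex + (1.0 - alpha) * ubm_means

def utterance_distance(means_a, means_b):
    """Distance between two utterances: Euclidean distance between their
    adapted-mean supervectors, used as the VQ clustering metric."""
    return float(np.linalg.norm(means_a.ravel() - means_b.ravel()))
```

Utterances (or speaker/environment conditions) would then be grouped by a k-means-style VQ clustering under this distance, with a cluster-specific acoustic model selected or adapted at recognition time.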