Integrated Search | Korea Science

Intra-and Inter-frame Features for Automatic Speech Recognition

Lee, Sung Joo;Kang, Byung Ok;Chung, Hoon;Lee, Yunkeun
- ETRI Journal / Vol. 36, No. 3 / pp. 514-517 / 2014
In this paper, alternative dynamic features for speech recognition are proposed. The goal of this work is to improve speech recognition accuracy by deriving a representation of distinctive dynamic characteristics from a speech spectrum. This work was inspired by two temporal dynamics of a speech signal. One is the highly non-stationary nature of speech, and the other is the inter-frame change of a speech spectrum. We adopt a sub-frame spectrum analyzer to capture very rapid spectral changes within a speech analysis frame. In addition, we attempt to measure spectral fluctuations in a more complex manner than traditional dynamic features such as delta or double-delta. To evaluate the proposed features, speech recognition tests over smartphone environments were conducted. The experimental results show that feature streams simply combined with the proposed features improve the recognition accuracy of a hidden Markov model-based speech recognizer.
https://doi.org/10.4218/etrij.14.0213.0181
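
For context, the traditional delta and double-delta features that this abstract contrasts against are commonly computed with a regression over neighboring frames. The sketch below shows that baseline only, not the paper's proposed intra- and inter-frame features. The function name `delta`, the window size `n=2`, and the edge handling (repeating the first and last frame) are assumptions made for illustration.

```python
def delta(frames, n=2):
    """Regression-based delta over a list of per-frame feature vectors.

    d_t = sum_{k=1..n} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..n} k^2)
    Frames beyond the edges are replaced by the first or last frame (assumption).
    """
    if not frames:
        return []
    denom = 2 * sum(k * k for k in range(1, n + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for k in range(1, n + 1):
                ahead = frames[min(t + k, last)][d]
                behind = frames[max(t - k, 0)][d]
                acc += k * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


# Double-delta is the same operator applied to the delta stream.
cepstra = [[0.0], [1.0], [2.0], [3.0], [4.0]]
deltas = delta(cepstra)
double_deltas = delta(deltas)
```

On a linear ramp such as `cepstra`, the delta at interior frames equals the slope of the ramp, which is a quick sanity check for the regression weights.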

Discrimination of Emotional States In Voice and Facial Expression

Kim, Sung-Ill;Yasunari Yoshitomi;Chung, Hyun-Yeol
- The Journal of the Acoustical Society of Korea / Vol. 21, No. 2E / pp. 98-104 / 2002
The present study describes a combination method for recognizing human affective states such as anger, happiness, sadness, or surprise. For this, we extracted emotional features from voice signals and facial expressions, and then used them to train a hidden Markov model (HMM) and a neural network (NN) to recognize emotional states. For voices, we used prosodic parameters such as pitch signals, energy, and their derivatives, which were used to train the HMM. For facial expressions, on the other hand, we used feature parameters extracted from thermal and visible images, which were used to train the NN. The recognition rates for the combined parameters obtained from voice and facial expressions were better than those for either of the two isolated sets of parameters. The simulation results were also compared with human questionnaire results.

A Study on Gesture Recognition using Improved Higher Order Local Correlation Features and HMM

김종민
- 한국정보통신학회:학술대회논문집 / 한국정보통신학회 2013 Spring Conference / pp. 521-524 / 2013
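This paper describes an algorithm that recognizes gestures by composing gesture symbols from the feature information obtained through improved higher-order local correlation feature coefficients. Compared with existing geometric feature-based or appearance-based methods, the proposed method does not require heavy computation and maintains a high recognition rate while using minimal information, which makes it well suited to building real-time systems.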

Enhanced Maximum Voiced Frequency Estimation Scheme for HTS Using Two-Band Excitation Model

Park, Jihoon;Hahn, Minsoo
- ETRI Journal / Vol. 37, No. 6 / pp. 1211-1219 / 2015
In a hidden Markov model-based speech synthesis system using a two-band excitation model, a maximum voiced frequency (MVF) is the most important feature as an excitation parameter because the synthetic speech quality depends on the MVF. This paper proposes an enhanced MVF estimation scheme based on a peak picking method. In the proposed scheme, both local peaks and peak lobes are picked from the spectrum of a linear predictive residual signal. The average of the normalized distances of local peaks and peak lobes is calculated and utilized as a feature to estimate an MVF. Experimental results of both objective and subjective tests show that the proposed scheme improves the synthetic speech quality compared with that of a conventional one in a mobile device as well as a PC environment.
https://doi.org/10.4218/etrij.15.0115.0124
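
The abstract above builds on picking peaks from the spectrum of a linear predictive residual. As a rough illustration of the first step only, the sketch below finds local peaks in a list of spectral magnitudes. The function name `local_peaks` and the tie rule on plateaus are assumptions; the paper's peak-lobe picking, distance normalization, and MVF decision rule are not described in the abstract and are not reproduced here.

```python
def local_peaks(magnitudes):
    """Return indices of bins that rise above the left neighbor and are
    not exceeded by the right neighbor (first bin of a plateau wins)."""
    return [
        i
        for i in range(1, len(magnitudes) - 1)
        if magnitudes[i - 1] < magnitudes[i] >= magnitudes[i + 1]
    ]


# Hypothetical residual spectrum magnitudes, for illustration only.
spectrum = [0, 2, 1, 3, 3, 0]
peaks = local_peaks(spectrum)
```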

An Efficient Lipreading Method Based on Lip's Symmetry

김진범;김진영
- 대한전자공학회논문지SP / Vol. 37, No. 5 / pp. 105-114 / 2000
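This paper focuses on effectively reducing the amount of data processed in an image transform-based automatic lipreading algorithm. The image transform approach, which holds compressed information about the speaker's lips, gives better lipreading performance than the lip contour-based approach, but it produces many lip feature parameters, so the data processing load grows and recognition takes longer. To reduce the data to be computed, we propose a simple method that folds the lip image vertically, based on the symmetry of the lips. In addition, we use principal component analysis (PCA) to speed up the algorithm, and we report word recognition experiments using a hidden Markov model (HMM). With the folded lip image, the number of feature parameters fell by 22-47% compared with the usual method that uses a $16 \times 16$ lip image, and the HMM word recognition rate improved by 2-3%.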
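
The folding step in the lipreading abstract above can be pictured as follows. This is a minimal sketch that assumes "folding" means averaging each pixel with its mirror image about the vertical midline of the lip image, and that the image width is even; the abstract does not state the exact folding rule, and the function name `fold_vertical` is hypothetical.

```python
def fold_vertical(image):
    """Fold an image about its vertical midline by averaging each pixel
    with its left-right mirror. Assumes an even width (assumption)."""
    folded = []
    for row in image:
        width = len(row)
        half = width // 2
        folded.append([(row[c] + row[width - 1 - c]) / 2 for c in range(half)])
    return folded


# A 16 x 16 lip image becomes 16 x 8, halving the pixels passed to later stages.
lip_image = [[float(c) for c in range(16)] for _ in range(16)]
folded_image = fold_vertical(lip_image)
```

Halving the pixel count does not by itself give the 22-47% feature parameter reduction reported in the abstract; that figure refers to the parameters remaining after the paper's own transform stage.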

A Study on Eigenspace Face Recognition using Wavelet Transform and HMM

이정재;김종민
- 한국정보통신학회논문지 / Vol. 16, No. 10 / pp. 2121-2128 / 2012
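This paper proposes real-time face region detection using the wavelet transform, with a robust detection algorithm that achieves both computational efficiency and detection performance. The detected face image is reduced by principal component analysis to low-dimensional face symbols, which are used to recognize the face. Compared with existing geometric feature-based or appearance-based methods, the proposed method does not require heavy computation and maintains a high recognition rate while using minimal information, which makes it well suited to building real-time systems. In addition, to reduce misrecognition and recognition errors, the model feature values projected onto the eigenspace are converted by a clustering algorithm into discrete symbols, which serve as the input symbols of a hidden Markov model. An arbitrary input face is thus recognized as the face model with the highest probability. In experiments, the proposed method outperformed the conventional Euclidean and Mahalanobis distance methods with respect to false matches and matching failures.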
https://doi.org/10.6109/jkiice.2012.16.10.2121
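
The step in which clustered eigenspace features become HMM input symbols can be sketched as nearest-centroid vector quantization. The abstract does not name the clustering algorithm, so the code below assumes the centroids have already been learned (for example by k-means) and uses squared Euclidean distance; the function name `quantize` and the sample values are hypothetical.

```python
def quantize(vectors, centroids):
    """Map each feature vector to the index of its nearest centroid.
    The resulting integer sequence can serve as discrete HMM observations."""
    symbols = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
        symbols.append(dists.index(min(dists)))
    return symbols


# Hypothetical 2-D eigenspace projections and two pre-trained centroids.
centroids = [[0.0, 0.0], [10.0, 10.0]]
projections = [[1.0, 1.0], [9.0, 8.0], [0.0, 2.0]]
symbols = quantize(projections, centroids)
```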

Recognition Performance Improvement of Unsupervised Limabeam Algorithm using Post Filtering Technique

Nguyen, Dinh Cuong;Choi, Suk-Nam;Chung, Hyun-Yeol
- 대한임베디드공학회논문지 / Vol. 8, No. 4 / pp. 185-194 / 2013
In distant-talking environments, speech recognition performance degrades significantly due to noise and reverberation. Recent work by Michael L. Seltzer shows that in microphone array speech recognition, the word error rate can be significantly reduced by adapting the beamformer weights to generate a sequence of features that maximizes the likelihood of the correct hypothesis. One way to implement this approach, called the Likelihood Maximizing Beamforming (Limabeam) algorithm, is Unsupervised Limabeam (USL), which can improve recognition performance in any environment. Our investigation of USL showed that because the optimization depends strongly on the transcription output of the first recognition step, the output becomes unstable, which may lower performance. To improve the recognition performance of USL, post-filtering techniques can be employed to obtain a more accurate transcription from the first step. In this work, as a post-filtering technique for the first recognition step of USL, we propose adding a Wiener filter combined with the Feature Weighted Mahalanobis Distance. We also suggest an alternative way to implement the Limabeam algorithm efficiently for a Hidden Markov Network (HM-Net) speech recognizer. Speech recognition experiments performed in a real distant-talking environment confirm the efficacy of the Limabeam algorithm in the HM-Net speech recognition system and the improved performance of the proposed method.
https://doi.org/10.14372/IEMEK.2013.8.4.185

A Recognition Framework for Facial Expression by Expression HMM and Posterior Probability

김진옥
- 한국정보과학회논문지:컴퓨팅의 실제 및 레터 / Vol. 11, No. 3 / pp. 284-291 / 2005
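This study proposes a framework that detects human faces in video and analyzes and classifies their expressions based on learned expression patterns. To represent expression patterns that change over time in addition to spatial information, the framework uses PCA, which analyzes expression features spatially, and an expression HMM based on the hidden Markov model (HMM), which analyzes them spatio-temporally. Because spatial feature extraction of expressions is closely tied to the temporal analysis, the spatio-temporal approach of the HMM is effective for detecting, tracking, and classifying expressions that vary widely. The recognition framework is completed by a posterior probability method that relates the current visual observation to the previous visual results. As a result, the framework gives accurate and robust recognition not only for the six representative expressions but also for frames in which the expression is weak. The framework is effective for implementing applications such as expression recognition, HCI, and key-frame extraction.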
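
One common way to relate a current observation to earlier results through a posterior probability is a recursive Bayes update, sketched below. This is an assumption about the general idea, not the paper's formulation: the function name `update_posterior`, the expression labels, and the likelihood values are hypothetical, and in the framework the per-expression likelihoods would come from the expression HMMs.

```python
def update_posterior(prior, likelihood):
    """Combine the previous belief over expressions with the likelihood of
    the current frame: posterior(c) is proportional to prior(c) * likelihood(c)."""
    unnormalized = {label: prior[label] * likelihood[label] for label in prior}
    total = sum(unnormalized.values())
    if total == 0:
        # No expression explains the frame; keep the previous belief.
        return dict(prior)
    return {label: value / total for label, value in unnormalized.items()}


# Start from a uniform belief and fold in one frame's likelihoods.
belief = {"happy": 0.5, "sad": 0.5}
frame_likelihood = {"happy": 0.8, "sad": 0.2}
belief = update_posterior(belief, frame_likelihood)
```

Carrying the belief from frame to frame is what lets weakly expressed frames be classified consistently with their neighbors.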

Monophthong Recognition Optimizing Muscle Mixing Based on Facial Surface EMG Signals

이병현;류재환;이미란;김덕환
- 전자공학회논문지 / Vol. 53, No. 3 / pp. 143-150 / 2016
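This paper proposes a method for recognizing Korean monophthongs through muscle combination optimization based on facial surface electromyography (sEMG). The sEMG signals showed different patterns and muscle activation levels depending on the Korean monophthong pronounced. RMS, VAR, MMAV1, MMAV2, and cepstral coefficients, which showed high recognition accuracy in previous studies, were used as feature extraction algorithms, and the Korean monophthongs were classified with QDA (Quadratic Discriminant Analysis) and HMM (Hidden Markov Model). In the training stage, the muscle combination is optimized using the input data, and the optimization result is applied in the recognition stage, where a new EMG signal is input and the Korean monophthong is finally recognized. In experiments, the proposed method achieved an average recognition accuracy of 85.7% with QDA and 75.1% with HMM.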
https://doi.org/10.5573/ieie.2016.53.3.143
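
Two of the time-domain features named in the abstract above, RMS and VAR, are simple to compute per analysis window. The sketch below is illustrative only: the function names and the sample window are hypothetical, VAR is taken here as the sample variance about the window mean (EMG literature sometimes assumes a zero-mean signal instead), and MMAV1, MMAV2, and the cepstral coefficients are omitted.

```python
import math
import statistics


def rms(window):
    """Root mean square of one analysis window of EMG samples."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def var(window):
    """Sample variance of the window (needs at least two samples)."""
    return statistics.variance(window)


def channel_features(window):
    """Feature vector for one muscle channel: [RMS, VAR]."""
    return [rms(window), var(window)]


# Hypothetical window from a single facial muscle channel.
emg_window = [1.0, -1.0, 1.0, -1.0]
features = channel_features(emg_window)
```

Concatenating such vectors across the selected muscle channels would give the input to a classifier such as QDA.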

Real-Time Place Recognition for Augmented Mobile Information Systems

오수진;남양희
- 한국정보과학회논문지:컴퓨팅의 실제 및 레터 / Vol. 14, No. 5 / pp. 477-481 / 2008
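Providing users on the move with the information they need requires technology that recognizes places. This paper proposes a video-based real-time place recognition system that analyzes images captured by a camera while moving inside a building, identifies the current place, and augments the camera image with related information. Existing work that uses global image features is sensitive to partial occlusion of the scene and to noise, while approaches that rely on local features to perform object recognition are computationally expensive and hard to apply in real time. In addition, statistical graph-based models or Bayesian networks have been used to derive place recognition results from such features; the former need large amounts of statistical data to obtain place transition probabilities, and the latter cannot exploit the context of place transitions and therefore depend only on object recognition results. This paper proposes a system that uses place context information and combines local and global feature extraction, compensating for the sensitivity of global methods to partial occlusion and noise and for the slow processing speed of local methods. The proposed method was applied to an information augmentation system that obtains place information while moving inside a building, and its real-time performance was confirmed.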
