• Title/Abstract/Keywords: 'Speech recognition'

Search results: 2,053 items (processing time: 0.023 s)

MLLR-Based Environment Adaptation for Distant-Talking Speech Recognition

  • 권석봉;지미경;김회린;이용주
    • 대한음성학회지:말소리 / No. 53 / pp.119-127 / 2005
  • Speech recognition is one of the user interface technologies for commanding and controlling terminals such as TVs, PCs, and cellular phones in a ubiquitous environment. When controlling a terminal, a mismatch between training and testing conditions causes rapid performance degradation; it reduces not only the accuracy of the recognition system but also its reliability. The degradation caused by a change of environment must therefore be compensated. Whenever the environment changes, environment adaptation is performed using the user's speech and the background noise of the new environment, and performance improves by employing models appropriately transformed to that environment. Environment compensation has been studied actively, but a compensation method for the effect of distant-talking speech has not yet been developed. In this paper, we therefore apply MLLR-based environment adaptation to compensate for the effect of distant-talking speech, and the performance is improved.
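
As an illustration of the MLLR idea named in this abstract, the sketch below applies one shared affine transform to the Gaussian mean vectors of an acoustic model (adapted mean = A·mean + b). It is a minimal sketch, not the authors' implementation: the function names, the toy 2-dimensional means, and the values of `A` and `b` are assumptions chosen for illustration, and the estimation of the transform from adaptation data is omitted.

```python
def mllr_adapt_mean(mean, A, b):
    """Return the adapted mean A @ mean + b for one Gaussian (plain lists)."""
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
            for row, b_i in zip(A, b)]


def adapt_all(means, A, b):
    """Apply the same transform to every Gaussian mean tied to it."""
    return [mllr_adapt_mean(m, A, b) for m in means]


# Toy values (assumptions): a 2-D transform and two Gaussian means.
A = [[1.0, 0.1],
     [0.0, 0.9]]
b = [0.5, -0.2]
means = [[1.0, 2.0], [0.0, 0.0]]

adapted = adapt_all(means, A, b)
```

In MLLR proper, `A` and `b` are estimated by maximum likelihood from the adaptation speech, and many Gaussians share one transform so that a small amount of data can move the whole model.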

Speech Feature Extraction Based on the Human Hearing Model

  • Chung, Kwang-Woo;Kim, Paul;Hong, Kwang-Seok
    • 대한음성학회:학술대회논문집 / 대한음성학회 1996년도 10월 학술대회지 / pp.435-447 / 1996
  • In this paper, we propose a method that extracts speech features using a hearing model implemented through signal processing techniques. The proposed method consists of the following steps: normalization of the short-time speech block by its maximum value; multi-resolution analysis using the discrete wavelet transform and re-synthesis using the inverse discrete wavelet transform; differentiation after analysis and synthesis; and full-wave rectification and integration. To verify the performance of the proposed feature in a speech recognition task, Korean digit recognition experiments were carried out using both DTW and VQ-HMM. With DTW, the recognition rates were 99.79% and 90.33% for the speaker-dependent and speaker-independent tasks, respectively; with VQ-HMM, the rates were 96.5% and 81.5%, respectively. These results indicate that the proposed speech feature has potential as a simple and efficient feature for recognition tasks.
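
The processing chain listed in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the wavelet or the number of decomposition levels, so a one-level Haar transform is assumed here, and all function names are hypothetical.

```python
import math


def normalize(block):
    """Scale a short-time block by its maximum absolute value."""
    peak = max(abs(x) for x in block)
    return [x / peak for x in block] if peak > 0 else list(block)


def haar_dwt(x):
    """One-level Haar analysis (an odd trailing sample is dropped)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """One-level Haar synthesis, the inverse of haar_dwt."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def band_feature(band_signal):
    """Differentiate, full-wave rectify, and integrate (sum) one band."""
    diff = [b - a for a, b in zip(band_signal, band_signal[1:])]
    return sum(abs(d) for d in diff)


def extract_features(block):
    """Return one feature value per re-synthesized band (low, high)."""
    x = normalize(block)
    approx, detail = haar_dwt(x)
    zeros = [0.0] * len(approx)
    low = haar_idwt(approx, zeros)    # re-synthesis of the low band only
    high = haar_idwt(zeros, detail)   # re-synthesis of the high band only
    return [band_feature(low), band_feature(high)]


features = extract_features([0.0, 2.0, 4.0, 2.0])
```

A deeper decomposition would repeat `haar_dwt` on the approximation coefficients and yield one feature per band, which is closer to a multi-resolution hearing model than this two-band toy.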

The Robot Speech Recognition using TMS320VC5510 DSK

  • 최지현;정익주
    • 산업기술연구 / Vol. 27, No. A / pp.211-218 / 2007
  • As demand for interaction between humans and robots increases, robots are expected to be equipped with human-like intelligence. In particular, hearing capability is essential for natural communication, so speech recognition technology for robots is becoming more important. In this paper, we implement a speech recognizer suitable for robot applications. One of the major problems in robot speech recognition is the poor quality of speech captured when the speaker talks at a distance from the microphone mounted on the robot. To cope with this problem, we use wireless transmission of the commands recognized by a speech recognizer implemented on the TMS320VC5510 DSK. In addition, since the TMS320VC5510 DSP is a fixed-point device, we present an efficient realization of the HMM algorithm using fixed-point arithmetic.
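
The abstract does not describe how the HMM was mapped to fixed-point arithmetic. One common approach, shown here purely as an assumption, is to store log-probabilities as scaled integers so that Viterbi decoding needs only integer additions and comparisons. The scale factor, function names, and the toy two-state model are all hypothetical.

```python
import math

SCALE = 1 << 10  # assumed Q10 scaling of natural-log probabilities


def to_fixed_log(p):
    """Convert a probability to a scaled integer log-probability."""
    return int(round(math.log(p) * SCALE))


def viterbi_fixed(log_init, log_trans, log_emit, observations):
    """Viterbi decoding using only integer adds and compares."""
    n = len(log_init)
    score = [log_init[s] + log_emit[s][observations[0]] for s in range(n)]
    back = []
    for obs in observations[1:]:
        prev, score, pointers = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit[s][obs])
        back.append(pointers)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):   # trace the best path backwards
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, max(score)


# Toy two-state model (assumed values), converted once to fixed point.
init = [to_fixed_log(p) for p in (0.9, 0.1)]
trans = [[to_fixed_log(p) for p in row] for row in ((0.8, 0.2), (0.2, 0.8))]
emit = [[to_fixed_log(p) for p in row] for row in ((0.9, 0.1), (0.1, 0.9))]

path, score = viterbi_fixed(init, trans, emit, [0, 0, 1, 1])
```

On a real fixed-point DSP the conversion would be done offline and the table width chosen to avoid overflow over the longest utterance; neither detail is given in the abstract.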

An Analysis of Formants Extracted from Emotional Speech and Acoustical Implications for the Emotion Recognition System and Speech Recognition System

  • 이서배
    • 말소리와 음성과학 / Vol. 3, No. 1 / pp.45-50 / 2011
  • Formant structure of speech associated with five different emotions (anger, fear, happiness, neutral, sadness) was analysed. Acoustic separability of vowels (or emotions) associated with a specific emotion (or vowel) was estimated using F-ratio. According to the results, neutral showed the highest separability of vowels followed by anger, happiness, fear, and sadness in descending order. Vowel /A/ showed the highest separability of emotions followed by /U/, /O/, /I/ and /E/ in descending order. The acoustic results were interpreted and explained in the context of previous articulatory and perceptual studies. Suggestions for the performance improvement of an automatic emotion recognition system and automatic speech recognition system were made.
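
The abstract does not state which form of the F-ratio was used. A definition often seen in speech research, assumed here, is the variance of the class means divided by the mean of the within-class variances. The sketch below computes it for one scalar feature (for example, a single formant frequency); the function name and the toy numbers are hypothetical.

```python
def f_ratio(groups):
    """Variance of class means divided by mean within-class variance."""
    means = [sum(g) / len(g) for g in groups]
    grand = sum(means) / len(means)
    between = sum((m - grand) ** 2 for m in means) / len(means)
    within = sum(
        sum((x - m) ** 2 for x in g) / len(g)
        for g, m in zip(groups, means)
    ) / len(groups)
    return between / within


# Toy data (assumed): one feature measured for two classes.
ratio = f_ratio([[1.0, 3.0], [5.0, 7.0]])
```

A larger value means the classes (vowels or emotions) are further apart relative to their internal spread, which is the sense of "separability" used in the abstract.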

Robust Speech Recognition Using Weighted Auto-Regressive Moving Average Filter

  • 반성민;김형순
    • 말소리와 음성과학 / Vol. 2, No. 4 / pp.145-151 / 2010
  • In this paper, a robust feature compensation method is proposed for improving the performance of speech recognition. The proposed method is incorporated into auto-regressive moving average (ARMA) based feature compensation. We employ variable weights for the ARMA filter according to the degree of speech activity, and pass the normalized cepstral sequence through the weighted ARMA filter. Additionally, when normalizing the cepstral sequences in training, the cepstral means and variances are estimated from all training utterances. Experimental results show that the proposed method significantly improves speech recognition performance in noisy and reverberant environments.
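
For orientation, the sketch below shows the unweighted baseline that such methods build on: mean and variance normalization of one cepstral coefficient track followed by an ARMA smoothing filter. The activity-dependent weights proposed in the paper are not specified in the abstract and are therefore not implemented; the function names, the per-utterance normalization, and the filter order are assumptions.

```python
import math


def mvn(seq):
    """Mean and variance normalization of one coefficient track."""
    n = len(seq)
    mean = sum(seq) / n
    var = sum((x - mean) ** 2 for x in seq) / n
    std = math.sqrt(var) or 1.0  # avoid dividing by zero on flat input
    return [(x - mean) / std for x in seq]


def arma_filter(x, order=2):
    """Average `order` past outputs with the current and `order` future inputs.

    Frames closer than `order` to either edge are passed through unchanged.
    """
    y = list(x)
    for t in range(order, len(x) - order):
        past = sum(y[t - order:t])           # already-filtered outputs
        future = sum(x[t:t + order + 1])     # current and upcoming inputs
        y[t] = (past + future) / (2 * order + 1)
    return y


track = [0.0, 0.0, 3.0, 0.0, 0.0]
smoothed = arma_filter(track, order=1)
normalized = mvn([1.0, 2.0, 3.0])
```

A weighted variant would replace the uniform average with weights that depend on an estimate of speech activity, which is the part this sketch leaves out.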

An Implementation of the Speech-Library and Conversion Web-Services of the Web-Page for Speech-Recognition

  • 오지영;김윤중
    • 한국콘텐츠학회:학술대회논문집 / 한국콘텐츠학회 2006년도 추계 종합학술대회 논문집 / pp.478-482 / 2006
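  • In this study, we implemented a web service that converts web pages for speech recognition and a speech library that records and transmits speech. The implemented system consists of a web service consumer and web service providers. The web service consumer records speech, calls the web services to request speech recognition, and returns the result to the user. It includes the speech library (Speech-Library) and a proxy library that communicates with the web services. The speech library performs preprocessing that extracts only the speech data from the user's recording, and it looks up the link that maps to the user's speech. The proxy library calls the two web services and receives the returned results. The web service providers consist of a parsing web service and a speech recognition web service. The parsing web service reconstructs an ordinary web page into a speech-recognition-enabled web page by inserting an ActiveX control. The speech recognition web service uses a system implemented in previous research. Experiments confirmed that ordinary web pages were reconstructed and a link table was generated. They also confirmed that the URL mapped to the user's speech was retrieved, and that the URL mapped to the result of the speech recognition web service was retrieved and the corresponding web page returned to the user.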
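
The link-table step described in this abstract can be illustrated with a small sketch that collects anchor text and URLs from a page and then looks up the URL for a recognized phrase. This is not the authors' system: the class and function names are hypothetical, the ActiveX insertion and the web service calls are omitted, and matching by lower-cased anchor text is an assumption.

```python
from html.parser import HTMLParser


class LinkTableBuilder(HTMLParser):
    """Collect {anchor text: href} pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.table = {}
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalize whitespace and case so spoken text can match.
            label = " ".join("".join(self._text).split()).lower()
            if label:
                self.table[label] = self._href
            self._href = None


def build_link_table(html):
    builder = LinkTableBuilder()
    builder.feed(html)
    return builder.table


def lookup_url(table, recognized_text):
    """Return the URL mapped to the recognized phrase, or None."""
    return table.get(" ".join(recognized_text.split()).lower())


page = '<a href="/news">News</a> <a href="/mail">My  Mail</a>'
link_table = build_link_table(page)
```

With this table, `lookup_url(link_table, "NEWS")` yields the `/news` link, and a phrase with no matching anchor yields `None`.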

Improvement of Rejection Performance using the Lip Image and the PSO-NCM Optimization in Noisy Environment

  • 김병돈;최승호
    • 말소리와 음성과학 / Vol. 3, No. 2 / pp.65-70 / 2011
  • Recently, audio-visual speech recognition (AVSR) has been studied to cope with noise problems in speech recognition. In this paper, we propose a novel method of deciding the weighting factors for audio-visual information fusion. We adopt particle swarm optimization (PSO) to determine the weighting factors. The AVSR experiments show that PSO-based normalized confidence measures (NCM) improve the rejection performance for misrecognized words by 33%.
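
To make the idea of tuning a fusion weight with PSO concrete, the sketch below searches for a single weight `w` in [0, 1] that combines an audio and a visual confidence score. Everything specific here is an assumption for illustration: the linear fusion rule, the toy labelled data, the rejection threshold, and the PSO constants. The paper's normalized confidence measure is not reproduced.

```python
import random


def fuse(audio_conf, visual_conf, w):
    """Linear fusion of two confidence scores with audio weight w."""
    return w * audio_conf + (1.0 - w) * visual_conf


def pso_minimize(cost, n_particles=10, n_iters=50, seed=0,
                 inertia=0.7, c1=1.5, c2=1.5):
    """Minimal 1-D particle swarm search over the interval [0, 1]."""
    rng = random.Random(seed)
    pos = [rng.random() for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = list(pos)
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g], pbest_cost[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            vel[i] = (inertia * vel[i]
                      + c1 * rng.random() * (pbest[i] - pos[i])
                      + c2 * rng.random() * (gbest - pos[i]))
            pos[i] = min(1.0, max(0.0, pos[i] + vel[i]))  # stay in [0, 1]
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i], c
    return gbest, gbest_cost


# Toy data (assumed): (audio confidence, visual confidence, word is correct).
TOY = [(0.9, 0.8, True), (0.3, 0.9, True), (0.6, 0.1, False), (0.2, 0.3, False)]


def rejection_errors(w, threshold=0.5):
    """Count words whose accept/reject decision disagrees with the label."""
    return sum((fuse(a, v, w) >= threshold) != ok for a, v, ok in TOY)


best_w, best_cost = pso_minimize(rejection_errors)
```

With a real system the cost would be measured on held-out recognition results at each noise level, and several weights could be optimized jointly by giving each particle a vector position.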

Maximum Likelihood Training and Adaptation of Embedded Speech Recognizers for Mobile Environments

  • Cho, Young-Kyu;Yook, Dong-Suk
    • ETRI Journal / Vol. 32, No. 1 / pp.160-162 / 2010
  • For the acoustic models of embedded speech recognition systems, hidden Markov models (HMMs) are usually quantized and the original full space distributions are represented by combinations of a few quantized distribution prototypes. We propose a maximum likelihood objective function to train the quantized distribution prototypes. The experimental results show that the new training algorithm and the link structure adaptation scheme for the quantized HMMs reduce the word recognition error rate by 20.0%.

Extraction of Speaker Recognition Parameter Using Chaos Dimension

  • 유병욱;김창석
    • 음성과학 / Vol. 1 / pp.285-293 / 1997
  • This paper investigates the strange attractor of speech, treating speech as chaotic in the sense that a random-looking signal arises from a deterministic generating system. The delay time needed to construct a suitable attractor for the speech signal is obtained from the AR model power spectrum. Applying Takens' embedding theorem with this delay time yields an accurate correlation dimension. The results indicate that the correlation dimension is an effective speaker-characteristic parameter and gives a high speaker identification rate.
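
The two ingredients named in this abstract, delay embedding and the correlation dimension, can be sketched as follows. This is a generic illustration rather than the paper's procedure: the delay is passed in as a parameter instead of being derived from an AR power spectrum, distances use the maximum norm, and the dimension is estimated from just two radii. All names are hypothetical.

```python
import math


def delay_embed(x, dim, tau):
    """Build delay vectors (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    n = len(x) - (dim - 1) * tau
    return [tuple(x[i + k * tau] for k in range(dim)) for i in range(n)]


def correlation_sum(points, r):
    """Fraction of point pairs closer than r (maximum norm)."""
    n = len(points)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            dist = max(abs(a - b) for a, b in zip(points[i], points[j]))
            if dist < r:
                count += 1
    return 2.0 * count / (n * (n - 1))


def correlation_dimension(points, r1, r2):
    """Slope of log C(r) against log r between two radii."""
    c1, c2 = correlation_sum(points, r1), correlation_sum(points, r2)
    if c1 <= 0 or c2 <= 0:
        raise ValueError("radii too small: no pairs fall within them")
    return (math.log(c2) - math.log(c1)) / (math.log(r2) - math.log(r1))


# Sanity check on points spread along a line, whose dimension is 1.
line = [(float(i),) for i in range(100)]
dim_estimate = correlation_dimension(line, 5.5, 20.5)
embedded = delay_embed([0, 1, 2, 3, 4], dim=2, tau=2)
```

For speech, `points` would be the delay vectors of a voiced frame, and the slope would normally be fitted over a range of radii rather than two.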

A Study on Speech Recognition using Vocal Tract Area function and Vector Quantization

  • 송제혁;김동준;박상희
    • 대한의용생체공학회:학술대회논문집 / 대한의용생체공학회 1993년도 추계학술대회 / pp.171-174 / 1993
  • We propose the vocal tract area function as a feature vector for speech recognition. The vocal tract area function is directly related to speech production. In this study, it not only reflects the mechanism of speech production but also serves as an effective feature vector for speech recognition.
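
The abstract does not say how the area function was computed. A common route, assumed here, is the lossless-tube model, in which the reflection coefficients from linear prediction give the ratios of adjacent tube-section areas. The sketch below assumes the sign convention k_i = (A_{i+1} - A_i) / (A_{i+1} + A_i); other texts use the opposite sign. The function name, the reference area, and the toy coefficients are hypothetical.

```python
def area_function(reflection, first_area=1.0):
    """Relative tube areas from reflection coefficients (each |k| < 1).

    Assumes k_i = (A_{i+1} - A_i) / (A_{i+1} + A_i), so that
    A_{i+1} = A_i * (1 + k_i) / (1 - k_i).
    """
    areas = [first_area]
    for k in reflection:
        areas.append(areas[-1] * (1.0 + k) / (1.0 - k))
    return areas


# Toy coefficients (assumed): no change, then a widening section.
areas = area_function([0.0, 0.5])
```

Only area ratios are meaningful in this model, so the resulting vector is usually normalized (or taken in the log domain) before being used as a recognition feature or quantized with a VQ codebook.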
