Title/Summary/Keyword: Speech-Recognition

Pattern Recognition Methods for Emotion Recognition with speech signal

  • Park Chang-Hyun;Sim Kwee-Bo
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.6 no.2
    • /
    • pp.150-154
    • /
    • 2006
  • In this paper, we apply several pattern recognition algorithms to an emotion recognition system based on speech signals and compare the results. First, emotional speech databases are required, and the speech features for emotion recognition are determined in the database analysis step. Second, the recognition algorithms are applied to these speech features; the algorithms we evaluate are an artificial neural network, Bayesian learning, Principal Component Analysis, and the LBG algorithm. Finally, the performance gap among these methods is presented in the experimental results section.
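
As a rough illustration of this kind of comparison, the following Python sketch trains an artificial neural network and a naive Bayes classifier on PCA-reduced features and adds an LBG-style vector-quantization classifier. The synthetic feature data and all parameter choices are hypothetical stand-ins for the paper's emotional speech database.

```python
# A minimal sketch (not the paper's code) comparing several classifiers on
# synthetic "speech feature" vectors; a real experiment would use features
# extracted from an emotional speech database.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Hypothetical data: 400 utterances, 12-dim features, 4 emotion classes.
y = rng.integers(0, 4, 400)
X = rng.normal(size=(400, 12)) + y[:, None] * 0.8
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# PCA is used here as a dimensionality-reduction front end.
pca = PCA(n_components=6).fit(X_tr)

models = {
    "ANN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
    "Bayesian (naive)": GaussianNB(),
}
for name, model in models.items():
    model.fit(pca.transform(X_tr), y_tr)
    acc = accuracy_score(y_te, model.predict(pca.transform(X_te)))
    print(f"{name}: {acc:.2f}")

# LBG-style codebook classifier: one KMeans codebook per class;
# classify by nearest codeword distance.
codebooks = {c: KMeans(n_clusters=4, n_init=10, random_state=0)
                 .fit(X_tr[y_tr == c]).cluster_centers_
             for c in np.unique(y_tr)}

def vq_predict(x):
    return min(codebooks,
               key=lambda c: np.min(np.linalg.norm(codebooks[c] - x, axis=1)))

acc = accuracy_score(y_te, [vq_predict(x) for x in X_te])
print(f"LBG/VQ: {acc:.2f}")
```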

Speech Recognition in Noisy Environments using Wiener Filtering (Wiener Filtering을 이용한 잡음환경에서의 음성인식)

  • Kim, Jin-Young;Eom, Ki-Wan;Choi, Hong-Sub
    • Speech Sciences
    • /
    • v.1
    • /
    • pp.277-283
    • /
    • 1997
  • In this paper, we present a robust recognition algorithm based on the Wiener filtering method as a research tool for developing a Korean speech recognition system. We used the Wiener filtering method in the cepstrum domain, because the frequency-domain method is computationally expensive and complex. The effectiveness of this method was evaluated on speaker-independent isolated Korean digit recognition tasks using discrete HMM speech recognition systems. In these tasks, we used 12th-order weighted cepstral coefficients as the feature vector and added computer-simulated white Gaussian noise at different levels to clean speech signals for recognition experiments under noisy conditions. Experimental results show that the presented algorithm improves recognition by 5% to 20% compared to the spectral subtraction method.
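
The abstract does not give the cepstrum-domain formulation, so the sketch below shows only the underlying frequency-domain Wiener idea: a per-bin gain S/(S+N), with the noise spectrum estimated from leading noise-only frames. Frame sizes and the noise-estimation assumption are illustrative, not the paper's design.

```python
# A minimal, illustrative Wiener-filter sketch in the frequency domain,
# assuming the first few frames contain noise only. The paper itself applies
# Wiener filtering in the cepstrum domain for lower complexity.
import numpy as np

def wiener_enhance(noisy, frame_len=256, hop=128, n_noise_frames=10):
    window = np.hanning(frame_len)
    starts = range(0, len(noisy) - frame_len + 1, hop)
    frames = np.stack([noisy[s:s + frame_len] * window for s in starts])
    spec = np.fft.rfft(frames, axis=1)
    noise_psd = np.mean(np.abs(spec[:n_noise_frames]) ** 2, axis=0)
    speech_psd = np.maximum(np.abs(spec) ** 2 - noise_psd, 1e-10)
    gain = speech_psd / (speech_psd + noise_psd)      # Wiener gain S/(S+N)
    enhanced = np.fft.irfft(gain * spec, n=frame_len, axis=1)
    out = np.zeros(len(noisy))                        # overlap-add resynthesis
    for i, s in enumerate(starts):
        out[s:s + frame_len] += enhanced[i]
    return out

# Toy usage: a sinusoid in white Gaussian noise, preceded by noise only.
rng = np.random.default_rng(0)
t = np.arange(16000) / 8000.0
clean = np.concatenate([np.zeros(2560), np.sin(2 * np.pi * 440 * t)])
noisy = clean + 0.3 * rng.normal(size=len(clean))
cleaned = wiener_enhance(noisy)
```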

Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems

  • Shekofteh, Yasser;Almasganj, Farshad
    • ETRI Journal
    • /
    • v.35 no.1
    • /
    • pp.100-108
    • /
    • 2013
  • In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike the conventional spectral-based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. In the first step, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for a new input speech frame, a posterior-probability-based feature vector is evaluated, which represents the similarity between the embedded frame and the learned speech attractors. We conduct experiments for a speech recognition task utilizing a toolkit based on hidden Markov models, over FARSDAT, a well-known Persian speech corpus. With the proposed FE method, we gain a 3.11% absolute improvement in phoneme error rate over the baseline system, which uses the mel-frequency cepstral coefficient FE method.
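
A simplified sketch of the posterior-feature idea, with synthetic signals standing in for real speech: each frame is time-delay embedded into the RPS, one Gaussian mixture model is fitted per attractor class, and the normalized per-class likelihoods form the feature vector. The embedding dimension, delay, and mixture sizes are illustrative.

```python
# Illustrative RPS posterior features (hypothetical data, not the paper's setup).
import numpy as np
from sklearn.mixture import GaussianMixture

def embed(frame, dim=3, tau=2):
    """Time-delay embedding of a 1-D frame into RPS points of dimension `dim`."""
    n = len(frame) - (dim - 1) * tau
    return np.stack([frame[i:i + n] for i in range(0, dim * tau, tau)], axis=1)

rng = np.random.default_rng(0)
# Hypothetical training frames grouped into K "attractor" classes.
K, frames_per_class, frame_len = 4, 20, 160
gmms = []
for k in range(K):
    pts = np.concatenate(
        [embed(np.sin(2 * np.pi * (k + 1) * np.arange(frame_len) / frame_len)
               + 0.05 * rng.normal(size=frame_len))
         for _ in range(frames_per_class)])
    gmms.append(GaussianMixture(n_components=4, random_state=0).fit(pts))

def posterior_features(frame):
    pts = embed(frame)
    # Mean per-class log-likelihood over RPS points, normalized to a
    # posterior-like feature vector.
    ll = np.array([g.score(pts) for g in gmms])
    p = np.exp(ll - ll.max())
    return p / p.sum()

test = (np.sin(2 * np.pi * 2 * np.arange(frame_len) / frame_len)
        + 0.05 * rng.normal(size=frame_len))
print(posterior_features(test))  # should peak at the k=1 (2-cycle) class
```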

Extraction of Speech Features for Emotion Recognition (감정 인식을 위한 음성 특징 도출)

  • Kwon, Chul-Hong;Song, Seung-Kyu;Kim, Jong-Yeol;Kim, Keun-Ho;Jang, Jun-Su
    • Phonetics and Speech Sciences
    • /
    • v.4 no.2
    • /
    • pp.73-78
    • /
    • 2012
  • Emotion recognition is an important technology in the field of human-machine interfaces. To apply speech technology to emotion recognition, this study aims to establish a relationship between emotional groups and their corresponding voice characteristics by investigating various speech features. Features related to both the speech source and the vocal tract filter are included. Experimental results show that the statistically significant speech parameters for classifying the emotional groups are mainly related to the speech source, such as jitter, shimmer, F0 (F0_min, F0_max, F0_mean, F0_std), harmonic parameters (H1, H2, HNR05, HNR15, HNR25, HNR35), and SPI.
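
The source-related measures named above can be computed from a pitch track in a few lines. The sketch below uses standard textbook definitions of jitter and shimmer (mean absolute difference of consecutive periods or amplitudes over their mean) with synthetic inputs, since the paper's exact extraction settings are not given.

```python
# Minimal F0/jitter/shimmer sketch with hypothetical inputs; a real system
# would obtain the F0 track and cycle amplitudes from a pitch tracker.
import numpy as np

def f0_stats(f0):
    f0 = f0[f0 > 0]                      # keep voiced frames only
    return dict(F0_min=f0.min(), F0_max=f0.max(),
                F0_mean=f0.mean(), F0_std=f0.std())

def jitter(periods):
    """Mean absolute difference of consecutive pitch periods / mean period."""
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def shimmer(amps):
    """Mean absolute difference of consecutive cycle amplitudes / mean amplitude."""
    return np.mean(np.abs(np.diff(amps))) / np.mean(amps)

# Toy usage with synthetic values:
rng = np.random.default_rng(0)
f0_track = 120 + 5 * rng.normal(size=100)        # Hz, hypothetical
periods = 1.0 / f0_track                         # seconds per cycle
amps = 0.5 + 0.02 * rng.normal(size=100)
print(f0_stats(f0_track), jitter(periods), shimmer(amps))
```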

A Robust Speech Recognition Method Combining the Model Compensation Method with the Speech Enhancement Algorithm (음질향상 기법과 모델보상 방식을 결합한 강인한 음성인식 방식)

  • Kim, Hee-Keun;Chung, Yong-Joo;Bae, Keun-Seung
    • Speech Sciences
    • /
    • v.14 no.2
    • /
    • pp.115-126
    • /
    • 2007
  • There have been many research efforts to improve the performance of speech recognizers in noisy conditions. Among them, the model compensation method and the speech enhancement approach have been widely used. In this paper, we propose to combine the two different approaches to further improve recognition rates in noisy speech recognition. For speech enhancement, the minimum mean square error short-time spectral amplitude (MMSE-STSA) estimator has been adopted, and parallel model combination (PMC) and Jacobian adaptation (JA) have been used as the model compensation approaches. From the experimental results, we find that the hybrid approach, which applies the model compensation methods to the enhanced speech, produces better results than using either of the two approaches alone.
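
A high-level sketch of the hybrid pipeline: the waveform is enhanced first, then the acoustic-model means are compensated for the residual noise. The log-add approximation below stands in for full PMC, and the `enhance` function is only a placeholder for the MMSE-STSA step; the state means are hypothetical.

```python
# Illustrative hybrid pipeline: enhancement followed by model compensation.
import numpy as np

def enhance(noisy):
    # Placeholder for MMSE-STSA speech enhancement (Ephraim & Malah, 1984).
    return noisy

def pmc_log_add(clean_means_log, noise_mean_log, g=1.0):
    """Log-add approximation of PMC in the log-spectral domain:
    combined = log(exp(clean) + g * exp(noise))."""
    return np.logaddexp(clean_means_log, np.log(g) + noise_mean_log)

# Hypothetical log-spectral HMM state means (one row per state).
clean_means = np.log(np.array([[1.0, 0.5, 0.2],
                               [0.8, 0.9, 0.1]]))
noise_mean = np.log(np.array([0.1, 0.1, 0.1]))   # estimated residual noise
compensated = pmc_log_add(clean_means, noise_mean)
# Recognition would then score features from enhance(noisy) against the
# compensated means.
print(compensated)
```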

A Study on the Robust Bimodal Speech-recognition System in Noisy Environments (잡음 환경에 강인한 이중모드 음성인식 시스템에 관한 연구)

  • 이철우;고인선;계영철
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.1
    • /
    • pp.28-34
    • /
    • 2003
  • Recent research has focused on jointly using lip motions (i.e., visual speech) and speech for reliable speech recognition in noisy environments. This paper deals with the method of combining the result of the visual speech recognizer and that of the conventional speech recognizer by putting weights on each result: the paper proposes a method for determining proper weights for each result; in particular, the weights are determined autonomously, depending on the amount of noise in the speech and on the image quality. Simulation results show that combining the audio and visual recognition by the proposed method provides a recognition rate of 84% even in severely noisy environments. It is also shown that in the presence of blur in the images, the newly proposed weighting method, which takes the blur into account as well, yields better performance than the other methods.
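
A minimal fusion sketch under stated assumptions: per-word log-likelihoods from the two recognizers are combined with a weight built from an audio reliability (a logistic function of estimated SNR) and a visual reliability (discounted by a blur measure in [0, 1]). This weighting rule is illustrative; the paper derives its own.

```python
# Illustrative weighted audio-visual decision fusion (hypothetical scores).
import numpy as np

def fuse(audio_loglik, visual_loglik, snr_db, blur):
    """Combine per-word log-likelihoods from the audio and visual recognizers."""
    rel_a = 1.0 / (1.0 + np.exp(-(snr_db - 5.0) / 5.0))  # audio reliability rises with SNR
    rel_v = max(1.0 - blur, 1e-6)                        # blur degrades the visual channel
    w = rel_a / (rel_a + rel_v)                          # normalized audio weight
    return w * audio_loglik + (1.0 - w) * visual_loglik

# Hypothetical scores for a 3-word vocabulary at 0 dB SNR with mild blur:
audio = np.array([-12.0, -10.5, -11.0])
visual = np.array([-9.0, -11.0, -10.0])
scores = fuse(audio, visual, snr_db=0.0, blur=0.2)
print("recognized word index:", int(np.argmax(scores)))
```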

Korean Broadcast News Transcription Using Morpheme-based Recognition Units

  • Kwon, Oh-Wook;Alex Waibel
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.1E
    • /
    • pp.3-11
    • /
    • 2002
  • Broadcast news transcription is one of the hardest tasks in speech recognition because broadcast speech signals vary greatly in speech quality, channel, and background conditions. We developed a Korean broadcast news speech recognizer. We used a morpheme-based dictionary and language model to reduce the out-of-vocabulary (OOV) rate. We concatenated original morpheme pairs of short length or high frequency in order to reduce insertion and deletion errors due to short morphemes. We used a lexicon with multiple pronunciations to reflect inter-morpheme pronunciation variations without severe modification of the search tree. By using the merged morphemes as recognition units, we achieved an OOV rate of 1.7% with a 64k vocabulary, comparable to European languages. We implemented a hidden Markov model-based recognizer with vocal tract length normalization and online speaker adaptation by maximum likelihood linear regression. Experimental results showed that the recognizer yielded a 21.8% morpheme error rate for anchor speech and 31.6% for mostly noisy reporter speech.
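
An illustrative sketch of the unit-merging step, not the paper's exact rule: adjacent morpheme pairs that are frequent or consist of short morphemes are concatenated into merged recognition units. The thresholds and the toy romanized corpus are hypothetical.

```python
# Illustrative morpheme-pair merging for recognition units.
from collections import Counter

def merge_units(corpus, min_pair_count=2, max_len=2):
    """corpus: list of morpheme-segmented sentences (lists of strings)."""
    pair_counts = Counter(
        (a, b) for sent in corpus for a, b in zip(sent, sent[1:]))
    merges = {p for p, c in pair_counts.items()
              if c >= min_pair_count
              or (len(p[0]) <= max_len and len(p[1]) <= max_len)}
    merged_corpus = []
    for sent in corpus:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and (sent[i], sent[i + 1]) in merges:
                out.append(sent[i] + "+" + sent[i + 1])  # merged unit
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged_corpus.append(out)
    return merged_corpus

# Toy usage with hypothetical romanized Korean morphemes:
corpus = [["o", "neul", "nal", "ssi"], ["o", "neul", "bam"]]
print(merge_units(corpus))  # frequent pair ("o", "neul") becomes "o+neul"
```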

A Study on the Noisy Speech Recognition Based on the Data-Driven Model Parameter Compensation (직접데이터 기반의 모델적응 방식을 이용한 잡음음성인식에 관한 연구)

  • Chung, Yong-Joo
    • Speech Sciences
    • /
    • v.11 no.2
    • /
    • pp.247-257
    • /
    • 2004
  • There have been many research efforts to overcome the problems of speech recognition in noisy conditions. Among them, model-based compensation methods such as parallel model combination (PMC) and vector Taylor series (VTS) have been found to perform efficiently compared with previous speech enhancement methods or feature-based approaches. In this paper, a data-driven model compensation approach that adapts the HMM (hidden Markov model) parameters for noisy speech recognition is proposed. Instead of assuming statistical approximations as in conventional model-based methods such as PMC, the statistics necessary for HMM parameter adaptation are directly estimated by using the Baum-Welch algorithm. The proposed method has shown improved results compared with PMC for noisy speech recognition.
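
The core of the data-driven idea can be shown as a Baum-Welch M-step: rather than deriving compensated parameters analytically as in PMC or VTS, the Gaussian means are re-estimated from noisy data using soft state occupancies. In this sketch the occupancies (`gamma`), which would normally come from a forward-backward pass, are hypothetical.

```python
# Baum-Welch-style mean re-estimation from noisy observations.
import numpy as np

def reestimate_means(features, gamma):
    """features: (T, D) noisy observations; gamma: (T, S) state posteriors.
    Returns the (S, D) updated state means (a Baum-Welch M-step)."""
    counts = gamma.sum(axis=0)                     # soft frame counts per state
    return (gamma.T @ features) / counts[:, None]

rng = np.random.default_rng(0)
T, D, S = 200, 13, 3
features = rng.normal(size=(T, D))
logits = rng.normal(size=(T, S))
gamma = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
adapted_means = reestimate_means(features, gamma)  # shape (3, 13)
print(adapted_means.shape)
```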

An Experimental Study on Barging-In Effects for Speech Recognition Using Three Telephone Interface Boards

  • Park, Sung-Joon;Kim, Ho-Kyoung;Koo, Myoung-Wan
    • Speech Sciences
    • /
    • v.8 no.1
    • /
    • pp.159-165
    • /
    • 2001
  • In this paper, we conduct an experiment on speech recognition systems with barging-in and non-barging-in utterances. Barging-in capability, with which users can speak voice commands while a voice announcement is still playing, is one of the important elements of practical speech recognition systems. Barging-in can be realized by echo cancellation techniques based on the LMS (least-mean-square) algorithm. We use three kinds of telephone interface boards with barging-in capability, made by Dialogic, Natural MicroSystems, and Korea Telecom, respectively. A speech database was collected using these three kinds of boards, and we conduct a comparative recognition experiment on this database.
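
A minimal normalized-LMS echo-canceller sketch of the barging-in idea: the board's announcement (far-end signal) is adaptively filtered and subtracted from the microphone signal so that only the caller's command remains. The filter length, step size, and toy signals are illustrative.

```python
# Illustrative NLMS echo cancellation for barging-in.
import numpy as np

def nlms_echo_cancel(mic, far_end, taps=64, mu=0.5, eps=1e-6):
    w = np.zeros(taps)
    out = np.zeros(len(mic))
    for n in range(taps, len(mic)):
        x = far_end[n - taps:n][::-1]        # recent far-end samples
        e = mic[n] - w @ x                   # error = mic minus echo estimate
        w += mu * e * x / (x @ x + eps)      # normalized LMS update
        out[n] = e                           # residual: the caller's speech
    return out

# Toy usage: the "echo" is a delayed, attenuated copy of the announcement.
rng = np.random.default_rng(0)
announce = rng.normal(size=8000)
echo = 0.6 * np.concatenate([np.zeros(10), announce[:-10]])
command = np.zeros(8000)
command[4000:4400] = rng.normal(size=400)    # barged-in voice command
mic = echo + command
cleaned = nlms_echo_cancel(mic, announce)
```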

DNN-based acoustic modeling for speech recognition of native and foreign speakers (원어민 및 외국인 화자의 음성인식을 위한 심층 신경망 기반 음향모델링)

  • Kang, Byung Ok;Kwon, Oh-Wook
    • Phonetics and Speech Sciences
    • /
    • v.9 no.2
    • /
    • pp.95-101
    • /
    • 2017
  • This paper proposes a new method to train Deep Neural Network (DNN)-based acoustic models for speech recognition of native and foreign speakers. The proposed method consists of determining multi-set state clusters with various acoustic properties, training a DNN-based acoustic model, and recognizing speech based on the model. In the proposed method, the hidden nodes of the DNN are shared, but the output nodes are separated to accommodate the different acoustic properties of native and foreign speech. In an English speech recognition task for Korean and English speakers, respectively, the proposed method is shown to slightly improve recognition accuracy compared to the conventional multi-condition training method.
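
A minimal sketch of the multi-set output architecture, written here in PyTorch as an assumption (the paper does not specify a toolkit): the hidden layers are shared, while each speaker set gets its own output layer. Layer sizes and senone counts are hypothetical.

```python
# Shared-trunk DNN acoustic model with per-set output layers.
import torch
import torch.nn as nn

class MultiSetAM(nn.Module):
    def __init__(self, in_dim=440, hidden=1024,
                 native_states=3000, foreign_states=3000):
        super().__init__()
        self.shared = nn.Sequential(             # hidden layers shared by both sets
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.head_native = nn.Linear(hidden, native_states)
        self.head_foreign = nn.Linear(hidden, foreign_states)

    def forward(self, x, speaker_set):
        h = self.shared(x)
        return (self.head_native(h) if speaker_set == "native"
                else self.head_foreign(h))

model = MultiSetAM()
feats = torch.randn(8, 440)                      # batch of spliced frames
logits = model(feats, speaker_set="foreign")     # trained with CE per set
print(logits.shape)
```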