• Title/Summary/Keyword: Speech Signals

Recursive Segmentation of Speech Signals using Expectation-Maximization (Recursive Phoneme Segmentation Using the EM Algorithm)

  • Kang Byung-Ok;Jung Hong
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.103-106 / 2002
  • In this paper, the EM algorithm is applied in a recursive manner to solve the problem of finding the boundaries between phonemes in an input speech signal. That is, if the segment between two expected endpoints is taken as the current frame n, the endpoint of the current frame is estimated from the information given by the endpoint obtained in the previous frame n-1 and from the speech samples that follow that endpoint. Likewise, once the endpoint of the current frame n has been estimated, the endpoint of the next frame n+1 is obtained from that estimated endpoint and the speech samples that follow it. We call this the recursive phoneme segmentation method. To find the endpoint in each frame, the coordinate of the endpoint is treated as the parameter to be estimated, the surrounding speech sample values are treated as observations, and the EM (Expectation-Maximization) algorithm is applied. When this recursive EM-based phoneme segmentation method was tested on phoneme pairs extracted from a real speech DB, the estimates converged to the boundary after about five EM iterations.
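
The recursive boundary search can be illustrated with a simplified stand-in: model a short-time feature sequence around the expected boundary as a two-component Gaussian mixture fitted by EM and read the boundary off the posteriors. This is not the paper's exact formulation (which treats the endpoint coordinate itself as the EM parameter); the function name `em_boundary` and all settings below are ours.

```python
import numpy as np

def em_boundary(feat, n_iter=5):
    """Estimate a single boundary in a 1-D feature sequence with a
    two-component Gaussian mixture fitted by EM (illustrative sketch)."""
    feat = np.asarray(feat, dtype=float)
    n = len(feat)
    # Initialise: left/right halves define the two components.
    mu = np.array([feat[: n // 2].mean(), feat[n // 2:].mean()])
    var = np.array([feat.var() + 1e-6] * 2)
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each frame.
        ll = -0.5 * ((feat[:, None] - mu) ** 2 / var + np.log(2 * np.pi * var))
        ll += np.log(w)
        resp = np.exp(ll - ll.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights, means and variances.
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp * feat[:, None]).sum(axis=0) / nk
        var = (resp * (feat[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    # Boundary: first frame whose posterior favours the second component.
    return int(np.argmax(resp[:, 1] > 0.5))

# Toy example: an energy-like feature that jumps at frame 40.
x = np.concatenate([np.random.normal(0.2, 0.05, 40),
                    np.random.normal(0.8, 0.05, 60)])
print(em_boundary(x))   # expected to land near 40 after a few iterations
```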

GMM based Nonlinear Transformation Methods for Voice Conversion

  • Vu, Hoang-Gia;Bae, Jae-Hyun;Oh, Yung-Hwan
    • Proceedings of the KSPS conference / 2005.11a / pp.67-70 / 2005
  • Voice conversion (VC) is a technique for modifying the speech signal of a source speaker so that it sounds as if it were spoken by a target speaker. Most previous VC approaches used a linear transformation function based on a GMM to convert the source spectral envelope to the target spectral envelope. In this paper, we propose several nonlinear GMM-based transformation functions in an attempt to deal with the over-smoothing effect of linear transformation. In order to obtain high-quality modifications of speech signals, our VC system is implemented using the Harmonic plus Noise Model (HNM) analysis/synthesis framework. Experimental results are reported on the English corpus MOCHA-TIMIT.
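
For reference, the conventional linear GMM mapping that the proposed nonlinear variants start from can be sketched as below, assuming a joint-density GMM has already been trained on aligned source/target spectral vectors; all variable names are ours, and the paper's nonlinear functions are not reproduced here.

```python
import numpy as np

def gmm_convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Conventional linear GMM-based conversion of one source vector x.

    weights : (M,)       mixture weights
    mu_x    : (M, D)     source means
    mu_y    : (M, D)     target means
    cov_xx  : (M, D, D)  source covariances
    cov_yx  : (M, D, D)  target-source cross covariances
    """
    M, D = mu_x.shape
    # Posterior probability p(m | x) of each mixture component.
    post = np.empty(M)
    for m in range(M):
        diff = x - mu_x[m]
        inv = np.linalg.inv(cov_xx[m])
        expo = -0.5 * diff @ inv @ diff
        norm = np.sqrt(((2 * np.pi) ** D) * np.linalg.det(cov_xx[m]))
        post[m] = weights[m] * np.exp(expo) / norm
    post /= post.sum()
    # Weighted sum of per-component linear regressions.
    y = np.zeros(D)
    for m in range(M):
        y += post[m] * (mu_y[m] +
                        cov_yx[m] @ np.linalg.inv(cov_xx[m]) @ (x - mu_x[m]))
    return y
```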

Acoustic Model-Based Filter Structure for Synthesizing Speech Signals

  • Lim, Il-Taek;Lee, Byeong-Gi
    • Proceedings of the Acoustical Society of Korea Conference / 1994.06a / pp.1021-1026 / 1994
  • This paper proposes a filter structure suitable for speech synthesis applications. We first derive the lossy pole-zero model by employing the wave digital filter (WDF) adaptor formula and by converting the fixed termination value -1 into a loss factor $\mu_c \in (-1, 1)$. Then we discuss how to determine the reflection coefficients. We employ Durbin's method to estimate the numerator polynomial of the lossy pole-zero transfer function from the given speech sound, and then apply the step-down algorithm to the numerator to extract the reflection coefficients of the closed-termination tract. For determining the reflection coefficients of the other parts, we employ a pre-calculated pole-estimator polynomial.
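
The step-down (backward Levinson) recursion mentioned above converts direct-form polynomial coefficients into reflection coefficients. The sketch below is a generic NumPy implementation of that textbook recursion, not the paper's filter code.

```python
import numpy as np

def step_down(a):
    """Convert direct-form coefficients a = [1, a1, ..., ap] into
    reflection coefficients k1..kp via the backward Levinson recursion."""
    a = np.asarray(a, dtype=float)
    p = len(a) - 1
    k = np.zeros(p)
    cur = a.copy()
    for m in range(p, 0, -1):
        k[m - 1] = cur[m]
        if abs(k[m - 1]) >= 1.0:
            raise ValueError("unstable polynomial: |k| >= 1")
        prev = np.zeros(m)
        prev[0] = 1.0
        for i in range(1, m):
            prev[i] = (cur[i] - k[m - 1] * cur[m - i]) / (1.0 - k[m - 1] ** 2)
        cur = prev
    return k

# Example: a 2nd-order polynomial; expected output is [-0.75, 0.2].
print(step_down([1.0, -0.9, 0.2]))
```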

Speech Dereverberation using Improved Linear Prediction Residual

  • Park, Chan-Sub;Kim, Ki-Man;Kang, Suk-Youb
    • Journal of the Korea Institute of Information and Communication Engineering / v.11 no.10 / pp.1845-1851 / 2007
  • Background noise and room reverberation are two causes of speech degradation in listening situations, and many algorithms have been developed to enhance reverberant speech. In this paper we propose a dereverberation method that enhances speech using a modified linear prediction (LP) residual in reverberant room conditions. The proposed dereverberation method is based on the fact that the significant excitation of the vocal tract system takes place at the instant of glottal closure in voiced speech. Our method uses delay information from each sensor and requires reverberant signals from three sensors. We obtain a new LP residual signal using a modified LP residual combination derived from weighting the LP residual and the Hilbert transform of the LP residual. The coherently added Hilbert envelope has several large-amplitude spikes because of the effects of noise and reverberation. The residual of the clean speech is used to excite a time-varying all-pole filter to obtain the enhanced speech. We simulated the proposed algorithm to analyze its performance in reverberant environments, and it substantially improves the quality of reverberant speech.
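
The two quantities the proposed weighting combines, the LP residual and its Hilbert envelope, can be computed per frame as in the sketch below (NumPy/SciPy). The multi-sensor delay compensation, coherent addition, and all-pole re-synthesis steps are not shown, and the function names are ours.

```python
import numpy as np
from scipy.signal import hilbert, lfilter

def lpc(frame, order):
    """LPC coefficients via the autocorrelation method (Levinson-Durbin)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][: order + 1]
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0] + 1e-9
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / e
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1][:i]
        e *= (1.0 - k * k)
    return a

def lp_residual_and_envelope(frame, order=12):
    """LP residual (inverse-filtered speech) and its Hilbert envelope."""
    a = lpc(frame, order)
    residual = lfilter(a, [1.0], frame)      # e[n] = A(z) s[n]
    envelope = np.abs(hilbert(residual))     # magnitude of the analytic signal
    return residual, envelope
```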

Emotion recognition in speech using hidden Markov model

  • Kim, Sung-Ill;Chung, Hyun-Yeol
    • Journal of the Institute of Convergence Signal Processing / v.3 no.3 / pp.21-26 / 2002
  • This paper presents a new approach for identifying human emotional states such as anger, happiness, normal, sadness, or surprise. This is accomplished by using discrete-duration continuous hidden Markov models (DDCHMM). For this, emotional feature parameters are first defined from the input speech signals. In this study, we used prosodic parameters such as pitch, energy, and their derivatives, which were then trained by HMMs for recognition. Speaker-adapted emotional models based on maximum a posteriori (MAP) estimation were also considered for speaker adaptation. The simulation results showed that the recognition rate of vocal emotion gradually increased with an increasing number of adaptation samples.
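
A minimal sketch of the prosodic-feature-plus-HMM pipeline is shown below. It assumes the third-party `hmmlearn` package and uses a standard continuous GaussianHMM as a stand-in for the paper's discrete-duration models; the feature settings, pitch search range, and function names are ours.

```python
import numpy as np
from hmmlearn import hmm   # standard continuous HMM, stand-in for DDCHMM

def extract_prosody(signal, sr=16000, frame=400, hop=160):
    """Per-frame pitch (autocorrelation peak) and log-energy, plus deltas."""
    feats = []
    for start in range(0, len(signal) - frame, hop):
        x = signal[start:start + frame] * np.hamming(frame)
        energy = np.log(np.sum(x ** 2) + 1e-10)
        ac = np.correlate(x, x, mode="full")[frame - 1:]
        lo, hi = sr // 400, sr // 60            # 60-400 Hz pitch search range
        pitch = sr / (lo + np.argmax(ac[lo:hi]))
        feats.append([pitch, energy])
    feats = np.array(feats)
    deltas = np.gradient(feats, axis=0)
    return np.hstack([feats, deltas])           # [pitch, energy, d-pitch, d-energy]

def train_emotion_models(data_per_emotion, n_states=5):
    """Train one HMM per emotion from lists of feature matrices."""
    models = {}
    for emotion, utterances in data_per_emotion.items():
        X = np.vstack(utterances)
        lengths = [len(u) for u in utterances]
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
        m.fit(X, lengths)
        models[emotion] = m
    return models

def classify(models, utterance_feats):
    """Pick the emotion whose model gives the highest log-likelihood."""
    return max(models, key=lambda e: models[e].score(utterance_feats))
```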

A Multimodal Interface for Telematics based on Multimodal Middleware

  • Park, Sung-Chan;Ahn, Se-Yeol;Park, Seong-Soo;Koo, Myoung-Wan
    • Proceedings of the KSPS conference / 2007.05a / pp.41-44 / 2007
  • In this paper, we introduce a system in which a car navigation scenario is plugged into a multimodal interface based on multimodal middleware. In a map-based system, the combination of speech and pen input/output modalities can offer users better expressive power. To achieve multimodal tasks in car environments, we have chosen SCXML (State Chart XML), a multimodal authoring language of the W3C standard, to control modality components such as XHTML, VoiceXML, and GPS. In the Network Manager, GPS signals from the navigation software are converted to the EMMA meta language and sent to the MultiModal Interaction Runtime Framework (MMI). The MMI not only handles GPS signals and the user's multimodal I/O but also combines them with device information, user preferences, and reasoned RDF to give the user intelligent or personalized services. A self-simulation test has shown that the middleware accomplishes navigational multimodal tasks for multiple users in car environments.
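
The conversion of a GPS reading into an EMMA message mentioned above could look roughly like the sketch below. It is only a minimal illustration using the W3C EMMA 1.0 namespace; the element and attribute choices (`position`, `latitude`, `longitude`, `mode="gps"`) are ours, not the paper's actual schema.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"   # W3C EMMA 1.0 namespace
ET.register_namespace("emma", EMMA_NS)

def gps_to_emma(lat, lon, speed_kmh):
    """Wrap a GPS reading in a minimal EMMA document (illustrative only)."""
    emma = ET.Element(f"{{{EMMA_NS}}}emma", {"version": "1.0"})
    interp = ET.SubElement(emma, f"{{{EMMA_NS}}}interpretation",
                           {f"{{{EMMA_NS}}}mode": "gps"})
    pos = ET.SubElement(interp, "position")
    ET.SubElement(pos, "latitude").text = str(lat)
    ET.SubElement(pos, "longitude").text = str(lon)
    ET.SubElement(pos, "speed").text = str(speed_kmh)
    return ET.tostring(emma, encoding="unicode")

print(gps_to_emma(37.5665, 126.9780, 42.0))
```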

Development of 3-Ch EGG System Using Modulation and Demodulation Techniques (I)

  • Kim, J.M.;Song, C.G.;Lee, M.H.
    • Proceedings of the KOSOMBE Conference / v.1993 no.05 / pp.134-135 / 1993
  • The purpose of this research is the development of an EGG system for the quantitative assessment of laryngeal function using speech and electroglottographic data. The designed EGG system is a four-electrode system in which the excitation current source is supplied from the 1st to the 4th electrode. The output signals from the 2nd and 3rd electrodes, which are modulated by the frequency of the excitation current source, correspond to air-pressure waveforms from the vocal folds. After demodulation, we obtain pitch signals from the modulated waveforms through a differentiator that cuts off frequencies below 0.1 Hz. Conventional pitch extraction methods use software processing, but the proposed system is designed in analog hardware in order to eliminate interference from the low formant frequencies of speech. We will construct a database discriminating between pathological subjects and control groups for each case. Using the proposed 3-channel EGG system and the LMS algorithm, the distinctive characteristics of laryngeal function in voiced and other regions will be detected from EGG signals and LPC spectra.
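
The demodulation chain described above is implemented in analog hardware in the paper; the sketch below is only a software analogue for illustration, with assumed cutoff and carrier values and a detrend step standing in for the 0.1 Hz differentiator.

```python
import numpy as np
from scipy.signal import butter, filtfilt, detrend

def demodulate_egg(channel, fs, lp_cut=1000.0):
    """Software analogue of an EGG demodulation chain (illustrative only):
    rectify the carrier-modulated impedance signal, low-pass to recover the
    glottal waveform, then remove the slow baseline drift."""
    rectified = np.abs(channel)                        # envelope detection
    b, a = butter(4, lp_cut / (fs / 2), btype="low")   # remove carrier ripple
    glottal = filtfilt(b, a, rectified)
    return detrend(glottal)                            # drift removal

# Toy example: a 20 kHz carrier amplitude-modulated by a 120 Hz glottal wave.
fs = 200_000
t = np.arange(0, 0.05, 1 / fs)
modulating = 1.0 + 0.3 * np.sin(2 * np.pi * 120 * t)
carrier = modulating * np.sin(2 * np.pi * 20_000 * t)
recovered = demodulate_egg(carrier, fs)
```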

Direction-of-Arrival Estimation of Speech Signals Based on MUSIC and Reverberation Component Reduction

  • Chang, Hyungwook;Jeong, Sangbae;Kim, Youngil
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.6 / pp.1302-1309 / 2014
  • In this paper, we propose a method to improve the performance of direction-of-arrival (DOA) estimation for a speech source using a multiple signal classification (MUSIC)-based algorithm. Basically, the proposed algorithm utilizes a complex-coefficient band-pass filter to generate the narrowband signals for analysis. Also, reverberation component reduction and quadratic-function-based response approximation in the MUSIC spatial spectrum are utilized to improve the accuracy of DOA estimation. Experimental results show that the proposed method outperforms the well-known generalized cross-correlation (GCC)-based DOA estimation algorithm in terms of both estimation error and success rate.
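
The underlying MUSIC pseudo-spectrum for a uniform linear array can be sketched in NumPy as below. The paper's complex-coefficient band-pass filter bank, reverberation component reduction, and quadratic peak approximation are not included, and all names and parameters are ours.

```python
import numpy as np

def music_spectrum(X, n_sources, mic_dist, freq, c=343.0):
    """Narrowband MUSIC pseudo-spectrum for a uniform linear array.

    X        : (n_mics, n_snapshots) complex narrowband snapshots
    n_sources: assumed number of sources
    mic_dist : inter-microphone spacing in metres
    freq     : analysis frequency in Hz
    """
    n_mics = X.shape[0]
    R = X @ X.conj().T / X.shape[1]            # spatial covariance matrix
    eigval, eigvec = np.linalg.eigh(R)         # eigenvalues in ascending order
    En = eigvec[:, : n_mics - n_sources]       # noise subspace
    angles = np.arange(0, 181)                 # candidate DOAs in degrees
    spectrum = np.empty(len(angles))
    for i, theta in enumerate(np.deg2rad(angles)):
        # Steering vector for a plane wave arriving from angle theta.
        delays = np.arange(n_mics) * mic_dist * np.cos(theta) / c
        a = np.exp(-2j * np.pi * freq * delays)
        spectrum[i] = 1.0 / np.real(a.conj() @ En @ En.conj().T @ a)
    return angles, spectrum

# Usage: the DOA estimate is the angle at the spectrum peak.
# angles, P = music_spectrum(X, n_sources=1, mic_dist=0.05, freq=2000.0)
# doa = angles[np.argmax(P)]
```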

Emotion Recognition Using Tone and Tempo Based on Voice for IoT

  • Byun, Sung-Woo;Lee, Seok-Pil
    • The Transactions of The Korean Institute of Electrical Engineers / v.65 no.1 / pp.116-121 / 2016
  • In the Internet of Things (IoT) area, research on recognizing human emotion has been increasing recently. Generally, multimodal features such as facial images, bio-signals, and voice signals are used for emotion recognition. Among these, voice signals are the most convenient to acquire. This paper proposes an emotion recognition method using tone and tempo features derived from voice. For this, we build voice databases from broadcast media content. Emotion recognition tests are carried out with tone and tempo features extracted from these databases. The results show a noticeable improvement in accuracy compared with conventional methods that use only pitch.
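
The abstract does not spell out how the tone and tempo feature vectors are computed; the sketch below is one plausible realization (pitch-contour statistics for tone, an energy-onset rate for tempo), with all names, thresholds, and frame settings assumed.

```python
import numpy as np

def tone_tempo_features(signal, sr=16000, frame=400, hop=160):
    """Illustrative tone/tempo descriptors for one utterance:
    tone  -> mean, range and variability of the frame-level pitch track,
    tempo -> rate of energy onsets per second (rough speaking-rate proxy)."""
    pitches, energies = [], []
    for start in range(0, len(signal) - frame, hop):
        x = signal[start:start + frame] * np.hamming(frame)
        energies.append(np.sum(x ** 2))
        ac = np.correlate(x, x, mode="full")[frame - 1:]
        lo, hi = sr // 400, sr // 60              # 60-400 Hz pitch range
        pitches.append(sr / (lo + np.argmax(ac[lo:hi])))
    pitches, energies = np.array(pitches), np.array(energies)
    # Tone: statistics of the pitch contour.
    tone = [pitches.mean(), pitches.max() - pitches.min(), pitches.std()]
    # Tempo: onsets counted where frame energy rises above a threshold.
    thr = 0.5 * energies.mean()
    onsets = np.sum((energies[1:] > thr) & (energies[:-1] <= thr))
    tempo = [onsets / (len(signal) / sr)]
    return np.array(tone + tempo)
```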