• Title/Summary/Keyword: speech separation


Speech Enhancement Using Nonnegative Matrix Factorization with Temporal Continuity (시간 연속성을 갖는 비음수 행렬 분해를 이용한 음질 개선)

  • Nam, Seung-Hyon
    • The Journal of the Acoustical Society of Korea / v.34 no.3 / pp.240-246 / 2015
  • In this paper, speech enhancement using nonnegative matrix factorization (NMF) with temporal continuity is addressed. Speech and noise signals are modeled as Poisson distributions, and the basis vectors and gain vectors of NMF are modeled as Gamma distributions. Temporal continuity of the gain vector is known to be critical to the quality of enhanced speech signals. Here, temporal continuity is implemented by adopting Gamma-Markov chain priors for the noise gain vectors during the separation phase. Simulation results show that the Gamma-Markov chain models the temporal continuity of noise signals and tracks changes in noise effectively.
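
As a concrete illustration of gain-continuity regularization, the sketch below runs multiplicative-update NMF on a magnitude spectrogram with a simple squared-difference smoothness penalty on the gains. The penalty is only a stand-in for the paper's Gamma-Markov chain prior, and the function name and parameters are illustrative.

```python
import numpy as np

def nmf_smooth(V, rank=20, n_iter=100, lam=0.1, eps=1e-9, seed=0):
    """Factor a magnitude spectrogram V (freq x frames) as V ~= W @ H,
    penalizing frame-to-frame changes in the gains H. The squared-
    difference penalty is a simple stand-in for the paper's
    Gamma-Markov chain prior on the noise gains."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Neighboring frames; edge frames are padded with themselves.
        H_prev = np.concatenate([H[:, :1], H[:, :-1]], axis=1)
        H_next = np.concatenate([H[:, 1:], H[:, -1:]], axis=1)
        # Multiplicative update with the continuity term folded in: the
        # penalty's negative gradient part goes to the numerator, the
        # positive part to the denominator, keeping H nonnegative.
        H *= (W.T @ V + 2 * lam * (H_prev + H_next)) / \
             (W.T @ W @ H + 4 * lam * H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```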

Noise removal algorithm for intelligent service robots in the high noise level environment (원거리 음성인식 시스템의 잡음 제거 기법에 대한 연구)

  • Woo, Sung-Min;Lee, Sang-Hoon;Jeong, Hong
    • Proceedings of the IEEK Conference / 2007.07a / pp.413-414 / 2007
  • Successful speech recognition in noisy environments for intelligent robots depends on the performance of the preprocessing elements employed. We propose an architecture that effectively combines adaptive beamforming (ABF) and blind source separation (BSS) algorithms in the spatial domain to avoid permutation ambiguity and heavy computational complexity. We evaluated the structure and assessed its performance with a DSP module. Speech recognition experiments show that the proposed combined system achieves a high recognition rate in noisy environments and performs better than either the ABF or the BSS system alone.
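
For orientation, here is a rough sketch of the two preprocessing ingredients the paper combines. It is not the paper's spatial-domain architecture: the beamformer below uses integer sample delays, and FastICA performs only instantaneous (not convolutive) separation; both function names are illustrative.

```python
import numpy as np
from sklearn.decomposition import FastICA

def das_beamform(mics, delays):
    """Delay-and-sum beamformer: align each channel by an integer
    sample delay toward the target direction and average. np.roll
    wraps at the edges, which is negligible for long signals."""
    aligned = [np.roll(x, -d) for x, d in zip(mics, delays)]
    return np.mean(aligned, axis=0)

def bss_separate(mics):
    """Instantaneous BSS with FastICA on the multichannel mixture;
    the paper's convolutive BSS is considerably more involved."""
    X = np.stack(mics, axis=1)                  # (samples, channels)
    ica = FastICA(n_components=X.shape[1], whiten="unit-variance")
    return ica.fit_transform(X).T               # (channels, samples)
```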


The methods of recognition of consonants (voiced stops) by Neural Network (신경망에 의한 초성자음(ㄱ, ㄷ, ㅂ)의 인식방법)

  • 김석동
    • Proceedings of the Acoustical Society of Korea Conference / 1991.06a / pp.73-77 / 1991
  • As a basic analysis toward recognizing stop consonants in phoneme-based speech recognition, we study a network trained with the back-propagation learning algorithm under changes in the number of hidden units, the training set, and the number of iterations. We also propose an efficient processing method for separating consonants from vowels.
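
A minimal back-propagation setup of the kind the paper varies (hidden units, training set, iterations) might look like the following; the features, layer sizes, and learning rate are illustrative.

```python
import numpy as np

def train_mlp(X, y, hidden=16, lr=0.1, epochs=500, seed=0):
    """One-hidden-layer MLP trained with plain back-propagation.
    X: (n, d) acoustic feature vectors; y: (n,) integer labels,
    e.g. 0..2 for the three initial consonants (ㄱ, ㄷ, ㅂ)."""
    rng = np.random.default_rng(seed)
    n_cls = int(y.max()) + 1
    W1 = rng.normal(0, 0.1, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, n_cls));      b2 = np.zeros(n_cls)
    Y = np.eye(n_cls)[y]                           # one-hot targets
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                   # forward pass
        z = h @ W2 + b2
        p = np.exp(z - z.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)          # softmax
        dz = (p - Y) / len(X)                      # cross-entropy gradient
        dh = (dz @ W2.T) * (1 - h ** 2)            # back-propagate through tanh
        W2 -= lr * h.T @ dz; b2 -= lr * dz.sum(axis=0)
        W1 -= lr * X.T @ dh; b1 -= lr * dh.sum(axis=0)
    return W1, b1, W2, b2
    # predict: np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)
```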


Laryngotracheal Separation for Chronic Intractable Aspiration (만성 흡인에 대한 후두기관 분리술의 유용성)

  • 이강진;성명훈;박범정;성원진;노종렬;민양기;이철희;이재서;김광현
    • Korean Journal of Bronchoesophagology / v.7 no.2 / pp.140-145 / 2001
  • Background and Objectives: Intractable aspiration in patients with impaired protective function of the larynx often results in multiple episodes of aspiration pneumonia, repeated hospitalizations, and expensive nursing care. The purpose of this study was to review the authors' experience and patient outcomes with the laryngotracheal separation (LTS) procedure. Materials and Methods: A retrospective review of 9 patients who underwent LTS between 1996 and 2001 was conducted. Ages ranged from 3 to 72 years. Results: Seven patients were expected to have morbid aspiration as a consequence of acquired neurologic injuries and two as a consequence of congenital neurologic injuries. Two patients had a postoperative fistula, which was well controlled with local wound care and a minor procedure. Following LTS, aspiration was effectively controlled in all patients, and four were able to tolerate a regular diet. Conclusion: LTS is a low-risk, successful, definitive procedure that decreases the potential for aspiration, pulmonary complications, and hospitalizations, and increases quality of life, especially in patients with irreversible upper airway dysfunction and poor speech potential.


Development of Automatic Lip-sync MAYA Plug-in for 3D Characters (3D 캐릭터에서의 자동 립싱크 MAYA 플러그인 개발)

  • Lee, Sang-Woo;Shin, Sung-Wook;Chung, Sung-Taek
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.18 no.3 / pp.127-134 / 2018
  • In this paper, we developed an automatic lip-sync Maya plug-in that extracts Korean phonemes from voice data and Korean text information and produces high-quality 3D lip-sync animation from the separated phonemes. In the developed system, phoneme separation classifies the 8 vowels and 13 consonants used in Korean, referring to the 49 phonemes provided by the Microsoft Speech API engine (SAPI). Although vowels and consonants are pronounced with a variety of mouth shapes, the same viseme can be applied to phonemes that share a mouth shape. Based on this, we developed the Python-based auto lip-sync Maya plug-in so that lip-sync animation can be generated automatically in a single pass.
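
The phoneme-to-viseme grouping the paper exploits can be sketched as a lookup table plus keyframe generation. The groupings and names below are illustrative, not the paper's 8-vowel/13-consonant table, and a real plug-in would apply the keys through Maya's Python API (e.g. maya.cmds.setKeyframe).

```python
# Illustrative phoneme-to-viseme table: several phonemes that share a
# mouth shape map to one viseme. These groupings are examples only.
VISEME_OF = {
    "ㅏ": "open", "ㅑ": "open",
    "ㅗ": "round", "ㅛ": "round", "ㅜ": "round", "ㅠ": "round",
    "ㅁ": "closed", "ㅂ": "closed", "ㅍ": "closed",
    "ㅅ": "teeth", "ㅈ": "teeth", "ㅊ": "teeth",
}

def keyframes(phonemes, frame_step=3):
    """Turn a phoneme sequence into (frame, viseme) keys that a Maya
    script could then apply with maya.cmds.setKeyframe."""
    keys, frame = [], 0
    for ph in phonemes:
        keys.append((frame, VISEME_OF.get(ph, "rest")))
        frame += frame_step
    return keys

print(keyframes(["ㅂ", "ㅏ", "ㅁ"]))
# [(0, 'closed'), (3, 'open'), (6, 'closed')]
```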

Speech Basis Matrix Using Noise Data and NMF-Based Speech Enhancement Scheme (잡음 데이터를 활용한 음성 기저 행렬과 NMF 기반 음성 향상 기법)

  • Kwon, Kisoo;Kim, Hyung Young;Kim, Nam Soo
    • The Journal of Korean Institute of Communications and Information Sciences / v.40 no.4 / pp.619-627 / 2015
  • This paper presents a speech enhancement method using non-negative matrix factorization (NMF). In the training phase, a basis matrix for each source signal is obtained from an appropriate database, and these basis matrices are utilized for source separation. In this case, the performance of speech enhancement relies heavily on the basis matrices. In the proposed method, the speech basis matrix is trained so that it yields a high reconstruction error for noise signals, and it shows better performance than standard NMF, in which each basis matrix is trained independently. For comparison, we propose another method and also evaluate one of the previous methods. In the experiments, performance is evaluated by perceptual evaluation of speech quality (PESQ) and signal-to-distortion ratio (SDR), and the proposed method outperforms the other methods.
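
The training criterion can be checked with an ordinary NMF toolkit: fit a speech basis, then compare reconstruction errors for speech and noise frames. The sketch below uses scikit-learn and random placeholder matrices, so the speech/noise error gap would only appear with real spectrograms.

```python
import numpy as np
from sklearn.decomposition import NMF

def recon_error(model, X):
    """Reconstruction error of spectrogram frames X (frames x freq)
    through a fixed basis: encode with nonnegative gains, decode,
    and take the Frobenius residual."""
    H = model.transform(X)
    return np.linalg.norm(X - H @ model.components_)

# Random placeholders standing in for real magnitude spectrograms.
rng = np.random.default_rng(0)
S = rng.random((200, 257))    # speech frames (frames x freq bins)
N = rng.random((200, 257))    # noise frames
speech_nmf = NMF(n_components=40, max_iter=400).fit(S)
print("speech error:", recon_error(speech_nmf, S))
print("noise  error:", recon_error(speech_nmf, N))  # larger is better here
```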

A Study on Emotion Recognition of Chunk-Based Time Series Speech (청크 기반 시계열 음성의 감정 인식 연구)

  • Hyun-Sam Shin;Jun-Ki Hong;Sung-Chan Hong
    • Journal of Internet Computing and Services / v.24 no.2 / pp.11-18 / 2023
  • Recently, in the field of speech emotion recognition (SER), many studies have been conducted to improve accuracy using voice features and modeling. In addition to modeling studies that improve the accuracy of existing voice emotion recognition, various studies using voice features are underway. In this paper, focusing on the fact that vocal emotion is related to the flow of time, voice files are separated into chunks by time interval in a time-series manner. After separation, we propose a model for classifying the emotion of speech data by extracting the speech features mel spectrogram, chroma, zero-crossing rate (ZCR), root mean square (RMS) energy, and mel-frequency cepstral coefficients (MFCC) and applying them to recurrent neural network models used for sequential data processing. In the proposed method, voice features are extracted from all files using the 'librosa' library and applied to the neural network models. The experiments compare and analyze the performance of recurrent neural network (RNN), long short-term memory (LSTM), and gated recurrent unit (GRU) models on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) English dataset.
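
A minimal version of the described pipeline, assuming the five features are mean-pooled per chunk (the paper does not specify the pooling) and a common four-class IEMOCAP setup:

```python
import numpy as np
import librosa
import tensorflow as tf

def chunk_features(path, chunk_sec=1.0, sr=16000):
    """Split a file into fixed-length chunks and mean-pool the five
    features named in the paper over each chunk, giving one vector
    per chunk (a short time series for the recurrent model)."""
    y, _ = librosa.load(path, sr=sr)
    hop = int(chunk_sec * sr)
    feats = []
    for start in range(0, max(len(y) - hop, 1), hop):
        c = y[start:start + hop]
        feats.append(np.concatenate([
            librosa.feature.melspectrogram(y=c, sr=sr).mean(axis=1),
            librosa.feature.chroma_stft(y=c, sr=sr).mean(axis=1),
            librosa.feature.zero_crossing_rate(c).mean(axis=1),
            librosa.feature.rms(y=c).mean(axis=1),
            librosa.feature.mfcc(y=c, sr=sr, n_mfcc=13).mean(axis=1),
        ]))
    return np.stack(feats)          # (n_chunks, 155) feature time series

# One of the three compared recurrent models (GRU shown); four output
# classes is a common IEMOCAP setup, not necessarily the paper's.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, 155)),  # variable-length chunk sequence
    tf.keras.layers.GRU(64),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```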

An Acoustic Echo Canceller for Double-talk by Blind Signal Separation (암묵신호분리를 이용한 동시통화 음향반향제거기)

  • Lee, Haeng-Woo;Yun, Hyun-Min
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.2 / pp.237-245 / 2012
  • This paper describes an acoustic echo canceller that handles double-talk by means of blind signal separation. A conventional acoustic echo canceller deteriorates or diverges during double-talk periods, so we use blind signal separation to estimate the near-end speech signal and eliminate the estimate from the residual signal. The blind signal separation extracts the near-end signal from dual microphones by iterative computations using second-order statistics. Because the mixing model in a closed reverberant environment is multi-channel, we use copied coefficients of the echo canceller instead of computing the separation coefficients directly. With this method, the acoustic echo canceller operates irrespective of double-talk. We verified the performance of the proposed echo canceller by simulation. The results show that the canceller detects double-talk periods reliably, operates stably without coefficient divergence after double-talk ends, and achieves an ERLE on average 20 dB higher than the conventional LMS algorithm.
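
For reference, a plain NLMS echo canceller and the ERLE metric are sketched below; the BSS-based near-end estimation that protects the filter during double-talk is the paper's contribution and is not reproduced here.

```python
import numpy as np

def nlms_echo_canceller(far, mic, order=256, mu=0.5, eps=1e-6):
    """Baseline NLMS echo canceller: model the echo path from the
    far-end signal and subtract the echo estimate from the mic
    signal. Such a filter diverges during double-talk unless the
    near-end speech (estimated by BSS in the paper) is removed from
    the residual before adaptation."""
    w = np.zeros(order)
    e = np.zeros(len(mic))
    for n in range(order, len(mic)):
        x = far[n - order:n][::-1]            # recent far-end samples
        e[n] = mic[n] - w @ x                 # residual after cancellation
        w += mu * e[n] * x / (x @ x + eps)    # normalized LMS update
    return e

def erle_db(mic, residual, eps=1e-12):
    """Echo return loss enhancement in dB, the figure of merit the
    paper reports (about 20 dB over plain LMS)."""
    return 10 * np.log10((np.mean(mic ** 2) + eps) /
                         (np.mean(residual ** 2) + eps))
```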

Acoustic Echo Cancellation Based on Convolutive Blind Signal Separation Method (Convolutive 암묵신호분리방법에 기반한 음향반향 제거)

  • Lee, Haeng-Woo
    • The Journal of the Korea institute of electronic communication sciences / v.13 no.5 / pp.979-986 / 2018
  • This paper deals with acoustic echo cancellation using a blind signal separation method that does not degrade echo cancellation performance even during double-talk. In a closed echo environment, the mixing model of the acoustic signals is multi-channel, so a convolutive blind signal separation method is applied, and the mixing coefficients are calculated using a feedback model without directly computing the separation coefficients. The coefficient update is performed by iterative calculations based on second-order statistical properties, thereby estimating the near-end speech. A number of simulations were performed to verify the proposed method. The results show that an acoustic echo canceller using this method operates stably regardless of the presence of double-talk, and PESQ improves by 0.6 points compared with a conventional adaptive FIR filter structure.
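
The reported PESQ comparison could be reproduced with an off-the-shelf P.862 implementation; the sketch below assumes the PyPI 'pesq' package, which the paper does not name.

```python
from pesq import pesq   # PyPI "pesq" package (ITU-T P.862); an assumption,
                        # since the paper's PESQ implementation is unnamed.

def pesq_gain(fs, clean, mic, cancelled):
    """PESQ improvement of the canceller output over the raw mic
    signal, the comparison the paper reports (+0.6 points). Signals
    are 1-D float arrays at fs = 8000 ('nb' mode) or 16000 ('wb')."""
    mode = "nb" if fs == 8000 else "wb"
    return pesq(fs, clean, cancelled, mode) - pesq(fs, clean, mic, mode)
```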

On a Split Model for Analysis Techniques of Wideband Speech Signal (광대역 음성신호의 분할모델 분석기법에 관한 연구)

  • Park, Young-Ho;Ham, Myung-Kyu;You, Kwang-Bock;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.18 no.7 / pp.80-84 / 1999
  • In this paper, a split-model analysis algorithm is developed that can generate a wideband speech signal from the spectral information of a narrowband signal. The algorithm separates the 10th-order LPC model into five cascade-connected 2nd-order models. Using the less complex 2nd-order models avoids the complicated nonlinear relationships between the model parameters and all the poles of the full LPC model. The relationship between the model parameters and their corresponding analog poles is derived and applied to each 2nd-order model. The wideband speech signal is then obtained by changing only the sampling rate.
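
The model split itself can be done with a standard second-order-section factorization; the sketch below uses scipy.signal.tf2sos on an illustrative 10th-order LPC coefficient vector.

```python
import numpy as np
from scipy.signal import tf2sos

# Factor the 10th-order all-pole LPC synthesis filter 1/A(z) into five
# cascaded 2nd-order (biquad) sections. The coefficient vector below is
# illustrative; in practice it would come from LPC analysis of the
# narrowband speech (e.g. librosa.lpc).
a = np.array([1.0, -1.2, 0.8, -0.3, 0.2,
              -0.1, 0.05, -0.02, 0.01, -0.005, 0.002])
sos = tf2sos([1.0], a)            # b = 1: all-pole model
print(sos.shape)                  # (5, 6): five 2nd-order sections
for i, (b0, b1, b2, a0, a1, a2) in enumerate(sos):
    print(f"section {i}: z^2 + ({a1:.3f})z + ({a2:.3f})")
```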
