• Title/Abstract/Keywords: speech source

Search results: 281 (processing time 0.023 s)

Low Bit Rate을 고려한 LMS-MPC 방식에 관한 연구 (A Study on LMS-MPC Method Considering Low Bit Rate)

  • 이시우
    • 디지털융복합연구
    • /
    • Vol. 10, No. 5
    • /
    • pp.233-238
    • /
    • 2012
  • In speech coding schemes that use voiced and unvoiced excitation sources, waveform distortion appears when a vowel and an unvoiced consonant occur within the same frame. To address this, this paper proposes LMS-MPC, which applies individual pitch and LMS (Least Mean Square). Evaluating the SNRseg of the conventional MPC and of LMS-MPC, we confirmed improvements with LMS-MPC of 1.5 dB for male speech and 1.3 dB for female speech. The improved SNRseg of LMS-MPC over MPC thus controls the distortion of the speech waveform, and the method is expected to be applicable to speech coding schemes that encode speech with low-bit-rate excitation sources, such as cellular phones and smartphones.
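The SNRseg figure cited above is a standard objective measure. A minimal sketch of segmental SNR, assuming frame-aligned clean and coded signals (the frame length and epsilon are illustrative choices, not the paper's settings):

```python
import math

def snr_seg(clean, coded, frame_len=160, eps=1e-12):
    """Segmental SNR in dB: the mean of per-frame SNRs.

    clean, coded: equal-length sample sequences; frame_len=160
    corresponds to 20 ms at 8 kHz (an assumed, typical setting).
    """
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        deg = coded[start:start + frame_len]
        sig = sum(s * s for s in ref)                      # frame signal energy
        err = sum((s - d) ** 2 for s, d in zip(ref, deg))  # frame error energy
        snrs.append(10.0 * math.log10((sig + eps) / (err + eps)))
    return sum(snrs) / len(snrs)
```

Averaging per-frame SNRs (rather than one global SNR) is what makes the measure sensitive to localized waveform distortion of the kind described above.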

잡음 환경하에서의 음성 분리 (Convolutive source separation in noisy environments)

  • 장인선;최승진
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 October 2003 conference proceedings
    • /
    • pp.97-100
    • /
    • 2003
  • This paper addresses a method of convolutive source separation based on SEONS (Second Order Nonstationary Source Separation) [1], which was originally developed for blind separation of instantaneous mixtures using nonstationarity. To tackle the convolutive problem, we transform the convolutive BSS problem into multiple short-term instantaneous problems in the frequency domain and separate the instantaneous mixtures in every frequency bin. Moreover, we employ an H-infinity filtering technique to reduce the effect of sensor noise. Numerical experiments demonstrate the effectiveness of the proposed approach and compare its performance with existing methods.
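The core idea above — reducing a convolutive problem to per-bin instantaneous ones and separating each with second-order nonstationary statistics — can be sketched for the real-valued instantaneous case. This is a generic two-block second-order sketch, not the SEONS algorithm itself:

```python
import numpy as np

def two_block_separate(x):
    """Separate an instantaneous mixture using second-order nonstationarity.

    x: (channels, samples) mixture. The data is split into two blocks;
    because the sources are nonstationary, the two block covariances
    differ, and their generalized eigenvectors jointly diagonalize
    both, yielding the unmixing matrix up to scale and ordering.
    """
    x = x - x.mean(axis=1, keepdims=True)
    half = x.shape[1] // 2
    r1 = x[:, :half] @ x[:, :half].T / half
    r2 = x[:, half:] @ x[:, half:].T / (x.shape[1] - half)
    # Eigenvectors of inv(r2) @ r1 are the generalized eigenvectors of (r1, r2).
    _, vecs = np.linalg.eig(np.linalg.solve(r2, r1))
    w = np.real(vecs).T  # rows act as unmixing filters (real for good data)
    return w @ x
```

In the frequency-domain convolutive setting, the same operation runs independently on the complex STFT coefficients of each bin, followed by resolving the per-bin permutation and scaling ambiguities.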

음향 채널의 '성김' 특성을 이용한 반향환경에서의 화자 위치 탐지 (Speaker Localization in Reverberant Environments Using Sparse Priors on Acoustic Channels)

  • 조지원;박형민
    • 대한음성학회지:말소리
    • /
    • No. 67
    • /
    • pp.135-147
    • /
    • 2008
  • In this paper, we propose a method for source localization in reverberant environments based on an adaptive eigenvalue decomposition (AED) algorithm which directly estimates channel impulse responses from a speaker to microphones. Unfortunately, the AED algorithm may suffer from whitening effects on channels estimated from temporally correlated natural sounds. The proposed method which applies sparse priors to the estimated channels can avoid the temporal whitening and improve the performance of source localization in reverberant environments. Experimental results show the effectiveness of the proposed method.

고음질을 갖는 음색변경에 관한 연구 (A Study on the Voice Conversion Algorithm with High Quality)

  • 박형빈;배명진
    • 대한전자공학회:학술대회논문집
    • /
    • Proceedings of the 13th Joint Signal Processing Conference, 대한전자공학회, 2000
    • /
    • pp.157-160
    • /
    • 2000
  • Voice conversion has generally used VQ (Vector Quantization) to partition the spectral feature space, performing the conversion by adding an appropriate offset vector to the source speaker's spectral vector. However, the discrete nature of the transformed parameters cannot represent the target speaker's varied characteristics. In this paper, these problems are solved by using LMR (Linear Multivariate Regression), which captures the relationship between the source and target speakers' vocal-tract characteristics, instead of a mapping codebook. We also propose a method to remove the discontinuity caused by applying parameters time-aligned by Dynamic Time Warping to time- or pitch-scale-modified speech. To overcome these transitional discontinuities, the proposed algorithm leaves the time and pitch scales unchanged and uses LMR to convert the speaker's vocal-tract characteristics in the unmodified speech. Compared with existing VQ- and LMR-based methods, the proposed algorithm yields much better voice quality.
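At its core, the LMR mapping described above is a least-squares linear transform from time-aligned source spectral vectors to target spectral vectors. A minimal sketch on hypothetical toy features (the authors' actual parameters and DTW alignment are not reproduced here):

```python
import numpy as np

def fit_lmr(src, tgt):
    """Fit a linear multivariate regression tgt ~ src @ w + b.

    src, tgt: (frames, dims) time-aligned spectral feature matrices
    (e.g. aligned beforehand by DTW). Returns (w, b) minimizing the
    squared error.
    """
    x = np.hstack([src, np.ones((src.shape[0], 1))])  # append bias column
    coef, *_ = np.linalg.lstsq(x, tgt, rcond=None)
    return coef[:-1], coef[-1]

def apply_lmr(src, w, b):
    """Convert source-speaker feature vectors with the fitted transform."""
    return src @ w + b
```

Unlike a mapping codebook, which snaps each input to one of finitely many offsets, the fitted transform varies continuously with the input vector, which is the property the abstract appeals to.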

음성학적으로 본 사상체질 (A Phonetic Study of 'Sasang Constitution')

  • 문승재;탁지현;황혜정
    • 대한음성학회지:말소리
    • /
    • Vol. 55
    • /
    • pp.1-14
    • /
    • 2005
  • Sasang Constitution, a branch of Oriental medicine, claims that people can be classified into four different 'constitutions': Taeyang, Taeum, Soyang, and Soeum. This study investigates whether the constitutions could be accurately determined solely from people's voices, by analyzing data from 46 speakers whose constitutions had already been determined. Seven source-related parameters and four filter-related parameters were phonetically analyzed, and a GMM (Gaussian mixture model) was tried on the data. Both the phonetic analyses and the GMM showed that all the parameters except one failed to distinguish the constitutions successfully, and even the single exception, B2 (the bandwidth of the second formant), did not provide sufficient grounds for distinction. This result suggests one of two conclusions: either the Sasang Constitutions cannot be substantiated by the phonetic characteristics of people's voices with reliable accuracy, or other parameters beyond those conventionally proposed have yet to be found.

Application of Block On-Line Blind Source Separation to Acoustic Echo Cancellation

  • Ngoc, Duong Q.K.;Park, Chul;Nam, Seung-Hyon
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 27, No. 1E
    • /
    • pp.17-24
    • /
    • 2008
  • Blind speech separation (BSS) is well known as a powerful technique for speech enhancement in many real-world environments. In this paper, we propose a new application of BSS: acoustic echo cancellation (AEC) in a car environment. For this purpose, we develop a block-online BSS algorithm that provides more robust separation than a batch version in changing environments with moving speakers. Simulation results using real-world recordings show that the block-online BSS algorithm is very robust to speaker movement. When combined with AEC, simulation results using real audio recordings in a car confirm that BSS improves double-talk detection and echo suppression.

휴머노이드 로봇을 위한 시청각 정보 기반 음원 정위 시스템 구현 (Implementation of Sound Source Localization Based on Audio-visual Information for Humanoid Robots)

  • 박정욱;나승유;김진영
    • 음성과학
    • /
    • Vol. 11, No. 4
    • /
    • pp.29-42
    • /
    • 2004
  • This paper presents an implementation of real-time speaker localization using audio-visual information. Four channels of microphone signals are processed to detect vertical as well as horizontal speaker positions. First, short-time average magnitude difference function (AMDF) signals are used to determine whether the microphone signals contain a human voice. The orientation and distance of the sound source are then obtained from the interaural time difference. Finally, visual information from a camera helps fine-tune the angle to the speaker. Experimental results for the real-time localization system show that performance improves to 99.6%, compared with 88.8% when only audio information is used.
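The AMDF used above for the voice/non-voice decision is simple to compute; a generic sketch (the abstract does not specify the authors' thresholds or exact decision rule):

```python
import math

def amdf(frame, max_lag):
    """Short-time average magnitude difference function of one frame.

    Returns amdf[k] for lags k = 0..max_lag-1. Voiced speech produces
    deep valleys at multiples of the pitch period, so a voiced/unvoiced
    decision can threshold the valley depth.
    """
    n = len(frame)
    return [sum(abs(frame[i] - frame[i + k]) for i in range(n - k)) / (n - k)
            for k in range(max_lag)]

def pitch_lag(frame, lo, hi):
    """Crude pitch estimate: the lag of the deepest AMDF valley in [lo, hi]."""
    d = amdf(frame, hi + 1)
    return min(range(lo, hi + 1), key=d.__getitem__)
```

AMDF needs only subtractions and absolute values, which is why it suits the real-time, four-channel front end described above better than a full autocorrelation.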

A New Formulation of Multichannel Blind Deconvolution: Its Properties and Modifications for Speech Separation

  • Nam, Seung-Hyon;Jee, In-Nho
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 25, No. 4E
    • /
    • pp.148-153
    • /
    • 2006
  • A new normalized MBD algorithm is presented for nonstationary convolutive mixtures, and its properties and modifications are discussed in detail. The proposed algorithm normalizes the signal spectrum in the frequency domain to provide faster, stable convergence and improved separation without the whitening effect. Modifications to the proposed algorithm, such as nonholonomic constraints and off-diagonal learning, are also discussed. Simulation results using a real-world recording confirm the superior performance of the proposed algorithm and its usefulness in real-world applications.

식도발성의 숙련 정도에 따른 모음의 음향학적 특징과 자음 산출에 대한 연구 (Analysis of Acoustic Characteristics of Vowel and Consonants Production Study on Speech Proficiency in Esophageal Speech)

  • 최성희;최홍식;김한수;임성은;이성은;표화영
    • 음성과학
    • /
    • Vol. 10, No. 3
    • /
    • pp.7-27
    • /
    • 2003
  • Esophageal speech uses esophageal air during phonation. Fluent esophageal speakers take in air readily during oral communication, whereas unskilled esophageal speakers have difficulty swallowing enough air. The purpose of this study was to investigate differences in the acoustic characteristics of vowel and consonant production according to proficiency in esophageal speech. The subjects were 13 normal male speakers and 13 male esophageal speakers (5 unskilled, 8 skilled), aged 50 to 70. The stimuli were the sustained vowel /a/ and 36 meaningless two-syllable words. The vowel used was /a/, and the 18 consonants were /k, n, t, m, p, s, c, cʰ, kʰ, tʰ, pʰ, h, l, k', t', p', s', c'/. Fundamental frequency (Fx), jitter, shimmer, HNR, and MPT were measured by electroglottography using Lx Speech Studio (Laryngograph Ltd, London, UK). The 36 meaningless words produced by the esophageal speakers were presented to 3 speech-language pathologists, who phonetically transcribed the responses. Fx, jitter, and HNR differed significantly between skilled and unskilled esophageal speakers (p < .05). Considering manner of articulation, ANOVA showed significant differences between the two proficiency groups: in the unskilled group, glides were confused with other phoneme classes most often and affricates were the most intelligible, whereas in the skilled group fricatives produced the most confusions and nasals were the most intelligible. In place of articulation, the glottal /h/ was the most confused consonant in both groups; bilabials were the most intelligible for the skilled group and velars for the unskilled group. In syllable structure, 'CV+V' caused more confusion in the skilled group, while the unskilled group showed similar confusion for both structures. In unskilled esophageal speech, the significantly different Fx, jitter, and HNR values for vowels and the high confusion rates for liquids and nasals could be attributed to unstable, improper contact of the neoglottis as the vibratory source, insufficient phonatory air supply, and the higher motoric demand on the remaining articulators due to the morphological characteristics of the vocal tract after laryngectomy.
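The jitter and shimmer values cited above are standard cycle-to-cycle perturbation measures; a minimal sketch of the local variants, assuming pitch periods and peak amplitudes have already been extracted (e.g. from the EGG trace):

```python
def jitter_local(periods):
    """Local jitter (%): mean absolute difference between consecutive
    pitch periods, relative to the mean period."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def shimmer_local(amps):
    """Local shimmer (%): the same measure applied to cycle peak amplitudes."""
    diffs = [abs(a - b) for a, b in zip(amps, amps[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(amps) / len(amps))
```

An irregular vibratory source such as the neoglottis yields large consecutive-period differences, which is why these measures separate the proficiency groups in the study above.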

주거 공간에서 고령자 청력손실을 고려한 소음 및 잔향에 따른 음성 전송 성능의 주관적 평가 (Effect of noise and reverberation on subjective measure of speech transmission performance for elderly person with hearing loss in residential space)

  • 오양기;류종관;송한솔
    • 한국음향학회지
    • /
    • Vol. 37, No. 5
    • /
    • pp.369-377
    • /
    • 2018
  • 본 논문은 주거공간에서 고령자 청력손실을 고려한 소음 및 잔향에 따른 음성 전송 성능을 청취실험을 통해 평가하였다. 주거환경 소음으로 바닥충격음, 교통소음, 공기전달음과 배수소음을 대상으로 하였으며, 공동주택의 잔향환경을 모사하기 위해 실내음향 컴퓨터시뮬레이션을 실시하여 충격응답를 추출하였다. 청취실험 음원은 고령자 청력손실(65세 남성)을 반영하기 위해 소음 및 단어 음원의 고주파대역의 음압레벨을 저감시킨 음원(고령자 음원)과 정상청력을 반영한 원음(청년 음원)을 대상으로 하였다. 청취실험은 각각 3개의 소음레벨($L_{Aeq}$ 30, 40, 50 dB)과 잔향시간(0.5, 1.0, 1.5 s)을 갖는 음환경 조건에서 제시된 단어($L_{Aeq}$ 55 dB)의 음성요해도(speech intelligibility)와 듣기 어려운 정도(listening difficulty)를 평가하는 것으로 하였다. 청취실험 결과, 음성레벨이 55 dB($L_{Aeq}$)일 때 잔향시간 1.0 s 이하 조건에서 충격소음(점핑음) 50 dB($L_{i,Fmax,AW}$)와 정상소음(도로, 음악, 배수 소음) 40 dB($L_{Aeq}$) 이하의 소음레벨에서는 고령자 및 청년 음원 모두 90 % 이상의 음성요해도와 30 % 이하의 듣기 어려운 정도를 확보할 수 있을 것으로 판단된다. 고령자 청력손실을 반영한 고령자 음원의 경우 청년 음원 보다 음성요해도는 0 % ~ 5 % 낮았고 듣기 어려운 정도는 2 % ~ 10 % 높은 것으로 나타났다.