• Title/Abstract/Keywords: Speech function


Speech perception difficulties and their associated cognitive functions in older adults

  • 이수정;김향희
    • 말소리와 음성과학 / Vol. 8, No. 1 / pp.63-69 / 2016
  • The aims of the present study are two-fold: 1) to explore differences in speech perception between younger and older adults according to noise conditions; and 2) to investigate which cognitive domains are correlated with speech perception. Data were acquired from 15 younger adults and 15 older adults. A sentence recognition test was conducted in four noise conditions (i.e., in quiet, +5 dB SNR, 0 dB SNR, and -5 dB SNR). All participants completed auditory and cognitive assessments. After controlling for hearing thresholds, the older group performed significantly worse than the younger group only under the high-noise condition at -5 dB SNR. For the older group, performance on the Seoul Verbal Learning Test (immediate recall) was significantly correlated with speech perception performance, after controlling for hearing thresholds. In older adults, working memory and verbal short-term memory are the best predictors of speech-in-noise perception. The current study suggests that cognitive function should be considered when assessing speech perception in older adults, because of its adverse effect on speech perception under background noise.
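The noise conditions in this abstract are defined by signal-to-noise ratio in dB. As a minimal sketch of what such a condition means, the following scales a noise signal so that the speech-to-noise power ratio hits a target SNR before mixing. The function names `mix_at_snr` and `measured_snr_db` are hypothetical, and the abstract does not describe how the study's stimuli were actually prepared.

```python
import math


def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals
    `target_snr_db`, then add it to `speech`.

    Returns (mixture, scaled_noise). Illustrative helper only.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    # Required noise power is P_speech / 10^(SNR/10); the gain is applied
    # to amplitude, hence the square root.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


def measured_snr_db(speech, noise):
    """SNR in dB between a speech signal and a noise signal."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))
```

At -5 dB SNR the scaled noise carries roughly 3.2 times the power of the speech, which is the condition where the abstract reports the group difference.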

Semi-Continuous Hidden Markov Model with the MIN Module

  • 김대극;이정주;정호균;이상희
    • 음성과학 / Vol. 7, No. 4 / pp.11-26 / 2000
  • In this paper, we propose an HMM with the MIN module. Because the initial and re-estimated variance vectors are important to the performance of HMM recognition systems, we propose a method that compensates for the mismatch between the statistical features of the training and test data. The MIN module function is a differentiable function similar to the sigmoid function. Unlike a continuous density function, it does not include the variance vectors of the data set. The proposed hybrid HMM/MIN module is a unified network in which the observation probability in the HMM is replaced by the MIN module neural network. The parameters of the unified network are re-estimated by gradient descent under the Maximum Likelihood (ML) criterion. The variance vector is not estimated, because the MIN module function has no variance element. An experiment was performed to compare the proposed HMM with the conventional HMM on speaker-independent recognition of isolated numbers.


에너지와 인근 피치간에 유사도를 이용한 잡음레벨 검출에 관한 연구 (A Study on the Noise-Level Measurement Using the Energy and Relation of Closed Pitch)

  • 강인규;이기영;배명진
    • 음성과학 / Vol. 11, No. 3 / pp.157-164 / 2004
  • People speak at an average pitch level when speaking naturally, known as the 'habitual pitch level'. However, when noise is added to speech, the pitch waveform changes irregularly. This property can be used to estimate the noise level of speech. This paper calculates the energy level of the input speech, obtains the pitch period from the portions above an energy threshold using the NAMDF (Normalized Average Magnitude Difference Function) method, cuts each frame into pitch-period units, and proposes a method that estimates the noise level from the similarity between adjacent pitch periods of the input speech.

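The pipeline in this abstract (pitch period from a normalised AMDF, then similarity between neighbouring pitch periods) can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the normalisation used in `namdf` (dividing by the summed magnitudes) is one common choice and may differ from the paper's NAMDF, and `adjacent_period_similarity` uses a plain normalised correlation as a stand-in for the paper's similarity measure. All function names are hypothetical.

```python
import math


def namdf(frame, lag):
    """Average magnitude difference at `lag`, normalised by the summed
    magnitudes so the value lies in [0, 1]. Assumed normalisation."""
    n = len(frame) - lag
    num = sum(abs(frame[i] - frame[i + lag]) for i in range(n))
    den = sum(abs(frame[i]) + abs(frame[i + lag]) for i in range(n))
    return num / den if den > 0 else 1.0


def pitch_period(frame, min_lag, max_lag):
    """Lag (in samples) that minimises the NAMDF within the search range."""
    return min(range(min_lag, max_lag + 1), key=lambda k: namdf(frame, k))


def adjacent_period_similarity(signal, period):
    """Mean normalised correlation between consecutive pitch-period
    segments. Values near 1 suggest a regular (clean) pitch waveform;
    lower values suggest added noise."""
    starts = range(0, len(signal) - period + 1, period)
    segments = [signal[s:s + period] for s in starts]
    scores = []
    for a, b in zip(segments, segments[1:]):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if norm > 0:
            scores.append(dot / norm)
    return sum(scores) / len(scores) if scores else 0.0
```

The search range passed to `pitch_period` should cover less than two pitch periods; otherwise a lag at twice the true period can score equally well on a perfectly periodic frame.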

A Single Channel Speech Enhancement for Automatic Speech Recognition

  • 이진규;서현손;강홍구
    • 한국방송∙미디어공학회:학술대회논문집 / 한국방송공학회 2011년도 하계학술대회 / pp.85-88 / 2011
  • This paper describes a single-channel speech enhancement method used as the pre-processor of an automatic speech recognition system. The improvements are based on the optimally modified log-spectral amplitude (OM-LSA) gain function with non-causal a priori signal-to-noise ratio (SNR) estimation. Experimental results show that the proposed method gives better perceptual evaluation of speech quality (PESQ) scores, lower log-spectral distance, and better word accuracy. The parameters of the enhancement system were tuned for automatic speech recognition.


Improvements on MFCC by Elaboration of the Filter Banks and Windows

  • Lee, Chang-Young
    • 음성과학 / Vol. 14, No. 4 / pp.131-144 / 2007
  • In an effort to improve the performance of mel frequency cepstral coefficients (MFCC), we investigate the effects of varying the parameters of the filter banks and their associated windows on speech recognition rates. Specifically, the mel and bark scales are combined with various types of filter bank windows. The suggested methods are compared and evaluated using two independent speech recognition methods and the Fisher discriminant objective function. It is shown that the Hanning window based on the bark scale yields a 28.1% relative reduction in speech recognition error rate over the triangular window with the mel scale. Further work on incorporating PCA and/or LDA as a postprocessor to MFCC extraction would be desirable.

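To make the two design axes in this abstract concrete (frequency scale and filter window shape), here is a minimal sketch. The mel formula is the widely used 2595·log10(1 + f/700) form, and the bark formula is Traunmüller's approximation; the abstract does not say which variants the paper uses, so both are assumptions. The helpers `band_edges` and `filter_weight` are hypothetical names.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def hz_to_bark(f):
    # Traunmueller's approximation (assumed variant).
    return 26.81 * f / (1960.0 + f) - 0.53


def bark_to_hz(z):
    # Algebraic inverse of hz_to_bark.
    return 1960.0 * (z + 0.53) / (26.28 - z)


def band_edges(f_lo, f_hi, n_filters, to_scale, from_scale):
    """Edge/centre frequencies in Hz for `n_filters` filters spaced
    uniformly on the warped scale. Filter k spans edges[k]..edges[k+2]
    with its centre at edges[k+1]."""
    lo, hi = to_scale(f_lo), to_scale(f_hi)
    step = (hi - lo) / (n_filters + 1)
    return [from_scale(lo + i * step) for i in range(n_filters + 2)]


def filter_weight(d, shape):
    """Filter gain at normalised distance `d` from the filter centre,
    measured on the warped scale (d = 1 at the filter edge)."""
    d = abs(d)
    if d >= 1.0:
        return 0.0
    if shape == "triangular":
        return 1.0 - d
    if shape == "hann":
        return 0.5 * (1.0 + math.cos(math.pi * d))
    raise ValueError("unknown window shape: " + shape)
```

Both windows peak at 1 at the centre and reach 0 at the edges; the Hann shape is flatter near the centre and falls off more smoothly, which is the kind of difference the paper evaluates.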

A Closed-Form Solution of Linear Spectral Transformation for Robust Speech Recognition

  • Kim, Dong-Hyun;Yook, Dong-Suk
    • ETRI Journal / Vol. 31, No. 4 / pp.454-456 / 2009
  • The maximum likelihood linear spectral transformation (ML-LST) using a numerical iteration method has been previously proposed for robust speech recognition. The numerical iteration method is not appropriate for real-time applications due to its computational complexity. In order to reduce the computational cost, the objective function of the ML-LST is approximated and a closed-form solution is proposed in this paper. It is shown experimentally that the proposed closed-form solution for the ML-LST can provide rapid speaker and environment adaptation for robust speech recognition.

Variable Time-Scale Modification with Voiced/Unvoiced Decision

  • 손단영;김원구;윤대희;차일환
    • 전자공학회논문지B / Vol. 32B, No. 5 / pp.788-797 / 1995
  • In this paper, a variable time-scale modification using SOLA (Synchronized OverLap and Add) is proposed, which takes into consideration the different time-scaling characteristics of voiced and unvoiced speech. Generally, voiced speech is subject to higher variations in length during time-scale modification than unvoiced speech, but the conventional method performs time-scale modification at a uniform rate for all speech. To address this, the durations of voiced and unvoiced speech at various talking speeds were statistically analyzed. The sentences were spoken at rates of 0.7, 1.3, 1.5, and 1.8 times normal speed. A clipping autocorrelation function was applied to each analysis frame to classify it as voiced or unvoiced and to obtain the respective variation rates. The results were used to perform variable time-scale modification to produce sentences at rates of 0.7, 1.3, 1.5, and 1.8 times normal speed. To evaluate performance, a MOS test was conducted to compare the proposed voiced/unvoiced variable time-scale modification with the uniform SOLA method. Results indicate that the proposed method produces sentence quality superior to that of the conventional method.

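The voiced/unvoiced decision by clipping autocorrelation mentioned in this abstract can be sketched as below. This follows the classic three-level centre-clipping scheme; the clipping ratio (0.3 of the frame peak) and decision threshold (0.3 of the zero-lag value) are illustrative assumptions, since the abstract gives no parameter values. The names `center_clip` and `is_voiced` are hypothetical.

```python
def center_clip(frame, ratio=0.3):
    """Three-level centre clipping: +1 above the clip level, -1 below
    its negative, 0 otherwise. The clip level is `ratio` times the
    frame's peak magnitude (assumed setting)."""
    level = ratio * max(abs(v) for v in frame)
    return [1 if v > level else -1 if v < -level else 0 for v in frame]


def is_voiced(frame, min_lag, max_lag, threshold=0.3):
    """Classify a frame as voiced when the peak of the clipped
    autocorrelation in the pitch lag range is at least `threshold`
    times its zero-lag value."""
    clipped = center_clip(frame)
    r0 = sum(v * v for v in clipped)
    if r0 == 0:
        # Silent frame: nothing survives clipping.
        return False
    peak = max(
        sum(clipped[i] * clipped[i + lag] for i in range(len(clipped) - lag))
        for lag in range(min_lag, max_lag + 1)
    )
    return peak / r0 >= threshold
```

A variable time-scale system along the lines described could then apply a different stretch rate to frames for which `is_voiced` returns `True` than to the remaining frames.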

Consecutive Vowel Segmentation of Korean Speech Signal using Phonetic-Acoustic Transition Pattern

  • 박창목;왕지남
    • 한국정보처리학회:학술대회논문집 / 한국정보처리학회 2001년도 추계학술발표논문집 (상) / pp.801-804 / 2001
  • This article is concerned with the automatic segmentation of two adjacent vowels in speech signals. Every type of transition between adjacent vowels can be characterized by its spectrogram. First, the voiced speech is extracted by histogram analysis of a vowel indicator that consists of wavelet low-pass components. Second, given the phonetic transcription and the transition pattern spectrogram, the voiced-speech portion that contains consecutive vowels is automatically segmented by template matching. The cross-correlation function is adopted as the template matching method, and the modified correlation coefficient is calculated for all frames. The largest value in the modified correlation coefficient series indicates the boundary between two consecutive vowel sounds. The experiment was performed on 154 vowel transition sets. The 154 spectrogram templates were gathered from 154 words (PRW Speech DB), and 161 test words (PBW Speech DB) uttered by 5 speakers were tested. The experimental results show the validity of the method.

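The template matching step described in this abstract (slide a transition-pattern spectrogram over the voiced segment, score every frame position, take the maximum) can be sketched as follows. The paper's "modified correlation coefficient" is not defined in the abstract, so a plain Pearson correlation is used here as a stand-in; `pearson` and `best_match` are hypothetical names, and the spectrogram is represented as a list of per-frame lists.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either
    sequence has zero variance)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_match(spectrogram, template):
    """Slide `template` (frames x bins) along `spectrogram` and return
    (start_frame_of_best_match, list_of_scores)."""
    width = len(template)
    flat_template = [v for frame in template for v in frame]
    scores = []
    for start in range(len(spectrogram) - width + 1):
        window = [v for frame in spectrogram[start:start + width] for v in frame]
        scores.append(pearson(window, flat_template))
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

If the template is built so that the vowel transition sits at a known offset inside it, the estimated boundary is the best start frame plus that offset.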

A Reading Training Program offering Visual-Auditory Cue with Noise Cancellation Function

  • 방동혁;강현덕;길세기;이상민
    • 재활복지공학회논문지 / Vol. 2, No. 1 / pp.35-43 / 2009
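  • This paper introduces a reading training program that provides visual-auditory cues and includes a noise cancellation function (hereafter, the program). The program provides training sentences with visual and auditory cues. People with motor speech disorders can use the visual and auditory cues separately or simultaneously for reading training. A noise cancellation algorithm was developed to make evaluation of the training results more convenient. The algorithm removes the noise and the auditory cue sound that are recorded together with the subject's speech when the subject reads a sentence shown on the computer screen. A function that detects the initial speech onset time when the subject begins reading practice was also implemented. Speech was recorded from six adults (three men and three women) in four noise environments (indoor noise, white noise, car interior noise, and babble noise). The error between the actual onset time of the recorded speech and the onset time found by the program was measured before and after noise cancellation. The time error improved by $4.847 \pm 2.4235$ ms after noise cancellation. The developed program is expected to be helpful for the training and symptom assessment of people with motor speech disorders.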


Automatic Phonetic Segmentation of Korean Speech Signal Using Phonetic-acoustic Transition Information

  • 박창목;왕지남
    • 한국음향학회지 / Vol. 20, No. 8 / pp.24-30 / 2001
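  • This paper concerns automatic phonetic segmentation of speech signals when the phonetic transcription is given. Phoneme boundaries are classified into three types according to their phonetic-acoustic transition characteristics, and a segmentation algorithm suited to each type was developed. Type 1 is the segmentation among silence, voiced sounds, and unvoiced sounds: after an initial segmentation using a threshold obtained by histogram analysis, segmentation is performed using the SVF (Spectral Variation Function) of wavelet coefficients. Type 2 is the segmentation of consecutive vowels, for which the transition characteristics of each vowel pair are built into templates and used for segmentation. Type 3 is the segmentation between vowels and voiced consonants or consonants that have become voiced: candidate intervals are determined from amplitude changes in characteristic frequency bands, and the final segmentation is performed using the SVF of cepstral coefficients. To test segmentation performance, 342 words from the Korean PBW Speech DB were segmented automatically and compared with manual segmentation results. The overall automatic segmentation performance was 81.5% within a tolerance of 20 msec.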
