• Title/Abstract/Keywords: Distorted speech

CDMA이동통신환경에서의 음성인식을 위한 왜곡음성신호 거부방법 (Distorted Speech Rejection For Automatic Speech Recognition under CDMA Wireless Communication)

  • 김남수;장준혁
    • The Journal of the Acoustical Society of Korea / Vol. 23, No. 8 / pp. 597-601 / 2004
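  • This paper introduces a preprocessing-based rejection method for distorted speech signals, intended for speech recognition in CDMA mobile communication environments. First, we analyze speech signals distorted by the CDMA mobile communication channel. Based on the analyzed distortion mechanism, we then propose an algorithm that rejects channel-distorted speech signals by exploiting the quasi-periodicity of speech. Experiments showed that the proposed preprocessing-rejection method requires little computation and, when applied to speech recognition, can effectively reject channel-distorted speech signals in the CDMA environment.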
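The abstract above rejects channel-distorted input by checking whether it still shows the quasi-periodicity of voiced speech. The paper's exact detector is not given here, so the following is only a minimal sketch of one common way to measure quasi-periodicity (the peak normalized autocorrelation over a pitch-lag range), combined with a simple voiced-frame-ratio rule for rejection. The function names, the 60-400 Hz pitch range, the frame sizes, and both thresholds are illustrative assumptions, not values from the paper.

```python
import math
from typing import List, Sequence


def max_normalized_autocorr(frame: Sequence[float], sample_rate: int,
                            f0_min: float = 60.0, f0_max: float = 400.0) -> float:
    """Peak normalized autocorrelation over lags in an assumed pitch range.

    Values near 1 indicate a strongly (quasi-)periodic, voiced-like frame;
    values near 0 indicate noise-like or silent frames.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC so it does not inflate correlation
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best = 0.0
    for lag in range(lag_min, lag_max + 1):
        head, tail = x[: n - lag], x[lag:]
        num = sum(a * b for a, b in zip(head, tail))
        e1 = sum(a * a for a in head)
        e2 = sum(b * b for b in tail)
        if e1 > 0.0 and e2 > 0.0:
            best = max(best, num / math.sqrt(e1 * e2))
    return best


# Hypothetical rejection rule: all default values are illustrative assumptions.
def should_reject(signal: Sequence[float], sample_rate: int,
                  frame_len: int = 400, hop: int = 200,
                  periodicity_threshold: float = 0.5,
                  min_voiced_ratio: float = 0.2) -> bool:
    """Reject the utterance when too few frames look quasi-periodic."""
    frames: List[Sequence[float]] = [
        signal[i:i + frame_len]
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]
    if not frames:
        return True  # too short to judge; treat as unusable
    voiced = sum(
        1 for f in frames
        if max_normalized_autocorr(f, sample_rate) >= periodicity_threshold
    )
    return voiced / len(frames) < min_voiced_ratio
```

A steady 200 Hz tone sampled at 8 kHz passes this check, while white noise of the same length is rejected; real channel distortion is subtler, so any practical thresholds would need tuning on data.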

차량 잡음 환경에서 인위적 왜곡 음성을 이용한 Eigenspace-based MLLR에 기반한 고속 화자 적응 (Fast Speaker Adaptation Based on Eigenspace-based MLLR Using Artificially Distorted Speech in Car Noise Environment)

  • 송화전;전형배;김형순
    • Phonetics and Speech Sciences / Vol. 1, No. 4 / pp. 119-125 / 2009
  • This paper proposes a fast speaker adaptation method based on eigenspace-based maximum likelihood linear regression (ES-MLLR) that uses artificially distorted speech for a telematics terminal in a car noise environment. The artificially distorted speech is built by adding various car noise signals, collected from a moving car, to speech signals collected in an idling car. Then, for each environment, a transformation matrix is estimated by ES-MLLR using the artificially distorted speech for that specific noise environment. In test mode, an online model is built as a weighted sum of the environment transformation matrices according to the driving condition. In a 3k-word recognition task on the telematics terminal, the proposed method outperforms ES-MLLR, even when ES-MLLR uses adaptation data collected under the driving condition.
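In test mode the abstract builds an online model as a weighted sum of per-environment transformation matrices. The minimal sketch below shows that combination step together with the standard MLLR mean update $\hat{\mu} = W\xi$, where $\xi = [1, \mu^\top]^\top$. How the weights are derived from the driving condition is not stated here, so they are simply passed in and normalized to sum to one (an assumption); the function names are hypothetical, and estimating each matrix with ES-MLLR is out of scope.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def combine_transforms(transforms: Sequence[Matrix],
                       weights: Sequence[float]) -> Matrix:
    """Weighted sum of per-environment MLLR transforms.

    Weights are normalized to sum to 1 (an assumption of this sketch).
    """
    if not transforms or len(transforms) != len(weights):
        raise ValueError("need one weight per transform")
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    norm = [w / total for w in weights]
    rows, cols = len(transforms[0]), len(transforms[0][0])
    return [
        [sum(w * t[r][c] for w, t in zip(norm, transforms)) for c in range(cols)]
        for r in range(rows)
    ]


def adapt_mean(transform: Matrix, mean: Sequence[float]) -> List[float]:
    """MLLR mean update: mu_hat = W @ [1, mu], with W of shape d x (d + 1)."""
    xi = [1.0, *mean]  # extended mean vector with a leading bias term
    if any(len(row) != len(xi) for row in transform):
        raise ValueError("transform must have d + 1 columns")
    return [sum(w * x for w, x in zip(row, xi)) for row in transform]
```

Because the mean update is linear in $W$, adapting with the combined matrix is equivalent to taking the same weighted average of the means adapted by each environment's matrix.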

방송뉴스 인식에서의 잡음 처리 기법에 대한 고찰 (A Study on Noise-Robust Methods for Broadcast News Speech Recognition)

  • 정용주
    • Malsori (Journal of the Korean Society of Speech Sciences) / No. 50 / pp. 71-83 / 2004
  • Recently, broadcast news speech recognition has become one of the most actively studied research areas. If broadcast news could be transcribed automatically and its content stored as text rather than as the video or audio signal itself, it would be much easier to search multimedia databases for the information we need. However, the desired speech signal in broadcast news is often corrupted by interfering signals such as background noise and/or music. In addition, the speech of a reporter speaking over the telephone or through a poor-quality microphone is severely distorted by channel effects. Such corrupted or distorted speech may be the main reason for the poor performance of broadcast news speech recognition. In this paper, we investigated several methods for coping with these problems and observed performance improvements in noisy broadcast news speech recognition.

Robust Histogram Equalization Using Compensated Probability Distribution

  • Kim, Sung-Tak;Kim, Hoi-Rin
    • Malsori (Journal of the Korean Society of Speech Sciences) / Vol. 55 / pp. 131-142 / 2005
  • A mismatch between training and test conditions often causes a drastic drop in the performance of speech recognition systems. In this paper, nonlinear transformation techniques based on histogram equalization (HEQ) in the acoustic feature space are studied for reducing this mismatch. The purpose of HEQ is to convert the probability distribution of the test speech into that of the training speech. Conventional HEQ methods consider only the probability distribution of the test speech; however, for noise-corrupted test speech, this distribution is itself distorted. A transformation function obtained from the distorted distribution may mis-transform feature vectors, which degrades HEQ performance. Therefore, this paper proposes a method for calculating a noise-removed probability distribution, assuming that the CDF of the noisy speech feature vectors consists of a speech component and a noise component, and uses this compensated distribution in the HEQ process. In the AURORA-2 framework, the proposed method reduced the error rate by over 44% in the clean-training condition compared with the baseline system. In the multi-condition training condition, the proposed methods also outperformed the baseline system.
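For reference, the sketch below implements conventional HEQ as the abstract describes it: each feature dimension of the test utterance is mapped through its own empirical CDF and then through the inverse CDF of a reference distribution. It assumes a standard normal reference (a common choice; the paper may instead use the training-data distribution) and does not implement the proposed noise-compensated CDF. The function names are illustrative.

```python
from statistics import NormalDist
from typing import List, Sequence

# Assumed reference distribution for the equalized features.
STANDARD_NORMAL = NormalDist(0.0, 1.0)


def histogram_equalize(values: Sequence[float],
                       reference: NormalDist = STANDARD_NORMAL) -> List[float]:
    """Map one feature track so its empirical distribution matches `reference`."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, idx in enumerate(order):
        # Empirical CDF estimate kept strictly inside (0, 1) so inv_cdf is defined.
        # Ties simply receive consecutive ranks in this sketch.
        cdf = (rank + 0.5) / n
        out[idx] = reference.inv_cdf(cdf)
    return out


def heq_features(frames: Sequence[Sequence[float]],
                 reference: NormalDist = STANDARD_NORMAL) -> List[List[float]]:
    """Apply HEQ independently to every dimension of a (frames x dims) matrix."""
    if not frames:
        return []
    dims = len(frames[0])
    columns = [histogram_equalize([f[d] for f in frames], reference)
               for d in range(dims)]
    return [[columns[d][t] for d in range(dims)] for t in range(len(frames))]
```

Since the output depends only on the ranks of the input values, any monotonic distortion of a feature track is removed entirely; the abstract's point is that additive noise is not such a distortion, which is why the noisy CDF itself needs compensation.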

구개열 환자 발음 판별을 위한 특징 추출 방법 분석 (Analysis of Feature Extraction Methods for Distinguishing the Speech of Cleft Palate Patients)

  • 김성민;김우일;권택균;성명훈;성미영
    • Journal of KIISE / Vol. 42, No. 11 / pp. 1372-1379 / 2015
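  • This paper presents experiments that analyze the performance of feature extraction methods for automatically distinguishing the disordered speech of cleft palate patients from the speech of normal speakers. The study is a foundational step in developing an automatic recognition and restoration software system for disordered speech, aimed at improving the welfare of people with speech disorders. The speech data used in the experiments are Korean monosyllables collected from three groups (normal speakers, cleft palate patients, and simulated patients), covering 14 basic consonants, 5 compound consonants, and 7 vowels. Features were extracted with three methods (LPCC, MFCC, and PLP), GMM acoustic models were trained, and recognition experiments were then conducted on the collected monosyllable data. The results showed that MFCC performed best overall for extracting features that distinguish the disordered speech of cleft palate patients from normal speech. These results are expected to support research on automatically recognizing and restoring the inaccurate speech of cleft palate patients, as well as research on tools for measuring the severity of cleft palate speech disorders.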

심리 음향 켑스트럼 평균 차감법을 이용한 이동 전화망에서의 음질 평가 (Speech Quality Measure in a Mobile Communication System Using PLP Cepstral Distance with CMS)

  • 윤종진;박상욱;박영철;윤대희;차일환
    • Speech Sciences / Vol. 6 / pp. 163-179 / 1999
  • For the setup, management, and repair of a mobile communication system, continuous estimation of speech quality is required. Speech quality can be measured by listeners' judgments in a subjective test such as the MOS (Mean Opinion Score) test. However, because this method is laborious, expensive, and time-consuming, it is advisable to predict subjective speech quality with objective measures. This paper presents a robust objective speech quality measure, PLP-CMS (Perceptual Linear Predictive-Cepstral Mean Subtraction), that can predict subjective speech quality in mobile communication systems. PLP-CMS correlates highly with subjective quality owing to PLP (Perceptual Linear Predictive) analysis, and it is robust to PSTN (Public Switched Telephone Network) channel effects owing to CMS (Cepstral Mean Subtraction). To evaluate the proposed algorithm, we carried out subjective and objective quality estimation on speech samples distorted in various ways in a real mobile communication system. The results show that PLP-CMS correlates more highly with subjective quality than PSQM (Perceptual Speech Quality Measure) and PLP-CD (Perceptual Linear Predictive-Cepstral Distance).
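CMS helps here because a fixed linear channel (such as a PSTN path) appears approximately as an additive constant in the cepstral domain, so subtracting each utterance's mean cepstrum removes it. The minimal sketch below assumes the PLP cepstra have already been computed and time-aligned between the reference and degraded signals, and uses a plain per-frame Euclidean cepstral distance averaged over frames. The paper's exact distance, frame weighting, and mapping to MOS are not reproduced, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

# Assumed input: precomputed, time-aligned cepstra, one list of coefficients per frame.
Frames = Sequence[Sequence[float]]


def cepstral_mean_subtraction(cepstra: Frames) -> List[List[float]]:
    """Subtract the per-utterance mean cepstrum from every frame."""
    n = len(cepstra)
    dims = len(cepstra[0])
    means = [sum(frame[d] for frame in cepstra) / n for d in range(dims)]
    return [[frame[d] - means[d] for d in range(dims)] for frame in cepstra]


def mean_cepstral_distance(reference: Frames, degraded: Frames,
                           use_cms: bool = True) -> float:
    """Average per-frame Euclidean distance between time-aligned cepstra."""
    if not reference or len(reference) != len(degraded):
        raise ValueError("need non-empty, time-aligned frame sequences")
    if use_cms:
        reference = cepstral_mean_subtraction(reference)
        degraded = cepstral_mean_subtraction(degraded)
    total = 0.0
    for ref_frame, deg_frame in zip(reference, degraded):
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(ref_frame, deg_frame)))
    return total / len(reference)
```

If the degraded cepstra differ from the reference only by a constant offset per coefficient (an idealized channel), the distance with CMS is zero, while without CMS it equals the length of that offset vector.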

상악 가철식 보정장치인 circumferential comfortable retainer (CCR)에 대한 불편감 평가 (Discomfort caused by the circumferential comfortable retainer (CCR) as a removable maxillary retainer)

  • 최진휴;문철현
    • The Korean Journal of Orthodontics / Vol. 40, No. 5 / pp. 325-333 / 2010
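  • To evaluate the discomfort that patients experience from retainers, such as speech impairment, gag reflex, and wearing discomfort, 66 orthodontic patients (23 male, 43 female; mean age $23.42 \pm 10.19$ years) whose fixed orthodontic appliances had been removed after treatment were randomly assigned to two groups. On the day after removal of the fixed appliances, the CWR group was fitted with a conventional wraparound retainer (CWR), which fully covers the palate, and the CCR group was fitted with a circumferential comfortable retainer (CCR), which partially covers the palate in a horseshoe shape. After 4 weeks of wear, patients rated speech impairment, gag reflex, and wearing discomfort on a questionnaire using a 100-mm visual analog scale (VAS), and the scores were compared statistically. Speech impairment and wearing discomfort were significantly lower in the CCR group than in the CWR group ($p$ < 0.05). The CCR group also scored lower for gag reflex, but the difference was not statistically significant ($p$ = 0.146). These results suggest that the CCR, by reducing speech impairment and alleviating wearing discomfort, can improve patient compliance and thus help maintain treatment results after orthodontic treatment with fixed appliances.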

주파수 대역 제한에 의한 한국어 모음의 지각 특성 분석 (Perceptual Characteristics of Korean Vowels Distorted by the Frequency Band Limitation)

  • 김연화;최대림;이숙향;이용주
    • Phonetics and Speech Sciences / Vol. 6, No. 1 / pp. 85-93 / 2014
  • This paper investigated the effects of frequency band limitation on the perceptual characteristics of Korean vowels. Monosyllabic speech (144 CV syllables, 56 VC syllables, and 8 V syllables) produced by two announcers was low-pass and high-pass filtered with cutoff frequencies ranging from 300 to 5000 Hz. Six listeners with normal hearing performed perception tests for each filter type and cutoff frequency. We report phoneme recognition rates and the types of perception errors for band-limited Korean vowels to examine how frequency distortion during speech transmission affects listeners' perception.

Relationship between Speech Perception in Noise and Phonemic Restoration of Speech in Noise in Individuals with Normal Hearing

  • Vijayasarathy, Srikar;Barman, Animesh
    • Journal of Audiology & Otology / Vol. 24, No. 4 / pp. 167-173 / 2020
  • Background and Objectives: Top-down restoration of distorted speech, tapped as phonemic restoration of speech in noise, may be a useful tool for understanding the robustness of perception in adverse listening situations. However, the relationship between phonemic restoration and speech perception in noise is not empirically clear. Subjects and Methods: Twenty adults (40-55 years) with normal audiometric findings participated in the study. Sentence perception in noise was measured at various signal-to-noise ratios (SNRs) to estimate the SNR yielding a 50% score. Performance was also measured for sentences interrupted with silence and for sentences interrupted by speech noise at -10, -5, 0, and 5 dB SNR. The score in the silence-interruption condition was subtracted from the score in the noise-interruption condition to determine the phonemic restoration magnitude. Results: Fairly robust improvements in speech intelligibility were found when the sentences were interrupted with speech noise instead of silence. The improvement with increasing noise level was non-monotonic and reached a maximum at -10 dB SNR. A significant correlation was found between speech perception in noise and phonemic restoration of sentences interrupted with speech noise at -10 dB SNR. Conclusions: Perception of speech in noise may be associated with top-down processing of speech, tapped as phonemic restoration of interrupted speech. Further research with a larger sample is warranted, since restoration is affected by the type of speech material and noise used, age, working memory, and linguistic proficiency, and shows large individual variability.
