• Title/Summary/Keyword: Speech Enhancement

Search Result 340, Processing Time 0.023 seconds

Effects of Listener's Experience, Severity of Speaker's Articulation, and Linguistic Cues on Speech Intelligibility in Congenitally Deafened Adults with Cochlear Implants (청자의 경험, 화자의 조음 중증도, 단서 유형이 인공와우이식 선천성 농 성인의 말명료도에 미치는 영향)

  • Lee, Young-Mee;Sung, Jee-Eun;Park, Jeong-Mi;Sim, Hyun-Sub
    • Phonetics and Speech Sciences
    • /
    • v.3 no.1
    • /
    • pp.125-134
    • /
    • 2011
  • The current study investigated the effects of experience of deaf speech, severity of speaker's articulation, and linguistic cues on speech intelligibility of congenitally deafened adults with cochlear implants. Speech intelligibility was judged by 28 experienced listeners and 40 inexperienced listeners using a word transcription task. A three-way (2 $\times$ 2 $\times$ 4) mixed design was used with the experience of deaf speech (experienced/inexperienced listener) as a between-subject factor, the severity of speaker's articulation (mild to moderate/moderate to severe), and linguistic cues (no/phonetic/semantic/combined) as within-subject factors. The dependent measure was the number of correctly transcribed words. Results revealed that three main effects were statistically significant. Experienced listeners showed better performance on the transcription than inexperienced listeners, and listeners were better in transcribing speakers who were mild to moderate than moderate to severe. There were significant differences in speech intelligibility among the four different types of cues, showing that the combined cues provided the greatest enhancement of the intelligibility scores (combined > semantic > phonological > no). Three two-way interactions were statistically significant, indicating that the type of cues and severity of speakers differentiated experienced listeners from inexperienced listeners. The current results suggested that the use of a combination of linguistic cues increased the speech intelligibility of congenitally deafened adults with cochlear implants, and the experience of deaf speech was critical especially in evaluating speech intelligibility of severe speakers compared to that of mild speakers.

  • PDF

Glottal Characteristics of Word-initial Vowels in the Prosodic Boundary: Acoustic Correlates (운율경계에 위치한 어두 모음의 성문 특성: 음향적 상관성을 중심으로)

  • Sohn, Hyang-Sook
    • Phonetics and Speech Sciences
    • /
    • v.2 no.3
    • /
    • pp.47-63
    • /
    • 2010
  • This study provides a description of the glottal characteristics of the word-initial low vowels /a, $\ae$/ in terms of a set of acoustic parameters and discusses glottal configuration as their acoustic correlates. Furthermore, it examines the effect of prosodic boundary on the glottal properties of the vowels, seeking an account of the possible role of prosodic structure based on prosodic theory. Acoustic parameters reported to indicate glottal characteristics were obtained from the measurements made directly from the speech spectrum on recordings of Korean and English collected from 45 speakers. They consist of two separate groups of native Korean and native English speakers, each including both male and female speakers. Based on the three acoustic parameters of open quotient (OQ), first-formant bandwidth (B1), and spectral tilt (ST), comparisons were made between the speech of males and females, between the speech of native Korean and native English speakers, and between Korean and English produced by native Korean speakers. Acoustic analysis of the experimental data indicates that some or all glottal parameters play a crucial role in differentiating the speech groups, despite substantial interspeaker variations. Statistical analysis of the Korean data indicates prosodic strengthening with respect to the acoustic parameters B1 and OQ, suggesting acoustic enhancement in terms of the degree of glottal abduction and the glottal closure during a vibratory cycle.

  • PDF

A Spectral Compensation Method for Noise Robust Speech Recognition (잡음에 강인한 음성인식을 위한 스펙트럼 보상 방법)

  • Cho, Jung-Ho
    • 전자공학회논문지 IE
    • /
    • v.49 no.2
    • /
    • pp.9-17
    • /
    • 2012
  • One of the problems on the application of the speech recognition system in the real world is the degradation of the performance by acoustical distortions. The most important source of acoustical distortion is the additive noise. This paper describes a spectral compensation technique based on a spectral peak enhancement scheme followed by an efficient noise subtraction scheme for noise robust speech recognition. The proposed methods emphasize the formant structure and compensate the spectral tilt of the speech spectrum while maintaining broad-bandwidth spectral components. The recognition experiments was conducted using noisy speech corrupted by white Gaussian noise, car noise, babble noise or subway noise. The new technique reduced the average error rate slightly under high SNR(Signal to Noise Ratio) environment, and significantly reduced the average error rate by 1/2 under low SNR(10 dB) environment when compared with the case of without spectral compensations.

Voice Packet Processing Scheme for Voice Quality and Bandwidth Efficiency in VoIP (VoIP의 음성품질/대역효율 개선을 위한 음성패킷 처리)

  • Kim, Jae-Won;Sohn, Dong-Chul
    • Journal of Korea Multimedia Society
    • /
    • v.7 no.7
    • /
    • pp.896-904
    • /
    • 2004
  • In this paper, We present an efficient variable rate speech coder for spectral efficiency and packet processing technique for packet loss compensation of a voice codec with 10msec frame in VoIP service. Through disconnecting the users from the spectral resource during silence interval of about 60% period, a variable rate voice coder based on a voice activity detection(VAD) can increase spectral gain by two times. The performance of the method was analyzed by variation of detected voice activity factor and degraded speech frame ratio under various background noise level, and compared those of G.729B of ITU-T 8kbps standard speech codec. A method to compensate lost packets utilized addition of recovery data to a main stream and error concealment scheme for speech quality enhancement, the performance is verified by reconstructed speech quality. The proposed scheme can achieve spectral gain by two times or enhance speech quality by 3dB through reserved bandwidth of VAD. Therefore, the proposed method can enhance a spectral efficiency or speech quality of VoIP.

  • PDF

Improved speech enhancement of multi-channel Wiener filter using adjustment of principal subspace vector (다채널 위너 필터의 주성분 부공간 벡터 보정을 통한 잡음 제거 성능 개선)

  • Kim, Gibak
    • The Journal of the Acoustical Society of Korea
    • /
    • v.39 no.5
    • /
    • pp.490-496
    • /
    • 2020
  • We present a method to improve the performance of the multi-channel Wiener filter in noisy environment. To build subspace-based multi-channel Wiener filter, in the case of single target source, the target speech component can be effectively estimated in the principal subspace of speech correlation matrix. The speech correlation matrix can be estimated by subtracting noise correlation matrix from signal correlation matrix based on the assumption that the cross-correlation between speech and interfering noise is negligible compared with speech correlation. However, this assumption is not valid in the presence of strong interfering noise and significant error can be induced in the principal subspace accordingly. In this paper, we propose to adjust the principal subspace vector using speech presence probability and the steering vector for the desired speech source. The multi-channel speech presence probability is derived in the principal subspace and applied to adjust the principal subspace vector. Simulation results show that the proposed method improves the performance of multi-channel Wiener filter in noisy environment.

Non-Stationary/Mixed Noise Estimation Algorithm Based on Minimum Statistics and Codebook Driven Short-Term Predictor Parameter Estimation (최소 통계법과 Short-Term 예측계수 코드북을 이용한 Non-Stationary/Mixed 배경잡음 추정 기법)

  • Lee, Myeong-Seok;Noh, Myung-Hoon;Park, Sung-Joo;Lee, Seok-Pil;Kim, Moo-Young
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.3
    • /
    • pp.200-208
    • /
    • 2010
  • In this work, the minimum statistics (MS) algorithm is combined with the codebook driven short-term predictor parameter estimation (CDSTP) to design a speech enhancement algorithm that is robust against various background noise environments. The MS algorithm functions well for the stationary noise but relatively not for the non-stationary noise. The CDSTP works efficiently for the non-stationary noise, but not for the noise that was not considered in the training stage. Thus, we propose to combine CDSTP and MS. Compared with the single use of MS and CDSTP, the proposed method produces better perceptual evaluation of speech quality (PESQ) score, and especially works excellent for the mixed background noise between stationary and non-stationary noises.

Statistical Model-Based Noise Reduction Approach for Car Interior Applications to Speech Recognition

  • Lee, Sung-Joo;Kang, Byung-Ok;Jung, Ho-Young;Lee, Yun-Keun;Kim, Hyung-Soon
    • ETRI Journal
    • /
    • v.32 no.5
    • /
    • pp.801-809
    • /
    • 2010
  • This paper presents a statistical model-based noise suppression approach for voice recognition in a car environment. In order to alleviate the spectral whitening and signal distortion problem in the traditional decision-directed Wiener filter, we combine a decision-directed method with an original spectrum reconstruction method and develop a new two-stage noise reduction filter estimation scheme. When a tradeoff between the performance and computational efficiency under resource-constrained automotive devices is considered, ETSI standard advance distributed speech recognition font-end (ETSI-AFE) can be an effective solution, and ETSI-AFE is also based on the decision-directed Wiener filter. Thus, a series of voice recognition and computational complexity tests are conducted by comparing the proposed approach with ETSI-AFE. The experimental results show that the proposed approach is superior to the conventional method in terms of speech recognition accuracy, while the computational cost and frame latency are significantly reduced.

Time-Frequency Domain Impulsive Noise Detection System in Speech Signal (음성 신호에서의 시간-주파수 축 충격 잡음 검출 시스템)

  • Choi, Min-Seok;Shin, Ho-Seon;Hwang, Young-Soo;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.30 no.2
    • /
    • pp.73-79
    • /
    • 2011
  • This paper presents a new impulsive noise detection algorithm in speech signal. The proposed method employs the frequency domain characteristic of the impulsive noise to improve the detection accuracy while avoiding the false-alarm problem by the pitch of the speech signal. Furthermore, we proposed time-frequency domain impulsive noise detector that utilizes both the time and frequency domain parameters which minimizes the false-alarm problem by mutually complementing each other. As the result, the proposed time-frequency domain detector shows the best performance with 99.33 % of detection accuracy and 1.49 % of false-alarm rate.

A Design of Multi-channel Speech Pickup Embedded System for Hands-free Comuunication (핸즈프리 통신을 위한 다중채널 음성픽업 임베디드 시스템 설계)

  • Ju, Hyng-Jun;Park, Chan-Sub;Jeon, Jae-Kuk;Kim, Ki-Man
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.2
    • /
    • pp.366-373
    • /
    • 2007
  • In this paper we propose a multi-channel speech pickup system for calling quality enhancement of hands-free communication using ALTERA Nios-II processor. Multi-channel speech pickup system uses Delay-and-Sum beamformer with zero-padding interpolator. This paper implements speech pickup system using the Nios-II processor with real-time I/O data processing speed. The proposes speech pickup embedded system shows a good agreement with those of computer simulation(MATLAB) and conventional DSP processor(TMS320C6711) result. The proposed method is effective more than previous methods in cost and design processing time. As a result, LE(Logic Element) of hardware used 3,649/5,980(61%) on a chip.

Transient Noise Reduction in Speech Signal Utilizing a Long-term Predictor (장구간 예측 필터를 이용한 음성 신호에서의 돌발 잡음 제거)

  • Choi, Min-Seok;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.31 no.1
    • /
    • pp.29-38
    • /
    • 2012
  • This paper presents a transient noise reduction system in a speech signal. The proposed transient noise reduction system utilizes a median filter to reduce the transient noise. Since the median filter can distort speech during the noise reduction, a long-term prediction (LTP) filter is adopted as a pre-processor to minimize speech distortion. The speech information preserved by the LTP filter is re-synthesized after reducing the noise. This paper verifies the weakness of a linear prediction (LP) filter and the superiority of the LTP filter for preserving the speech component in transient noise presence environment. Applying the proposed system, the signal-to-noise ratio (SNR) of output is improved by 8dB in both speech and noise presence region, and PESQ score is increased by 1 point comparing with noisy input.