• Title/Summary/Keyword: speech quality evaluation

A study on speech enhancement using complex-valued spectrum employing Feature map Dependent attention gate (특징 맵 중요도 기반 어텐션을 적용한 복소 스펙트럼 기반 음성 향상에 관한 연구)

  • Jaehee Jung;Wooil Kim
    • The Journal of the Acoustical Society of Korea / v.42 no.6 / pp.544-551 / 2023
  • Speech enhancement, which aims to improve the perceptual quality and intelligibility of noisy speech, has moved from methods that operate on the magnitude spectrum to methods that use the complex-valued spectrum, which can improve both magnitude and phase. This paper studies how to apply an attention mechanism to a complex-valued spectrum-based speech enhancement system to further improve the intelligibility and quality of noisy speech. The attention is based on additive attention, and the attention weights are computed with the complex-valued spectrum taken into account. In addition, global average pooling is used to account for the importance of each feature map. Complex-valued spectrum-based speech enhancement was performed with the Deep Complex U-Net (DCUNET) model, and the additive attention of the Attention U-Net model was applied according to the proposed method. Experiments on noisy speech in a living-room environment showed that the proposed method outperforms the baseline model on Source-to-Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI), and that the improvement is consistent across various background noise environments and low signal-to-noise ratio (SNR) conditions. These results demonstrate the effectiveness of the proposed system in improving the intelligibility and quality of noisy speech.
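
For readers unfamiliar with additive attention gates, the sketch below shows a plain real-valued gate in the Attention U-Net style: skip features x and gating features g are projected, summed, passed through a ReLU, reduced to one score per position, and squashed by a sigmoid into a coefficient that rescales x. It is a minimal illustration under stated assumptions, not the authors' complex-valued gate; the function names are hypothetical, the 1x1 convolutions are modeled as per-position matrix products, and the `global_average_pool` helper only computes the per-feature-map descriptor, since the abstract does not say exactly how that descriptor enters the gate.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def additive_attention_gate(x, g, w_x, w_g, psi, b=0.0):
    """Real-valued additive attention gate (hypothetical sketch).

    x, g     : lists of feature vectors, one per position (skip / gating features)
    w_x, w_g : weight matrices (rows = intermediate channels), i.e. 1x1 convolutions
    psi      : weights mapping the intermediate channels to a single score
    Returns the gated skip features and one attention coefficient per position.
    """
    gated, alphas = [], []
    for xv, gv in zip(x, g):
        # ReLU(W_x x + W_g g): additive combination of skip and gating signals
        hidden = [max(0.0, dot(rx, xv) + dot(rg, gv)) for rx, rg in zip(w_x, w_g)]
        # psi followed by a sigmoid yields a coefficient in (0, 1)
        alpha = sigmoid(dot(psi, hidden) + b)
        alphas.append(alpha)
        gated.append([alpha * v for v in xv])
    return gated, alphas


def global_average_pool(x):
    """Mean of each channel over all positions: one descriptor per feature map."""
    n = len(x)
    return [sum(col) / n for col in zip(*x)]
```

With all-zero weights every coefficient is sigmoid(0) = 0.5, so the gate simply halves the skip features; trained weights would instead emphasize positions where the gating signal indicates speech.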

Multi-channel input-based non-stationary noise canceller for mobile devices (이동형 단말기를 위한 다채널 입력 기반 비정상성 잡음 제거기)

  • Jeong, Sang-Bae;Lee, Sung-Doke
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.7 / pp.945-951 / 2007
  • Noise cancellation is essential for devices that use speech as an interface. In real environments, speech quality and recognition rates are degraded by ambient noise reaching the microphone. In this paper, we propose a noise cancellation algorithm based on a stereo microphone pair. The advantage of using multiple microphones is that direction information about the target source can be exploited. The proposed noise canceller is based on the Wiener filter. To estimate the filter, the frequency responses of the noise and the target speech must be known; they are estimated by spectral classification in the frequency domain. The performance of the proposed algorithm is compared with that of the well-known Frost algorithm and the generalized sidelobe canceller (GSC) with an adaptation mode controller (AMC). The perceptual evaluation of speech quality (PESQ), the most widely used objective speech quality measure, and speech recognition rates are adopted as performance measures.
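
Once speech and noise spectra have been estimated, a frequency-domain Wiener filter reduces to a per-bin gain G(k) = S(k) / (S(k) + N(k)) applied to the noisy spectrum. The sketch below assumes the power spectral densities are already available as lists; the direction-based spectral classification that produces them in the paper is not reproduced, and the function names are hypothetical.

```python
def wiener_gain(speech_psd, noise_psd, eps=1e-12):
    """Per-bin Wiener gain G(k) = S(k) / (S(k) + N(k)); eps guards empty bins."""
    return [s / max(s + n, eps) for s, n in zip(speech_psd, noise_psd)]


def apply_gain(noisy_spectrum, gain):
    """Scale each complex STFT bin of the noisy signal by its real-valued gain."""
    return [y * g for y, g in zip(noisy_spectrum, gain)]


# Usage: equal speech and noise power gives a gain of 0.5; noise-only bins go to 0.
gain = wiener_gain([1.0, 3.0, 0.0], [1.0, 1.0, 2.0])
enhanced = apply_gain([2 + 2j, 1 + 0j, 4 + 1j], gain)
```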

Classical Tamil Speech Enhancement with Modified Threshold Function using Wavelets

  • Indra, J.;Kasthuri, N.;Navaneetha Krishnan, S.
    • Journal of Electrical Engineering and Technology / v.11 no.6 / pp.1793-1801 / 2016
  • Speech enhancement is a challenging problem because of the diversity of noise sources and their effects in different applications. Its goal is to improve the quality and intelligibility of speech by reducing noise. Much speech enhancement research has addressed English and other European languages, while little or no work on Tamil speech enhancement has been reported in the literature. The proposed method aims to reduce the background noise in Tamil speech signals using wavelets, and introduces a new modified thresholding function. The method is evaluated on several speakers and under various noise conditions, including white Gaussian noise, babble noise, and car noise. Signal-to-noise ratio (SNR), mean square error (MSE), and mean opinion score (MOS) results show that the proposed thresholding function improves speech enhancement compared with the conventional hard and soft thresholding methods.
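
The two conventional baselines named above can be written in a few lines. The sketch below shows hard thresholding (keep or zero), soft thresholding (zero or shrink toward zero by t), and the Donoho-Johnstone universal threshold sigma * sqrt(2 ln N) with sigma estimated from the median absolute detail coefficient. The paper's modified thresholding function is not specified in the abstract and is not reproduced; the wavelet decomposition itself is assumed to be done elsewhere.

```python
import math
import statistics


def hard_threshold(coeffs, t):
    """Keep coefficients whose magnitude exceeds t; set the rest to zero."""
    return [c if abs(c) > t else 0.0 for c in coeffs]


def soft_threshold(coeffs, t):
    """Set small coefficients to zero and shrink the others toward zero by t."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def universal_threshold(detail_coeffs):
    """sigma * sqrt(2 ln N), with sigma estimated as median(|d|) / 0.6745."""
    sigma = statistics.median(abs(c) for c in detail_coeffs) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(len(detail_coeffs)))
```

Hard thresholding leaves surviving coefficients untouched but is discontinuous at t, while soft thresholding is continuous but biases large coefficients by t; modified threshold functions typically try to balance these two behaviors.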

Implementation and Evaluation of an HMM-Based Speech Synthesis System for the Tagalog Language

  • Mesa, Quennie Joy;Kim, Kyung-Tae;Kim, Jong-Jin
    • MALSORI / v.68 / pp.49-63 / 2008
  • This paper describes the development and assessment of a hidden Markov model (HMM)-based speech synthesis system for Tagalog, the most widely spoken indigenous language of the Philippines. Several aspects of the design process are discussed. To build the synthesizer, a speech database was recorded and phonetically segmented. The resulting corpus contains approximately 89 minutes of Tagalog speech in 596 spoken utterances. Contextual information is then determined. The quality of the synthesized speech is assessed by subjective tests with 25 native Tagalog speakers as respondents. Experimental results show that the new system obtains a MOS of 3.29, which indicates that it can produce highly intelligible, neutral Tagalog speech of stable quality even when only a small amount of speech data is used for HMM training.
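
A MOS such as the 3.29 reported above is the mean of listeners' ratings on a 1 to 5 scale. The sketch below computes it together with an approximate 95% confidence half-width; the normal approximation (z = 1.96) and the function name are assumptions for illustration, and no ratings from the paper are used.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Mean opinion score and an approximate 95% confidence half-width."""
    mean = statistics.mean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width
```

With 25 listeners a Student-t quantile (about 2.06 for 24 degrees of freedom) would give a slightly wider interval than the normal approximation used here.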

Speech Enhancement Based on Minima Controlled Recursive Averaging Technique Incorporating Conditional MAP (조건 사후 최대 확률 기반 최소값 제어 재귀평균기법을 이용한 음성향상)

  • Kum, Jong-Mo;Park, Yun-Sik;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.27 no.5 / pp.256-261 / 2008
  • In this paper, we propose a novel approach, based on the conditional maximum a posteriori criterion, to improve the performance of minima controlled recursive averaging (MCRA). A crucial component of a practical speech enhancement system is the estimation of the noise power spectrum, and MCRA is one state-of-the-art technique for this task. In MCRA, the noise estimate is obtained by averaging past spectral power values with a smoothing parameter that is adjusted by the signal presence probability in frequency subbands. We improve MCRA by using a speech presence probability that is the a posteriori probability conditioned on both the current observation and the speech presence or absence in the previous frame. Using the ITU-T P.862 perceptual evaluation of speech quality (PESQ) and subjective evaluation of speech quality as performance criteria, we show that the proposed algorithm yields better results than the conventional MCRA-based scheme.
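
The conventional MCRA recursion described above can be sketched per frequency bin: a binary presence decision (smoothed noisy power above delta times its tracked minimum) is smoothed into a probability p, and p sets the effective smoothing factor alpha_d + (1 - alpha_d) * p for the noise update. This is a minimal sketch of standard MCRA, not the proposed conditional-MAP variant; the default parameter values are illustrative assumptions, and minimum tracking is assumed to happen elsewhere.

```python
def update_speech_presence(prev_p, smoothed_power, min_power, delta=5.0, alpha_p=0.2):
    """Smooth a binary speech-presence decision into a probability (MCRA style).

    Speech is declared present when the smoothed noisy power exceeds
    delta times its tracked minimum.
    """
    present = 1.0 if smoothed_power > delta * min_power else 0.0
    return alpha_p * prev_p + (1.0 - alpha_p) * present


def update_noise_psd(noise_psd, noisy_power, speech_prob, alpha_d=0.95):
    """Recursive noise averaging with a presence-dependent smoothing factor.

    As speech_prob -> 1 the factor approaches 1 and the estimate is frozen;
    in noise-only bins (speech_prob -> 0) it tracks the noisy power |Y|^2.
    """
    alpha_tilde = alpha_d + (1.0 - alpha_d) * speech_prob
    return alpha_tilde * noise_psd + (1.0 - alpha_tilde) * noisy_power
```

The proposed method changes how the presence probability is obtained (conditioning also on the previous frame's presence or absence); the noise update itself keeps this recursive form.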

Real-time implementation of the 2.4 kbps EHSX Speech Coder Using a TMS320C6701™ DSP Core (TMS320C6701™을 이용한 2.4kbps EHSX 음성 부호화기의 실시간 구현)

  • 양용호;이인성;권오주
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.7C / pp.962-970 / 2004
  • This paper presents an efficient implementation of the 2.4 kbps EHSX (Enhanced Harmonic Stochastic Excitation) speech coder on a TMS320C6701™ floating-point digital signal processor. The EHSX codec models the excitation signal with harmonic modeling for voiced frames and CELP (Code Excited Linear Prediction) for unvoiced frames. We present optimization methods that reduce complexity for real-time implementation. The filtering in the CELP part, which accounts for most of the complexity of the EHSX algorithm, can be made cheaper by converting the program from floating-point to fixed-point variables. We also present efficient optimizations, including code allocation that takes the DSP architecture into account and a low-complexity harmonic/pitch search in the encoder. Finally, a speech quality test using PESQ (perceptual evaluation of speech quality, ITU-T Recommendation P.862) gave a MOS of 3.28, and real-time operation of the EHSX codec was achieved.
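
The floating-to-fixed conversion mentioned above usually means representing samples and filter coefficients as scaled integers. The sketch below uses the common Q15 format (16-bit values scaled by 2^15) for a direct-form FIR filter: products are Q30, the accumulator is rounded and shifted back to Q15, and results are saturated. The Q15 choice and all names are assumptions for illustration; this is not the authors' DSP code and it does not model the TMS320C6701 instruction set.

```python
Q15_ONE = 1 << 15
Q15_MIN, Q15_MAX = -(1 << 15), (1 << 15) - 1


def to_q15(x: float) -> int:
    """Quantize a float in [-1, 1) to a saturated 16-bit Q15 integer."""
    return max(Q15_MIN, min(Q15_MAX, round(x * Q15_ONE)))


def fir_q15(samples, coeffs):
    """Direct-form FIR filter using integer multiply-accumulate on Q15 data."""
    out = []
    for n in range(len(samples)):
        acc = 0  # Q30 accumulator (Python ints do not overflow)
        for k, c in enumerate(coeffs):
            if n - k < 0:
                break
            acc += c * samples[n - k]
        y = (acc + (1 << 14)) >> 15  # round to nearest and return to Q15
        out.append(max(Q15_MIN, min(Q15_MAX, y)))
    return out
```

On real hardware the accumulator width is finite, so guard bits or scaling of the coefficients would be needed to avoid overflow; Python's unbounded integers hide that issue here.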

A Selection Method of Reliable Codevectors using Noise Estimation Algorithm (잡음 추정 알고리즘을 이용한 신뢰성 있는 코드벡터 조합의 선정 방법)

  • Jung, Seungmo;Kim, Moo Young
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.7 / pp.119-124 / 2015
  • Speech enhancement is required as a preprocessor for noise-robust speech recognition systems. Codebook-based speech enhancement (CBSE) is highly robust in nonstationary noise environments compared with conventional noise estimation algorithms. However, because CBSE depends on trained codebook information, its performance degrades severely for codevector combinations that have low correlation with the input signal. To overcome this problem, only reliable codevector combinations are selected, and combinations with low correlation with the input signal are discarded. The proposed method improves performance over conventional CBSE in terms of log-spectral distortion (LSD) and perceptual evaluation of speech quality (PESQ).
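
One of the metrics above, log-spectral distortion, compares clean and enhanced power spectra in decibels. The sketch below takes, for each frame, the root-mean-square dB difference over frequency bins and then averages over frames. It assumes the power spectra are already computed as lists of per-frame bin values; the small eps and the function name are assumptions, and variants of LSD (for example, restricting the frequency range) exist.

```python
import math


def log_spectral_distortion(ref_frames, est_frames, eps=1e-12):
    """Mean over frames of the RMS difference (in dB) between two power spectra."""
    per_frame = []
    for ref, est in zip(ref_frames, est_frames):
        sq = [(10.0 * math.log10((r + eps) / (e + eps))) ** 2 for r, e in zip(ref, est)]
        per_frame.append(math.sqrt(sum(sq) / len(sq)))
    return sum(per_frame) / len(per_frame)
```

Identical spectra give an LSD of 0 dB, and a uniform factor-of-10 power error gives 10 dB.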

Minima Controlled Speech Presence Uncertainty Tracking Method for Speech Enhancement (음성 향상을 위한 최소값 제어 음성 존재 부정확성의 추적기법)

  • Lee, Woo-Jung;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.28 no.7 / pp.668-673 / 2009
  • In this paper, we propose a minima controlled speech presence uncertainty tracking method to improve speech enhancement. The conventional speech presence uncertainty tracking method estimates distinct values of the a priori speech absence probability for different frames and channels; this estimate is inherently based on the a posteriori SNR and is used to compute the speech absence probability (SAP). We propose a novel estimate of these frame- and channel-dependent a priori speech absence probabilities based on minima controlled tracking, and the estimate is then applied to the calculation of the speech absence probability for speech enhancement. The performance of the proposed algorithm is evaluated with the ITU-T P.862 perceptual evaluation of speech quality (PESQ) under various noise environments, and it yields better results than the conventional speech presence uncertainty tracking method.
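
The role of the a priori speech absence probability q can be seen in the standard expression for the posterior speech presence probability of a bin, p = 1 / (1 + q / (1 - q) * (1 + xi) * exp(-v)) with v = gamma * xi / (1 + xi), where xi is the a priori SNR and gamma the a posteriori SNR; the SAP is 1 - p. The sketch below implements only this expression and assumes q is supplied; the proposed minima-controlled estimation of q is not reproduced, and the function name is hypothetical.

```python
import math


def speech_presence_probability(q, xi, gamma):
    """Posterior probability of speech presence in one time-frequency bin.

    q     : a priori speech absence probability, 0 <= q < 1
    xi    : a priori SNR (linear)
    gamma : a posteriori SNR, |Y|^2 divided by the noise PSD
    The speech absence probability (SAP) is 1 minus the returned value.
    """
    v = gamma * xi / (1.0 + xi)
    return 1.0 / (1.0 + q / (1.0 - q) * (1.0 + xi) * math.exp(-v))
```

A larger q pushes the posterior toward speech absence for the same observation, which is why frame- and channel-dependent estimates of q matter.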

An Objective Speech Quality Measure using Masking Effect under Digital Mobile Telephone Network Environment (디지털 이동통신망 환경 하에서 마스킹 효과를 이용한 객관적 음질 평가 척도)

  • 김광수;김민정;석수영;정호열;정현일
    • Journal of Korea Multimedia Society / v.5 no.4 / pp.405-414 / 2002
  • In this paper, we propose a new objective speech quality measure that uses the noise masking threshold to assess speech quality in mobile telephone network environments, and we verify its effectiveness experimentally. To this end, well-known objective speech quality measures such as BSD and PSQM are first evaluated in digital mobile telephone network environments. However, these conventional methods do not perform as well in mobile network environments as the results reported in the literature. To obtain a more effective objective measure for mobile telephone environments, the proposed method employs the psychoacoustic masking effect of human hearing. The DMOS, rather than the MOS, is used as the subjective speech quality measure for performance evaluation. Performance comparisons are carried out on speech data collected from digital mobile telephone environments. The proposed measure achieves, on average, a 4% higher correlation than existing objective speech quality measures such as BSD and PSQM.
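
The performance figure above is a correlation between an objective measure and subjective DMOS ratings. The sketch below computes the Pearson correlation coefficient for paired lists of scores; the function name is hypothetical, and the abstract does not state which correlation statistic was used, so Pearson is an assumption.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between objective scores x and subjective ratings y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```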

A Scalable Audio Coder for High-quality Speech and Audio Services

  • Lee, Gil-Ho;Lee, Young-Han;Kim, Hong-Kook;Kim, Do-Young;Lee, Mi-Suk
    • MALSORI / no.61 / pp.75-86 / 2007
  • In this paper, we propose a scalable audio coder whose bandwidth varies from narrowband speech to full audio bandwidth and whose bit rate ranges from 8 to 320 kbit/s, so that the quality of service (QoS) can be adapted to the network load. The proposed coder first splits the input audio into a narrowband component up to about 4 kHz and a component above it. The narrowband signal is then compressed by a speech coding method compatible with an existing standard speech coder such as G.729, and the higher-band signal is compressed on the basis of a psychoacoustic model. Objective quality tests using the signal-to-noise ratio (SNR) and the perceptual evaluation of audio quality (PEAQ) show that the proposed scalable audio coder provides quality comparable to the MPEG-1 Layer III (MP3) audio coder.
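
The SNR test mentioned above compares the decoded waveform with the original sample by sample. The sketch below computes a whole-signal SNR in dB as 10 log10 of signal energy over error energy; it assumes time-aligned inputs of equal length, the function name is hypothetical, and the paper may have used a segmental variant instead.

```python
import math


def snr_db(reference, decoded):
    """Signal-to-noise ratio of a decoded signal against the reference, in dB."""
    signal = sum(s * s for s in reference)
    noise = sum((s - d) ** 2 for s, d in zip(reference, decoded))
    return math.inf if noise == 0 else 10.0 * math.log10(signal / noise)
```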