• Title/Summary/Keyword: speech distortion

Search Result 227, Processing Time 0.027 seconds

A study on deep neural speech enhancement in drone noise environment (드론 소음 환경에서 심층 신경망 기반 음성 향상 기법 적용에 관한 연구)

  • Kim, Jimin;Jung, Jaehee;Yeo, Chaneun;Kim, Wooil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.3
    • /
    • pp.342-350
    • /
    • 2022
  • In this paper, actual drone noise samples are collected for speech processing in disaster environments to build noise-corrupted speech database, and speech enhancement performance is evaluated by applying spectrum subtraction and mask-based speech enhancement techniques. To improve the performance of VoiceFilter (VF), an existing deep neural network-based speech enhancement model, we apply the Self-Attention operation and use the estimated noise information as input to the Attention model. Compared to existing VF model techniques, the experimental results show 3.77%, 1.66% and 0.32% improvements for Source to Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligence (STOI), respectively. When trained with a 75% mix of speech data with drone sounds collected from the Internet, the relative performance drop rates for SDR, PESQ, and STOI are 3.18%, 2.79% and 0.96%, respectively, compared to using only actual drone noise. This confirms that data similar to real data can be collected and effectively used for model training for speech enhancement in environments where real data is difficult to obtain.

PESQ-Based Selection of Efficient Partial Encryption Set for Compressed Speech

  • Yang, Hae-Yong;Lee, Kyung-Hoon;Lee, Sang-Han;Ko, Sung-Jea
    • ETRI Journal
    • /
    • v.31 no.4
    • /
    • pp.408-418
    • /
    • 2009
  • Adopting an encryption function in voice over Wi-Fi service incurs problems such as additional power consumption and degradation of communication quality. To overcome these problems, a partial encryption (PE) algorithm for compressed speech was recently introduced. However, from the security point of view, the partial encryption sets (PESs) of the conventional PE algorithm still have much room for improvement. This paper proposes a new selection method for finding a smaller PES while maintaining the security level of encrypted speech. The proposed PES selection method employs the perceptual evaluation of the speech quality (PESQ) algorithm to objectively measure the distortion of speech. The proposed method is applied to the ITU-T G.729 speech codec, and content protection capability is verified by a range of tests and a reconstruction attack. The experimental results show that encrypting only 20% of the compressed bitstream is sufficient to effectively hide the entire content of speech.

Speech pathologic evaluation of children with ankyloglossia (설유착증 환자의 언어병리학적 평가)

  • Lee, Ju-Kyung
    • Proceedings of the KSPS conference
    • /
    • 2007.05a
    • /
    • pp.155-157
    • /
    • 2007
  • Objective : There are close relationship between intraoral abnormal structure and speech-functional problem. Patients with cleft palate & ankyloglossia are typical examples. Patients with abnormal structure can be repaired toward normal structure by operation. Ankyloglossia may cause functional limitation - for example, speech disorder - even if adequate surgical treatment were done. And, each individuals have each speech disorders. The objective of this study is to evaluate the speechs of childrens with ankyloglossia, and to determine whether ankyloglossia is associated with articulation problem. We wanted to present criteria for indication of frenectomy. Study design The experimental group is composed of 10 childrens who visited our department of oral and maxillofacial surgery, dental hospital, Chonbuk university, due to ankyloglossia and articulation problem,. The average age is 5 Y 7M, M : F ratio is 4 : 1 at the time of speech test. The VPI consonant discrimination degree, PPVT, PCAT, Nasometer II, Visi-Pitch test result were obtained from each group. Result : There was significant difference for 'language development' through PPVT. Except 3 members of experimental group, all remainder showed retardation for 'language development'. For 'errored consonant rate', data showed more higher scores in alveolar consonant. There 'consonant error' in experimental group, mostly showed 'alveolar consonant', also a major modality of 'consonant error' was mostly distortion. Conclusion : We can judge the severity of ankyloglossia patient by examinig language development degree & speech test of 'alveolar consonant' . And we can make a decision for frenulotomy using these results.

  • PDF

Implementation of a Robust Speech Recognizer in Noisy Car Environment Using a DSP (DSP를 이용한 자동차 소음에 강인한 음성인식기 구현)

  • Chung, Ik-Joo
    • Speech Sciences
    • /
    • v.15 no.2
    • /
    • pp.67-77
    • /
    • 2008
  • In this paper, we implemented a robust speech recognizer using the TMS320VC33 DSP. For this implementation, we had built speech and noise database suitable for the recognizer using spectral subtraction method for noise removal. The recognizer has an explicit structure in aspect that a speech signal is enhanced through spectral subtraction before endpoints detection and feature extraction. This helps make the operation of the recognizer clear and build HMM models which give minimum model-mismatch. Since the recognizer was developed for the purpose of controlling car facilities and voice dialing, it has two recognition engines, speaker independent one for controlling car facilities and speaker dependent one for voice dialing. We adopted a conventional DTW algorithm for the latter and a continuous HMM for the former. Though various off-line recognition test, we made a selection of optimal conditions of several recognition parameters for a resource-limited embedded recognizer, which led to HMM models of the three mixtures per state. The car noise added speech database is enhanced using spectral subtraction before HMM parameter estimation for reducing model-mismatch caused by nonlinear distortion from spectral subtraction. The hardware module developed includes a microcontroller for host interface which processes the protocol between the DSP and a host.

  • PDF

A study on combination of loss functions for effective mask-based speech enhancement in noisy environments (잡음 환경에 효과적인 마스크 기반 음성 향상을 위한 손실함수 조합에 관한 연구)

  • Jung, Jaehee;Kim, Wooil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.40 no.3
    • /
    • pp.234-240
    • /
    • 2021
  • In this paper, the mask-based speech enhancement is improved for effective speech recognition in noise environments. In the mask-based speech enhancement, enhanced spectrum is obtained by multiplying the noisy speech spectrum by the mask. The VoiceFilter (VF) model is used as the mask estimation, and the Spectrogram Inpainting (SI) technique is used to remove residual noise of enhanced spectrum. In this paper, we propose a combined loss to further improve speech enhancement. In order to effectively remove the residual noise in the speech, the positive part of the Triplet loss is used with the component loss. For the experiment TIMIT database is re-constructed using NOISEX92 noise and background music samples with various Signal to Noise Ratio (SNR) conditions. Source to Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI) are used as the metrics of performance evaluation. When the VF was trained with the mean squared error and the SI model was trained with the combined loss, SDR, PESQ, and STOI were improved by 0.5, 0.06, and 0.002 respectively compared to the system trained only with the mean squared error.

A study on loss combination in time and frequency for effective speech enhancement based on complex-valued spectrum (효과적인 복소 스펙트럼 기반 음성 향상을 위한 시간과 주파수 영역 손실함수 조합에 관한 연구)

  • Jung, Jaehee;Kim, Wooil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.1
    • /
    • pp.38-44
    • /
    • 2022
  • Speech enhancement is performed to improve intelligibility and quality of the noise-corrupted speech. In this paper, speech enhancement performance was compared using different loss functions in time and frequency domains. This study proposes a combination of loss functions to utilize advantage of each domain by considering both the details of spectrum and the speech waveform. In our study, Scale Invariant-Source to Noise Ratio (SI-SNR) is used for the time domain loss function, and Mean Squared Error (MSE) is used for the frequency domain, which is calculated over the complex-valued spectrum and magnitude spectrum. The phase loss is obtained using the sin function. Speech enhancement result is evaluated using Source-to-Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI). In order to confirm the result of speech enhancement, resulting spectrograms are also compared. The experimental results over the TIMIT database show the highest performance when using combination of SI-SNR and magnitude loss functions.

The Wavelet Transform Based Subband Adaptive Acoustic Echo Canceller with Noise Cancellation Property (잡음제거 특성을 갖는 웨이브릿변환 기반 서브밴드 적응 음향반향제거기)

  • 박재우;안주원;권기룡;문광석;김강언
    • Proceedings of the IEEK Conference
    • /
    • 2000.09a
    • /
    • pp.7-10
    • /
    • 2000
  • This paper focuses on the development of speech enhancement techniques for hands-free audio terminals, including two major problems : noise cancellation and acoustic echo cancellation. The objective is to find a joint structure to get a near-end speech signal with minimum distortion and low levels of echo and noise. To solve the two problems, a new promising technique is studied and tested in computer simulation conditions.

  • PDF

A Study on the Objective Evaluation Model of Telephone Transmission Quality (통화품질 객관평가 모델링에 관한 연구)

  • 조재철;박순영;방만원
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.16 no.6
    • /
    • pp.509-516
    • /
    • 1991
  • In this paper, we propose on objective evaluation model of telephone transmission qulity in order to estimate a satisfaction score regarding speech quality in a relephone network. As the degradantion factors of telephone transmission quality, this model takes into account transmission loss, noise, distortion, talker echo and sidetone. A performance index[PI] is introduced for five psychological factors affecting telephone speech qualty, and a Mean Opinion Score(MOS) is estimated from the sum of all Pis. The simulation results indicate theat the MOS obtained from the objective evaluation model is in good agreement with that of subjective evaluation.

  • PDF

Reduction Algorithm of Environmental Noise by Multi-band Filter (멀티밴드필터에 의한 환경잡음억압 알고리즘)

  • Choi, Jae-Seung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.8
    • /
    • pp.91-97
    • /
    • 2012
  • This paper first proposes the speech recognition algorithm by detection of the speech and noise sections at each frame, then proposes the reduction algorithm of environmental noise by multi-band filter which removes the background noises at each frame according to detection of the speech and noise sections. The proposed algorithm reduces the background noises using filter bank sub-band domain after extracting the features from the speech data. In this experiment, experimental results of the proposed noise reduction algorithm by the multi-band filter demonstrate using the speech and noise data, at each frame. Based on measuring the spectral distortion, experiments confirm that the proposed algorithm is effective for the speech by corrupted the noise.

A Study on Approximation-Synthesis of Transition Segment in Speech Signal (음성신호에서 천이구간의 근사합성에 관한 연구)

  • Lee See-Woo
    • The Journal of the Korea Contents Association
    • /
    • v.5 no.3
    • /
    • pp.167-173
    • /
    • 2005
  • In a speech coding system using excitation source of voiced and unvoiced, it would be involved a distortion of speech quality in case coexist with a voiced and unvoiced consonants in a frame. So, I propose TSIUVC(Transition Segment Including Unvoiced Consonant) extraction method by using pitch pulses and Zero Crossing Rate in order to unexistent with a voiced and unvoiced consonants in a frame. And this paper present a TSIUVC approximate-synthesis method by using frequency band division. As a result, this method obtains a high quality approximation-synthesis waveform within TSIUVC by using frequency information of 0.547kHz below and 2.813kHz above. And the TSIUVC extraction rate was $91\%$ for female voice and $96.2\%$ for male voice respectively This method has the capability of being applied to a new speech coding of Voiced/Silence/TSIUVC, speech analysis, and speech synthesis.

  • PDF