• Title/Abstract/Keywords: Speech quality

803 search results (processing time: 0.028 s)

Evaluation Performance of Speech Coder in Speech Signal Processing

  • Lee, Kwang-Seok
    • Journal of information and communication convergence engineering
    • /
    • Vol. 5, No. 2
    • /
    • pp.177-180
    • /
    • 2007
  • We compared the CS-ACELP and QCELP speech coders for CDMA cellular systems under channel-error conditions and measured their performance. We also identified an effective coding scheme for overcoming channel errors. The CS-ACELP coder, which uses an LSP vector quantizer, delivers transparent speech quality: at a BER of 0.10%, its spectral distortion (SD) is 0.92 dB and 2.9% of frames are outliers with SD above 2 dB. The CS-ACELP coder, which uses an MA predictor, also outperforms the QCELP coder (IS-96), which uses a DPCM-type predictor, in SVR and SEGSNR for bit error rates from 0.01% to 0.50%.
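The SD and outlier figures above are per-frame spectral distortion statistics. The following minimal Python sketch shows how such numbers are commonly computed from original and quantized LPC coefficients. The sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, the 256-point frequency grid, and the function names are assumptions for illustration, not the paper's implementation.

```python
import cmath
import math

def lpc_log_spectrum_db(a, n_points=256):
    """Log power spectrum (dB) of the all-pole model 1 / |A(e^jw)|^2.

    Assumes A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p, sampled on a uniform grid over [0, pi).
    """
    spectrum = []
    for k in range(n_points):
        w = math.pi * k / n_points
        a_w = 1 + sum(c * cmath.exp(-1j * w * (i + 1)) for i, c in enumerate(a))
        spectrum.append(-10.0 * math.log10(abs(a_w) ** 2))
    return spectrum

def spectral_distortion_db(a_ref, a_quant, n_points=256):
    """RMS difference (dB) between the original and quantized LPC log spectra of one frame."""
    ref = lpc_log_spectrum_db(a_ref, n_points)
    quant = lpc_log_spectrum_db(a_quant, n_points)
    return math.sqrt(sum((r - q) ** 2 for r, q in zip(ref, quant)) / n_points)

def sd_statistics(frames_ref, frames_quant, outlier_db=2.0):
    """Return (mean SD in dB, percentage of frames whose SD exceeds outlier_db)."""
    sds = [spectral_distortion_db(r, q) for r, q in zip(frames_ref, frames_quant)]
    mean_sd = sum(sds) / len(sds)
    outlier_pct = 100.0 * sum(1 for s in sds if s > outlier_db) / len(sds)
    return mean_sd, outlier_pct
```

A denser frequency grid gives a more accurate SD at higher cost; 256 points is an assumed compromise.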

SPEECH ENHANCEMENT BY FREQUENCY-WEIGHTED BLOCK LMS ALGORITHM

  • Cho, D.H.
    • The Acoustical Society of Korea: Conference Proceedings
    • /
    • Proceedings of the 1985 Annual Conference of the Acoustical Society of Korea
    • /
    • pp.87-94
    • /
    • 1985
  • In this paper, enhancement of speech corrupted by additive white or colored noise is studied. The unconstrained frequency-domain block least-mean-square (UFBLMS) adaptation algorithm and its frequency-weighted version are newly applied to speech enhancement. For speech degraded by white noise, the UFBLMS algorithm outperforms the spectral subtraction method and Wiener filtering by more than 3 dB in segmental frequency-weighted signal-to-noise ratio (FWSNRSEG) when the input SNR is between 0 and 10 dB. For speech corrupted by colored noise, the UFBLMS algorithm outperforms the spectral subtraction method by about 3 to 5 dB in FWSNRSEG. It also outperforms the time-domain least-mean-square (TLMS) adaptive prediction filter (APF) by about 2 dB in FWSNR and FWSNRSEG. Considering computational complexity as well as improvements in speech quality and intelligibility, the frequency-weighted UFBLMS algorithm appears to perform best among the algorithms compared for enhancing speech corrupted by white or colored noise.

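FWSNRSEG averages a frequency-weighted SNR over short frames. The weighting used in the paper is not given here, so the sketch below shows only the plain (unweighted) segmental SNR that such measures build on. The 256-sample frame length and the per-frame clamp to [-10, 35] dB are common conventions assumed for illustration.

```python
import math

def segmental_snr_db(clean, processed, frame_len=256, floor_db=-10.0, ceil_db=35.0):
    """Mean per-frame SNR (dB) of processed speech against the clean reference.

    Each frame's SNR is clamped to [floor_db, ceil_db]; all-zero (silent) frames are skipped.
    """
    if len(clean) != len(processed):
        raise ValueError("signals must have equal length")
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        err = [c - p for c, p in zip(ref, processed[start:start + frame_len])]
        signal_energy = sum(v * v for v in ref)
        error_energy = sum(v * v for v in err)
        if signal_energy == 0.0:
            continue
        snr = ceil_db if error_energy == 0.0 else 10.0 * math.log10(signal_energy / error_energy)
        snrs.append(min(max(snr, floor_db), ceil_db))
    return sum(snrs) / len(snrs) if snrs else float("nan")
```

Clamping keeps near-silent or perfectly reconstructed frames from dominating the average.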

Korean Broadcast News Transcription Using Morpheme-based Recognition Units

  • Kwon, Oh-Wook;Alex Waibel
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 21, No. 1E
    • /
    • pp.3-11
    • /
    • 2002
  • Broadcast news transcription is one of the hardest tasks in speech recognition because broadcast speech varies widely in speech quality, channel, and background conditions. We developed a Korean broadcast news speech recognizer. We used a morpheme-based dictionary and language model to reduce the out-of-vocabulary (OOV) rate. We concatenated pairs of original morphemes that are short or highly frequent in order to reduce insertion and deletion errors caused by short morphemes. We used a lexicon with multiple pronunciations to reflect inter-morpheme pronunciation variations without severely modifying the search tree. Using the merged morphemes as recognition units, we achieved an OOV rate of 1.7%, comparable to that of European languages with a 64k vocabulary. We implemented a hidden Markov model-based recognizer with vocal tract length normalization and online speaker adaptation by maximum likelihood linear regression. Experimental results showed that the recognizer yielded a 21.8% morpheme error rate for anchor speech and 31.6% for reporter speech, which was mostly noisy.
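The morpheme-merging step described above can be sketched as follows. The thresholds (min_count, max_chars), the "+" joiner, and the greedy left-to-right merge are hypothetical choices; the abstract does not specify the paper's exact merging criterion. The oov_rate helper shows how an OOV percentage such as the reported 1.7% is computed over test tokens.

```python
from collections import Counter

def merge_morpheme_pairs(sentences, min_count=2, max_chars=2, joiner="+"):
    """Merge adjacent morpheme pairs that are frequent or short (hypothetical criterion).

    sentences: list of sentences, each a list of morpheme strings.
    A pair is merged if it occurs at least min_count times or its combined
    length is at most max_chars characters.
    """
    counts = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))
    merge_set = {p for p, c in counts.items()
                 if c >= min_count or len(p[0]) + len(p[1]) <= max_chars}
    merged = []
    for sent in sentences:
        units, i = [], 0
        while i < len(sent):
            # Greedy left-to-right: join the current morpheme with the next if the pair qualifies.
            if i + 1 < len(sent) and (sent[i], sent[i + 1]) in merge_set:
                units.append(sent[i] + joiner + sent[i + 1])
                i += 2
            else:
                units.append(sent[i])
                i += 1
        merged.append(units)
    return merged

def oov_rate(vocabulary, test_sentences):
    """Percentage of test tokens not found in the recognition vocabulary."""
    tokens = [t for sent in test_sentences for t in sent]
    return 100.0 * sum(1 for t in tokens if t not in vocabulary) / len(tokens)
```

Merging trades a larger vocabulary for fewer very short units, which is the insertion/deletion trade-off the abstract describes.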

A Study on the Bandwidth Extension Adopted for 4800 bps CELP Speech Coder

  • 박진수;김형순
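    • The Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings
    • /
    • Proceedings of the November 2002 Conference of the Korean Society of Phonetic Sciences and Speech Technology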
    • /
    • pp.175-178
    • /
    • 2002
  • Most existing telephone networks transmit narrowband speech, which is band-limited below 4 kHz. Compared with wideband speech extending up to 8 kHz, narrowband speech has reduced intelligibility and a muffled quality. Bandwidth extension is a technique that generates wideband speech by reconstructing the 4-8 kHz highband without any additional transmitted information. This paper presents experimental results of bandwidth extension applied to a 4800 bps CELP speech coder. We examine various methods for reconstructing the wideband spectrum and excitation signal, and compare their performance using a subjective preference test and cepstral distortion measurements.

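Cepstral distortion compares the cepstra of the reference wideband speech and the reconstructed speech frame by frame. The sketch below uses one common definition, CD = (10 / ln 10) * sqrt(2 * sum over k >= 1 of (c_k - c'_k)^2), which excludes the energy term c_0; the abstract does not state the paper's exact definition, so this choice and the function names are assumptions.

```python
import math

def cepstral_distortion_db(c_ref, c_test, skip_c0=True):
    """Cepstral distortion (dB) between two cepstral vectors of one frame.

    By default the 0th (energy) coefficient is excluded.
    """
    start = 1 if skip_c0 else 0
    diff_sq = sum((a - b) ** 2 for a, b in zip(c_ref[start:], c_test[start:]))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * diff_sq)

def mean_cepstral_distortion_db(frames_ref, frames_test):
    """Average cepstral distortion over paired frames."""
    values = [cepstral_distortion_db(r, t) for r, t in zip(frames_ref, frames_test)]
    return sum(values) / len(values)
```

Excluding c_0 makes the measure insensitive to overall gain differences, so it reflects spectral shape only.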

Optimized Wiener Filter for Noise Reduction in VoIP Environments

  • 정상배;이성독;한민수
    • Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology)
    • /
    • No. 64
    • /
    • pp.105-119
    • /
    • 2007
  • Noise reduction technologies are indispensable for achieving acceptable speech quality in VoIP systems. This paper proposes a Wiener filter optimized to the estimated SNR of noisy speech for noise reduction in VoIP environments. The proposed noise canceller is applied as a pre-processor before speech encoding, and its performance is evaluated by PESQ in various noisy conditions. The proposed algorithm is applied to G.711, G.723.1, and G.729A, all of which are VoIP speech codecs. The PESQ results show that the proposed noise reduction scheme outperforms the noise suppression in IS-127 EVRC and in the ETSI standard advanced distributed speech recognition front-end.

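A per-bin Wiener gain driven by an SNR estimate is the core of such a noise reduction pre-processor. The sketch below uses the textbook gain G = xi / (1 + xi) with a maximum-likelihood a priori SNR estimate xi = max(P_y / P_n - 1, 0) and a gain floor. The floor value and function name are assumptions, and the paper's SNR-dependent optimization is not reproduced.

```python
def wiener_gains(noisy_power, noise_power, gain_floor=0.1):
    """Per-bin Wiener gains from noisy-speech and noise power spectra.

    Uses a maximum-likelihood a priori SNR estimate xi = max(P_y / P_n - 1, 0)
    and the Wiener gain xi / (1 + xi), limited below by gain_floor.
    """
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        if p_n <= 0.0:
            raise ValueError("noise power must be positive")
        xi = max(p_y / p_n - 1.0, 0.0)
        gains.append(max(xi / (1.0 + xi), gain_floor))
    return gains
```

In a pre-processor of this kind, the gains would multiply the noisy short-time spectrum before resynthesis and encoding; the floor limits musical noise at the cost of some residual noise.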

Two-Microphone Generalized Sidelobe Canceller with Post-Filter Based Speech Enhancement in Composite Noise

  • Park, Jinsoo;Kim, Wooil;Han, David K.;Ko, Hanseok
    • ETRI Journal
    • /
    • Vol. 38, No. 2
    • /
    • pp.366-375
    • /
    • 2016
  • This paper describes an algorithm to suppress composite noise in a two-microphone speech enhancement system for robust hands-free speech communication. The proposed algorithm has four stages. The first stage estimates the power spectral density of the residual stationary noise, which is based on the detection of nonstationary signal-dominant time-frequency bins (TFBs) at the generalized sidelobe canceller output. Second, speech-dominant TFBs are identified among the previously detected nonstationary signal-dominant TFBs, and power spectral densities of speech and residual nonstationary noise are estimated. In the final stage, the bin-wise output signal-to-noise ratio is obtained with these power estimates and a Wiener post-filter is constructed to attenuate the residual noise. Compared to the conventional beamforming and post-filter algorithms, the proposed speech enhancement algorithm shows significant performance improvement in terms of perceptual evaluation of speech quality.
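The generalized sidelobe canceller (GSC) at the front of this system can be sketched for two microphones as below. It assumes the target is already time-aligned across the microphones, so the fixed beamformer is the average and the blocking matrix is the difference. An NLMS filter (tap count and step size are assumptions) then removes noise that leaks through the fixed beamformer. The paper's PSD estimation and Wiener post-filter stages are not included.

```python
def two_mic_gsc(x1, x2, taps=8, mu=0.1, eps=1e-8):
    """Two-microphone GSC for a target already time-aligned across both microphones.

    Returns the GSC output: fixed-beamformer output minus the adaptive noise estimate.
    """
    w = [0.0] * taps
    history = [0.0] * taps  # blocking-matrix samples, newest first
    out = []
    for a, b in zip(x1, x2):
        d = 0.5 * (a + b)  # fixed beamformer: aligned target adds coherently
        u = a - b          # blocking matrix: aligned target cancels, noise remains
        history = [u] + history[:-1]
        y = sum(wi * hi for wi, hi in zip(w, history))  # adaptive noise estimate
        e = d - y
        norm = eps + sum(h * h for h in history)
        # NLMS update: steer y toward the noise component still present in d.
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        out.append(e)
    return out
```

Practical GSC implementations usually delay the fixed-beamformer path so that a non-causal noise relationship can be modeled; that delay is omitted here for brevity.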

A Study of Decision Tree Modeling for Predicting the Prosody of Corpus-based Korean Text-To-Speech Synthesis

  • 강선미;권오일
    • Speech Sciences
    • /
    • Vol. 14, No. 2
    • /
    • pp.91-103
    • /
    • 2007
  • The purpose of this paper is to develop a model that predicts the prosody of Korean text-to-speech synthesis using the CART and SKES algorithms. CART tends to favor prediction variables with many distinct values. Therefore, an F-test-based partitioning method was applied to CART, reducing the number of categories by grouping phonemes. In addition, the quality of the text-to-speech synthesis was evaluated after applying the SKES algorithm to data of the same size. For the evaluation, MOS tests were performed with 30 men and women in their twenties. The results showed that applying the SKES algorithm made the synthesized speech clearer and more natural.

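CART grows its tree by repeatedly choosing the split that most reduces the squared error of the target (for prosody, for example, a duration or pitch value). The sketch below finds the best threshold on a single numeric feature; it is an illustrative assumption, not the paper's model. Categorical features such as phoneme identity are instead split into subsets of values, which is why a many-valued variable offers many more candidate splits and can be unduly favored.

```python
def best_split(values, targets):
    """Return (threshold, total_sse) for the split x <= threshold that minimizes
    the summed squared error around the two child means (CART regression criterion).

    threshold is None if no split improves on the unsplit node.
    """
    def sse(ys):
        if not ys:
            return 0.0
        mean = sum(ys) / len(ys)
        return sum((y - mean) ** 2 for y in ys)

    best = (None, sse(targets))
    # Candidate thresholds: every distinct value except the largest (which leaves the right child empty).
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, targets) if x <= t]
        right = [y for x, y in zip(values, targets) if x > t]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (t, total)
    return best
```

A full tree applies this search to every feature at every node and recurses on the two children until a stopping rule is met.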

Knowledge-driven speech features for detection of Korean-speaking children with autism spectrum disorder

  • Seonwoo Lee;Eun Jung Yeo;Sunhee Kim;Minhwa Chung
    • Phonetics and Speech Sciences
    • /
    • Vol. 15, No. 2
    • /
    • pp.53-59
    • /
    • 2023
  • Detection of children with autism spectrum disorder (ASD) based on speech has relied on predefined feature sets owing to their ease of use and their capabilities for speech analysis. However, clinical impressions may not be adequately captured because of the broad range and large number of features included. This paper demonstrates that knowledge-driven speech features (KDSFs) tailored specifically to the speech traits of ASD are more effective and efficient for distinguishing the speech of ASD children from that of children with typical development (TD) than a predefined feature set, the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS). The KDSFs encompass various speech characteristics related to frequency, voice quality, speech rate, and spectral features, which have been identified as corresponding to distinctive attributes of ASD speech. The speech dataset used in the experiments consists of 63 ASD children and 9 TD children. To alleviate the imbalance in the number of training utterances, a data augmentation technique was applied to the TD children's utterances. The support vector machine (SVM) classifier trained with the KDSFs achieved an accuracy of 91.25%, surpassing the 88.08% obtained with the predefined set. This result underscores the importance of incorporating domain knowledge when developing speech technologies for individuals with disorders.
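As one example of a frequency-related feature, the sketch below estimates F0 from a voiced frame by picking the autocorrelation peak within a plausible pitch range. The search range (75-600 Hz, chosen to accommodate children's higher voices) and the function name are assumptions; the abstract does not specify how the KDSFs are computed.

```python
import math

def estimate_f0_autocorr(frame, sample_rate, f_min=75.0, f_max=600.0):
    """Estimate F0 (Hz) of a voiced frame from the peak of its autocorrelation.

    Only lags corresponding to f_min..f_max are searched.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_r = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag
```

Frame-level estimates like this would then be summarized per utterance (for example, mean and range) before being fed to a classifier such as the SVM mentioned above.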

Multi Mode Harmonic Transform Coding for Speech and Music

  • Kim, Jonghark;Shin, Jae-Hyun;Lee, Insung
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 22, No. 3E
    • /
    • pp.101-109
    • /
    • 2003
  • A multi-mode harmonic transform coding (MMHTC) scheme for speech and music signals is proposed. It is structured as a linear prediction model driven by harmonic and transform-based excitation. The coder also uses harmonic prediction and an improved quantizer for the excitation signal. To quantize the excitation of music signals efficiently, the modulated lapped transform (MLT) is introduced. In other words, the coder combines a time-domain technique (linear prediction) with a frequency-domain technique to achieve the best perceptual quality. At a bit rate of 4 kbps, the proposed coder showed better speech quality than the 8 kbps QCELP coder.
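The MLT named above is a lapped cosine transform with 50% overlap in which adjacent blocks cancel each other's time-domain aliasing. The following minimal O(M^2) Python sketch assumes the standard sine window and sqrt(2/M) normalization (Malvar's formulation) and shows analysis, synthesis, and perfect reconstruction; it does not model the coder's excitation quantization.

```python
import math

def mlt_basis(M):
    """MLT basis functions p_k(n), k = 0..M-1, n = 0..2M-1, with a sine window."""
    basis = []
    for k in range(M):
        row = []
        for n in range(2 * M):
            window = math.sin((n + 0.5) * math.pi / (2 * M))
            row.append(window * math.sqrt(2.0 / M)
                       * math.cos((n + (M + 1) / 2) * (k + 0.5) * math.pi / M))
        basis.append(row)
    return basis

def mlt_analysis(x, M):
    """Transform x (length a multiple of M) into blocks of M coefficients, hop size M."""
    if len(x) % M:
        raise ValueError("signal length must be a multiple of M")
    basis = mlt_basis(M)
    padded = [0.0] * M + list(x) + [0.0] * M  # zero-pad so every sample sees two blocks
    frames = []
    for start in range(0, len(padded) - 2 * M + 1, M):
        block = padded[start:start + 2 * M]
        frames.append([sum(b * s for b, s in zip(row, block)) for row in basis])
    return frames

def mlt_synthesis(frames, M):
    """Inverse MLT with overlap-add; aliasing from neighboring blocks cancels."""
    basis = mlt_basis(M)
    out = [0.0] * (M * (len(frames) + 1))
    for j, coeffs in enumerate(frames):
        for n in range(2 * M):
            out[j * M + n] += sum(c * basis[k][n] for k, c in enumerate(coeffs))
    return out[M:-M]  # drop the zero padding added in analysis
```

The sine window satisfies w(n)^2 + w(n+M)^2 = 1, which is what lets the overlapped blocks reconstruct the input exactly; a coder would quantize the coefficients between analysis and synthesis.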

Application of Seo Dongil's Voice Technique in Patient with Adductor Spasmodic Dysphonia: A Case Study

  • 서동일;유재연;정옥란;최홍식
    • Speech Sciences
    • /
    • Vol. 9, No. 4
    • /
    • pp.39-47
    • /
    • 2002
  • The purpose of this study was to investigate the effects of Seo Dongil's voice technique on voice quality in a patient with adductor spasmodic dysphonia. One patient participated in the study. The subject was assessed acoustically (Ave Fo, Ave Int, percent speech time, percent silence time, percent voice time, percent voiceless time) and perceptually (GRBAS scale) in the first and last sessions. Dr. Speech (version 4.0, Tiger-DRS) was used to compare acoustic parameters before and after treatment. Seo Dongil's voice technique consisted of relaxation, breathing exercises, and phonation exercises. The results were as follows. First, Seo Dongil's voice technique tended to be effective in decreasing voice breaks and voice stoppages in the patient. Second, the GRBAS ratings showed that the technique was effective in improving the patient's voice quality.
