• Title/Summary/Keyword: speech quality

Search Result 807, Processing Time 0.025 seconds

Real-time implementation of the 2.4kbps EHSX Speech Coder Using a $TMS320C6701^TM$ DSPCore ($TMS320C6701^TM$을 이용한 2.4kbps EHSX 음성 부호화기의 실시간 구현)

  • 양용호;이인성;권오주
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.7C
    • /
    • pp.962-970
    • /
    • 2004
  • This paper presents an efficient implementation of the 2.4 kbps EHSX(Enhanced Harmonic Stochastic Excitation) speech coder on a TMS320C6701$^{TM}$ floating-point digital signal processor. The EHSX speech codec is based on a harmonic and CELP(Code Excited Linear Prediction) modeling of the excitation signal respectively according to the frame characteristic such as a voiced speech and an unvoiced speech. In this paper, we represent the optimization methods to reduce the complexity for real-time implementation. The complexity in the filtering of a CELP algorithm that is the main part for the EHSX algorithm complexity can be reduced by converting program using floating-point variable to program using fixed-point variable. We also present the efficient optimization methods including the code allocation considering a DSP architecture and the low complexity algorithm of harmonic/pitch search in encoder part. Finally, we obtained the subjective quality of MOS 3.28 from speech quality test using the PESQ(perceptual evaluation of speech quality), ITU-T Recommendation P.862 and could get a goal of realtime operation of the EHSX codec.c.

A study on Speech Coding Method using V/S/TSIUVC Switching (V/S/TSIUVC 스위칭을 이용한 음성부호화 방식에 관한 연구)

  • Lee, See-Woo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.7 no.6
    • /
    • pp.1180-1184
    • /
    • 2006
  • In a speech coding system using excitation source of voiced and unvoiced, it would be a distortion of speech quality in a voiced and an unvoiced consonants in a frame. In this paper, I propose a new multi-pulse coding method make use of V/S/TSIUVC switching and TSIUVC approximation-synthesis method in order to restrict a distortion of speech quality. The TSIUVC is extracted by using the zero crossing rate and individual pitch pulse. And the TSIUVC extraction rate was 91% for female voice and 96.2% for male voice. The important thing is that the frequency information of 0.547kHz below and 2.813kHz above can be made with high quality synthesis waveform within TSIUVC. I evaluated the MPC of V/UV and FBD-MPC of V/S/TSIUVC. As a result, the synthesis speech of FBD-MPC was better in speech quality than the MPC.

  • PDF

A Study on Speech Synthesizer Using Distributed System (분산형 시스템을 적용한 음성합성에 관한 연구)

  • Kim, Jin-Woo;Min, So-Yeon;Na, Deok-Su;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.3
    • /
    • pp.209-215
    • /
    • 2010
  • Recently portable terminal is received attention by wireless networks and mass capacity ROM. In this result, TTS(Text to Speech) system is inserted to portable terminal. Nevertheless high quality synthesis is difficult in portable terminal, users need high quality synthesis. In this paper, we proposed Distributed TTS (DTTS) that was composed of server and terminal. The DTTS on corpus based speech synthesis can be high quality synthesis. Synthesis system in server that generate optimized speech concatenation information after database search and transmit terminal. Synthesis system in terminal make high quality speech synthesis as low computation using transmitted speech concatenation information from server. The proposed method that can be reducing complexity, smaller power consumption and efficient maintenance.

Performance Comparison of the Speech Enhancement Methods for Noisy Speech Recognition (잡음음성인식을 위한 음성개선 방식들의 성능 비교)

  • Chung, Yong-Joo
    • Phonetics and Speech Sciences
    • /
    • v.1 no.2
    • /
    • pp.9-14
    • /
    • 2009
  • Speech enhancement methods can be generally classified into a few categories and they have been usually compared with each other in terms of speech quality. For the successful use of speech enhancement methods in speech recognition systems, performance comparisons in terms of speech recognition accuracy are necessary. In this paper, we compared the speech recognition performance of some of the representative speech enhancement algorithms which are popularly cited in the literature and used widely. We also compared the performance of speech enhancement methods with other noise robust speech recognition methods like PMC to verify the usefulness of speech enhancement approaches in noise robust speech recognition systems.

  • PDF

Separation of Voiced Sounds and Unvoiced Sounds for Corpus-based Korean Text-To-Speech (한국어 음성합성기의 성능 향상을 위한 합성 단위의 유무성음 분리)

  • Hong, Mun-Ki;Shin, Ji-Young;Kang, Sun-Mee
    • Speech Sciences
    • /
    • v.10 no.2
    • /
    • pp.7-25
    • /
    • 2003
  • Predicting the right prosodic elements is a key factor in improving the quality of synthesized speech. Prosodic elements include break, pitch, duration and loudness. Pitch, which is realized by Fundamental Frequency (F0), is the most important element relating to the quality of the synthesized speech. However, the previous method for predicting the F0 appears to reveal some problems. If voiced and unvoiced sounds are not correctly classified, it results in wrong prediction of pitch, wrong unit of triphone in synthesizing the voiced and unvoiced sounds, and the sound of click or vibration. This kind of feature is usual in the case of the transformation from the voiced sound to the unvoiced sound or from the unvoiced sound to the voiced sound. Such problem is not resolved by the method of grammar, and it much influences the synthesized sound. Therefore, to steadily acquire the correct value of pitch, in this paper we propose a new model for predicting and classifying the voiced and unvoiced sounds using the CART tool.

  • PDF

Abrupt Noise Cancellation and Speech Restoration for Speech Enhancement (음질 개선을 위한 돌발잡음 제거와 음성복원)

  • Son BeakKwon;Hahn Minsoo
    • Proceedings of the KSPS conference
    • /
    • 2003.10a
    • /
    • pp.101-104
    • /
    • 2003
  • In this paper, speech quality is improved by removing abrupt noise intervals and then substituting the gaps with estimates of the previous speech waveform. An abrupt noise detection signal has been proposed as a prediction error signal by utilizing LP coefficients of the previous frame. Abrupt noise intervals are estimated by using spectral energy. After removing estimated noise intervals, we applied several waveform substitution techniques such as zero substitution, previous frame repetition, pattern matching, and pitch waveform replication. To prove the validity of our algorithm, the LPC spectral distortion test and the recognition test are executed and, the results show that the speech quality is fairly well improved.

  • PDF

Secret Data Communication Method using Quantization of Wavelet Coefficients during Speech Communication (음성통신 중 웨이브렛 계수 양자화를 이용한 비밀정보 통신 방법)

  • Lee, Jong-Kwan
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2006.10d
    • /
    • pp.302-305
    • /
    • 2006
  • In this paper, we have proposed a novel method using quantization of wavelet coefficients for secret data communication. First, speech signal is partitioned into small time frames and the frames are transformed into frequency domain using a WT(Wavelet Transform). We quantize the wavelet coefficients and embedded secret data into the quantized wavelet coefficients. The destination regard quantization errors of received speech as seceret dat. As most speech watermark techniques have a trade off between noise robustness and speech quality, our method also have. However we solve the problem with a partial quantization and a noise level dependent threshold. In additional, we improve the speech quality with de-noising method using wavelet transform. Since the signal is processed in the wavelet domain, we can easily adapt the de-noising method based on wavelet transform. Simulation results in the various noisy environments show that the proposed method is reliable for secret communication.

  • PDF

A Study on a Analysis and Comparison of Preprocessing Technique for the Speech Compression (음성압축을 위한 전처리기법의 비교 분석에 관한 연구)

  • Jang, Kyung-A;Min, So-Yeon;Bae, Myung-Jin
    • Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.125-136
    • /
    • 2003
  • Speech coding techniques have been studied to reduce the complexity and bit rate but also to improve the sound quality. CELP type vocoder, has used as a one of standard, supports the great sound quality even low bit rate. In this paper, the preprocessing of input speech to reduce the bit rate is the different with the conventional vocoder. The different kinds of parameter are used for the preprocessing so this paper is compared with theses parameters for finding the more appropriate parameter for the vocoder. The parameters are used to synthesize the speech not to encode or decode for coding technique so we proposed the simple algorithm not to have the influence on the processing time or the computation time. The parameters in used the preprocessing step are speaking rate, duration and PSOLA technique.

  • PDF

Adaptive Playout Buffer Control Method for Improvement of VoIP Speech Quality (VoIP 통화품질 개선을 위한 적응 재생 버퍼 제어 기법)

  • Kang, Jin-Ah;Ko, Sung-Taek;Lim, Jea-Yun
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2006.11a
    • /
    • pp.75-79
    • /
    • 2006
  • In a VoIP(Voice over IP) system which support the realtime speech service, speech quality is deteriorated by the delay, the jitter, the loss, and the reversed packet order. In this thesis, APBC for receiver site is proposed, which compensate the jitter by the adaptive playout algorithm and conceal the packet loss, and align the packet order. Also, a VoIP application system is implemented, and the performance of APBC is verified on the implemented system by measuring the processing speed and the speech quality. From the result, processing speed is 257$\mu$sec, which is fast enough to deal with packet being received in realtime. Also, the speech quality by MOS(Mean Opinion Score) is improved as 18 percent compared with algorithm of fixed playout delay.

  • PDF

Adaptive Speech Streaming Based on Packet Loss Prediction Using Support Vector Machine for Software-Based Multipoint Control Unit over IP Networks

  • Kang, Jin Ah;Han, Mikyong;Jang, Jong-Hyun;Kim, Hong Kook
    • ETRI Journal
    • /
    • v.38 no.6
    • /
    • pp.1064-1073
    • /
    • 2016
  • An adaptive speech streaming method to improve the perceived speech quality of a software-based multipoint control unit (SW-based MCU) over IP networks is proposed. First, the proposed method predicts whether the speech packet to be transmitted is lost. To this end, the proposed method learns the pattern of packet losses in the IP network, and then predicts the loss of the packet to be transmitted over that IP network. The proposed method classifies the speech signal into different classes of silence, unvoiced, speech onset, or voiced frame. Based on the results of packet loss prediction and speech classification, the proposed method determines the proper amount and bitrate of redundant speech data (RSD) that are sent with primary speech data (PSD) in order to assist the speech decoder to restore the speech signals of lost packets. Specifically, when a packet is predicted to be lost, the amount and bitrate of the RSD must be increased through a reduction in the bitrate of the PSD. The effectiveness of the proposed method for learning the packet loss pattern and assigning a different speech coding rate is then demonstrated using a support vector machine and adaptive multirate-narrowband, respectively. The results show that as compared with conventional methods that restore lost speech signals, the proposed method remarkably improves the perceived speech quality of an SW-based MCU under various packet loss conditions in an IP network.