• Title/Summary/Keyword: Speech coding

A Study on a Analysis and Comparison of Preprocessing Technique for the Speech Compression (음성압축을 위한 전처리기법의 비교 분석에 관한 연구)

  • Jang, Kyung-A;Min, So-Yeon;Bae, Myung-Jin
    • Speech Sciences / v.10 no.4 / pp.125-136 / 2003
  • Speech coding techniques have been studied to reduce complexity and bit rate while also improving sound quality. The CELP-type vocoder, which is used as one of the standards, provides good sound quality even at low bit rates. In this paper, the input speech is preprocessed to reduce the bit rate, which differs from the conventional vocoder. Different kinds of parameters can be used for the preprocessing, so this paper compares these parameters to find the most appropriate one for the vocoder. The parameters are used to synthesize the speech rather than to encode or decode it, so we propose a simple algorithm that does not affect the processing time or the computation time. The parameters used in the preprocessing step are speaking rate, duration, and the PSOLA technique.

On a Pitch Alteration Method using Scaling the Harmonics Compensated with the Phase for Speech Synthesis (위상 보상된 고조파 스케일링에 의한 음성합성용 피치변경법)

  • Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.13 no.6 / pp.91-97 / 1994
  • In speech processing, waveform coding is concerned simply with preserving the waveform of the signal through a redundancy reduction process. In speech synthesis, high-quality waveform coding is mainly used for synthesis by analysis. Because the parameters of this coding are not separated into excitation and vocal tract components, it is difficult to apply waveform coding to synthesis by rule. Thus, to apply waveform coding to synthesis by rule, it is necessary to alter the pitch. In this paper, we propose a new pitch alteration method that can change the pitch period in waveform coding by dividing the speech signal into vocal tract and excitation parameters. This is a time-frequency domain method that preserves the phase component of the waveform in the time domain and the magnitude component in the frequency domain. Thus, waveform coding can be used for synthesis by rule in speech processing. With this algorithm, we obtain a spectrum distortion of $2.94\%$. That is, the spectrum distortion is reduced by $5.06\%$ compared with the time-domain pitch alteration method.

A New MPEG Reference Model for Unified Speech and Audio Coding (통합 음성/오디오 부호화를 위한 새로운 MPEG 참조 모델)

  • Song, Jeong-Ook;Oh, Hyen-O;Kang, Hong-Goo
    • Journal of the Institute of Electronics Engineers of Korea SP / v.47 no.5 / pp.74-80 / 2010
  • Speech and audio codecs have been developed based on different types of coding technologies because they have different signal characteristics and applications. In line with the convergence of broadcasting and telecommunication systems, international standardization organizations such as 3GPP and ISO/IEC MPEG have tried to compress and transmit multimedia signals using unified codecs. MPEG recently initiated an activity to standardize USAC (Unified Speech and Audio Coding). However, the USAC RM (Reference Model) software has been problematic because it has a complex hierarchy, a large amount of unused source code, and a poor-quality encoder. To solve these problems, this paper introduces new RM software designed with an open-source paradigm. It was presented at the MPEG meeting in April 2010, and the source code was released in June.

A Variable Data Rate Speech Coding Technique Based on the Inflection Point Detection of Speech (음성의 변곡점 추출 및 전송에 기반한 가변 데이터율 음성 부호화 기법)

  • Iem, Byeong-Gwan
    • The Transactions of The Korean Institute of Electrical Engineers / v.62 no.4 / pp.562-565 / 2013
  • A new variable-rate speech coding technique is proposed. The method is based on the observation that a speech signal looks approximately linear over a very short period of time. The information transmitted is the location and data value of the inflection points. If the distance between two inflection points is large, the midpoint location and its data value are also delivered. Thus, the encoder transmits both the location and the data value for inflection samples, but only the location for non-inflection samples. The location information is expressed using one bit per sample: 0 for a non-inflection point and 1 for an inflection point. At the receiver, the decoder uses interpolation to estimate the untransmitted sample values at non-inflection locations from the received values of the inflection samples. At 50% of the computational cost of existing CVSD delta modulation, the proposed method is expected to achieve a data rate of 36 to 38 kbps and an SNR of 10 to 13 dB.
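As a rough illustration of the scheme in this abstract, the Python sketch below flags samples where the local slope changes, sends one flag bit per sample plus the values of the flagged samples, and reconstructs the remaining samples by linear interpolation. The second-difference test, the `tol` and `max_gap` parameters, and the function names are assumptions made for illustration; the abstract does not specify the paper's actual detector or bit allocation.

```python
def encode(samples, tol=0, max_gap=8):
    """Return (flags, values): one flag per sample, values only where flag == 1."""
    n = len(samples)
    flags = [0] * n
    if n == 0:
        return flags, []
    # Always send both endpoints so the decoder can interpolate everything between.
    flags[0] = flags[-1] = 1
    for i in range(1, n - 1):
        # Second difference: nonzero where the local slope changes (assumed detector).
        d2 = samples[i + 1] - 2 * samples[i] + samples[i - 1]
        if abs(d2) > tol:
            flags[i] = 1
    # If two flagged samples are far apart, also send the midpoint.
    idx = [i for i, f in enumerate(flags) if f]
    for a, b in zip(idx, idx[1:]):
        if b - a > max_gap:
            flags[(a + b) // 2] = 1
    values = [samples[i] for i, f in enumerate(flags) if f]
    return flags, values


def decode(flags, values):
    """Rebuild all samples by linear interpolation between transmitted samples."""
    idx = [i for i, f in enumerate(flags) if f]
    out = [0.0] * len(flags)
    for i, v in zip(idx, values):
        out[i] = float(v)
    for i0, v0, i1, v1 in zip(idx, values, idx[1:], values[1:]):
        for k in range(i0 + 1, i1):
            out[k] = v0 + (v1 - v0) * (k - i0) / (i1 - i0)
    return out
```

With `tol=0`, a piecewise-linear input is reconstructed exactly; raising `tol` sends fewer values at the cost of reconstruction error.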

On a Multiband Nonuniform Samping Technique with a Gaussian Noise Codebook for Speech Coding (가우시안 코드북을 갖는 다중대역 비균일 음성 표본화법)

  • Chung, Hyung-Goue;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.16 no.6 / pp.110-114 / 1997
  • When nonuniform sampling is applied to a noisy speech signal, the required data rate increases to a level comparable to or higher than that of uniform sampling such as PCM. To solve this problem, we previously proposed a waveform coding method, multiband nonuniform waveform coding (MNWC), which applies nonuniform sampling to a band-separated speech signal [7]. However, the speech quality deteriorates compared with the uniform sampling method because the high band is simply modeled as Gaussian noise with an average level. In this paper, to overcome this drawback, the high band is modeled as one of 16 codewords with different center frequencies. While maintaining high speech quality, with an average MOS of 3.16, the proposed method achieves a compression ratio 1.5 times higher than that of the conventional nonuniform sampling method (CNSM).

Design of Channel Coding Combined with 2.4kbps EHSX Coder (2.4kbps EHSX 음성부호화기와 결합된 채널코딩 방법)

  • Lee, Chang-Hwan;Kim, Young-Joon;Lee, In-Sung
    • The Journal of the Korea Contents Association / v.10 no.9 / pp.88-96 / 2010
  • We propose an efficient channel coding method combined with a 2.4 kbps speech coder. The code rate of the channel coder is 1/2, and the rate-1/2 convolutional coder is obtained by puncturing a rate-1/3 convolutional coder. The punctured convolutional coder is used for variable rate allocation. Puncturing is applied to the convolutional coder according to the importance of the output data of the source encoder. The importance of the output data is analyzed by evaluating the bit error sensitivity of the speech parameter bits. The performance of the proposed coder is analyzed and simulated in Rayleigh fading and AWGN channels. The experimental results with the 2.4 kbps EHSX coder show that the variable-rate channel coding method is superior to the non-variable channel coding method in terms of subjective speech quality.
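To make the rate arithmetic concrete, the Python sketch below encodes bits with a rate-1/3 convolutional code and then punctures the output to rate 1/2. The generator polynomials, the constraint length of 3, and the puncturing pattern are hypothetical choices made for illustration, since the abstract does not give the ones used in the paper.

```python
# Hypothetical generator polynomials for a constraint-length-3, rate-1/3 code.
GENERATORS = (0b111, 0b101, 0b011)

# Hypothetical period-2 pattern: keep 3 + 1 = 4 of every 6 coded bits,
# so 2 input bits produce 4 output bits (rate 1/2).
PATTERN = ((1, 1, 1), (1, 0, 0))


def conv_encode_r13(bits):
    """Rate-1/3 convolutional encoder: three coded bits per input bit."""
    state = 0
    out = []
    for b in bits:
        # Shift the new bit into a 3-bit register.
        state = ((state << 1) | b) & 0b111
        # Each output bit is the parity of the register masked by one generator.
        out.append(tuple(bin(state & g).count("1") % 2 for g in GENERATORS))
    return out


def puncture(triples, pattern=PATTERN):
    """Drop coded bits wherever the (cyclically repeated) pattern holds a 0."""
    kept = []
    for t, triple in enumerate(triples):
        mask = pattern[t % len(pattern)]
        kept.extend(c for c, m in zip(triple, mask) if m)
    return kept
```

Passing a different `pattern` per class of source bits is one way to give more sensitive bits more protection, in the spirit of the importance-based puncturing the abstract describes.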

Efficient TTS Database Compression Based on AMR-WB Speech Coder (AMR-WB 음성 부호화기를 이용한 TTS 데이터베이스의 효율적인 압축 기법)

  • Lim, jong-Wook;Kim, Ki-Chul;Kim, Kyeong-Sun;Lee, Hang-Seop;Park, Hae-Young;Kim, Moo-Young
    • The Journal of the Acoustical Society of Korea / v.28 no.3 / pp.290-297 / 2009
  • This paper presents an improved adaptive multi-rate wideband (AMR-WB) algorithm for efficient text-to-speech (TTS) database compression. The proposed algorithm includes removal of the unnecessary common bit-stream (CBS) and parameter delta coding combined with speaker-dependent Huffman coding to reduce the required bit rate without any quality degradation. We also propose lossy coding schemes that maximize the bit-rate reduction with negligible quality degradation. The proposed lossless algorithm, including CBS removal, reduces the bit rate by 12.40% without quality degradation compared with the 12.65 kbps AMR-WB mode. The proposed lossy algorithm reduces the bit rate by 20.00% with a PESQ degradation of 0.12.
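The combination of delta coding and Huffman coding mentioned in this abstract can be sketched in Python as follows. This is a generic illustration, not the paper's algorithm: the function names are hypothetical, the parameters are treated as plain integers, and "speaker-dependent" is assumed to mean only that the Huffman table is built from one speaker's data.

```python
import heapq
from collections import Counter
from itertools import count


def delta_encode(params):
    """Replace each parameter by its difference from the previous one."""
    prev, out = 0, []
    for p in params:
        out.append(p - prev)
        prev = p
    return out


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    total, out = 0, []
    for d in deltas:
        total += d
        out.append(total)
    return out


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built on `symbols`."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(f, next(tie), {s: 0}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: depth + 1 for s, depth in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]
```

Slowly varying parameters give deltas concentrated on a few values, which is what lets the Huffman stage assign short codes; both steps are lossless, so the original parameters are recovered exactly.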

A Study on a Searching, Extraction and Approximation-Synthesis of Transition Segment in Continuous Speech (연속음성에서 천이구간의 탐색, 추출, 근사합성에 관한 연구)

  • Lee, Si-U
    • The Transactions of the Korea Information Processing Society / v.7 no.4 / pp.1299-1304 / 2000
  • In a speech coding system that uses voiced and unvoiced excitation sources, speech quality is distorted when voiced and unvoiced consonants coexist in a frame. I therefore propose a method for searching, extracting, and approximately synthesizing the TSIUVC (Transition Segment Including UnVoiced Consonant) so that voiced and unvoiced consonants do not coexist in a frame. The method is based on the zero-crossing rate and a pitch detector that uses a FIR-STREAK digital filter. As a result, the extraction rates of the TSIUVC are 84.8% (plosive), 94.9% (fricative), and 92.3% (affricate) for female voices, and 88% (plosive), 94.9% (fricative), and 92.3% (affricate) for male voices. In addition, I obtain high-quality approximation-synthesis waveforms within the TSIUVC by using frequency information below 0.547 kHz and above 2.813 kHz. The method can be applied to low-bit-rate speech coding, speech analysis, and speech synthesis.
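The zero-crossing rate that this abstract relies on can be computed per frame as in the Python sketch below. The `looks_unvoiced` helper and its threshold of 0.3 are hypothetical and only show how such a measure might feed a voiced/unvoiced decision; the paper's detector also uses a pitch detector with a FIR-STREAK filter, which is not reproduced here.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def looks_unvoiced(frame, threshold=0.3):
    """Hypothetical rule: noise-like (unvoiced) frames cross zero often."""
    return zero_crossing_rate(frame) > threshold
```

Unvoiced consonants tend to be noise-like and cross zero far more often than voiced speech, which is why a simple rate threshold is a common first cue.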

Modified Generic Mode Coding Scheme for Enhanced Sound Quality of G.718 SWB (G.718 초광대역 코덱의 음질 향상을 위한 개선된 Generic Mode Coding 방법)

  • Cho, Keun-Seok;Jeong, Sang-Bae
    • Phonetics and Speech Sciences / v.4 no.3 / pp.119-125 / 2012
  • This paper describes a new algorithm for encoding the spectral shape and envelope in the generic mode of G.718 super-wideband (SWB). In the G.718 SWB coder, generic mode coding and sinusoidal enhancement are used to quantize modified discrete cosine transform (MDCT)-based parameters in the high-frequency band. In the generic mode, the high-frequency band is divided into sub-bands, and for each sub-band the closest match under the selected similarity criterion is searched for in the coded, envelope-normalized wideband content. To improve the quantization scheme in the high-frequency region of speech and audio signals, a modified generic mode that improves on the generic mode of G.718 SWB is proposed. The proposed generic mode uses perceptual vector quantization of spectral envelopes and an increased resolution for the spectral copy. The performance of the proposed algorithm is evaluated in terms of objective quality. Experimental results show that the proposed algorithm significantly improves sound quality.
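The sub-band matching step described in this abstract can be pictured with the Python sketch below, which slides a high-band sub-band over a lower-band spectrum and returns the lag with the highest normalized correlation. The similarity criterion, the function name, and the use of plain lists of spectral coefficients are assumptions made for illustration; they are not taken from the G.718 specification or from the paper.

```python
import math


def best_match_lag(target, source):
    """Return the lag in `source` whose window is most similar to `target`.

    Similarity here is the correlation divided by the window's norm
    (an assumed criterion), so a scaled copy of a window scores highest.
    """
    n = len(target)
    best_lag, best_score = 0, -math.inf
    for lag in range(len(source) - n + 1):
        window = source[lag:lag + n]
        energy = sum(w * w for w in window)
        if energy == 0:
            continue  # an all-zero window cannot be normalized
        corr = sum(t * w for t, w in zip(target, window))
        score = corr / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

A decoder that knows the chosen lag can copy that window into the high band and rescale it with the transmitted envelope, which is the general idea behind a spectral copy.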