• Title/Summary/Keyword: Predictive VQ

Search Result 10, Processing Time 0.024 seconds

A LSF Quantizer for the Wideband Speech Using the Predictive VQ-Pyramid VQ (예측 VQ-Pyramid VQ를 이용한 광대역 음성용 LSF 양자학기 설계)

  • 이강은;이인성;강상원
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.4
    • /
    • pp.333-339
    • /
    • 2004
  • This Paper proposes the vector quantizer-pyramid vector quantizer(VQ-PVQ) structure. Also both predictive structure and safety-net concept are combined into the VQ-PVQ to quantize the IPC parameter of wideband speech codec. The Performance is compared to the LPC vector quantizer used in the AMR-WB(ITU-T G.722.2). demonstrating reduction in both spectral distortion and encoding memory.

Quantization of LPC Coefficients Using a Multi-frame AR-model (Multi-frame AR model을 이용한 LPC 계수 양자화)

  • Jung, Won-Jin;Kim, Moo-Young
    • The Journal of the Acoustical Society of Korea
    • /
    • v.31 no.2
    • /
    • pp.93-99
    • /
    • 2012
  • For speech coding, a vocal tract is modeled using Linear Predictive Coding (LPC) coefficients. The LPC coefficients are typically transformed to Line Spectral Frequency (LSF) parameters which are advantageous for linear interpolation and quantization. If multidimensional LSF data are quantized directly using Vector-Quantization (VQ), high rate-distortion performance can be obtained by fully utilizing intra-frame correlation. In practice, since this direct VQ system cannot be used due to high computational complexity and memory requirement, Split VQ (SVQ) is used where a multidimensional vector is split into multilple sub-vectors for quantization. The LSF parameters also have high inter-frame correlation, and thus Predictive SVQ (PSVQ) is utilized. PSVQ provides better rate-distortion performance than SVQ. In this paper, to implement the optimal predictors in PSVQ for voice storage devices, we propose Multi-Frame AR-model based SVQ (MF-AR-SVQ) that considers the inter-frame correlations with multiple previous frames. Compared with conventional PSVQ, the proposed MF-AR-SVQ provides 1 bit gain in terms of spectral distortion without significant increase in complexity and memory requirement.

Designing a Quantizer of LPC Parameters for the Narrowband Speech Coder using Block-Constrained Trellis Coded Quantization (블록 제한 트렐리스 부호화 양자화 기법을 이용한 협대역 음성 부호화기용 LPC 계수 양자화기 설계)

  • Jun, Ja-Kyoung;Park, Sang-Kuk;Kang, Sang-Won
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.3C
    • /
    • pp.234-240
    • /
    • 2007
  • In this paper, low complexity block constrained trellis coded quantization (BC-TCQ) structures are introduced, and a predictive BC TCQ encoding method is developed for quantization of line spectrum frequencies (LSF) parameters for narrowband speech coding applications. Trellis-coded quantization(TCQ) is a form of VQ that builds the VQ codebook from interleaved constituent scalar quantization codebooks. The performance is compared to the other VQ, demonstrating reduction in spectral distortion and significant reduction in encoding complexity. The predictive BC-TCQ is about 0.47107 dB superior to the IS-641 split-VQ, 26bits/frame, in spectral distortion sense. The BC-TCQ is 64.54%, 76.93%, 2.35% of the IS-641 split-VQ, respectively, in the complexity of the additions, multiplies, comparisons.

Adaptive Predictive Image Coding of Variable Block Shapes Based on Edge Contents of Blocks (경계의 방향성에 근거를 둔 가변블록형상 적응 예측영상부호화)

  • Do, Jae-Su;Kim, Ju-Yeong;Jang, Ik-Hyeon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.7
    • /
    • pp.2254-2263
    • /
    • 2000
  • This paper proposes an efficient predictive image-compression technique based on vector quantization of blocks of pels. In the proposed method edge contents of blocks control the selection of predictors and block shapes as well. The maximum number of bits assigned to quantizers has been in creased to 3bits/pel from 1/5bits/pel, the setting employed by forerunners in predictive vector quantization of images. This increase prevents the saturation in SNR observed in their results in high bit rates. The variable block shape is instrumental in eh reconstruction of edges. The adaptive procedure is controlled by means of he standard deviation ofp rediction errors generated by a default predictor; the standard deviation address a decision table which can be set up beforehand. eh proposed method is characterized by overall improvements in image quality over A-VQ-PE and A-DCT VQ, both of which are known for their efficient use of vector quantizers.

  • PDF

A MFCC-based CELP Speech Coder for Server-based Speech Recognition in Network Environments (네트워크 환경에서 서버용 음성 인식을 위한 MFCC 기반 음성 부호화기 설계)

  • Lee, Gil-Ho;Yoon, Jae-Sam;Oh, Yoo-Rhee;Kim, Hong-Kook
    • MALSORI
    • /
    • no.54
    • /
    • pp.27-43
    • /
    • 2005
  • Existing standard speech coders can provide speech communication of high quality while they degrade the performance of speech recognition systems that use the reconstructed speech by the coders. The main cause of the degradation is that the spectral envelope parameters in speech coding are optimized to speech quality rather than to the performance of speech recognition. For example, mel-frequency cepstral coefficient (MFCC) is generally known to provide better speech recognition performance than linear prediction coefficient (LPC) that is a typical parameter set in speech coding. In this paper, we propose a speech coder using MFCC instead of LPC to improve the performance of a server-based speech recognition system in network environments. However, the main drawback of using MFCC is to develop the efficient MFCC quantization with a low-bit rate. First, we explore the interframe correlation of MFCCs, which results in the predictive quantization of MFCC. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel error. As a result, we propose a 8.7 kbps MFCC-based CELP coder. It is shown from a PESQ test that the proposed speech coder has a comparable speech quality to 8 kbps G.729 while it is shown that the performance of speech recognition using the proposed speech coder is better than that using G.729.

  • PDF

Design of a 4kb/s ACELP Codec Using the Generalized AbS Principle (Generalized AbS 구조를 이용한 4kb/s ACELP 음성 부호화기의 설계)

  • 성호상;강상원
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.7
    • /
    • pp.33-38
    • /
    • 1999
  • In this paper, we combine a generalized analysis-by-synthesis (AbS) structure and an algebraic excitation scheme to propose a new 4kb/s speech codec. This codec partly uses the structure of G.729. We design a line spectrum pair (LSP) quantizer, an adaptive codebook, and an excitation codebook to fit the 4 kb/s bit rate. The codec has a 25㎳ algorithmic delay, which corresponds to a 20㎳ frame size and a 5㎳ lookahead. At the bit rates below 4kb/s, most CELP speech codecs using the AbS principle have a drawback that results a rapid degradation of speech quality. To overcome this drawback we use the generalized AbS structure which is efficient for the low bit rate speech codec. LP coefficients are converted to LSP and quantized using a predictive 2-stage VQ. A low complexity algebraic codebook which uses shifting method is used for the fixed codebook excitation, and gains of the adaptive codebook and the fixed codebook are quantized using the VQ. To evaluate the performance of the proposed codec A-B preference tests are done with the fixed rate 8kb/s QCELP. As the result of the test, the performance of the codec is similar to that of the fixed rate 8kb/s QCELP.

  • PDF

Design of a Variable half rate speech codec (가변율 half rate 음성 부호화기의 설계)

  • 성호상
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1998.06e
    • /
    • pp.293-296
    • /
    • 1998
  • 본 논문에서는 다양한 멀티미디어 서비스를 위해 가변율 half rate 음성 부호화기를 설계하였다. 유, 무성음과 묵음의 구분을 위해 본 논문에서는 프레임 에너지와 음성 파라메터들을 이용한 효과적인 voicing 결정 알고리즘을 사용하였다. 유성음을 위한 half rate 음성 부호화기는 저속에서 좋은 특성을 보이는 generalized AbS구조를 이용하였다. LPC 계수는 LSP 계수로 변환한 후 predictive 2-stage VQ를 통해서 양자화하며, 여기 신호는 음질저하를 최소화하며 복잡도를 감소시킨 shift 방식의 대수적 고정 코드북 구조를 사용하고, 적응코드북과 여기코드북의 이득은 VQ로 양자화 하였다. 무성음을 위한 부호화기는 대부분이 유성음을 위한 부호화기와 동일하지만, 무성음에서는 피치간 상관도가 매우 낮으므로 피치 보간 방법을 사용하지 않고 개루프로 피치 lag를 찾은 후 전체 프레임에 사용한다. 1 kb/s 부호화기는 묵음 구간과 주변소음 구간에 사용되며 이 구간의 신호를 피치 성분이 미약한 주변소음들로 제한하고 이에 최적인 부음성 부호화기를 설계하였다. 최종적으로 완성된 가변율 half rate 부호화기는 voice activity factor(VAF)가 0.47인 시험음성에서 약 2.6 kb/s의 평균 전송률을 보였다. 주관적 음질 평가의 일환으로 IS-96 표준 코덱인 가변율 8 kb/s QCELP와 A-B preference 시험을 실시하였다. 시험 결과 평균전송률이 약 2배인 가변율 8 kb/s QCELP 보다 우수한 음질 성능을 보였다.

  • PDF

A study on a fast algorithm for the LSP coefficient quantization of G. 723.1 speech codec (G.723.1 음성 부호화기의 LSE 계수 양자화를 위한 고속화 알고리즘 연구)

  • Son Chang-yong;Sung Ho-sang;Kang Sang-won;Sung Yu-na
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.153-156
    • /
    • 2000
  • 본 논문에서는 멀티미디어 서비스들 중에서 음성 또는 오디오 신호를 저속으로 압축할 때 사용되는 G.723.1 부호화기의 line spectral frequency(LSF) 계수 양자화 방식을 고속으로 처리하는 알고리즘을 제안하였다. 제안된 고속탐색 방법은 LSF 계수의 순서성질을 이용하여 코드북의 탐색 범위를 줄임으로써 계산량을 크게 감소시킨다. 제안된 고속탐색 방법을 predictive split VQ(PSVQ) 구조를 갖는 G.723.1 에 적용한 결과 spectral distortion(SD) 성능 감쇄 및 추가적인 메모리 증가 없이 최적 코드벡터를 찾기 위한 코드북 탐색 과정에서 코드북의 평균 탐색 범위가 $20.1\%$ 감소했으며, 이는 additions, subtractions, multiplies 및 comparisons 수가 각각 $19.1\%$, $20.1\%$, $19.4\%$$12.2\% 감소하는 결과를 얻었다.

  • PDF

Hybrid Coding for Multi-spectral Satellite Image Compression (다중스펙트럼 위성영상 압축을 위한 복합부호화 기법)

  • Jung, Kyeong-Hoon
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.3 no.1
    • /
    • pp.1-11
    • /
    • 2000
  • The hybrid coding algorithm for multi-spectral image obtained from satellite is discussed. As the spatial and spectral resolution of satellite image are rapidly increasing, there are enormous amounts of data to be processed for computer processing and data transmission. Therefore an efficient coding algorithm is essential for multi-spectral image processing. In this paper, VQ(vector quantization), quadtree decomposition, and DCT(discrete cosine transform) are combined to compress the multi-spectral image. VQ is employed for predictive coding by using the fact that each band of multi-spectral image has the same spatial feature, and DCT is for the compression of residual image. Moreover, the image is decomposed into quadtree structure in order to allocate the data bit according to the information content within the image block to improve the coding efficiency. Computer simulation on Landsat TM image shows the validity of the proposed coding algorithm.

  • PDF

Design of a Quantization Algorithm of the Speech Feature Parameters for the Distributed Speech Recognition (분산 음성 인식 시스템을 위한 특징 계수 양자화 방식 설계)

  • Lee Joonseok;Yoon Byungsik;Kang Sangwon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.24 no.4
    • /
    • pp.217-223
    • /
    • 2005
  • In this paper, we propose a predictive block constrained trellis coded quantization (BC-TCQ) to quantize cepstral coefficients for the distributed speech recognition. For Prediction of the cepstral coefficients. the 1st order auto-regressive (AR) predictor is used. To quantize the prediction error signal effectively. we use a BC-TCQ. The performance is compared to the split vector quantizers used in the ETSI standard, demonstrating reduction in the cepstral distance and computational complexity.