• Title/Summary/Keyword: Vector compression

H.264/AVC to MPEG-2 Video Transcoding by using Motion Vector Clustering (움직임벡터 군집화를 이용한 H.264/AVC에서 MPEG-2로의 비디오 트랜스코딩)

  • Shin, Yoon-Jeong;Son, Nam-Rye;Nguyen, Dinh Toan;Lee, Guee-Sang
    • The Journal of the Korea institute of electronic communication sciences / v.5 no.1 / pp.23-30 / 2010
  • H.264/AVC is increasingly used in broadcast video applications such as Internet Protocol television (IPTV) and digital multimedia broadcasting (DMB) because of its high compression performance. However, H.264/AVC-coded video can be delivered to the widespread MPEG-2 end-user equipment after transcoding between the two video standards. This paper proposes a new algorithm for an H.264/AVC-to-MPEG-2 transcoder that uses motion vector clustering to reduce complexity without loss of video quality. The proposed method exploits the motion information gathered during the H.264 decoding stage. To reduce the search space for MPEG-2 motion estimation, the predictive motion vector is selected as the candidate motion vector with the least distortion. The candidate motion vectors are chosen by considering the correlation of direction and distance among the motion vectors of the variable-size blocks in H.264/AVC. The best predictive motion vector is then refined with a full search in a ${\pm}2$ pixel search area. Compared with a cascaded decoder-encoder, the proposed transcoder achieves computational complexity savings of up to 64% with a similar PSNR at constant bitrate (CBR).
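
The two-step search described in this abstract (pick the least-distortion candidate, then refine it in a ±2 pixel window) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the sum-of-absolute-differences (SAD) distortion measure, the 4x4 block size and the frames-as-lists representation are all assumptions, and the clustering that produces the candidate list is not shown.

```python
def sad(cur, ref, bx, by, mv, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by mv = (dx, dy); None if out of bounds."""
    x0, y0 = bx + mv[0], by + mv[1]
    if x0 < 0 or y0 < 0 or x0 + n > len(ref[0]) or y0 + n > len(ref):
        return None
    return sum(abs(cur[by + j][bx + i] - ref[y0 + j][x0 + i])
               for j in range(n) for i in range(n))


def predict_and_refine(cur, ref, bx, by, candidates, n=4, radius=2):
    """Return (motion_vector, sad) for one block (hypothetical helper)."""
    # Step 1: keep the candidate vector with the least distortion.
    scored = [(sad(cur, ref, bx, by, mv, n), mv) for mv in candidates]
    best_cost, best_mv = min((c, mv) for c, mv in scored if c is not None)
    # Step 2: full search in a +/-radius window centred on that predictor.
    px, py = best_mv
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            mv = (px + dx, py + dy)
            cost = sad(cur, ref, bx, by, mv, n)
            if cost is not None and cost < best_cost:
                best_cost, best_mv = cost, mv
    return best_mv, best_cost
```

At least one candidate must point inside the reference frame; including the zero vector `(0, 0)` guarantees this for any block that lies inside the frame.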

Motion Vector Predictor selection method for multi-view video coding (다시점 비디오 부호화를 위한 움직임벡터 예측값 선택 방법)

  • Choi, Won-Jun;Suh, Doug-Young;Kim, Kyu-Heon;Park, Gwang-Hoon
    • Journal of Broadcast Engineering / v.12 no.6 / pp.565-573 / 2007
  • In this paper, we propose a method that selects the motion vector predictor by considering the prediction structure of multi-view content, in order to improve the coding efficiency of multi-view video coding, which is being standardized in JVT. Motion vectors with different tendencies arise when temporal and view reference prediction are carried out in multi-view video coding. Moreover, because motion vectors are searched in both temporal and view order, they do not agree with each other, which lowers coding efficiency. This paper describes how the motion vector predictor is selected using information about the prediction structure. The proposed method increases the compression ratio in multi-view video coding, and a PSNR (Peak Signal-to-Noise Ratio) improvement of $0.03{\sim}0.1$ dB was obtained compared with the JMVM 3.6 method.

A study on the application of residual vector quantization for vector quantized-variational autoencoder-based foley sound generation model (벡터 양자화 변분 오토인코더 기반의 폴리 음향 생성 모델을 위한 잔여 벡터 양자화 적용 연구)

  • Seokjin Lee
    • The Journal of the Acoustical Society of Korea / v.43 no.2 / pp.243-252 / 2024
  • Among the Foley sound generation models that have recently begun to be studied, techniques that combine the Vector Quantized-Variational AutoEncoder (VQ-VAE) structure with a generation model such as Pixelsnail are an important research subject. Meanwhile, in the field of deep learning-based acoustic signal compression, residual vector quantization is reported to be more suitable than the conventional VQ-VAE structure. This paper therefore studies whether residual vector quantization can be applied effectively to Foley sound generation. To this end, the paper applies residual vector quantization to the conventional VQ-VAE-based Foley sound generation model and, in particular, derives a model that is compatible with existing models such as Pixelsnail and does not increase computational resource consumption. The model was evaluated in an experiment using DCASE2023 Task7 data. The results show that the proposed model improves the Fréchet audio distance by about 0.3. The performance gain was limited, however, which is believed to be due to the reduced time-frequency resolution adopted to avoid increasing computational resource consumption.
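
For readers unfamiliar with residual vector quantization, the general idea can be sketched as below: each stage quantizes whatever the previous stages left over, and the decoder sums the selected codewords. This is a generic illustration with fixed, hand-written codebooks; the function names are hypothetical and nothing here reflects the paper's network, codebook training or integration with Pixelsnail.

```python
def nearest(codebook, v):
    """Index of the codeword closest to v in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], v)))


def rvq_encode(v, codebooks):
    """Quantize v stage by stage; each stage codes the remaining residual."""
    residual = list(v)
    indices = []
    for cb in codebooks:
        k = nearest(cb, residual)
        indices.append(k)
        residual = [r - c for r, c in zip(residual, cb[k])]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct the vector as the sum of the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for k, cb in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, cb[k])]
    return out
```

With a coarse first codebook and a finer second one, the vector `[4.9, 4.2]` is coded as one index per stage and reconstructed as the sum of a coarse and a fine codeword.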

Wavelet-Based Image Compression Using the Properties of Subbands (대역의 특성을 이용한 웨이블렛 기반 영상 압축 부호화)

  • 박성완;강의성;문동영;고성제
    • Journal of Broadcast Engineering / v.1 no.2 / pp.118-132 / 1996
  • This paper proposes a wavelet transform-based image compression method that uses the energy distribution of the subbands. The proposed method involves two steps. First, we use a wavelet transform for the subband decomposition. The original image is decomposed into one low-resolution subimage and three high-frequency subimages. The high-frequency subimages contain horizontal, vertical, and diagonal edges, respectively. The wavelet transform is further applied to these high-frequency subimages. The resulting transformed subimages have different energy distributions corresponding to the orientation of the high-pass filter. Second, for a higher compression ratio and computational efficiency, we discard some subimages with small energy. The remaining subimages are encoded using either DPCM or quantization followed by entropy coding. Experimental results show that the proposed coding scheme achieves a better peak signal-to-noise ratio (PSNR) and a higher compression ratio than the conventional image coding method that uses the wavelet transform followed by straightforward vector quantization.
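
The decompose-then-discard idea in this abstract can be illustrated with a one-level 2-D Haar split. This is only a sketch under stated assumptions: the abstract does not name the wavelet filter, so Haar is a stand-in, the band names and the energy threshold are hypothetical, and the second decomposition of the high-frequency subimages and the DPCM/entropy coding stages are omitted.

```python
def haar2d(img):
    """One-level 2-D Haar split of an image with even width and height."""
    bands = {"LL": [], "horizontal": [], "vertical": [], "diagonal": []}
    for y in range(0, len(img), 2):
        rows = {name: [] for name in bands}
        for x in range(0, len(img[0]), 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            rows["LL"].append((a + b + c + d) / 4)          # local average
            rows["horizontal"].append((a + b - c - d) / 4)  # row difference
            rows["vertical"].append((a - b + c - d) / 4)    # column difference
            rows["diagonal"].append((a - b - c + d) / 4)
        for name in bands:
            bands[name].append(rows[name])
    return bands


def energy(band):
    """Sum of squared coefficients in one subband."""
    return sum(v * v for row in band for v in row)


def keep_high_energy(bands, threshold):
    """Always keep LL; drop detail subbands whose energy is below threshold."""
    return {name: band for name, band in bands.items()
            if name == "LL" or energy(band) >= threshold}
```

For an image made of vertical stripes, only the column-difference band carries energy, so the other two detail bands are dropped.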

Lossless Compression for Hyperspectral Images based on Adaptive Band Selection and Adaptive Predictor Selection

  • Zhu, Fuquan;Wang, Huajun;Yang, Liping;Li, Changguo;Wang, Sen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.8 / pp.3295-3311 / 2020
  • With the wide application of hyperspectral images, compressing them has become increasingly important. The conventional recursive least squares (CRLS) algorithm has great potential for lossless compression of hyperspectral images. The prediction accuracy of CRLS is closely related to the correlations between the reference bands and the current band, and to the similarity between pixels in the prediction context. Based on this characteristic, we present an improved CRLS with adaptive band selection and adaptive predictor selection (CRLS-ABS-APS). First, a k-means clustering algorithm based on the spectral vector correlation coefficient is employed to generate a clustering map. Next, an adaptive band selection strategy based on the inter-spectral correlation coefficient selects the reference bands for each band. Then, an adaptive predictor selection strategy based on the clustering map selects the optimal CRLS predictor for each pixel. In addition, a double snake scan mode is used to further improve the similarity of the prediction context, and a recursive average estimation method is used to accelerate the local average calculation. Finally, the prediction residuals are entropy-encoded by an arithmetic encoder. Experiments on the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) 2006 data set show that CRLS-ABS-APS achieves average bit rates of 3.28 bpp, 5.55 bpp and 2.39 bpp on the three subsets, respectively. The results indicate that CRLS-ABS-APS effectively improves compression performance with lower computational complexity and outperforms current state-of-the-art methods.
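
The band selection step can be pictured as ranking candidate bands by their correlation coefficient with the current band. The sketch below is an assumption-laden illustration rather than the paper's procedure: it uses the Pearson correlation over flattened bands, restricts candidates to earlier bands (assumed here so that a decoder could repeat the choice), and takes a fixed number `k` of references. All names are hypothetical.

```python
from math import sqrt


def corr(u, v):
    """Pearson correlation coefficient of two equal-length pixel lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


def select_reference_bands(bands, current, k):
    """Indices of the k earlier bands most correlated with bands[current]."""
    ranked = sorted(range(current),
                    key=lambda b: abs(corr(bands[b], bands[current])),
                    reverse=True)
    return ranked[:k]
```

Each band is passed as a flat list of pixel values; the returned indices are ordered from most to least correlated.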

ECG signal compression based on B-spline approximation (B-spline 근사화 기반의 심전도 신호 압축)

  • Ryu, Chun-Ha;Kim, Tae-Hun;Lee, Byung-Gook;Choi, Byung-Jae;Park, Kil-Houm
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.5 / pp.653-659 / 2011
  • In general, electrocardiogram (ECG) signals are sampled at a frequency above 200 Hz and stored for a long time, so the data must be compressed efficiently for storage and transmission. In this paper, a method for compressing ECG data is proposed that uses non-uniform B-spline approximation, which has been widely used in the approximation theory of applied mathematics and in geometric modeling. ECG signals are compressed and reconstructed using B-spline basis functions, whose curves offer local controllability, so that the shape of the curve can be controlled locally. The proposed method selects an additional knot at each step to minimize the reconstruction error, and it reduces time complexity. Experimental results on the MIT-BIH Arrhythmia database show that the proposed method achieves a good compression ratio and good reconstruction while preserving all feature points of the ECG signals.

Low Complexity Motion Estimation Search Method for Multi-view Video Coding (다시점 비디오 부호화를 위한 저 복잡도 움직임 추정 탐색 기법)

  • Yoon, Hyo-Sun;Kim, Mi-Young
    • Journal of Korea Multimedia Society / v.16 no.5 / pp.539-548 / 2013
  • Although motion estimation (ME) plays an important role in digital video compression, it requires a complicated search procedure to find an optimal motion vector. Multi-view video is obtained by capturing one three-dimensional scene with many cameras at different positions. The computational complexity of motion estimation for multi-view video coding increases in proportion to the number of cameras. To reduce computational complexity while maintaining image quality, a low-complexity motion estimation search method is proposed in this paper. The proposed search method consists of a four-grid diamond search pattern, a two-grid diamond search pattern and a TZ 2 Point search pattern. These search patterns exploit the characteristics of the distribution of motion vectors to place the search points. Experimental results show that the proposed method is 1.8~4.5 times faster than the TZ search method (JMVC) owing to the reduced computational complexity, while the image quality degradation is about 0.01~0.24 dB.

A Robust Vector Quantization Method against Distortion Outlier and Source Mismatch (이상 신호왜곡과 소스 불일치에 강인한 벡터 양자화 방법)

  • Noh, Myung-Hoon;Kim, Moo-Young
    • Journal of the Institute of Electronics Engineers of Korea SP / v.49 no.3 / pp.74-80 / 2012
  • In resolution-constrained quantization, the size of a Voronoi cell varies with the probability density function of the input data, which causes a large number of distortion outliers. We propose a vector quantization method that reduces distortion outliers by combining the generalized Lloyd algorithm (GLA) and the cell-size constrained vector quantization (CCVQ) scheme. The training data are divided into inside and outside regions according to the size of the Voronoi cell, and CCVQ and GLA are then applied to the inside and outside regions, respectively. Because CCVQ is applied to the densely populated region of the source instead of GLA, the number of centroids for the outside region can be increased so that distortion outliers are reduced. In real-world environments, source mismatch between training and test data is inevitable. In the source mismatch case, the proposed algorithm improves performance in terms of both average distortion and distortion outliers.
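
As background for the GLA half of this scheme, the generalized Lloyd algorithm alternates between assigning training vectors to their nearest codeword and moving each codeword to the centroid of its cell. The sketch below shows plain GLA only, with a fixed iteration count and a caller-supplied initial codebook (both assumptions); the CCVQ stage and the inside/outside region split described in the abstract are not modelled.

```python
def gla(data, codebook, iterations=10):
    """Plain generalized Lloyd algorithm on lists of equal-length vectors."""
    codebook = [list(c) for c in codebook]
    for _ in range(iterations):
        # Assignment step: put each vector in the cell of its nearest codeword.
        cells = [[] for _ in codebook]
        for v in data:
            k = min(range(len(codebook)),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(v, codebook[i])))
            cells[k].append(v)
        # Update step: move each codeword to the centroid of its cell.
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook
```

On two well-separated clusters the codewords settle on the cluster means after the first iteration and do not move afterwards.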

Forensic Classification of Median Filtering by Hough Transform of Digital Image (디지털 영상의 허프 변환에 의한 미디언 필터링 포렌식 분류)

  • RHEE, Kang Hyeon
    • Journal of the Institute of Electronics and Information Engineers / v.54 no.5 / pp.42-47 / 2017
  • In the distribution of digital images, median filtering is used for forgery. This paper proposes an image forensic detection algorithm for the classification of median filtering (MF). To address this serious problem, a 42-dimensional feature vector is constructed. Edges of the forged image are detected in quantities of 32, 64 and 128, respectively, and processed by the Hough transform; features are then extracted from the start and end point coordinates of the Hough lines. The Hough peaks of the angle-distance plane are also extracted. Both sets of features together form the feature vector of the proposed scheme. The resulting 42-dimensional feature vector is used to train an SVM (Support Vector Machine) classifier for MF classification of forged images. The experimental results of the proposed MF detection algorithm are compared with those of the 10-dimensional MFR and the 686-dimensional SPAM. The results confirm that the MF forensic classification ratio is above 99% for all test image types: unaltered, average-filtered ($3{\times}3$), JPEG-compressed (QF=90 and 70), and Gaussian-filtered ($3{\times}3$ and $5{\times}5$) images.

Syllable Recognition of HMM using Segment Dimension Compression (세그먼트 차원압축을 이용한 HMM의 음절인식)

  • Kim, Joo-Sung;Lee, Yang-Woo;Hur, Kang-In;Ahn, Jum-Young
    • The Journal of the Acoustical Society of Korea / v.15 no.2 / pp.40-48 / 1996
  • In this paper, a 40-dimensional segment vector with a width of 4 frames and 7 frames in each monosyllable interval was compressed into 10-, 14- and 20-dimensional vectors using K-L expansion and neural networks, and these were used as speech recognition feature parameters for CHMM. We also compared them with CHMMs to which the discrete duration time, the regression coefficients and the mixture distribution were added as feature parameters. In a recognition test on 100 monosyllables, the recognition rates of CHMM + ${\bigtriangleup}$MCEP, CHMM + MIX and CHMM + DD improve by 1.4%, 2.36% and 2.78%, respectively, over the 85.19% of CHMM. Recognition rates using vectors compressed by K-L expansion are lower than that of MCEP + ${\bigtriangleup}$MCEP, but those using K-L + MCEP and K-L + ${\bigtriangleup}$MCEP are almost the same. Neural networks reflect the dynamic variation of speech better than K-L expansion because they use the sigmoid function for a non-linear transform. Recognition rates using vectors compressed by neural networks are higher than those using K-L expansion and other methods.
