• Title/Summary/Keyword: feature coding

Coding History Detection of Speech Signal using Deep Neural Network (심층 신경망을 이용한 음성 신호의 부호화 이력 검출)

  • Cho, Hyo-Jin;Jang, Won;Shin, Seong-Hyeon;Park, Hochong
    • Journal of Broadcast Engineering
    • /
    • v.23 no.1
    • /
    • pp.86-92
    • /
    • 2018
  • In this paper, we propose a method for detecting the coding history of a digital speech signal. In digital speech communication and storage, the signal is encoded to reduce the number of bits. Therefore, given a speech waveform, we need to detect its coding history to determine whether the signal is an original or a coded one and, if coded, how many times it has been coded. We propose a coding history detection method for the 12.2 kbps AMR codec that distinguishes original, single-coded, and double-coded signals. The proposed method extracts a speech-specific feature vector from the given speech and models the feature vector using a deep neural network. We confirm that the proposed feature vector provides better coding history detection performance than a feature vector computed from the general spectrogram.

Low Bit Rate Image Coding using Neural Network (신경망을 이용한 저비트율 영상코딩)

  • 정연길;최승규;배철수
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2001.10a
    • /
    • pp.579-582
    • /
    • 2001
  • Vector transformation is a new method that unifies vector quantization and coding. Until now, codebook generation for coding has relied on the LBG algorithm. However, exploiting the advantages of the neural-network-based SOFM (Self-Organizing Feature Map) can improve system performance. In this paper, we generate a VTC (Vector Transformation Coding) codebook using the SOFM algorithm and compare the results at several coding rates with those of the LBG algorithm. The main difficulties of vector quantization are its computational complexity and codebook generation, so we adopt a neural network approach to address them.
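
For reference, the LBG baseline named in this abstract can be sketched as a Lloyd-style iteration with codebook splitting. This is only a minimal illustration in plain Python, not the authors' VTC or SOFM implementation; the function names, the additive split offset `eps`, and the fixed iteration count `iters` are assumptions made for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword closest to v (ties go to the lowest index)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def lbg_codebook(vectors, size, eps=0.01, iters=20):
    """Grow a codebook by repeated splitting until it has at least `size`
    codewords (the count doubles each round, so it ends at a power of two)."""
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        # Split every codeword into a pair offset by +eps and -eps (assumed rule).
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            # Assign each training vector to its nearest codeword.
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[nearest(codebook, v)].append(v)
            # Move each codeword to the centroid of its cell; keep empty cells as-is.
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook
```

Encoding a vector then amounts to transmitting `nearest(codebook, v)` instead of `v` itself, which is where the computational cost mentioned in the abstract arises.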

Motion Compensated Subband Video Coding with Arbitrarily Shaped Region Adaptivity

  • Kwon, Oh-Jin;Choi, Seok-Rim
    • ETRI Journal
    • /
    • v.23 no.4
    • /
    • pp.190-198
    • /
    • 2001
  • The performance of Motion Compensated Discrete Cosine Transform (MC-DCT) video coding is improved by using region-adaptive subband image coding [18]. Assuming that the video is acquired from a camera on a moving platform and that the distance between the camera and the scene is large enough, both the camera motion and the motion of moving objects in a frame are compensated. A feature matching algorithm is employed to compensate for camera motion. Several feature points extracted using a Sobel operator are used to compensate for camera translation, rotation, and zoom. The illumination change between frames is also compensated. Motion-compensated frame differences are divided into three arbitrarily shaped regions: stationary background, moving objects, and newly emerging areas. Different quantizers are used for different regions. Compared with conventional MC-DCT video coding using a block matching algorithm, our video coding scheme shows an average improvement of about 1.0 dB on the experimental video samples.
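
The Sobel-based feature point extraction mentioned in this abstract can be illustrated roughly as follows. The abstract does not say how points are selected from the Sobel response, so picking the `k` strongest responses is an assumption, as are the function names; this is a plain-Python sketch rather than the authors' procedure.

```python
def sobel_magnitude(img):
    """Squared Sobel gradient magnitude of a 2-D list; border pixels stay 0."""
    h, w = len(img), len(img[0])
    mag = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column, weights 1-2-1.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row, weights 1-2-1.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            mag[y][x] = gx * gx + gy * gy
    return mag


def top_feature_points(img, k):
    """Return up to k (row, col) positions with the strongest non-zero response.
    Selecting the k strongest responses is an assumed rule for illustration."""
    mag = sobel_magnitude(img)
    points = [(mag[y][x], y, x)
              for y in range(len(img)) for x in range(len(img[0]))]
    points.sort(reverse=True)
    return [(y, x) for m, y, x in points[:k] if m > 0]
```

Points found this way in consecutive frames would then be matched to estimate the global camera motion before the residual is coded.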

Multiscale Spatial Position Coding under Locality Constraint for Action Recognition

  • Yang, Jiang-feng;Ma, Zheng;Xie, Mei
    • Journal of Electrical Engineering and Technology
    • /
    • v.10 no.4
    • /
    • pp.1851-1863
    • /
    • 2015
  • In this paper, to address the problem that the traditional bag-of-features model ignores the spatial relationship of local features in human action recognition, we propose a Multiscale Spatial Position Coding under Locality Constraint method. Specifically, to describe this spatial relationship, we propose a mixed feature that combines a motion feature with a multi-spatial-scale configuration. To utilize temporal information between features, sub spatial-temporal volumes (sub-STVs) are built. Next, the pooled features of the sub-STVs are obtained via max pooling. In the classification stage, Locality-Constrained Group Sparse Representation is adopted to utilize the intrinsic group information of the sub-STV features. Experimental results on the KTH, Weizmann, and UCF Sports datasets show that our action recognition system outperforms recently published classical recognition systems based on local ST features.
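
The max-pooling step over sub-STVs can be sketched as an element-wise maximum over the coded local features that fall into each sub-volume. The sketch below splits only the time axis into equal bins, which is a simplifying assumption; the function names and the zero vector used for empty bins are also hypothetical.

```python
def max_pool(codes):
    """Element-wise maximum over a non-empty list of equal-length code vectors."""
    return [max(col) for col in zip(*codes)]


def pool_sub_volumes(features, n_bins, n_frames):
    """Pool coded features per temporal sub-volume.

    features: list of (frame_index, code_vector) pairs.
    The video of n_frames frames is cut into n_bins equal temporal bins
    (an assumed layout); empty bins produce an all-zero vector.
    """
    dim = len(features[0][1])
    bins = [[] for _ in range(n_bins)]
    for t, code in features:
        index = min(t * n_bins // n_frames, n_bins - 1)
        bins[index].append(code)
    return [max_pool(b) if b else [0.0] * dim for b in bins]
```

Concatenating the pooled vectors of all sub-volumes gives one descriptor per video that retains coarse temporal order, which a plain bag-of-features histogram discards.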

A Novel Feature Selection Method for Output Coding based Multiclass SVM (출력 코딩 기반 다중 클래스 서포트 벡터 머신을 위한 특징 선택 기법)

  • Lee, Youngjoo;Lee, Jeongjin
    • Journal of Korea Multimedia Society
    • /
    • v.16 no.7
    • /
    • pp.795-801
    • /
    • 2013
  • Recently, the support vector machine has been widely used in various application fields because its classification performance is superior to that of decision trees and neural networks. Since the support vector machine is fundamentally designed for binary classification, an output coding method, which interprets the combined results of multiple binary classifiers, is used to apply it to multiclass problems. However, previous feature selection methods for output coding based support vector machines select features that improve the overall classification accuracy rather than the accuracy of each individual classifier. In this paper, we propose a novel feature selection method that finds the features maximizing the classification accuracy of each binary classifier in an output coding based support vector machine. Experimental results show that the proposed method significantly improves classification accuracy compared with the previous feature selection method.
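
Output coding can be illustrated with a small code matrix: each class is assigned a binary codeword, each bit position corresponds to one binary classifier, and a sample is labeled with the class whose codeword is closest in Hamming distance to the predicted bits. The code matrix, class labels, and function names below are hypothetical and chosen only for illustration; the abstract does not specify them.

```python
# Hypothetical 4-class, 3-classifier code matrix (one column per binary classifier).
CODE_MATRIX = {
    "A": (1, 1, 0),
    "B": (1, 0, 1),
    "C": (0, 1, 1),
    "D": (0, 0, 0),
}


def hamming(a, b):
    """Number of positions where two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits, code_matrix=CODE_MATRIX):
    """Class whose codeword is nearest to the predicted bits
    (ties resolve to the first class in dictionary order)."""
    return min(code_matrix, key=lambda label: hamming(code_matrix[label], bits))


def per_classifier_accuracy(predictions, labels, code_matrix=CODE_MATRIX):
    """Accuracy of each binary classifier, measured against the bit that the
    true class's codeword assigns to that classifier."""
    n_bits = len(next(iter(code_matrix.values())))
    correct = [0] * n_bits
    for bits, label in zip(predictions, labels):
        for j in range(n_bits):
            correct[j] += bits[j] == code_matrix[label][j]
    return [c / len(labels) for c in correct]
```

A per-classifier feature selection scheme, as proposed in the paper, would tune the feature subset of each column separately against a quantity like the one returned by `per_classifier_accuracy`, rather than against the accuracy of `decode` alone.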

Gradual Block-based Efficient Lossy Location Coding for Image Retrieval (영상 검색을 위한 점진적 블록 크기 기반의 효율적인 손실 좌표 압축 기술)

  • Choi, Gyeongmin;Jung, Hyunil;Kim, Haekwang
    • Journal of Broadcast Engineering
    • /
    • v.18 no.2
    • /
    • pp.319-322
    • /
    • 2013
  • Image retrieval research has moved its focus from global descriptors to local descriptors of feature points, such as SIFT. MPEG is currently standardizing the efficient coding of feature point locations and local descriptors for mobile image search applications under the name MPEG-7 CDVS (Compact Descriptors for Visual Search). Each extracted feature point consists of two parts: location information and a descriptor. For efficient image retrieval, we propose a novel gradual block-based lossy location coding method that compresses location information according to its distribution in the image. Experimental results show that the average number of bits per feature point is reduced by 5 to 6% while the accuracy is maintained, compared with the state-of-the-art TM 3.0.

Optimal Facial Emotion Feature Analysis Method based on ASM-LK Optical Flow (ASM-LK Optical Flow 기반 최적 얼굴정서 특징분석 기법)

  • Ko, Kwang-Eun;Park, Seung-Min;Park, Jun-Heong;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.21 no.4
    • /
    • pp.512-517
    • /
    • 2011
  • In this paper, we propose a feature extraction and analysis method based on the Active Shape Model (ASM) and Lucas-Kanade (LK) optical flow for analyzing emotional features in facial images. Since the facial emotion feature regions are described by the Facial Action Coding System, we construct feature-related shape models from combinations of landmarks and extract the LK optical flow vector at each landmark, using the centre pixel of the motion vector window. The facial emotion features are modelled by combining the optical flow vectors, and the emotional state of a facial image can be estimated by a probabilistic estimation technique such as a Bayesian classifier. In addition, we use common spatial pattern (CSP) analysis to extract the optimal emotional features, namely those that show a high correlation between feature points and emotional states, in order to improve the operational efficiency and accuracy of the emotional feature extraction process.

Experiment on Intermediate Feature Coding for Object Detection and Segmentation

  • Jeong, Min Hyuk;Jin, Hoe-Yong;Kim, Sang-Kyun;Lee, Heekyung;Choo, Hyon-Gon;Lim, Hanshin;Seo, Jeongil
    • Journal of Broadcast Engineering
    • /
    • v.25 no.7
    • /
    • pp.1081-1094
    • /
    • 2020
  • With the recent development of deep learning, most computer vision tasks are being solved with deep learning-based networks such as CNNs and RNNs. Tasks such as object detection and object segmentation use intermediate features extracted from a common backbone, such as ResNet or FPN, for both training and inference. In this paper, we conduct an experiment to measure the compression efficiency and the effect of encoding on task inference performance when the features extracted at an intermediate stage of a CNN are encoded. A feature map that combines the 256 feature channels into one image and the original image were each encoded with HEVC, and the inference performance for object detection and segmentation was compared and analyzed. Since five levels of feature maps (P2 to P6) are encoded, the image size and resolution are larger than those of the original image. However, at lower compression levels, using the feature maps yields inference results similar to or better than those obtained with the original image.
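
Combining 256 feature channels into one image can be sketched as tiling the channels on a grid so that a standard video codec can treat the result as a single frame. The abstract does not give the layout, so the row-major grid with `cols` channels per row (for example 16 x 16 for 256 channels) and the function names are assumptions.

```python
def tile_channels(channels, cols):
    """Pack C channels (each an H x W 2-D list) into one 2-D image, row-major,
    with `cols` channels per grid row. Unused grid cells stay 0."""
    h, w = len(channels[0]), len(channels[0][0])
    rows = -(-len(channels) // cols)  # ceiling division
    image = [[0] * (cols * w) for _ in range(rows * h)]
    for idx, ch in enumerate(channels):
        r, c = divmod(idx, cols)
        for y in range(h):
            image[r * h + y][c * w:(c + 1) * w] = ch[y]
    return image


def untile_channels(image, n_channels, cols, h, w):
    """Inverse of tile_channels: cut the image back into n_channels H x W maps."""
    channels = []
    for idx in range(n_channels):
        r, c = divmod(idx, cols)
        channels.append([row[c * w:(c + 1) * w]
                         for row in image[r * h:(r + 1) * h]])
    return channels
```

In practice the floating-point feature values would also need to be quantized to the codec's sample bit depth before encoding and dequantized after decoding; that step is omitted here.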

A Tree Regularized Classifier-Exploiting Hierarchical Structure Information in Feature Vector for Human Action Recognition

  • Luo, Huiwu;Zhao, Fei;Chen, Shangfeng;Lu, Huanzhang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.3
    • /
    • pp.1614-1632
    • /
    • 2017
  • Bag of visual words is a popular model in human action recognition, but it usually suffers from the loss of spatial and temporal configuration information of local features and from large quantization error in its feature coding procedure. In this paper, to overcome these two deficiencies, we combine sparse coding with a spatio-temporal pyramid for human action recognition and regard this method as the baseline. More importantly, and this is the focus of the paper, we find that the feature vector constructed by the baseline method has a hierarchical structure. To exploit this hierarchical structure information for better recognition accuracy, we propose a tree regularized classifier. The main contributions of this paper are as follows. First, we introduce a tree regularized classifier to encode the hierarchical structure information in the feature vector for human action recognition. Second, we present an optimization algorithm to learn the parameters of the proposed classifier. Third, we evaluate the proposed classifier on the YouTube, Hollywood2, and UCF50 datasets. The experimental results show that the proposed tree regularized classifier performs better than SVM and other popular classifiers and achieves promising results on all three datasets.