• Title/Summary/Keyword: Encoder Model

The Implementation of Multi-Channel Audio Codec for Real-Time operation (실시간 처리를 위한 멀티채널 오디오 코덱의 구현)

  • Hong, Jin-Woo
    • The Journal of the Acoustical Society of Korea / v.14 no.2E / pp.91-97 / 1995
  • This paper describes the implementation of a multi-channel audio codec for HDTV. The codec features 3/2-stereo plus low-frequency enhancement, downward compatibility with smaller numbers of channels, backward compatibility with the existing 2/0-stereo system (MPEG-1 audio), and multilingual capability. The encoder consists of a 6-channel analog audio input part with a sampling rate of 48 kHz, a 4-channel digital audio input part, and three TMS320C40 DSPs. It implements multi-channel audio compression using a psychoacoustic model of human perception and reduces the bit rate to 384 kbit/s without impairing subjective quality. The decoder consists of a 6-channel analog audio output part, a 4-channel digital audio output part, and two TMS320C40 DSPs for the decoding procedure. It analyzes the 384 kbit/s bit stream received from the encoder and reproduces the multi-channel audio signals for the analog and digital outputs. Multi-processing across the DSPs is ensured by high-speed data transfer between DSPs, achieved by coordinating communication port activity with the DMA coprocessors. Finally, some technical considerations for achieving real-time operation are suggested; these were identified while implementing the codec with the MPEG-2 Layer II audio coding algorithm on a hardware architecture built from multiple commercial DSPs.

A binary adaptive arithmetic coding algorithm based on adaptive symbol changes for lossless medical image compression (무손실 의료 영상 압축을 위한 적응적 심볼 교환에 기반을 둔 이진 적응 산술 부호화 방법)

  • 지창우;박성한
    • The Journal of Korean Institute of Communications and Information Sciences / v.22 no.12 / pp.2714-2726 / 1997
  • In this paper, a medical image compression method based on adaptive symbol changes is presented. First, the differential image domain is obtained by applying differentiation rules or adaptive predictors to the original medical image. The algorithm then determines the context associated with the differential image from this domain. Next, the prediction symbols, which are considered the most probable differential image values, are maintained at a high value through the adaptive symbol change procedure. This procedure is based on estimates of the symbols with polarity coincidence between the differential image values to be coded under the context and the differential image values in the model template. At the coding step, the differential image values are encoded as "predicted" or "non-predicted" by the binary adaptive arithmetic encoder, which employs a binary decision tree. The simulation results indicate that the prediction hit ratios of differential image values achieved by the proposed algorithm improve the coding gain by 25% and 23% compared with an arithmetic coder using the ISO JPEG lossless predictor and an arithmetic coder using differentiation rules or adaptive predictors, respectively. The method is suitable for the compression stage of a medical PACS because the encoder can be applied directly to the full bit-plane medical image without decomposing it into a series of binary bit-planes, and because encoder complexity is lowered by using additions when recursively subdividing unit intervals.
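
To illustrate why a higher prediction hit ratio translates into coding gain, the sketch below computes the ideal code length of a stream of "predicted"/"non-predicted" decisions under a simple adaptive binary model. It is not the coder described in the abstract: the count-based probability estimate and the function name `adaptive_code_length` are assumptions made for illustration, and a real arithmetic coder only approaches this length.

```python
import math


def adaptive_code_length(decisions):
    """Ideal code length in bits for a sequence of binary decisions.

    decisions: iterable of 0 ("non-predicted") or 1 ("predicted").
    The probability model is a hypothetical count-based estimator,
    not the estimator used in the paper.
    """
    counts = [1, 1]  # smoothed counts for symbols 0 and 1
    bits = 0.0
    for d in decisions:
        # Probability currently assigned to the symbol that occurred.
        p = counts[d] / (counts[0] + counts[1])
        bits += -math.log2(p)
        # Adapt the model after coding the symbol.
        counts[d] += 1
    return bits


if __name__ == "__main__":
    mostly_hits = [1] * 18 + [0] * 2
    half_hits = [0, 1] * 10
    print(adaptive_code_length(mostly_hits))
    print(adaptive_code_length(half_hits))
```

A stream dominated by "predicted" decisions costs far fewer bits than one where hits and misses are equally likely, which is the effect the reported hit-ratio improvement relies on.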

A Study on Attention Mechanism in DeepLabv3+ for Deep Learning-based Semantic Segmentation (딥러닝 기반의 Semantic Segmentation을 위한 DeepLabv3+에서 강조 기법에 관한 연구)

  • Shin, SeokYong;Lee, SangHun;Han, HyunHo
    • Journal of the Korea Convergence Society / v.12 no.10 / pp.55-61 / 2021
  • In this paper, we propose a DeepLabv3+-based encoder-decoder model that uses an attention mechanism for precise semantic segmentation. DeepLabv3+ is a deep learning-based semantic segmentation method used mainly in applications such as autonomous vehicles and infrared image analysis. In the conventional DeepLabv3+, the decoder makes little use of the encoder's intermediate feature maps, which causes loss during the restoration process. This restoration loss reduces segmentation accuracy. The proposed method therefore first minimizes the restoration loss by using one additional intermediate feature map. Furthermore, to use this feature map effectively, we fuse the feature maps hierarchically, starting from the smallest one. Finally, we apply an attention mechanism to the decoder to maximize its ability to fuse the intermediate feature maps. We evaluated the proposed method on the Cityscapes dataset, which is commonly used in street scene image segmentation research. Experimental results showed that the proposed method improved segmentation results compared with the conventional DeepLabv3+. The proposed method can be used in applications that require high accuracy.

New Hybrid Approach of CNN and RNN based on Encoder and Decoder (인코더와 디코더에 기반한 합성곱 신경망과 순환 신경망의 새로운 하이브리드 접근법)

  • Jongwoo Woo;Gunwoo Kim;Keunho Choi
    • Information Systems Review / v.25 no.1 / pp.129-143 / 2023
  • In the era of big data, the field of artificial intelligence is growing remarkably, and image classification by deep learning in particular is becoming an important area. Various studies have been actively conducted to further improve the performance of CNNs, which are widely used in image classification; a representative method is the Convolutional Recurrent Neural Network (CRNN) algorithm. The CRNN algorithm combines a CNN for image classification with RNNs for recognizing time-series elements. However, because the inputs to the RNN part of a CRNN are flattened values extracted by applying convolution and pooling to the image, pixel values in the same phase of the image appear in a different order. This makes it difficult for the RNN to properly learn the sequence of arrangements in the image as intended. This study therefore aims to improve image classification performance by proposing a novel hybrid method of CNN and RNN that applies the concepts of encoder and decoder. The effectiveness of the new hybrid method was verified through various experiments. This study has academic implications in that it broadens the applicability of encoder and decoder concepts, and the proposed method has advantages in model learning time and infrastructure construction costs because it does not significantly increase complexity compared with conventional hybrid methods. In addition, this study has practical implications in that it shows the possibility of improving the quality of services in various fields that require accurate image classification.

Wyner-Ziv Video Compression using Noise Model Selection (잡음 모델 선택을 이용한 Wyner-Ziv 비디오 압축)

  • Park, Chun-Ho;Shim, Hiuk-Jae;Jeon, Byeung-Woo
    • Journal of the Institute of Electronics Engineers of Korea SP / v.46 no.4 / pp.58-66 / 2009
  • Recently, emerging demand for lightweight video encoders has prompted considerable research on distributed video coding (DVC). Wyner-Ziv (WZ) video compression is one representative DVC structure. The WZ encoder splits the video sequence into two kinds of frames: key frames, which are compressed by conventional intra coding, and WZ frames, which are encoded by WZ coding. The WZ decoder decodes the key frames first and estimates each WZ frame using the temporal correlation between key frames. The estimated WZ frame (side information) cannot be identical to the original WZ frame because the decoder has no information about the WZ frame itself. The difference between the estimated and original WZ frames is therefore regarded as virtual channel noise, and the WZ frame is reconstructed by removing this noise from the side information. Precise noise estimation thus yields a good performance gain in WZ video compression by improving the error-correcting capability of the channel code. However, noise cannot be estimated precisely at the WZ decoder without good WZ frame information, so it is generally estimated from the difference between corresponding key frames. The estimated noise is also limited by comparison with frame-level noise to reduce the uncertainty of the estimation method. Even so, these methods cannot provide good noise estimates for every frame or every bit plane. In this paper, we propose a noise model selection method that generates candidate noise models and then chooses the better one for each bit plane. Experimental results show a PSNR gain of up to 0.8 dB.
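
As a rough illustration of estimating noise from the difference between key frames, the sketch below fits a Laplacian parameter to that difference. The Laplacian model is a common choice in the DVC literature but is an assumption here, since the abstract does not name its candidate noise models or its selection criterion; the function name `laplacian_alpha` and the flat pixel lists are likewise hypothetical.

```python
import math


def laplacian_alpha(prev_key, next_key):
    """Estimate the Laplacian parameter alpha from two key frames.

    prev_key, next_key: equal-length flat lists of pixel values.
    For a Laplacian density (alpha / 2) * exp(-alpha * |x|), the variance
    is 2 / alpha**2, so alpha = sqrt(2 / variance).
    """
    residual = [a - b for a, b in zip(prev_key, next_key)]
    n = len(residual)
    mean = sum(residual) / n
    variance = sum((r - mean) ** 2 for r in residual) / n
    if variance == 0:
        # Identical frames: the fitted noise distribution is degenerate.
        return math.inf
    return math.sqrt(2.0 / variance)


if __name__ == "__main__":
    prev_key = [10, 12, 14, 16]
    next_key = [11, 11, 15, 15]
    print(laplacian_alpha(prev_key, next_key))
```

A larger `alpha` corresponds to a narrower noise distribution, that is, side information that is expected to be closer to the original WZ frame.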

Multimodal Sentiment Analysis Using Review Data and Product Information (리뷰 데이터와 제품 정보를 이용한 멀티모달 감성분석)

  • Hwang, Hohyun;Lee, Kyeongchan;Yu, Jinyi;Lee, Younghoon
    • The Journal of Society for e-Business Studies / v.27 no.1 / pp.15-28 / 2022
  • With the recent expansion of online markets such as clothing, the use of customer reviews has become a major marketing measure. User reviews have been used as a tool for analyzing customer sentiment. Sentiment analysis can be broadly classified into machine learning-based and lexicon-based methods. The machine learning-based method trains a classification model from reviews and their labels. As sentiment analysis research has advanced, multi-modal models trained on image and video data in reviews have been studied. The characteristics of words in reviews differ depending on the categories of products and customers. In this paper, sentiment is analyzed by considering review data together with the metadata of products and users. Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), self-attention-based multi-head attention models, and Bidirectional Encoder Representations from Transformers (BERT) are used in this study. The same multi-layer perceptron (MLP) model is applied to all product information. This paper proposes a multi-modal sentiment analysis model that simultaneously considers user reviews and product meta-information.

A Case Study on Intelligent Surveillance System for Urban Transit Environment (도시철도 환경에서 지능형 감시 시스템 구축 사례)

  • Chang, Il-Sik;An, Tae-Ki;Cho, Byeong-Mok;Park, Goo-Man
    • Proceedings of the KSR Conference / 2011.05a / pp.1722-1728 / 2011
  • Security in urban transit systems has become a matter of widespread concern since the fire at the Daegu subway station. A safe urban transit system is in high demand because of the vast number of daily passengers, and building one is among the most challenging projects. We introduce a test model of an integrated security system for urban transit, which we built at a subway station to demonstrate its performance. The system consists of cameras, a sensor network, and central monitoring software. We describe the smart camera functionality in more detail. The proposed smart camera includes a moving-object recognition module, video analytics, a video encoder, and a server module that transmits video and audio information.

Complexity Reduction of an Adaptive Loop Filter Based on Local Homogeneity

  • Li, Xiang;Ahn, Yongjo;Sim, Donggyu
    • IEIE Transactions on Smart Processing and Computing / v.6 no.2 / pp.93-101 / 2017
  • This paper proposes an algorithm for adaptive loop filter (ALF) complexity reduction in the decoding process. In the original ALF algorithm, filtering for I frames is performed in the frame unit, and thus, all of the pixels in a frame are filtered if the current frame is an I frame. The proposed algorithm is designed on top of the local gradient calculation. On both the encoder side and the decoder side, homogeneous areas are checked and skipped in the filtering process, and the filter coefficient calculation is only performed in the inhomogeneous areas. The proposed algorithm is implemented in Joint Exploration Model (JEM) version 3.0 future video coding reference software. The proposed algorithm is applied for frame-level filtering and intra configuration. Compared with the JEM 3.0 anchor, the proposed algorithm has 0.31%, 0.76% and 0.73% bit rate loss for luma (Y) and chroma (U and V), respectively, with about an 8% decrease in decoding time.
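
The idea of checking for homogeneous areas and skipping them during filtering can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the gradient operator, block size, or threshold, so the neighbour-difference activity measure, the 4x4 block, the threshold of 8, and the name `homogeneous_blocks` are all placeholders rather than the JEM implementation.

```python
def homogeneous_blocks(frame, block=4, threshold=8):
    """Flag blocks whose local gradient activity is below a threshold.

    frame: 2-D list of pixel values (rows of equal length).
    Returns a 2-D list of booleans, one per block; True marks a block
    that a filter could skip. Block size and threshold are hypothetical.
    """
    height, width = len(frame), len(frame[0])
    skip = []
    for by in range(0, height - block + 1, block):
        row = []
        for bx in range(0, width - block + 1, block):
            activity = 0
            for y in range(by, by + block):
                for x in range(bx, bx + block):
                    # Differences to the right and lower neighbours,
                    # clamped at the frame border.
                    right = frame[y][min(x + 1, width - 1)]
                    below = frame[min(y + 1, height - 1)][x]
                    activity += abs(frame[y][x] - right)
                    activity += abs(frame[y][x] - below)
            row.append(activity < threshold)
        skip.append(row)
    return skip


if __name__ == "__main__":
    # Left half dark, right half bright: only blocks touching the edge are active.
    frame = [[0] * 4 + [255] * 4 for _ in range(8)]
    print(homogeneous_blocks(frame))
```

Because the decision depends only on reconstructed pixel values, the same check can run on both the encoder and decoder sides without extra signalling, which matches the abstract's description of checking on both sides.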

A Deep Learning-Based Rate Control for HEVC Intra Coding

  • Marzuki, Ismail;Sim, Donggyu
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2019.11a / pp.180-181 / 2019
  • This paper proposes a rate control algorithm for intra-coded frames in an HEVC encoder using a deep learning approach. The proposed algorithm is designed for CTU-level bit allocation in an intra frame and considers visual features both spatially and temporally. The features are generated using the Visual Geometry Group network (VGG-16) with deep convolutional layers and are then used for bit allocation to each CTU within an intra frame. According to our experiments, the proposed algorithm achieves a -2.04% luma BD-rate gain with minimal loss of bit accuracy against the HM-16.20 rate control model.
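
CTU-level bit allocation can be illustrated with a weight-proportional split of the frame budget. In the sketch below the weights stand in for whatever score the VGG-16 features produce; how the paper maps features to weights is not described in the abstract, so the weights, the rounding rule, and the name `allocate_ctu_bits` are assumptions.

```python
def allocate_ctu_bits(frame_budget, weights):
    """Split an integer bit budget across CTUs in proportion to weights.

    frame_budget: total bits available for the intra frame.
    weights: one non-negative number per CTU (hypothetical feature scores).
    Returns a list of integer bit targets that sums to frame_budget.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    shares = [frame_budget * w / total for w in weights]
    # Floor each proportional share first.
    bits = [int(s) for s in shares]
    # Give the bits lost to flooring to the CTUs with the largest
    # fractional remainders so the budget is met exactly.
    leftover = frame_budget - sum(bits)
    order = sorted(range(len(weights)),
                   key=lambda i: shares[i] - bits[i],
                   reverse=True)
    for i in order[:leftover]:
        bits[i] += 1
    return bits


if __name__ == "__main__":
    print(allocate_ctu_bits(1000, [1, 1, 2]))
```

Keeping the targets as integers that sum exactly to the frame budget makes it straightforward to compare allocated and actually spent bits per CTU.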

MPEG Audio Layer-III Encoder Using Approximated Psychoacoustic Model (간략화된 심리음향모델을 이용한 MPEG Audio Layer-III 부호화기)

  • 송창준;오현오;박영철;윤대희
    • Proceedings of the IEEK Conference / 2001.09a / pp.469-472 / 2001
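  • The MPEG Audio Layer-III (MP3) algorithm has an asymmetric structure in which the encoder requires far more computation than the decoder. Most of the MP3 encoder's computation is taken up by the psychoacoustic model, which involves complex transcendental function evaluations, and by the nonlinear quantization and bit allocation process, which runs iterative loops. In this paper, we perform algorithm-level optimization for a real-time implementation of the MP3 encoder. To reduce the encoder's computational load, we propose a method that simplifies the psychoacoustic model and minimizes the number of loop iterations. By computing the psychoacoustic model information for one granule per frame and using it to estimate the psychoacoustic model information within the frame, the computational load was reduced by more than 45%. In addition, to reduce the number of iterations of the outer iteration loop, we observed how the scale factors and the quantization step increase as the outer loop iterates and propose an optimized scale factor increment method. The proposed speed-up methods were verified through subjective sound quality evaluation.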
