• Title/Summary/Keyword: Encoder Model


A group-wise attention based decoder for lightweight salient object detection on edge-devices (엣지 디바이스에서 객체 탐지를 위한 그룹별 어탠션 기반 경량 디코더 연구)

  • Thien-Thu Ngo;Md Delowar Hossain;Eui-Nam Huh
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.11a
    • /
    • pp.30-33
    • /
    • 2023
  • Recent research has focused on the fast and accurate detection of salient objects, a task that poses considerable challenges for resource-limited edge devices due to the high computational demands of existing models. To mitigate this issue, some recent work has favored inference speed at the expense of accuracy. To reconcile the intrinsic trade-off between accuracy and computational efficiency, we present a novel model for salient object detection. Our model incorporates a group-wise attentive module within the decoder of the encoder-decoder framework, with the aim of minimizing computational overhead while preserving detection accuracy. Additionally, the proposed architecture employs attention mechanisms to generate boundary information and semantic features pertinent to the salient objects. Through experiments on five distinct datasets, we show empirically that our proposed models achieve performance comparable to that of computationally intensive state-of-the-art models, with a marked reduction in computational complexity.

MRU-Net: A remote sensing image segmentation network for enhanced edge contour Detection

  • Jing Han;Weiyu Wang;Yuqi Lin;Xueqiang LYU
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.12
    • /
    • pp.3364-3382
    • /
    • 2023
  • Remote sensing image segmentation plays an important role in intelligent city construction. Current mainstream segmentation networks effectively improve the segmentation of remote sensing images by deeply mining the rich texture and semantic features of images. However, problems remain, such as rough segmentation of small target regions and poor edge contour segmentation. To overcome these challenges, we propose an improved semantic segmentation model, referred to as MRU-Net, which adopts the U-Net architecture as its backbone. First, the convolutional layers in U-Net are replaced by the BasicBlock structure to extract features, and the activation function is replaced to reduce the computational load of the model. Second, a hybrid multi-scale recognition module is added to the encoder to improve segmentation accuracy for small targets and edges. Finally, experimental results on the Massachusetts Buildings Dataset and the WHU Dataset show that, compared with the original network, the ACC, mIoU, and F1 values are improved, and that the improved network shows good robustness and portability across datasets.
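The ACC, mIoU, and F1 figures named in the abstract above can be illustrated with a minimal sketch for a single binary mask (for example building vs. background). The function name `segmentation_metrics`, the flat 0/1 list input, and averaging IoU over exactly two classes are assumptions made for illustration; the paper's own evaluation code is not shown in this listing.

```python
def segmentation_metrics(pred, truth):
    """Return (accuracy, mIoU, F1) for two flat lists of 0/1 pixel labels.

    Hypothetical helper: mIoU is averaged over foreground and background,
    and F1 is computed for the foreground class.
    """
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)

    acc = (tp + tn) / len(truth)
    # IoU = intersection / union, computed per class.
    iou_fg = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    iou_bg = tn / (tn + fp + fn) if (tn + fp + fn) else 1.0
    miou = (iou_fg + iou_bg) / 2
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    return acc, miou, f1


acc, miou, f1 = segmentation_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
print(acc, miou, f1)
```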

Denoising Diffusion Null-space Model and Colorization based Image Compression

  • Indra Imanuel;Dae-Ki Kang;Suk-Ho Lee
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.16 no.2
    • /
    • pp.22-30
    • /
    • 2024
  • Image compression-decompression methods have become increasingly crucial in modern times, facilitating the transfer of high-quality images while minimizing file size and internet traffic. Historically, early image compression relied on rudimentary codecs, aiming to compress and decompress data with minimal loss of image quality. Recently, a novel compression framework leveraging colorization techniques has emerged. These methods, originally developed for infusing grayscale images with color, have found application in image compression, leading to colorization-based coding. Within this framework, the encoder plays a crucial role in automatically extracting representative pixels (referred to as color seeds) and transmitting them to the decoder. The decoder, utilizing colorization methods, reconstructs color information for the remaining pixels based on the transmitted data. In this paper, we propose a novel approach to image compression, wherein we decompose the compression task into grayscale image compression and colorization tasks. Unlike conventional colorization-based coding, our method focuses on the colorization process rather than the extraction of color seeds. Moreover, we employ the Denoising Diffusion Null-Space Model (DDNM) for colorization, ensuring high-quality color restoration and contributing to superior compression rates. Experimental results demonstrate that our method achieves higher-quality decompressed images compared to standard JPEG and JPEG2000 compression schemes, particularly in high compression rate scenarios.

A Study on the Sensorless Speed Control of Permanent Magnet Direct Current Motor (영구자석 직류전동기의 센서리스 속도제어에 관한 연구)

  • Oh, Sae-Gin;Kim, Hyun-Chel;Kim, Jong-Su;Yoon, Kyoung-Kuk
    • Journal of Advanced Marine Engineering and Technology
    • /
    • v.36 no.5
    • /
    • pp.694-699
    • /
    • 2012
  • This paper proposes a new sensorless speed control scheme for a permanent magnet DC motor using a numerical model and a hysteresis controller, which requires no shaft encoder, speed estimator, or PI controllers. By supplying the identical instantaneous voltage to both the model and the motor in the direction that reduces the torque difference, the rotor speed approaches the model speed, namely the set value, and the system can control motor speed precisely. Because the numerical model has the same electrical parameters as the actual motor, the armature rotating speed converges to the set value when the torques on both sides are controlled to be equal. The hysteresis controller controls torque by restricting the torque errors within their respective hysteresis bands, and motor torque is controlled by the armature voltage. The experimental results indicate good speed and load responses from the low to the high speed range and show accurate speed-change performance.
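The hysteresis band described in the abstract above can be sketched as a two-level comparator that switches only when the torque error leaves the band. The names `hysteresis_step` and `run_hysteresis`, the +1/-1 output levels, and the sample error sequence are illustrative assumptions. This is not the authors' controller, which also involves the numerical motor model.

```python
def hysteresis_step(error, band, previous_output):
    """Two-level hysteresis comparator (hypothetical helper).

    Returns +1 when the error rises above +band, -1 when it falls below
    -band, and otherwise holds the previous output.
    """
    if error > band:
        return 1
    if error < -band:
        return -1
    return previous_output


def run_hysteresis(errors, band, initial=1):
    """Apply the comparator to a sequence of torque errors."""
    outputs, state = [], initial
    for error in errors:
        state = hysteresis_step(error, band, state)
        outputs.append(state)
    return outputs


# The output changes only when the error leaves the +/-0.2 band.
print(run_hysteresis([0.05, 0.25, 0.1, -0.1, -0.3, 0.0], band=0.2, initial=-1))
```

In a drive, the +1/-1 output would select the polarity of the armature voltage applied to both the model and the motor; that mapping is likewise an assumption here.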

Parallel Injection Method for Improving Descriptive Performance of Bi-GRU Image Captions (Bi-GRU 이미지 캡션의 서술 성능 향상을 위한 Parallel Injection 기법 연구)

  • Lee, Jun Hee;Lee, Soo Hwan;Tae, Soo Ho;Seo, Dong Hoan
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.11
    • /
    • pp.1223-1232
    • /
    • 2019
  • Injection is the method by which the image feature vector is passed from the encoder to the decoder. Since the image feature vector contains object details such as color and texture, it is essential for generating image captions. However, the bidirectional decoder model using the existing injection method inputs the image feature vector only in the first step, so the image feature vectors of the backward sequence vanish. This problem makes it difficult to describe the context in detail. Therefore, in this paper, we propose the parallel injection method to improve the description performance of image captions. The proposed injection method fuses all embeddings and image vectors to preserve the context. We also optimize our image caption model with a Bidirectional Gated Recurrent Unit (Bi-GRU) to reduce the amount of computation in the decoder. To validate the proposed model, experiments were conducted with a certified image caption dataset, demonstrating excellence in comparison with the latest models using BLEU and METEOR scores. The proposed model improved the BLEU score by up to 20.2 points and the METEOR score by up to 3.65 points compared to the existing caption model.
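The difference between feeding the image vector only at the first step and fusing it with every embedding can be sketched as follows. The abstract does not say how the fusion is done, so concatenation is an assumption, and the function names `init_inject` and `parallel_inject` are hypothetical.

```python
def init_inject(image_vec, word_embeddings):
    """Existing scheme: the image vector appears once, as the first step."""
    return [list(image_vec)] + [list(w) for w in word_embeddings]


def parallel_inject(image_vec, word_embeddings):
    """Parallel scheme: fuse the image vector with every word embedding.

    Fusion is modeled as concatenation here, which is an assumption.
    """
    return [list(w) + list(image_vec) for w in word_embeddings]


image = [9, 9]
words = [[1, 2], [3, 4], [5, 6]]

# Read backward, the init-injected sequence reaches the image vector last,
# while the parallel-injected sequence carries it at every step.
print(list(reversed(init_inject(image, words))))
print(list(reversed(parallel_inject(image, words))))
```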

PC-SAN: Pretraining-Based Contextual Self-Attention Model for Topic Essay Generation

  • Lin, Fuqiang;Ma, Xingkong;Chen, Yaofeng;Zhou, Jiajun;Liu, Bo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.8
    • /
    • pp.3168-3186
    • /
    • 2020
  • Automatic topic essay generation (TEG) is a controllable text generation task that aims to generate informative, diverse, and topic-consistent essays based on multiple topics. To produce high-quality essays, a reasonable method should consider both diversity and topic consistency. Another essential issue is the intrinsic link among the topics, which helps the essays stay close to the semantics of the provided topics. However, it remains challenging for TEG to fill the semantic gap between source topic words and target output, and a more powerful model is needed to capture the semantics of the given topics. To this end, we propose a pretraining-based contextual self-attention (PC-SAN) model that is built upon the seq2seq framework. For the encoder of our model, we employ a dynamic weighted sum of layers from BERT to fully utilize the semantics of topics, which is of great help in filling the gap and improving the quality of the generated essays. In the decoding phase, we also transform the target-side contextual history information into the query layers to alleviate the lack of context in typical self-attention networks (SANs). Experimental results on large-scale paragraph-level Chinese corpora verify that our model is capable of generating diverse, topic-consistent text and makes improvements over strong baselines. Furthermore, extensive analysis validates the effectiveness of contextual embeddings from BERT and contextual history information in SANs.
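A weighted sum over encoder layers, as mentioned for the BERT encoder above, is commonly implemented by softmax-normalizing one learnable score per layer and mixing the layer outputs. The sketch below assumes that form. The names `softmax` and `weighted_layer_sum` and the toy two-layer input are illustrative, and the paper's exact weighting may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layer_outputs, layer_scores):
    """Mix per-layer vectors with softmax-normalized layer scores.

    layer_outputs: one vector per encoder layer for a single token.
    layer_scores:  one (learnable) scalar per layer; assumed form.
    """
    weights = softmax(layer_scores)
    dim = len(layer_outputs[0])
    return [
        sum(w * layer[i] for w, layer in zip(weights, layer_outputs))
        for i in range(dim)
    ]


# Equal scores reduce to a plain average of the layers.
print(weighted_layer_sum([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))
```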

Motion Estimation and Mode Decision Algorithm for Very Low-complexity H.264/AVC Video Encoder (초저복잡도 H.264 부호기의 움직임 추정 및 모드 결정 알고리즘)

  • Yoo Youngil;Kim Yong Tae;Lee Seung-Jun;Kang Dong Wook;Kim Ki-Doo
    • Journal of Broadcast Engineering
    • /
    • v.10 no.4 s.29
    • /
    • pp.528-539
    • /
    • 2005
  • H.264 has been adopted as the video codec for various multimedia services such as DMB and next-generation DVD because of its superior coding performance. However, the reference codec of the standard, the joint model (JM), contains quite a few algorithms that are too complex to be used in a resource-constrained embedded environment. This paper introduces a very low-complexity H.264 encoding algorithm that is applicable to the embedded environment. The proposed algorithm was realized by restricting some coding tools, on the basis that doing so should not cause too severe a degradation of RD performance, and by adding a few early termination and bypass conditions to the motion estimation and mode decision process. When encoding a 7.5 fps QCIF sequence at 64 kbps with the proposed algorithm, the encoder yields PSNRs 0.4 dB worse than the standard JM, but requires only 15% of the computational complexity and drastically lowers the required memory and power consumption. By porting the proposed H.264 codec to a PDA with an Intel PXA255 processor, we verified the feasibility of H.264-based MMS (Multimedia Messaging Service) on a PDA.
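Early termination during motion estimation, as mentioned in the abstract above, can be illustrated with a block-matching search that stops as soon as the sum of absolute differences (SAD) falls to a threshold. This is a generic sketch, not the paper's algorithm: the full-search order, the `early_stop_sad` parameter, and all function names are assumptions.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def get_block(frame, top, left, size):
    """Extract a size x size block from a frame given as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def motion_search(cur, ref, top, left, size, search_range, early_stop_sad):
    """Full search with early termination (hypothetical helper).

    Returns (best_motion_vector, best_sad, candidates_checked). The search
    stops as soon as the best SAD is at or below early_stop_sad.
    """
    target = get_block(cur, top, left, size)
    height, width = len(ref), len(ref[0])
    best_mv, best_cost, checked = (0, 0), None, 0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block falls outside the reference frame
            cost = sad(target, get_block(ref, y, x, size))
            checked += 1
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
            if best_cost <= early_stop_sad:
                return best_mv, best_cost, checked
    return best_mv, best_cost, checked
```

With a threshold that can never be met (for example a negative value) the function degenerates to an exhaustive search, which makes the saving in checked candidates easy to compare.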

Design of QDI Model Based Encoder/Decoder Circuits for Low Delay-Power Product Data Transfers in GALS Systems (GALS 시스템에서의 저비용 데이터 전송을 위한 QDI모델 기반 인코더/디코더 회로 설계)

  • Oh Myeong-Hoon
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.43 no.1 s.343
    • /
    • pp.27-36
    • /
    • 2006
  • Conventional delay-insensitive (DI) data encodings usually require 2N+1 wires to transfer N bits. To reduce the complexity and power dissipation of wires in designing a large-scale chip, encoder and decoder circuits in which N-bit data transfer can be performed with only N+1 wires are proposed. These circuits are based on a quasi delay-insensitive (QDI) model and designed using current-mode multiple-valued logic (CMMVL). The effectiveness of the proposed data transfer mechanism is validated by comparison with conventional data transfer mechanisms using dual-rail and 1-of-4 encodings, through simulation in 0.25 μm CMOS technology. In general, simulation results with wire lengths of 4 mm or larger show that the CMMVL scheme significantly reduces the delay-power product (D*P) values of the dual-rail encoding at data rates of 5 MHz or more and of the 1-of-4 encoding at data rates of 18 MHz or more. In addition, simulation results using the buffer-inserted dual-rail and 1-of-4 encodings for high performance, with a wire length of 10 mm and 32-bit data, demonstrate that the proposed CMMVL scheme reduces the D*P values of the dual-rail encoding at data rates of 4 MHz or more and of the 1-of-4 encoding at data rates of 25 MHz or more by up to 57.7% and 17.9%, respectively.
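The two conventional encodings named above, and the wire counts quoted in the abstract, can be illustrated with a short sketch. The wire ordering inside each dual-rail pair and each 1-of-4 group is an assumed convention, and the function names are hypothetical. The 2N+1 and N+1 totals are taken directly from the abstract.

```python
def dual_rail_encode(bits):
    """Dual-rail: each bit drives one wire of a pair.

    Assumed convention: (0, 1) encodes logic 1 and (1, 0) encodes logic 0.
    """
    wires = []
    for bit in bits:
        wires.extend((0, 1) if bit else (1, 0))
    return wires


def one_of_four_encode(bits):
    """1-of-4: each pair of bits raises exactly one wire in a group of four."""
    if len(bits) % 2:
        raise ValueError("1-of-4 encoding needs an even number of bits")
    wires = []
    for i in range(0, len(bits), 2):
        group = [0, 0, 0, 0]
        group[bits[i] * 2 + bits[i + 1]] = 1  # assumed wire index = pair value
        wires.extend(group)
    return wires


def wire_counts(n_bits):
    """Wire totals as stated in the abstract: 2N+1 conventional, N+1 proposed."""
    return {"conventional_di": 2 * n_bits + 1, "proposed_cmmvl": n_bits + 1}


print(dual_rail_encode([1, 0]))
print(one_of_four_encode([1, 0]))
print(wire_counts(32))
```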

An Efficient Parallelization Implementation of PU-level ME for Fast HEVC Encoding (고속 HEVC 부호화를 위한 효율적인 PU레벨 움직임예측 병렬화 구현)

  • Park, Soobin;Choi, Kiho;Park, Sang-Hyo;Jang, Euee Seon
    • Journal of Broadcast Engineering
    • /
    • v.18 no.2
    • /
    • pp.178-184
    • /
    • 2013
  • In this paper, we propose an efficient parallelization technique for PU-level motion estimation (ME) in the next-generation video coding standard, High Efficiency Video Coding (HEVC), to reduce the time complexity of video encoding. It is difficult to encode video in real time because ME has significant complexity (i.e., 80 percent of encoder complexity). To solve this problem, various techniques have been studied, among them parallelization, which must be carefully considered in algorithm-level ME design. In this regard, a merge estimation method using the merge estimation region (MER), which enables ME to be designed in parallel, has been proposed; however, parallel ME based on MER still has unresolved problems that prevent it from being implemented ideally in the HEVC test model (HM). Therefore, we propose two strategies to implement stable parallel ME using MER in HM. Experimental results show the effectiveness of the proposed methods: encoding time is reduced by 25.64 percent on average relative to HM with sequential ME.

Financial Market Prediction and Improving the Performance Based on Large-scale Exogenous Variables and Deep Neural Networks (대규모 외생 변수 및 Deep Neural Network 기반 금융 시장 예측 및 성능 향상)

  • Cheon, Sung Gil;Lee, Ju Hong;Choi, Bum Ghi;Song, Jae Won
    • Smart Media Journal
    • /
    • v.9 no.4
    • /
    • pp.26-35
    • /
    • 2020
  • Attempts to predict future stock prices have long been studied. However, unlike general time-series data, financial time-series data pose various obstacles to prediction, such as non-stationarity, long-term dependence, and non-linearity. In addition, humans are limited in their ability to select variables from a wide range of data, so the model should be able to extract variables automatically. In this paper, we propose a 'sliding time step normalization' method that can normalize non-stationary data, an LSTM autoencoder that compresses all variables, and 'moving transfer learning', which divides periods and performs transfer learning. The experiments show that performance is superior when as many variables as possible are used through the neural network rather than only 100 major financial variables, and that using 'sliding time step normalization' to normalize the non-stationarity of data in all sections is effective in improving performance. 'Moving transfer learning' is shown to be effective in improving performance over long test intervals by evaluating the performance of the model and performing transfer learning in the test interval at each step.
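The abstract does not define 'sliding time step normalization'. One plausible reading, assumed here, is that each sliding window is standardized with its own mean and standard deviation, so that level shifts in a non-stationary series do not leak into the model input. The function name and the z-score choice are illustrative assumptions, not the authors' definition.

```python
import statistics


def sliding_time_step_normalize(series, window):
    """Z-score every sliding window using that window's own statistics.

    Assumed interpretation of the method named in the abstract.
    """
    windows = []
    for start in range(len(series) - window + 1):
        chunk = series[start:start + window]
        mean = statistics.fmean(chunk)
        std = statistics.pstdev(chunk)
        if std == 0:
            std = 1.0  # constant window: avoid division by zero
        windows.append([(x - mean) / std for x in chunk])
    return windows


# The first and last windows sit at very different price levels but
# normalize to the same shape.
normalized = sliding_time_step_normalize([1, 2, 3, 101, 102, 103], window=3)
print(normalized[0])
print(normalized[-1])
```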