• Title/Summary/Keyword: Encoder Model

Search Result 354, Processing Time 0.026 seconds

DP-LinkNet: A convolutional network for historical document image binarization

  • Xiong, Wei;Jia, Xiuhong;Yang, Dichun;Ai, Meihui;Li, Lirong;Wang, Song
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.5
    • /
    • pp.1778-1797
    • /
    • 2021
  • Document image binarization is an important pre-processing step in document analysis and archiving. The state-of-the-art models for document image binarization are variants of encoder-decoder architectures, such as FCN (fully convolutional network) and U-Net. Despite their success, they still suffer from three limitations: (1) reduced feature map resolution due to consecutive strided pooling or convolutions, (2) multiple scales of target objects, and (3) reduced localization accuracy due to the built-in invariance of deep convolutional neural networks (DCNNs). To overcome these three challenges, we propose an improved semantic segmentation model, referred to as DP-LinkNet, which adopts the D-LinkNet architecture as its backbone, with the proposed hybrid dilated convolution (HDC) and spatial pyramid pooling (SPP) modules between the encoder and the decoder. Extensive experiments are conducted on recent document image binarization competition (DIBCO) and handwritten document image binarization competition (H-DIBCO) benchmark datasets. Results show that our proposed DP-LinkNet outperforms other state-of-the-art techniques by a large margin. Our implementation and the pre-trained models are available at https://github.com/beargolden/DP-LinkNet.

KI-HABS: Key Information Guided Hierarchical Abstractive Summarization

  • Zhang, Mengli;Zhou, Gang;Yu, Wanting;Liu, Wenfen
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.12
    • /
    • pp.4275-4291
    • /
    • 2021
  • With the unprecedented growth of textual information on the Internet, an efficient automatic summarization system has become an urgent need. Recently, the neural network models based on the encoder-decoder with an attention mechanism have demonstrated powerful capabilities in the sentence summarization task. However, for paragraphs or longer document summarization, these models fail to mine the core information in the input text, which leads to information loss and repetitions. In this paper, we propose an abstractive document summarization method by applying guidance signals of key sentences to the encoder based on the hierarchical encoder-decoder architecture, denoted as KI-HABS. Specifically, we first train an extractor to extract key sentences in the input document by the hierarchical bidirectional GRU. Then, we encode the key sentences to the key information representation in the sentence level. Finally, we adopt key information representation guided selective encoding strategies to filter source information, which establishes a connection between the key sentences and the document. We use the CNN/Daily Mail and Gigaword datasets to evaluate our model. The experimental results demonstrate that our method generates more informative and concise summaries, achieving better performance than the competitive models.

AI photo storyteller based on deep encoder-decoder architecture (딥인코더-디코더 기반의 인공지능 포토 스토리텔러)

  • Min, Kyungbok;Dang, L. Minh;Lee, Sujin;Moon, Hyeonjoon
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2019.10a
    • /
    • pp.931-934
    • /
    • 2019
  • Research using artificial intelligence to generate captions for an image has been studied extensively. However, these systems are unable to create creative stories that include more than one sentence based on image content. A story is a better way that humans use to foster social cooperation and develop social norms. This paper proposes a framework that can generate a relatively short story to describe based on the context of an image. The main contributions of this paper are (1) An unsupervised framework which uses recurrent neural network structure and encoder-decoder model to construct a short story for an image. (2) A huge English novel dataset, including horror and romantic themes that are manually collected and validated. By investigating the short stories, the proposed model proves that it can generate more creative contents compared to existing intelligent systems which can produce only one concise sentence. Therefore, the framework demonstrated in this work will trigger the research of a more robust AI story writer and encourages the application of the proposed model in helping story writer find a new idea.

Semi-Supervised Spatial Attention Method for Facial Attribute Editing

  • Yang, Hyeon Seok;Han, Jeong Hoon;Moon, Young Shik
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.10
    • /
    • pp.3685-3707
    • /
    • 2021
  • In recent years, facial attribute editing has been successfully used to effectively change face images of various attributes based on generative adversarial networks and encoder-decoder models. However, existing models have a limitation in that they may change an unintended part in the process of changing an attribute or may generate an unnatural result. In this paper, we propose a model that improves the learning of the attention mask by adding a spatial attention mechanism based on the unified selective transfer network (referred to as STGAN) using semi-supervised learning. The proposed model can edit multiple attributes while preserving details independent of the attributes being edited. This study makes two main contributions to the literature. First, we propose an encoder-decoder model structure that learns and edits multiple facial attributes and suppresses distortion using an attention mask. Second, we define guide masks and propose a method and an objective function that use the guide masks for multiple facial attribute editing through semi-supervised learning. Through qualitative and quantitative evaluations of the experimental results, the proposed method was proven to yield improved results that preserve the image details by suppressing unintended changes than existing methods.

A study on the application of residual vector quantization for vector quantized-variational autoencoder-based foley sound generation model (벡터 양자화 변분 오토인코더 기반의 폴리 음향 생성 모델을 위한 잔여 벡터 양자화 적용 연구)

  • Seokjin Lee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.43 no.2
    • /
    • pp.243-252
    • /
    • 2024
  • Among the Foley sound generation models that have recently begun to be studied, a sound generation technique using the Vector Quantized-Variational AutoEncoder (VQ-VAE) structure and generation model such as Pixelsnail are one of the important research subjects. On the other hand, in the field of deep learning-based acoustic signal compression, residual vector quantization technology is reported to be more suitable than the conventional VQ-VAE structure. Therefore, in this paper, we aim to study whether residual vector quantization technology can be effectively applied to the Foley sound generation. In order to tackle the problem, this paper applies the residual vector quantization technique to the conventional VQ-VAE-based Foley sound generation model, and in particular, derives a model that is compatible with the existing models such as Pixelsnail and does not increase computational resource consumption. In order to evaluate the model, an experiment was conducted using DCASE2023 Task7 data. The results show that the proposed model enhances about 0.3 of the Fréchet audio distance. Unfortunately, the performance enhancement was limited, which is believed to be due to the decrease in the resolution of time-frequency domains in order to do not increase consumption of the computational resources.

Fault Detection and Diagnosis for EVA Production Processes Using AE-SOM (AE-SOM을 이용한 EVA 생산 공정 이상 검출 및 진단)

  • Park, Byeong Eon;Ji, Yumi;Sim, Ye Seul;Lee, Kyu-Hwang;Lee, Ho Kyung
    • Korean Chemical Engineering Research
    • /
    • v.58 no.3
    • /
    • pp.408-415
    • /
    • 2020
  • In this study, the AE-SOM method, which combines auto-encoder and self-organizing map, is used to detect and diagnose faults in EVA production process. Then, the fault propagation pathways are identified using Granger causality test. One year and seven months of operation data were obtained to detect faults of the process, and the process variables of the autoclave reactor are mainly analyzed. In the data pretreatment process, the data are standardized and 200 samples of each grade are randomly chosen to obtain a fault detection model. After that, the best matching unit (BMU) of each grade is confirmed by applying AE-SOM. The faults are determined based on each BMU. When a fault is found, the most causative variable of the fault is identified by using a contribution plot, and the fault propagation pathway is identified by Granger causality test. The prognostic of the two shutdowns is detected, and the fault propagation pathway caused by the faulty variable was analyzed.

Position Estimation of a Mobile Robot Based on USN and Encoder and Development of Tele-operation System using Internet (USN과 회전 센서를 이용한 이동로봇의 위치인식과 인터넷을 통한 원격제어 시스템 개발)

  • Park, Jong-Jin
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.9 no.6
    • /
    • pp.55-61
    • /
    • 2009
  • This paper proposes a position estimation of a mobile robot based on USN(Ubiquitous Sensor Network) and encoder, and development of tele-operation system using Internet. USN used in experiments is based on ZigBee protocol and has location estimation engine which uses RSSI signal to estimate distance between nodes. By distortion the estimated distance using RSSI is not correct, compensation method is needed. We obtained fuzzy model to calculate more accurate distance between nodes and use encoder which is built in robot to estimate accurate position of robot. Based on proposed position estimation method, tele-operation system was developed. We show by experiment that proposed method is more appropriate for estimation of position and remote navigation of mobile robot through Internet.

  • PDF

Fast mode decision by skipping variable block-based motion estimation and spatial predictive coding in H.264 (H.264의 가변 블록 크기 움직임 추정 및 공간 예측 부호화 생략에 의한 고속 모드 결정법)

  • 한기훈;이영렬
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.40 no.5
    • /
    • pp.417-425
    • /
    • 2003
  • H.264, which is the latest video coding standard of both ITU-T(International Telecommunication Union-Telecommunication standardization sector) and MPEG(Moving Picture Experts Group), adopts new video coding tools such as variable block size motion estimation, multiple reference frames, quarter-pel motion estimation/compensation(ME/MC), 4${\times}$4 Integer DCT(Discrete Cosine Transform), and Rate-Distortion Optimization, etc. These new video coding tools provide good coding of efficiency compared with existing video coding standards as H.263, MPEG-4, etc. However, these new coding tools require the increase of encoder complexity. Therefore, in order to apply H.264 to many real applications, fast algorithms are required for H.264 coding tools. In this paper, when encoder MacroBlock(MB) mode is decided by rate-distortion optimization tool, fast mode decision algorithm by skipping variable block size ME/MC and spatial-predictive coding, which occupies most encoder complexity, is proposed. In terms of computational complexity, the proposed method runs about 4 times as far as JM(Joint Model) 42 encoder of H.264, while the PSNR(peak signal-to-noise ratio)s of the decoded images are maintained.

EEG Dimensional Reduction with Stack AutoEncoder for Emotional Recognition using LSTM/RNN (LSTM/RNN을 사용한 감정인식을 위한 스택 오토 인코더로 EEG 차원 감소)

  • Aliyu, Ibrahim;Lim, Chang-Gyoon
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.15 no.4
    • /
    • pp.717-724
    • /
    • 2020
  • Due to the important role played by emotion in human interaction, affective computing is dedicated in trying to understand and regulate emotion through human-aware artificial intelligence. By understanding, emotion mental diseases such as depression, autism, attention deficit hyperactivity disorder, and game addiction will be better managed as they are all associated with emotion. Various studies for emotion recognition have been conducted to solve these problems. In applying machine learning for the emotion recognition, the efforts to reduce the complexity of the algorithm and improve the accuracy are required. In this paper, we investigate emotion Electroencephalogram (EEG) feature reduction and classification using Stack AutoEncoder (SAE) and Long-Short-Term-Memory/Recurrent Neural Networks (LSTM/RNN) classification respectively. The proposed method reduced the complexity of the model and significantly enhance the performance of the classifiers.

A study on Korean multi-turn response generation using generative and retrieval model (생성 모델과 검색 모델을 이용한 한국어 멀티턴 응답 생성 연구)

  • Lee, Hodong;Lee, Jongmin;Seo, Jaehyung;Jang, Yoonna;Lim, Heuiseok
    • Journal of the Korea Convergence Society
    • /
    • v.13 no.1
    • /
    • pp.13-21
    • /
    • 2022
  • Recent deep learning-based research shows excellent performance in most natural language processing (NLP) fields with pre-trained language models. In particular, the auto-encoder-based language model proves its excellent performance and usefulness in various fields of Korean language understanding. However, the decoder-based Korean generative model even suffers from generating simple sentences. Also, there is few detailed research and data for the field of conversation where generative models are most commonly utilized. Therefore, this paper constructs multi-turn dialogue data for a Korean generative model. In addition, we compare and analyze the performance by improving the dialogue ability of the generative model through transfer learning. In addition, we propose a method of supplementing the insufficient dialogue generation ability of the model by extracting recommended response candidates from external knowledge information through a retrival model.