• Title/Summary/Keyword: Encoder Model

Search Result 361, Processing Time 0.028 seconds

Multi-scale context fusion network for melanoma segmentation

  • Zhenhua Li;Lei Zhang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.7
    • /
    • pp.1888-1906
    • /
    • 2024
  • Aiming at the problems that the edge of melanoma image is fuzzy, the contrast with the background is low, and the hair occlusion makes it difficult to segment accurately, this paper proposes a model MSCNet for melanoma segmentation based on U-net frame. Firstly, a multi-scale pyramid fusion module is designed to reconstruct the skip connection and transmit global information to the decoder. Secondly, the contextural information conduction module is innovatively added to the top of the encoder. The module provides different receptive fields for the segmented target by using the hole convolution with different expansion rates, so as to better fuse multi-scale contextural information. In addition, in order to suppress redundant information in the input image and pay more attention to melanoma feature information, global channel attention mechanism is introduced into the decoder. Finally, In order to solve the problem of lesion class imbalance, this paper uses a combined loss function. The algorithm of this paper is verified on ISIC 2017 and ISIC 2018 public datasets. The experimental results indicate that the proposed algorithm has better accuracy for melanoma segmentation compared with other CNN-based image segmentation algorithms.

Cross-Lingual Style-Based Title Generation Using Multiple Adapters (다중 어댑터를 이용한 교차 언어 및 스타일 기반의 제목 생성)

  • Yo-Han Park;Yong-Seok Choi;Kong Joo Lee
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.8
    • /
    • pp.341-354
    • /
    • 2023
  • The title of a document is the brief summarization of the document. Readers can easily understand a document if we provide them with its title in their preferred styles and the languages. In this research, we propose a cross-lingual and style-based title generation model using multiple adapters. To train the model, we need a parallel corpus in several languages with different styles. It is quite difficult to construct this kind of parallel corpus; however, a monolingual title generation corpus of the same style can be built easily. Therefore, we apply a zero-shot strategy to generate a title in a different language and with a different style for an input document. A baseline model is Transformer consisting of an encoder and a decoder, pre-trained by several languages. The model is then equipped with multiple adapters for translation, languages, and styles. After the model learns a translation task from parallel corpus, it learns a title generation task from monolingual title generation corpus. When training the model with a task, we only activate an adapter that corresponds to the task. When generating a cross-lingual and style-based title, we only activate adapters that correspond to a target language and a target style. An experimental result shows that our proposed model is only as good as a pipeline model that first translates into a target language and then generates a title. There have been significant changes in natural language generation due to the emergence of large-scale language models. However, research to improve the performance of natural language generation using limited resources and limited data needs to continue. In this regard, this study seeks to explore the significance of such research.

Effective Multi-Modal Feature Fusion for 3D Semantic Segmentation with Multi-View Images (멀티-뷰 영상들을 활용하는 3차원 의미적 분할을 위한 효과적인 멀티-모달 특징 융합)

  • Hye-Lim Bae;Incheol Kim
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.12
    • /
    • pp.505-518
    • /
    • 2023
  • 3D point cloud semantic segmentation is a computer vision task that involves dividing the point cloud into different objects and regions by predicting the class label of each point. Existing 3D semantic segmentation models have some limitations in performing sufficient fusion of multi-modal features while ensuring both characteristics of 2D visual features extracted from RGB images and 3D geometric features extracted from point cloud. Therefore, in this paper, we propose MMCA-Net, a novel 3D semantic segmentation model using 2D-3D multi-modal features. The proposed model effectively fuses two heterogeneous 2D visual features and 3D geometric features by using an intermediate fusion strategy and a multi-modal cross attention-based fusion operation. Also, the proposed model extracts context-rich 3D geometric features from input point cloud consisting of irregularly distributed points by adopting PTv2 as 3D geometric encoder. In this paper, we conducted both quantitative and qualitative experiments with the benchmark dataset, ScanNetv2 in order to analyze the performance of the proposed model. In terms of the metric mIoU, the proposed model showed a 9.2% performance improvement over the PTv2 model using only 3D geometric features, and a 12.12% performance improvement over the MVPNet model using 2D-3D multi-modal features. As a result, we proved the effectiveness and usefulness of the proposed model.

Comparative Analysis of Self-supervised Deephashing Models for Efficient Image Retrieval System (효율적인 이미지 검색 시스템을 위한 자기 감독 딥해싱 모델의 비교 분석)

  • Kim Soo In;Jeon Young Jin;Lee Sang Bum;Kim Won Gyum
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.12
    • /
    • pp.519-524
    • /
    • 2023
  • In hashing-based image retrieval, the hash code of a manipulated image is different from the original image, making it difficult to search for the same image. This paper proposes and evaluates a self-supervised deephashing model that generates perceptual hash codes from feature information such as texture, shape, and color of images. The comparison models are autoencoder-based variational inference models, but the encoder is designed with a fully connected layer, convolutional neural network, and transformer modules. The proposed model is a variational inference model that includes a SimAM module of extracting geometric patterns and positional relationships within images. The SimAM module can learn latent vectors highlighting objects or local regions through an energy function using the activation values of neurons and surrounding neurons. The proposed method is a representation learning model that can generate low-dimensional latent vectors from high-dimensional input images, and the latent vectors are binarized into distinguishable hash code. From the experimental results on public datasets such as CIFAR-10, ImageNet, and NUS-WIDE, the proposed model is superior to the comparative model and analyzed to have equivalent performance to the supervised learning-based deephashing model. The proposed model can be used in application systems that require low-dimensional representation of images, such as image search or copyright image determination.

Fast Mode Decision using Block Size Activity for H.264/AVC (블록 크기 활동도를 이용한 H.264/AVC 부호화 고속 모드 결정)

  • Jung, Bong-Soo;Jeon, Byeung-Woo;Choi, Kwang-Pyo;Oh, Yun-Je
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.44 no.2 s.314
    • /
    • pp.1-11
    • /
    • 2007
  • H.264/AVC uses variable block sizes to achieve significant coding gain. It has 7 different coding modes having different motion compensation block sizes in Inter slice, and 2 different intra prediction modes in Intra slice. This fine-tuned new coding feature has achieved far more significant coding gain compared with previous video coding standards. However, extremely high computational complexity is required when rate-distortion optimization (RDO) algorithm is used. This computational complexity is a major problem in implementing real-time H.264/AVC encoder on computationally constrained devices. Therefore, there is a clear need for complexity reduction algorithm of H.264/AVC such as fast mode decision. In this paper, we propose a fast mode decision with early $P8\times8$ mode rejection based on block size activity using large block history map (LBHM). Simulation results show that without any meaningful degradation, the proposed method reduces whole encoding time on average by 53%. Also the hybrid usage of the proposed method and the early SKIP mode decision in H.264/AVC reference model reduces whole encoding time by 63% on average.

Korean Dependency Parsing Using Stack-Pointer Networks and Subtree Information (스택-포인터 네트워크와 부분 트리 정보를 이용한 한국어 의존 구문 분석)

  • Choi, Yong-Seok;Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.10 no.6
    • /
    • pp.235-242
    • /
    • 2021
  • In this work, we develop a Korean dependency parser based on a stack-pointer network that consists of a pointer network and an internal stack. The parser has an encoder and decoder and builds a dependency tree for an input sentence in a depth-first manner. The encoder of the parser encodes an input sentence, and the decoder selects a child for the word at the top of the stack at each step. Since the parser has the internal stack where a search path is stored, the parser can utilize information of previously derived subtrees when selecting a child node. Previous studies used only a grandparent and the most recently visited sibling without considering a subtree structure. In this paper, we introduce graph attention networks that can represent a previously derived subtree. Then we modify our parser based on the stack-pointer network to utilize subtree information produced by the graph attention networks. After training the dependency parser using Sejong and Everyone's corpus, we evaluate the parser's performance. Experimental results show that the proposed parser achieves better performance than the previous approaches at sentence-level accuracies when adopting 2-depth graph attention networks.

Road Extraction from Images Using Semantic Segmentation Algorithm (영상 기반 Semantic Segmentation 알고리즘을 이용한 도로 추출)

  • Oh, Haeng Yeol;Jeon, Seung Bae;Kim, Geon;Jeong, Myeong-Hun
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.40 no.3
    • /
    • pp.239-247
    • /
    • 2022
  • Cities are becoming more complex due to rapid industrialization and population growth in modern times. In particular, urban areas are rapidly changing due to housing site development, reconstruction, and demolition. Thus accurate road information is necessary for various purposes, such as High Definition Map for autonomous car driving. In the case of the Republic of Korea, accurate spatial information can be generated by making a map through the existing map production process. However, targeting a large area is limited due to time and money. Road, one of the map elements, is a hub and essential means of transportation that provides many different resources for human civilization. Therefore, it is essential to update road information accurately and quickly. This study uses Semantic Segmentation algorithms Such as LinkNet, D-LinkNet, and NL-LinkNet to extract roads from drone images and then apply hyperparameter optimization to models with the highest performance. As a result, the LinkNet model using pre-trained ResNet-34 as the encoder achieved 85.125 mIoU. Subsequent studies should focus on comparing the results of this study with those of studies using state-of-the-art object detection algorithms or semi-supervised learning-based Semantic Segmentation techniques. The results of this study can be applied to improve the speed of the existing map update process.

Design and Implementation of Hybrid Network Associated 3D Video Broadcasting System (이종망 연동형 3D 비디오 방송시스템 설계 및 구현)

  • Yun, Kugjin;Cheong, Won-Sik;Lee, Jinyoung;Kim, Kyuheon
    • Journal of Broadcast Engineering
    • /
    • v.19 no.5
    • /
    • pp.687-698
    • /
    • 2014
  • ATSC is currently working on standardization of hybrid 3DTV broadcasting service in heterogenous network environment after completion of service-compatible 3DTV broadcasting service standard based on broadcasting channel. This paper proposes a convergence 3D video broadcasting method on broadcasting and IP network while guaranteeing a Full-HD 3D quality without degrading the image quality of legacy DTV. Specifically, this paper describes transmission of the 3D additional video using the ISO/IEC 23009-1 DASH, robust synchronization method under heterogenous network environments and system target decoder model for hybrid 3DTV receiver. Based on experimental results, we confirm that proposed technologies can be used as a core technology in the hybrid 3DTV standardization and a reference model for a development of hybrid 3DTV encoder and receiver.

VIBRATION ANALYSIS OF PCB MANUFACTURING SYSTEM USING MASKLESS EXPOSURE METHOD (Maskless 방식을 이용한 PCB 생산시스템의 진동 해석)

  • Jang, Won-Hyuk;Lee, Jae-Mun;Cho, Myeong-Woo;Kim, Joung-Su;Lee, Chul-Hee
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference
    • /
    • 2009.10a
    • /
    • pp.421-426
    • /
    • 2009
  • This paper presents vibration analysis of maskless exposure module in Printed Circuit Board (PCB) manufacturing system. In order to complete exposure process in PCB, masking type module has been widely used in electronics industries. However, masking process confronts some limitations of application due to higher production cost for masking as well as lower printing resolution. Therefore, maskless exposure module is started to be in the spotlight for flexible production system to meet the needs of fabrication in variable patterns at low cost. Since maskless exposure process adopts direct patterning to PCB, vibration problems become more critical compared to conventional masking type process. Moreover, movements of exposure engine as well as stage generate vibration sources in the system. Thus, it is imperative to analyze the vibration characteristics for the maskless exposure module to improve the quality and accuracy of PCB. In this study, vibration analysis using the Finite Element Analysis is conducted to identify the critical structural parts deteriorating vibration performance. Also, Experimental investigations are conducted by single/dual encoder measurement process under the operating module speed. Measurement points of vibration are selected by three places, which are base of stage, exposure engine and top of stage, to check the effect of vibration from the exposure engine. Comparisons between analysis results and experimental measurement are conducted to confirm the accuracy of analysis results including the developed FE model. Finally, this studies show feasibility of optimal design using the developed FE analysis model.

  • PDF

A Fast Decision Method of Quadtree plus Binary Tree (QTBT) Depth in JEM (차세대 비디오 코덱(JEM)의 고속 QTBT 분할 깊이 결정 기법)

  • Yoon, Yong-Uk;Park, Do-Hyun;Kim, Jae-Gon
    • Journal of Broadcast Engineering
    • /
    • v.22 no.5
    • /
    • pp.541-547
    • /
    • 2017
  • The Joint Exploration Model (JEM), which is a reference SW codec of the Joint Video Exploration Team (JVET) exploring the future video standard technology, provides a recursive Quadtree plus Binary Tree (QTBT) block structure. QTBT can achieve enhanced coding efficiency by adding new block structures at the expense of largely increased computational complexity. In this paper, we propose a fast decision algorithm of QTBT block partitioning depth that uses the rate-distortion (RD) cost of the upper and current depth to reduce the complexity of the JEM encoder. Experimental results showed that the computational complexity of JEM 5.0 can be reduced up to 21.6% and 11.0% with BD-rate increase of 0.7% and 1.2% in AI (All Intra) and RA (Random Access), respectively.