• Title/Summary/Keyword: Encoder Model

Fast Inter CU Partitioning Algorithm using MAE-based Prediction Accuracy Functions for VVC (MAE 기반 예측 정확도 함수를 이용한 VVC의 고속 화면간 CU 분할 알고리즘)

  • Won, Dong-Jae;Moon, Joo-Hee
    • Journal of Broadcast Engineering / v.27 no.3 / pp.361-368 / 2022
  • The quaternary tree plus multi-type tree (QT+MTT) structure was adopted in the Versatile Video Coding (VVC) standard as a block partitioning tool. QT+MTT provides excellent coding gain; however, its encoding complexity is very high because of the flexibility of the binary tree (BT) and ternary tree (TT) splits. This paper proposes a fast inter coding unit (CU) partitioning algorithm for BT and TT split types based on prediction accuracy functions that use the mean absolute error (MAE). The MAE-based decision model was designed to deliver consistent encoding time savings with stable coding loss for a practical low-complexity VVC encoder. Experimental results under the random access test configuration showed that the proposed algorithm saved 24.0% to 31.7% of encoding time while increasing the luminance Bjøntegaard delta (BD) rate by 1.0% to 2.1%.
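
As a rough illustration of the MAE measure mentioned in this abstract, the sketch below computes the mean absolute error between an original block and its prediction and applies a simple threshold. The abstract does not give the paper's prediction accuracy functions, so `should_skip_split` and its `threshold` parameter are hypothetical stand-ins, not the authors' decision model.

```python
def mean_absolute_error(original, predicted):
    """MAE between two equally sized 2-D blocks of sample values."""
    total = 0
    count = 0
    for row_o, row_p in zip(original, predicted):
        for o, p in zip(row_o, row_p):
            total += abs(o - p)
            count += 1
    return total / count


def should_skip_split(original, predicted, threshold):
    """Hypothetical rule (an assumption, not the paper's model):
    skip further BT/TT split checks when the current prediction
    is already accurate, i.e. its MAE is at or below the threshold."""
    return mean_absolute_error(original, predicted) <= threshold


# Toy 2x2 block: absolute errors are 1, 0, 1, 2, so the MAE is 1.0.
orig_block = [[100, 102], [98, 101]]
pred_block = [[101, 102], [97, 103]]
mae = mean_absolute_error(orig_block, pred_block)
```

With this toy block, a threshold of 1.5 would skip the extra split checks and a threshold of 0.5 would not.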

Pedestrian and Vehicle Distance Estimation Based on Hard Parameter Sharing (하드 파라미터 쉐어링 기반의 보행자 및 운송 수단 거리 추정)

  • Seo, Ji-Won;Cha, Eui-Young
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.3 / pp.389-395 / 2022
  • With advances in deep learning, computer vision techniques such as classification, detection, and segmentation are now widely used in many fields. Autonomous driving in particular is one of the major fields that applies computer vision systems. There is also a large body of work on combining multiple tasks in a single network. In this study, we propose a network that predicts the individual depth of pedestrians and vehicles. The proposed model is built on YOLOv3 for object detection and Monodepth for depth estimation, and it performs object detection and depth estimation in sequence using an encoder and decoder based on hard parameter sharing. We also use an attention module to improve the accuracy of both object detection and depth estimation. Depth is predicted from a monocular image, and the model is trained with a self-supervised method.

Structural health monitoring data anomaly detection by transformer enhanced densely connected neural networks

  • Jun, Li;Wupeng, Chen;Gao, Fan
    • Smart Structures and Systems / v.30 no.6 / pp.613-626 / 2022
  • Guaranteeing the quality and integrity of structural health monitoring (SHM) data is essential for an effective assessment of structural condition. However, the sensor system may malfunction because of sensor faults or a harsh operating environment, producing multiple types of anomalies in the measured data. Efficiently and automatically identifying anomalies in the vast amounts of measured data is important for assessing structural condition and for early warning of structural failure in SHM. The major challenge for current automated data anomaly detection methods is the imbalance among dataset categories. Considering the characteristics of actual anomalous data, this paper proposes a data anomaly detection method based on a data-level technique and deep learning for SHM of civil engineering structures. The proposed method consists of a data balancing phase, which prepares a comprehensive training dataset using the data-level technique, and an anomaly detection phase based on a specially designed network. A densely connected convolutional network (DenseNet) and a Transformer encoder are embedded in the network to extract both detailed and global features of the response data and to establish the mapping between the highest-level abstract features and the data anomaly class. Numerical studies on a steel frame model are conducted to evaluate the performance and noise immunity of the proposed network for data anomaly detection. The applicability of the proposed method to data anomaly classification is validated with measured data from a real supertall structure. The proposed method reaches 95.7% overall accuracy on practical structural monitoring data, which demonstrates the effectiveness of data balancing and the robust classification capability of the proposed network.

WiFi CSI Data Preprocessing and Augmentation Techniques in Indoor People Counting using Deep Learning (딥러닝을 활용한 실내 사람 수 추정을 위한 WiFi CSI 데이터 전처리와 증강 기법)

  • Kim, Yeon-Ju;Kim, Seungku
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.12 / pp.1890-1897 / 2021
  • People counting is an important technology for application services such as smart homes, smart buildings, and smart cars. Social distancing during COVID-19 drew public attention to people counting technology. A people counting system can be implemented in various ways, for example with cameras, sensors, or wireless signals, depending on service requirements. A people counting system based on a WiFi AP uses WiFi CSI data, which reflects multipath information, and is an effective low-cost solution for indoor deployment. Conventional WiFi CSI-based people counting technologies have low accuracy, which prevents high-quality service. This paper proposes a deep learning people counting system based on WiFi CSI data. Data preprocessing with an auto-encoder, data augmentation that transforms the WiFi CSI data, and the proposed deep learning model together improve people counting accuracy. In the experiments, the proposed approach achieves 89.29% accuracy with 6 subjects.

Personalized Chit-chat Based on Language Models (언어 모델 기반 페르소나 대화 모델)

  • Jang, Yoonna;Oh, Dongsuk;Lim, Jungwoo;Lim, Heuiseok
    • Annual Conference on Human and Language Technology / 2020.10a / pp.491-494 / 2020
  • With recent advances in language models, many studies in natural language processing have achieved strong performance. In open-domain dialogue systems, which chat with humans without a fixed topic, it has likewise become possible to generate more natural utterances than before. Advances in language models have also helped models select contextually appropriate answers in response selection. However, dialogue models have been found to produce inconsistent responses or only generic, non-specific ones. To address this, persona dialogue data and tasks, in which conversation is grounded in a speaker's personalized information, are being studied. In the persona dialogue task, each speaker is given a persona and must select or generate responses that are consistent with it. We therefore discuss a persona dialogue system that selects more appropriate responses by using language models pre-trained on large corpora. The experiments used GPT-2 and DialoGPT, which model language auto-regressively; BERT, which uses an auto-encoder approach; and BART, which combines the two structures. Comparing these language models on the persona dialogue task, we found that BERT achieved the best Hits@1 score.
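
The Hits@1 metric named in this abstract can be sketched as follows: for each dialogue context the model scores a set of candidate responses, and a hit is counted when the gold response is ranked first. The function name `hits_at_k` and the toy scores are assumptions for illustration; the abstract does not describe the authors' evaluation code.

```python
def hits_at_k(scores_per_example, gold_indices, k=1):
    """Fraction of examples whose gold candidate is among the top-k scores."""
    hits = 0
    for scores, gold in zip(scores_per_example, gold_indices):
        # Candidate indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if gold in ranked[:k]:
            hits += 1
    return hits / len(gold_indices)


# Three toy contexts with three candidate responses each (made-up scores).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
gold = [1, 1, 2]
hits1 = hits_at_k(scores, gold, k=1)  # gold is ranked first in 2 of 3 contexts
```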

Updated Primer on Generative Artificial Intelligence and Large Language Models in Medical Imaging for Medical Professionals

  • Kiduk Kim;Kyungjin Cho;Ryoungwoo Jang;Sunggu Kyung;Soyoung Lee;Sungwon Ham;Edward Choi;Gil-Sun Hong;Namkug Kim
    • Korean Journal of Radiology / v.25 no.3 / pp.224-242 / 2024
  • The emergence of Chat Generative Pre-trained Transformer (ChatGPT), a chatbot developed by OpenAI, has garnered interest in the application of generative artificial intelligence (AI) models in the medical field. This review summarizes different generative AI models and their potential applications in medicine and explores the evolving landscape of Generative Adversarial Networks and diffusion models since the introduction of generative AI models. These models have made valuable contributions to the field of radiology. The review also explores the significance of synthetic data in addressing privacy concerns and augmenting data diversity and quality within the medical domain, emphasizes the role of inversion in the investigation of generative models, and outlines an approach to replicate this process. We provide an overview of large language models, focusing on prominent representatives such as GPT and Bidirectional Encoder Representations from Transformers (BERT), and discuss recent initiatives involving language-vision models in radiology, including the Large Language and Vision Assistant for Biomedicine (LLaVA-Med), to illustrate their practical application. This comprehensive review offers insights into the wide-ranging applications of generative AI models in clinical research and emphasizes their transformative potential.

A method for metadata extraction from a collection of records using Named Entity Recognition in Natural Language Processing (자연어 처리의 개체명 인식을 통한 기록집합체의 메타데이터 추출 방안)

  • Chiho Song
    • Journal of Korean Society of Archives and Records Management / v.24 no.2 / pp.65-88 / 2024
  • This pilot study explores a method of extracting metadata values and descriptions from records using named entity recognition (NER), a technique in natural language processing (NLP), a subfield of artificial intelligence. The study focuses on handwritten records from the Guro Industrial Complex, produced during the 1960s and 1970s, comprising approximately 1,200 pages and 80,000 words. After preprocessing the records, including digitization, the study used a publicly available language API based on Google's Bidirectional Encoder Representations from Transformers (BERT) language model to recognize entity names in the text. As a result, 173 names of people and 314 names of organizations and institutions were extracted from the Guro Industrial Complex's past records. These extracted entities are expected to serve as direct search terms for accessing the contents of the records. The study also identified challenges that arose when applying the theoretical methodology of NLP to real-world records consisting of semistructured text, and it presents potential solutions and implications to consider when addressing these issues.

Density map estimation based on deep-learning for pest control drone optimization (드론 방제의 최적화를 위한 딥러닝 기반의 밀도맵 추정)

  • Baek-gyeom Seong;Xiongzhe Han;Seung-hwa Yu;Chun-gu Lee;Yeongho Kang;Hyun Ho Woo;Hunsuk Lee;Dae-Hyun Lee
    • Journal of Drive and Control / v.21 no.2 / pp.53-64 / 2024
  • Global population growth has increased the demand for food production. At the same time, aging rural communities have reduced the workforce, increasing the demand for automation in agriculture. Drones are particularly useful in the field of unmanned pest control. However, the current method of uniform spraying causes environmental damage through pesticide overuse and wind drift. To address this issue, spraying performance needs to be improved through precise performance evaluation. Therefore, as a foundational study aimed at optimizing drone-based pest control technologies, this research evaluated water-sensitive paper (WSP) via density map estimation using convolutional neural networks (CNN) with an encoder-decoder structure. To achieve more accurate estimation, this study implemented multi-task learning, incorporating an additional classifier for image segmentation alongside the density map estimation classifier. The proposed model achieved an R-squared (R2) of 0.976 for coverage area on the evaluation data set, demonstrating satisfactory performance in evaluating WSP at various density levels. Further research is needed to improve the accuracy of spray result estimation and to develop a real-time assessment technology for use in the field.
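
For readers unfamiliar with the R-squared figure quoted in this abstract, the sketch below shows the standard coefficient of determination, one minus the ratio of residual to total sum of squares. The coverage values are made-up numbers for illustration, not data from the study, and the abstract does not state exactly how the authors computed R2.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. estimated coverage areas (percent).
measured = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 31.0, 39.0]
r2 = r_squared(measured, estimated)  # SS_res = 10, SS_tot = 500
```

For these toy values the residual sum of squares is 10 and the total sum of squares is 500, so the score comes out close to 0.98.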

Side-Channel Archive Framework Using Deep Learning-Based Leakage Compression (딥러닝을 이용한 부채널 데이터 압축 프레임 워크)

  • Sangyun Jung;Sunghyun Jin;Heeseok Kim
    • Journal of the Korea Institute of Information Security & Cryptology / v.34 no.3 / pp.379-392 / 2024
  • With the rapid increase in data, saving storage space and improving the efficiency of data transmission have become critical issues, making the research on the efficiency of data compression technologies increasingly important. Lossless algorithms can precisely restore original data but have limited compression ratios, whereas lossy algorithms provide higher compression rates at the expense of some data loss. There has been active research in data compression using deep learning-based algorithms, especially the autoencoder model. This study proposes a new side-channel analysis data compressor utilizing autoencoders. This compressor achieves higher compression rates than Deflate while maintaining the characteristics of side-channel data. The encoder, using locally connected layers, effectively preserves the temporal characteristics of side-channel data, and the decoder maintains fast decompression times with a multi-layer perceptron. Through correlation power analysis, the proposed compressor has been proven to compress data without losing the characteristics of side-channel data.
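
The abstract compares the proposed compressor against Deflate. A minimal sketch of how such a lossless baseline could be measured with Python's standard `zlib` module is shown below; the synthetic `trace` bytes and the helper name `deflate_ratio` are assumptions for illustration and do not represent real side-channel data or the authors' benchmark.

```python
import zlib


def deflate_ratio(raw: bytes, level: int = 9) -> float:
    """Compression ratio (original size / compressed size) using Deflate via zlib."""
    compressed = zlib.compress(raw, level)
    return len(raw) / len(compressed)


# Synthetic periodic byte sequence standing in for a trace (not real data).
trace = bytes((i * 7) % 251 for i in range(4096))

ratio = deflate_ratio(trace)

# Deflate is lossless: decompressing restores the original bytes exactly.
restored = zlib.decompress(zlib.compress(trace, 9))
```

A lossy autoencoder-based compressor would be compared against this ratio, with a separate check (correlation power analysis in the abstract) that the characteristics of the data survive compression.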

Fast Coding Unit Decision Algorithm Based on Region of Interest by Motion Vector in HEVC (움직임 벡터에 의한 관심영역 기반의 HEVC 고속 부호화 유닛 결정 방법)

  • Hwang, In Seo;Sunwoo, Myung Hoon
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.11 / pp.41-47 / 2016
  • High efficiency video coding (HEVC) employs a coding tree unit (CTU) to improve coding efficiency. A CTU consists of coding units (CU), prediction units (PU), and transform units (TU). All possible block partitions must be evaluated at each depth level to obtain the best combination of CUs, PUs, and TUs. To reduce the complexity of the block partitioning process, this paper proposes a PU mode skip algorithm with region of interest (RoI) selection using motion vectors. In addition, this paper presents a CU depth level skip algorithm that uses co-located block information from previously encoded frames. First, the RoI selection algorithm distinguishes between dynamic CTUs and static CTUs, and asymmetric motion partitioning (AMP) blocks are then skipped in the static CTUs. Second, the depth level skip algorithm predicts the most probable target depth level from the average depth in one CTU. The experimental results show that the proposed fast CU decision algorithm can reduce the total encoding time by up to 44.8% compared to the HEVC test model (HM) 14.0 reference software encoder, with only a 2.5% Bjøntegaard delta bit rate (BDBR) loss.
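
The depth level skip idea in this abstract, predicting a target depth from the average depth of a co-located CTU, could be sketched roughly as below. The abstract does not specify the rounding rule or how many neighboring depth levels are still searched, so `predict_depth_range` and its `margin` parameter are hypothetical; only the depth range 0 to 3 (64x64 down to 8x8 CUs) follows the usual HEVC configuration.

```python
def predict_depth_range(colocated_depths, max_depth=3, margin=1):
    """Hypothetical sketch: derive a restricted CU depth search range
    from the average CU depth of the co-located CTU.

    colocated_depths: depth (0..max_depth) of each CU in the co-located CTU.
    margin: assumed number of depth levels kept on each side of the target.
    """
    average = sum(colocated_depths) / len(colocated_depths)
    target = int(average + 0.5)  # round half up to the nearest depth level
    low = max(0, target - margin)
    high = min(max_depth, target + margin)
    return low, high


# A mostly shallow co-located CTU keeps depths 0..2 and skips depth 3.
shallow = predict_depth_range([0, 0, 1, 1])
# A deeply split co-located CTU keeps depths 2..3 and skips depths 0 and 1.
deep = predict_depth_range([3, 3, 3, 2])
```

Depth levels outside the returned range would be skipped during the rate-distortion search, which is where the time saving would come from.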