• Title/Summary/Keyword: AI dataset (AI 데이터셋)

Analysis of the Status of Natural Language Processing Technology Based on Deep Learning (딥러닝 중심의 자연어 처리 기술 현황 분석)

  • Park, Sang-Un
    • The Journal of Bigdata
    • /
    • v.6 no.1
    • /
    • pp.63-81
    • /
    • 2021
  • The performance of natural language processing (NLP) is improving rapidly thanks to the recent development and application of machine learning and deep learning technologies, and its fields of application are expanding as a result. In particular, as demand for analysis of unstructured text data grows, interest in NLP is also increasing. However, because of the complexity and difficulty of natural language preprocessing and of machine learning and deep learning theory, there are still high barriers to using NLP. To give an overall understanding of NLP, this paper examines the main fields of NLP that are currently being actively researched and the current state of the major technologies centered on machine learning and deep learning, with the aim of providing a foundation for understanding and using NLP more easily. To that end, we investigated how the position of NLP within AI (artificial intelligence) has changed by tracing changes in the taxonomy of AI technology. The main areas of NLP, which consist of language modeling, text classification, text generation, document summarization, question answering, and machine translation, are explained along with state-of-the-art deep learning models. In addition, the major deep learning models used in NLP are explained, and datasets and evaluation measures for performance evaluation are summarized. We hope that researchers who want to use NLP for various purposes in their own fields will be able to understand the overall technical status and the main technologies of NLP through this paper.

An Overloaded Vehicle Identifying System based on Object Detection Model (객체 인식 모델을 활용한 적재불량 화물차 탐지 시스템 개발)

  • Jung, Woojin;Park, Yongju;Park, Jinuk;Kim, Chang-il
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2022.10a
    • /
    • pp.562-565
    • /
    • 2022
  • The increasing number of overloaded vehicles on the road poses risks to traffic safety, such as falling cargo, road damage, and chain collisions caused by abnormal weight distribution, and can cause great damage once an accident occurs. However, the current weight measurement systems for vehicles on roads cannot recognize this irregular weight distribution. To address this limitation, we propose building an object detection-based AI model to identify overloaded vehicles. In addition, we present a simple yet effective method for constructing an object detection model from large-scale vehicle images. In particular, we use the large-scale vehicle image sets provided by the open AI-Hub, which include overloaded vehicles captured from CCTV, black box, and hand-held camera points of view. We inspected the specific features of vehicle sizes and image source types, and pre-processed the images to train a deep learning-based object detection model. Finally, we demonstrated that detection performance for overloaded vehicles improved by about 23% compared with a model trained on the raw data. From this result, we believe that public big data can be used more efficiently and applied to the development of object detection-based overloaded vehicle detection models.

A personalized exercise recommendation system using dimension reduction algorithms

  • Lee, Ha-Young;Jeong, Ok-Ran
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.6
    • /
    • pp.19-28
    • /
    • 2021
  • Interest in health care has been increasing due to the coronavirus disease (COVID-19), and many people are training at home because fitness centers and shared public facilities have become harder to use. In this paper, we propose a personalized exercise recommendation algorithm that uses personal propensity information to provide more accurate and meaningful exercise recommendations to home training users. We classify the data according to obesity criteria with a k-nearest neighbor algorithm, using personal information that can represent individuals, such as eating habits and physical condition. Furthermore, we differentiate the exercise dataset by level of exercise activity. Based on the neighborhood information of each dataset, we provide personalized exercise recommendations to users through a dimensionality reduction algorithm (SVD), one of the model-based collaborative filtering methods. In this way, we address the data sparsity and scalability problems of memory-based collaborative filtering recommendation techniques, and we verify the accuracy and performance of the proposed algorithms.

$\mathcal{K}o$-ATOMIC: Korean Commonsense Knowledge Graph ($\mathcal{K}o$-ATOMIC: 일반 상식 기반의 한국어 지식 그래프)

  • Jaewook Lee;Jaehyung Seo;Seungjun Lee;Chanjun Park;Aiyanyo Imatitikua Danielle;Heuiseok Lim
    • Annual Conference on Human and Language Technology
    • /
    • 2022.10a
    • /
    • pp.412-417
    • /
    • 2022
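  • A commonsense knowledge graph is a structured knowledge representation method that expresses the commonsense knowledge contained in large-scale corpora as a graph so that it can be applied to downstream natural language processing tasks. The best-known commonsense knowledge graph at present is ATOMIC [1]. However, research on commonsense knowledge graphs whose primary language is Korean is not yet active. This study therefore presents a methodology for building a Korean commonsense knowledge graph using an existing English knowledge graph and a Korean commonsense dataset. We also evaluate the resulting knowledge graph to verify the validity of the construction methodology.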

Drone detection system using YOLO (YOLO를 이용한 드론탐지 시스템)

  • Shin, JunPyo;Kim, YuMin;Choi, KyuMin;Sung, SeungMin;Lee, ByungKwon
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2021.01a
    • /
    • pp.233-236
    • /
    • 2021
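  • Drone usage in Korea is increasing, but means of restricting drones and AI-based drone content are lacking. To address these problems, we trained a device using Darknet and YOLO_mark so that it can easily recognize and distinguish drones. This overcomes the limitations of existing means of restricting drones and is easy to use. Furthermore, the work in this paper could be used by the military and police for tasks such as drone identification.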

A Study on the Comparison of the Commercial API for Recognizing Speech with Emotion (상용 API 의 감정에 따른 음성 인식 성능 비교 연구)

  • Janghoon Yang
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.05a
    • /
    • pp.52-54
    • /
    • 2023
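  • With recent advances in artificial intelligence, a variety of services now offer speech recognition features, and the importance of speech recognition is growing. This paper evaluates, from the perspective of emotional speech, the differences among Google, ETRI, and Naver, which provide representative AI service APIs widely used in Korea. In an evaluation using speech test data that is part of the emotional dialogue corpus dataset provided by AI Hub, the ETRI API showed the best speech recognition performance on both character error rate (1.29%) and word error rate (10.1%).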
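The two metrics named in this abstract, character error rate and word error rate, are commonly computed as an edit distance divided by the reference length. The following is a minimal sketch of that common definition, not the evaluation code used in the paper. The function names are hypothetical, and ignoring spaces when computing the character error rate is an assumption, since conventions differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate; spaces are ignored (an assumed convention)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

Because a single wrong character makes the whole word count as an error, the word error rate is usually higher than the character error rate for the same transcript, which is consistent with the gap between the two figures reported above.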

Design of weighted federated learning framework based on local model validation

  • Kim, Jung-Jun;Kang, Jeon Seong;Chung, Hyun-Joon;Park, Byung-Hoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.11
    • /
    • pp.13-18
    • /
    • 2022
  • In this paper, we propose VW-FedAVG (Validation based Weighted FedAVG), which updates the global model by weighting the models of each device participating in training according to their verified performance. The first method is a server-side validation structure, designed to validate each local client model on a validation dataset before the global model is updated. The second is a client-side validation structure, in which the validation dataset is evenly distributed to each client and the global model is updated after validation. Using MNIST and CIFAR-10 for image classification, the method obtained higher accuracy than previous studies under both IID and non-IID distributions.
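The idea of weighting client models by validation performance can be sketched as follows. The abstract does not give the exact weighting rule, so this sketch assumes the simplest one: each client's coefficient is its validation score divided by the sum of all scores. The function name and the flat-list parameter representation are hypothetical.

```python
def validation_weighted_average(client_params, val_scores):
    """Aggregate client parameter vectors, weighted by validation score.

    client_params: list of equal-length lists of floats, one per client.
    val_scores: one non-negative validation score (e.g. accuracy) per client.
    Assumption: coefficients are the scores normalized to sum to 1.
    """
    if not client_params or len(client_params) != len(val_scores):
        raise ValueError("need one validation score per client")
    total = sum(val_scores)
    if total <= 0:
        raise ValueError("validation scores must sum to a positive value")
    coeffs = [score / total for score in val_scores]
    n_params = len(client_params[0])
    return [
        sum(c * params[k] for c, params in zip(coeffs, client_params))
        for k in range(n_params)
    ]
```

Standard FedAvg weights each client by its share of the training samples; here the coefficient comes from measured validation performance instead, so a client whose local model validates poorly contributes less to the global update.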

Framework Design for Malware Dataset Extraction Using Code Patches in a Hybrid Analysis Environment (코드패치 및 하이브리드 분석 환경을 활용한 악성코드 데이터셋 추출 프레임워크 설계)

  • Ki-Sang Choi;Sang-Hoon Choi;Ki-Woong Park
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.34 no.3
    • /
    • pp.403-416
    • /
    • 2024
  • Malware is being commercialized and sold on the black market, primarily driven by financial incentives. With the increasing demand driven by these sales, the scope of attacks via malware has expanded. In response, there has been a surge in research efforts leveraging artificial intelligence for detection and classification. However, adversaries are integrating various anti-analysis techniques into their malware to thwart analytical efforts. In this study, we introduce the "Malware Analysis with Dynamic Extraction (MADE)" framework, a hybrid binary analysis tool devised to procure datasets from advanced malware incorporating Anti-Analysis techniques. The MADE framework has the proficiency to autonomously execute dynamic analysis on binaries, encompassing those laden with Anti-VM and Anti-Debugging defenses. Experimental results substantiate that the MADE framework can effectively circumvent over 90% of diverse malware implementations using Anti-Analysis techniques and can adeptly extract relevant datasets.

Model Type Inference Attack Using Output of Black-Box AI Model (블랙 박스 모델의 출력값을 이용한 AI 모델 종류 추론 공격)

  • An, Yoonsoo;Choi, Daeseon
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.32 no.5
    • /
    • pp.817-826
    • /
    • 2022
  • AI technology is being successfully introduced in many fields, and models offered as a service are deployed in a black-box environment that does not expose model information, in order to protect intellectual property rights and data. In a black-box environment, attackers try to steal the data or parameters used during training by using the model output. Noting that no existing attack infers the type of a deep learning model, this paper proposes a method of inferring the model type in order to find out the layer composition of the target model directly. With ResNet, VGGNet, AlexNet, and simple convolutional neural network models trained on the MNIST dataset, we show that model types can be inferred from each model's output values in both gray-box and black-box environments. In addition, when the "big and small relationship" feature proposed in this paper is also used for training, we inferred the model type with approximately 83% accuracy in the black-box environment. The results show that the model type can be inferred even when attackers are given only partial information rather than raw probability vectors.

Validation of Semantic Segmentation Dataset for Autonomous Driving (승용자율주행을 위한 의미론적 분할 데이터셋 유효성 검증)

  • Gwak, Seoku;Na, Hoyong;Kim, Kyeong Su;Song, EunJi;Jeong, Seyoung;Lee, Kyewon;Jeong, Jihyun;Hwang, Sung-Ho
    • Journal of Drive and Control
    • /
    • v.19 no.4
    • /
    • pp.104-109
    • /
    • 2022
  • For autonomous driving research using AI, datasets collected from road environments play an important role. In other countries, various datasets such as CityScapes, A2D2, and BDD have already been released, but datasets suitable for the domestic road environment still need to be provided. This paper analyzed and verified the dataset reflecting the Korean driving environment. In order to verify the training dataset, the class imbalance was confirmed by comparing the number of pixels and instances of the dataset. A similar A2D2 dataset was trained with the same deep learning model, ConvNeXt, to compare and verify the constructed dataset. IoU was compared for the same class between two datasets with ConvNeXt and mIoU was compared. In this paper, it was confirmed that the collected dataset reflecting the driving environment of Korea is suitable for learning.
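The per-class IoU and mIoU comparison mentioned in this abstract rests on a simple pixel-counting definition, sketched below. This is an illustration of the standard metric rather than the paper's evaluation code. The function names are hypothetical, label maps are assumed to be flattened into lists of integer class IDs, and skipping classes absent from both prediction and ground truth is an assumed convention.

```python
def per_class_iou(pred, target, num_classes):
    """Return {class_id: IoU} for classes that appear in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = {}
    for c in range(num_classes):
        # Intersection: pixels labeled c in both maps.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Union: pixels labeled c in at least one map.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious[c] = inter / union
    return ious


def mean_iou(pred, target, num_classes):
    """Mean of the per-class IoU values over the classes that are present."""
    ious = per_class_iou(pred, target, num_classes)
    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious.values()) / len(ious)
```

Because mIoU averages over classes rather than pixels, a rare class counts as much as a frequent one, which is why checking class imbalance by pixel and instance counts matters when validating a segmentation dataset.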