• Title/Summary/Keyword: AI 데이터셋

Search Result 224, Processing Time 0.025 seconds

KFREB: Korean Fictional Retrieval-based Evaluation Benchmark for Generative Large Language Models (KFREB: 생성형 한국어 대규모 언어 모델의 검색 기반 생성 평가 데이터셋)

  • Jungseob Lee;Junyoung Son;Taemin Lee;Chanjun Park;Myunghoon Kang;Jeongbae Park;Heuiseok Lim
    • Annual Conference on Human and Language Technology
    • /
    • 2023.10a
    • /
    • pp.9-13
    • /
    • 2023
  • 본 논문에서는 대규모 언어모델의 검색 기반 답변 생성능력을 평가하는 새로운 한국어 벤치마크, KFREB(Korean Fictional Retrieval Evaluation Benchmark)를 제안한다. KFREB는 모델이 사전학습 되지 않은 허구의 정보를 바탕으로 검색 기반 답변 생성 능력을 평가함으로써, 기존의 대규모 언어모델이 사전학습에서 보았던 사실을 반영하여 생성하는 답변이 실제 검색 기반 답변 시스템에서의 능력을 제대로 평가할 수 없다는 문제를 해결하고자 한다. 제안된 KFREB는 검색기반 대규모 언어모델의 실제 서비스 케이스를 고려하여 장문 문서, 두 개의 정답을 포함한 골드 문서, 한 개의 골드 문서와 유사 방해 문서 키워드 유무, 그리고 문서 간 상호 참조를 요구하는 상호참조 멀티홉 리즈닝 경우 등에 대한 평가 케이스를 제공하며, 이를 통해 대규모 언어모델의 적절한 선택과 실제 서비스 활용에 대한 인사이트를 제공할 수 있을 것이다.

  • PDF

Enhancing the performance of the facial keypoint detection model by improving the quality of low-resolution facial images (저화질 안면 이미지의 화질 개선를 통한 안면 특징점 검출 모델의 성능 향상)

  • KyoungOok Lee;Yejin Lee;Jonghyuk Park
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.2
    • /
    • pp.171-187
    • /
    • 2023
  • When a person's face is recognized through a recording device such as a low-pixel surveillance camera, it is difficult to capture the face due to low image quality. In situations where it is difficult to recognize a person's face, problems such as not being able to identify a criminal suspect or a missing person may occur. Existing studies on face recognition used refined datasets, so the performance could not be measured in various environments. Therefore, to solve the problem of poor face recognition performance in low-quality images, this paper proposes a method to generate high-quality images by performing image quality improvement on low-quality facial images considering various environments, and then improve the performance of facial feature point detection. To confirm the practical applicability of the proposed architecture, an experiment was conducted by selecting a data set in which people appear relatively small in the entire image. In addition, by choosing a facial image dataset considering the mask-wearing situation, the possibility of expanding to real problems was explored. As a result of measuring the performance of the feature point detection model by improving the image quality of the face image, it was confirmed that the face detection after improvement was enhanced by an average of 3.47 times in the case of images without a mask and 9.92 times in the case of wearing a mask. It was confirmed that the RMSE for facial feature points decreased by an average of 8.49 times when wearing a mask and by an average of 2.02 times when not wearing a mask. Therefore, it was possible to verify the applicability of the proposed method by increasing the recognition rate for facial images captured in low quality through image quality improvement.

Clasification of Cyber Attack Group using Scikit Learn and Cyber Treat Datasets (싸이킷런과 사이버위협 데이터셋을 이용한 사이버 공격 그룹의 분류)

  • Kim, Kyungshin;Lee, Hojun;Kim, Sunghee;Kim, Byungik;Na, Wonshik;Kim, Donguk;Lee, Jeongwhan
    • Journal of Convergence for Information Technology
    • /
    • v.8 no.6
    • /
    • pp.165-171
    • /
    • 2018
  • The most threatening attack that has become a hot topic of recent IT security is APT Attack.. So far, there is no way to respond to APT attacks except by using artificial intelligence techniques. Here, we have implemented a machine learning algorithm for analyzing cyber threat data using machine learning method, using a data set that collects cyber attack cases using Scikit Learn, a big data machine learning framework. The result showed an attack classification accuracy close to 70%. This result can be developed into the algorithm of the security control system in the future.

A Study of Development and Application of an Inland Water Body Training Dataset Using Sentinel-1 SAR Images in Korea (Sentinel-1 SAR 영상을 활용한 국내 내륙 수체 학습 데이터셋 구축 및 알고리즘 적용 연구)

  • Eu-Ru Lee;Hyung-Sup Jung
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.6_1
    • /
    • pp.1371-1388
    • /
    • 2023
  • Floods are becoming more severe and frequent due to global warming-induced climate change. Water disasters are rising in Korea due to severe rainfall and wet seasons. This makes preventive climate change measures and efficient water catastrophe responses crucial, and synthetic aperture radar satellite imagery can help. This research created 1,423 water body learning datasets for individual water body regions along the Han and Nakdong waterways to reflect domestic water body properties discovered by Sentinel-1 satellite radar imagery. We created a document with exact data annotation criteria for many situations. After the dataset was processed, U-Net, a deep learning model, analyzed water body detection results. The results from applying the learned model to water body locations not involved in the learning process were studied to validate soil water body monitoring on a national scale. The analysis showed that the created water body area detected water bodies accurately (F1-Score: 0.987, Intersection over Union [IoU]: 0.955). Other domestic water body regions not used for training and evaluation showed similar accuracy (F1-Score: 0.941, IoU: 0.89). Both outcomes showed that the computer accurately spotted water bodies in most areas, however tiny streams and gloomy areas had problems. This work should improve water resource change and disaster damage surveillance. Future studies will likely include more water body attribute datasets. Such databases could help manage and monitor water bodies nationwide and shed light on misclassified regions.

Detecting Foreign Objects in Chest X-Ray Images using Artificial Intelligence (인공 지능을 이용한 흉부 엑스레이 이미지에서의 이물질 검출)

  • Chang-Hwa Han
    • Journal of the Korean Society of Radiology
    • /
    • v.17 no.6
    • /
    • pp.873-879
    • /
    • 2023
  • This study explored the use of artificial intelligence(AI) to detect foreign bodies in chest X-ray images. Medical imaging, especially chest X-rays, plays a crucial role in diagnosing diseases such as pneumonia and lung cancer. With the increase in imaging tests, AI has become an important tool for efficient and fast diagnosis. However, images can contain foreign objects, including everyday jewelry like buttons and bra wires, which can interfere with accurate readings. In this study, we developed an AI algorithm that accurately identifies these foreign objects and processed the National Institutes of Health chest X-ray dataset based on the YOLOv8 model. The results showed high detection performance with accuracy, precision, recall, and F1-score all close to 0.91. Despite the excellent performance of AI, the study solved the problem that foreign objects in the image can distort the reading results, emphasizing the innovative role of AI in radiology and its reliability based on accuracy, which is essential for clinical implementation.

3DentAI: U-Nets for 3D Oral Structure Reconstruction from Panoramic X-rays (3DentAI: 파노라마 X-ray로부터 3차원 구강구조 복원을 위한 U-Nets)

  • Anusree P.Sunilkumar;Seong Yong Moon;Wonsang You
    • The Transactions of the Korea Information Processing Society
    • /
    • v.13 no.7
    • /
    • pp.326-334
    • /
    • 2024
  • Extra-oral imaging techniques such as Panoramic X-rays (PXs) and Cone Beam Computed Tomography (CBCT) are the most preferred imaging modalities in dental clinics owing to its patient convenience during imaging as well as their ability to visualize entire teeth information. PXs are preferred for routine clinical treatments and CBCTs for complex surgeries and implant treatments. However, PXs are limited by the lack of third dimensional spatial information whereas CBCTs inflict high radiation exposure to patient. When a PX is already available, it is beneficial to reconstruct the 3D oral structure from the PX to avoid further expenses and radiation dose. In this paper, we propose 3DentAI - an U-Net based deep learning framework for 3D reconstruction of oral structure from a PX image. Our framework consists of three module - a reconstruction module based on attention U-Net for estimating depth from a PX image, a realignment module for aligning the predicted flattened volume to the shape of jaw using a predefined focal trough and ray data, and lastly a refinement module based on 3D U-Net for interpolating the missing information to obtain a smooth representation of oral cavity. Synthetic PXs obtained from CBCT by ray tracing and rendering were used to train the networks without the need of paired PX and CBCT datasets. Our method, trained and tested on a diverse datasets of 600 patients, achieved superior performance to GAN-based models even with low computational complexity.

Analysis of the Status of Natural Language Processing Technology Based on Deep Learning (딥러닝 중심의 자연어 처리 기술 현황 분석)

  • Park, Sang-Un
    • The Journal of Bigdata
    • /
    • v.6 no.1
    • /
    • pp.63-81
    • /
    • 2021
  • The performance of natural language processing is rapidly improving due to the recent development and application of machine learning and deep learning technologies, and as a result, the field of application is expanding. In particular, as the demand for analysis on unstructured text data increases, interest in NLP(Natural Language Processing) is also increasing. However, due to the complexity and difficulty of the natural language preprocessing process and machine learning and deep learning theories, there are still high barriers to the use of natural language processing. In this paper, for an overall understanding of NLP, by examining the main fields of NLP that are currently being actively researched and the current state of major technologies centered on machine learning and deep learning, We want to provide a foundation to understand and utilize NLP more easily. Therefore, we investigated the change of NLP in AI(artificial intelligence) through the changes of the taxonomy of AI technology. The main areas of NLP which consists of language model, text classification, text generation, document summarization, question answering and machine translation were explained with state of the art deep learning models. In addition, major deep learning models utilized in NLP were explained, and data sets and evaluation measures for performance evaluation were summarized. We hope researchers who want to utilize NLP for various purposes in their field be able to understand the overall technical status and the main technologies of NLP through this paper.

An Overloaded Vehicle Identifying System based on Object Detection Model (객체 인식 모델을 활용한 적재불량 화물차 탐지 시스템 개발)

  • Jung, Woojin;Park, Yongju;Park, Jinuk;Kim, Chang-il
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.10a
    • /
    • pp.562-565
    • /
    • 2022
  • Recently, the increasing number of overloaded vehicles on the road poses a risk to traffic safety, such as falling objects, road damage, and chain collisions due to the abnormal weight distribution, and can cause great damage once an accident occurs. However, this irregular weight distribution is not possible to be recognized with the current weight measurement system for vehicles on roads. To address this limitation, we propose to build an object detection-based AI model to identify overloaded vehicles that cause such social problems. In addition, we present a simple yet effective method to construct an object detection model for the large-scale vehicle images. In particular, we utilize the large-scale of vehicle image sets provided by open AI-Hub, which include the overloaded vehicles from the CCTV, black box, and hand-held camera point of view. We inspected the specific features of sizes of vehicles and types of image sources, and pre-processed these images to train a deep learning-based object detection model. Finally, we demonstrated that the detection performance of the overloaded vehicle was improved by about 23% compared to the one using raw data. From the result, we believe that public big data can be utilized more efficiently and applied to the development of an object detection-based overloaded vehicle detection model.

  • PDF

$\mathcal{K}o$-ATOMIC: Korean Commonsense Knowledge Graph ($\mathcal{K}o$-ATOMIC: 일반 상식 기반의 한국어 지식 그래프)

  • Jaewook Lee;Jaehyung Seo;Seungjun Lee;Chanjun Park;Aiyanyo Imatitikua Danielle;Heuiseok Lim
    • Annual Conference on Human and Language Technology
    • /
    • 2022.10a
    • /
    • pp.412-417
    • /
    • 2022
  • 일반 상식 기반의 지식 그래프는 대규모 코퍼스에 포함되어 있는 일반 상식을 그래프로 표현하여, 자연어 처리의 하위 작업들에 적용할 수 있도록 하는 구조화된 지식 표현 방법이다. 현재 가장 잘 알려진 일반 상식 기반의 지식 그래프로는 ATOMIC [1]이 있다. 하지만 한국어를 주요 언어로 하는 일반 상식 기반의 지식 그래프에 대한 연구는 아직 활발하지 않다. 따라서 본 연구에서는 기존에 존재하는 영어 기반의 지식 그래프와 일반 상식 기반의 한국어 데이터셋을 활용해서 한국어 일반 상식 기반 지식 그래프를 구축하는 방법론을 제시한다. 또한, 제작한 지식 그래프를 평가하여 구축하는 방법론에 대한 타당성을 검증한다.

  • PDF

Drone detection system using YOLO (YOLO를 이용한 드론탐지 시스템)

  • Shin, JunPyo;Kim, YuMin;Choi, KyuMin;Sung, SeungMin;Lee, ByungKwon
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2021.01a
    • /
    • pp.233-236
    • /
    • 2021
  • 본 논문에서는 국내 드론 사용량이 증가하고 있으나 드론을 제재하기 위한 수단과 AI를 활용한 드론 콘텐츠가 부족하다. 상기 문제점을 해결하기 위해 Darknet 과 YOLO_mark를 사용하여 디바이스를 학습시켜 손쉽게 드론 인식 및 구별을 할 수 있게 구현하였다. 이를 통해 기존 드론 제재 수단의 한계를 극복하고 손쉽게 이용할 수 있다. 나아가 본 논문을 이용하여 군◦경에서 드론 식별 등으로 활용할 수 있다.

  • PDF