• Title/Summary/Keyword: AI dataset

A Study on the Implementation and Performance Verification of DistilBERT in an Embedded System(Raspberry PI 5) Environment (임베디드 시스템(Raspberry PI 5) 환경에서의 DistilBERT 구현 및 성능 검증에 관한 연구)

  • Chae-woo Im;Eun-Ho Kim;Jang-Won Suh
    • Proceedings of the Korea Information Processing Society Conference / 2024.05a / pp.617-618 / 2024
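  • The core aim of this paper is to deploy and implement DistilBERT, a lightweight version of the BERT-base model introduced in prior work, in an embedded system (Raspberry PI 5) environment. We also compare the performance of the DistilBERT model and the BERT-base model deployed in the embedded system (Raspberry PI 5) environment. The dataset used for evaluation is SQuAD (Stanford Question Answering Dataset), a dataset for the question-answering task, and the evaluation metrics are the EM (Exact Match) score, the F1 score, and inference time. The experimental results demonstrate that lightweight models such as DistilBERT work well as on-device AI in environments such as embedded systems (Raspberry PI 5).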
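
To illustrate the EM and F1 metrics named in the abstract above, here is a minimal sketch of token-level scoring for a single predicted answer. The normalization steps (lowercasing, stripping punctuation and English articles) follow common SQuAD-style evaluation practice and are an assumption, not a detail taken from the paper; the function names are hypothetical.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Dataset-level scores would typically average these per-question values. Inference time could be measured separately by wrapping the model call with `time.perf_counter()`.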

Construction of a Standard Dataset for Liver Tumors for Testing the Performance and Safety of Artificial Intelligence-Based Clinical Decision Support Systems (인공지능 기반 임상의학 결정 지원 시스템 의료기기의 성능 및 안전성 검증을 위한 간 종양 표준 데이터셋 구축)

  • Seung-seob Kim;Dong Ho Lee;Min Woo Lee;So Yeon Kim;Jaeseung Shin;Jin‑Young Choi;Byoung Wook Choi
    • Journal of the Korean Society of Radiology / v.82 no.5 / pp.1196-1206 / 2021
  • Purpose: To construct a standard dataset of contrast-enhanced CT images of liver tumors to test the performance and safety of artificial intelligence (AI)-based algorithms for clinical decision support systems (CDSSs). Materials and Methods: A consensus group of medical experts in gastrointestinal radiology from four national tertiary institutions discussed the conditions to be included in a standard dataset. Seventy-five cases of hepatocellular carcinoma, 75 cases of metastasis, and 30-50 cases of benign lesions were retrieved from each institution, and the final dataset consisted of 300 cases of hepatocellular carcinoma, 300 cases of metastasis, and 183 cases of benign lesions. Only pathologically confirmed cases of hepatocellular carcinomas and metastases were enrolled. The medical experts retrieved the medical records of the patients and manually labeled the CT images. The CT images were saved as Digital Imaging and Communications in Medicine (DICOM) files. Results: The medical experts in gastrointestinal radiology constructed the standard dataset of contrast-enhanced CT images for 783 cases of liver tumors. The performance and safety of the AI algorithm can be evaluated by calculating the sensitivity and specificity for detecting and characterizing the lesions. Conclusion: The constructed standard dataset can be utilized for evaluating the machine-learning-based AI algorithm for CDSS.
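
The abstract above evaluates an algorithm by its sensitivity and specificity. As a minimal sketch of that calculation, assuming binary labels where 1 means "lesion present" and 0 means "lesion absent" (the label encoding and the function name are illustrative assumptions, not taken from the paper):

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of true lesions that were detected
    specificity = tn / (tn + fp)  # share of lesion-free cases correctly cleared
    return sensitivity, specificity
```

For example, with three positive and two negative cases, of which two positives and one negative are classified correctly, the function returns a sensitivity of 2/3 and a specificity of 1/2.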

Proposal of AI-based Graffiti Robot for Children disconnected from Peers with COVID-19 (코로나19로 또래와 단절된 아동을 위한 인공지능 낙서 로봇 제안)

  • Song, Ju-Yeon;Lee, Kang-Hee
    • Proceedings of the Korean Society of Computer Information Conference / 2020.07a / pp.29-31 / 2020
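  • This paper proposes Doodle Robot, an AI graffiti robot, to support the emotional development of children who have been cut off from their peers by the COVID-19 pandemic. Doodle Robot contributes to children's emotional development by acting as a drawing companion for children who have no siblings of a similar age. The object detection function was implemented using the YOLO algorithm, and the doodle data were extracted from the Quick, Draw! Dataset.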

Analysis of Copyright and Licensing Issues in Artificial Intelligence (인공지능에서 저작권과 라이선스 이슈 분석)

  • W.O. Ryoo;S.Y. Lee;S.I. Jung
    • Electronics and Telecommunications Trends / v.38 no.6 / pp.84-94 / 2023
  • Open source has many advantages and is widely used in various fields. However, legal disputes regarding copyright and licensing of datasets and learning models have recently arisen in artificial intelligence developments. We examine how datasets affect artificial intelligence learning and services from the perspective of copyrighting and licensing when datasets are used for training models. The licensing conditions of datasets can lead to copyright infringement and license violation, thus determining the scope of disclosure and commercialization of the trained model. In addition, we examine related legal issues.

Generating Extreme Close-up Shot Dataset Based On ROI Detection For Classifying Shots Using Artificial Neural Network (인공신경망을 이용한 샷 사이즈 분류를 위한 ROI 탐지 기반의 익스트림 클로즈업 샷 데이터 셋 생성)

  • Kang, Dongwann;Lim, Yang-mi
    • Journal of Broadcast Engineering / v.24 no.6 / pp.983-991 / 2019
  • This study aims to analyze movies, which contain various stories, according to the size of their shots. To achieve this, the dataset must be classified by shot size: extreme close-up shots, close-up shots, medium shots, full shots, and long shots. However, because typical video storytelling is mainly composed of close-up shots, medium shots, full shots, and long shots, it is not easy to construct an adequate dataset of extreme close-up shots. To solve this, we propose an image cropping method based on region of interest (ROI) detection. In this paper, we use face detection and saliency detection to estimate the ROI. By cropping the ROI of close-up images, we generate extreme close-up images. The dataset enriched by the proposed method is used to build a model that classifies shots by size. The study can help to analyze the emotional changes of characters in video stories and to predict how the composition of the story changes over time. If AI is used more actively in entertainment fields in the future, it is expected to affect the automatic adjustment and creation of characters, dialogue, and image editing.
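
The cropping step described in the abstract above can be sketched as follows, assuming the ROI has already been estimated elsewhere (for example by a face or saliency detector) and is given as a pixel box `(x, y, w, h)`. The nested-list image representation, the `margin` parameter, and the function name are assumptions made to keep the sketch dependency-free; they are not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # rows of pixel values; stands in for a real image array


def crop_to_roi(
    image: Sequence[Sequence[int]],
    box: Tuple[int, int, int, int],
    margin: float = 0.0,
) -> Image:
    """Crop `image` to the ROI `box` = (x, y, w, h), padded by `margin`.

    `margin` is a fraction of the box size added on each side; the crop is
    clamped to the image borders.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    x, y, w, h = box
    pad_x = int(round(w * margin))
    pad_y = int(round(h * margin))
    left = max(0, x - pad_x)
    top = max(0, y - pad_y)
    right = min(width, x + w + pad_x)
    bottom = min(height, y + h + pad_y)
    if right <= left or bottom <= top:
        raise ValueError("ROI does not overlap the image")
    return [list(row[left:right]) for row in image[top:bottom]]
```

Applying this to a close-up frame with a tight box around the detected face yields a tighter, extreme close-up style image; the result would normally be resized to the classifier's input size afterwards.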

Grade Analysis and Two-Stage Evaluation of Beef Carcass Image Using Deep Learning (딥러닝을 이용한 소도체 영상의 등급 분석 및 단계별 평가)

  • Kim, Kyung-Nam;Kim, Seon-Jong
    • The Journal of the Convergence on Culture Technology / v.8 no.2 / pp.385-391 / 2022
  • Quality evaluation of beef carcasses is an important issue in the livestock industry. Recently, with AI-based monitoring systems, quality managers can get help in making accurate decisions from the analysis of beef carcass images or the resulting information. The dataset used for such a system is an important factor in its performance. Existing datasets may differ in surface orientation or resolution. In this paper, we propose a two-stage classification model that can efficiently handle the grading of beef carcass images using deep learning. To overcome the problem of varying image conditions, a new dataset of 1,300 images was constructed. The recognition rate of the deep network for 5-grade classification using the new dataset was 72.5%. Two-stage evaluation is a method that increases reliability by exploiting the large difference between grades 1++ and 1+ on the one hand and grades 1, 2, and 3 on the other. In two experiments using the proposed two-stage model, recognition rates of 73.7% and 77.2% were obtained. The proposed method will thus be efficient given a dataset that yields a 100% recognition rate in the first stage.
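
A two-stage evaluation of the kind described in the abstract above might be organized as in the sketch below: a coarse model first separates the two grade groups, then a group-specific model picks the final grade. The group names `"high"` and `"low"`, the callable-based model interface, and the function name are hypothetical; the paper's actual networks are not specified here.

```python
from typing import Any, Callable, Dict

# Grade groups taken from the abstract: {1++, 1+} versus {1, 2, 3}.
GROUPS: Dict[str, tuple] = {"high": ("1++", "1+"), "low": ("1", "2", "3")}


def two_stage_classify(
    image: Any,
    coarse_model: Callable[[Any], str],
    fine_models: Dict[str, Callable[[Any], str]],
) -> str:
    """Stage 1 picks a grade group; stage 2 picks a grade within that group."""
    group = coarse_model(image)
    if group not in GROUPS:
        raise ValueError(f"unknown grade group: {group!r}")
    grade = fine_models[group](image)
    if grade not in GROUPS[group]:
        raise ValueError(f"grade {grade!r} is not in group {group!r}")
    return grade
```

Because the two groups are disjoint, a stage-1 mistake can never be repaired in stage 2, so the overall accuracy is the stage-1 accuracy multiplied by the stage-2 accuracy on correctly routed images. This is why the scheme pays off most when the first stage is close to perfect.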

Survival Time Prediction for Adenocarcinoma Lung Cancer based on Pathological Image Analysis (폐암 선암 생존시간 예측을 위한 병리학적 영상분석)

  • Vo, Vi Thi-Tuong;Kim, Aera;Lee, TaeBum;Kim, Soo-Hyung
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.779-782 / 2021
  • Survival time analysis is one of the main methods pathologists use to determine the prognosis of cancer patients. In this paper, we estimate the individual survival time of adenocarcinoma (ADC) lung cancer patients from pathological images by adopting a convolutional neural network called the SurvPatchV1 model. First, we extract tissue patches from the whole-slide images (WSIs) to deal with their extremely large dimensions. Then the survival time of each patch is estimated by the SurvPatchV1 model. Finally, the individual survival time of each patient is computed. The proposed method is trained and tested on a subset of the NLST dataset for ADC lung cancer. The results demonstrate that our model can use all tissue information in a whole pathological image, rather than only tumor information, to estimate the individual survival time.
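
The three steps in the abstract above (patch extraction, per-patch estimation, per-patient aggregation) can be sketched as below. The abstract does not say how patch estimates are combined, so the mean used here is an assumption, as are the nested-list image, the non-overlapping grid, and all function names; `patch_model` is a placeholder for a trained network such as SurvPatchV1.

```python
from statistics import mean
from typing import Callable, List, Optional, Sequence

Patch = List[List[float]]


def extract_patches(
    image: Sequence[Sequence[float]], patch_size: int, stride: Optional[int] = None
) -> List[Patch]:
    """Slide a square window over the image and collect every full patch."""
    stride = stride or patch_size  # default: non-overlapping grid
    height = len(image)
    width = len(image[0]) if height else 0
    patches = []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            rows = image[top:top + patch_size]
            patches.append([list(row[left:left + patch_size]) for row in rows])
    return patches


def patient_survival_time(
    image: Sequence[Sequence[float]],
    patch_model: Callable[[Patch], float],
    patch_size: int,
) -> float:
    """Average the per-patch estimates (aggregation by mean is an assumption)."""
    patches = extract_patches(image, patch_size)
    if not patches:
        raise ValueError("image is smaller than one patch")
    return mean(patch_model(patch) for patch in patches)
```

In practice a whole-slide image is far too large to hold as one in-memory grid, so patches would be read region by region from the slide file and background regions filtered out before scoring.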

Named Entity Detection Using Generative AI for Personal Information-Specific Named Entity Annotation Conversation Dataset (개인정보 특화 개체명 주석 대화 데이터셋 기반 생성AI 활용 개체명 탐지)

  • Yejee Kang;Li Fei;Yeonji Jang;Seoyoon Park;Hansaem Kim
    • Annual Conference on Human and Language Technology / 2023.10a / pp.499-504 / 2023
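  • In this study, amid a growing risk of leakage and misuse of sensitive personal information, we developed a named-entity scheme specialized for personal-information items in order to make personal-information detection more accurate and de-identification more efficient. We built 4,981 sets of conversation data annotated with a personal-information tagset and conducted personal-information named-entity detection experiments using generative AI models. For the experiments, we designed an optimal prompt and evaluated the detection results through few-shot learning. A comparison between our dataset and an English-based personal-information annotation dataset showed higher detection performance on our dataset for the unique identification number item, which demonstrates the need for and the quality of the dataset.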

Weather Recognition Based on 3C-CNN

  • Tan, Ling;Xuan, Dawei;Xia, Jingming;Wang, Chao
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.8 / pp.3567-3582 / 2020
  • Human activities are often affected by weather conditions. Automatic weather recognition is useful for traffic alerting, driving assistance, and intelligent traffic systems. With the rise of deep learning and AI, deep convolutional neural networks (CNNs) are used to identify weather conditions. In this paper, a three-channel convolutional neural network (3C-CNN) model is proposed on the basis of ResNet50. The model extracts global weather features from the whole image through the ResNet50 branch, and extracts sky and ground features from the top and bottom regions through two CNN5 branches. The global and local features are then merged by the Concat function. Finally, the weather image is classified by a Softmax classifier and the identification result is output. In addition, a medium-scale dataset of 6,185 outdoor weather images, named WeatherDataset-6, is established. 3C-CNN is trained and tested on both the Two-class Weather Images dataset and WeatherDataset-6. The experimental results show that 3C-CNN achieves the best results on both datasets, with average recognition accuracies of up to 94.35% and 95.81% respectively, outperforming other classic convolutional neural networks such as AlexNet, VGG16, and ResNet50. We expect that, with further improvement, our method can also work well for images taken at night.
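
The data flow in the abstract above (one global branch, two regional branches, concatenation, softmax) can be outlined as follows. This is only a structural sketch: the three branch callables stand in for the ResNet50 and CNN5 networks, the split at the vertical midpoint is an assumption (the abstract only says "top and bottom regions"), and all names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # rows of pixel intensities; stand-in for an image tensor
Branch = Callable[[Image], List[float]]  # maps an image region to a feature vector


def split_top_bottom(image: Image) -> Tuple[Image, Image]:
    """Split the image into a top (sky) and bottom (ground) region."""
    half = len(image) // 2  # midpoint split is an assumption
    return image[:half], image[half:]


def fuse_features(
    image: Image, global_branch: Branch, sky_branch: Branch, ground_branch: Branch
) -> List[float]:
    """Concatenate global features with sky and ground features."""
    top, bottom = split_top_bottom(image)
    return global_branch(image) + sky_branch(top) + ground_branch(bottom)


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn class scores into probabilities (shifted by the max for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

In a full model, the fused vector would pass through a learned fully connected layer to produce one score per weather class before the softmax is applied.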

Class Classification and Validation of a Musculoskeletal Risk Factor Dataset for Manufacturing Workers (제조업 노동자 근골격계 부담요인 데이터셋 클래스 분류와 유효성 검증)

  • Young-Jin Kang;;;Jeong, Seok Chan
    • The Journal of Bigdata / v.8 no.1 / pp.49-59 / 2023
  • There are various items in the safety and health standards of the manufacturing industry, but they can be divided into work-related diseases and musculoskeletal diseases according to the standards for sickness and accident victims. Musculoskeletal diseases occur frequently in manufacturing and can lead to lower labor productivity and weaker competitiveness. In this paper, to detect musculoskeletal risk factors for manufacturing workers, we defined the musculoskeletal load work factor analysis, harmful load working postures, and key point matching, and constructed data for artificial intelligence (AI) learning. To check the effectiveness of the proposed dataset, AI algorithms such as YOLO, Lite-HRNet, and EfficientNet were trained and verified on it. Our experimental results show a human detection accuracy of 99%, a key point matching accuracy for the detected person of 88% AP@0.5, and working posture evaluation accuracies, obtained by integrating the inferred matching positions, of 72.2% for LEGS, 85.7% for NECK, 81.9% for TRUNK, 79.8% for UPPERARM, and 92.7% for LOWERARM. We also consider the need for deep learning-based research that can help prevent musculoskeletal diseases.