• Title/Summary/Keyword: Dataset for AI

Search Result 201, Processing Time 0.033 seconds

Dataset for Interactive Recommendation System (인터랙션 기반 추천 시스템 개발을 위한 데이터셋 연구)

  • Chung, Euisok;Kim, Hyun Woo;Oh, Hyo-Jung;Song, Hwa Jeon
    • Annual Conference on Human and Language Technology
    • /
    • 2020.10a
    • /
    • pp.481-485
    • /
    • 2020
  • AI와 사용자간의 대화를 통해 사용자의 요구사항을 파악하고, 해당 요구사항에 적합한 상품을 추천하는 형상을 인터랙션 기반 추천 시스템의 한 예로 볼 수 있다. 우리는 해당 시스템 개발을 위하여 의상 코디셋 추천을 위한 대화 기반 데이터셋을 구축하였다. 데이터셋은 대화와 의상 추천 절차를 반복하여 사용자가 원하는 의상셋을 찾아가는 내용으로 구성된다. 그리고, AI의 코디셋 추천 기술 검증을 위해 두가지 의상 추천 평가셋을 제안한다. 본 논문은 대화 데이터셋 및 관련 평가셋의 개발 절차 및 구성에 대하여 기술하고, 관련된 실험 결과 일부를 보여준다.

  • PDF

Axial load prediction in double-skinned profiled steel composite walls using machine learning

  • G., Muthumari G;P. Vincent
    • Computers and Concrete
    • /
    • v.33 no.6
    • /
    • pp.739-754
    • /
    • 2024
  • This study presents an innovative AI-driven approach to assess the ultimate axial load in Double-Skinned Profiled Steel sheet Composite Walls (DPSCWs). Utilizing a dataset of 80 entries, seven input parameters were employed, and various AI techniques, including Linear Regression, Polynomial Regression, Support Vector Regression, Decision Tree Regression, Decision Tree with AdaBoost Regression, Random Forest Regression, Gradient Boost Regression Tree, Elastic Net Regression, Ridge Regression, and LASSO Regression, were evaluated. Decision Tree Regression and Random Forest Regression emerged as the most accurate models. The top three performing models were integrated into a hybrid approach, excelling in accurately estimating DPSCWs' ultimate axial load. This adaptable hybrid model outperforms traditional methods, reducing errors in complex scenarios. The validated Artificial Neural Network (ANN) model showcases less than 1% error, enhancing reliability. Correlation analysis highlights robust predictions, emphasizing the importance of steel sheet thickness. The study contributes insights for predicting DPSCW strength in civil engineering, suggesting optimization and database expansion. The research advances precise load capacity estimation, empowering engineers to enhance construction safety and explore further machine learning applications in structural engineering.

Generating Extreme Close-up Shot Dataset Based On ROI Detection For Classifying Shots Using Artificial Neural Network (인공신경망을 이용한 샷 사이즈 분류를 위한 ROI 탐지 기반의 익스트림 클로즈업 샷 데이터 셋 생성)

  • Kang, Dongwann;Lim, Yang-mi
    • Journal of Broadcast Engineering
    • /
    • v.24 no.6
    • /
    • pp.983-991
    • /
    • 2019
  • This study aims to analyze movies which contain various stories according to the size of their shots. To achieve this, it is needed to classify dataset according to the shot size, such as extreme close-up shots, close-up shots, medium shots, full shots, and long shots. However, a typical video storytelling is mainly composed of close-up shots, medium shots, full shots, and long shots, it is not an easy task to construct an appropriate dataset for extreme close-up shots. To solve this, we propose an image cropping method based on the region of interest (ROI) detection. In this paper, we use the face detection and saliency detection to estimate the ROI. By cropping the ROI of close-up images, we generate extreme close-up images. The dataset which is enriched by proposed method is utilized to construct a model for classifying shots based on its size. The study can help to analyze the emotional changes of characters in video stories and to predict how the composition of the story changes over time. If AI is used more actively in the future in entertainment fields, it is expected to affect the automatic adjustment and creation of characters, dialogue, and image editing.

Named Entity Detection Using Generative Al for Personal Information-Specific Named Entity Annotation Conversation Dataset (개인정보 특화 개체명 주석 대화 데이터셋 기반 생성AI 활용 개체명 탐지)

  • Yejee Kang;Li Fei;Yeonji Jang;Seoyoon Park;Hansaem Kim
    • Annual Conference on Human and Language Technology
    • /
    • 2023.10a
    • /
    • pp.499-504
    • /
    • 2023
  • 본 연구에서는 민감한 개인정보의 유출과 남용 위험이 높아지고 있는 상황에서 정확한 개인정보 탐지 및 비식별화의 효율을 높이기 위해 개인정보 항목에 특화된 개체명 체계를 개발하였다. 개인정보 태그셋이 주석된 대화 데이터 4,981세트를 구축하고, 생성 AI 모델을 활용하여 개인정보 개체명 탐지 실험을 수행하였다. 실험을 위해 최적의 프롬프트를 설계하여 퓨샷러닝(few-shot learning)을 통해 탐지 결과를 평가하였다. 구축한 데이터셋과 영어 기반의 개인정보 주석 데이터셋을 비교 분석한 결과 고유식별번호 항목에 대해 본 연구에서 구축한 데이터셋에서 더 높은 탐지 성능이 나타났으며, 이를 통해 데이터셋의 필요성과 우수성을 입증하였다.

  • PDF

A Study on Dataset Generation Method for Korean Language Information Extraction from Generative Large Language Model and Prompt Engineering (생성형 대규모 언어 모델과 프롬프트 엔지니어링을 통한 한국어 텍스트 기반 정보 추출 데이터셋 구축 방법)

  • Jeong Young Sang;Ji Seung Hyun;Kwon Da Rong Sae
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.11
    • /
    • pp.481-492
    • /
    • 2023
  • This study explores how to build a Korean dataset to extract information from text using generative large language models. In modern society, mixed information circulates rapidly, and effectively categorizing and extracting it is crucial to the decision-making process. However, there is still a lack of Korean datasets for training. To overcome this, this study attempts to extract information using text-based zero-shot learning using a generative large language model to build a purposeful Korean dataset. In this study, the language model is instructed to output the desired result through prompt engineering in the form of "system"-"instruction"-"source input"-"output format", and the dataset is built by utilizing the in-context learning characteristics of the language model through input sentences. We validate our approach by comparing the generated dataset with the existing benchmark dataset, and achieve 25.47% higher performance compared to the KLUE-RoBERTa-large model for the relation information extraction task. The results of this study are expected to contribute to AI research by showing the feasibility of extracting knowledge elements from Korean text. Furthermore, this methodology can be utilized for various fields and purposes, and has potential for building various Korean datasets.

Class Classification and Validation of a Musculoskeletal Risk Factor Dataset for Manufacturing Workers (제조업 노동자 근골격계 부담요인 데이터셋 클래스 분류와 유효성 검증)

  • Young-Jin Kang;;;Jeong, Seok Chan
    • The Journal of Bigdata
    • /
    • v.8 no.1
    • /
    • pp.49-59
    • /
    • 2023
  • There are various items in the safety and health standards of the manufacturing industry, but they can be divided into work-related diseases and musculoskeletal diseases according to the standards for sickness and accident victims. Musculoskeletal diseases occur frequently in manufacturing and can lead to a decrease in labor productivity and a weakening of competitiveness in manufacturing. In this paper, to detect the musculoskeletal harmful factors of manufacturing workers, we defined the musculoskeletal load work factor analysis, harmful load working postures, and key points matching, and constructed data for Artificial Intelligence(AI) learning. To check the effectiveness of the suggested dataset, AI algorithms such as YOLO, Lite-HRNet, and EfficientNet were used to train and verify. Our experimental results the human detection accuracy is 99%, the key points matching accuracy of the detected person is @AP0.5 88%, and the accuracy of working postures evaluation by integrating the inferred matching positions is LEGS 72.2%, NECT 85.7%, TRUNK 81.9%, UPPERARM 79.8%, and LOWERARM 92.7%, and considered the necessity for research that can prevent deep learning-based musculoskeletal diseases.

Weather Recognition Based on 3C-CNN

  • Tan, Ling;Xuan, Dawei;Xia, Jingming;Wang, Chao
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.8
    • /
    • pp.3567-3582
    • /
    • 2020
  • Human activities are often affected by weather conditions. Automatic weather recognition is meaningful to traffic alerting, driving assistance, and intelligent traffic. With the boost of deep learning and AI, deep convolutional neural networks (CNN) are utilized to identify weather situations. In this paper, a three-channel convolutional neural network (3C-CNN) model is proposed on the basis of ResNet50.The model extracts global weather features from the whole image through the ResNet50 branch, and extracts the sky and ground features from the top and bottom regions by two CNN5 branches. Then the global features and the local features are merged by the Concat function. Finally, the weather image is classified by Softmax classifier and the identification result is output. In addition, a medium-scale dataset containing 6,185 outdoor weather images named WeatherDataset-6 is established. 3C-CNN is used to train and test both on the Two-class Weather Images and WeatherDataset-6. The experimental results show that 3C-CNN achieves best on both datasets, with the average recognition accuracy up to 94.35% and 95.81% respectively, which is superior to other classic convolutional neural networks such as AlexNet, VGG16, and ResNet50. It is prospected that our method can also work well for images taken at night with further improvement.

A Study on Radar Video Fusion Systems for Pedestrian and Vehicle Detection (보행자 및 차량 검지를 위한 레이더 영상 융복합 시스템 연구)

  • Sung-Youn Cho;Yeo-Hwan Yoon
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.24 no.1
    • /
    • pp.197-205
    • /
    • 2024
  • Development of AI and big data-based algorithms to advance and optimize the recognition and detection performance of various static/dynamic vehicles in front and around the vehicle at a time when securing driving safety is the most important point in the development and commercialization of autonomous vehicles. etc. are being studied. However, there are many research cases for recognizing the same vehicle by using the unique advantages of radar and camera, but deep learning image processing technology is not used, or only a short distance is detected as the same target due to radar performance problems. Therefore, there is a need for a convergence-based vehicle recognition method that configures a dataset that can be collected from radar equipment and camera equipment, calculates the error of the dataset, and recognizes it as the same target. In this paper, we aim to develop a technology that can link location information according to the installation location because data errors occur because it is judged as the same object depending on the installation location of the radar and CCTV (video).

Grade Analysis and Two-Stage Evaluation of Beef Carcass Image Using Deep Learning (딥러닝을 이용한 소도체 영상의 등급 분석 및 단계별 평가)

  • Kim, Kyung-Nam;Kim, Seon-Jong
    • The Journal of the Convergence on Culture Technology
    • /
    • v.8 no.2
    • /
    • pp.385-391
    • /
    • 2022
  • Quality evaluation of beef carcasses is an important issue in the livestock industry. Recently, through the AI monitor system based on artificial intelligence, the quality manager can receive help in making accurate decisions based on the analysis of beef carcass images or result information. This artificial intelligence dataset is an important factor in judging performance. Existing datasets may have different surface orientation or resolution. In this paper, we proposed a two-stage classification model that can efficiently manage the grades of beef carcass image using deep learning. And to overcome the problem of the various conditions of the image, a new dataset of 1,300 images was constructed. The recognition rate of deep network for 5-grade classification using the new dataset was 72.5%. Two-stage evaluation is a method to increase reliability by taking advantage of the large difference between grades 1++, 1+, and grades 1 and 2 and 3. With two experiments using the proposed two stage model, the recognition rates of 73.7% and 77.2% were obtained. As this, The proposed method will be an efficient method if we have a dataset with 100% recognition rate in the first stage.

A Study on Detection of Malicious Android Apps based on LSTM and Information Gain (LSTM 및 정보이득 기반의 악성 안드로이드 앱 탐지연구)

  • Ahn, Yulim;Hong, Seungah;Kim, Jiyeon;Choi, Eunjung
    • Journal of Korea Multimedia Society
    • /
    • v.23 no.5
    • /
    • pp.641-649
    • /
    • 2020
  • As the usage of mobile devices extremely increases, malicious mobile apps(applications) that target mobile users are also increasing. It is challenging to detect these malicious apps using traditional malware detection techniques due to intelligence of today's attack mechanisms. Deep learning (DL) is an alternative technique of traditional signature and rule-based anomaly detection techniques and thus have actively been used in numerous recent studies on malware detection. In order to develop DL-based defense mechanisms against intelligent malicious apps, feeding recent datasets into DL models is important. In this paper, we develop a DL-based model for detecting intelligent malicious apps using KU-CISC 2018-Android, the most up-to-date dataset consisting of benign and malicious Android apps. This dataset has hardly been addressed in other studies so far. We extract OPcode sequences from the Android apps and preprocess the OPcode sequences using an N-gram model. We then feed the preprocessed data into LSTM and apply the concept of Information Gain to improve performance of detecting malicious apps. Furthermore, we evaluate our model with numerous scenarios in order to verify the model's design and performance.