• Title/Summary/Keyword: dataset type (데이터셋 유형)

Search Result 75, Processing Time 0.02 seconds

A Study on the Scholarly Information and Data Requirements of Researchers for Data-Driven Research and Development (데이터 기반 R&D 지원을 위한 연구자의 학술정보 및 데이터 요구 분석 연구)

  • Seok-Hyoung Lee;Kangsandajung Lee;Jayhoon Kim;Hyejin Lee
    • Journal of the Korean Society for Library and Information Science / v.58 no.1 / pp.255-283 / 2024
  • As preliminary research toward effectively supporting researchers' data-driven R&D, this study analyzed researchers' academic information and data requirements in order to discover new types of academic information and datasets and to propose directions for academic information services. To this end, we conducted an exploratory case study involving five researchers and administered an online survey among ScienceON users to gain insight into data-driven R&D behaviors and information and data requirements. The results show that researchers tended to refer to academic papers, and to dataset and software information found in academic papers or conference materials. Moreover, the methods and pathways for acquiring data, as well as the types of data, varied across subject areas. Researchers often faced challenges in data-driven R&D because of difficulties in locating and accessing necessary datasets or software such as learning models. The analysis therefore suggests that future support for data-driven R&D requires systematically constructing datasets by subject. It is also considered necessary to extract and summarize dataset and related software information in conjunction with academic papers.

The Automated Scoring of Kinematics Graph Answers through the Design and Application of a Convolutional Neural Network-Based Scoring Model (합성곱 신경망 기반 채점 모델 설계 및 적용을 통한 운동학 그래프 답안 자동 채점)

  • Jae-Sang Han;Hyun-Joo Kim
    • Journal of The Korean Association For Science Education / v.43 no.3 / pp.237-251 / 2023
  • This study explores the possibility of automated scoring for scientific graph answers by designing an automated scoring model using convolutional neural networks and applying it to students' kinematics graph answers. The researchers prepared 2,200 answers, which were divided into 2,000 training data and 200 validation data. Additionally, 202 student answers were divided into 100 training data and 102 test data. First, in designing the automated scoring model and validating its performance, the model was optimized for graph image classification using the answer dataset prepared by the researchers. Next, the model was trained using various types of training datasets and used to score the student test dataset. The performance of the model improved as the training data increased in amount and diversity. Finally, compared to human scoring, the accuracy was 97.06%, the kappa coefficient was 0.957, and the weighted kappa coefficient was 0.968. On the other hand, for answer types that were not included in the training data, scoring was almost identical among human scorers; however, the automated scoring model performed inaccurately.
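
The agreement measures reported in this abstract (accuracy, kappa, weighted kappa) can be illustrated with a minimal sketch. The score lists below are invented for illustration and are not the study's data, and the linear disagreement weights are an assumption, since the abstract does not say which weighting scheme was used.

```python
from collections import Counter


def accuracy(a, b):
    """Fraction of items on which the two raters give the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = accuracy(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the product of the raters' marginal frequencies.
    p_e = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def weighted_kappa(a, b, categories):
    """Weighted kappa with linear weights (assumed), for ordered categories."""
    idx = {c: i for i, c in enumerate(categories)}
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Mean observed disagreement, measured as distance between categories.
    observed = sum(abs(idx[x] - idx[y]) for x, y in zip(a, b)) / n
    # Mean disagreement expected if the two raters scored independently.
    expected = sum(
        abs(idx[x] - idx[y]) * count_a[x] * count_b[y]
        for x in categories
        for y in categories
    ) / (n * n)
    return 1 - observed / expected


# Hypothetical scores on a 0-2 scale (illustrative only).
human = [0, 1, 2, 2, 1, 0, 2, 1]
model = [0, 1, 2, 1, 1, 0, 2, 2]

print(accuracy(human, model))
print(cohen_kappa(human, model))
print(weighted_kappa(human, model, categories=[0, 1, 2]))
```

Weighted kappa penalizes a disagreement of two score levels more than a disagreement of one level, which is why it can differ from the unweighted value for the same data.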

Detection of Car Hacking Using One Class Classifier (단일 클래스 분류기를 사용한 차량 해킹 탐지)

  • Seo, Jae-Hyun
    • Journal of the Korea Convergence Society / v.9 no.6 / pp.33-38 / 2018
  • In this study, we attempt to detect new attacks on vehicles by learning only one class. We use the Car-Hacking dataset, an intrusion detection dataset used to evaluate classification performance. The dataset was created by logging CAN (Controller Area Network) traffic through the OBD-II port of a real vehicle, and it contains four attack types. One-class classification is an unsupervised learning method that identifies the attack class by learning only the normal class. With unsupervised learning, it is difficult to achieve high efficiency because negative instances are not used for training. However, unsupervised learning has the advantage of being able to classify unlabeled data, such as new attacks. In this study, we use a one-class classifier to detect new attacks that are difficult to detect with the signature-based rules of a network intrusion detection system. The proposed method suggests a combination of parameters that detects all new attacks and shows efficient classification performance on the normal dataset.
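
The idea of learning only the normal class can be sketched as follows. This is not the classifier used in the paper: it swaps in a simple per-feature z-score envelope, and the class name `ZScoreOneClass`, the two features, and the threshold of 3.0 are all hypothetical choices made for illustration.

```python
import statistics


class ZScoreOneClass:
    """Toy one-class detector: fit on normal rows only, flag outliers."""

    def __init__(self, threshold=3.0):
        # Hypothetical cut-off: deviations beyond 3 standard deviations.
        self.threshold = threshold
        self.means = []
        self.stdevs = []

    def fit(self, normal_rows):
        # Learn per-feature mean and spread from normal traffic only.
        columns = list(zip(*normal_rows))
        self.means = [statistics.mean(c) for c in columns]
        # Guard against zero spread so the division below stays defined.
        self.stdevs = [statistics.pstdev(c) or 1e-9 for c in columns]
        return self

    def is_anomaly(self, row):
        # A row is anomalous if any feature leaves the learned envelope.
        return any(
            abs(x - m) / s > self.threshold
            for x, m, s in zip(row, self.means, self.stdevs)
        )


# Invented "normal" samples: (message interval in ms, normalized payload value).
normal = [(10.0, 0.50), (10.2, 0.52), (9.8, 0.48), (10.1, 0.51), (9.9, 0.49)]
detector = ZScoreOneClass().fit(normal)

print(detector.is_anomaly((10.05, 0.505)))  # close to the normal samples
print(detector.is_anomaly((2.0, 0.50)))     # interval far outside the normal range
```

Because no attack samples are used in `fit`, the detector can flag attack types it has never seen, at the cost of depending heavily on how the threshold is chosen.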

Automatic Classification of Department Types and Analysis of Co-Authorship Network: Focusing on Korean Journals in the Computer Field

  • Byungkyu Kim;Beom-Jong You;Min-Woo Park
    • Journal of the Korea Society of Computer and Information / v.28 no.4 / pp.53-63 / 2023
  • Department information is highly useful in bibliometric analysis of scientific and technological literature. In this paper, we built a department information dataset by screening, refining, and classifying the department types of university-affiliated authors appearing in science and technology journals published in Korea, and developed a deep learning-based automatic classification model using this dataset as training and validation data. In addition, we analyzed the co-authorship structure and network in the field of computer science using the department information dataset and the affiliation information of authors from domestic academic journals. The automatic classification model using Korean department information achieved 98.6% accuracy. Moreover, the co-authorship patterns of Korean researchers in the computer science and engineering field, along with the characteristics and centralities of the co-author network by institution type, region, institution, and department type, were identified in detail and presented visually on a map.

Automatic Change Detection Based on Areal Feature Matching in Different Network Data-sets (이종의 도로망 데이터 셋에서 면 객체 매칭 기반 변화탐지)

  • Kim, Jiyoung;Huh, Yong;Yu, Kiyun;Kim, Jung Ok
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.31 no.6_1 / pp.483-491 / 2013
  • With the development of car navigation systems and mobile and positioning technology, interest in location-based services, especially pedestrian navigation systems, is increasing. Updating digital maps is important because they are large datasets that require a short update cycle. In this paper, we propose a change detection method for different network data-sets based on areal feature matching. Prior to change detection, we defined the types of updating between different network data-sets. Next, we transformed road lines into the areal features (blocks) that they enclose and calculated the shape similarity between blocks in the different data-sets. Blocks with a shape similarity greater than 0.6 were selected as candidate block pairs. We then detected changed-block pairs by bipartite graph clustering or by the properties of a concave polygon, according to the type of updating, and calculated the Fréchet distance between segments within the block or forming it. Road segments of the KAIS map with a Fréchet distance greater than 50 were extracted as updating road features. In the accuracy evaluation, the detection rate was high, at 0.965. We thus confirmed that the proposed method can be applied to change detection between different network data-sets.
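
The Fréchet distance check described in this abstract can be sketched with the discrete Fréchet distance, a common polyline approximation computed by dynamic programming. Whether the paper uses the discrete or the continuous variant is not stated, so this is an assumption; the coordinates are invented, and the threshold of 50 is taken from the abstract without a stated unit.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two polylines given as point lists."""
    n, m = len(p), len(q)
    # ca[i][j] holds the coupling distance for prefixes p[:i+1] and q[:j+1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Advance along p, along q, or along both; keep the best option.
                best_previous = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_previous, d)
    return ca[n - 1][m - 1]


# Invented road segments: two parallel lines 60 units apart.
segment_a = [(0, 0), (100, 0)]
segment_b = [(0, 60), (100, 60)]

distance = discrete_frechet(segment_a, segment_b)
needs_update = distance > 50  # threshold quoted in the abstract
print(distance, needs_update)
```

The discrete variant only considers the polyline vertices, so densifying the vertices brings its value closer to the continuous Fréchet distance.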

Attack Datasets for ROS Intrusion Detection Systems (ROS 침입 탐지 시스템을 위한 공격 데이터셋 구축)

  • Hyunghoon Kim;Seungmin Lee;Jaewoong Heo;Hyo Jin Jo
    • Journal of the Korea Institute of Information Security & Cryptology / v.34 no.4 / pp.681-691 / 2024
  • In recent decades, research and development in the field of industrial robotics, such as unmanned ground vehicles (UGVs) and unmanned aerial vehicles (UAVs), has made significant progress. In these advancements, it is important to use middleware, which facilitates communication and data management between different applications, and various industrial communication middleware protocols have been released. Among these communication middleware protocols, the robot operating system (ROS) is the most widely adopted as the main platform for robot system development. However, the ROS is known to be vulnerable to various cyber attacks, such as eavesdropping on communications and injecting malicious messages, because it was initially designed without security considerations. In response, numerous studies have proposed countermeasures to ROS vulnerabilities. In particular, some work has addressed generating ROS datasets for intrusion detection systems (IDSs), but research in this area remains limited. In this paper, to help improve the performance of ROS IDSs, we propose a new type of attack scenario that can occur in the ROS, build ROS attack datasets collected from a real robot system, and make them available as open datasets.

Activity Type Detection Of Random Forest Model Using UWB Radar And Indoor Environmental Measurement Sensor (UWB 레이더와 실내 환경 측정 센서를 이용한 랜덤 포레스트 모델의 재실활동 유형 감지)

  • Park, Jin Su;Jeong, Ji Seong;Yang, Chul Seung;Lee, Jeong Gi
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.899-904 / 2022
  • As the world becomes an aging society due to a decrease in the birth rate and an increase in life expectancy, a system for managing the health of the elderly population is needed. To this end, various studies on occupancy and activity types are being conducted for smart home care services that support indoor health management. In this paper, we propose a random forest model that classifies activity type as well as occupancy status from indoor temperature and humidity, CO2, fine dust values, and UWB radar positioning for a smart home care service. The experiment measures indoor environment and occupant positioning data at 2-second intervals using two UWB radars and three sensors that measure indoor temperature and humidity, CO2, and fine dust. After outliers and missing values are corrected, the measured data is divided into 80% training data and 20% test data, and the random forest model is applied to evaluate the list of important variables, accuracy, sensitivity, and specificity.
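
The 80/20 split and the reported evaluation measures can be sketched as below. The random forest itself is omitted; the helper names `train_test_split` and `binary_metrics` are hypothetical, the labels are invented, and the binary occupied/unoccupied framing is an assumption, since the paper also classifies activity types.

```python
import random


def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle rows reproducibly and split them into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity from true and predicted labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of occupied samples detected
        "specificity": tn / (tn + fp),  # share of unoccupied samples detected
    }


train, test = train_test_split(range(100))
print(len(train), len(test))

# Invented labels: 1 = occupied, 0 = unoccupied.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```

For a multi-class activity label, sensitivity and specificity are usually computed per class by treating each class in turn as the positive one.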

Q-Net : Machine Reading Comprehension adding Question Type (Q-Net : 질문 유형을 추가한 기계 독해)

  • Kim, Jeong-Moo;Shin, Chang-Uk;Cha, Jeong-Won
    • Annual Conference on Human and Language Technology / 2018.10a / pp.645-648 / 2018
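  • Machine reading comprehension is the task in which a machine understands a given passage and finds the answer to a question within that passage. This paper designs a model that adds question types to help with answer selection. We define eight question types (Person, Location, Date, Number, Why, How, What, and Others) and design the model so that attention operates between these types and the key features of the passage. To evaluate the proposed method, we ran experiments on a Korean translation of SQuAD and on K-QuAD, a dataset built from Korean Wikipedia. With partial matches accepted, the proposed model achieved EM 84.650% and F1 86.208%, outperforming the BiDAF model used in the experiments of the paper that proposed K-QuAD.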
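
The EM and F1 measures quoted in this abstract can be illustrated with a SQuAD-style sketch. It assumes plain whitespace tokenization and no text normalization, which is simpler than the official SQuAD evaluation and may differ from how the paper scored Korean answers; the example strings are invented.

```python
from collections import Counter


def exact_match(prediction, gold):
    """1.0 if the predicted answer equals the gold answer exactly, else 0.0."""
    return float(prediction.strip() == gold.strip())


def token_f1(prediction, gold):
    """Token-overlap F1 between a predicted answer and the gold answer."""
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    # Multiset intersection counts each shared token at most as often as it occurs.
    common = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("red car", "red car"))
print(exact_match("the red car", "red car"))
print(token_f1("the red car", "red car"))
```

F1 gives partial credit when the predicted span overlaps the gold span, which is why it is typically at least as high as EM on the same predictions.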


Classification and analysis of error types for deep learning-based Korean spelling correction (딥러닝 기반 한국어 맞춤법 교정을 위한 오류 유형 분류 및 분석)

  • Koo, Seonmin;Park, Chanjun;So, Aram;Lim, Heuiseok
    • Journal of the Korea Convergence Society / v.12 no.12 / pp.65-74 / 2021
  • Recently, studies on Korean spelling correction have been actively conducted based on machine translation and automatic noise generation. These methods generate noise and use it to build the training and test sets. This has a limitation: it is difficult to measure performance accurately because the test set is unlikely to contain noise other than the noise used for training. In addition, there is no practical standard for error types, so the error types used differ from study to study, which makes qualitative analysis difficult. This paper proposes a new error type classification for deep learning-based Korean spelling correction research and performs error analysis on existing commercial Korean spelling correctors (System A, B, C). The analysis found that the three correction systems did not perform well on the error types presented in this paper other than spacing, and hardly recognized errors in word order or tense.

A Study on Automatic Classification of Subject Headings Using BERT Model (BERT 모형을 이용한 주제명 자동 분류 연구)

  • Yong-Gu Lee
    • Journal of the Korean Society for Library and Information Science / v.57 no.2 / pp.435-452 / 2023
  • This study experimented with automatic classification of subject headings using a BERT-based transfer learning model and analyzed its performance by the main class of the KDC classification and by the category type of the subject headings. Six datasets were constructed from Korean national bibliographies based on how frequently subject headings were assigned, and titles were used as classification features. As a result, classification performance reached 0.6059 and 0.5626 on the micro F1 and macro F1 scores, respectively, on the dataset (1,539,076 records) containing 3,506 subject headings. By KDC main class, performance was good in the classes General works, Natural science, Technology, and Language, and low in Religion and Arts. By category type of the subject headings, the categories of plant, legal name, and product name showed high performance, whereas the national treasure/treasure category showed low performance. In a large dataset, the ratio of subject headings that cannot be assigned increases, which lowers final performance, and improvement is needed to raise classification performance for low-frequency subject headings.
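
The gap between the micro F1 and macro F1 scores reported here can be illustrated with a minimal sketch. It assumes single-label classification, which may not match the paper's setup, and the labels are invented to show how rare classes pull the macro average down.

```python
from collections import Counter


def micro_macro_f1(y_true, y_pred):
    """Return (micro F1, macro F1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed

    def f1(t, p, n):
        denominator = 2 * t + p + n
        return 2 * t / denominator if denominator else 0.0

    # Micro: pool the counts over all labels, so frequent labels dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: average per-label F1, so every label counts equally.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Invented labels: one frequent heading "a" and two rare headings "b", "c".
y_true = ["a", "a", "a", "a", "b", "c"]
y_pred = ["a", "a", "a", "a", "c", "b"]
print(micro_macro_f1(y_true, y_pred))
```

A macro score below the micro score, as in the abstract, is consistent with weaker performance on low-frequency subject headings.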