• Title/Summary/Keyword: Dataset Creation

Manchu Script Letters Dataset Creation and Labeling

  • Aaron Daniel Snowberger;Choong Ho Lee
    • Journal of information and communication convergence engineering / v.22 no.1 / pp.80-87 / 2024
  • The Manchu language holds historical significance, but a complete dataset of Manchu script letters for training optical character recognition machine-learning models is currently unavailable. Therefore, this paper describes the process of creating a robust dataset of extracted Manchu script letters. Rather than performing automatic letter segmentation based on whitespace or the thickness of the central word stem, an image of the Manchu script was manually inspected, and one copy of the desired letter was selected as a region of interest. This selected region of interest was used as a template to match all other occurrences of the same letter within the Manchu script image. Although the dataset in this study contained only 4,000 images of five Manchu script letters, these letters were collected from twenty-eight writing styles. A full dataset of Manchu letters is expected to be obtained through this process. The collected dataset was normalized and trained using a simple convolutional neural network to verify its effectiveness.
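The template-matching step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a tiny binary image stored as nested lists, uses an exhaustive sum-of-squared-differences comparison, and the names `match_template`, `page`, and `roi` are hypothetical.

```python
def match_template(image, template, max_ssd=0):
    """Return the (row, col) of every window whose sum of squared
    differences (SSD) from the template is at most max_ssd."""
    image_h, image_w = len(image), len(image[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    hits = []
    # Slide the template over every position where it fits entirely.
    for r in range(image_h - tmpl_h + 1):
        for c in range(image_w - tmpl_w + 1):
            ssd = sum(
                (image[r + dr][c + dc] - template[dr][dc]) ** 2
                for dr in range(tmpl_h)
                for dc in range(tmpl_w)
            )
            if ssd <= max_ssd:
                hits.append((r, c))
    return hits


# Toy binary "script image" (assumed data): 1 = ink, 0 = background.
page = [
    [0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

# Manually selected region of interest: rows 0-1, columns 0-1.
roi = [row[0:2] for row in page[0:2]]

# Every occurrence of the selected shape, including the selection itself.
matches = match_template(page, roi)
```

Raising `max_ssd` above zero would tolerate small differences between handwriting samples, at the cost of more false matches.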

Development of a Risk Index for Prediction of Abnormal Pap Test Results in Serbia

  • Vukovic, Dejana;Antic, Ljiljana;Vasiljevic, Mladenko;Antic, Dragan;Matejic, Bojana
    • Asian Pacific Journal of Cancer Prevention / v.16 no.8 / pp.3527-3531 / 2015
  • Background: Serbia is one of the countries with the highest incidence and mortality rates for cervical cancer in Central and South Eastern Europe. Introducing a risk index could provide a powerful means of targeting groups with a high likelihood of having an abnormal cervical smear and could increase the efficiency of screening. The aim of the present study was to create and assess the validity of an index for prediction of an abnormal Pap test result. Materials and Methods: The study population was drawn from patients attending Departments for Women's Health in two primary health care centers in Serbia. Of 525 respondents, 350 were randomly selected, and the data obtained from them were used as the index-creation dataset. Data obtained from the remaining 175 were used as the index-validation dataset. Results: Age at first intercourse under 18, more than 4 sexual partners, history of STD, and multiparity were assigned statistical weights of 16, 15, 14, and 13, respectively. The distribution of index scores in the index-creation dataset showed that most respondents had a score of 0 (54.9%). In the index-creation dataset, the mean index score was 10.3 (SD=13.8), and in the validation dataset the mean was 9.1 (SD=13.2). Conclusions: The advantage of such a scoring system is that it is simple, consisting of only four elements, so it could be applied to identify women at high risk for cervical cancer who would be referred for further examination.
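The four weights reported in this abstract suggest a simple scoring rule, sketched below. The abstract does not state how the weights are combined; the additive sum here is an assumption, and the factor names and the `risk_index` function are hypothetical.

```python
# Weights as reported in the abstract; the key names are illustrative.
WEIGHTS = {
    "first_intercourse_under_18": 16,
    "more_than_4_partners": 15,
    "history_of_std": 14,
    "multiparity": 13,
}


def risk_index(factors):
    """Sum the weights of the risk factors that are present.

    Assumes the index is additive, which the abstract does not confirm.
    `factors` maps factor names to booleans; missing names count as absent.
    """
    return sum(
        weight for name, weight in WEIGHTS.items() if factors.get(name, False)
    )


# A respondent with none of the four factors scores 0.
no_factors = risk_index({})

# A respondent with a history of STD and multiparity.
two_factors = risk_index({"history_of_std": True, "multiparity": True})

# The maximum possible score under the additive assumption.
all_factors = risk_index({name: True for name in WEIGHTS})
```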

Development of a Korean chatbot system that enables emotional communication with users in real time (사용자와 실시간으로 감성적 소통이 가능한 한국어 챗봇 시스템 개발)

  • Baek, Sungdae;Lee, Minho
    • Journal of Sensor Science and Technology / v.30 no.6 / pp.429-435 / 2021
  • In this study, the creation of emotional dialogue was investigated as part of developing a robot's natural language understanding and emotional dialogue processing. Unlike English-based datasets, which are the mainstay of natural language processing, Korean-based datasets have several shortcomings. Therefore, where Korean language resources are insufficient, the Korean dataset should be handled in detail, and in particular, the unique characteristics of the language should be considered. Hence, the first step is to base this study on a specific Korean dataset consisting of conversations on emotional topics. Subsequently, a model was built that learns to extract continuous dialogue features from a pre-trained language model in order to generate sentences while maintaining the context of the dialogue. To validate the model, a chatbot system was implemented, and meaningful results were obtained by recruiting external subjects and conducting experiments. Because the proposed model was influenced by a dataset in which the conversation topic was consultation, it facilitated free and emotional communication with users, as if they were consulting with the chatbot. The results were analyzed to identify and explain the advantages and disadvantages of the current model. Finally, areas for future study that are necessary to reach the ultimate research goal are discussed.

A Simulation Model for the Creation of RFID Business Events (RFID 비즈니스 이벤트의 생성을 위한 시뮬레이션 모델)

  • Ryu, Wooseok
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.11 / pp.2609-2614 / 2013
  • Adoption of RFID has become widespread in areas including logistics, the drug supply chain, and healthcare. To adopt RFID, we need to evaluate the performance and feasibility of RFID software such as the EPC Information Service (EPCIS), which demands a variety of test datasets of RFID business events. This paper proposes a novel method for creating RFID business event datasets by simulating the RFID infrastructure. The proposed model provides flexible representation capability because it is based on the well-known Petri net. In addition, it can be useful when deciding whether to adopt RFID, as it supports simulation of an RFID environment.
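To illustrate how a Petri net can drive event generation of the kind this abstract describes, here is a minimal sketch. It is not the paper's model: the places (`dock`, `warehouse`, `shelf`), the transitions, the event fields, and the rule of always firing the first enabled transition are all assumptions chosen for a small deterministic example.

```python
class PetriNet:
    """A minimal place/transition net with one token consumed per input arc."""

    def __init__(self, marking, transitions):
        self.marking = dict(marking)    # place name -> token count
        self.transitions = transitions  # name -> (input places, output places)

    def enabled(self, name):
        """A transition is enabled when every input place holds a token."""
        inputs, _ = self.transitions[name]
        return all(self.marking.get(place, 0) >= 1 for place in inputs)

    def fire(self, name, tick):
        """Move tokens and return a record describing the simulated event."""
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place in inputs:
            self.marking[place] -= 1
        for place in outputs:
            self.marking[place] = self.marking.get(place, 0) + 1
        # Hypothetical event record; real EPCIS events carry more fields.
        return {"time": tick, "step": name, "location": outputs[0]}


def simulate(net, max_steps):
    """Fire the first enabled transition each tick until none is enabled."""
    events = []
    for tick in range(max_steps):
        ready = [name for name in net.transitions if net.enabled(name)]
        if not ready:
            break
        events.append(net.fire(ready[0], tick))
    return events


# Two tagged items start at the dock and move to the shelf.
net = PetriNet(
    marking={"dock": 2, "warehouse": 0, "shelf": 0},
    transitions={
        "receive": (["dock"], ["warehouse"]),
        "store": (["warehouse"], ["shelf"]),
    },
)
events = simulate(net, max_steps=10)
```

Each fired transition yields one event record, so the resulting list serves as a small synthetic test dataset whose shape is controlled by the net's structure and initial marking.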

A Case Study of Dataset Records in Information Management System (행정정보 데이터세트 사례 조사 연구)

  • Oh, Seh-La;Park, Seunghoon;Yim, Jin-Hee
    • Journal of Korean Society of Archives and Records Management / v.18 no.2 / pp.109-133 / 2018
  • The need for records management of administrative information datasets has gained broad consensus among archivists and has been studied continuously. In the meantime, information technology has advanced greatly, and the development and redevelopment of information management systems have been increasing. Nevertheless, dataset management in information management systems has not been practiced in public organizations, presumably because no practical management plan exists. From the point of view that practical dataset management methods should be based on the reality of the dataset creation and management environment, this study investigates various active datasets in working administrative information systems. The examples and the information drawn from the examination are expected to contribute to dataset management planning. Moreover, the research methods can be utilized in further studies.

How do Export Pioneers Emerge and How are They Related to Product Creators?

  • HAHN, CHIN HEE
    • KDI Journal of Economic Policy / v.43 no.1 / pp.1-27 / 2021
  • In this paper, we empirically examine how export pioneers emerge and how they are related to product creators/innovators, utilizing a rich plant-product level dataset from the Korean manufacturing sector for the period of 1990-1998. Our analysis covers the process from the appearance of product creators as well as product imitators to the emergence of export pioneers. We find, first, that product imitators are larger, more productive and older than product creators. Second, most export pioneers are nevertheless found to be product creators. This result is largely due to the fact that almost all export pioneers export the products in the same year as product creation. Third, there are similarities as well as differences between product creators and export pioneers. Plants that are more productive or larger are more likely to become product creators as well as export pioneers. However, previous exporting experience positively affects the probability of export pioneering only, while plants' engagement in R&D positively affects the probability of product creation only. We discuss possible explanations for our main empirical results as well as their policy implications.

Intrusion Detection System Modeling Based on Learning from Network Traffic Data

  • Midzic, Admir;Avdagic, Zikrija;Omanovic, Samir
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.11 / pp.5568-5587 / 2018
  • This research uses artificial intelligence methods to model a computer network intrusion detection system. Primary classification is performed using self-organizing maps (SOMs) at two levels, while secondary classification of ambiguous data is performed using a Sugeno-type Fuzzy Inference System (FIS). The FIS is created using an Adaptive Neuro-Fuzzy Inference System (ANFIS). The main challenge for this system was to successfully detect attacks that are either unknown or represented by a very small percentage of samples in the training dataset. An improved algorithm for the SOMs in the second layer and for FIS creation was developed for this purpose. The number of clusters in the second SOM layer is optimized using this improved algorithm to minimize the amount of ambiguous data forwarded to the FIS. The FIS is created using ANFIS built on an ambiguous training dataset clustered by another SOM (whose size is determined dynamically). The proposed hybrid model was created and tested using the NSL KDD dataset. For this research, NSL KDD is especially interesting in terms of class distribution (overlapping). The objectives of this research were to successfully detect intrusions that make up a small percentage of the total traffic during early detection stages, to successfully deal with overlapping data (separating ambiguous data), and to maximize the detection rate (DR) while minimizing the false alarm rate (FAR). On test data, the proposed hybrid model achieved an acceptable DR of 0.8883 and an FAR of 0.2415. The objectives were successfully achieved, as shown by comparison with similar research on the NSL KDD dataset. The proposed model can be used not only in further research in this domain but also in other research areas.
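The two metrics reported in this abstract, detection rate (DR) and false alarm rate (FAR), are conventionally computed from confusion-matrix counts as sketched below. The abstract does not give its exact formulas, so the standard definitions are assumed here, and the counts in the example are made up for illustration rather than taken from the paper.

```python
def detection_metrics(tp, fn, fp, tn):
    """Return (DR, FAR) under the standard definitions (assumed here).

    DR  = TP / (TP + FN): share of actual attacks that were flagged.
    FAR = FP / (FP + TN): share of normal traffic wrongly flagged.
    """
    detection_rate = tp / (tp + fn)
    false_alarm_rate = fp / (fp + tn)
    return detection_rate, false_alarm_rate


# Illustrative counts only: 100 attack records and 100 normal records.
dr, far = detection_metrics(tp=80, fn=20, fp=10, tn=90)
```

Because the two rates pull in opposite directions, flagging more traffic raises both, which is why the abstract frames the goal as maximizing DR while minimizing FAR.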

A Study on the Functional Requirements of Record Production System for Dataset: Focused on Case Study of KR Asset management system (데이터세트 생산시스템 기능요건 연구 KR 재산관리시스템 사례를 중심으로)

  • Ryu, Hanjo;Baek, Youngmi;Yim, Jinhee
    • The Korean Journal of Archival Studies / no.70 / pp.5-40 / 2021
  • Administrative information dataset records produced by various systems designed for work are difficult to manage on a case-by-case basis and require separate procedures to identify and evaluate the datasets. Identified dataset records are appraised and then transferred to the records management system or disposed of. In this process, sufficient records management elements must be reflected in the production system itself in order to adhere to the principles of records management. In this paper, the functional requirements of the production system needed to accurately identify and safely manage datasets were derived and applied based on the case of the KR property management system. It is hoped that this research on the functional requirements of production systems will contribute to the creation of standards for the functional requirements of dataset production systems.

Empirical Study About ODA Effects on Job Creation

  • Seung Hee Ha;JaeHong Park
    • Journal of Korea Trade / v.26 no.6 / pp.1-19 / 2022
  • Purpose - This study empirically investigates the effects of Official Development Assistance (ODA) on the economic activities of private actors in recipient countries. As a proxy for the economic activities of private actors, we utilize the job creation activities of foreign subsidiaries in recipient countries. The foreign subsidiaries provide a foundation for economic development by creating paying jobs. That is, if ODA has been successfully transferred to foreign subsidiaries, then these foreign subsidiaries should help economic growth and help create a boom in the local market by providing jobs. These jobs eventually lead to the achievement of the primary aims of foreign aid, including poverty reduction. Thus, this study empirically examines the relationship between ODA and the number of jobs created by foreign subsidiaries in recipient countries. Design/methodology - This is the first study to examine the effects of ODA on the job creation of foreign subsidiaries because it has been hard to obtain internal information related to the employment status of foreign subsidiaries. Fortunately, we have a unique panel dataset provided by the Export-Import Bank of Korea (KEXIM) for 2006 to 2013. In terms of the empirical specification, we use the generalized least squares (GLS) method. The panel GLS estimator allows for efficient estimation that overcomes the limitations of the panel data. It makes assumptions about heteroscedasticity between the panels and about autocorrelation of the error term within each panel. Findings - We find that ODA influences job creation in foreign subsidiaries. In particular, we find that ODA creates more jobs in sales than in managerial or production positions. This study also shows that the effect of ODA on the foreign subsidiaries' job creation activities depends on the purpose of the ODA. By examining ODA effects on the foreign subsidiaries' economic activities (e.g., job creation), this study fills a gap in the current literature. Originality/value - Existing studies that focus on the ODA effect take either a macroeconomic or a microeconomic point of view. However, neither approach explains how well foreign aid has influenced private economic actors in recipient countries. In essence, previous researchers found it difficult to obtain the necessary data on internal employment status from foreign subsidiaries. However, thanks to the Export-Import Bank of Korea, this study shows that ODA indeed influences the job creation activities of foreign subsidiaries even after controlling for other factors such as FDI, GDP growth rate, employment rate, household expenditure, mother firms' share, etc. By doing so, we can examine how ODA influences the job creation of foreign subsidiaries, which might help economic development and reduce poverty in recipient countries.

Construction of Database for Deep Learning-based Occlusion Area Detection in the Virtual Environment (가상 환경에서의 딥러닝 기반 폐색영역 검출을 위한 데이터베이스 구축)

  • Kim, Kyeong Su;Lee, Jae In;Gwak, Seok Woo;Kang, Won Yul;Shin, Dae Young;Hwang, Sung Ho
    • Journal of Drive and Control / v.19 no.3 / pp.9-15 / 2022
  • This paper proposes a method for constructing and verifying datasets used in deep learning to prevent safety accidents involving automated construction machinery or autonomous vehicles. Because open datasets for developing image recognition technologies often fail to meet users' requirements, this study proposes a virtual simulator interface that facilitates the creation of the training datasets users need. The pixel-level training image dataset was verified by creating scenarios that include various road types and objects in a virtual environment. When an object is detected in an image, occlusion areas covered by another object may interfere with accurate path determination. Thus, we construct a database for developing an occlusion area detection algorithm in a virtual environment. Additionally, we present the possibility of using it as a deep learning dataset to calculate a grid map that enables path search considering occlusion areas. Custom datasets are built using an RDBMS.