• Title/Abstract/Keywords: Dataset Creation

Search results: 39 items (processing time: 0.024 s)

Manchu Script Letters Dataset Creation and Labeling

  • Aaron Daniel Snowberger;Choong Ho Lee
    • Journal of information and communication convergence engineering
    • /
    • Vol. 22, No. 1
    • /
    • pp.80-87
    • /
    • 2024
  • The Manchu language holds historical significance, but a complete dataset of Manchu script letters for training optical character recognition machine-learning models is currently unavailable. Therefore, this paper describes the process of creating a robust dataset of extracted Manchu script letters. Rather than performing automatic letter segmentation based on whitespace or the thickness of the central word stem, an image of the Manchu script was manually inspected, and one copy of the desired letter was selected as a region of interest. This region of interest was then used as a template to match all other occurrences of the same letter within the Manchu script image. Although the dataset in this study contained only 4,000 images of five Manchu script letters, these letters were collected from twenty-eight writing styles. A full dataset of Manchu letters is expected to be obtainable through this process. The collected dataset was normalized and used to train a simple convolutional neural network to verify its effectiveness.
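
The template-matching step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy binary `page`, the function name `match_template`, and the sum-of-squared-differences score with a `max_ssd` threshold are all assumptions chosen for illustration. A real pipeline would more likely use an image library and a tolerance suited to handwriting variation.

```python
def match_template(image, template, max_ssd=0):
    """Return the (row, col) of every window whose sum of squared
    differences (SSD) from the template is at most max_ssd."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    hits = []
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            if ssd <= max_ssd:
                hits.append((r, c))
    return hits


# Toy binary "page" (hypothetical data): 1 = ink, 0 = background.
page = [
    [0, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
]

# One manually selected copy of the "letter" serves as the region of interest.
roi = [row[0:2] for row in page[0:2]]

# Every other occurrence of the same shape is found by sliding the template.
matches = match_template(page, roi)
print(matches)
```

Raising `max_ssd` above zero would admit near matches, which is the knob one would tune when the same letter appears in several writing styles.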

Development of a Risk Index for Prediction of Abnormal Pap Test Results in Serbia

  • Vukovic, Dejana;Antic, Ljiljana;Vasiljevic, Mladenko;Antic, Dragan;Matejic, Bojana
    • Asian Pacific Journal of Cancer Prevention
    • /
    • Vol. 16, No. 8
    • /
    • pp.3527-3531
    • /
    • 2015
  • Background: Serbia is one of the countries with the highest incidence and mortality rates for cervical cancer in Central and South Eastern Europe. Introducing a risk index could provide a powerful means of targeting groups with a high likelihood of having an abnormal cervical smear and could increase the efficiency of screening. The aim of the present study was to create an index for predicting an abnormal Pap test result and to assess its validity. Materials and Methods: The study population was drawn from patients attending Departments for Women's Health in two primary health care centers in Serbia. Of 525 respondents, 350 were randomly selected, and their data were used as the index-creation dataset. Data from the remaining 175 were used as the index-validation dataset. Results: Age at first intercourse under 18, more than 4 sexual partners, history of STD, and multiparity were assigned statistical weights of 16, 15, 14, and 13, respectively. The distribution of index scores in the index-creation dataset showed that most respondents had a score of 0 (54.9%). In the index-creation dataset the mean index score was 10.3 (SD=13.8), and in the validation dataset the mean was 9.1 (SD=13.2). Conclusions: The advantage of such a scoring system is that it is simple, consisting of only four elements, so it could be applied to identify women at high risk of cervical cancer who would be referred for further examination.
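
As a rough illustration of how a four-element index like this might be computed, the sketch below sums the weights reported in the abstract. The additive rule is an assumption (the abstract gives the weights but not the combination formula), and the dictionary keys and the function name `risk_index` are hypothetical. No referral cutoff is shown because the abstract does not state one.

```python
# Weights as reported in the abstract; the key names are hypothetical labels.
WEIGHTS = {
    "first_intercourse_under_18": 16,
    "more_than_4_partners": 15,
    "history_of_std": 14,
    "multiparity": 13,
}


def risk_index(factors):
    """Sum the weights of the risk factors that are present.

    Assumes a simple additive score; a respondent with no risk
    factors scores 0.
    """
    return sum(w for name, w in WEIGHTS.items() if factors.get(name, False))


# Example respondent (hypothetical): two of the four factors present.
respondent = {"first_intercourse_under_18": True, "multiparity": True}
print(risk_index(respondent))
```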

사용자와 실시간으로 감성적 소통이 가능한 한국어 챗봇 시스템 개발 (Development of a Korean chatbot system that enables emotional communication with users in real time)

  • 백성대;이민호
    • 센서학회지 (Journal of Sensor Science and Technology)
    • /
    • Vol. 30, No. 6
    • /
    • pp.429-435
    • /
    • 2021
  • In this study, the generation of emotional dialogue was investigated as part of developing a robot's natural language understanding and emotional dialogue processing. Unlike the English-language datasets that are the mainstay of natural language processing, Korean-language datasets have several shortcomings. Because Korean language resources are insufficient, a Korean dataset must be handled with care, and in particular the unique characteristics of the language should be considered. Hence, as a first step, this study is based on a specific Korean dataset consisting of conversations on emotional topics. Subsequently, a model was built that learns to extract continuous dialogue features from a pre-trained language model in order to generate sentences while maintaining the context of the dialogue. To validate the model, a chatbot system was implemented, and meaningful results were obtained through experiments with externally recruited subjects. The proposed model was influenced by the dataset, whose conversation topic was consultation, so that it facilitated free and emotional communication with users, as if they were consulting with the chatbot. The results were analyzed to identify and explain the advantages and disadvantages of the current model. Finally, areas for future study that are needed to reach the ultimate research goal are discussed.

RFID 비즈니스 이벤트의 생성을 위한 시뮬레이션 모델 (A Simulation Model for the Creation of RFID Business Events)

  • 류우석
    • 한국정보통신학회논문지 (Journal of the Korea Institute of Information and Communication Engineering)
    • /
    • Vol. 17, No. 11
    • /
    • pp.2609-2614
    • /
    • 2013
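  • RFID adoption is spreading across diverse environments such as logistics, pharmaceuticals, and hospitals. Before RFID can be adopted, the performance and conformance of core RFID software such as the EPC Information Services (EPCIS) must be evaluated, and this evaluation requires business event datasets of various kinds. This paper proposes a technique for generating more realistic RFID business event datasets by simulating RFID application environments. The proposed model is based on Petri nets, which allows it to represent diverse RFID environments flexibly. Because it can simulate real RFID environments, it can also be useful when assessing whether to adopt RFID.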
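
To make the Petri-net idea concrete, here is a minimal sketch under stated assumptions: places stand for locations, tokens stand for tagged items, and each transition firing emits one event record. The class names, the event fields, the example supply chain, and the deterministic firing order are all hypothetical; the abstract does not describe the paper's actual model at this level of detail, and the records are not in EPCIS format.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    name: str    # hypothetical business step, e.g. "shipping"
    source: str  # input place (location the tagged item leaves)
    target: str  # output place (location the tagged item enters)


@dataclass
class PetriNet:
    marking: dict      # place name -> list of tag IDs (tokens)
    transitions: list
    events: list = field(default_factory=list)

    def enabled(self):
        """A transition is enabled when its input place holds a token."""
        return [t for t in self.transitions if self.marking.get(t.source)]

    def fire(self, transition, time):
        """Move one token and record the firing as a business event."""
        tag = self.marking[transition.source].pop(0)
        self.marking.setdefault(transition.target, []).append(tag)
        self.events.append(
            {"time": time, "tag": tag, "step": transition.name,
             "location": transition.target}
        )


def simulate(net, max_steps=100):
    """Fire the first enabled transition until none is enabled."""
    time = 0
    while time < max_steps:
        ready = net.enabled()
        if not ready:
            break
        net.fire(ready[0], time)
        time += 1
    return net.events


# Hypothetical supply chain: factory -> truck -> store.
net = PetriNet(
    marking={"factory": ["tag-1", "tag-2"]},
    transitions=[
        Transition("shipping", "factory", "truck"),
        Transition("receiving", "truck", "store"),
    ],
)
events = simulate(net)
for event in events:
    print(event)
```

Swapping the transition list for a different topology is enough to model another environment, which is the kind of flexibility the abstract attributes to a Petri-net formulation. A more realistic generator would choose among enabled transitions at random and attach timestamps drawn from a delay distribution.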

행정정보 데이터세트 사례 조사 연구 (A Case Study of Dataset Records in Information Management System)

  • 오세라;박승훈;임진희
    • 한국기록관리학회지 (Journal of Korean Society of Archives and Records Management)
    • /
    • Vol. 18, No. 2
    • /
    • pp.109-133
    • /
    • 2018
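  • The need for records management of administrative information datasets is widely agreed upon among records management researchers and has been studied continuously. Although new construction and redevelopment of administrative information systems have increased with advances in information technology, the datasets produced by the various administrative information systems that public institutions actually operate are not being managed. The cause is the absence of management methods that can be applied in practice. On the premise that an implementable management method for administrative information datasets must be grounded in the actual conditions of the dataset management environment, this study surveys cases of dataset production and management environments in administrative information systems currently in operation. It thereby aims to provide baseline material for developing management methods and a survey methodology that similar studies can use.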

How do Export Pioneers Emerge and How are They Related to Product Creators?

  • HAHN, CHIN HEE
    • KDI Journal of Economic Policy
    • /
    • Vol. 43, No. 1
    • /
    • pp.1-27
    • /
    • 2021
  • In this paper, we empirically examine how export pioneers emerge and how they are related to product creators/innovators, utilizing a rich plant-product level dataset from the Korean manufacturing sector for the period of 1990-1998. Our analysis covers the process from the appearance of product creators as well as product imitators to the emergence of export pioneers. We find, first, that product imitators are larger, more productive and older than product creators. Second, most export pioneers are nevertheless found to be product creators. This result is largely due to the fact that almost all export pioneers export the products in the same year as product creation. Third, there are similarities as well as differences between product creators and export pioneers. Plants that are more productive or larger are more likely to become product creators as well as export pioneers. However, previous exporting experience positively affects the probability of export pioneering only, while plants' engagement in R&D positively affects the probability of product creation only. We discuss possible explanations for our main empirical results as well as their policy implications.

Intrusion Detection System Modeling Based on Learning from Network Traffic Data

  • Midzic, Admir;Avdagic, Zikrija;Omanovic, Samir
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 12, No. 11
    • /
    • pp.5568-5587
    • /
    • 2018
  • This research uses artificial intelligence methods to model a computer network intrusion detection system. Primary classification is performed with self-organizing maps (SOMs) at two levels, while secondary classification of ambiguous data is performed with a Sugeno-type fuzzy inference system (FIS). The FIS is created using an adaptive neuro-fuzzy inference system (ANFIS). The main challenge was to detect attacks that are either unknown or represented by a very small percentage of samples in the training dataset. An improved algorithm for the second-layer SOMs and for FIS creation was developed for this purpose. The number of clusters in the second SOM layer is optimized by the improved algorithm to minimize the amount of ambiguous data forwarded to the FIS. The FIS is created using ANFIS built on the ambiguous training data, which is clustered by another SOM whose size is determined dynamically. The proposed hybrid model was created and tested using the NSL-KDD dataset, which is especially interesting for this research because of its overlapping class distribution. The objectives were to detect intrusions that make up a small percentage of total traffic during early detection stages, to handle overlapping data by separating out ambiguous data, to maximize the detection rate (DR), and to minimize the false alarm rate (FAR). On test data, the proposed hybrid model achieved a DR of 0.8883 and a FAR of 0.2415. The objectives were achieved, as shown by comparison with similar studies on the NSL-KDD dataset. The proposed model can be used in further research in this domain and in other research areas.
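
For readers unfamiliar with the two reported metrics, the sketch below computes them from confusion-matrix counts using the definitions commonly used in intrusion detection: DR = TP / (TP + FN) and FAR = FP / (FP + TN). That the paper uses exactly these definitions is an assumption, and the counts in the example are hypothetical, not the paper's results.

```python
def detection_metrics(tp, fn, fp, tn):
    """Return (detection rate, false alarm rate).

    tp: attacks flagged as attacks     fn: attacks missed
    fp: normal traffic flagged         tn: normal traffic passed
    """
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one attack and one normal sample")
    dr = tp / (tp + fn)    # share of attacks that were detected
    far = fp / (fp + tn)   # share of normal traffic wrongly flagged
    return dr, far


# Hypothetical counts for illustration only.
dr, far = detection_metrics(tp=80, fn=20, fp=10, tn=90)
print(f"DR={dr:.4f} FAR={far:.4f}")
```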

데이터세트 생산시스템 기능요건 연구 KR 재산관리시스템 사례를 중심으로 (A Study on the Functional Requirements of Record Production System for Dataset : Focused on Case Study of KR Asset management system)

  • 류한조;백영미;임진희
    • 기록학연구 (The Korean Journal of Archival Studies)
    • /
    • No. 70
    • /
    • pp.5-40
    • /
    • 2021
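  • Administrative information dataset records produced by the various systems designed for business operations are difficult to manage at the item level, so a separate procedure for identifying and appraising datasets is required. Identified dataset records are appraised and then disposed of, for example by transfer to a records management system or by destruction. To uphold records management principles throughout this process, the production system itself must incorporate sufficient records management elements. This paper derives functional requirements for a production system that identifies datasets accurately and manages them safely, and applies them to the case of the KR asset management system. We hope that further research on such functional requirements will lead to a functional requirements standard for dataset production systems.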

Empirical Study About ODA Effects on Job Creation

  • Seung Hee Ha;JaeHong Park
    • Journal of Korea Trade
    • /
    • Vol. 26, No. 6
    • /
    • pp.1-19
    • /
    • 2022
  • Purpose - This study empirically investigates the effects of Official Development Assistance (ODA) on the economic activities of private actors in recipient countries. As a proxy for these activities, we use the job creation of foreign subsidiaries in recipient countries. Foreign subsidiaries provide a foundation for economic development by creating paying jobs. That is, if ODA has been successfully transferred to foreign subsidiaries, these subsidiaries should support economic growth and stimulate the local market by providing jobs. These jobs eventually contribute to the primary aims of foreign aid, including poverty reduction. Thus, this study empirically examines the relationship between ODA and the number of jobs created by foreign subsidiaries in recipient countries. Design/methodology - This is the first study to examine the effects of ODA on job creation by foreign subsidiaries, because internal information on the employment status of foreign subsidiaries has been hard to obtain. We use a unique panel dataset provided by the Export-Import Bank of Korea (KEXIM) for 2006 to 2013. For the empirical specification, we use the generalized least squares (GLS) method. The panel GLS estimator provides efficient estimates by allowing for heteroscedasticity across panels and autocorrelation of the error term within each panel. Findings - We find that ODA influences job creation in foreign subsidiaries. In particular, ODA creates more jobs in sales than in managerial or production positions. The effect of ODA on foreign subsidiaries' job creation also depends on the purpose of the ODA. By examining ODA effects on foreign subsidiaries' economic activities (e.g., job creation), this study fills a gap in the current literature. Originality/value - Existing studies of ODA effects take either a macroeconomic or a microeconomic point of view. However, neither approach explains how well foreign aid has influenced private economic actors in recipient countries, because previous researchers found it difficult to obtain the necessary internal employment data from foreign subsidiaries. Using data from the Export-Import Bank of Korea, this study shows that ODA influences the job creation of foreign subsidiaries even after controlling for other factors such as FDI, GDP growth rate, employment rate, household expenditure, and the parent firms' share. This result shows how ODA influences job creation by foreign subsidiaries, which may support economic development and reduce poverty in recipient countries.

가상 환경에서의 딥러닝 기반 폐색영역 검출을 위한 데이터베이스 구축 (Construction of Database for Deep Learning-based Occlusion Area Detection in the Virtual Environment)

  • 김경수;이재인;곽석우;강원율;신대영;황성호
    • 드라이브 ㆍ 컨트롤 (Journal of Drive and Control)
    • /
    • Vol. 19, No. 3
    • /
    • pp.9-15
    • /
    • 2022
  • This paper proposes a method for constructing and verifying datasets used in deep learning, with the aim of preventing safety accidents involving automated construction machinery or autonomous vehicles. Because open datasets for developing image recognition technologies rarely meet users' specific requirements, this study proposes a virtual-simulator interface that facilitates the creation of the training datasets users need. The pixel-level training image dataset was verified by creating scenarios that include various road types and objects in a virtual environment. When an object is detected in an image, occlusion areas covered by another object may interfere with accurate path determination. Thus, we construct a database for developing an occlusion area detection algorithm in a virtual environment. Additionally, we show that it can be used as a deep learning dataset for computing a grid map that enables path search considering occlusion areas. Custom datasets are built using a relational database management system (RDBMS).