Handwritten Hangul Graphemes Classification Using Three Artificial Neural Networks

  • Aaron Daniel Snowberger;Choong Ho Lee
    • Journal of information and communication convergence engineering / v.21 no.2 / pp.167-173 / 2023
  • Hangul is unique among Asian writing systems because its simple letter forms combine to create syllabic shapes. There are 24 basic letters that can be combined to form 27 additional complex letters, which gives 51 graphemes. Hangul optical character recognition has been a research topic for some time; however, handwritten Hangul recognition remains challenging owing to the variety of writing styles, slants, and the cursive-like nature of handwriting. In this study, a dataset containing thousands of samples of the 51 Hangul graphemes was gathered from 110 freshman university students to create a robust, high-variance dataset for training an artificial neural network. The collected dataset included 2200 samples for each consonant grapheme and 1100 samples for each vowel grapheme. The dataset was normalized to match the MNIST digits dataset and used to train three neural networks, and the results were compared.

Modern Face Recognition using New Masked Face Dataset Generated by Deep Learning (딥러닝 기반의 새로운 마스크 얼굴 데이터 세트를 사용한 최신 얼굴 인식)

  • Pann, Vandet;Lee, Hyo Jong
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.647-650 / 2021
  • The most powerful modern face recognition techniques use deep learning methods, which have delivered impressive performance. Since the worldwide outbreak of COVID-19, people have begun to wear face masks to prevent the spread of the virus, which has caused existing face recognition methods to fail to identify people. As a result, masked face recognition has become one of the most challenging problems in the face recognition domain. However, deep learning methods require numerous data samples, and it is difficult to find publicly available benchmark datasets of masked faces. In this work, we develop a new simulated masked face dataset that can be used for masked face recognition tasks. To evaluate the usability of the proposed dataset, we also retrained an ArcFace-based system, one of the most popular state-of-the-art face recognition methods, on the dataset.

Autism Spectrum Disorder Recognition with Deep Learning

  • Shin, Jongmin;Choi, Jinwoo
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.06a / pp.1268-1271 / 2022
  • Because touch-screen devices are now common, it is easy to draw sketches anywhere and save them in vector form. Current research on sketch understanding treats sketches as coordinate sequence data and adopts sequential models to learn sketch representations. It has become customary for sketch datasets to be in vector coordinate format. Moreover, popular datasets do not include real-life sketches, that is, sketches drawn with pencil or pen on paper. Art psychology uses real-life sketches to analyze patients. ETRI presents a unique sketch dataset in pixel format for sketch recognition of autism spectrum disorder. We present a method to formulate the dataset for better generalization of sketch data. Through experiments, we show that pixel-based models can achieve good performance.

Vehicle Detection at Night Based on Style Transfer Image Enhancement

  • Jianing Shen;Rong Li
    • Journal of Information Processing Systems / v.19 no.5 / pp.663-672 / 2023
  • Most vehicle detection methods extract vehicle features poorly at night, which reduces their robustness; hence, this study proposes a night vehicle detection method based on style transfer image enhancement. First, a style transfer model is constructed using cycle generative adversarial networks (cycleGANs). The daytime data in the BDD100K dataset were converted into nighttime data to form a style dataset. The dataset was then divided using its labels. Finally, a YOLOv5s network detects vehicles in the nighttime images for reliable recognition of vehicle information in a complex environment. The experimental results of the proposed method on the BDD100K dataset show that the transferred night vehicle images are clear and meet the requirements. The precision, recall, mAP@.5, and mAP@.5:.95 reached 0.696, 0.292, 0.761, and 0.454, respectively.

Towards Texture-Based Visualization of Multivariate Dataset

  • Mehmood, Raja Majid;Lee, Hyo Jong
    • Proceedings of the Korea Information Processing Society Conference / 2014.04a / pp.582-585 / 2014
  • Visualization is a science that makes the invisible visible through the techniques of experimental visualization and computer-aided visualization. This paper presents the practical aspects of visualizing multivariate datasets. We briefly discuss previous research and introduce a new visualization technique that helps us design and develop a tool for experimental visualization of multivariate datasets. The tool can be used in various domains. Here we chose the software industry as the application domain and used a multivariate dataset of software components computed by VizzMaintenance, a software analysis tool that provides multiple software metrics for open-source Java programs. The main objective of this research is to develop a new visualization tool for large multivariate datasets that is more efficient and easier for viewers to perceive. Because perception is central to this work, we decided to have researchers in our lab test the perception level of the proposed visualization approach.

Knowledge Model for Disaster Dataset Navigation

  • Hwang, Yun-Young;Yuk, Jin-Hee;Shin, Sumi
    • Journal of Information Science Theory and Practice / v.9 no.4 / pp.35-49 / 2021
  • When many diverse datasets are available, an efficient method is needed to provide users with the datasets they require. To address this need, the necessary datasets should be selected on the basis of the relationships between datasets. In particular, to discover the datasets needed for disaster resolution, we need to consider the disaster resolution stage. In this paper, to provide the necessary datasets for each stage of disaster resolution, we constructed a disaster type and disaster management process ontology and designed a method to determine the necessary datasets for each disaster type and each step of the disaster management process. In addition, we introduce a method to determine relationships between the datasets necessary for disaster response. We propose a method for discovering datasets based on minimal relationships such as "isA," "sameAs," and "subclassOf." To discover suitable datasets, we designed a knowledge exploration model and collected 651 disaster-related datasets for improving our method. These datasets were categorized by disaster type from the perspective of disaster management. Categorizing actual datasets into disaster types and disaster management types allows a single dataset to be classified under multiple types in both categories. We built a knowledge exploration model on the basis of disaster examples to verify the configuration of our model.
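
Discovery over minimal relationships such as "isA," "sameAs," and "subclassOf" can be pictured as a graph traversal. The following is only a minimal sketch under stated assumptions: the triples, dataset names, and the `related` helper are hypothetical, and treating the relations as undirected links is a simplification, not the authors' knowledge exploration model.

```python
from collections import deque

# Hypothetical (subject, relation, object) triples; every name is illustrative.
TRIPLES = [
    ("flood_rainfall_records", "isA", "flood_dataset"),
    ("flood_dataset", "subclassOf", "natural_disaster_dataset"),
    ("river_level_log", "sameAs", "flood_rainfall_records"),
    ("wildfire_burn_map", "isA", "wildfire_dataset"),
]
ALLOWED = {"isA", "sameAs", "subclassOf"}


def related(start, triples=TRIPLES):
    """Return every node reachable from start via the allowed relations.

    Assumption: relations are followed in both directions.
    """
    neighbours = {}
    for subject, relation, obj in triples:
        if relation in ALLOWED:
            neighbours.setdefault(subject, set()).add(obj)
            neighbours.setdefault(obj, set()).add(subject)
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search over the relationship graph
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen
```

Calling `related("river_level_log")` on these toy triples reaches the flood-related nodes but not the wildfire ones, which is the kind of narrowing the abstract describes.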

Designing Dataset Management and Service System for Digital Libraries Using DCAT (DCAT을 활용한 디지털도서관 데이터셋 관리와 서비스 설계)

  • Park, Jin Ho
    • Journal of the Korean Society for Library and Information Science / v.53 no.2 / pp.247-266 / 2019
  • The purpose of this study is to propose using DCAT, a W3C standard, to manage and serve datasets, which are becoming increasingly important as new knowledge and information resources. To do this, we first analyzed the properties of the four core classes of DCAT. We then modeled and presented a system that can manage and serve various datasets in a digital library based on DCAT. The system is divided into source data, dataset management, linked data connection, and user service. In particular, a DCAT mapping function is suggested for dataset management. This feature can ensure interoperability among various datasets.
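
A DCAT mapping function of the kind mentioned above might look roughly like the sketch below. It is an illustration only: the local record fields (`name`, `summary`, `tags`, `files`) and the `to_dcat` helper are hypothetical, the JSON-LD shape is simplified, and nothing here is taken from the system the paper describes. The `dcat:` and `dct:` terms themselves come from the W3C DCAT and Dublin Core vocabularies.

```python
import json


def to_dcat(record):
    """Map a hypothetical local record dict to a simplified DCAT-style JSON-LD dict."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": record["name"],
        "dct:description": record.get("summary", ""),
        "dcat:keyword": list(record.get("tags", [])),
        # Each downloadable file becomes one dcat:Distribution.
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": f["url"],
                "dcat:mediaType": f["mime"],
            }
            for f in record.get("files", [])
        ],
    }


# Illustrative record; the URL is a placeholder.
example = {
    "name": "Library loan statistics",
    "tags": ["library", "loans"],
    "files": [{"url": "https://example.org/loans.csv", "mime": "text/csv"}],
}
print(json.dumps(to_dcat(example), indent=2))
```

Mapping heterogeneous local records onto one shared vocabulary is what makes the interoperability claim plausible: consumers only need to understand the DCAT terms, not each source schema.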

OryzaGP 2021 update: a rice gene and protein dataset for named-entity recognition

  • Larmande, Pierre;Liu, Yusha;Yao, Xinzhi;Xia, Jingbo
    • Genomics & Informatics / v.19 no.3 / pp.27.1-27.4 / 2021
  • Due to the rapid evolution of high-throughput technologies, a tremendous amount of data is being produced in the biological domain, which poses a challenging task for information extraction and natural language understanding. Biological named entity recognition (NER) and named entity normalisation (NEN) are two common tasks aiming at identifying and linking biologically important entities such as genes or gene products mentioned in the literature to biological databases. In this paper, we present an updated version of OryzaGP, a gene and protein dataset for rice species created to help natural language processing (NLP) tools in processing NER and NEN tasks. To create the dataset, we selected more than 15,000 abstracts associated with articles previously curated for rice genes. We developed four dictionaries of gene and protein names associated with database identifiers. We used these dictionaries to annotate the dataset. We also annotated the dataset using pretrained NLP models. Finally, we analysed the annotation results and discussed how to improve OryzaGP.
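
Dictionary-based annotation, as described above, can be sketched as a longest-match lookup of known names in the text. This is a minimal illustration under assumptions: the dictionary entries and identifiers are placeholders (not real OryzaGP content), and the `annotate` helper with case-sensitive, word-boundary matching is a hypothetical simplification of whatever the authors used.

```python
import re

# Placeholder dictionary: surface name -> database identifier (both invented).
GENE_DICT = {"OsGene1": "DB:0001", "protein X": "DB:0002"}


def annotate(text, dictionary=GENE_DICT):
    """Return (start, end, surface, identifier) for each dictionary name found in text."""
    # Longest names first so that a longer entry wins over a shorter prefix.
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return [
        (m.start(), m.end(), m.group(), dictionary[m.group()])
        for m in pattern.finditer(text)
    ]
```

Keeping the identifier alongside each span covers both tasks named in the abstract: the span is the NER output, and the identifier is the normalisation (NEN) target.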

A Study on Synthetic Dataset Generation Method for Maritime Traffic Situation Awareness (해상교통 상황인지 향상을 위한 합성 데이터셋 구축방안 연구)

  • Youngchae Lee;Sekil Park
    • Journal of Information Technology Applications and Management / v.30 no.6 / pp.69-80 / 2023
  • Ship collision accidents not only cause loss of life and property damage, but also cause marine pollution and can become national disasters, so prevention is very important. Most of these ship collision accidents are caused by human factors due to the navigation officer's lack of vigilance and carelessness, and in many cases, they can be prevented through the support of a system that helps with situation awareness. Recently, artificial intelligence has been used to develop systems that help navigators recognize the situation, but the sea is very wide and deep, so it is difficult to secure maritime traffic datasets, which also makes it difficult to develop artificial intelligence models. In this paper, to solve these difficulties, we propose a method to build a dataset with characteristics similar to actual maritime traffic datasets. The proposed method uses segmentation and inpainting technologies to build a foreground and background dataset, and then applies compositing technology to create a synthetic dataset. Through prototype implementation and result analysis of the proposed method, it was confirmed that the proposed method is effective in overcoming the difficulties of dataset construction and complementing various scenes similar to reality.
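
The compositing step, in which a segmented foreground is placed onto an inpainted background, can be illustrated with a toy mask-based paste. This is only a sketch under assumptions: images are plain nested lists of pixel values, the mask is binary, and the `composite` helper and its parameters are hypothetical rather than the authors' pipeline.

```python
def composite(background, foreground, mask, top, left):
    """Paste foreground pixels where mask is 1 onto a copy of background at (top, left).

    All three inputs are lists of rows; foreground and mask have the same shape.
    Pixels that would fall outside the background are skipped.
    """
    out = [row[:] for row in background]  # copy so the background stays unchanged
    for r, (f_row, m_row) in enumerate(zip(foreground, mask)):
        for c, (pixel, keep) in enumerate(zip(f_row, m_row)):
            y, x = top + r, left + c
            if keep and 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = pixel
    return out


# Toy example: a 2x2 "ship" patch pasted into a 3x3 "sea" background.
sea = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
ship = [[9, 9], [9, 9]]
ship_mask = [[1, 0], [0, 1]]
scene = composite(sea, ship, ship_mask, top=1, left=1)
```

Varying `top`, `left`, and the choice of foreground and background is what lets one small pool of segmented objects produce many distinct synthetic scenes.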

Token-Based Classification and Dataset Construction for Detecting Modified Profanity (변형된 비속어 탐지를 위한 토큰 기반의 분류 및 데이터셋)

  • Sungmin Ko;Youhyun Shin
    • The Transactions of the Korea Information Processing Society / v.13 no.4 / pp.181-188 / 2024
  • Traditional profanity detection methods have limitations in identifying intentionally altered profanities. This paper introduces a new method based on Named Entity Recognition, a subfield of Natural Language Processing. We developed a profanity detection technique using sequence labeling, for which we constructed a dataset by labeling profanities in Korean malicious comments and conducted experiments. Additionally, to enhance the model's performance, we augmented the dataset by labeling parts of a Korean hate speech dataset using a large language model, ChatGPT, and trained on the result. During this process, we confirmed that human filtering alone of the dataset created by the large language model could improve performance. This suggests that human oversight is still necessary in the dataset augmentation process.
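
Sequence labeling for this task can be pictured as giving each token a tag that says whether it belongs to a marked span. The sketch below assumes a BIO tagging scheme, a hypothetical `PROF` label, and tokens that carry character offsets; the abstract does not state which scheme or tokenizer the authors used, so treat all of these as illustrative choices.

```python
def bio_tags(tokens, spans):
    """Convert character-level spans to token-level BIO tags.

    tokens: list of (text, start, end) with character offsets.
    spans:  list of (start, end) character ranges marked as profanity.
    """
    tags = []
    for _, start, end in tokens:
        tag = "O"
        for s, e in spans:
            if start < e and end > s:  # token overlaps the labeled span
                # The token covering the span start opens it; later ones continue it.
                tag = "B-PROF" if start <= s else "I-PROF"
                break
        tags.append(tag)
    return tags


# Toy example with a placeholder "altered" word split into two tokens.
tokens = [("that", 0, 4), ("b@d", 5, 8), ("w0rd", 8, 12), ("again", 13, 18)]
tags = bio_tags(tokens, [(5, 12)])
```

Labeling at the token level is what lets a model flag only the altered fragment inside a comment, rather than classifying the whole comment.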