• Title/Summary/Keyword: Korean human dataset

Search Results: 161

Deletion-Based Sentence Compression Using Sentence Scoring Reflecting Linguistic Information (언어 정보가 반영된 문장 점수를 활용하는 삭제 기반 문장 압축)

  • Lee, Jun-Beom;Kim, So-Eon;Park, Seong-Bae
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.3
    • /
    • pp.125-132
    • /
    • 2022
  • Sentence compression is a natural language processing task that generates a concise sentence preserving the important meaning of the original sentence. For grammatically appropriate sentence compression, early studies relied on human-defined linguistic rules. More recently, because sequence-to-sequence models perform well on various natural language processing tasks such as machine translation, several studies have applied them to sentence compression. However, the rule-based studies require every rule to be defined by hand, and the sequence-to-sequence studies require a large amount of parallel data for model training. To address these challenges, Deleter, a sentence compression model that leverages the pre-trained language model BERT, was proposed. Because Deleter compresses sentences using a perplexity-based score computed with BERT, it needs neither linguistic rules nor a parallel dataset. However, because Deleter considers only perplexity, it does not reflect the linguistic information of the words in the sentence, and because the corpora used to pre-train BERT are far from compressed sentences, this can lead to incorrect compression. To address these problems, this paper proposes a method that quantifies the importance of linguistic information and reflects it in the perplexity-based sentence scoring. Furthermore, by fine-tuning BERT on a corpus of news articles, which frequently contain proper nouns and omit unnecessary modifiers, we allow BERT to measure a perplexity appropriate for sentence compression. Evaluations on English and Korean datasets confirm that the compression performance of sentence-scoring-based models can be improved with the proposed method.
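The scoring idea this entry builds on, deletion guided by a BERT perplexity score, can be sketched as below. This is a generic illustration of pseudo-perplexity-based greedy deletion, not the paper's Deleter implementation; it omits the proposed linguistic-importance weighting and news-domain fine-tuning, and the model name and example sentence are assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def pseudo_perplexity(words):
    """Mask each token in turn and average its negative log-likelihood (BERT pseudo-perplexity)."""
    ids = tok(" ".join(words), return_tensors="pt")["input_ids"][0]
    nlls = []
    for i in range(1, len(ids) - 1):                      # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        with torch.no_grad():
            logits = mlm(masked.unsqueeze(0)).logits[0, i]
        nlls.append(-torch.log_softmax(logits, -1)[ids[i]].item())
    return sum(nlls) / len(nlls)

sentence = "the quick brown fox quickly jumped over the very lazy dog".split()
# Greedy deletion step: drop the word whose removal leaves the most fluent remainder.
best = min(range(len(sentence)),
           key=lambda i: pseudo_perplexity(sentence[:i] + sentence[i + 1:]))
print("delete:", sentence[best])
```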

Development of a Machine-Learning based Human Activity Recognition System including Eastern-Asian Specific Activities

  • Jeong, Seungmin;Choi, Cheolwoo;Oh, Dongik
    • Journal of Internet Computing and Services
    • /
    • v.21 no.4
    • /
    • pp.127-135
    • /
    • 2020
  • The purpose of this study is to develop a human activity recognition (HAR) system that distinguishes 13 activities: five activities commonly covered in conventional HAR research and eight activities specific to Eastern-Asian culture. The eight special activities are floor-sitting/standing, chair-sitting/standing, floor-lying/up, and bed-lying/up. We used a wrist-worn 3-axis accelerometer for data collection and designed a machine learning model for activity classification. Data clustering through preprocessing and feature extraction/reduction is performed, and six machine learning algorithms are then compared for recognition accuracy. As a result, we achieved an average accuracy of 99.7% for the 13 activities, far better than the 89.4% average accuracy of current smartwatch-based HAR studies. The strength of the developed HAR system is further supported by the 98.7% accuracy we achieved on the publicly available 'pamap2' dataset of 12 activities, for which the best previously reported accuracy is 96.6%.
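The pipeline outlined here, windowed accelerometer features followed by a comparison of classifiers, might look roughly like the sketch below. The window length, feature set, choice of classifiers, and the synthetic stand-in data are assumptions for illustration; the paper compares six algorithms on its own 13-activity recordings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def window_features(window):
    """Per-window features from a (samples, 3) accelerometer block: axis stats plus magnitude stats."""
    mag = np.linalg.norm(window, axis=1)
    return np.concatenate([window.mean(0), window.std(0), window.min(0), window.max(0),
                           [mag.mean(), mag.std()]])

# Synthetic stand-in: 500 windows of 128 samples x 3 axes, labelled with 13 activity classes.
rng = np.random.default_rng(0)
X = np.array([window_features(rng.normal(size=(128, 3))) for _ in range(500)])
y = rng.integers(0, 13, size=500)

for name, clf in [("RandomForest", RandomForestClassifier(random_state=0)), ("SVM", SVC())]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```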

Seasonal Weather Factors and Sensibility Change Relationship via Textmining

  • Yeo, Hyun-Jin
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.8
    • /
    • pp.219-224
    • /
    • 2022
  • The Korea Meteorological Administration (KMA) has released life-related indexes such as 'Life industrial weather information' and 'Safety weather information', while meteorological administrations in other countries have produced 'Human-biometeorology' and 'Health meteorology' indexes that address how weather affects human sensibility under diverse criteria. Although changes in human sensibility have been studied in psychology across numerous application areas, there are not enough studies that validate sensibility-change factors with data mining. In this research, I build models that estimate sensibility changes caused by weather factors such as temperature and humidity, and validate them with sensibility data collected by crawling SNS text and weather data from the KMA public dataset. Using logistic regression, I identify the factors that affect sensibility changes.
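A minimal logistic-regression sketch of the kind of model described here, relating weather factors to a binary sensibility label, is shown below. The features, the toy label rule, and the synthetic data are assumptions; the paper derives its labels from crawled SNS text and KMA observations.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a year of daily observations.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "temperature": rng.normal(15, 10, 365),   # degrees Celsius
    "humidity": rng.uniform(20, 90, 365),     # percent
})
df["negative_mood"] = (df["humidity"] > 70).astype(int)   # toy label for illustration only

model = LogisticRegression().fit(df[["temperature", "humidity"]], df["negative_mood"])
# Coefficient signs and magnitudes hint at each weather factor's association with the label.
print(dict(zip(["temperature", "humidity"], model.coef_[0])))
```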

Deep learning-based Human Action Recognition Technique Considering the Spatio-Temporal Relationship of Joints (관절의 시·공간적 관계를 고려한 딥러닝 기반의 행동인식 기법)

  • Choi, Inkyu;Song, Hyok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2022.05a
    • /
    • pp.413-415
    • /
    • 2022
  • Since human joints are components of the human body that provide useful information for analyzing human behavior, many studies have been conducted on human action recognition using joint information. However, recognizing actions that change from moment to moment using only independent per-joint information is a very complex problem, so an additional way of extracting information for learning and an algorithm that considers the current state in light of past states are needed. In this paper, we propose a human action recognition technique that considers the positional relationship between connected joints and the change in each joint's position over time. Using a pre-trained joint-extraction model, the position of each joint is obtained, and bone information is extracted from the difference vectors between connected joints. A simplified neural network is then constructed for the two types of input, and spatio-temporal features are extracted by adding an LSTM. Experiments on a dataset of nine actions confirm that measuring action recognition accuracy with the spatio-temporal relationship features of the joints gives better performance than using single-joint information alone.
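A compact sketch of the two-input idea described here, per-frame joint coordinates plus bone difference vectors fed to an LSTM classifier, is given below. The joint count, the chain skeleton topology, and the layer sizes are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class JointBoneLSTM(nn.Module):
    """Joint coordinates and bone (difference) vectors per frame -> LSTM -> action logits."""
    def __init__(self, num_joints=18, num_classes=9, hidden=128):
        super().__init__()
        feat = num_joints * 2 * 2                    # (x, y) for joints and for bones
        self.lstm = nn.LSTM(feat, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, joints, bones):
        # joints, bones: (batch, time, num_joints, 2)
        x = torch.cat([joints, bones], dim=2).flatten(2)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])                 # classify from the final time step

parents = torch.clamp(torch.arange(18) - 1, min=0)   # hypothetical chain skeleton topology
joints = torch.randn(4, 30, 18, 2)                   # 4 clips, 30 frames, 18 two-dimensional joints
bones = joints - joints[:, :, parents]               # difference vectors between connected joints
print(JointBoneLSTM()(joints, bones).shape)          # torch.Size([4, 9])
```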


A Study on Database Design Model for Production System Record Management Module in DataSet Record Management (데이터세트 기록관리를 위한 생산시스템 기록관리 모듈의 DB 설계 모형연구)

  • Kim, Dongsu;Yim, Jinhee;Kang, Sung-hee
    • The Korean Journal of Archival Studies
    • /
    • no.78
    • /
    • pp.153-195
    • /
    • 2023
  • RDBMS is a database system widely used worldwide, and the term 'dataset' refers to the vast amount of data produced in administrative information systems built on RDBMSs. Unlike business systems that mainly produce administrative documents, administrative information systems generate records centered on the unique tasks of organizations. These records differ from traditional approval documents and their metadata, making it difficult to transfer them seamlessly to standard records management systems. With the 2022 revision of the 'Public Records Act Enforcement Decree', datasets were included among the types of records for which only management authority is transferred. The core of this revision is the need to manage the lifecycle of records within administrative information systems, yet how to manage datasets inside those systems has hardly been explored. This research therefore designs a database for a record management module to be integrated into administrative information systems so that the lifecycle of records can be managed there. By modifying and supplementing ISO 16175-1:2020, we design a 'human resource management system' and identify and appraise personnel-management datasets, providing a concrete example of record management within an administrative information system. The prototype designed in this research is limited in data volume compared with systems actually in use within organizations and has not yet been validated by records researchers and IT developers in the field. Nevertheless, the work clarifies the nature of datasets and how they should be managed within administrative information systems, and it confirms the need for a record management module's database inside such systems. Once a complete record management module is developed and the National Archives establishes standards, it is expected to become a module that organizations need in order to manage datasets effectively.
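To make the idea of a record-management database embedded in an administrative information system concrete, here is a small, hypothetical schema sketch. The table and column names are illustrative assumptions, not the design proposed in the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE record_aggregation (           -- logical grouping of dataset records
    aggregation_id   INTEGER PRIMARY KEY,
    title            TEXT NOT NULL,
    retention_years  INTEGER,               -- retention period from the disposal schedule
    disposition      TEXT CHECK (disposition IN ('retain', 'transfer', 'destroy'))
);
CREATE TABLE dataset_record (               -- one managed row of business data
    record_id        INTEGER PRIMARY KEY,
    aggregation_id   INTEGER REFERENCES record_aggregation(aggregation_id),
    source_table     TEXT,                  -- table in the source HR system
    source_key       TEXT,                  -- primary key value in that table
    captured_at      TEXT DEFAULT CURRENT_TIMESTAMP,
    disposed         INTEGER DEFAULT 0      -- lifecycle flag
);
""")
conn.execute("INSERT INTO record_aggregation VALUES (1, 'Leave requests', 5, 'destroy')")
print(conn.execute("SELECT * FROM record_aggregation").fetchall())
```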

Impact of the human body in wireless propagation of medical implants for tumor detection

  • Morocho-Cayamcela, Manuel Eugenio;Kim, Myung-Sik;Lim, Wansu
    • Journal of Internet Computing and Services
    • /
    • v.21 no.2
    • /
    • pp.19-26
    • /
    • 2020
  • This paper analyses the feasibility of using implantable antennas to detect and monitor tumors. We analyze this setting in terms of the wireless propagation loss and signal fading produced by human bodies and their environment in an indoor scenario. The study is based on the ITU-R propagation recommendations and prediction models for planning indoor radio communication systems and radio local area networks in the frequency range of 300 MHz to 100 GHz, with primary estimations at the 915 MHz and 2.4 GHz operating frequencies. The path loss models used for most short-range wireless implant devices do not treat the human body as a channel in itself, which adds losses that wireless designs fail to account for. In this paper, we examine propagation through the human body, including losses from bones, muscles, fat, and clothes, which yields a more accurate characterization and estimation of the channel. The simulation results indicate a variation in the return loss of the spiral antenna when a tumor is located near the implant; this knowledge can be applied to the medical detection and monitoring of early tumors by analyzing the electromagnetic field behavior of the implant. The tumor was modeled in CST Microwave Studio using the Wisconsin Diagnosis Breast Cancer Dataset, with features such as the radius, texture, perimeter, area, and smoothness of the tumor included along with their labels to determine whether the external shape has malignant or benign characteristics. The feasibility of deploying the system and technical recommendations for avoiding interference are also described.
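The entry's indoor estimates follow the ITU-R site-general indoor model family; a minimal sketch of that style of calculation is below. The power-loss coefficients and the 5 m distance are assumed, office-like values, and the sketch does not include the body-tissue losses that the paper adds on top of the indoor channel.

```python
import math

def indoor_path_loss_db(freq_mhz, distance_m, power_loss_coeff, floor_loss_db=0.0):
    """Site-general indoor path loss in the style of ITU-R P.1238:
    L = 20*log10(f_MHz) + N*log10(d_m) + Lf(n) - 28."""
    return (20 * math.log10(freq_mhz) + power_loss_coeff * math.log10(distance_m)
            + floor_loss_db - 28)

# Assumed office-like power-loss coefficients at the two frequencies the paper estimates.
for freq, n in [(915, 33), (2400, 30)]:
    print(f"{freq} MHz at 5 m: {indoor_path_loss_db(freq, 5, n):.1f} dB")
```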

KoCED: English-Korean Critical Error Detection Dataset (KoCED: 윤리 및 사회적 문제를 초래하는 기계번역 오류 탐지를 위한 학습 데이터셋)

  • Sugyeong Eo;Suwon Choi;Seonmin Koo;Dahyun Jung;Chanjun Park;Jaehyung Seo;Hyeonseok Moon;Jeongbae Park;Heuiseok Lim
    • Annual Conference on Human and Language Technology
    • /
    • 2022.10a
    • /
    • pp.225-231
    • /
    • 2022
  • Machine translation has recently made remarkable progress, but there are cases in which errors in the translation distort the meaning, provoking negative reactions from users or causing social repercussions. In particular, economic losses and potential legal violations caused by meaning corrupted through mistranslation, the danger of providing wrong information about safety, and the fallout from religious, racial, or sexist statements are directly tied to real-life problems. To mitigate these problems, research on Critical Error Detection (CED) is being carried out in the field of machine translation quality estimation. For Korean, however, no such research exists and no related dataset has been released. As AI technology matures, taking diverse social and ethical factors into account is essential, and CED must be introduced for Korean as well to reduce the unchecked spread of distorted translations. This paper therefore constructs and releases KoCED (English-Korean Critical Error Detection), a dataset for detecting critical errors in English-Korean machine translation. We verify the usefulness of the proposed dataset through detailed statistical analysis and validity experiments on the dataset using multilingual language models.
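The validity experiments mentioned here use multilingual language models; a hedged sketch of how a CED classifier over source-translation pairs could be set up is shown below. The choice of xlm-roberta-base, the example pair, and the label convention are assumptions, and the model only produces meaningful scores after fine-tuning on data such as KoCED.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=2)

src = "Take two tablets every four hours."        # English source
mt = "네 시간마다 두 알씩 복용하지 마세요."          # hypothetical Korean MT output with a critical negation error
enc = tok(src, mt, return_tensors="pt", truncation=True)   # encode the pair jointly
with torch.no_grad():
    probs = model(**enc).logits.softmax(-1)
print(probs)   # after fine-tuning: [P(no critical error), P(critical error)]
```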


Fast Volume Rendering of VKH dataset using GPU Cluster (GPU 클러스터를 이용한 VKH 데이터의 빠른 볼륨 렌더링)

  • Lee, Joong-Youn
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2005.11a
    • /
    • pp.763-765
    • /
    • 2005
  • Volume rendering is a visualization technique that extracts meaningful information from volume data of three or more dimensions and presents it intuitively; it is widely used in fields such as medical imaging, meteorology, and fluid dynamics. Rapid advances in PC hardware have recently made it possible to visualize large volume data, once feasible only on supercomputers, in ordinary PC environments: vector operations optimized for numerical computation in the vertex and pixel shaders of PC graphics hardware enable fast volume visualization. However, the limited memory capacity of graphics hardware has kept fast visualization of very large volume data a difficult problem. In this paper, the Visible Korean Human dataset, a large human-body image dataset produced by the Korea Institute of Science and Technology Information (KISTI), is distributed across the memory of multiple graphics boards and rendered quickly with vertex and pixel shaders to obtain high-resolution images.
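At its core the approach composites samples along each viewing ray and blends the partial images produced from the bricks held on each GPU. The sketch below is a CPU stand-in for that compositing step on a single synthetic brick; the volume size, transfer function, and orthographic z-axis view are assumptions, not the paper's shader implementation.

```python
import numpy as np

def composite(volume, step_alpha=0.02):
    """Front-to-back emission-absorption compositing along the z axis (orthographic view)."""
    color = np.zeros(volume.shape[:2], dtype=np.float32)
    alpha = np.zeros(volume.shape[:2], dtype=np.float32)
    for z in range(volume.shape[2]):
        sample = volume[:, :, z]
        a = np.clip(sample * step_alpha, 0.0, 1.0)   # simple linear transfer function
        color += (1.0 - alpha) * a * sample          # accumulate emitted intensity
        alpha += (1.0 - alpha) * a                   # accumulate opacity
    return color

# Synthetic 128^3 density brick standing in for one GPU's share of the VKH volume.
brick = np.random.rand(128, 128, 128).astype(np.float32)
image = composite(brick)
# In the distributed setting, each node renders its own brick and the partial images
# are blended in depth order (sort-last compositing) to form the final picture.
print(image.shape, image.max())
```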


Development of Korean Dialogue Dataset for Restaurant Reservation System (식당 예약 대화 시스템 개발을 위한 한국어 데이터셋 구축)

  • Kim, GyeongMin;Lee, DongYub;Hur, YunA;Lim, HeuiSeok
    • Annual Conference on Human and Language Technology
    • /
    • 2017.10a
    • /
    • pp.267-269
    • /
    • 2017
  • A dialogue system is a system that understands the user's language, analyzes the user's intent, and helps the user achieve their goal. Dialogue at a near-human level requires a large amount of data, and the result depends on the quantity and quality of that data. Facebook recently built an English restaurant-reservation training dialogue dataset based on end-to-end learning (the six dialog bAbI tasks) and applied it to their model. Although research applicable to dialogue systems is being actively pursued, unlike for English data, Korean datasets for restaurant-reservation systems shared for other researchers to use are still scarce. In this paper, we use Facebook's English restaurant-reservation training dialogue dataset to build a Korean dataset usable in a Korean restaurant-reservation dialogue system, and we propose a method for constructing such a dataset by varying the surface forms according to utterances that can occur in everyday life.
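A toy sketch of turning bAbI-style dialogue lines into Korean surface forms is given below. The two example lines, the tab-separated turn/user/bot layout, and the Korean renderings are illustrative assumptions rather than content of the dataset described in the paper.

```python
# Hypothetical lines in the bAbI dialog-task style: "<turn> <user utterance>\t<bot utterance>".
babi_lines = [
    "1 good morning\thello what can i help you with today",
    "2 may i have a table in paris\ti'm on it",
]

ko_lexicon = {   # toy utterance-level mapping; the real dataset adapts forms more carefully
    "good morning": "안녕하세요",
    "hello what can i help you with today": "안녕하세요, 무엇을 도와드릴까요?",
    "may i have a table in paris": "파리에 있는 식당 자리를 예약할 수 있을까요?",
    "i'm on it": "바로 확인해 드리겠습니다.",
}

def to_korean(lines):
    """Parse each line into (turn, user, bot) and substitute Korean surface forms when known."""
    dialogs = []
    for line in lines:
        turn, rest = line.split(" ", 1)
        user, bot = rest.split("\t")
        dialogs.append((int(turn), ko_lexicon.get(user, user), ko_lexicon.get(bot, bot)))
    return dialogs

for turn, user, bot in to_korean(babi_lines):
    print(turn, user, "->", bot)
```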


User Identification Using Real Environmental Human Computer Interaction Behavior

  • Wu, Tong;Zheng, Kangfeng;Wu, Chunhua;Wang, Xiujuan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.6
    • /
    • pp.3055-3073
    • /
    • 2019
  • In this paper, a new user identification method is presented that uses human-computer interaction (HCI) behavior data collected in a real environment to improve usability. The user behavior data are collected continuously, without fixing experimental conditions such as text length or number of actions. To illustrate the characteristics of real environmental HCI data, the probability density distributions and the performance of keyboard and mouse data are analyzed with random sampling and a Support Vector Machine (SVM). Based on this analysis, the Multiple Kernel Learning (MKL) method is used for the first time for user HCI behavior identification, owing to the heterogeneity of keyboard and mouse data, and all candidate kernels are compared to set the MKL algorithm's parameters and ensure its robustness. The analysis shows that keyboard data have a narrower probability density distribution than mouse data; keyboard data perform best with a 1-minute time window, while mouse data perform best with a 10-minute window. Finally, experiments using the MKL algorithm with three global polynomial kernels and ten local Gaussian kernels achieve a user identification accuracy of 83.03% on a real environmental HCI dataset, demonstrating that the proposed method achieves encouraging performance.
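As a rough illustration of combining heterogeneous keyboard and mouse kernels, the sketch below sums a polynomial kernel on keyboard features and a Gaussian (RBF) kernel on mouse features with fixed weights and feeds the result to an SVM with a precomputed kernel. Fixed weights are a simplification: proper MKL learns them, and the feature dimensions, kernel counts, and synthetic data here are assumptions rather than the paper's setup.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in: 200 sessions with keyboard features (dim 12), mouse features (dim 20), 5 users.
rng = np.random.default_rng(2)
kb, mouse = rng.normal(size=(200, 12)), rng.normal(size=(200, 20))
y = rng.integers(0, 5, size=200)

idx_tr, idx_te = train_test_split(np.arange(200), test_size=0.3, random_state=0)

def combined_kernel(a_idx, b_idx, w_poly=0.5, w_rbf=0.5):
    """Fixed-weight combination of a polynomial kernel (keyboard) and an RBF kernel (mouse)."""
    return (w_poly * polynomial_kernel(kb[a_idx], kb[b_idx], degree=3)
            + w_rbf * rbf_kernel(mouse[a_idx], mouse[b_idx], gamma=0.1))

clf = SVC(kernel="precomputed").fit(combined_kernel(idx_tr, idx_tr), y[idx_tr])
print("accuracy:", clf.score(combined_kernel(idx_te, idx_tr), y[idx_te]))
```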