• Title/Summary/Keyword: Korean human dataset

Search Result 161, Processing Time 0.026 seconds

A Study on Methodology on Building NLI Benchmark Dataset in korean (한국어 추론 벤치마크 데이터 구축을 위한 방법론 연구)

  • Han, Jiyoon;Kim, Hansaem
    • Annual Conference on Human and Language Technology
    • /
    • 2020.10a
    • /
    • pp.292-297
    • /
    • 2020
  • 자연어 추론 모델은 전제와 가설 사이의 의미 관계를 함의와 모순, 중립 세 가지로 판별한다. 영어에서는 RTE(recognizing textual entailment) 데이터셋과 다양한 NLI(Natural Language Inference) 데이터셋이 이러한 모델을 개발하고 평가하기 위한 벤치마크로 공개되어 있다. 본 연구는 국외의 텍스트 추론 데이터 주석 가이드라인 및 함의 데이터를 언어학적으로 분석한 결과와 함의 및 모순 관계에 대한 의미론적 연구의 토대 위에서 한국어 자연어 추론 벤치마크 데이터 구축 방법론을 탐구한다. 함의 및 모순 관계를 주석하기 위하여 각각의 의미 관계와 관련된 언어 현상을 정의하고 가설을 생성하는 방안에 대하여 제시하며 이를 바탕으로 실제 구축될 데이터의 형식과 주석 프로세스에 대해서도 논의한다.

  • PDF

EmoNSMC: Constructing Korean Emotion Tagging Dataset Using Distant Supervision (EmoNSMC: Distant Supervision 을 이용한 한국어 감정 태깅 데이터셋 구축)

  • Lee, Young-Jun;Choi, Ho-Jin
    • Annual Conference on Human and Language Technology
    • /
    • 2019.10a
    • /
    • pp.519-521
    • /
    • 2019
  • 최근 소셜 메신저를 통해 많은 사람들이 의사소통을 주고받음에 따라, 텍스트에서 감정을 파악하는 것이 중요하다. 따라서, 감정이 태깅된 데이터가 필요하다. 하지만, 기존 연구는 감정이 태깅된 데이터의 양이 많지가 않다. 이는 텍스트에서 감정을 파악하는데 성능 저하를 야기할 수 있다. 이를 해결하기 위해, 본 논문에서는 단어 매칭 방법과 형태소 매칭 방법을 이용하여 많은 양의 한국어 감정 태깅 데이터셋인 EmoNSMC 를 구축하였다. 구축한 데이터셋은 네이버 영화 감상 리뷰 데이터 (NSMC)에 디스턴트 수퍼비전 방법 (distant supervision) 방법을 적용하여 weak labeling을 진행하였고, 이 과정에서 한국어 감정 어휘 사전 (KTEA) 을 이용하였다. 구축된 데이터셋의 감정 분포 결과, 형태소 매칭 방법을 통해 구축한 데이터셋이 좀 더 감정 분포가 균등한 것을 확인할 수 있었다. 해당 데이터셋은 공개되어 있다.

  • PDF

KorSciQA: A Dataset for Machine Comprehension of Korean Scientific Paper (KorSciQA: 한국어 논문의 기계독해 데이터셋)

  • Hahm, Younggyun;Jeong, Youngbin;Jeong, Heeseok;Hwang, Hyekyong;Choi, Key-Sun
    • Annual Conference on Human and Language Technology
    • /
    • 2019.10a
    • /
    • pp.207-212
    • /
    • 2019
  • 본 논문에서는 한국어로 쓰여진 과학기술 논문에 대한 기계독해 과제(일명 KorSciQA)를 제안하고자 하며, 그와 수반하는 데이터 구축 및 평가를 보고한다. 다양한 제약조건이 부가된 크라우드소싱 디자인을 통하여, 498개의 논문 초록에 대해 일관성 있는 품질의 2,490개의 질의응답으로 구성된 기계독해 데이터셋을 구축하였다. 이 데이터셋은 어느 논문에서나 나타나는 논박 요소들인 논의하는 문제, 푸는 방법, 관련 데이터, 모델 등과 밀접한 질문으로 구성되고, 각 논박 요소의 의미, 목적, 이유 파악 및 다양한 추론을 하여 답을 할 수 있는 것이다. 구축된 KorSciQA 데이터셋은 실험을 통하여 기존의 기계독해 모델의 독해력으로는 풀기 어려운 도전과제로 평가되었다.

  • PDF

Korean Instruction Tuning Dataset (언어 번역 모델을 통한 한국어 지시 학습 데이터 세트 구축)

  • Yeongseo Lim;HyeonChang Chu;San Kim;Jin Yea Jang;Minyoung Jung;Saim Shin
    • Annual Conference on Human and Language Technology
    • /
    • 2023.10a
    • /
    • pp.591-595
    • /
    • 2023
  • 최근 지시 학습을 통해 미세 조정한 자연어 처리 모델들이 큰 성능 향상을 보이고 있다. 하지만 한국어로 학습된 자연어 처리 모델에 대해 지시 학습을 진행할 수 있는 데이터 세트는 공개되어 있지 않아 관련 연구에 큰 어려움을 겪고 있다. 본 논문에서는 T5 기반 한국어 자연어 처리 모델인 Long KE-T5로 영어 데이터 세트를 번역하여 한국어 지시 학습 데이터 세트를 구축한다. 또한 구축한 데이터 세트로 한국어로 사전 학습된 Long KE-T5 모델을 미세 조정한 후 성능을 확인한다.

  • PDF

MultiView-Based Hand Posture Recognition Method Based on Point Cloud

  • Xu, Wenkai;Lee, Ick-Soo;Lee, Suk-Kwan;Lu, Bo;Lee, Eung-Joo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.9 no.7
    • /
    • pp.2585-2598
    • /
    • 2015
  • Hand posture recognition has played a very important role in Human Computer Interaction (HCI) and Computer Vision (CV) for many years. The challenge arises mainly due to self-occlusions caused by the limited view of the camera. In this paper, a robust hand posture recognition approach based on 3D point cloud from two RGB-D sensors (Kinect) is proposed to make maximum use of 3D information from depth map. Through noise reduction and registering two point sets obtained satisfactory from two views as we designed, a multi-viewed hand posture point cloud with most 3D information can be acquired. Moreover, we utilize the accurate reconstruction and classify each point cloud by directly matching the normalized point set with the templates of different classes from dataset, which can reduce the training time and calculation. Experimental results based on posture dataset captured by Kinect sensors (from digit 1 to 10) demonstrate the effectiveness of the proposed method.

Evaluation of Recent Data Processing Strategies on Q-TOF LC/MS Based Untargeted Metabolomics

  • Kaplan, Ozan;Celebier, Mustafa
    • Mass Spectrometry Letters
    • /
    • v.11 no.1
    • /
    • pp.1-5
    • /
    • 2020
  • In this study, some of the recently reported data processing strategies were evaluated and modified based on their capabilities and a brief workflow for data mining was redefined for Q-TOF LC-MS based untargeted metabolomics. Commercial pooled human plasma samples were used for this purpose. An ultrafiltration procedure was applied on sample preparation. Sample set was analyzed through Q-TOF LC/MS. A C18 column (Agilent Zorbax 1.8 µM, 50 × 2.1 mm) was used for chromatographic separation. Raw chromatograms were processed using XCMS - R programming language edition and Isotopologue Parameter Optimization (IPO) was used to optimize XCMS parameters. The raw XCMS table was processed using MS Excel to find reliable and reproducible peaks. Totally 1650 reliable and reproducible potential metabolite peaks were found based on the data processing procedures given in this paper. The redefined dataset was upload into MetaboAnalyst platform and the identified metabolites were matched with 86 metabolic pathways. Thus, two list were obtained and presented in this study as supplement files. The first list is to present the retention times and m/z values of detected metabolite peaks. The second list is the metabolic pathways related with the identified metabolites. The briefly described data processing strategies and dataset presented in this study could be beneficial for the researchers working on untargeted metabolomics for processing their data and validating their results.

FD-StackGAN: Face De-occlusion Using Stacked Generative Adversarial Networks

  • Jabbar, Abdul;Li, Xi;Iqbal, M. Munawwar;Malik, Arif Jamal
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.7
    • /
    • pp.2547-2567
    • /
    • 2021
  • It has been widely acknowledged that occlusion impairments adversely distress many face recognition algorithms' performance. Therefore, it is crucial to solving the problem of face image occlusion in face recognition. To solve the image occlusion problem in face recognition, this paper aims to automatically de-occlude the human face majority or discriminative regions to improve face recognition performance. To achieve this, we decompose the generative process into two key stages and employ a separate generative adversarial network (GAN)-based network in both stages. The first stage generates an initial coarse face image without an occlusion mask. The second stage refines the result from the first stage by forcing it closer to real face images or ground truth. To increase the performance and minimize the artifacts in the generated result, a new refine loss (e.g., reconstruction loss, perceptual loss, and adversarial loss) is used to determine all differences between the generated de-occluded face image and ground truth. Furthermore, we build occluded face images and corresponding occlusion-free face images dataset. We trained our model on this new dataset and later tested it on real-world face images. The experiment results (qualitative and quantitative) and the comparative study confirm the robustness and effectiveness of the proposed work in removing challenging occlusion masks with various structures, sizes, shapes, types, and positions.

The Chromatin Accessibility Landscape of Nonalcoholic Fatty Liver Disease Progression

  • Kang, Byeonggeun;Kang, Byunghee;Roh, Tae-Young;Seong, Rho Hyun;Kim, Won
    • Molecules and Cells
    • /
    • v.45 no.5
    • /
    • pp.343-352
    • /
    • 2022
  • The advent of the assay for transposase-accessible chromatin using sequencing (ATAC-seq) has shown great potential as a leading method for analyzing the genome-wide profiling of chromatin accessibility. A comprehensive reference to the ATAC-seq dataset for disease progression is important for understanding the regulatory specificity caused by genetic or epigenetic changes. In this study, we present a genome-wide chromatin accessibility profile of 44 liver samples spanning the full histological spectrum of nonalcoholic fatty liver disease (NAFLD). We analyzed the ATAC-seq signal enrichment, fragment size distribution, and correlation coefficients according to the histological severity of NAFLD (healthy control vs steatosis vs fibrotic nonalcoholic steatohepatitis), demonstrating the high quality of the dataset. Consequently, 112,303 merged regions (genomic regions containing one or multiple overlapping peak regions) were identified. Additionally, we found differentially accessible regions (DARs) and performed transcription factor binding motif enrichment analysis and de novo motif analysis to determine new biomarker candidates. These data revealed the gene-regulatory interactions and noncoding factors that can affect NAFLD progression. In summary, our study provides a valuable resource for the human epigenome by applying an advanced approach to facilitate diagnosis and treatment by understanding the non-coding genome of NAFLD.

Anomaly behavior detection using Negative Selection algorithm based anomaly detector (Negative Selection 알고리즘 기반 이상탐지기를 이용한 이상행 위 탐지)

  • 김미선;서재현
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2004.05b
    • /
    • pp.391-394
    • /
    • 2004
  • Change of paradigm of network attack technique was begun by fast extension of the latest Internet and new attack form is appearing. But, Most intrusion detection systems detect informed attack type because is doing based on misuse detection, and active correspondence is difficult in new attack. Therefore, to heighten detection rate for new attack pattern, visibilitys to apply human immunity mechanism are appearing. In this paper, we create self-file from normal behavior profile about network packet and embody self recognition algorithm to use self-nonself discrimination in the human immune system to detect anomaly behavior. Sense change because monitors self-file creating anomaly detector based on Negative Selection Algorithm that is self recognition algorithm's one and detects anomaly behavior. And we achieve simulation to use DARPA Network Dataset and verify effectiveness of algorithm through the anomaly detection rate.

  • PDF

Performance Comparison for Exercise Motion classification using Deep Learing-based OpenPose (OpenPose기반 딥러닝을 이용한 운동동작분류 성능 비교)

  • Nam Rye Son;Min A Jung
    • Smart Media Journal
    • /
    • v.12 no.7
    • /
    • pp.59-67
    • /
    • 2023
  • Recently, research on behavior analysis tracking human posture and movement has been actively conducted. In particular, OpenPose, an open-source software developed by CMU in 2017, is a representative method for estimating human appearance and behavior. OpenPose can detect and estimate various body parts of a person, such as height, face, and hands in real-time, making it applicable to various fields such as smart healthcare, exercise training, security systems, and medical fields. In this paper, we propose a method for classifying four exercise movements - Squat, Walk, Wave, and Fall-down - which are most commonly performed by users in the gym, using OpenPose-based deep learning models, DNN and CNN. The training data is collected by capturing the user's movements through recorded videos and real-time camera captures. The collected dataset undergoes preprocessing using OpenPose. The preprocessed dataset is then used to train the proposed DNN and CNN models for exercise movement classification. The performance errors of the proposed models are evaluated using MSE, RMSE, and MAE. The performance evaluation results showed that the proposed DNN model outperformed the proposed CNN model.