• Title/Summary/Keyword: rough set (러프셋)

Search Result 10, Processing Time 0.02 seconds

A Study on the Construction of keyphrase dataset for paraphrase extraction (패러프레이즈 추출을 위한 키프레이즈 데이터셋 구축 방법론 연구)

  • Kang, Hyerin;Kang, Yejee;Park, Seoyoon;Jang, Yeonji;Kim, Hansaem
    • Annual Conference on Human and Language Technology / 2020.10a / pp.357-362 / 2020
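  • The performance of natural language processing applications varies with how accurately they capture paraphrase expressions, so paraphrases are growing in importance across NLP application areas. Improving system performance requires a corpus large enough to train models, and building such a paraphrase corpus depends in particular on accurate paraphrase extraction. This study therefore proposes a keyphrase dataset as a language resource for paraphrase extraction and uses it to extract sentences in a paraphrase relation, that is, sentences conveying similar meaning. We show that when the constructed keyphrase dataset is used for paraphrase extraction, sentences in a paraphrase relation can be found with a method as simple as the one used in this study.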

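
One way to picture the keyphrase-based extraction described in the abstract above is to pair sentences whose keyphrase sets overlap strongly. The sketch below is a minimal illustration under assumptions, not the authors' procedure: the toy sentences, the `jaccard` and `paraphrase_candidates` helpers, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two keyphrase collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def paraphrase_candidates(keyphrases_by_sentence, threshold=0.5):
    """Return sentence-id pairs whose keyphrase overlap meets the threshold.

    keyphrases_by_sentence maps a sentence id to its list of keyphrases.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    pairs = []
    items = sorted(keyphrases_by_sentence.items())
    for (id_a, kp_a), (id_b, kp_b) in combinations(items, 2):
        if jaccard(kp_a, kp_b) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical toy data: sentence id -> keyphrases.
toy = {
    "s1": ["interest rate", "central bank", "raise"],
    "s2": ["central bank", "interest rate", "hike"],
    "s3": ["football", "final", "win"],
}
candidates = paraphrase_candidates(toy)
```

Here `s1` and `s2` share two of four distinct keyphrases, so they are the only pair proposed as a paraphrase candidate. A real pipeline would still need a verification step, which this sketch leaves out.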

Development of Automatic Rule Extraction Method in Data Mining : An Approach based on Hierarchical Clustering Algorithm and Rough Set Theory (데이터마이닝의 자동 데이터 규칙 추출 방법론 개발 : 계층적 클러스터링 알고리듬과 러프 셋 이론을 중심으로)

  • Oh, Seung-Joon;Park, Chan-Woong
    • Journal of the Korea Society of Computer and Information / v.14 no.6 / pp.135-142 / 2009
  • Data mining is an emerging area of computational intelligence that offers new theories, techniques, and tools for analyzing large data sets. The major techniques used in data mining are association rule mining, classification, and clustering. Because these techniques are typically used individually, a methodology is needed that integrates them for rule extraction. Rule extraction techniques help people analyze large data sets and turn the meaningful information they contain into successful decisions. This paper proposes an autonomous method of rule extraction using clustering and rough set theory. Experiments are carried out on data sets from the UCI KDD archive, and the decision rules produced by the proposed method are presented. These rules can be used successfully for decision making.
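
Rule extraction with rough set theory rests on lower and upper approximations of a decision class. The sketch below computes both on a small table. It is an illustration of the general concept only, not the method proposed in the paper, and the table, attribute names, and helper functions are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(table, attributes):
    """Group object ids that are indiscernible on the given attributes."""
    classes = defaultdict(set)
    for obj_id, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) approximations of a target set of object ids."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attributes):
        if block <= target:
            lower |= block   # every object in the block is in the target
        if block & target:
            upper |= block   # at least one object in the block is in the target
    return lower, upper


# Hypothetical decision table: two condition attributes and a decision "play".
table = {
    1: {"outlook": "sunny", "windy": "no", "play": "yes"},
    2: {"outlook": "sunny", "windy": "no", "play": "no"},
    3: {"outlook": "rainy", "windy": "yes", "play": "no"},
    4: {"outlook": "overcast", "windy": "no", "play": "yes"},
}
target = {obj for obj, row in table.items() if row["play"] == "yes"}
lower, upper = approximations(table, ["outlook", "windy"], target)
```

Objects in the lower approximation support certain rules (here, overcast and not windy implies play). Objects that are only in the upper approximation form the boundary region and support possible rules at best.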

A Clustering Algorithm for Sequence Data Using Rough Set Theory (러프 셋 이론을 이용한 시퀀스 데이터의 클러스터링 알고리즘)

  • Oh, Seung-Joon;Park, Chan-Woong
    • Journal of the Korea Society of Computer and Information / v.13 no.2 / pp.113-119 / 2008
  • The World Wide Web is a dynamic collection of pages that includes a huge number of hyperlinks and huge volumes of usage information. The resulting growth in online information, combined with the almost unstructured nature of web data, necessitates the development of powerful web data mining tools. Recently, a number of approaches have been developed for dealing with specific aspects of web usage mining for the purpose of automatically discovering user profiles. We analyze sequence data, such as web logs, protein sequences, and retail transactions. In our approach, we propose a clustering algorithm for sequence data using rough set theory. We present a simple example and experimental results using a splice dataset and synthetic datasets.


Design of Web Agents Module for Information Filtering Based on Rough Sets (러프셋에 기반한 정보필터링 웹에이전트 모듈 설계)

  • 김형수;이상부
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2004.05b / pp.552-556 / 2004
  • This paper surveys the design of adaptive information filtering agents that retrieve useful information from a large-scale database. As information retrieval over the Internet becomes commonplace, extracting the information that satisfies the user's request conditions is necessary to reduce search time. The module first uses a rough set reduct to generate a reduced, minimal knowledge database from a large-scale knowledge database, taking the user's natural-language query into account. It then applies soft computing, in the form of fuzzy composite processing, to handle uncertain values in the reduced schema domain.

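
The rough set reduct mentioned in the abstract above is a minimal set of condition attributes that classifies objects as well as the full attribute set does. The brute-force sketch below finds the smallest such subsets for a toy table. It is not the module described in the paper, and the rows, attribute names, and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def positive_region(rows, attributes, decision):
    """Indices of rows whose indiscernibility class has a single decision value."""
    classes = defaultdict(list)
    for idx, row in enumerate(rows):
        classes[tuple(row[a] for a in attributes)].append(idx)
    region = set()
    for members in classes.values():
        if len({rows[i][decision] for i in members}) == 1:
            region.update(members)
    return region


def smallest_reducts(rows, conditions, decision):
    """Brute-force search for the smallest attribute subsets that keep the
    positive region of the full condition set (exponential; toy sizes only)."""
    full = positive_region(rows, conditions, decision)
    for size in range(1, len(conditions) + 1):
        found = [list(subset) for subset in combinations(conditions, size)
                 if positive_region(rows, subset, decision) == full]
        if found:
            return found
    return [list(conditions)]


# Hypothetical document-filtering table; "relevant" is the decision attribute.
rows = [
    {"topic": "sports", "lang": "en", "length": "short", "relevant": "yes"},
    {"topic": "sports", "lang": "ko", "length": "long", "relevant": "yes"},
    {"topic": "politics", "lang": "en", "length": "short", "relevant": "no"},
    {"topic": "politics", "lang": "ko", "length": "long", "relevant": "no"},
]
reducts = smallest_reducts(rows, ["topic", "lang", "length"], "relevant")
```

In this toy table `topic` alone determines relevance, so the knowledge base can be reduced to that single attribute. The search returns only minimum-size reducts and enumerates all subsets, so practical systems rely on heuristics instead.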

Korean Paraphrase Corpus and Building Guidelines for Sentence Similarity Analysis (문장 유사성 분석을 위한 한국어 패러프레이즈 말뭉치 및 구축 가이드라인)

  • Oh, Kyo-Joong;Kim, Hyunmin;Ko, Bowon;Nam, Jehyun;Choi, Ho-Jin
    • Annual Conference on Human and Language Technology / 2019.10a / pp.527-530 / 2019
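  • As dialogue systems and chatbot technology are increasingly adopted for business use across industries, interest in Korean paraphrase technology is growing. Previously, for research and evaluation purposes, the priority was to build evaluation sets that were well curated even if small. Recently, with advances in machine learning, the priority has shifted to quickly securing large corpora that guarantee a certain level of quality for training. This paper introduces our experience and methods from the ongoing construction of a Korean paraphrase corpus.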


Structure Optimization of Neural Networks using Rough Set Theory (러프셋 이론을 이용한 신경망의 구조 최적화)

  • 정영준;이동욱;심귀보
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 1998.03a / pp.49-52 / 1998
  • Neural networks perform well in pattern classification, control, and many other fields thanks to their learning ability. However, there is no effective rule or systematic approach for determining the optimal structure. In this paper, we propose a new method for finding the optimal structure of a feed-forward multi-layer neural network, a kind of pruning method that eliminates redundant elements of the network. To find the redundant elements, we analyze error and weight changes with rough set theory while executing the back-propagation learning algorithm.


A New Decision Tree Algorithm Based on Rough Set and Entity Relationship (러프셋 이론과 개체 관계 비교를 통한 의사결정나무 구성)

  • Han, Sang-Wook;Kim, Jae-Yearn
    • Journal of Korean Institute of Industrial Engineers / v.33 no.2 / pp.183-190 / 2007
  • We present a new decision tree classification algorithm using rough set theory that can induce classification rules, the construction of which is based on core attributes and relationship between objects. Although decision trees have been widely used in machine learning and artificial intelligence, little research has focused on improving classification quality. We propose a new decision tree construction algorithm that can be simplified and provides an improved classification quality. We also compare the new algorithm with the ID3 algorithm in terms of the number of rules.

Adaptive Granule Control with the Aid of Rough Set Theory for a HVDC system (러프 셋 이론을 사용한 HVDC 시스템을 위한 적응 Granule 제어)

  • Wang, Zhongxian;Yang, Jeung-Je;Ahn, Tae-Chon
    • Proceedings of the KIEE Conference / 2006.11a / pp.144-147 / 2006
  • A proportional-integral (PI) control strategy is commonly used for constant current and extinction angle control in an HVDC (High Voltage Direct Current) system. A PI control strategy is based on a static design in which the gains of the PI controller are fixed. Since the response of an HVDC plant changes dynamically with variations in the operating point, the performance of a PI controller is far from optimal. The contribution of this paper is the design of a rough set based fuzzy adaptive control scheme. Experimental results comparing the performance of the adaptive and PI control schemes are also given.

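
The fixed-gain PI baseline that the abstract above contrasts with the adaptive scheme can be sketched in a few lines. This is a generic discrete-time PI loop on a hypothetical first-order plant, not an HVDC model and not the paper's controller. The gains, time step, and plant time constant are illustrative assumptions.

```python
class PIController:
    """Discrete-time PI controller with fixed gains kp and ki."""

    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt          # accumulate the integral term
        return self.kp * error + self.ki * self.integral


def simulate(controller, setpoint=1.0, steps=500, tau=0.5):
    """Drive a hypothetical plant dy/dt = (u - y) / tau with forward Euler steps."""
    y = 0.0
    for _ in range(steps):
        u = controller.update(setpoint, y)
        y += (u - y) / tau * controller.dt
    return y


# Gains are fixed for the whole run; they do not track the operating point.
final_output = simulate(PIController(kp=1.0, ki=2.0, dt=0.01))
```

Because `kp` and `ki` never change, the loop is tuned for one plant behaviour only. If `tau` drifts, the same gains give a different response, which is the limitation an adaptive scheme is meant to address.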

A Design of RSIDS using Rough Set Theory and Support Vector Machine Algorithm (Rough Set Theory와 Support Vector Machine 알고리즘을 이용한 RSIDS 설계)

  • Lee, Byung-Kwan;Jeong, Eun-Hee
    • Journal of the Korea Society of Computer and Information / v.17 no.12 / pp.179-185 / 2012
  • This paper proposes a design of RSIDS (RST and SVM based Intrusion Detection System) using RST (Rough Set Theory) and the SVM (Support Vector Machine) algorithm. The RSIDS consists of a PrePro (PreProcessing) module, an RRG (RST based Rule Generation) module, and an SAD (SVM based Attack Detection) module. The PrePro module converts the collected information to the RSIDS data format. The RRG module analyzes attack data, generates attack rules, uses these rules to extract attack information from the massive data, and transfers the extracted attack information to the SAD module. The SAD module uses this information to detect attacks and notifies a manager of them. Compared to the existing SVM, the RSIDS improved the average ADR (Attack Detection Ratio) from 77.71% to 85.28% and reduced the average FPR (False Positive Ratio) from 13.25% to 9.87%. The RSIDS is thus judged to be an improvement over the existing SVM.
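
The two metrics quoted in the abstract above can be made concrete with a short sketch. The formulas are the common confusion-matrix definitions and are an assumption here, since the abstract does not define ADR or FPR. The function names and confusion counts are hypothetical, and only the four percentages come from the abstract.

```python
def attack_detection_ratio(true_positives, false_negatives):
    """Share of actual attacks that were flagged (assumed: TP / (TP + FN))."""
    return true_positives / (true_positives + false_negatives)


def false_positive_ratio(false_positives, true_negatives):
    """Share of normal traffic wrongly flagged (assumed: FP / (FP + TN))."""
    return false_positives / (false_positives + true_negatives)


# Hypothetical confusion counts, not taken from the paper.
adr = attack_detection_ratio(true_positives=85, false_negatives=15)
fpr = false_positive_ratio(false_positives=10, true_negatives=90)

# Differences between the averages reported in the abstract, in percentage points.
adr_gain = round(85.28 - 77.71, 2)
fpr_drop = round(13.25 - 9.87, 2)
```

With the reported averages, the detection ratio rises by 7.57 percentage points and the false positive ratio falls by 3.38 percentage points.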