• Title/Summary/Keyword: Knowledge extraction

Comparative Study of Knowledge Extraction on the Industrial Application (산업분야에서의 지식 정보 추출에 대한 비교연구)

  • Woo, Young-Kwang;Kim, Sung-Sin;Bae, Hyun;Woo, Kwang-Bang
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.05a / pp.251-254 / 2003
  • Data are linguistic or numerical representations of particular attributes. Information is data organized for a purpose, and knowledge is the systematization of the relationships among pieces of information into rules for problem solving, pattern classification, or decision making. In most industrial fields today, knowledge is actively being extracted and applied to deepen the understanding of systems and to improve their performance. Knowledge extraction consists of the stages of knowledge acquisition, representation, and implementation, and the extracted knowledge is expressed as rules. This paper surveys, by domain, the knowledge extraction methods applied across various industrial fields, and compares and analyzes the results of applying clustering (CL), input space partitioning (ISP), neuro-fuzzy (NF), neural network (NN), and extension matrix (EM) methods to several test datasets and real systems.
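
Of the methods compared above, the clustering route from raw data to rules is the simplest to illustrate. The sketch below is a minimal, hypothetical example, not the paper's implementation: k-means groups the samples, and each cluster's per-feature range is read out as an interval rule.

```python
# Minimal sketch of clustering-based rule extraction (illustrative only,
# not the paper's implementation): k-means clusters become interval rules.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for c in range(km.n_clusters):
    members = X[km.labels_ == c]
    lo, hi = members.min(axis=0), members.max(axis=0)
    conds = " AND ".join(
        f"{lo[j]:.2f} <= x{j} <= {hi[j]:.2f}" for j in range(X.shape[1])
    )
    print(f"IF {conds} THEN cluster = {c}")
```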

A Study on Automatic Wear Debris Recognition by Using Particle Feature Extraction (입자 유형별 형상추출에 의한 마모입자 자동인식에 관한 연구)

  • A. Y. Grigoriev et al.
    • Tribology and Lubricants / v.15 no.2 / pp.206-211 / 1999
  • Wear debris morphology is closely related to the wear mode and the mechanism by which it occurred. Image recognition of wear debris is therefore a powerful tool in wear monitoring. However, it has usually required an expert's experience, and the results could be too subjective. Automatic tools for wear debris recognition are needed to solve this problem. In this work, an algorithm for automatic wear debris recognition was suggested and implemented as PC-based software. The presented method defines a characteristic three-dimensional feature space in which typical types of wear debris are separately located by a knowledge-based system, and the similarity of the wear debris under examination is compared against them. The three-dimensional feature space was obtained from multiple feature vectors by using a multidimensional scaling technique. The results showed that the presented automatic wear debris recognition was satisfactory in many cases of application.
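
The core step, projecting many shape descriptors into a three-dimensional feature space and judging similarity there, can be approximated with off-the-shelf multidimensional scaling. The snippet below is an illustrative sketch under invented debris types and features, not the PC software described in the paper.

```python
# Illustrative sketch: embed multi-dimensional debris features into 3-D with
# MDS, then classify a particle by its nearest reference-type centroid.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(1)
# Hypothetical 10-D shape/texture feature vectors for 3 debris types.
features = np.vstack([rng.normal(m, 0.5, (20, 10)) for m in (0.0, 2.0, 4.0)])
labels = np.repeat(["cutting", "fatigue", "sliding"], 20)

space3d = MDS(n_components=3, random_state=1).fit_transform(features)
centroids = {t: space3d[labels == t].mean(axis=0) for t in set(labels)}

query = space3d[0]  # a particle already embedded in the 3-D space
best = min(centroids, key=lambda t: np.linalg.norm(query - centroids[t]))
print("most similar debris type:", best)
```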

A Study of Main Contents Extraction from Web News Pages based on XPath Analysis

  • Sun, Bok-Keun
    • Journal of the Korea Society of Computer and Information / v.20 no.7 / pp.1-7 / 2015
  • Data on the internet can be used in various fields, such as a source of data for information retrieval (IR), data mining, and knowledge-based information services, but it contains a great deal of unnecessary information. Removing this unnecessary data is a problem that must be solved before studying knowledge-based information services built on web page data. In this paper, we solve the problem through the implementation of XTractor (XPath Extractor). Since XPath is used to navigate the elements and attribute data in an XML document, the XPath analysis is carried out by XTractor. XTractor extracts the main text by parsing the HTML, grouping XPaths, and detecting the XPath that contains the main data. As a result, the recognition and precision rates were 97.9% and 93.9%, respectively, except for a few cases in a large amount of experimental data, and it was confirmed that the main text of news pages can be extracted properly.
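
The XPath-grouping idea can be sketched in a few lines with lxml: collect the structural XPath of every text node, group the text by path, and keep the group carrying the most text. This is a rough approximation of the described approach, not the actual XTractor code.

```python
# Minimal sketch of XPath-grouped main-text extraction (an approximation of
# the described approach, not the actual XTractor implementation).
from collections import defaultdict
from lxml import html

page = """<html><body><div id="nav"><a>Home</a></div>
<div id="story"><p>First paragraph of the article body.</p>
<p>Second, much longer paragraph carrying the main news content.</p></div>
</body></html>"""

tree = html.fromstring(page)
groups = defaultdict(list)
for node in tree.iter():
    if node.text and node.text.strip():
        # Group text by its structural XPath, ignoring positional indices.
        path = tree.getroottree().getpath(node)
        key = "".join(ch for ch in path if not ch.isdigit()).replace("[]", "")
        groups[key].append(node.text.strip())

main_path = max(groups, key=lambda k: sum(len(t) for t in groups[k]))
print(main_path, "->", " ".join(groups[main_path]))
```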

The HCARD Model using an Agent for Knowledge Discovery

  • Gerardo Bobby D.;Lee Jae-Wan;Joo Su-Chong
    • The Journal of Information Systems / v.14 no.3 / pp.53-58 / 2005
  • In this study, we employ a multi-agent approach for the search and extraction of data in a distributed environment. We use an Integrator Agent in the proposed model, which is based on Hierarchical Clustering and Association Rule Discovery (HCARD). HCARD addresses the inadequacy of other data mining tools in processing performance and efficiency when used for knowledge discovery. The Integrator Agent was developed on the CORBA architecture for the search and extraction of data from heterogeneous servers in the distributed environment. Our experiments show that HCARD generated essential association rules that can be practically explained for decision-making purposes. Shorter processing times were also noted when computing clusters with HCARD, implying a more favorable processing period than computing the rules without it.
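
The two HCARD stages can be approximated as follows: hierarchical clustering over numeric attributes, then support/confidence counting for simple one-antecedent association rules over the cluster labels. This is a hypothetical single-machine sketch; the CORBA-based Integrator Agent is not modeled.

```python
# Hedged sketch of the two HCARD stages: hierarchical clustering, then simple
# support/confidence association rules over the resulting cluster labels.
import numpy as np
from itertools import permutations
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.4, (30, 3)), rng.normal(3, 0.4, (30, 3))])
clusters = fcluster(linkage(X, method="ward"), t=2, criterion="maxclust")

# Transactions: each row becomes items like "f0=high" plus its cluster id.
transactions = [
    {f"f{j}={'high' if x[j] > X[:, j].mean() else 'low'}" for j in range(3)}
    | {f"cluster={c}"}
    for x, c in zip(X, clusters)
]

items = set().union(*transactions)
for a, b in permutations(items, 2):
    n_a = sum(a in t for t in transactions)
    n_ab = sum(a in t and b in t for t in transactions)
    if n_ab / len(transactions) >= 0.3 and n_ab / n_a >= 0.9:
        print(f"{a} => {b}  (support={n_ab/len(transactions):.2f}, "
              f"confidence={n_ab/n_a:.2f})")
```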

An Evaluation of Applying Knowledge Base to Academic Information Service

  • Lee, Seok-Hyoung;Kim, Hwan-Min;Choe, Ho-Seop
    • International Journal of Knowledge Content Development & Technology / v.3 no.1 / pp.81-95 / 2013
  • Creating the most efficient knowledge base of a field through a series of precise text-handling processes (automatic extraction of information from documents across various fields, recognition of entity names, detection of core topics, analysis of the relations between the extracted information and topics, and automatic inference of new knowledge), and planning how to apply it to information and knowledge management and services, are the core requirements for the intellectualization of information. In this paper, the knowledge base, a core resource and comprehensive technology necessary for the intellectualization of science and technology information, is described, and the usability of academic information services built on it is evaluated. The knowledge base proposed in this article is an amalgamation of information representation and knowledge storage, composed of identification code systems ranging from terms to documents and integrating terminologies, intelligent word networks, topic networks, classification systems, and authority data.
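
At its simplest, the described pipeline (recognize entity names, relate them, store the result under identifier codes) reduces to populating a small triple store. The sketch below is purely illustrative, with toy patterns and identifier schemes that are not those of the proposed knowledge base.

```python
# Purely illustrative pipeline: recognize entity names, link them into
# relation triples, and store the triples under stable identifier codes.
import re

documents = {
    "DOC-001": "BRCA1 interacts with RAD51 in DNA repair.",
}
entity_pattern = re.compile(r"\b[A-Z][A-Z0-9]{2,}\b")  # toy entity-name rule

triples, code_of = [], {}
for doc_id, text in documents.items():
    entities = entity_pattern.findall(text)
    for name in entities:
        code_of.setdefault(name, f"ENT-{len(code_of):04d}")
    for left, right in zip(entities, entities[1:]):
        triples.append((code_of[left], "related_to", code_of[right], doc_id))

print(code_of)   # term -> identifier code
print(triples)   # relations traceable back to their source documents
```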

Natural language processing techniques for bioinformatics

  • Tsujii, Jun-ichi
    • Proceedings of the Korean Society for Bioinformatics Conference / 2003.10a / pp.3-3 / 2003
  • With biomedical literature expanding so rapidly, there is an urgent need to discover and organize the knowledge extracted from texts. Although factual databases contain crucial information, the overwhelming amount of new knowledge remains in textual form (e.g., MEDLINE). In addition, new terms are constantly being coined, as are the relationships linking new genes, drugs, proteins, etc. As the size of the biomedical literature expands, more systems are applying a variety of methods to automate the process of knowledge acquisition and management. In my talk, I focus on our group's project at the University of Tokyo, GENIA, whose objective is to construct an information extraction system for protein-protein interactions from MEDLINE abstracts. The talk covers (1) the techniques we use for named entity recognition: (1-a) SOHMM (self-organized HMM), (1-b) a maximum entropy model, (1-c) a lexicon-based recognizer; (2) the treatment of term variants and acronym finding; (3) event extraction using a full parser; and (4) linguistic resources for text mining (the GENIA corpus): (4-a) semantic tags, (4-b) structural annotations, (4-c) co-reference tags, (4-d) the GENIA ontology. I will also talk about a possible extension of our work that links the findings of molecular biology with clinical findings, and claim that text-based or concept-based biology would be a viable alternative to systems biology, which tends to emphasize the role of simulation models in bioinformatics.
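
Of the listed techniques, the maximum entropy model (1-b) is the easiest to sketch: multinomial logistic regression over simple token features. The example below is a toy stand-in with invented features and data, not the GENIA system.

```python
# Toy maximum-entropy (multinomial logistic regression) tagger for protein
# names, standing in for technique (1-b); features and data are invented.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

tokens = ["p53", "binds", "MDM2", "in", "cells", "IL-2", "activates"]
labels = ["PROT", "O", "PROT", "O", "O", "PROT", "O"]

def feats(tok):
    return {
        "has_digit": any(c.isdigit() for c in tok),
        "has_upper": any(c.isupper() for c in tok),
        "has_hyphen": "-" in tok,
        "suffix2": tok[-2:],
    }

vec = DictVectorizer()
X = vec.fit_transform(feats(t) for t in tokens)
clf = LogisticRegression(max_iter=1000).fit(X, labels)

queries = ["STAT3", "phosphorylates"]
preds = clf.predict(vec.transform([feats(q) for q in queries]))
print(dict(zip(queries, preds)))
```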

Plant Species Identification based on Plant Leaf Using Computer Vision and Machine Learning Techniques

  • Kaur, Surleen;Kaur, Prabhpreet
    • Journal of Multimedia Information System / v.6 no.2 / pp.49-60 / 2019
  • Plants are crucial for life on Earth. A wide variety of plant species exists, and the number increases every year. Species knowledge is a necessity for various groups of society, such as foresters, farmers, environmentalists, and educators, across different work areas, which makes species identification an interdisciplinary interest. It requires expert knowledge, however, and becomes a tedious and challenging task for non-experts who have little or no knowledge of typical botanical terms. Advances in machine learning and computer vision can help make this task comparatively easier. No system yet exists that can identify all plant species, but some efforts have been made, and in this study we also make such an attempt. Plant identification usually involves four steps: image acquisition, pre-processing, feature extraction, and classification. In this study, images from the Swedish leaf dataset were used, which contains 1,125 images of 15 different species. Pre-processing was done using a Gaussian filtering mechanism, and then texture and color features were extracted. Finally, classification was performed using a multiclass support vector machine, which achieved an accuracy of 93.26%, which we aim to enhance further.
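
The four-step pipeline maps directly onto a few library calls. The sketch below uses synthetic images in place of the Swedish leaf dataset and simple color statistics in place of the paper's full texture and color feature set, so it shows only the shape of the approach.

```python
# Hedged sketch of the four-step pipeline on synthetic stand-in images:
# Gaussian pre-processing, simple color features, multiclass SVM.
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(3)

def color_features(img):
    smoothed = gaussian_filter(img, sigma=1.0)        # pre-processing step
    per_channel = smoothed.reshape(-1, 3)
    return np.concatenate([per_channel.mean(0), per_channel.std(0)])

# Synthetic 16x16 RGB "leaves": 3 species with different color profiles.
images = [rng.normal(m, 0.1, (16, 16, 3))
          for m in (0.2, 0.5, 0.8) for _ in range(30)]
y = np.repeat([0, 1, 2], 30)
X = np.array([color_features(im) for im in images])

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=3)
clf = SVC(kernel="rbf").fit(Xtr, ytr)        # multiclass SVM (one-vs-one)
print(f"accuracy: {clf.score(Xte, yte):.2%}")
```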

Development of a Knowledge Discovery System using Hierarchical Self-Organizing Map and Fuzzy Rule Generation

  • Koo, Taehoon;Rhee, Jongtae
    • Proceedings of the Korea Intelligent Information Systems Society Conference / 2001.01a / pp.431-434 / 2001
  • Knowledge discovery in databases (KDD) is the process of extracting valid, novel, potentially useful, and understandable knowledge from real data. There are many academic and industrial activities around its new technologies and application areas. In particular, data mining is the core step in the KDD process, consisting of many algorithms that perform clustering, pattern recognition, and rule induction. The main goals of these algorithms are prediction and description. Prediction means the assessment of unknown variables; description is concerned with providing understandable results in a format compatible with human users. We introduce an efficient data mining algorithm that considers both predictive and descriptive capability. Reasonable patterns are derived from real-world data by a revised neural network model, and a proposed fuzzy rule extraction technique is applied to obtain understandable knowledge. The proposed neural network model is a hierarchical self-organizing system. The rule base is compatible with decision makers' perception because the generated fuzzy rule set reflects the human information process. Results from a real-world application are analyzed to evaluate the system's performance.
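
A minimal flat SOM plus triangular fuzzy memberships conveys the idea; the paper's model is hierarchical, so the sketch below, with invented parameters, is only a simplified analogue.

```python
# Minimal sketch: a tiny 1-D SOM learns prototypes, and each prototype is
# read out as a triangular fuzzy rule. The paper's hierarchical model differs.
import numpy as np

rng = np.random.default_rng(4)
data = np.vstack([rng.normal(0, 0.2, (100, 2)), rng.normal(1, 0.2, (100, 2))])

protos = rng.uniform(0, 1, (4, 2))                 # 4 SOM units
for epoch in range(50):
    lr = 0.5 * (1 - epoch / 50)                    # decaying learning rate
    for x in data:
        w = np.argmin(np.linalg.norm(protos - x, axis=1))  # winner unit
        for u in range(len(protos)):               # neighborhood update
            h = np.exp(-abs(u - w))
            protos[u] += lr * h * (x - protos[u])

spread = 0.3  # assumed width of each triangular membership function
for i, p in enumerate(protos):
    parts = ", ".join(f"x{j} is around {p[j]:.2f} (+/- {spread})"
                      for j in range(2))
    print(f"RULE {i}: IF {parts} THEN class {i}")
```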

A Distance Approach for Open Information Extraction Based on Word Vector

  • Liu, Peiqian;Wang, Xiaojie
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.6 / pp.2470-2491 / 2018
  • Web-scale open information extraction (Open IE) plays an important role in NLP tasks such as acquiring common-sense knowledge, learning selectional preferences, and automatic text understanding. A large number of Open IE approaches have been proposed in the last decade, and the majority of them are based on supervised learning or dependency parsing. In this paper, we present a novel method for web-scale open information extraction that employs cosine distance based on Google word vectors as the confidence score of an extraction. The proposed method is a purely unsupervised learning algorithm that requires no hand-labeled training data or dependency-parse features. We also present a mathematically rigorous proof for the new method using Bayesian inference and artificial neural network theory. It turns out that the proposed algorithm is equivalent to maximum likelihood estimation of the joint probability distribution over the elements of the candidate extraction. The proof itself also theoretically suggests a typical usage of word vectors for other NLP tasks. Experiments show that the distance-based method leads to further improvements over recently presented Open IE systems on three benchmark datasets, in terms of effectiveness and efficiency.
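
The scoring idea, ranking a candidate (arg1, relation, arg2) triple by cosine similarity among word vectors, can be shown with a toy embedding table; the vectors below are invented stand-ins for the pretrained Google vectors the paper uses.

```python
# Toy sketch of cosine-similarity confidence scoring for Open IE candidates;
# the tiny embedding table stands in for real pretrained word vectors.
import numpy as np

emb = {  # invented 4-D vectors; a real system loads pretrained embeddings
    "einstein": np.array([0.9, 0.1, 0.0, 0.2]),
    "developed": np.array([0.8, 0.2, 0.1, 0.3]),
    "relativity": np.array([0.85, 0.15, 0.05, 0.25]),
    "banana": np.array([0.0, 0.9, 0.8, 0.1]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def confidence(arg1, rel, arg2):
    # Average pairwise similarity among the triple's elements.
    pairs = [(arg1, rel), (rel, arg2), (arg1, arg2)]
    return sum(cosine(emb[a], emb[b]) for a, b in pairs) / len(pairs)

print(confidence("einstein", "developed", "relativity"))  # coherent: high
print(confidence("einstein", "developed", "banana"))      # incoherent: lower
```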

Applying Lexical Semantics to Automatic Extraction of Temporal Expressions in Uyghur

  • Murat, Alim;Yusup, Azharjan;Iskandar, Zulkar;Yusup, Azragul;Abaydulla, Yusup
    • Journal of Information Processing Systems / v.14 no.4 / pp.824-836 / 2018
  • The automatic extraction of temporal information from written texts is a key component of question answering and summarization systems, and the performance of those systems depends heavily on whether temporal expressions (TEs) are successfully extracted. In this paper, three different approaches for TE extraction in Uyghur are developed and analyzed. A novel approach that uses lexical semantics as additional information is also presented to extend the classical approaches, which are mainly based on morphology and syntax. We used a manually annotated news dataset labeled with TIMEX3 tags and generated three models with different feature combinations. The experimental results show that the best run achieved a precision of 0.87, a recall of 0.89, and an F1-measure of 0.88 for Uyghur TE extraction. From the analysis of the results, we conclude that applying semantic knowledge resolves ambiguities at shallower levels of language analysis and significantly aids the development of a more efficient Uyghur TE extraction system.
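
The effect of adding lexical semantics on top of morphology-style features can be mimicked by toggling a gazetteer feature in a token classifier. The sketch below uses invented English stand-ins rather than Uyghur data or the paper's actual feature set.

```python
# Hedged sketch: token classification for temporal expressions with and
# without a lexical-semantic gazetteer feature; English toy data stands in
# for the Uyghur dataset.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

TIME_LEXICON = {"yesterday", "tomorrow", "june", "monday"}  # semantic resource

tokens = ["yesterday", "we", "met", "in", "june", "near", "paris", "monday"]
labels = ["TIMEX", "O", "O", "O", "TIMEX", "O", "O", "TIMEX"]

def feats(tok, use_semantics):
    f = {"suffix2": tok[-2:], "is_digit": tok.isdigit()}  # morphology-style
    if use_semantics:
        f["in_time_lexicon"] = tok.lower() in TIME_LEXICON
    return f

for use_sem in (False, True):
    vec = DictVectorizer()
    X = vec.fit_transform(feats(t, use_sem) for t in tokens)
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    # "tomorrow" is unseen in training but present in the lexicon.
    pred = clf.predict(vec.transform([feats("tomorrow", use_sem)]))
    print(f"semantics={use_sem}: tomorrow -> {pred[0]}")
```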