• Title/Summary/Keyword: entity extraction

Search Result 86, Processing Time 0.021 seconds

Competition Relation Extraction based on Combining Machine Learning and Filtering (기계학습 및 필터링 방법을 결합한 경쟁관계 인식)

  • Lee, ChungHee;Seo, YoungHoon;Kim, HyunKi
    • Journal of KIISE
    • /
    • v.42 no.3
    • /
    • pp.367-378
    • /
    • 2015
  • This study was directed at the design of a hybrid algorithm for competition relation extraction. Previous works on relation extraction have relied on various lexical and deep parsing indicators and mostly utilize only the machine learning method. We present a new algorithm integrating machine learning with various filtering methods. Some simple but useful features for competition relation extraction are also introduced, and an optimum feature set is proposed. The goal of this paper was to increase the precision of competition relation extraction by combining supervised learning with various filtering methods. Filtering methods were employed for classifying compete relation occurrence, using distance restriction for the filtering of feature pairs, and classifying whether or not the candidate entity pair is spam. For evaluation, a test set consisting of 2,565 sentences was examined. The proposed method was compared with the rule-based method and general relation extraction method. As a result, the rule-based method achieved positive precision of 0.812 and accuracy of 0.568, while the general relation extraction method achieved 0.612 and 0.563, respectively. The proposed system obtained positive precision of 0.922 and accuracy of 0.713. These results demonstrate that the developed method is effective for competition relation extraction.

Extraction of Pivotal Entities of Construction Project Management using the CMBOK Framework (CMBOK Framework을 이용한 건설 프로젝트 핵심관리요소의 도출)

  • Lee Jong-Kook;Lee Hyun-Soo
    • Korean Journal of Construction Engineering and Management
    • /
    • v.5 no.1 s.17
    • /
    • pp.140-148
    • /
    • 2004
  • Based on the CMBOK (Construction Project Management Body of Knowledge) Framework previously developed in early study by the authors in conjunction with use of some questionnaire surveys and personal interviews with industry professionals, the authors analyze interactions among the entities in the CMBOK framework for the extraction of pivotal entities of construction project management and identify twelve pivotal entities in construction project management, then verify the existence of twelve pivotal entities in real construction project management of construction company and checked the validity of the entities with a real case of interaction phenomenon. This research provides the construction industry with a starting point for improving construction project management efficiency by identifying the pivotal entities.

Natural language processing techniques for bioinformatics

  • Tsujii, Jun-ichi
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2003.10a
    • /
    • pp.3-3
    • /
    • 2003
  • With biomedical literature expanding so rapidly, there is an urgent need to discover and organize knowledge extracted from texts. Although factual databases contain crucial information the overwhelming amount of new knowledge remains in textual form (e.g. MEDLINE). In addition, new terms are constantly coined as the relationships linking new genes, drugs, proteins etc. As the size of biomedical literature is expanding, more systems are applying a variety of methods to automate the process of knowledge acquisition and management. In my talk, I focus on the project, GENIA, of our group at the University of Tokyo, the objective of which is to construct an information extraction system of protein - protein interaction from abstracts of MEDLINE. The talk includes (1) Techniques we use fDr named entity recognition (1-a) SOHMM (Self-organized HMM) (1-b) Maximum Entropy Model (1-c) Lexicon-based Recognizer (2) Treatment of term variants and acronym finders (3) Event extraction using a full parser (4) Linguistic resources for text mining (GENIA corpus) (4-a) Semantic Tags (4-b) Structural Annotations (4-c) Co-reference tags (4-d) GENIA ontology I will also talk about possible extension of our work that links the findings of molecular biology with clinical findings, and claim that textual based or conceptual based biology would be a viable alternative to system biology that tends to emphasize the role of simulation models in bioinformatics.

  • PDF

Conceptual Graph Matching Method for Reading Comprehension Tests

  • Zhang, Zhi-Chang;Zhang, Yu;Liu, Ting;Li, Sheng
    • Journal of information and communication convergence engineering
    • /
    • v.7 no.4
    • /
    • pp.419-430
    • /
    • 2009
  • Reading comprehension (RC) systems are to understand a given text and return answers in response to questions about the text. Many previous studies extract sentences that are the most similar to questions as answers. However, texts for RC tests are generally short and facts about an event or entity are often expressed in multiple sentences. The answers for some questions might be indirectly presented in the sentences having few overlapping words with the questions. This paper proposes a conceptual graph matching method towards RC tests to extract answer strings. The method first represents the text and questions as conceptual graphs, and then extracts subgraphs for every candidate answer concept from the text graph. All candidate answer concepts will be scored and ranked according to the matching similarity between their sub-graphs and question graph. The top one will be returned as answer seed to form a concise answer string. Since the sub-graphs for candidate answer concepts are not restricted to only covering a single sentence, our approach improved the performance of answer extraction on the Remedia test data.

Registration of Aerial Image with Lines using RANSAC Algorithm

  • Ahn, Y.;Shin, S.;Schenk, T.;Cho, W.
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.25 no.6_1
    • /
    • pp.529-536
    • /
    • 2007
  • Registration between image and object space is a fundamental step in photogrammetry and computer vision. Along with rapid development of sensors - multi/hyper spectral sensor, laser scanning sensor, radar sensor etc., the needs for registration between different sensors are ever increasing. There are two important considerations on different sensor registration. They are sensor invariant feature extraction and correspondence between them. Since point to point correspondence does not exist in image and laser scanning data, it is necessary to have higher entities for extraction and correspondence. This leads to modify first, existing mathematical and geometrical model which was suitable for point measurement to line measurements, second, matching scheme. In this research, linear feature is selected for sensor invariant features and matching entity. Linear features are incorporated into mathematical equation in the form of extended collinearity equation for registration problem known as photo resection which calculates exterior orientation parameters. The other emphasis is on the scheme of finding matched entities in the aide of RANSAC (RANdom SAmple Consensus) in the absence of correspondences. To relieve computational load which is a common problem in sampling theorem, deterministic sampling technique and selecting 4 line features from 4 sectors are applied.

COVID-19 recommender system based on an annotated multilingual corpus

  • Barros, Marcia;Ruas, Pedro;Sousa, Diana;Bangash, Ali Haider;Couto, Francisco M.
    • Genomics & Informatics
    • /
    • v.19 no.3
    • /
    • pp.24.1-24.7
    • /
    • 2021
  • Tracking the most recent advances in Coronavirus disease 2019 (COVID-19)-related research is essential, given the disease's novelty and its impact on society. However, with the publication pace speeding up, researchers and clinicians require automatic approaches to keep up with the incoming information regarding this disease. A solution to this problem requires the development of text mining pipelines; the efficiency of which strongly depends on the availability of curated corpora. However, there is a lack of COVID-19-related corpora, even more, if considering other languages besides English. This project's main contribution was the annotation of a multilingual parallel corpus and the generation of a recommendation dataset (EN-PT and EN-ES) regarding relevant entities, their relations, and recommendation, providing this resource to the community to improve the text mining research on COVID-19-related literature. This work was developed during the 7th Biomedical Linked Annotation Hackathon (BLAH7).

Building a Business Knowledge Base by a Supervised Learning and Rule-Based Method

  • Shin, Sungho;Jung, Hanmin;Yi, Mun Yong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.9 no.1
    • /
    • pp.407-420
    • /
    • 2015
  • Natural Language Question Answering (NLQA) and Prescriptive Analytics (PA) have been identified as innovative, emerging technologies in 2015 by the Gartner group. These technologies require knowledge bases that consist of data that has been extracted from unstructured texts. Every business requires a knowledge base for business analytics as it can enhance companies' competitiveness in their industry. Most intelligent or analytic services depend a lot upon on knowledge bases. However, building a qualified knowledge base is very time consuming and requires a considerable amount of effort, especially if it is to be manually created. Another problem that occurs when creating a knowledge base is that it will be outdated by the time it is completed and will require constant updating even when it is ready in use. For these reason, it is more advisable to create a computerized knowledge base. This research focuses on building a computerized knowledge base for business using a supervised learning and rule-based method. The method proposed in this paper is based on information extraction, but it has been specialized and modified to extract information related only to a business. The business knowledge base created by our system can also be used for advanced functions such as presenting the hierarchy of technologies and products, and the relations between technologies and products. Using our method, these relations can be expanded and customized according to business requirements.

Linking Korean Predicates to Knowledge Base Properties (한국어 서술어와 지식베이스 프로퍼티 연결)

  • Won, Yousung;Woo, Jongseong;Kim, Jiseong;Hahm, YoungGyun;Choi, Key-Sun
    • Journal of KIISE
    • /
    • v.42 no.12
    • /
    • pp.1568-1574
    • /
    • 2015
  • Relation extraction plays a role in for the process of transforming a sentence into a form of knowledge base. In this paper, we focus on predicates in a sentence and aim to identify the relevant knowledge base properties required to elucidate the relationship between entities, which enables a computer to understand the meaning of a sentence more clearly. Distant Supervision is a well-known approach for relation extraction, and it performs lexicalization tasks for knowledge base properties by generating a large amount of labeled data automatically. In other words, the predicate in a sentence will be linked or mapped to the possible properties which are defined by some ontologies in the knowledge base. This lexical and ontological linking of information provides us with a way of generating structured information and a basis for enrichment of the knowledge base.

Extraction of Characteristic Information for Image Matching (영상매칭을 위한 특성정보 추출)

  • 이동천;염재홍;김정우;이용욱
    • Proceedings of the Korean Society of Surveying, Geodesy, Photogrammetry, and Cartography Conference
    • /
    • 2004.04a
    • /
    • pp.171-176
    • /
    • 2004
  • Image matching is fundamental process in photogrammetry and computer vision to identify and to measure corresponding features on the multiple images. Uniqueness of the matching entities and robustness of the algorithm are the key issues that have influence on quality of the matching result. The optimal solution could be obtained by utilizing appropriate matching entities in the first place. In this study, candidate matching points were extracted by interest operator, and an area-based matching method was applied with characteristics of the gray value distribution as the matching entities. The characteristic information is based on the concept of "intrinsic image" (or parameter image). The information was utilized as additional and/or complementary matching entities. Matching on interest points with the characteristic information resulted in high quality of matching because matching windows were created with surrounding pixels of the interest points that contain distinct and unique features. The experiment shows that matching quality and reliability increase by exploiting interest operator, and the characteristic information has potential to be matching entity.

  • PDF

Syllables-based Named Entity Extraction and Automatic Corpus Construction using Bidirectional Dynamic LSTM (Bidirectional Dynamic LSTM을 이용한 음절 단위 개체명 추출 및 자동화된 말뭉치 구축)

  • Oh, Sungsik;Lim, Changdae;Ahn, Keeho;Park, Weijin
    • 한국어정보학회:학술대회논문집
    • /
    • 2017.10a
    • /
    • pp.317-320
    • /
    • 2017
  • 개체명 인식은 자연어 문장에서 장소, 제작물, 사람 등 분류를 통한 의미 부여가 가능한 단어를 파악하는 기술로서 의미 분석을 위한 핵심 기술이다. 현재 많은 개체명 분석 관련 연구들은 형태소 분석 결과에 의존적인 형태를 갖고 있어서, 형태소 분석 결과의 정확성이 개체명 분석 결과의 성능에 영향을 미치고 있다. 본 연구에서는 형태소 분석 과정을 거치지 않는 음절 기반의 개체명 분석 기술을 제안하여 형태소 분석의 정확도가 낮은 통신어, 신조어 분석 성능을 향상하였다. 또한, 자동화된 방법으로 음절 단위 개체명 말뭉치 및 개체명 사전을 구축하는 프로세스를 정의하여 개체명 분석의 정확도 향상 및 인지 범주의 확대를 도모하였다. 본 연구에서 제안한 개체명 인식 기술은 한국어 개체명 표준에 기반한 129가지의 개체명 분류가 가능하며, 이는 자연어 처리 기술이 필요한 산업계에서 상용화하는데 큰 기여를 할 것으로 판단된다.

  • PDF