A Method for Extracting Relationships Between Terms Using Pattern-Based Technique

  • Young-Tae Kim (Corporate R&D Center, KNC Co., Ltd.)
  • Chi-Su Kim (Department of Computer Engineering, Kongju National University)
  • Received : 2018.02.26
  • Accepted : 2018.06.05
  • Published : 2018.08.31

Abstract

With the recent growth in the complexity, variety, and sheer volume of available information, interest in and need for ontologies have been rising as a means of extracting meaningful search results from massive data. Although many methods have been proposed for extracting an ontology from natural-language text, most current methods do not produce extractions that are consistent with the structure of the ontology. In this paper, we propose a method for automatically building an ontology: it identifies, in text from a specific domain, the terms needed to construct the ontology and then extracts various relationships between those terms using a pattern-based technique. To extract these relationships, we also propose a way of reducing the size of the search space that takes the matching set of each pattern into account and combines a join-set concept with a pattern array (an ordering of the patterns). As a result, the method reduces the size of the search space by 50-95% without removing any useful patterns from it.
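
The abstract does not list the patterns the authors use. As a rough illustration of what a pattern-based relation extractor does, the Python sketch below applies two classic Hearst-style lexico-syntactic patterns (Hearst, 1992), "X such as A, B and C" and "A, B and other X", to raw text and emits IS-A (hyponym, hypernym) pairs. The pattern list, the function names `split_terms` and `extract_is_a`, and the simplification that every term is a single word are all hypothetical; the paper's own patterns and term-identification step may differ.

```python
import re
from typing import List, Set, Tuple

# Hypothetical Hearst-style patterns (not taken from the paper).
# Each entry: (compiled regex, True if the hypernym comes first).
PATTERNS = [
    # "X such as A, B and C"  ->  A, B, C are hyponyms of X
    (re.compile(r"(\w+),? such as (\w+(?:, \w+)*(?:,? (?:and|or) \w+)?)"), True),
    # "A, B and other X"  ->  A, B are hyponyms of X
    (re.compile(r"(\w+(?:, \w+)*) (?:and|or) other (\w+)"), False),
]

# Separator between coordinated list items: ", and" / " and" / " or" / ","
_LIST_SEP = re.compile(r",?\s+(?:and|or)\s+|,\s*")


def split_terms(span: str) -> List[str]:
    """Split a coordinated list such as 'a, b and c' into ['a', 'b', 'c']."""
    return [t for t in _LIST_SEP.split(span) if t]


def extract_is_a(text: str) -> Set[Tuple[str, str]]:
    """Return the (hyponym, hypernym) pairs matched by PATTERNS in `text`."""
    pairs: Set[Tuple[str, str]] = set()
    lowered = text.lower()
    for regex, hypernym_first in PATTERNS:
        for m in regex.finditer(lowered):
            if hypernym_first:
                hypernym, hyponyms = m.group(1), m.group(2)
            else:
                hyponyms, hypernym = m.group(1), m.group(2)
            for term in split_terms(hyponyms):
                pairs.add((term, hypernym))
    return pairs


if __name__ == "__main__":
    corpus = ("Diseases such as influenza, measles and tuberculosis spread quickly. "
              "Patients took aspirin, ibuprofen and other drugs.")
    for hypo, hyper in sorted(extract_is_a(corpus)):
        print(f"{hypo} IS-A {hyper}")
```

Regular expressions keep the sketch self-contained; a realistic extractor would match patterns over part-of-speech tags or noun-phrase chunks so that multi-word terms such as "tuberculosis vaccine" are captured.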

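
The abstract attributes its search-space reduction to using each pattern's matching set together with a join-set concept and a pattern array, but does not define either. One plausible reading, offered here purely as an assumption, is Apriori-style level-wise growth: two length-k token patterns that overlap on k-1 tokens are joined into a length-(k+1) candidate, and because every sentence containing the candidate also contains both parents, the intersection of the parents' matching sets bounds the candidate's matches from above. Candidates whose bound is empty (or below a minimum support) are discarded without scanning any sentence, so no pattern that actually occurs is lost. The names `grow_patterns`, `occurs`, and `min_support` are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, ...]


def occurs(pattern: Pattern, tokens: List[str]) -> bool:
    """True if `pattern` appears in `tokens` as a contiguous run."""
    n = len(pattern)
    return any(tuple(tokens[i:i + n]) == pattern
               for i in range(len(tokens) - n + 1))


def grow_patterns(sentences: List[List[str]], max_len: int, min_support: int = 1):
    """Level-wise growth of token patterns with matching-set pruning.

    Returns (pattern -> matching sentence ids, pairs joined, candidates scanned).
    """
    # Level 1: the matching set of every single token.
    level: Dict[Pattern, Set[int]] = {}
    for sid, toks in enumerate(sentences):
        for tok in toks:
            level.setdefault((tok,), set()).add(sid)
    level = {p: s for p, s in level.items() if len(s) >= min_support}

    found = dict(level)
    joined = scanned = 0
    for _ in range(2, max_len + 1):
        nxt: Dict[Pattern, Set[int]] = {}
        for p, q in product(level, repeat=2):
            # Join two length-k patterns that overlap on k-1 tokens.
            if p[1:] != q[:-1]:
                continue
            joined += 1
            # Any sentence matching p + q[-1] must match both p and q,
            # so the intersection is an upper bound on its matching set.
            bound = level[p] & level[q]
            if len(bound) < min_support:
                continue  # pruned without scanning any sentence
            cand = p + q[-1:]
            scanned += 1
            matches = {sid for sid in bound if occurs(cand, sentences[sid])}
            if len(matches) >= min_support:
                nxt[cand] = matches
        if not nxt:
            break
        found.update(nxt)
        level = nxt
    return found, joined, scanned


if __name__ == "__main__":
    corpus = [s.split() for s in ["a b", "c d"]]
    found, joined, scanned = grow_patterns(corpus, max_len=3)
    print(joined, scanned)  # 16 8
```

On the toy corpus, 16 length-2 joins are generated but only 8 are scanned, because the other 8 pair tokens that never share a sentence. The size of the reduction on real text depends entirely on the corpus; this toy run does not reproduce the paper's 50-95% figure.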
