Building Domain Ontology through Concept and Relation Classification

개념 및 관계 분류를 통한 분야 온톨로지 구축

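  • 황금하 (Division of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST))
  • 신지애 (Department of Computer Science, Information and Communications University)
  • 최기선 (Division of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST))
  • Published: 2008.09.15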

Abstract

To build a domain ontology, this paper proposes a methodology that first builds a core ontology and then enriches it with the concepts and relations of a domain thesaurus. First, the top-level concept taxonomy of the core ontology is built from a domain dictionary and a general-domain thesaurus. Then, the concepts of the domain thesaurus are classified into the top-level concepts of the core ontology, and the relations between broader terms (BT) and narrower terms (NT), and between broader terms and related terms (RT), are classified into the semantic relations defined for the core ontology. Concepts are classified with a two-step approach in which a frequency-based method is complemented by a similarity-based method. Relations are classified with two techniques. (i) When training data is insufficient, a rule-based module separates isa relations from non-isa ones, and a pattern-based approach then classifies the non-isa relations into non-taxonomic semantic relations. (ii) When training data is sufficient, a maximum-entropy model is used for feature-based classification, with a k-NN approach filtering noise out of the training data. A series of experiments shows that the performance of the proposed systems is promising and comparable to the judgments of human experts.
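
The two-step concept classification described above can be pictured with a minimal sketch. The abstract does not specify the features, thresholds, or similarity measure, so everything here is an assumption made for illustration: the word-to-concept lexicon `word_to_concept`, the per-concept word profiles `concept_profiles`, the `min_share` threshold, the cosine similarity, and the concept names `Device` and `Method` are all hypothetical.

```python
from collections import Counter
from math import sqrt


def frequency_vote(term_words, word_to_concept, min_share=0.6):
    """Step 1 (assumed form): majority vote over the concepts of the term's words.

    Returns the winning top-level concept, or None when no word is known
    or the winner's share of the votes is below min_share.
    """
    votes = Counter(word_to_concept[w] for w in term_words if w in word_to_concept)
    if not votes:
        return None
    concept, count = votes.most_common(1)[0]
    return concept if count / sum(votes.values()) >= min_share else None


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify_concept(term_words, word_to_concept, concept_profiles, min_share=0.6):
    """Try the frequency step first; fall back to similarity when it abstains."""
    concept = frequency_vote(term_words, word_to_concept, min_share)
    if concept is not None:
        return concept, "frequency"
    term_vec = Counter(term_words)
    best = max(concept_profiles, key=lambda c: cosine(term_vec, concept_profiles[c]))
    return best, "similarity"


# Hypothetical toy resources, not taken from the paper.
word_to_concept = {"server": "Device", "router": "Device", "protocol": "Method"}
concept_profiles = {
    "Device": Counter({"network": 2, "hardware": 3}),
    "Method": Counter({"network": 1, "algorithm": 3, "routing": 2}),
}

first = classify_concept(["mail", "server"], word_to_concept, concept_profiles)
second = classify_concept(["routing", "algorithm"], word_to_concept, concept_profiles)
```

In this toy run the first term is settled by the frequency step, while the second contains no word from the lexicon and is handed to the similarity step, which is the sense in which one step complements the other.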

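This paper proposes a method for building a domain ontology by first constructing a domain core ontology and then extending it with the concepts and relations of a domain thesaurus. To this end, a top-level concept taxonomy for the domain is first built from a general-domain thesaurus and a domain dictionary. Next, the concepts of the domain thesaurus are classified into the top-level concepts of the domain core ontology, and the relations between broader terms (BT) and narrower terms (NT), and between broader terms and related terms (RT), are classified into the semantic relations defined in the domain core ontology. Concept classification proceeds in two steps: a frequency-based method is applied in the first step and a similarity-based method in the second. For relation classification, two methods are applied. (i) When training data is scarce, a rule-based method classifies BT-NT/RT relations into isa and other (non-isa) relations, and a pattern-based method then classifies the non-isa relations into semantic relations for the ontology. (ii) When sufficient training data is available, a feature-based classification technique using a maximum entropy model (MEM) is applied, with the training data cleaned by the k-Nearest Neighbors (k-NN) method. A system was built with the proposed methods, and in experiments its performance was comparable to human judgment.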
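
The k-NN cleaning of training data mentioned above could look roughly like the following sketch. The abstract gives no detail on the distance measure, the value of k, or the feature set, so this is only one plausible reading: an example is kept when its label agrees with the majority label of its k nearest neighbours. The Jaccard similarity, `k=3`, the feature strings, and the relation labels `part_of` and `uses` are all hypothetical; the cleaned data would then be passed to a maximum-entropy classifier, which is not shown.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_filter(examples, k=3):
    """Keep an example only if its label matches the majority of its k nearest neighbours.

    examples: list of (feature_set, label) pairs.
    """
    kept = []
    for i, (feats, label) in enumerate(examples):
        others = [ex for j, ex in enumerate(examples) if j != i]
        if not others:
            kept.append((feats, label))  # nothing to compare against
            continue
        # sorted() is stable, so ties keep the original example order.
        neighbours = sorted(others, key=lambda ex: jaccard(feats, ex[0]), reverse=True)[:k]
        majority = Counter(lbl for _, lbl in neighbours).most_common(1)[0][0]
        if majority == label:
            kept.append((feats, label))
    return kept


# Hypothetical toy training set; the fourth example carries a noisy label.
examples = [
    (frozenset({"same_head", "bt_is_suffix"}), "isa"),
    (frozenset({"same_head", "bt_is_suffix", "plural"}), "isa"),
    (frozenset({"same_head", "bt_is_suffix", "compound"}), "isa"),
    (frozenset({"same_head", "bt_is_suffix"}), "part_of"),
    (frozenset({"rt_pair", "verb_use"}), "uses"),
    (frozenset({"rt_pair", "verb_use", "acronym"}), "uses"),
    (frozenset({"rt_pair", "verb_use", "numeric"}), "uses"),
]

cleaned = knn_filter(examples, k=3)
```

Here the `part_of` example shares all its features with examples labelled `isa`, so its neighbours outvote it and it is removed before training.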
