A Study on Ontology and Topic Modeling-based Multi-dimensional Knowledge Map Services

온톨로지와 토픽모델링 기반 다차원 연계 지식맵 서비스 연구

  • Jeong, Hanjo (정한조) (NTIS Center, Division of Advanced Information Convergence, Korea Institute of Science and Technology Information (KISTI))
  • Received : 2015.11.25
  • Accepted : 2015.12.12
  • Published : 2015.12.30

Abstract

Knowledge maps are widely used to represent knowledge in many domains. This paper presents a method for integrating national R&D data and helping users navigate the integrated data through a knowledge map service. The knowledge map service is built using a lightweight ontology and a topic modeling method. The national R&D data is integrated with the research project at its center; that is, other R&D data such as research papers, patents, and reports are connected to the research project as its outputs. The lightweight ontology represents simple relationships among the integrated data, such as project-output, document-author, and document-topic relationships. The knowledge map lets us infer further relationships, such as co-author and co-topic relationships. To extract the relationships among the integrated data, a Relational Data-to-Triples transformer is implemented, and a topic modeling approach is introduced to extract the document-topic relationships. A triple store is used to manage and process the ontology data while preserving the network characteristics of the knowledge map service. Knowledge maps can be divided into two types: one is used in knowledge management to store, manage, and process an organization's data as knowledge; the other is used to analyze and represent knowledge extracted from science and technology documents. This research focuses on the latter. A knowledge map service is introduced for integrating the national R&D data obtained from the National Digital Science Library (NDSL) and the National Science & Technology Information Service (NTIS), the two major repositories and services for national R&D data in Korea. A lightweight ontology is used to design and build the knowledge map.
Using the lightweight ontology lets us represent and process knowledge as a simple network, which suits the navigation and visualization characteristics of the knowledge map. The lightweight ontology represents the entities and their relationships in the knowledge maps, and an ontology repository is created to store and process the ontology. In the ontology, researchers are implicitly connected through the national R&D data by author relationships and performer relationships. A knowledge map displaying the researchers' network is created from the co-authoring relationships of national R&D documents and the co-participation relationships of national R&D projects. To sum up, a knowledge map service system based on topic modeling and ontology is introduced for processing knowledge about national R&D data such as research projects, papers, patents, project reports, and Global Trends Briefing (GTB) data. The system has three goals: 1) to integrate the national R&D data obtained from NDSL and NTIS, 2) to provide semantic and topic-based information search over the integrated data, and 3) to provide knowledge map services based on semantic analysis and knowledge processing. S&T information such as research papers, research reports, patents, and GTB data is updated daily from NDSL, and R&D project information, including participants and outputs, is updated from NTIS. The S&T information and the national R&D information are obtained and merged into an integrated database. The knowledge base is constructed by transforming the relational data into triples that reference the R&D ontology. In addition, a topic modeling method is employed to extract the relationships between S&T documents and the topic keywords that represent them.
The topic modeling approach lets us extract relationships and topic keywords based on semantics rather than on simple keywords. Lastly, we present an experiment on constructing the integrated knowledge base using the lightweight ontology and topic modeling, and introduce the knowledge map services built on that knowledge base.
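
The relational-data-to-triples step and the co-author inference described above can be illustrated with a minimal Python sketch. It is not the paper's transformer: the table layout, the identifiers, and the predicate names `hasOutput`, `hasAuthor`, and `coAuthorOf` are all assumptions made for illustration, and plain tuples stand in for a triple store.

```python
from itertools import combinations

# Hypothetical relational rows; identifiers and table layout are illustrative only.
project_outputs = [
    ("project:P1", "paper:D1"),
    ("project:P1", "patent:D2"),
]
document_authors = [
    ("paper:D1", "person:A"),
    ("paper:D1", "person:B"),
    ("patent:D2", "person:B"),
    ("patent:D2", "person:C"),
]


def rows_to_triples(rows, predicate):
    """Turn (subject, object) rows into (subject, predicate, object) triples."""
    return [(s, predicate, o) for s, o in rows]


def infer_coauthors(triples):
    """Derive co-author pairs from document-author triples.

    Each unordered pair is emitted once, with the two names in sorted order.
    """
    authors_by_doc = {}
    for s, p, o in triples:
        if p == "hasAuthor":  # assumed predicate name
            authors_by_doc.setdefault(s, set()).add(o)
    pairs = set()
    for authors in authors_by_doc.values():
        for a, b in combinations(sorted(authors), 2):
            pairs.add((a, "coAuthorOf", b))
    return pairs


triples = (
    rows_to_triples(project_outputs, "hasOutput")
    + rows_to_triples(document_authors, "hasAuthor")
)
coauthors = infer_coauthors(triples)
```

The same grouping idea applies to co-participation in projects: group participants by project instead of authors by document.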

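Discovering and exploring future core-value technologies requires linking and converging national R&D information and science and technology (S&T) information at a nationwide scale. This paper introduces a methodology for building a knowledge base by linking and converging national R&D information and S&T information using an ontology and topic modeling, and presents a multi-dimensional linked knowledge map service built on it. National R&D information includes national R&D projects and their participants, output information for those projects, and information on papers, patents, and research reports. S&T information refers to technical documents on S&T research, such as papers, patents, and trend reports. To make knowledge processing and management in the knowledge base more efficient, this paper uses a lightweight ontology. The lightweight ontology converges national R&D information and S&T information by connecting national R&D project participants, output information, and S&T information through simple relationships such as project-output, document-author, and author-affiliation relationships. Using only these simple relationships improves the efficiency of knowledge processing and automates the ontology construction process. Topic modeling is used to build the ontology at a more specific, concept level: it extracts topic keywords from the national R&D and S&T documents and extracts the relationships among the documents. Building a typical fully-specified ontology at the concept level must be done almost entirely by hand, which consumes a great deal of time and cost. Instead of such manual construction, this study uses topic modeling for automated ontology construction, automatically extracting the relationships between documents and topic keywords and the semantic relationships among documents that ontology construction requires. Finally, using the triple information of the knowledge base built in this way, the paper introduces knowledge map services that visualize multi-dimensional relationships among researchers, topic keywords, institutions, journals, and so on, such as co-author relationships between researchers and shared-topic-keyword relationships between documents, as radial networks.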
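
As a rough illustration of how document-topic relationships can be turned into document-document (co-topic) links, the Python sketch below starts from per-document topic weights of the kind a topic model such as LDA produces. The weights, the identifiers, the `hasTopic` predicate, and the 0.25 threshold are all hypothetical; the abstract does not specify how the system selects topic keywords.

```python
from itertools import combinations

# Hypothetical per-document topic weights (made-up values for illustration).
doc_topics = {
    "paper:D1": {"topic:ontology": 0.7, "topic:lda": 0.3},
    "patent:D2": {"topic:ontology": 0.6, "topic:sensors": 0.4},
    "report:D3": {"topic:sensors": 0.9, "topic:lda": 0.1},
}


def document_topic_triples(doc_topics, threshold=0.25):
    """Keep a (document, hasTopic, topic) triple when the weight meets the threshold."""
    triples = []
    for doc, weights in sorted(doc_topics.items()):
        for topic, weight in sorted(weights.items()):
            if weight >= threshold:
                triples.append((doc, "hasTopic", topic))
    return triples


def co_topic_pairs(triples):
    """Link every pair of documents that share a topic; returns (doc_a, doc_b, topic)."""
    docs_by_topic = {}
    for doc, _, topic in triples:
        docs_by_topic.setdefault(topic, set()).add(doc)
    pairs = set()
    for topic, docs in docs_by_topic.items():
        for a, b in combinations(sorted(docs), 2):
            pairs.add((a, b, topic))
    return pairs


topic_triples = document_topic_triples(doc_topics)
shared = co_topic_pairs(topic_triples)
```

In this toy input, `report:D3` is not linked to `paper:D1` because its weight for `topic:lda` falls below the assumed threshold; lowering the threshold would add that link.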
