Building Concept Networks using a Wikipedia-based 3-dimensional Text Representation Model

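  • 홍기주 (School of Electrical and Computer Engineering, University of Seoul)
  • 김한준 (School of Electrical and Computer Engineering, University of Seoul)
  • 이승연 (School of Electrical and Computer Engineering, University of Seoul)
  • Received: 2015.03.18
  • Accepted: 2015.06.10
  • Published: 2015.09.15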

Abstract

A concept network is an essential knowledge base for improving semantic search, personalized search, recommendation, and text mining techniques. Recent studies have sought to build more effective concept networks by extending concept representations with external ontologies. We therefore propose a method for building concept networks on top of a 3-dimensional text representation model, using Wikipedia, widely regarded as a source of world knowledge, as the source of the concept set. Because relationships among concepts change over time, the 'concepts' derived from text documents should be defined within the theoretical framework of Formal Concept Analysis. To this end, we represent each concept as a 2-dimensional term-by-document matrix, which allows the concept network latent in a document collection to be extracted more accurately.
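The abstract does not say how relatedness between two concept matrices is measured. As a minimal sketch, assuming the 3-dimensional model stacks one term-by-document matrix per concept into a concept × term × document array, and assuming cosine similarity over matrix entries (the Frobenius inner product divided by the Frobenius norms) as the relatedness measure, a concept network can be built by linking every concept pair whose similarity reaches a threshold. The function names, the 0.3 threshold, and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def concept_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two term-by-document matrices.

    Computed as the Frobenius inner product divided by the product of the
    Frobenius norms. This measure is an assumption for illustration only.
    """
    num = float(np.sum(a * b))
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm
    den = float(np.linalg.norm(a) * np.linalg.norm(b))
    return num / den if den > 0 else 0.0


def build_concept_network(tensor: np.ndarray, names: list[str], threshold: float = 0.3):
    """Return weighted edges (concept_a, concept_b, similarity).

    tensor has shape (n_concepts, n_terms, n_docs): slice i is the
    term-by-document matrix of concept i (hypothetical layout).
    """
    edges = []
    n = tensor.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            s = concept_similarity(tensor[i], tensor[j])
            if s >= threshold:  # keep only sufficiently related concept pairs
                edges.append((names[i], names[j], round(s, 3)))
    return edges


# Invented toy data: terms = query, index, user, star; documents = d0, d1, d2.
names = ["search engine", "recommender", "astronomy"]
tensor = np.array([
    [[2, 1, 0], [1, 1, 0], [0, 1, 0], [0, 0, 0]],  # search engine
    [[1, 0, 0], [0, 0, 0], [1, 2, 0], [0, 0, 0]],  # recommender
    [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 3]],  # astronomy
], dtype=float)

print(build_concept_network(tensor, names))
# [('search engine', 'recommender', 0.577)]
```

Cosine similarity is used here only because it is a common, scale-invariant choice for comparing term-by-document data; the paper's own relatedness measure and any Wikipedia-derived weighting of the matrices are not described in the abstract.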

Acknowledgement

Supported by: National Research Foundation of Korea
