Domain Centered Query Expansion Technique using Topic Model

  • 이상훈 (Department of Neurology, Emory University);
  • 문승진 (Department of Computer Science, The University of Suwon)
  • Received: 2016.06.02
  • Reviewed: 2016.09.26
  • Published: 2017.11.15

Abstract

In information retrieval, query expansion is a well-known technique that adds external knowledge to a user's query in order to improve the performance of a search tool. However, ambiguous words in the query lower retrieval performance, and this problem remains open. In this paper, we address it by using domain knowledge to identify the intended meaning of the words in a query. In particular, we propose a domain-centered query expansion technique that expands a query using a domain-centered model built with a topic model. In experiments comparing the proposed method with existing query expansion models, the proposed method performed better than the other models.
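The general idea of expanding an ambiguous query through its most likely domain can be illustrated with a minimal sketch. This is not the model proposed in the paper: the topic-word probabilities, the domain names, the scoring rule (summed term probability), and the function names `score_domains` and `expand_query` are all hypothetical, chosen only to show how a domain inferred from the query terms could steer which expansion terms are added.

```python
# Hypothetical P(word | domain) values. In practice these would be estimated
# by a topic model (for example LDA) from a corpus, not written by hand.
TOPIC_WORDS = {
    "finance": {"bank": 0.30, "loan": 0.25, "interest": 0.20,
                "account": 0.15, "money": 0.10},
    "geography": {"bank": 0.20, "river": 0.30, "water": 0.20,
                  "shore": 0.20, "flood": 0.10},
}


def score_domains(query_terms, topic_words):
    """Score each domain by the summed probability of the query terms in it."""
    return {
        domain: sum(words.get(term, 0.0) for term in query_terms)
        for domain, words in topic_words.items()
    }


def expand_query(query_terms, topic_words, n_extra=2):
    """Append the n_extra most probable words of the best-scoring domain.

    Returns (expanded_terms, chosen_domain). If no query term occurs in any
    domain, the query is returned unchanged with a domain of None.
    """
    scores = score_domains(query_terms, topic_words)
    best = max(scores, key=scores.get)
    if scores[best] == 0.0:
        return list(query_terms), None
    words = topic_words[best]
    # Rank candidates by descending probability; break ties alphabetically
    # so the result is deterministic.
    candidates = sorted(
        (w for w in words if w not in query_terms),
        key=lambda w: (-words[w], w),
    )
    return list(query_terms) + candidates[:n_extra], best


if __name__ == "__main__":
    # "bank" alone is ambiguous; the second term decides the domain.
    print(expand_query(["bank", "river"], TOPIC_WORDS))
    print(expand_query(["bank", "loan"], TOPIC_WORDS))
```

In this toy setting the ambiguous term "bank" is expanded with river-related words when it co-occurs with "river" and with finance-related words when it co-occurs with "loan", which is the disambiguating effect that domain information is meant to provide.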
