An Analysis of the Applications of the Language Models for Information Retrieval

정보검색에서의 언어모델 적용에 관한 분석

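  • 김희섭 (Department of Library and Information Science, Kyungpook National University)
  • 정영미 (Department of Library and Information Science, Kyungpook National University)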
  • Published : 2005.06.01

Abstract

This study examines research trends and experimental results on the application of language models to information retrieval. We reviewed previous studies in two categories: (1) first-generation language modeling information retrieval (LMIR) experiments, which mainly compare the retrieval performance of language modeling approaches with that of traditional retrieval models, and (2) second-generation LMIR experiments, which compare the retrieval performance of expanded language models with that of basic language models in order to identify better expansion techniques. Our analysis of the previous experimental results found that (1) language models outperformed the probabilistic and vector space model approaches, and (2) the expanded language models achieved better retrieval performance than the basic language models.
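
The abstract does not spell out what a "basic" language model computes. As a rough illustration only, and not the method of any study reviewed here, the sketch below ranks documents by query likelihood under a unigram model with linear interpolation smoothing. The function name `query_log_likelihood`, the mixing weight `lam`, and the toy documents are all assumptions made for this example.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection, lam=0.5):
    """Return log P(query | doc) for a smoothed unigram model.

    Each term probability mixes the document estimate with the
    collection estimate; `lam` is a hypothetical weight on the collection.
    """
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    doc_len = len(doc)
    coll_len = len(collection)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        p_coll = coll_tf[term] / coll_len if coll_len else 0.0
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # Term unseen in the whole collection: the query has zero likelihood.
            return float("-inf")
        score += math.log(p)
    return score


# Toy collection, invented for illustration.
docs = {
    "d1": "language models for information retrieval".split(),
    "d2": "vector space models for document ranking".split(),
}
collection = [term for tokens in docs.values() for term in tokens]
query = "language retrieval".split()

# Rank documents from most to least likely to have generated the query.
ranking = sorted(
    docs,
    key=lambda name: query_log_likelihood(query, docs[name], collection),
    reverse=True,
)
print(ranking)
```

Smoothing matters here because `d2` contains neither query term; without the collection estimate its likelihood would be zero rather than merely lower than that of `d1`.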
