An Experimental Study on Selecting Association Terms Using Text Mining Techniques

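  • Kim, Su-Yeon (Graduate School, Department of Library and Information Science, Yonsei University) ;
  • Chung, Young-Mee (Department of Library and Information Science, Yonsei University)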
  • Published : 2006.09.29

Abstract

This study ran experiments to find the best method for selecting association terms, that is, additional terms related to an initial query term, from an entire document collection. Two families of techniques were compared: association rule mining and term clustering. For association rule mining, association term sets were generated with the support, confidence, and lift measures of the Apriori algorithm. For term clustering, they were generated with five association measures: the GSS coefficient, the Jaccard coefficient, the cosine coefficient, Sokal & Sneath 5, and mutual information. Performance was evaluated by the precision of the association terms and by the overlap ratio between the association terms and the indexing terms of relevant documents. The Apriori algorithm and the GSS coefficient achieved the best performance.
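As a rough illustration of several measures named in the abstract, the sketch below scores one candidate term against a query term from document-level co-occurrence counts. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `association_scores`, the toy `docs` collection, and the choice of standard textbook definitions (with mutual information taken in its pointwise form) are all assumptions, since the abstract does not give the exact formulas used.

```python
import math


def association_scores(docs, query, candidate):
    """Score `candidate` as an association term for `query`.

    `docs` is a list of sets of index terms (a hypothetical toy representation).
    All formulas are common textbook definitions, assumed for illustration.
    """
    n = len(docs)
    # 2x2 contingency counts over documents.
    a = sum(1 for doc in docs if query in doc and candidate in doc)      # both terms
    b = sum(1 for doc in docs if query in doc and candidate not in doc)  # query only
    c = sum(1 for doc in docs if query not in doc and candidate in doc)  # candidate only
    d = n - a - b - c                                                    # neither term
    if a + b == 0 or a + c == 0:
        raise ValueError("both terms must occur in at least one document")

    p_query = (a + b) / n
    p_cand = (a + c) / n
    p_both = a / n
    confidence = a / (a + b)  # estimate of P(candidate | query)

    return {
        # Association rule measures for the rule query -> candidate.
        "support": p_both,
        "confidence": confidence,
        "lift": confidence / p_cand,
        # Co-occurrence based association measures.
        "jaccard": a / (a + b + c),
        "cosine": a / math.sqrt((a + b) * (a + c)),
        # GSS: P(t,c)P(not t,not c) - P(t,not c)P(not t,c) = (ad - bc) / n^2.
        "gss": (a * d - b * c) / (n * n),
        # Pointwise mutual information; -inf when the terms never co-occur.
        "pmi": math.log2(p_both / (p_query * p_cand)) if a > 0 else float("-inf"),
    }


# Hypothetical four-document collection of index terms.
docs = [
    {"text", "mining", "apriori"},
    {"text", "mining"},
    {"text", "retrieval"},
    {"query", "retrieval"},
]
scores = association_scores(docs, "text", "mining")
```

With this toy collection, "mining" co-occurs with "text" in two of four documents, so the rule text -> mining has support 0.5 and confidence of about 0.67. Ranking every candidate term by one of these scores and keeping the top few would give an association term set for the query term.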

Cited by

  1. A Study on Keyword Extraction From a Single Document Using Term Clustering vol.44, pp.3, 2010, https://doi.org/10.4275/KSLIS.2010.44.3.155