Document Retrieval using Concept Network

An Information Retrieval Method Using a Concept Network

  • Wonchang Hur (School of Business Administration, Inha University)
  • Sangjin Lee (Technology Solutions Business Team, Samsung Networks)
  • Published : 2006.12.31

Abstract

The advent of the knowledge management (KM) concept has led many organizations to seek effective ways to make use of their knowledge. However, the absence of suitable tools for systematically handling unstructured information makes it difficult to automatically retrieve and share relevant information that exactly meets users' needs. We propose a systematic method that enables content-based information retrieval from a corpus of unstructured documents. In our method, a document is represented by several key terms that are automatically selected on the basis of their quantitative relevancy to the document. The relevancy is calculated with the traditional TFIDF measure, which is widely accepted in related research; to improve the effectiveness of this measure, we exploit a 'concept network' that represents term-term relationships. In particular, in constructing the concept network, we also consider the relative positions of terms occurring in a document. A prototype system has been implemented for experiments. The experimental results show that our approach can achieve higher performance than the conventional TFIDF method.
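
The pipeline outlined in the abstract (TFIDF scoring, a position-aware term-term network, key-term selection) can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration only: the function names, the 1/distance co-occurrence weight, the window size, and the mixing factor alpha are hypothetical and are not the authors' method.

```python
import math
from collections import Counter, defaultdict


def tfidf(docs):
    """docs: list of token lists. Returns one {term: TFIDF score} dict per document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in docs for t in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return scores


def concept_network(docs, window=3):
    """Symmetric term-term weights built from co-occurrence within a window.

    Assumption: a pair that occurs closer together contributes more (1 / distance).
    """
    net = defaultdict(float)
    for doc in docs:
        for i, a in enumerate(doc):
            for j in range(i + 1, min(i + 1 + window, len(doc))):
                b = doc[j]
                if a != b:
                    w = 1.0 / (j - i)
                    net[(a, b)] += w
                    net[(b, a)] += w
    return net


def key_terms(docs, k=3, alpha=0.5, window=3):
    """Pick the top-k terms per document from network-adjusted TFIDF scores.

    Assumption: a term's score is raised by the TFIDF of the terms it is
    linked to, scaled by the normalized link weight and by alpha.
    """
    base = tfidf(docs)
    net = concept_network(docs, window)
    max_w = max(net.values(), default=1.0)
    result = []
    for scores in base:
        adjusted = {}
        for t, s in scores.items():
            support = sum(
                net.get((t, u), 0.0) / max_w * su
                for u, su in scores.items()
                if u != t
            )
            adjusted[t] = s + alpha * support
        result.append(sorted(adjusted, key=adjusted.get, reverse=True)[:k])
    return result
```

With alpha set to 0 the selection reduces to plain TFIDF ranking, which gives a simple baseline for comparing the two approaches on the same corpus.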
