
Combining Multiple Sources of Evidence to Enhance Web Search Performance  

Yang, Kiduk (Department of Library and Information Science, Kyungpook National University)
Publication Information
Journal of Korean Library and Information Science Society, v.45, no.3, 2014, pp. 5-36
Abstract
The Web is rich in sources of information that go beyond the contents of documents, such as hyperlinks and manually classified directories of Web documents like Yahoo. This research extends past fusion IR studies, which have repeatedly shown that combining multiple sources of evidence (i.e., fusion) can improve retrieval performance, by investigating the effects of combining three distinct retrieval approaches for Web IR: a text-based approach that leverages document text, a link-based approach that leverages hyperlinks, and a classification-based approach that leverages Yahoo categories. The retrieval results of the text-, link-, and classification-based methods were combined using variations of the linear combination formula, and the resulting fusion runs were compared to the individual retrieval results using traditional retrieval evaluation metrics. The fusion results were also examined to ascertain the significance of overlap (i.e., the number of systems that retrieve a document) in fusion. The analysis suggests that the solution spaces of the text-, link-, and classification-based retrieval methods are diverse enough for fusion to be beneficial. It also reveals important characteristics of the fusion environment, such as the effects of system parameters and the relationship between overlap, document ranking, and relevance.
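To make the linear combination and overlap ideas concrete, here is a minimal sketch in Python. It assumes that each system's scores are min-max normalized to [0, 1] and then summed with fixed per-system weights. The function names (`min_max_normalize`, `linear_fusion`), the weights, and the example scores are hypothetical illustrations, not the paper's actual formula variants or parameter settings. Overlap is counted as the number of systems that retrieved each document.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Rescale one system's {doc_id: score} map to [0, 1] (min-max normalization)."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # All scores equal: treat every retrieved document as maximally scored.
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def linear_fusion(runs, weights):
    """Hypothetical linear combination: fused(d) = sum_i w_i * norm_score_i(d).

    runs    -- {system_name: {doc_id: raw_score}}
    weights -- {system_name: weight}
    Returns (ranking, fused_scores, overlap), where overlap[d] is the
    number of systems that retrieved document d.
    """
    fused = defaultdict(float)
    overlap = defaultdict(int)
    for name, run in runs.items():
        weight = weights[name]
        for doc, score in min_max_normalize(run).items():
            fused[doc] += weight * score
            overlap[doc] += 1
    # Sort by descending fused score; break ties by document id for determinism.
    ranking = sorted(fused, key=lambda doc: (-fused[doc], doc))
    return ranking, dict(fused), dict(overlap)


if __name__ == "__main__":
    # Illustrative scores only (not data from the study).
    runs = {
        "text": {"d1": 12.0, "d2": 8.0, "d3": 4.0},
        "link": {"d2": 0.9, "d4": 0.3},
        "class": {"d1": 2.0, "d2": 1.0},
    }
    weights = {"text": 0.5, "link": 0.3, "class": 0.2}
    ranking, fused, overlap = linear_fusion(runs, weights)
    for doc in ranking:
        print(doc, round(fused[doc], 3), overlap[doc])
```

With these inputs, d1 scores highest (0.7) even though d2 is retrieved by all three systems (0.55, overlap 3). This illustrates why the relationship between overlap and ranking is worth examining separately. The overlap counts could also feed a variant such as Fox and Shaw's CombMNZ, which multiplies the summed score by the number of systems that retrieved the document.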
Keywords
Fusion; Web search; Information retrieval
References
1 Lee, Joon Ho. 1996. "Combining multiple evidence from different relevance feedback methods" (Tech. Rep. No. IR-87). Amherst: University of Massachusetts, Center for Intelligent Information Retrieval.
2 Keen, E. Michael. 1973. "The Aberystwyth index languages test." Journal of Documentation, 29: 1-35.
3 Lee, Joon Ho. 1997. "Analyses of multiple evidence combination." Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, 267-276.
4 Kleinberg, Jon. 1999. "Authoritative sources in a hyperlinked environment." Journal of the Association for Computing Machinery, 46(5): 604-632.
5 Modha, Dharmendra and W. S. Spangler. 2000. "Clustering hypertext with applications to Web searching." Proceedings of the 11th ACM Hypertext Conference, 143-152.
6 Singhal, Amit, C. Buckley and M. Mitra. 1996. "Pivoted document length normalization." Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, 21-29.
7 Page, Larry, S. Brin, R. Motwani and T. Winograd. 1998. "The PageRank citation ranking: Bringing order to the Web." Technical Report, Stanford Digital Library Technologies Project.
8 Plaunt, Christian and B. A. Norgard. 1998. "An association based method for automatic indexing with a controlled vocabulary." Journal of the American Society for Information Science, 49(10): 888-902.
9 Saracevic, Tefko and P. Kantor. 1988. "A study of information seeking and retrieving. III. Searchers, searches, overlap." Journal of the American Society for Information Science, 39: 197-216.
10 Smith, Linda C. 1979. Selected Artificial Intelligence Techniques in Information Retrieval Systems Research. Ph.D. diss., Syracuse University, U.S.
11 Sparck Jones, Karen. 1974. "Automatic indexing." Journal of Documentation, 30: 393-432.
12 Sumner, Robert G., K. Yang, R. Akers and W. M. Shaw. 1998. "Interactive retrieval using IRIS: TREC-6 experiments." In E. M. Voorhees & D. K. Harman (Eds.), The Sixth Text REtrieval Conference (TREC-6).
13 Vogt, Christopher C. and G. W. Cottrell. 1998. "Predicting the performance of linearly combined IR systems." Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, 190-196.
14 Williams, Martha E. 1977. "Analysis of terminology in various CAS data files as access points for retrieval." Journal of Chemical Information and Computer Sciences, 17: 16-20.
15 Wong, S. K. Michael, Y. Y. Yao and P. Bollmann. 1988. "Linear structure in information retrieval." Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 219-232.
16 Buckley, Chris, G. Salton, J. Allan and A. Singhal. 1995. "Automatic query expansion using SMART: TREC 3." In D. K. Harman (Ed.), The Third Text REtrieval Conference (TREC-3) (NIST Spec. Publ. 500-225, pp. 1-19). Washington, DC: U.S. Government Printing Office.
17 Wong, S. K. Michael, Y. Y. Yao, G. Salton and C. Buckley. 1991. "Evaluation of an adaptive linear model." Journal of the American Society for Information Science, 42: 723-730.
18 Yang, Kiduk. 2005. "Information retrieval on the Web." Annual Review of Information Science and Technology (ARIST), 39(1): 33-80.
19 Bartell, Brian T., G. W. Cottrell and R. K. Belew. 1994. "Automatic combination of multiple ranked retrieval systems." Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval.
20 Belkin, Nicholas J., C. Cool, W. B. Croft and J. P. Callan. 1993. "The effect of multiple query representations on information retrieval system performance." Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, 339-346.
21 Bharat, Krishna and M. R. Henzinger. 1998. "Improved algorithms for topic distillation in a hyperlinked environment." Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, 104-111.
22 Brin, Sergey and L. Page. 1998. "The anatomy of a large-scale hypertextual Web search engine." Computer Networks and ISDN Systems, 30(1): 107-117.
23 Buckley, Chris, A. Singhal and M. Mitra. 1997. "Using query zoning and correlation within SMART: TREC 5." In E. M. Voorhees & D. K. Harman (Eds.), The Fifth Text REtrieval Conference (TREC-5) (NIST Spec. Publ. 500-238, pp. 105-118). Washington, DC: U.S. Government Printing Office.
24 Buckley, Chris, A. Singhal, M. Mitra and G. Salton. 1996. "New retrieval approaches using SMART: TREC 4." In D. K. Harman (Ed.), The Fourth Text REtrieval Conference (TREC-4) (NIST Spec. Publ. 500-236, pp. 25-48). Washington, DC: U.S. Government Printing Office.
25 Chakrabarti, Soumen, B. Dom, P. Raghavan, S. Rajagopalan, D. Gibson and J. Kleinberg. 1998. "Automatic resource list compilation by analyzing hyperlink structure and associated text." Proceedings of the 7th International World Wide Web Conference.
26 Frakes, William B. and R. Baeza-Yates, eds. 1992. Information Retrieval: Data Structures & Algorithms. Englewood Cliffs, NJ: Prentice Hall.
27 Fishburn, Peter C. 1970. Utility Theory for Decision Making. New York: John Wiley & Sons.
28 Fox, Edward A. and J. A. Shaw. 1994. "Combination of multiple searches." In D. K. Harman (Ed.), The Second Text REtrieval Conference (TREC-2) (NIST Spec. Publ. 500-215, pp. 243-252). Washington, DC: U.S. Government Printing Office.
29 Fox, Edward A. and J. A. Shaw. 1995. "Combination of multiple searches." In D. K. Harman (Ed.), The Third Text REtrieval Conference (TREC-3) (NIST Spec. Publ. 500-225, pp. 105-108). Washington, DC: U.S. Government Printing Office.
30 Gurrin, Cathal and A. F. Smeaton. 2001. "Dublin City University experiments in connectivity analysis for TREC-9." In E. M. Voorhees & D. K. Harman (Eds.), The Ninth Text REtrieval Conference (TREC-9). Washington, DC: U.S. Government Printing Office.
31 Katzer, Jeffrey, M. J. McGill, J. A. Tessier, W. Frakes and P. DasGupta. 1982. "A study of the overlap among document representations." Information Technology: Research and Development, 1: 261-274.