[KSCI] Korea Science Citation Index Service

Performance Improvement of Web Information Retrieval Using Sentence-Query Similarity

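Park Eui-Kyu (Department of Computer Science, Yonsei University)
Ra Dong-Yul (Department of Computer Science, Yonsei University)
Jang Myung-Gil (Knowledge Mining Research Team, Electronics and Telecommunications Research Institute)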

Publication Information

Journal of KIISE: Software and Applications / v.32, no.5, 2005, pp. 406-415

Abstract

The growth of the Internet has filled the web with a huge number of documents. As a result, increasing importance is placed on web information retrieval technology that can provide users with documents containing exactly the information they want. This paper proposes several techniques that are effective in improving web information retrieval. Conventional systems mainly exploit the similarity between a document and the query; we additionally propose a technique that makes use of the similarity between a sentence and the query. We introduce a technique for computing an approximate sentence-query similarity score even without mature natural language processing technology. We show that the amount of computation required for this task is linear in the number of documents in the whole collection, which implies that practical systems can make use of this technique. The next important technique proposed in this paper is the use of document stratification when re-ranking the documents to be output, which was shown to yield a significant improvement in performance. We further show that using hyperlinks, anchor texts, and titles enhances performance. To validate the proposed techniques, we developed a large-scale web information retrieval system and used it for experiments.
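
The abstract does not give the scoring formulas, so the following Python is only a minimal sketch under stated assumptions, not the authors' method. It approximates sentence-query similarity as the largest fraction of distinct query terms covered by any single sentence of a document. It models stratification as bucketing documents by that score, ranking higher buckets first and ordering by a conventional document-query score within each bucket. The names `sentence_query_score`, `stratified_rerank`, and `num_strata`, the regex tokenizer, and the punctuation-based sentence splitter are all hypothetical. Each document's text is scanned once, so, assuming bounded document length, the total cost grows linearly with the number of documents.

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer: lowercase runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_query_score(document: str, query: str) -> float:
    """Best fraction of distinct query terms covered by one sentence.

    This is an illustrative approximation, not the paper's formula.
    """
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    best = 0.0
    # Naive sentence split on runs of '.', '!' or '?' (an assumption).
    for sentence in re.split(r"[.!?]+", document):
        covered = query_terms & set(tokenize(sentence))
        best = max(best, len(covered) / len(query_terms))
    return best


def stratified_rerank(
    docs: Dict[str, str],
    base_scores: Dict[str, float],
    query: str,
    num_strata: int = 4,
) -> List[str]:
    """Sort by stratum (higher first), then by base score within a stratum."""
    keyed: List[Tuple[int, float, str]] = []
    for doc_id, text in docs.items():
        score = sentence_query_score(text, query)
        # Map a score in [0, 1] to a stratum index in [0, num_strata - 1].
        stratum = min(int(score * num_strata), num_strata - 1)
        keyed.append((stratum, base_scores.get(doc_id, 0.0), doc_id))
    keyed.sort(key=lambda t: (-t[0], -t[1]))
    return [doc_id for _, _, doc_id in keyed]


if __name__ == "__main__":
    docs = {
        "a": "Web retrieval is studied. Sentence similarity matters.",
        "b": "Sentence query similarity improves web retrieval.",
        "c": "Nothing relevant here.",
    }
    base = {"a": 0.9, "b": 0.5, "c": 0.95}  # hypothetical document-query scores
    print(stratified_rerank(docs, base, "sentence query similarity"))
    # prints ['b', 'a', 'c']
```

In this toy example, document `c` has the highest base score but matches no query term in any sentence, so stratification places it last, while `b`, whose single sentence covers every query term, moves to the top.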

Keywords

web; information retrieval; sentence-query similarity; stratification; hyperlink; anchor text

Citations & Related Records

1	A Search Efficiency Improvement Method using Internal Contiguity in Query Terms / Yoon, Soung-Woong; Chae, Jin-Ki; Lee, Sang-Hoon / Journal of KIISE: Databases
2	Using Query Word Senses and User Feedback to Improve Precision of Search Engine / Yoon, Sung-Hee / Journal of the Korean Society for Information Management