[KSCI] Korea Science Citation Index Service

Implementation of a Large-scale Web Query Processing System Using the Multi-level Cache Scheme

Lim, Sung-Chae (Department of Computer Science, Dongduk Women's University)

Publication Information

Journal of KIISE: Computing Practices and Letters / v.14, no.7, 2008, pp. 669-679

Abstract

With the growing demand for information sharing and search on the web, web search engines have drawn much attention. Although much research has addressed the technical challenges of building a web search engine, its query processing system has seldom been dealt with. Because the software architecture and operational schemes of such a query processing system are difficult to work out in detail, we present the related techniques as implemented in a commercial system. The implemented system is very large-scale: it processes 5 million user queries per day using index files built from about 65 million web pages. To improve performance, we implement a multi-level cache scheme that stores previously returned query results, and this cache is managed across four levels of cache storage. The multi-level cache improves system throughput by a factor of 4, reducing server cost by about 70%.
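The abstract does not specify how the four cache levels are organized or what storage backs each one. The following is a minimal sketch, assuming an exclusive hierarchy of LRU levels ordered from fastest to slowest: a lookup checks each level in turn, a hit in a lower level is promoted to the top level, entries evicted from one level are demoted to the next, and a miss in every level falls through to the query processor. The class names, capacities, query normalization, and `backend` callable are hypothetical illustrations, not the paper's design.

```python
from collections import OrderedDict
from typing import Callable, List, Optional, Tuple


class LRULevel:
    """One cache level: a bounded LRU map from a query key to its result list."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[str, list]" = OrderedDict()

    def get(self, key: str) -> Optional[list]:
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def pop(self, key: str) -> Optional[list]:
        return self.entries.pop(key, None)

    def put(self, key: str, value: list) -> Optional[Tuple[str, list]]:
        """Insert a key; return the evicted (key, value) pair, if any."""
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            return self.entries.popitem(last=False)  # least recently used
        return None


class MultiLevelQueryCache:
    """Hypothetical exclusive multi-level cache: each key lives in at most one level."""

    def __init__(self, capacities: List[int], backend: Callable[[str], list]) -> None:
        self.levels = [LRULevel(c) for c in capacities]  # index 0 = fastest level
        self.backend = backend  # stands in for the real query processor
        self.backend_calls = 0

    def _insert_top(self, key: str, value: list) -> None:
        # Insert at the top level and cascade evictions downward;
        # an entry evicted from the last level is discarded.
        item: Optional[Tuple[str, list]] = (key, value)
        for level in self.levels:
            item = level.put(*item)
            if item is None:
                return

    def lookup(self, query: str) -> list:
        key = " ".join(query.lower().split())  # hypothetical query normalization
        hit = self.levels[0].get(key)
        if hit is not None:
            return hit
        for level in self.levels[1:]:
            value = level.pop(key)
            if value is not None:
                self._insert_top(key, value)  # promote to the fastest level
                return value
        self.backend_calls += 1  # miss in every level: run the query
        value = self.backend(key)
        self._insert_top(key, value)
        return value
```

For example, `MultiLevelQueryCache([1, 2], backend=lambda q: [q + "-result"])` builds a two-level cache; with four capacities it would mirror the four storage areas mentioned above. Keeping the levels exclusive avoids storing the same result twice, at the cost of moving entries between levels on every promotion or eviction.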

Keywords

web search engine; cache scheme; server cluster; web query processing

