Implementation of a Large-scale Web Query Processing System Using the Multi-level Cache Scheme  

Lim, Sung-Chae (Department of Computer Science, Dongduk Women's University)
Abstract
As demand for sharing and searching information on the web grows, web search engines have drawn much attention. Although much research has addressed the technical challenges of building a web search engine, its query processing system has rarely been dealt with. Because the software architecture and operational schemes of a query processing system are difficult to elaborate on, we present the related techniques as implemented in a commercial system. The implemented system is very large in scale: it can process 5 million user queries per day using index files built on about 65 million web pages. For performance, we implement a multi-level cache scheme that saves query results that have already been returned; the cache is managed across four levels of cache storage areas. Using the multi-level cache, we improve system throughput by a factor of 4, thereby reducing server cost by around 70%.
Keywords
web search engine; cache scheme; server cluster; web query processing
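
The abstract describes a cache that keeps already returned query results in four levels of storage, but it does not say how those levels are organized or searched. The following Python sketch illustrates one common way such a scheme can work, under stated assumptions: the class names `LRULevel` and `MultiLevelCache`, the LRU eviction policy, the promotion of hits to faster levels, the level capacities, and the `fake_index_lookup` backend are all hypothetical and are not taken from the paper.

```python
from collections import OrderedDict
from typing import Callable, Dict, List, Optional


class LRULevel:
    """One cache level with fixed capacity. LRU eviction is an assumption."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self._items: "OrderedDict[str, List[str]]" = OrderedDict()

    def get(self, query: str) -> Optional[List[str]]:
        if query not in self._items:
            return None
        self._items.move_to_end(query)  # mark as most recently used
        return self._items[query]

    def put(self, query: str, results: List[str]) -> None:
        self._items[query] = results
        self._items.move_to_end(query)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


class MultiLevelCache:
    """Checks levels from fastest to slowest before querying the index."""

    def __init__(self, levels: List[LRULevel],
                 backend: Callable[[str], List[str]]) -> None:
        self.levels = levels
        self.backend = backend
        self.hits: Dict[str, int] = {level.name: 0 for level in levels}
        self.misses = 0

    def search(self, query: str) -> List[str]:
        for depth, level in enumerate(self.levels):
            results = level.get(query)
            if results is not None:
                self.hits[level.name] += 1
                # Promote the entry to every faster level (an assumption).
                for upper in self.levels[:depth]:
                    upper.put(query, results)
                return results
        # Miss at every level: run the real query and cache the result.
        self.misses += 1
        results = self.backend(query)
        for level in self.levels:
            level.put(query, results)
        return results


def fake_index_lookup(query: str) -> List[str]:
    """Hypothetical stand-in for the index-based query processor."""
    return [f"result for {query}"]


if __name__ == "__main__":
    # Four levels to mirror the abstract; the capacities are made up.
    cache = MultiLevelCache(
        [LRULevel("L1", 2), LRULevel("L2", 4),
         LRULevel("L3", 8), LRULevel("L4", 16)],
        fake_index_lookup,
    )
    for q in ["a", "b", "c", "a", "a"]:
        cache.search(q)
    print(cache.hits, cache.misses)
```

In this sketch a repeated query is answered from the first level that still holds it, so the index files are consulted only on a miss at every level. A production system would likely differ in where each level lives (for example, memory versus disk, or local versus remote servers) and in how entries expire when the index is rebuilt.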