Implementation of a Large-scale Web Query Processing System Using the Multi-level Cache Scheme

Lim, Sung-Chae;

Journal of KIISE:Computing Practices and Letters (한국정보과학회논문지:컴퓨팅의 실제 및 레터)

Volume 14, Issue 7 / Pages 669-679 / 2008 / ISSN 1229-7712 (pISSN)

Korean Institute of Information Scientists and Engineers (한국정보과학회)

계층적 캐시 기법을 이용한 대용량 웹 검색 질의 처리 시스템의 구현

Lim, Sung-Chae (Department of Computer Science, Dongduk Women's University)

Published: 2008.10.15

Abstract

As demand for sharing and searching information on the web grows, web search engines have drawn much attention. Although much research has addressed the technical challenges of building a web search engine, the query processing system itself is rarely discussed. Because the software architecture and operational schemes of such a system are hard to work out, we present the related techniques as implemented in a commercial system. The implemented system is very large in scale: it processes 5 million user queries per day using index files built over about 65 million web pages. For performance, we implement a multi-level cache scheme that saves already-returned query results, managed across four levels of cache storage. With the multi-level cache, system throughput improves by a factor of 4, which reduces server cost by around 70%.

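As publishing and searching information on the web expands, web search engines continue to attract attention. Although studies have sought to solve the various technical problems of web search engines, the technical details of their query processing systems have seldom been covered. Because the software architecture and operating techniques of a query processing system are difficult to devise, this paper introduces the relevant techniques based on an implemented commercial system. The implemented query processing system is large in scale: it indexes 65 million web documents and serves more than 5 million user query requests per day. It applies a multi-level cache scheme to reuse query results, and its distinguishing feature is that the cached data is distributed across data storage organized in four levels. The multi-level cache scheme raised query processing capacity to roughly 400%, which in turn cut server build-out cost by about 70%.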
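
The abstract does not say what the four cache levels are or how entries move between them, so the following is only a minimal sketch of the general idea of a multi-level query-result cache. Everything in it is an assumption made for illustration: the names LruTier, MultiLevelCache, and index_lookup, the tier capacities, LRU eviction, and the policy of copying a hit into the faster tiers. It is not the paper's implementation.

```python
from collections import OrderedDict


class LruTier:
    """One cache level with a fixed capacity and LRU eviction (assumed policy)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


class MultiLevelCache:
    """Checks tiers from fastest to slowest, then falls back to the index."""

    def __init__(self, tiers, backend):
        self.tiers = tiers          # ordered fastest (smallest) to slowest (largest)
        self.backend = backend      # hypothetical full index lookup
        self.backend_calls = 0

    def search(self, query):
        for depth, tier in enumerate(self.tiers):
            result = tier.get(query)
            if result is not None:  # results are assumed never to be None
                # Assumed policy: copy the hit into every faster tier.
                for faster in self.tiers[:depth]:
                    faster.put(query, result)
                return result
        # Miss in every tier: run the query against the index and cache it.
        self.backend_calls += 1
        result = self.backend(query)
        for tier in self.tiers:
            tier.put(query, result)
        return result


def index_lookup(query):
    """Stand-in for the expensive index search (hypothetical)."""
    return [f"doc-for-{query}"]


cache = MultiLevelCache(
    [LruTier("L1", 2), LruTier("L2", 4), LruTier("L3", 8), LruTier("L4", 16)],
    backend=index_lookup,
)
for q in ["a", "b", "a", "c", "a", "b"]:
    cache.search(q)
print(cache.backend_calls)
```

In this toy run the six queries cause three index lookups: the repeated queries are answered from a cache tier, including one that had already been evicted from the smallest tier but was still held in the next one. That reuse of stored results is the effect the abstract credits for the throughput gain.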
