Odysseus/Parallel-OOSQL: A Parallel Search Engine using the Odysseus DBMS Tightly-Coupled with IR Capability  

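Ryu, Jae-Joon (Department of Computer Science, KAIST)
Whang, Kyu-Young (Department of Computer Science, KAIST)
Lee, Jae-Gil (Department of Computer Science, KAIST)
Kwon, Hyuk-Yoon (Department of Computer Science, KAIST)
Kim, Yi-Reun (Department of Computer Science, KAIST)
Heo, Jun-Suk (Department of Computer Science, KAIST)
Lee, Ki-Hoon (Department of Computer Science, KAIST)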
Abstract
As the number of electronic documents grows rapidly with the Internet, parallel search engines capable of handling large document collections are becoming ever more important. To implement a parallel search engine, we need to partition the inverted index and search the partitioned index in parallel. There are two methods of partitioning the inverted index: 1) document-identifier based partitioning and 2) keyword-identifier based partitioning. Each method alone has drawbacks. The former makes document insertion convenient and has high throughput, but performs poorly for top-k query processing. The latter performs well for top-k query processing, but makes document insertion inconvenient and has low throughput. In this paper, we propose a hybrid partitioning method that compensates for the drawbacks of each method. We design and implement a parallel search engine that supports the hybrid partitioning method using the Odysseus DBMS, which is tightly coupled with information retrieval capability. We first introduce the architecture of the parallel search engine, Odysseus/Parallel-OOSQL. We then show the effectiveness of the proposed system through systematic experiments. The experimental results show that the query processing time of the document-identifier based partitioning method is approximately inversely proportional to the number of blocks in the partition of the inverted index. The results also show that the keyword-identifier based partitioning method performs well in top-k query processing. The proposed parallel search engine can be optimized for performance by customizing the method of partitioning the inverted index according to the application environment. Odysseus/Parallel-OOSQL is capable of indexing, storing, and querying 100 million web documents per node, or tens of billions of web documents for the entire system.
Keywords
parallel; large-scale; search engine;
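
The two partitioning methods named in the abstract can be illustrated with a minimal Python sketch. This is not the Odysseus/Parallel-OOSQL implementation: the function names, the in-memory dictionaries standing in for nodes, the per-term scores, and the use of `zlib.crc32` to assign keywords to partitions are all assumptions made for illustration. The sketch shows one plausible reason for the trade-off the abstract describes: with document-identifier based partitioning a single-keyword top-k query consults every partition and merges the local results, while with keyword-identifier based partitioning it consults only the partition that owns the keyword.

```python
import heapq
import zlib
from collections import defaultdict


def term_partition(term, n_parts):
    """Hypothetical keyword-to-partition mapping (stable across runs)."""
    return zlib.crc32(term.encode("utf-8")) % n_parts


def build_doc_partitioned(docs, n_parts):
    """Document-identifier based: each partition indexes a disjoint set of documents."""
    parts = [defaultdict(list) for _ in range(n_parts)]
    for doc_id, term_scores in docs.items():
        index = parts[doc_id % n_parts]  # one partition per inserted document
        for term, score in term_scores.items():
            index[term].append((score, doc_id))
    return parts


def build_keyword_partitioned(docs, n_parts):
    """Keyword-identifier based: each partition holds the full posting lists of its keywords."""
    parts = [defaultdict(list) for _ in range(n_parts)]
    for doc_id, term_scores in docs.items():
        for term, score in term_scores.items():
            # one inserted document may touch several partitions
            parts[term_partition(term, n_parts)][term].append((score, doc_id))
    return parts


def topk_doc_partitioned(parts, term, k):
    """Every partition returns a local top-k; a coordinator merges the candidates."""
    candidates = []
    for index in parts:  # in a real system these lookups would run in parallel
        candidates.extend(heapq.nlargest(k, index.get(term, [])))
    return heapq.nlargest(k, candidates), len(parts)


def topk_keyword_partitioned(parts, term, k):
    """Only the partition that owns the keyword is consulted."""
    owner = term_partition(term, len(parts))
    return heapq.nlargest(k, parts[owner].get(term, [])), 1


# Toy collection: doc_id -> {keyword: score}; the scores are made up.
docs = {
    1: {"db": 0.9, "ir": 0.2},
    2: {"db": 0.4},
    3: {"ir": 0.8, "db": 0.7},
    4: {"db": 0.1},
}
doc_parts = build_doc_partitioned(docs, 2)
kw_parts = build_keyword_partitioned(docs, 2)
print(topk_doc_partitioned(doc_parts, "db", 2))
print(topk_keyword_partitioned(kw_parts, "db", 2))
```

Both calls return the same ranked postings; the second element of each result is the number of partitions consulted. The sketch does not attempt the hybrid method, whose details are not given in the abstract.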