A Distributed High Dimensional Indexing Structure for Content-based Retrieval of Large Scale Data  

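Cho, Hyun-Hwa (Database Research Team, Electronics and Telecommunications Research Institute)
Lee, Mi-Young (Database Research Team, Electronics and Telecommunications Research Institute)
Kim, Young-Chang (Database Research Team, Electronics and Telecommunications Research Institute)
Chang, Jae-Woo (Department of Computer Systems Engineering, Chonbuk National University)
Lee, Kyu-Chul (Department of Computer Engineering, Chungnam National University)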
Abstract
Although conventional index structures provide various nearest-neighbor search algorithms for high-dimensional data, large-scale data imposes additional requirements: higher search performance and index scalability. To meet these requirements, we propose a distributed high-dimensional indexing structure for cluster systems, called the Distributed Vector Approximation-tree (DVA-tree), a two-level structure consisting of a hybrid spill-tree and VA-files. We also describe the algorithms for constructing the DVA-tree over multiple machines and for performing distributed k-nearest neighbor (k-NN) searches. To evaluate the performance of the DVA-tree, we conduct an experimental study using both real and synthetic datasets. The results show that the proposed method offers significant performance advantages over existing index structures on different kinds of datasets.
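The abstract does not spell out the DVA-tree algorithms, so the following is only a minimal sketch of the general idea it names: a VA-file-style filter-and-refine k-NN scan on each partition, followed by a merge of the local results into a global top-k. It is not the authors' implementation. The hybrid spill-tree routing level is omitted (every partition is queried), coordinates are assumed to lie in [0, 1], and all function names (`quantize`, `lower_bound`, `local_knn`, `distributed_knn`) are hypothetical.

```python
import heapq
import math

def quantize(vec, bits, lo=0.0, hi=1.0):
    """Map each coordinate to a cell index in [0, 2**bits - 1] (assumes lo <= x <= hi)."""
    cells = 1 << bits
    width = (hi - lo) / cells
    return [min(cells - 1, max(0, int((x - lo) / width))) for x in vec]

def lower_bound(query, approx, bits, lo=0.0, hi=1.0):
    """Smallest possible Euclidean distance from query to any point in the approximated cell."""
    width = (hi - lo) / (1 << bits)
    total = 0.0
    for q, c in zip(query, approx):
        cell_lo = lo + c * width
        cell_hi = cell_lo + width
        if q < cell_lo:
            total += (cell_lo - q) ** 2
        elif q > cell_hi:
            total += (q - cell_hi) ** 2
    return math.sqrt(total)

def local_knn(query, vectors, approxs, k, bits):
    """Filter-and-refine k-NN over one partition; returns a sorted list of (distance, id)."""
    # Visit candidates in order of increasing lower bound.
    order = sorted((lower_bound(query, a, bits), vid) for vid, a in approxs.items())
    best = []  # max-heap via negated distances, holds at most k entries
    for lb, vid in order:
        if len(best) == k and lb >= -best[0][0]:
            break  # no remaining candidate can beat the current k-th distance
        d = math.dist(query, vectors[vid])  # refine step: exact distance
        if len(best) < k:
            heapq.heappush(best, (-d, vid))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, vid))
    return sorted((-nd, vid) for nd, vid in best)

def distributed_knn(query, partitions, k, bits=4):
    """Merge the local k-NN lists of all partitions into a global top-k."""
    # In a real cluster each local_knn call would run on a separate machine.
    local = [local_knn(query, vecs, apx, k, bits) for vecs, apx in partitions]
    return heapq.nsmallest(k, heapq.merge(*local))
```

Here each partition is a pair `(vectors, approxs)` of dictionaries keyed by a global vector id, with `approxs[id] = quantize(vectors[id], bits)`. Because every local list already contains that partition's k best candidates, merging them and keeping the k smallest yields the exact global result under these assumptions.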
Keywords
High Dimensional Data; Distributed Indexing Structure; k-NN Search; Content-based Retrieval