http://dx.doi.org/10.3745/KIPSTD.2012.19D.2.151

Performance Enhancement of a DVA-tree by the Independent Vector Approximation  

Choi, Hyun-Hwa (Electronics and Telecommunications Research Institute)
Lee, Kyu-Chul (Department of Computer Engineering, Chungnam National University)
Abstract
Most distributed high-dimensional indexing structures provide reasonable search performance, especially when the dataset is uniformly distributed. When the dataset is clustered or skewed, however, search performance degrades gradually relative to the uniform case. We propose a method for improving the k-nearest neighbor search performance of the distributed vector approximation-tree (DVA-tree) on strongly clustered or skewed datasets. The basic idea is to compute the volumes of the leaf nodes in the top-tree of a distributed vector approximation-tree and to assign a different number of bits to each of them, so that the vector approximations retain their ability to discriminate between vectors. In other words, more bits are assigned to high-density clusters. We conducted experiments on synthetic and real datasets, comparing the search performance of the proposed scheme with that of the distributed hybrid spill-tree and the distributed vector approximation-tree. The experimental results show that the proposed scheme performs consistently and significantly improves on the distributed vector approximation-tree for strongly clustered or skewed datasets.
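The density-driven bit assignment described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the allocation formula, so everything here is an assumption made for illustration: the function names (`leaf_volume`, `assign_bits`, `approximate`), the bit range of 4 to 8, and the rule itself, which interpolates linearly in log-density between the sparsest and the densest leaf. It is not the authors' method, only one way to turn "more bits to high-density clusters" into code.

```python
import math


def leaf_volume(lower, upper):
    """Volume of an axis-aligned bounding box given its two corners."""
    return math.prod(u - l for l, u in zip(lower, upper))


def assign_bits(leaves, min_bits=4, max_bits=8):
    """Assign bits per dimension to each leaf (hypothetical rule).

    `leaves` is a list of (lower, upper, point_count) tuples.
    The sparsest leaf gets `min_bits`, the densest gets `max_bits`,
    and the others are interpolated linearly in log-density.
    """
    log_density = []
    for lower, upper, count in leaves:
        volume = leaf_volume(lower, upper)
        if volume <= 0 or count <= 0:
            raise ValueError("leaf needs positive volume and point count")
        log_density.append(math.log(count / volume))

    lowest, highest = min(log_density), max(log_density)
    if highest == lowest:
        # All leaves are equally dense: no reason to differentiate.
        return [min_bits] * len(leaves)

    span = max_bits - min_bits
    return [
        min_bits + round((d - lowest) / (highest - lowest) * span)
        for d in log_density
    ]


def approximate(vector, lower, upper, bits):
    """VA-style approximation: grid cell index per dimension.

    The grid is local to the leaf's bounding box, so each leaf can
    use its own number of bits independently of the others.
    """
    cells = 2 ** bits
    return tuple(
        min(int((v - l) / (u - l) * cells), cells - 1)
        for v, l, u in zip(vector, lower, upper)
    )


# Three leaves holding 100 points each, with shrinking bounding boxes.
leaves = [
    ((0.0, 0.0), (1.0, 1.0), 100),  # sparse
    ((0.0, 0.0), (0.1, 0.1), 100),  # dense
    ((0.0, 0.0), (0.5, 0.5), 100),  # in between
]
bits_per_leaf = assign_bits(leaves)
```

With equal point counts, the leaf with the smallest volume receives the most bits, so its vectors are quantized on a finer grid than those in the sparse leaf. A real index would also have to respect a storage budget across nodes, which this sketch ignores.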
Keywords
High-Dimensional Indexing; Distributed Indexing; Approximate k-NN Queries; Distributed and Parallel Algorithms