http://dx.doi.org/10.5391/IJFIS.2012.12.4.300

Locality-Sensitive Hashing Techniques for Nearest Neighbor Search

Lee, Keon Myung (Dept. of Computer Science and PT-ERC, Chungbuk National University)

Publication Information

International Journal of Fuzzy Logic and Intelligent Systems / v.12, no.4, 2012, pp. 300-307

Abstract

As data volumes grow, even simple tasks can become serious bottlenecks. Nearest neighbor search is one such task: given a query, it finds the k data points in a data set that are nearest to it. Locality-sensitive hashing techniques have been developed for approximate but fast nearest neighbor search. This paper introduces the notion of locality-sensitive hashing and surveys locality-sensitive hashing techniques. It categorizes them according to several criteria, presents their characteristics, and compares their performance.
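
As a rough illustration of the idea (not taken from the paper), the sketch below uses random-hyperplane hashing, one common locality-sensitive family for angular similarity. That choice is an assumption, since the abstract does not name a specific scheme. The names `LSHIndex`, `hash_key`, `num_tables`, and `num_bits` are hypothetical. Each point is hashed into several tables, and a query is compared only against points that share a bucket with it, so the answer is approximate.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, rng):
    """Draw num_bits random Gaussian hyperplane normals in `dim` dimensions."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_key(point, hyperplanes):
    """One bit per hyperplane: 1 if the point lies on its non-negative side."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, point)) >= 0 else 0
        for plane in hyperplanes
    )


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class LSHIndex:
    """Illustrative multi-table index (hypothetical name, not from the paper)."""

    def __init__(self, dim, num_tables=8, num_bits=6, seed=0):
        rng = random.Random(seed)
        # Each table has its own hyperplanes and its own bucket map.
        self.tables = [
            (make_hyperplanes(num_bits, dim, rng), defaultdict(list))
            for _ in range(num_tables)
        ]
        self.points = []

    def add(self, point):
        idx = len(self.points)
        self.points.append(point)
        for planes, buckets in self.tables:
            buckets[hash_key(point, planes)].append(idx)

    def query(self, q, k=1):
        """Return up to k indices of candidates, most similar first."""
        candidates = set()
        for planes, buckets in self.tables:
            candidates.update(buckets.get(hash_key(q, planes), ()))
        # Only colliding points are compared exactly, never the whole data set.
        ranked = sorted(candidates, key=lambda i: -cosine(self.points[i], q))
        return ranked[:k]
```

In this sketch, raising `num_bits` shrinks the buckets, which means fewer candidates to compare but more missed neighbors. Raising `num_tables` gives each true neighbor more chances to collide with the query, at the cost of memory and hashing time.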

Keywords

locality-sensitive hashing; hashing; nearest neighbor search; similarity search

Citations & Related Records

1	Bucket-size balancing locality sensitive hashing using the map reduce paradigm / Cluster Computing (1573-7543), 2017
2	MapReduce-based storage and indexing for big health data / Concurrency and Computation: Practice and Experience (15320626), 2018, e4854

1	Locality-Sensitive Hashing for Data with Categorical and Numerical Attributes Using Dual Hashing / [Lee, Keon Myung] / International Journal of Fuzzy Logic and Intelligent Systems
2	Big Numeric Data Classification Using Grid-based Bayesian Inference in the MapReduce Framework / [Kim, Young Joon; Lee, Keon Myung] / International Journal of Fuzzy Logic and Intelligent Systems