http://dx.doi.org/10.4313/TEEM.2014.15.3.125

Enhanced Locality Sensitive Clustering in High Dimensional Space  

Chen, Gang (Department of Data Processing Engineering, Zhengzhou Information Science and Technology Institute)
Gao, Hao-Lin (Department of Data Processing Engineering, Zhengzhou Information Science and Technology Institute)
Li, Bi-Cheng (Department of Data Processing Engineering, Zhengzhou Information Science and Technology Institute)
Hu, Guo-En (Department of Data Processing Engineering, Zhengzhou Information Science and Technology Institute)
Publication Information
Transactions on Electrical and Electronic Materials, v.15, no.3, 2014, pp. 125-129
Abstract
A dataset can be clustered by merging the bucket indices produced by the random projections of locality sensitive hashing functions; for this to work, the merging interval must be calculated first. To improve the feasibility of large-scale data clustering in high-dimensional space, we propose an enhanced locality sensitive hashing clustering method. First, multiple hashing functions are generated. Second, data points are projected to bucket indices. Third, the bucket indices are clustered to obtain class labels. Experimental results on synthetic datasets show that the method achieves high accuracy at greatly improved clustering speed, making it well suited to clustering data in high-dimensional space.
Keywords
Enhanced locality sensitive clustering; Bucket indices; Random projection; Data clustering
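
As a rough illustration of the three-step procedure described in the abstract, the following Python sketch generates random-projection hash functions, maps each data point to a vector of bucket indices, and groups points that share identical bucket-index vectors. The function and parameter names (lsh_cluster, n_hashes, bucket_width) are illustrative assumptions, and the paper's merging-interval calculation for combining nearby bucket indices is not reproduced here; this is a minimal sketch, not the authors' implementation.

import numpy as np

def lsh_cluster(X, n_hashes=8, bucket_width=1.0, seed=0):
    """Cluster points by hashing them with random projections and
    grouping points whose bucket-index vectors coincide.
    (Illustrative sketch; the paper merges buckets via a computed
    merging interval, which is simplified to exact matching here.)"""
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Step 1: generate multiple hashing functions, i.e. random projection
    # directions a_i and offsets b_i for h_i(x) = floor((a_i . x + b_i) / w).
    A = rng.standard_normal((n_hashes, d))
    b = rng.uniform(0.0, bucket_width, size=n_hashes)

    # Step 2: project every data point to its vector of bucket indices.
    buckets = np.floor((X @ A.T + b) / bucket_width).astype(int)

    # Step 3: merge identical bucket-index vectors into cluster labels.
    _, labels = np.unique(buckets, axis=0, return_inverse=True)
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two well-separated Gaussian blobs in 50-dimensional space.
    X = np.vstack([rng.normal(0, 0.1, (100, 50)),
                   rng.normal(5, 0.1, (100, 50))])
    # With a wide enough bucket_width, each blob tends to fall into one bucket.
    print(lsh_cluster(X, n_hashes=4, bucket_width=10.0))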
Citations & Related Records
연도 인용수 순위