Enhanced Locality Sensitive Clustering in High Dimensional Space

  • Chen, Gang (Department of Data Processing Engineering, Zhengzhou Information Science and Technology Institute) ;
  • Gao, Hao-Lin (Department of Data Processing Engineering, Zhengzhou Information Science and Technology Institute) ;
  • Li, Bi-Cheng (Department of Data Processing Engineering, Zhengzhou Information Science and Technology Institute) ;
  • Hu, Guo-En (Department of Data Processing Engineering, Zhengzhou Information Science and Technology Institute)
  • Received: 2014.01.13
  • Reviewed: 2014.03.24
  • Published: 2014.06.25

Abstract

A dataset can be clustered by merging the bucket indices produced by the random projections of locality sensitive hashing functions. For this to work, the merging interval must be calculated first. To make large-scale data clustering in high-dimensional space more feasible, we propose an enhanced locality sensitive hashing clustering method. First, multiple hashing functions are generated. Second, data points are projected to bucket indices. Third, the bucket indices are clustered to obtain class labels. Experimental results on synthetic datasets show that the method achieves high accuracy at a much improved clustering speed, which makes it well suited to clustering data in high-dimensional space.
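The three steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the hash family or the merging rule, so the sketch assumes a common random-projection hash, h(x) = floor((a·x + b) / w) with Gaussian a, and a simple rule that merges two buckets when every coordinate of their indices differs by at most a hypothetical `merge_interval`. The names `make_hash_functions`, `bucket_index`, `cluster_by_buckets`, `width`, and `merge_interval` are all introduced here for illustration.

```python
import math
import random
from collections import defaultdict


def make_hash_functions(dim, num_funcs, width, rng):
    """Step 1: draw num_funcs random projections (a, b), a ~ N(0, 1)^dim, b ~ U[0, width]."""
    funcs = []
    for _ in range(num_funcs):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        b = rng.uniform(0.0, width)
        funcs.append((a, b))
    return funcs


def bucket_index(point, funcs, width):
    """Step 2: project a point to a tuple of integer bucket indices, one per hash function."""
    return tuple(
        math.floor((sum(ai * xi for ai, xi in zip(a, point)) + b) / width)
        for a, b in funcs
    )


def cluster_by_buckets(points, funcs, width, merge_interval=1):
    """Step 3: merge nearby bucket indices (assumed rule) and return one label per point."""
    buckets = defaultdict(list)
    for i, p in enumerate(points):
        buckets[bucket_index(p, funcs, width)].append(i)

    keys = list(buckets)
    parent = list(range(len(keys)))  # union-find over occupied buckets

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Pairwise comparison of occupied buckets; quadratic, kept simple for illustration.
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            if all(abs(u - v) <= merge_interval for u, v in zip(keys[i], keys[j])):
                parent[find(i)] = find(j)

    labels = [0] * len(points)
    root_to_label = {}
    for k, key in enumerate(keys):
        label = root_to_label.setdefault(find(k), len(root_to_label))
        for i in buckets[key]:
            labels[i] = label
    return labels


if __name__ == "__main__":
    rng = random.Random(0)
    # Two synthetic groups in 20 dimensions, centered far apart.
    pts = [[rng.gauss(0.0, 0.1) for _ in range(20)] for _ in range(5)]
    pts += [[rng.gauss(50.0, 0.1) for _ in range(20)] for _ in range(5)]
    funcs = make_hash_functions(dim=20, num_funcs=4, width=4.0, rng=rng)
    print(cluster_by_buckets(pts, funcs, width=4.0))
```

In this sketch the clustering work happens on the occupied buckets rather than on the raw points, which is where a speed advantage would come from when many points share a bucket. The choice of `width` and `merge_interval` plays the role of the merging interval mentioned above and would need tuning for real data.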
