A Generic Algorithm for k-Nearest Neighbor Graph Construction Based on Balanced Canopy Clustering

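  • 박영기 (Department of Computer Science and Engineering, Seoul National University)
  • 황혜수 (School of Computer Science, University of Seoul)
  • 이상구 (Department of Computer Science and Engineering, Seoul National University)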
  • Received : 2014.09.11
  • Accepted : 2015.02.10
  • Published : 2015.04.15

Abstract

Constructing a k-nearest neighbor (k-NN) graph is a primitive operation in recommender systems, information retrieval, data mining, and machine learning. Many algorithms have been proposed for constructing a k-NN graph, but existing approaches either cannot be used with various types of similarity measures or suffer degraded performance as the number of nodes or dimensions increases. In this paper, we present a novel algorithm for k-NN graph construction based on "balanced" canopy clustering. The experimental results show that, irrespective of the number of nodes or dimensions, our algorithm is at least five times faster than the brute-force approach while retaining an accuracy of approximately 92%.
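For orientation, the brute-force baseline named in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the cosine similarity, and the accuracy definition (fraction of exact neighbors recovered) are assumptions made here, since the abstract does not specify them.

```python
from heapq import nlargest
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Similarity = Callable[[Vector, Vector], float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; any other similarity function could be passed instead."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def brute_force_knn_graph(
    points: List[Vector], k: int, sim: Similarity
) -> Dict[int, List[int]]:
    """Compare every node with every other node: O(n^2) similarity evaluations."""
    graph: Dict[int, List[int]] = {}
    for i, p in enumerate(points):
        scored = [(sim(p, q), j) for j, q in enumerate(points) if j != i]
        # Keep the k most similar nodes, most similar first.
        graph[i] = [j for _, j in nlargest(k, scored)]
    return graph


def accuracy(approx: Dict[int, List[int]], exact: Dict[int, List[int]]) -> float:
    """Assumed definition: share of exact neighbors that the approximate graph found."""
    hits = sum(len(set(approx.get(i, [])) & set(exact[i])) for i in exact)
    total = sum(len(neighbors) for neighbors in exact.values())
    return hits / total if total else 1.0
```

Because the similarity function is passed in as an argument, the same baseline works for any measure, which is the sense of "generic" used in the title.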

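A k-nearest neighbor graph is a data structure that holds the k-NN information for every node, and many information retrieval and recommender systems rely on it. Various methods for constructing k-nearest neighbor graphs have been proposed, but none satisfies both of the following conditions at once: (1) it does not assume a specific similarity measure; (2) its accuracy does not decrease as the number of nodes or dimensions grows. In this paper, we propose a k-NN graph construction algorithm that satisfies both conditions by using balanced canopy clustering. In our experiments, regardless of the number of nodes and dimensions, the algorithm was at least five times faster than the baseline algorithm while maintaining an accuracy of about 92%. The algorithm can be used effectively when a new similarity measure is adopted or when high accuracy must be guaranteed.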
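To give a sense of how canopy clustering can cut down the number of comparisons, the sketch below builds a k-NN graph from classic canopy clustering in the style of McCallum et al. It is not the balanced variant proposed in this paper, whose details the abstract does not give. The function names, the `loose` and `tight` similarity thresholds, and the deterministic choice of canopy centers are all assumptions for illustration. Classic canopy clustering also uses a cheap approximate measure to form canopies, whereas this sketch reuses the same similarity function for simplicity.

```python
from heapq import nlargest
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]
Similarity = Callable[[Vector, Vector], float]


def cosine(a: Vector, b: Vector) -> float:
    """Example similarity; the construction works with any similarity function."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def canopies(
    points: List[Vector], sim: Similarity, loose: float, tight: float
) -> List[List[int]]:
    """Classic canopy clustering expressed with similarities (loose <= tight)."""
    if loose > tight:
        raise ValueError("loose threshold must not exceed tight threshold")
    remaining = list(range(len(points)))
    result: List[List[int]] = []
    while remaining:
        center = remaining[0]  # assumption: take the first remaining node as center
        sims = [sim(points[center], q) for q in points]
        # Every node at least `loose`-similar to the center joins this canopy.
        result.append(
            [j for j in range(len(points)) if j == center or sims[j] >= loose]
        )
        # Nodes at least `tight`-similar can no longer become centers.
        remaining = [j for j in remaining if j != center and sims[j] < tight]
    return result


def canopy_knn_graph(
    points: List[Vector], k: int, sim: Similarity, loose: float, tight: float
) -> Dict[int, List[int]]:
    """Approximate k-NN graph: compare only nodes that share a canopy."""
    candidates: Dict[int, Set[int]] = {i: set() for i in range(len(points))}
    for canopy in canopies(points, sim, loose, tight):
        for i in canopy:
            candidates[i].update(j for j in canopy if j != i)
    return {
        i: [j for _, j in nlargest(k, ((sim(points[i], points[j]), j) for j in c))]
        for i, c in candidates.items()
    }
```

Canopies produced this way can differ greatly in size, and a single large canopy brings the cost back toward the brute-force case; limiting that imbalance is presumably the motivation for the "balanced" variant.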

Acknowledgement

Supported by: National Research Foundation of Korea
