Search | Korea Science

Lee, Inhoe;Oh, Hyesung;Kim, Hyoung-Joo
- KIISE Transactions on Computing Practices
- /
- v.21 no.11
- /
- pp.681-688
- /
- 2015
The k nearest neighbor (k-NN) graph construction is an important operation with many web-related applications, including collaborative filtering, similarity search, and many others in data mining and machine learning. Despite its many elegant properties, the brute force k-NN graph construction method has a computational complexity of $O(n^2)$, which is prohibitive for large scale data sets. Thus, (Key, Value)-based distributed framework, MapReduce, is gaining increasingly widespread use in Locality Sensitive Hashing which is efficient for high-dimension and sparse data. Based on the two-stage strategy, we engage the locality sensitive hashing technique to divide users into small subsets, and then calculate similarity between pairs in the small subsets using a brute force method on MapReduce. Specifically, generating a candidate group stage is important since brute-force calculation is performed in the following step. However, existing methods do not prevent large candidate groups. In this paper, we proposed an efficient algorithm for approximate k-NN graph construction by regrouping candidate groups. Experimental results show that our approach is more effective than existing methods in terms of graph accuracy and scan rate.
https://doi.org/10.5626/KTCP.2015.21.11.681 인용 KSCI

Park, Youngki;Hwang, Heasoo;Lee, Sang-Goo
- KIISE Transactions on Computing Practices
- /
- v.21 no.4
- /
- pp.327-332
- /
- 2015
Constructing a k-nearest neighbor (k-NN) graph is a primitive operation in the field of recommender systems, information retrieval, data mining and machine learning. Although there have been many algorithms proposed for constructing a k-NN graph, either the existing approaches cannot be used for various types of similarity measures, or the performance of the approaches is decreased as the number of nodes or dimensions increases. In this paper, we present a novel algorithm for k-NN graph construction based on "balanced" canopy clustering. The experimental results show that irrespective of the number of nodes or dimensions, our algorithm is at least five times faster than the brute-force approach while retaining an accuracy of approximately 92%.
https://doi.org/10.5626/KTCP.2015.21.4.327 인용 KSCI

Cao, Jiangzhong;Chen, Pei;Ling, Bingo Wing-Kuen;Yang, Zhijing;Dai, Qingyun
- KSII Transactions on Internet and Information Systems (TIIS)
- /
- v.9 no.7
- /
- pp.2568-2584
- /
- 2015
Spectral clustering has become one of the most popular clustering approaches in recent years. Similarity graph constructed on the data is one of the key factors that influence the performance of spectral clustering. However, the similarity graphs constructed by existing methods usually contain some unreliable edges. To construct reliable similarity graph for spectral clustering, an efficient method based on Markov random walk (MRW) is proposed in this paper. In the proposed method, theMRW model is defined on the raw k-NN graph and the neighbors of each sample are determined by the probability of the MRW. Since the high order transition probabilities carry complex relationships among data, the neighbors in the graph determined by our proposed method are more reliable than those of the existing methods. Experiments are performed on the synthetic and real-world datasets for performance evaluation and comparison. The results show that the graph obtained by our proposed method reflects the structure of the data better than those of the state-of-the-art methods and can effectively improve the performance of spectral clustering.
https://doi.org/10.3837/tiis.2015.07.013 인용 PDF KSCI KPUBS HTML