• Title/Summary/Keyword: k-NN graph


An Improvement in K-NN Graph Construction using re-grouping with Locality Sensitive Hashing on MapReduce (MapReduce 환경에서 재그룹핑을 이용한 Locality Sensitive Hashing 기반의 K-Nearest Neighbor 그래프 생성 알고리즘의 개선)

  • Lee, Inhoe;Oh, Hyesung;Kim, Hyoung-Joo
    • KIISE Transactions on Computing Practices, v.21 no.11, pp.681-688, 2015
  • Constructing the k-nearest neighbor (k-NN) graph is an important operation with many web-related applications, including collaborative filtering, similarity search, and many others in data mining and machine learning. Despite its many elegant properties, the brute-force k-NN graph construction method has a computational complexity of $O(n^2)$, which is prohibitive for large-scale data sets. Thus, MapReduce, a (key, value)-based distributed framework, is increasingly used with Locality Sensitive Hashing (LSH), which is efficient for high-dimensional and sparse data. Following a two-stage strategy, we use LSH to divide users into small subsets and then calculate the similarity between pairs within each subset using a brute-force method on MapReduce. The candidate-group generation stage is critical because the brute-force calculation in the following step runs on its output. However, existing methods do not prevent large candidate groups. In this paper, we propose an efficient algorithm for approximate k-NN graph construction that regroups candidate groups. Experimental results show that our approach is more effective than existing methods in terms of graph accuracy and scan rate.
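The two-stage strategy in this abstract (hash into candidate groups, then brute force inside each group) can be illustrated with a minimal single-machine sketch. It assumes dense vectors, cosine similarity, and random-hyperplane LSH; the function names and the parameters `n_planes` and `n_tables` are hypothetical, and neither MapReduce nor the paper's regrouping step is reproduced here.

```python
import heapq
import math
import random
from collections import defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def lsh_buckets(vectors, n_planes, rng):
    """Stage 1: group ids by the sign pattern of random-hyperplane projections."""
    dim = len(next(iter(vectors.values())))
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
    buckets = defaultdict(list)
    for vid, vec in vectors.items():
        key = tuple(sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in planes)
        buckets[key].append(vid)
    return list(buckets.values())


def knn_graph(vectors, k, n_planes=2, n_tables=4, seed=0):
    """Stage 2: brute-force similarity inside each bucket, keep top-k per node."""
    rng = random.Random(seed)
    sims = defaultdict(dict)  # node -> {candidate: similarity}
    for _ in range(n_tables):
        for bucket in lsh_buckets(vectors, n_planes, rng):
            for a, b in combinations(bucket, 2):
                s = cosine(vectors[a], vectors[b])
                sims[a][b] = s
                sims[b][a] = s
    return {
        vid: heapq.nlargest(k, sims[vid].items(), key=lambda kv: kv[1])
        for vid in vectors
    }
```

With `n_planes=0` every vector falls into one bucket, so the sketch degenerates to the exact $O(n^2)$ brute-force method; larger values give smaller candidate groups at the cost of missed neighbors.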

A Generic Algorithm for k-Nearest Neighbor Graph Construction Based on Balanced Canopy Clustering (Balanced Canopy Clustering에 기반한 일반적 k-인접 이웃 그래프 생성 알고리즘)

  • Park, Youngki;Hwang, Heasoo;Lee, Sang-Goo
    • KIISE Transactions on Computing Practices, v.21 no.4, pp.327-332, 2015
  • Constructing a k-nearest neighbor (k-NN) graph is a primitive operation in recommender systems, information retrieval, data mining, and machine learning. Although many algorithms have been proposed for constructing a k-NN graph, existing approaches either cannot be used with various types of similarity measures or degrade in performance as the number of nodes or dimensions increases. In this paper, we present a novel algorithm for k-NN graph construction based on "balanced" canopy clustering. The experimental results show that, irrespective of the number of nodes or dimensions, our algorithm is at least five times faster than the brute-force approach while retaining an accuracy of approximately 92%.
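For readers unfamiliar with canopy clustering, the following is a minimal sketch of the standard (unbalanced) procedure with a loose threshold `t1` and a tight threshold `t2`. The function name and parameters are hypothetical, and the "balanced" variant proposed in the paper is not reproduced. Because only a distance function is needed, any measure can be plugged in.

```python
def canopy_clusters(points, t1, t2, distance):
    """Standard canopy clustering with loose threshold t1 >= tight threshold t2.

    Returns a list of canopies, each a list of point indices. Canopies may
    overlap: membership is measured against all points, while only points
    within t2 of a center are removed from the pool of future centers.
    """
    if t1 < t2:
        raise ValueError("t1 must be at least t2")
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining[0]
        # Every point within the loose threshold joins this canopy.
        canopy = [i for i in range(len(points))
                  if distance(points[center], points[i]) <= t1]
        canopies.append(canopy)
        # Points within the tight threshold can no longer become centers.
        remaining = [i for i in remaining
                     if distance(points[center], points[i]) > t2]
    return canopies
```

A k-NN graph could then be approximated by running brute-force search only inside each canopy, which is where balanced canopy sizes would matter.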

Robust Similarity Measure for Spectral Clustering Based on Shared Neighbors

  • Ye, Xiucai;Sakurai, Tetsuya
    • ETRI Journal, v.38 no.3, pp.540-550, 2016
  • Spectral clustering is a powerful tool for exploratory data analysis. Many existing spectral clustering algorithms measure similarity using a Gaussian kernel function or an undirected k-nearest neighbor (kNN) graph, which cannot reveal the real clusters when the data are not well separated. In this paper, to improve spectral clustering, we consider a robust similarity measure based on the shared nearest neighbors in a directed kNN graph. We propose two novel algorithms for spectral clustering: one based on the number of shared nearest neighbors, and one based on their closeness. The proposed algorithms are able to explore the underlying similarity relationships between data points and are robust to datasets that are not well separated. Moreover, the proposed algorithms have only one parameter, k. We evaluated the proposed algorithms using synthetic and real-world datasets. The experimental results demonstrate that the proposed algorithms not only achieve a good level of performance but also outperform traditional spectral clustering algorithms.
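The idea of counting shared nearest neighbors in a directed kNN graph can be sketched as follows. This is only an illustration of the count-based similarity under simple assumptions (a user-supplied distance function, ties broken by index); the function names are hypothetical and the paper's closeness-based variant and its exact weighting are not reproduced.

```python
def knn_lists(points, k, distance):
    """Directed kNN graph: neighbors[i] is the set of the k points nearest to i."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (distance(p, points[j]), j),  # break ties by index
        )
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """Number of nearest neighbors that points i and j have in common."""
    return len(neighbors[i] & neighbors[j])
```

Points in the same dense region share neighbors even when their direct distance is not small, while points from different clusters typically share none.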

Spectral Clustering with Sparse Graph Construction Based on Markov Random Walk

  • Cao, Jiangzhong;Chen, Pei;Ling, Bingo Wing-Kuen;Yang, Zhijing;Dai, Qingyun
    • KSII Transactions on Internet and Information Systems (TIIS), v.9 no.7, pp.2568-2584, 2015
  • Spectral clustering has become one of the most popular clustering approaches in recent years. The similarity graph constructed on the data is one of the key factors that influence the performance of spectral clustering. However, the similarity graphs constructed by existing methods usually contain some unreliable edges. To construct a reliable similarity graph for spectral clustering, an efficient method based on the Markov random walk (MRW) is proposed in this paper. In the proposed method, the MRW model is defined on the raw k-NN graph, and the neighbors of each sample are determined by the probability of the MRW. Since the high-order transition probabilities carry complex relationships among the data, the neighbors in the graph determined by our proposed method are more reliable than those of existing methods. Experiments are performed on synthetic and real-world datasets for performance evaluation and comparison. The results show that the graph obtained by our proposed method reflects the structure of the data better than those of state-of-the-art methods and can effectively improve the performance of spectral clustering.
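A Markov random walk on a graph is commonly defined by row-normalizing the adjacency weights, and the t-step transition probabilities are then the t-th power of that matrix. The sketch below shows this generic construction and a neighbor selection by highest t-step probability. The selection rule, function names, and parameters are assumptions for illustration, not the paper's exact method.

```python
def transition_matrix(adj):
    """Row-normalize a non-negative adjacency matrix into one-step probabilities."""
    matrix = []
    for row in adj:
        total = sum(row)
        matrix.append([w / total if total else 0.0 for w in row])
    return matrix


def matmul(a, b):
    """Plain matrix product of two square lists of lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def t_step_probabilities(adj, t):
    """Probability of reaching j from i in exactly t steps (t >= 1)."""
    one_step = transition_matrix(adj)
    result = one_step
    for _ in range(t - 1):
        result = matmul(result, one_step)
    return result


def mrw_neighbors(adj, t, k):
    """For each node, the up-to-k other nodes with the highest t-step probability."""
    probs = t_step_probabilities(adj, t)
    neighbors = []
    for i, row in enumerate(probs):
        reachable = [j for j in range(len(row)) if j != i and row[j] > 0]
        reachable.sort(key=lambda j: (-row[j], j))
        neighbors.append(reachable[:k])
    return neighbors
```

On the path graph 0-1-2, for example, node 0 reaches node 2 in two steps with probability 0.5, so a two-step walk links nodes that have no direct edge.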

Text Categorization Using TextRank Algorithm (TextRank 알고리즘을 이용한 문서 범주화)

  • Bae, Won-Sik;Cha, Jeong-Won
    • Journal of KIISE: Computing Practices and Letters, v.16 no.1, pp.110-114, 2010
  • We describe a new method for text categorization using the TextRank algorithm. Text categorization is the problem of assigning one or more predefined categories to a text document. TextRank is a graph-based ranking algorithm. If we treat each word as a vertex and the co-occurrence of two adjacent words as an edge, we can build a graph from a document. We then use TextRank to find the important words in the graph and construct features as pairs consisting of an important word and a word adjacent to it. We use four classifiers: SVM, a naïve Bayes classifier, a maximum entropy model, and a k-NN classifier. We use the non-cross-posted version of the 20 Newsgroups data set. As a result, performance improved for all classifiers, which indicates the potential of the TextRank algorithm for text categorization.
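The graph construction described in this abstract (words as vertices, adjacent-word co-occurrence as edges) followed by TextRank scoring can be sketched as below. It assumes an unweighted, undirected graph, whitespace tokenization done by the caller, and the commonly used damping factor of 0.85; the function names are hypothetical, and the feature-pair construction and classifiers are not shown.

```python
from collections import defaultdict


def cooccurrence_graph(tokens):
    """Undirected graph with an edge between every pair of adjacent distinct words."""
    graph = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Iterate S(v) = (1 - d) + d * sum(S(u) / degree(u)) over neighbors u of v."""
    scores = {v: 1.0 for v in graph}
    for _ in range(iterations):
        # The right-hand side reads the previous scores (synchronous update).
        scores = {
            v: (1 - damping)
            + damping * sum(scores[u] / len(graph[u]) for u in graph[v])
            for v in graph
        }
    return scores
```

For the token sequence `a b a c a d`, the word `a` is adjacent to every other word and therefore receives the highest score.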

Smooth Formation Navigation of Multiple Mobile Robots for Avoiding Moving Obstacles

  • Chen Xin;Li Yangmin
    • International Journal of Control, Automation, and Systems, v.4 no.4, pp.466-479, 2006
  • This paper addresses formation navigation for a group of mobile robots passing through an environment with either static or moving obstacles while keeping a fixed formation shape. Based on a Lyapunov function and graph theory, an NN formation control is proposed, which guarantees that the formation is maintained if the formation pattern is $C^k,\;k\geq1$. During navigation, the leader can generate a proper trajectory to lead the formation and avoid moving obstacles according to the obtained information. An evolutionary computational technique using particle swarm optimization (PSO) is proposed for motion planning so that the formation is kept as a $C^1$ function. The simulation results demonstrate that this algorithm is effective, and the experimental studies validate the formation ability of the multiple-mobile-robot system.

A Clinical Study of the Relation between the 7-Zone-Diagnostic System and Heart Rate Variability (7구역진단기와 심박변이도의 연관성에 대한 임상연구)

  • Song, Beom-Yong;Kwon, Kyong-Suk
    • Journal of Acupuncture Research, v.25 no.1, pp.15-23, 2008
  • Objectives: The aim of our study was to demonstrate the clinical application of a diagnosis relating the 7-zone-diagnostic system and heart rate variability. Materials and Methods: Subjects were divided into two groups according to the factor AA form of the 7-zone-diagnostic system (VEGA-DFM722, VEGA, Germany). Subjects in group A showed a factor-AA red bar graph in which zone 2 was higher than the normal range and zone 6 was lower than the normal range. Subjects in group B showed a factor-AA red bar graph in which zone 2 was lower than the normal range and zone 6 was higher than the normal range. We investigated how the indices of heart rate variability (HRV, LX-3202, LAXTHA, Korea) differed between the groups. We performed independent-sample t-tests and evaluated the HRV results at the 5% significance level using SPSS 10.0 for Windows. Results: The differences in the MeanRR, MeanHRV, SDNN, complexity, HRV-index, RMSSD, SDSD, and pNN50 values between the groups were not significant. The differences in the Ln(TP), Ln(VLF), Ln(HF), LF/(LF+HF), LF/HF, norm LF, and norm HF values between the groups were also not significant. Although the differences were not significant, the values of SDNN, complexity, RMSSD, SDSD, Ln(VLF), Ln(HF), and norm LF were generally higher for group B than for group A, and the values of pNN50 and norm HF were lower for group B than for group A. Conclusions: This study suggests that the differences in the HRV values between the groups were not significant, but that group B tends to be healthier than group A with respect to stress. Accordingly, further study is required.
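Several of the time-domain indices named in this abstract have standard definitions that are easy to compute from a series of RR intervals. The sketch below uses those textbook definitions with sample standard deviations. The function name and the dictionary layout are hypothetical, and the exact conventions used by the LX-3202 device are not known from the abstract.

```python
import math
import statistics


def hrv_time_domain(rr_ms):
    """Time-domain HRV indices from RR intervals in milliseconds.

    SDNN:  standard deviation of the intervals.
    RMSSD: root mean square of successive differences.
    SDSD:  standard deviation of successive differences.
    pNN50: percentage of successive differences larger than 50 ms.
    """
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return {
        "MeanRR": statistics.mean(rr_ms),
        "SDNN": statistics.stdev(rr_ms),
        "RMSSD": math.sqrt(statistics.mean([d * d for d in diffs])),
        "SDSD": statistics.stdev(diffs),
        "pNN50": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),
    }
```

For the intervals 800, 810, 790, 850, 845 ms, the successive differences are 10, -20, 60, -5 ms, so one of four exceeds 50 ms and pNN50 is 25%.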


A Study on the Correlation between the Patterns of the Zone 4 of Factor AA in 7-Zone-diagnostic System and Heart Rate Variability (7구역진단기의 Factor AA 제4구역 유형과 심박변이도(HRV)와의 상관성 연구)

  • Yu, Jung-Suk;Cho, Yi-Hyun;Lee, Jin-Seok;Lee, Hui-Yong;Song, Beom-Yong
    • Journal of Acupuncture Research, v.25 no.4, pp.71-80, 2008
  • Objectives: The 7-zone-diagnostic system is a diagnostic device that measures the energy of the body at predetermined bodily locations. This study investigated the relation between the different patterns of Zone 4 of Factor AA in the VEGA DFM 722 (VEGA, Germany) 7-zone-diagnostic system and heart rate variability. Methods: We formed three groups according to the Factor AA patterns of the VEGA DFM 722. In Group A, the red bar graph of zone 4 was higher than the normal range. In Group B, the red bar graph of zone 4 was within the normal range. In Group C, the red bar graph of zone 4 was lower than the normal range. We investigated how the indices of heart rate variability (HRV, LX-3202, LAXTHA, Korea) differed among the groups. Results: The complexity, HRV-index, RMSSD, and SDSD values of Group B were higher than those of the other groups. The pNN50 values of Group B were lower than those of the other groups. The Ln(TP), Ln(VLF), Ln(LF), and Ln(HF) values of Group B were higher than those of the other groups. Conclusions: We presume that Group B was healthier than the other groups with respect to stress.
