• Title/Summary/Keyword: k-nearest neighbor graph

A Generic Algorithm for k-Nearest Neighbor Graph Construction Based on Balanced Canopy Clustering (Balanced Canopy Clustering에 기반한 일반적 k-인접 이웃 그래프 생성 알고리즘)

  • Park, Youngki;Hwang, Heasoo;Lee, Sang-Goo
    • KIISE Transactions on Computing Practices / v.21 no.4 / pp.327-332 / 2015
  • Constructing a k-nearest neighbor (k-NN) graph is a primitive operation in recommender systems, information retrieval, data mining, and machine learning. Many algorithms have been proposed for constructing a k-NN graph, but existing approaches either cannot be used with various types of similarity measures or slow down as the number of nodes or dimensions increases. In this paper, we present a novel algorithm for k-NN graph construction based on "balanced" canopy clustering. The experimental results show that, irrespective of the number of nodes or dimensions, our algorithm is at least five times faster than the brute-force approach while retaining an accuracy of approximately 92%.
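
For orientation, the brute-force baseline that the abstract compares against can be sketched as below. This is a minimal illustration, not the authors' code: the cosine similarity, the function names, and the reading of "accuracy" as the fraction of exact neighbors recovered are all assumptions made for the example.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def brute_force_knn_graph(points, k, sim=cosine):
    """Return {node: [k most similar other nodes]} by comparing all pairs."""
    graph = {}
    for i, p in enumerate(points):
        scored = ((sim(p, q), j) for j, q in enumerate(points) if j != i)
        # Highest similarity first; ties go to the smaller index.
        best = heapq.nsmallest(k, scored, key=lambda s: (-s[0], s[1]))
        graph[i] = [j for _, j in best]
    return graph


def graph_accuracy(approx, exact):
    """Assumed metric: share of exact neighbors found in the approximate graph."""
    hits = sum(len(set(approx[i]) & set(exact[i])) for i in exact)
    total = sum(len(nbrs) for nbrs in exact.values())
    return hits / total if total else 1.0


points = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
exact = brute_force_knn_graph(points, k=1)
```

Because every pair of nodes is compared, the cost grows quadratically with the number of nodes, which is the overhead an approximate method trades against accuracy.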

Robust Similarity Measure for Spectral Clustering Based on Shared Neighbors

  • Ye, Xiucai;Sakurai, Tetsuya
    • ETRI Journal / v.38 no.3 / pp.540-550 / 2016
  • Spectral clustering is a powerful tool for exploratory data analysis. Many existing spectral clustering algorithms measure similarity using a Gaussian kernel function or an undirected k-nearest neighbor (kNN) graph, which cannot reveal the real clusters when the data are not well separated. In this paper, to improve spectral clustering, we consider a robust similarity measure based on the shared nearest neighbors in a directed kNN graph. We propose two novel algorithms for spectral clustering: one based on the number of shared nearest neighbors, and one based on their closeness. The proposed algorithms are able to explore the underlying similarity relationships between data points and are robust to datasets that are not well separated. Moreover, the proposed algorithms have only one parameter, k. We evaluated the proposed algorithms using synthetic and real-world datasets. The experimental results demonstrate that the proposed algorithms not only achieve a good level of performance but also outperform traditional spectral clustering algorithms.
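
The idea of counting shared nearest neighbors in a directed kNN graph can be illustrated with the sketch below. It is only an assumed, simplified reading of the count-based variant (Euclidean distance, raw intersection size as the similarity); the abstract does not give the authors' exact formula, and the function names are hypothetical.

```python
import math


def directed_knn(points, k):
    """Directed kNN sets: each point's k closest other points (Euclidean)."""
    nbrs = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        nbrs.append(set(order[:k]))
    return nbrs


def shared_neighbor_similarity(points, k):
    """Symmetric matrix whose (i, j) entry counts neighbors shared by i and j."""
    nbrs = directed_knn(points, k)
    n = len(points)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = len(nbrs[i] & nbrs[j])
    return sim


# Two well-separated triples of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
S = shared_neighbor_similarity(points, k=2)
```

In this toy input, points from different triples share no neighbors, so their similarity is zero, while points within a triple share one. A matrix like `S` could then stand in for the affinity matrix of a spectral clustering routine.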

An Improvement in K-NN Graph Construction using re-grouping with Locality Sensitive Hashing on MapReduce (MapReduce 환경에서 재그룹핑을 이용한 Locality Sensitive Hashing 기반의 K-Nearest Neighbor 그래프 생성 알고리즘의 개선)

  • Lee, Inhoe;Oh, Hyesung;Kim, Hyoung-Joo
    • KIISE Transactions on Computing Practices / v.21 no.11 / pp.681-688 / 2015
  • Constructing the k-nearest neighbor (k-NN) graph is an important operation with many web-related applications, including collaborative filtering, similarity search, and many others in data mining and machine learning. Despite its many elegant properties, the brute-force k-NN graph construction method has a computational complexity of $O(n^2)$, which is prohibitive for large-scale data sets. Thus, MapReduce, a (key, value)-based distributed framework, is increasingly used together with Locality Sensitive Hashing, which is efficient for high-dimensional and sparse data. Following a two-stage strategy, we use locality sensitive hashing to divide users into small subsets and then calculate the similarity between pairs within each subset using a brute-force method on MapReduce. The candidate-group generation stage is important because the brute-force calculation is performed in the following step. However, existing methods do not prevent large candidate groups. In this paper, we propose an efficient algorithm for approximate k-NN graph construction that regroups candidate groups. Experimental results show that our approach is more effective than existing methods in terms of graph accuracy and scan rate.
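
The two-stage strategy described above can be sketched on a single machine as follows. This is an illustrative assumption, not the paper's MapReduce implementation: it uses random-hyperplane signatures as the hash family, omits the regrouping step the authors propose, and defines scan rate as compared pairs over all pairs. All function names are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def signature(vec, planes):
    """Sign pattern of vec against each random hyperplane (one bit per plane)."""
    return tuple(sum(a * b for a, b in zip(vec, p)) >= 0 for p in planes)


def lsh_candidate_groups(vectors, num_planes, seed=0):
    """Stage 1: group vector ids whose signatures match exactly."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_planes)]
    groups = defaultdict(list)
    for idx, vec in enumerate(vectors):
        groups[signature(vec, planes)].append(idx)
    return list(groups.values())


def candidate_pairs(groups):
    """Stage 2 input: every pair inside a group is compared by brute force."""
    return {pair for g in groups for pair in combinations(sorted(g), 2)}


def scan_rate(groups, n):
    """Assumed definition: compared pairs divided by all n*(n-1)/2 pairs."""
    return len(candidate_pairs(groups)) / (n * (n - 1) / 2)


vectors = [(1.0, 0.0), (1.0, 0.0), (-1.0, 0.0)]
groups = lsh_candidate_groups(vectors, num_planes=4)
```

Since the number of pairs in a group grows quadratically with its size, one oversized group can dominate the second stage, which is the situation the abstract says existing methods do not prevent.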

A Nearest Neighbor Query Processing Algorithm Supporting K-anonymity Based on Weighted Adjacency Graph in LBS (위치 기반 서비스에서 K-anonymity를 보장하는 가중치 근접성 그래프 기반 최근접 질의처리 알고리즘)

  • Jang, Mi-Young;Chang, Jae-Woo
    • Spatial Information Research / v.20 no.4 / pp.83-92 / 2012
  • Location-based services (LBS) are increasingly popular due to improvements in geo-positioning capabilities and wireless communication technology. However, to use LBS, a user issuing a query must send his/her exact location to the LBS provider. Preserving the user's privacy while providing LBS is therefore a key challenge. To solve this problem, the existing method employs the 2PASS cloaking framework, which not only hides the actual user location but also reduces bandwidth consumption. However, 2PASS does not fully guarantee user privacy because it does not take the real user distribution into account. Hence, in this paper, we propose a nearest neighbor query processing algorithm that supports the K-anonymity property based on a weighted adjacency graph (WAG). Our algorithm not only protects the user's location by guaranteeing K-anonymity in a query region, but also improves bandwidth usage by reducing unnecessary searches for a query result. Experimental results show that our algorithm outperforms the existing one in terms of query processing time and bandwidth usage.

Spectral clustering based on the local similarity measure of shared neighbors

  • Cao, Zongqi;Chen, Hongjia;Wang, Xiang
    • ETRI Journal / v.44 no.5 / pp.769-779 / 2022
  • Spectral clustering has become a typical and efficient clustering method used in a variety of applications. The critical step of spectral clustering is the similarity measurement, which largely determines the performance of the spectral clustering method. In this paper, we propose a novel spectral clustering algorithm based on the local similarity measure of shared neighbors. This similarity measurement exploits the local density information between data points based on the weight of the shared neighbors in a directed k-nearest neighbor graph with only one parameter k, that is, the number of nearest neighbors. Numerical experiments on synthetic and real-world datasets demonstrate that our proposed algorithm outperforms other existing spectral clustering algorithms in terms of the clustering performance measured via the normalized mutual information, clustering accuracy, and F-measure. As an example, the proposed method can provide an improvement of 15.82% in the clustering performance for the Soybean dataset.

A Parallel Algorithm for Constructing the Delaunay Triangulation in the $L_\infty(L_1)$ Metric ($L_\infty(L_1)$ 디루니 삼각분할의 병렬처리 알고리즘)

  • Wi, Yeong-Cheol
    • Journal of KIISE:Computer Systems and Theory / v.28 no.3 / pp.155-160 / 2001
  • This paper presents a method for constructing the Delaunay triangulation of n points in the plane under the $L_\infty$ ($L_1$) metric using the geographic nearest neighbor graph and a range tree. The method exploits the fact that, in the $L_\infty$ ($L_1$) metric, at least one edge of every triangle in the Delaunay triangulation belongs to the geographic nearest neighbor graph, and builds the triangulation with the range tree method. It constructs the $L_\infty$ ($L_1$) Delaunay triangulation in $O(n \log n)$ sequential time, and in $O(\log n)$ parallel time with $O(n)$ processors on a CREW-PRAM (Concurrent Read Exclusive Write Parallel Random Access Machine). In addition, because the method compares distances instead of computing intersections between lines, it incurs little numerical error and is easy to implement.

$L_1$ Shortest Paths with Isothetic Roads (축에 평행한 도로들이 놓여 있을 때의 $L_1$ 최단 경로)

  • Bae Sang Won;Kim Jae-Hoon;Chwa Kyung-Yong
    • Proceedings of the Korean Information Science Society Conference / 2005.11a / pp.976-978 / 2005
  • We present a nearly optimal ($O(\nu \min(\nu, n)\, n \log n)$ time and $O(n)$ space) algorithm that constructs a shortest path map with n isothetic roads of speed $\nu$ under the $L_1$ metric. The algorithm uses the continuous Dijkstra method, and its efficiency rests on a new geometric insight: the minimum in-degree of any nearest neighbor graph for points with roads of speed $\nu$ is $\Theta(\nu \min(\nu, n))$, which is first shown in this paper. The algorithm also extends naturally to the multi-source case, so that the Voronoi diagram for m sites can be computed in $O(\nu \min(\nu, n)(n+m)\log(n+m))$ time and $O(n+m)$ space, which is also nearly optimal.

Intensity and Ambient Enhanced Lidar-Inertial SLAM for Unstructured Construction Environment (비정형의 건설환경 매핑을 위한 레이저 반사광 강도와 주변광을 활용한 향상된 라이다-관성 슬램)

  • Jung, Minwoo;Jung, Sangwoo;Jang, Hyesu;Kim, Ayoung
    • The Journal of Korea Robotics Society / v.16 no.3 / pp.179-188 / 2021
  • Construction monitoring is one of the key modules in smart construction. Unlike a structured urban environment, a construction site is challenging to map because of its unstructured characteristics. For example, irregular feature points and irregular matching hinder the creation of a map for management. To tackle this issue, we propose a system for data acquisition in unstructured environments and a framework, Intensity and Ambient Enhanced Lidar Inertial Odometry via Smoothing and Mapping (IA-LIO-SAM), that achieves highly accurate robot trajectories and mapping. Like Tightly-coupled Lidar Inertial Odometry via Smoothing and Mapping (LIO-SAM), IA-LIO-SAM uses a factor graph. Extending the existing LIO-SAM, IA-LIO-SAM leverages each point's intensity and ambient values to remove unnecessary feature points. These additional values also serve as a new factor in the K-Nearest Neighbor (KNN) algorithm, allowing accurate comparisons between stored points and scanned points. The performance was verified in three different environments and compared with that of LIO-SAM.

A Parameter-Free Approach for Clustering and Outlier Detection in Image Databases (이미지 데이터베이스에서 매개변수를 필요로 하지 않는 클러스터링 및 아웃라이어 검출 방법)

  • Oh, Hyun-Kyo;Yoon, Seok-Ho;Kim, Sang-Wook
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.1 / pp.80-91 / 2010
  • As the volume of image data increases dramatically, good organization of image data is crucial for efficient image retrieval. Clustering is a typical way of organizing image data. However, traditional clustering methods require the user to provide the number of clusters as a parameter before clustering. In this paper, we discuss an approach for clustering image data that does not require this parameter. The proposed approach is based on Cross-Association, which finds structure or patterns hidden in data using the relationships between individual objects. To apply Cross-Association to the clustering of image data, we first convert the image data into a graph. We then perform Cross-Association on the resulting graph and interpret the results from a clustering perspective. We also propose a hierarchical clustering method and an outlier detection method based on Cross-Association. Through a series of experiments, we verify the effectiveness of the proposed approach. Finally, we discuss how to find a good value of k for the k-nearest neighbor search and compare the clustering results obtained with symmetric and asymmetric ways of building the graph.

Incorporating Social Relationship discovered from User's Behavior into Collaborative Filtering (사용자 행동 기반의 사회적 관계를 결합한 사용자 협업적 여과 방법)

  • Thay, Setha;Ha, Inay;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.1-20 / 2013
  • Social networks have become a huge communication platform that lets people connect with one another and brings users together to share common interests, experiences, and daily activities. Users spend hours per day maintaining personal information and interacting with other people through posts, comments, messages, games, social events, and applications. Given the growth of users' distributed information in social networks, there is great potential to use social data to enhance the quality of recommender systems. Some research on social network analysis investigates how social networks can be used in the recommendation domain. Among these studies, we are interested in taking advantage of the interaction between a user and others in a social network, which can be identified and is known as a social relationship. Furthermore, users' decisions before purchasing a product mostly depend on suggestions from people who have either the same preferences or a closer relationship. For this reason, we believe that users' relationships in a social network can provide an effective way to improve the quality of a recommender system's predictions of user interests. Social relationships between users discovered from a social network are therefore a common factor for improving how the conventional approach predicts user preferences. Recommender systems are rapidly increasing in popularity and are currently used by many e-commerce sites such as Amazon.com, Last.fm, and eBay.com. The collaborative filtering (CF) method is one of the essential and powerful techniques in recommender systems for suggesting appropriate items to a user by learning the user's preferences. The CF method focuses on user data and automatically predicts a user's interests by gathering information from users who share a similar background and preferences.
Specifically, the intention of the CF method is to find users who have similar preferences and to suggest to the target user the items most preferred by those nearest-neighbor users. The CF method considers two basic units: the user and the item. Each user provides rating values for items (e.g., movies, products, books) to indicate his or her interest in them. CF uses the user-rating matrix to find a group of users whose ratings are similar to those of the target user, and then predicts the unknown rating values for items that the target user has not rated. CF has been successfully implemented in both information filtering and e-commerce applications. However, important challenges remain, such as cold start, data sparsity, and scalability, which affect the quality and accuracy of prediction. To overcome these challenges, many researchers have proposed various kinds of CF methods, such as hybrid CF, trust-based CF, and social network-based CF. To improve the recommendation performance and prediction accuracy of standard CF, in this paper we propose a method that integrates the traditional CF technique with social relationships between users discovered from their behavior in a social network, namely Facebook. We identify a user's relationships from behavior such as posts and comments exchanged with friends on Facebook. We believe that social relationships implicitly inferred from user behavior can compensate for the limitations of the conventional approach. Therefore, we extract each user's posts and comments using the Facebook Graph API and calculate a feature score for each term to obtain a feature vector for computing user similarity. We then combine the result with the similarity value computed using the traditional CF technique. Finally, our system provides a list of recommended items based on the neighbor users who have the largest total similarity to the target user.
To verify and evaluate the proposed method, we performed an experiment on data collected from our Movies Rating System. Prediction accuracy is evaluated in terms of MAE to show how correct the recommendations produced by our algorithm are. Performance is then evaluated in terms of precision, recall, and F1-measure to show the effectiveness of our method. Coverage is also evaluated to assess the ability to generate recommendations. The experimental results show that the proposed method performs better and is more accurate in suggesting items to users. In particular, using users' behavior in the social network improves recommendation accuracy by up to 6%. Moreover, the recommendation performance experiment shows that incorporating social relationships observed from user behavior into CF is beneficial, improving performance by 7% compared with the benchmark methods. Finally, we confirm that interaction between users in a social network can enhance accuracy and yield better recommendations than the conventional approach.
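
The combination of a CF similarity with a behavior-based social similarity, followed by a neighbor-based prediction and an MAE check, might look like the sketch below. The abstract does not say how the two similarities are combined, so the linear blend and its weight `alpha` are assumptions, as are the function names and the toy ratings.

```python
def combined_similarity(cf_sim, social_sim, alpha=0.5):
    """Blend two similarity scores; alpha is a hypothetical weight."""
    return alpha * cf_sim + (1 - alpha) * social_sim


def predict_rating(target, item, ratings, sims, k=2):
    """Similarity-weighted average over the k most similar raters of item."""
    raters = [u for u in ratings if u != target and item in ratings[u]]
    top = sorted(raters, key=lambda u: sims[(target, u)], reverse=True)[:k]
    weight = sum(sims[(target, u)] for u in top)
    if weight == 0:
        return None  # no usable neighbors for this item
    return sum(sims[(target, u)] * ratings[u][item] for u in top) / weight


def mean_absolute_error(predicted, actual):
    """MAE over paired predictions and true ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Toy data: ann has not rated m2; bob and cy have.
ratings = {"ann": {"m1": 5}, "bob": {"m1": 4, "m2": 2}, "cy": {"m1": 1, "m2": 5}}
sims = {
    ("ann", "bob"): combined_similarity(0.8, 0.6),
    ("ann", "cy"): combined_similarity(0.2, 0.4),
}
pred = predict_rating("ann", "m2", ratings, sims)
```

With these toy numbers the prediction for ann on m2 is pulled toward bob's rating, because bob has the larger combined similarity.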