• Title/Summary/Keyword: K-nearest neighbor approach


Optimization of Case-based Reasoning Systems using Genetic Algorithms: Application to Korean Stock Market (유전자 알고리즘을 이용한 사례기반추론 시스템의 최적화: 주식시장에의 응용)

  • Kim, Kyoung-Jae;Ahn, Hyun-Chul;Han, In-Goo
    • Asia Pacific Journal of Information Systems / v.16 no.1 / pp.71-84 / 2006
  • Case-based reasoning (CBR) is a reasoning technique that reuses past cases to solve new problems. It often shows significant promise for improving the effectiveness of complex and unstructured decision making, and for this reason it has been applied to various problem-solving areas including manufacturing, finance, and marketing. However, designing appropriate case indexing and retrieval mechanisms to improve the performance of CBR remains a challenging issue. Most previous studies on CBR have focused on the similarity function or on optimizing case features and their weights. According to some prior research, however, finding the optimal k parameter for k-nearest neighbor (k-NN) retrieval is also crucial for improving the performance of a CBR system. Despite this, there have been few attempts to optimize the number of neighbors, especially using artificial intelligence (AI) techniques. In this study, we introduce a genetic algorithm (GA) to optimize the number of neighbors to combine, and we apply this approach to the Korean stock market. Experimental results show that the GA-optimized k-NN approach outperforms other AI techniques for stock market prediction.
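
The idea of letting a genetic algorithm pick the number of neighbors can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy data, the leave-one-out accuracy used as fitness, the midpoint crossover, and the function names `knn_predict`, `loo_accuracy`, and `ga_select_k`. The abstract does not describe the paper's actual encoding, operators, or fitness function.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(
        train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query))
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy of k-NN on `data` (used as the GA fitness here)."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, k) == y
    return hits / len(data)


def ga_select_k(data, k_max, pop_size=8, generations=10, seed=0):
    """Evolve a population of candidate k values and return the fittest one."""
    rng = random.Random(seed)
    fitness = {}  # cache so each distinct k is evaluated only once

    def score(k):
        if k not in fitness:
            fitness[k] = loo_accuracy(data, k)
        return fitness[k]

    population = [rng.randint(1, k_max) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better half unchanged (elitism).
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) // 2                    # crossover: midpoint of two parents
            if rng.random() < 0.3:                  # mutation: small random step
                child += rng.choice([-2, -1, 1, 2])
            children.append(min(max(child, 1), k_max))  # clamp to the valid range
        population = parents + children
    return max(population, key=score)


# Hypothetical toy data: two well-separated classes.
data = [((0.0, 0.1), "down"), ((0.2, 0.0), "down"), ((0.1, 0.3), "down"), ((0.3, 0.2), "down"),
        ((1.0, 1.1), "up"), ((1.2, 0.9), "up"), ((0.9, 1.3), "up"), ((1.1, 1.0), "up")]
best_k = ga_select_k(data, k_max=7)
```

For a one-dimensional search space like this, exhaustive search over k would be just as easy; a GA becomes more useful when k is optimized jointly with other parameters such as feature weights.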

A Study of using Emotional Features for Information Retrieval Systems (감정요소를 사용한 정보검색에 관한 연구)

  • Kim, Myung-Gwan;Park, Young-Tack
    • The KIPS Transactions: Part B / v.10B no.6 / pp.579-586 / 2003
  • In this paper, we propose a novel approach that employs emotional features in document retrieval systems. Fine-grained emotional features, such as HAPPY, SAD, ANGRY, FEAR, and DISGUST, are used to represent Korean documents, and users can use these features to retrieve documents. The retrieved documents are then learned by classification methods such as cohesion factor, naive Bayesian, and k-nearest neighbor approaches, which are combined using a voting method. In addition, k-means clustering is used in our experiments. Our approach proved more accurate than other methods, and it performed better on short texts than on long documents.

Improving Web Service Recommendation using Clustering with K-NN and SVD Algorithms

  • Weerasinghe, Amith M.;Rupasingha, Rupasingha A.H.M.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.5 / pp.1708-1727 / 2021
  • With the advent of the twenty-first century, people began to interact closely with technology. As technology continues to develop, the World Wide Web (WWW) has taken a very important place on the Internet, and Web services fulfil a significant role. Because many Web services are available on the Internet, it is difficult to find matching ones among them. Recommendation systems can help solve this problem. In this paper, we observe that recommendation methods such as the collaborative filtering (CF) technique suffer from the data sparsity and cold-start problems. To overcome these problems, we first apply ontology-based clustering and then the k-nearest neighbor (KNN) algorithm within each cluster, which effectively increases the data density using past user interests. User ratings are then predicted with a model-based approach, singular value decomposition (SVD), and the predictions are used for recommendation. The evaluation results show that the proposed approach achieves a lower prediction error rate and higher accuracy than the existing recommendation methods analyzed.

A study on neighbor selection methods in k-NN collaborative filtering recommender system (근접 이웃 선정 협력적 필터링 추천시스템에서 이웃 선정 방법에 관한 연구)

  • Lee, Seok-Jun
    • Journal of the Korean Data and Information Science Society / v.20 no.5 / pp.809-818 / 2009
  • The collaborative filtering approach predicts an active user's preference for specific items traded in e-commerce by using other users' preference information. To improve prediction accuracy, collaborative filtering needs enough user preference information. However, too much preference information can harm prediction accuracy, and too little can harm it as well. This research proposes a method for deciding a suitable number of neighbor users when applying the collaborative filtering algorithm, improving on existing k-nearest-neighbor selection methods. The results provide useful methods for improving prediction accuracy and also refine the exploratory data analysis approach for deciding the appropriate number of nearest neighbors.
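
How the number of neighbors changes a collaborative filtering prediction can be seen in a small Python sketch. It assumes user-based filtering with Pearson similarity and a mean-centered weighted average, which is one common formulation and not necessarily the one used in the paper; the ratings and the names `pearson` and `predict` are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = [i for i in u if i in v]
    if len(common) < 2:
        return 0.0
    mu_u = sum(u[i] for i in common) / len(common)
    mu_v = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu_u) * (v[i] - mu_v) for i in common)
    den = math.sqrt(sum((u[i] - mu_u) ** 2 for i in common)) * math.sqrt(
        sum((v[i] - mu_v) ** 2 for i in common))
    return num / den if den else 0.0


def predict(ratings, user, item, k):
    """Predict `user`'s rating of `item` from the k most similar users who rated it."""
    mean_u = sum(ratings[user].values()) / len(ratings[user])
    candidates = [(pearson(ratings[user], r), r)
                  for name, r in ratings.items() if name != user and item in r]
    # Keep only positively correlated users, strongest first, then truncate to k.
    neighbors = sorted((c for c in candidates if c[0] > 0),
                       key=lambda c: c[0], reverse=True)[:k]
    if not neighbors:
        return mean_u  # fall back to the user's own mean rating
    # Weighted average of each neighbor's deviation from their own mean rating.
    num = sum(s * (r[item] - sum(r.values()) / len(r)) for s, r in neighbors)
    den = sum(s for s, _ in neighbors)
    return mean_u + num / den


# Hypothetical ratings on a 1-5 scale; "alice" has not rated item "d".
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 5, "b": 3, "c": 4, "d": 5},
    "carol": {"a": 1, "b": 5, "c": 2, "d": 1},
    "dave":  {"a": 5, "b": 4, "c": 4, "d": 2},
}
pred_k1 = predict(ratings, "alice", "d", k=1)
pred_k2 = predict(ratings, "alice", "d", k=2)
```

With k=1 only the most similar user contributes, while k=2 also admits a less similar user whose opinion pulls the prediction down, which is the kind of trade-off the choice of neighborhood size controls.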


Optimal k-Nearest Neighborhood Classifier Using Genetic Algorithm (유전알고리즘을 이용한 최적 k-최근접이웃 분류기)

  • Park, Chong-Sun;Huh, Kyun
    • Communications for Statistical Applications and Methods / v.17 no.1 / pp.17-27 / 2010
  • Feature selection and feature weighting are useful techniques for improving the classification accuracy of the k-nearest neighbor (k-NN) classifier. Their main purpose is to reduce the number of features by eliminating irrelevant and redundant ones while maintaining or enhancing classification accuracy. In this paper, a novel hybrid approach based on a genetic algorithm is proposed for simultaneous feature selection, feature weighting, and choice of k in the k-NN classifier. The results indicate that the proposed algorithm is comparable or superior to existing classifiers with or without feature selection and feature weighting capability.

A Generic Algorithm for k-Nearest Neighbor Graph Construction Based on Balanced Canopy Clustering (Balanced Canopy Clustering에 기반한 일반적 k-인접 이웃 그래프 생성 알고리즘)

  • Park, Youngki;Hwang, Heasoo;Lee, Sang-Goo
    • KIISE Transactions on Computing Practices / v.21 no.4 / pp.327-332 / 2015
  • Constructing a k-nearest neighbor (k-NN) graph is a primitive operation in the fields of recommender systems, information retrieval, data mining, and machine learning. Although many algorithms have been proposed for constructing a k-NN graph, existing approaches either cannot be used with various types of similarity measures or degrade in performance as the number of nodes or dimensions increases. In this paper, we present a novel algorithm for k-NN graph construction based on "balanced" canopy clustering. The experimental results show that, irrespective of the number of nodes or dimensions, our algorithm is at least five times faster than the brute-force approach while retaining an accuracy of approximately 92%.
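
For reference, the brute-force baseline mentioned above can be sketched in a few lines of Python. This is not the canopy-clustering algorithm from the paper, whose details the abstract does not give. The names `knn_graph`, `neg_sq_euclidean`, and `recall` are hypothetical, and `recall` is only one plausible way to score an approximate graph against the exact one, since the abstract does not define its accuracy measure.

```python
def knn_graph(points, k, similarity):
    """Brute-force k-NN graph: for each node, the k most similar other nodes."""
    graph = {}
    for i, p in enumerate(points):
        scored = [(similarity(p, q), j) for j, q in enumerate(points) if j != i]
        scored.sort(key=lambda t: (-t[0], t[1]))  # most similar first; ties by index
        graph[i] = [j for _, j in scored[:k]]
    return graph


def neg_sq_euclidean(p, q):
    """Example similarity: larger (less negative) means closer."""
    return -sum((a - b) ** 2 for a, b in zip(p, q))


def recall(approx, exact):
    """Fraction of exact neighbors that an approximate graph also contains."""
    hits = sum(len(set(approx[i]) & set(exact[i])) for i in exact)
    total = sum(len(nbrs) for nbrs in exact.values())
    return hits / total


points = [(0, 0), (0, 1), (5, 5), (5, 6), (1, 0)]
graph = knn_graph(points, k=2, similarity=neg_sq_euclidean)
```

Because `similarity` is passed in as a function, the same routine works for any similarity measure, at the cost of comparing every pair of nodes.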

k-NN Join Based on LSH in Big Data Environment

  • Ji, Jiaqi;Chung, Yeongjee
    • Journal of Information and Communication Convergence Engineering / v.16 no.2 / pp.99-105 / 2018
  • k-Nearest neighbor join (k-NN Join) is a computationally intensive algorithm that is designed to find k-nearest neighbors from a dataset S for every object in another dataset R. Most related studies on k-NN Join are based on single-computer operations. As the data dimensions and data volume increase, running the k-NN Join algorithm on a single computer cannot generate results quickly. To solve this scalability problem, we introduce the locality-sensitive hashing (LSH) k-NN Join algorithm implemented in Spark, an approach for high-dimensional big data. LSH is used to map similar data onto the same bucket, which can reduce the data search scope. In order to achieve parallel implementation of the algorithm on multiple computers, the Spark framework is used to accelerate the computation of distances between objects in a cluster. Results show that our proposed approach is fast and accurate for high-dimensional and big data.
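
The bucketing step can be sketched on a single machine in plain Python. The sketch assumes random-hyperplane (sign) hashing, which is only one LSH family; the abstract does not say which family the paper uses, and the Spark parallelization is not shown. The names `make_signature` and `lsh_knn_join` are hypothetical.

```python
import random
from collections import defaultdict


def make_signature(dim, n_bits, seed=0):
    """Return a hash function built from n_bits random hyperplanes through the origin."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

    def signature(x):
        # One bit per hyperplane: which side of the plane the point lies on.
        return tuple(int(sum(w * v for w, v in zip(plane, x)) >= 0) for plane in planes)

    return signature


def lsh_knn_join(R, S, k, n_bits=4, seed=0):
    """Approximate k-NN join: for each r in R, search only S points in r's bucket."""
    signature = make_signature(len(S[0]), n_bits, seed)
    buckets = defaultdict(list)
    for s in S:
        buckets[signature(s)].append(s)
    result = {}
    for r in R:
        candidates = buckets.get(signature(r), [])
        # Exact distance ranking, restricted to the candidates in the same bucket.
        result[r] = sorted(
            candidates, key=lambda s: sum((a - b) ** 2 for a, b in zip(r, s))
        )[:k]
    return result


R = [(1.0, 2.0)]
S = [(2.0, 4.0), (-2.0, -4.0), (3.0, -1.0)]
joined = lsh_knn_join(R, S, k=2)
```

Points in other buckets are never compared, which is where the speed-up comes from and also why the result is approximate: a true neighbor that hashes to a different bucket is missed. In a distributed setting, each bucket could be processed independently.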

A Modified Grey-Based k-NN Approach for Treatment of Missing Value

  • Chun, Young-M.;Lee, Joon-W.;Chung, Sung-S.
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.421-436 / 2006
  • In 2004, Huang proposed a grey-based nearest neighbor approach for accurately predicting missing attribute values. Our study proposes a way to decide the number of nearest neighbors using not only Deng's grey relational grade (GRG) but also Wen's grey relational grade. In addition, our study uses a weighted mean rather than an arithmetic (unweighted) mean, with the GRG serving as the weight when missing values are imputed. There are four methods: DU, DW, WU, and WW. The WW method (Wen's GRG with weighted mean) performs best among them. Huang had shown that his method was much better than the mean imputation and multiple imputation methods; the performance of our method is far superior to that of Huang's.
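
A GRG-weighted imputation of this kind can be sketched in Python as follows. The sketch uses Deng's grey relational grade with the customary distinguishing coefficient of 0.5 and assumes the attributes are already normalized to comparable scales. Wen's grade is not shown, and the names `grey_relational_grades` and `impute`, as well as the toy rows, are hypothetical rather than taken from the paper.

```python
def grey_relational_grades(target, candidates, zeta=0.5):
    """Deng's GRG between `target` and each candidate row over the observed attributes."""
    deltas = [[abs(t - c) for t, c in zip(target, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        return [1.0] * len(candidates)  # every candidate is identical to the target
    grades = []
    for row in deltas:
        # Grey relational coefficient per attribute, averaged into a grade.
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


def impute(target_observed, complete_rows, missing_idx, k, zeta=0.5):
    """GRG-weighted mean of the missing attribute over the k rows with the highest GRG."""
    observed = [[v for j, v in enumerate(row) if j != missing_idx]
                for row in complete_rows]
    grades = grey_relational_grades(target_observed, observed, zeta)
    top = sorted(zip(grades, complete_rows), key=lambda t: t[0], reverse=True)[:k]
    total = sum(g for g, _ in top)
    return sum(g * row[missing_idx] for g, row in top) / total


# Hypothetical complete rows; the target row is missing its third attribute.
rows = [(1.0, 2.0, 10.0), (1.0, 2.0, 12.0), (9.0, 9.0, 50.0)]
imputed_k2 = impute((1.0, 2.0), rows, missing_idx=2, k=2)
imputed_k3 = impute((1.0, 2.0), rows, missing_idx=2, k=3)
```

Replacing the weights with a constant gives the unweighted variants (DU, WU), so the difference between the two `impute` calls hints at why both k and the weighting scheme matter.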


The Optimized Detection Range of RFID-based Positioning System using k-Nearest Neighbor Algorithm

  • Kim, Jung-Hwan;Heo, Joon;Han, Soo-Hee;Kim, Sang-Min
    • Proceedings of the Korean Association of Geographic Information Studies Conference / 2008.10a / pp.270-271 / 2008
  • Positioning technology for moving objects is an essential component of ubiquitous communication and computing environments and applications, for which Radio Frequency Identification (RFID) has also been considered a core technology for ubiquitous wireless communication. An RFID-based positioning system calculates the position of a moving object with the k-nearest neighbor (k-NN) algorithm, using k detected tags with known coordinates, where k is determined by the detection range of the RFID system. In this paper, the RFID-based positioning system determines the position of a moving object not by using a weight factor that depends on received signal strength, but by assuming that tags within the detection range always operate and have the same weight, because the latter system is much more economical than the former. The tag geometries were chosen with large buildings such as office buildings, shopping malls, and warehouses in mind: a line in 1-dimensional space, a square in 2-dimensional space, and a cube in 3-dimensional space. In 1-dimensional space, the optimal detection range is determined to be 125% of the tag spacing through both analytical and numerical approaches. Here, the analytical approach means a mathematical proof and the numerical approach means a simulation using MATLAB. Because the analytical approach is very difficult in 2- and 3-dimensional space, the numerical approach alone is used there, giving an optimal detection range of 134% of the tag spacing in 2-dimensional space and 143% in 3-dimensional space. This result can serve as a fundamental study for designing RFID-based positioning systems.
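
The 1-dimensional case can be explored with a small Python simulation. It is a sketch under stated assumptions: tags sit on an evenly spaced line, every tag within the detection range is always detected, and the position estimate is the unweighted mean of the detected tag coordinates. The function names, the number of tags, and the sweep interval are hypothetical; the paper's own simulation was done in MATLAB and is not reproduced here.

```python
def estimate_position(true_pos, tag_spacing, n_tags, range_ratio):
    """Unweighted mean of the coordinates of all tags within the detection range."""
    detection_range = range_ratio * tag_spacing
    tags = [i * tag_spacing for i in range(n_tags)]
    detected = [t for t in tags if abs(t - true_pos) <= detection_range]
    return sum(detected) / len(detected) if detected else None


def mean_abs_error(range_ratio, tag_spacing=1.0, n_tags=11, steps=100):
    """Average positioning error for positions swept across one interior tag interval."""
    start = 4 * tag_spacing  # stay away from the ends of the tag line
    errors = []
    for s in range(steps):
        pos = start + tag_spacing * s / steps
        est = estimate_position(pos, tag_spacing, n_tags, range_ratio)
        errors.append(abs(est - pos))
    return sum(errors) / len(errors)


# Compare two detection ranges, given as multiples of the tag spacing.
error_at_100 = mean_abs_error(1.00)
error_at_125 = mean_abs_error(1.25)
```

Sweeping `range_ratio` over a grid of values and picking the one with the smallest mean error is the numerical counterpart of the analytical derivation; extending the tag layout to a square or cubic grid would follow the same pattern.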


A Study on the Treatment of Missing Value using Grey Relational Grade and k-NN Approach

  • Chun, Young-Min;Chung, Sung-Suk
    • Korean Data and Information Science Society: Conference Proceedings / 2006.04a / pp.55-62 / 2006
  • In 2004, Huang proposed a grey-based nearest neighbor approach for accurately predicting missing attribute values. Our study proposes a way to decide the number of nearest neighbors using not only Deng's grey relational grade (GRG) but also Wen's grey relational grade. In addition, our study uses a weighted mean rather than an arithmetic (unweighted) mean, with the GRG serving as the weight when missing values are imputed. There are four methods: DU, DW, WU, and WW. The WW method (Wen's GRG with weighted mean) performs best among them. Huang had shown that his method was much better than the mean imputation and multiple imputation methods; the performance of our method is far superior to that of Huang's.
