• Title/Summary/Keyword: k-Nearest Neighbors

Search Result 197, Processing Time 0.021 seconds

A KD-Tree-Based Nearest Neighbor Search for Large Quantities of Data

  • Yen, Shwu-Huey;Hsieh, Ya-Ju
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.7 no.3
    • /
    • pp.459-470
    • /
    • 2013
  • The discovery of nearest neighbors, without training in advance, has many applications, such as the formation of mosaic images, image matching, image retrieval and image stitching. When the quantity of data is huge and the number of dimensions is high, the efficient identification of a nearest neighbor (NN) is very important. This study proposes a variation of the KD-tree - the arbitrary KD-tree (KDA) - which is constructed without the need to evaluate variances. Multiple KDAs can be constructed efficiently and possess independent tree structures, when the amount of data is large. Upon testing, using extended synthetic databases and real-world SIFT data, this study concludes that the KDA method increases computational efficiency and produces satisfactory accuracy, when solving NN problems.

A New Memory-Based Reasoning Algorithm using the Recursive Partition Averaging (재귀 분할 평균 법을 이용한 새로운 메모리기반 추론 알고리즘)

  • Lee, Hyeong-Il;Jeong, Tae-Seon;Yun, Chung-Hwa;Gang, Gyeong-Sik
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.7
    • /
    • pp.1849-1857
    • /
    • 1999
  • We proposed the RPA (Recursive Partition Averaging) method in order to improve the storage requirement and classification rate of the Memory Based Reasoning. This algorithm recursively partitions the pattern space until each hyperrectangle contains only those patterns of the same class, then it computes the average values of patterns in each hyperrectangle to extract a representative. Also we have used the mutual information between the features and classes as weights for features to improve the classification performance. The proposed algorithm used 30~90% of memory space that is needed in the k-NN (k-Nearest Neighbors) classifier, and showed a comparable classification performance to the k-NN. Also, by reducing the number of stored patterns, it showed an excellent result in terms of classification time when we compare it to the k-NN.

  • PDF

Software Vulnerability Prediction System Using Machine Learning Algorithm (기계학습 알고리즘을 이용한 소프트웨어 취약 여부 예측 시스템)

  • Choi, Minjun;Kim, Juhwan;Yun, Joobeom
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.28 no.3
    • /
    • pp.635-642
    • /
    • 2018
  • In the Era of the Fourth Industrial Revolution, we live in huge amounts of software. However, as software increases, software vulnerabilities are also increasing. Therefore, it is important to detect and remove software vulnerabilities. Currently, many researches have been studied to predict and detect software security problems, but it takes a long time to detect and does not have high prediction accuracy. Therefore, in this paper, we describe a method for efficiently predicting software vulnerabilities using machine learning algorithms. In addition, various machine learning algorithms are compared through experiments. Experimental results show that the k-nearest neighbors prediction model has the highest prediction rate.

Optimization of Case-based Reasoning Systems using Genetic Algorithms: Application to Korean Stock Market (유전자 알고리즘을 이용한 사례기반추론 시스템의 최적화: 주식시장에의 응용)

  • Kim, Kyoung-Jae;Ahn, Hyun-Chul;Han, In-Goo
    • Asia pacific journal of information systems
    • /
    • v.16 no.1
    • /
    • pp.71-84
    • /
    • 2006
  • Case-based reasoning (CBR) is a reasoning technique that reuses past cases to find a solution to the new problem. It often shows significant promise for improving effectiveness of complex and unstructured decision making. It has been applied to various problem-solving areas including manufacturing, finance and marketing for the reason. However, the design of appropriate case indexing and retrieval mechanisms to improve the performance of CBR is still a challenging issue. Most of the previous studies on CBR have focused on the similarity function or optimization of case features and their weights. According to some of the prior research, however, finding the optimal k parameter for the k-nearest neighbor (k-NN) is also crucial for improving the performance of the CBR system. In spite of the fact, there have been few attempts to optimize the number of neighbors, especially using artificial intelligence (AI) techniques. In this study, we introduce a genetic algorithm (GA) to optimize the number of neighbors to combine. This study applies the novel approach to Korean stock market. Experimental results show that the GA-optimized k-NN approach outperforms other AI techniques for stock market prediction.

A Study on the Drug Classification Using Machine Learning Techniques (머신러닝 기법을 이용한 약물 분류 방법 연구)

  • Anmol Kumar Singh;Ayush Kumar;Adya Singh;Akashika Anshum;Pradeep Kumar Mallick
    • Advanced Industrial SCIence
    • /
    • v.3 no.2
    • /
    • pp.8-16
    • /
    • 2024
  • This paper shows the system of drug classification, the goal of this is to foretell the apt drug for the patients based on their demographic and physiological traits. The dataset consists of various attributes like Age, Sex, BP (Blood Pressure), Cholesterol Level, and Na_to_K (Sodium to Potassium ratio), with the objective to determine the kind of drug being given. The models used in this paper are K-Nearest Neighbors (KNN), Logistic Regression and Random Forest. Further to fine-tune hyper parameters using 5-fold cross-validation, GridSearchCV was used and each model was trained and tested on the dataset. To assess the performance of each model both with and without hyper parameter tuning evaluation metrics like accuracy, confusion matrices, and classification reports were used and the accuracy of the models without GridSearchCV was 0.7, 0.875, 0.975 and with GridSearchCV was 0.75, 1.0, 0.975. According to GridSearchCV Logistic Regression is the most suitable model for drug classification among the three-model used followed by the K-Nearest Neighbors. Also, Na_to_K is an essential feature in predicting the outcome.

A Study of Environmental Effects on Galaxy Spin Using MaNGA Data

  • Lee, Jong Chul;Hwang, Ho Seong;Chung, Haeun
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.42 no.2
    • /
    • pp.47.2-47.2
    • /
    • 2017
  • We investigate the environmental effects on galaxy spin using the sample of ~1100 galaxies from the first public data of MaNGA integral field unit survey. We determine the spin parameter ${\lambda}_{Re}$ of galaxies by analyzing the two-dimensional stellar kinematic measurements within the effective radius, and study its dependence on the large-scale (background mass density determined with 20 nearby galaxies) and small-scale (distance to and morphology of the nearest neighbor galaxy) environments. We first examine the mass dependence of galaxy spin, and find that the spin parameter decreases with stellar mass at log ($M_{\ast}/M_{\odot}$) > 10, consistent with previous studies. We then divide the galaxies into three subsamples using their stellar masses to minimize the mass effects on galaxy spin. The spin parameter of galaxies in each subsample does not change with the background density, but do change with the distance to and morphology of the nearest neighbor. The spin parameter increases when late-type neighbors are within the virial radius, and decreases when early-type neighbors are within the virial radius. These results suggest that the large-scale environments hardly affect the galaxy spin, but the effects of small-scale environments such as hydrodynamic galaxy-galaxy interactions are substantial.

  • PDF

Deterministic and probabilistic analysis of tunnel face stability using support vector machine

  • Li, Bin;Fu, Yong;Hong, Yi;Cao, Zijun
    • Geomechanics and Engineering
    • /
    • v.25 no.1
    • /
    • pp.17-30
    • /
    • 2021
  • This paper develops a convenient approach for deterministic and probabilistic evaluations of tunnel face stability using support vector machine classifiers. The proposed method is comprised of two major steps, i.e., construction of the training dataset and determination of instance-based classifiers. In step one, the orthogonal design is utilized to produce representative samples after the ranges and levels of the factors that influence tunnel face stability are specified. The training dataset is then labeled by two-dimensional strength reduction analyses embedded within OptumG2. For any unknown instance, the second step applies the training dataset for classification, which is achieved by an ad hoc Python program. The classification of unknown samples starts with selection of instance-based training samples using the k-nearest neighbors algorithm, followed by the construction of an instance-based SVM-KNN classifier. It eventually provides labels of the unknown instances, avoiding calculate its corresponding performance function. Probabilistic evaluations are performed by Monte Carlo simulation based on the SVM-KNN classifier. The ratio of the number of unstable samples to the total number of simulated samples is computed and is taken as the failure probability, which is validated and compared with the response surface method.

The Role of Data Technologies with Machine Learning Approaches in Makkah Religious Seasons

  • Waleed Al Shehri
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.8
    • /
    • pp.26-32
    • /
    • 2023
  • Hajj is a fundamental pillar of Islam that all Muslims must perform at least once in their lives. However, Umrah can be performed several times yearly, depending on people's abilities. Every year, Muslims from all over the world travel to Saudi Arabia to perform Hajj. Hajj and Umrah pilgrims face multiple issues due to the large volume of people at the same time and place during the event. Therefore, a system is needed to facilitate the people's smooth execution of Hajj and Umrah procedures. Multiple devices are already installed in Makkah, but it would be better to suggest the data architectures with the help of machine learning approaches. The proposed system analyzes the services provided to the pilgrims regarding gender, location, and foreign pilgrims. The proposed system addressed the research problem of analyzing the Hajj pilgrim dataset most effectively. In addition, Visualizations of the proposed method showed the system's performance using data architectures. Machine learning algorithms classify whether male pilgrims are more significant than female pilgrims. Several algorithms were proposed to classify the data, including logistic regression, Naive Bayes, K-nearest neighbors, decision trees, random forests, and XGBoost. The decision tree accuracy value was 62.83%, whereas K-nearest Neighbors had 62.86%; other classifiers have lower accuracy than these. The open-source dataset was analyzed using different data architectures to store the data, and then machine learning approaches were used to classify the dataset.

Classification of nuclear activity types for neighboring countries of South Korea using machine learning techniques with xenon isotopic activity ratios

  • Sang-Kyung Lee;Ser Gi Hong
    • Nuclear Engineering and Technology
    • /
    • v.56 no.4
    • /
    • pp.1372-1384
    • /
    • 2024
  • The discrimination of the source for xenon gases' release can provide an important clue for detecting the nuclear activities in the neighboring countries. In this paper, three machine learning techniques, which are logistic regression, support vector machine (SVM), and k-nearest neighbors (KNN), were applied to develop the predictive models for discriminating the source for xenon gases' release based on the xenon isotopic activity ratio data which were generated using the depletion codes, i.e., ORIGEN in SCALE 6.2 and Serpent, for the probable sources. The considered sources for the neighboring countries of South Korea include PWRs, CANDUs, IRT-2000, Yongbyun 5 MWe reactor, and nuclear tests with plutonium and uranium. The results of the analysis showed that the overall prediction accuracies of models with SVM and KNN using six inputs, all exceeded 90%. Particularly, the models based on SVM and KNN that used six or three xenon isotope activity ratios with three classification categories, namely reactor, plutonium bomb, and uranium bomb, had accuracy levels greater than 88%. The prediction performances demonstrate the applicability of machine learning algorithms to predict nuclear threat using ratios of xenon isotopic activity.

Determining the optimal number of cases to combine in a case-based reasoning system for eCRM

  • Hyunchul Ahn;Kim, Kyoung-jae;Ingoo Han
    • Proceedings of the KAIS Fall Conference
    • /
    • 2003.11a
    • /
    • pp.178-184
    • /
    • 2003
  • Case-based reasoning (CBR) often shows significant promise for improving effectiveness of complex and unstructured decision making. Consequently, it has been applied to various problem-solving areas including manufacturing, finance and marketing. However, the design of appropriate case indexing and retrieval mechanisms to improve the performance of CBR is still challenging issue. Most of previous studies to improve the effectiveness for CBR have focused on the similarity function or optimization of case features and their weights. However, according to some of prior researches, finding the optimal k parameter for k-nearest neighbor (k-NN) is also crucial to improve the performance of CBR system. Nonetheless, there have been few attempts which have tried to optimize the number of neighbors, especially using artificial intelligence (AI) techniques. In this study, we introduce a genetic algorithm (GA) to optimize the number of neighbors to combine. This study applies the new model to the real-world case provided by an online shopping mall in Korea. Experimental results show that a GA-optimized k-NN approach outperforms other AI techniques for purchasing behavior forecasting.

  • PDF