• Title/Summary/Keyword: Nearest Neighbors

Linear interpolation and Machine Learning Methods for Gas Leakage Prediction Base on Multi-source Data Integration (다중소스 데이터 융합 기반의 가스 누출 예측을 위한 선형 보간 및 머신러닝 기법)

  • Dashdondov, Khongorzul;Jo, Kyuri;Kim, Mi-Hye
    • Journal of the Korea Convergence Society / v.13 no.3 / pp.33-41 / 2022
  • In this article, we propose predicting natural gas (NG) leakage levels through feature selection based on factor analysis (FA) of an integrated dataset that combines Korean Meteorological Agency data with natural gas leakage data, so that complex factors are taken into account. The method is divided into three modules. First, we fill in missing data in the integrated dataset by linear interpolation and select essential features using FA with OrdinalEncoder (OE)-based normalization. The dataset is then labeled by K-means clustering. The final module uses four algorithms, K-nearest neighbors (KNN), decision tree (DT), random forest (RF), and Naive Bayes (NB), to predict gas leakage levels. The proposed method is evaluated by accuracy, area under the ROC curve (AUC), and mean standard error (MSE). The test results indicate that the OrdinalEncoder-factor analysis (OE-F)-based classification method improves performance. Moreover, OE-F-based KNN (OE-F-KNN) showed the best performance, with 95.20% accuracy, an AUC of 96.13%, and an MSE of 0.031.
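
The two generic building blocks named in this abstract, linear interpolation of missing values and a k-nearest-neighbors vote, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `fill_linear` and `knn_predict`, the edge-gap handling, the Euclidean distance, and the toy data are all hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence


def fill_linear(values: List[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation between surrounding known values.

    Assumption: leading and trailing gaps copy the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("at least one known value is required")
    filled = list(values)
    # Edge gaps: hold the nearest known value constant.
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]
    # Interior gaps: straight line between the two surrounding known points.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[str],
                query: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k training points closest to query."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration only.
print(fill_linear([10.0, None, None, 16.0, None, 20.0]))
# -> [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.2, 0.1)]
labels = ["low", "low", "high", "high", "low"]
print(knn_predict(points, labels, (0.3, 0.3), k=3))  # -> low
```

In practice the features would first be encoded and reduced (the abstract uses OrdinalEncoder and factor analysis), which this sketch omits.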

Prediction of Blast Vibration in Quarry Using Machine Learning Models (머신러닝 모델을 이용한 석산 개발 발파진동 예측)

  • Jung, Dahee;Choi, Yosoon
    • Tunnel and Underground Space / v.31 no.6 / pp.508-519 / 2021
  • In this study, a model was developed to predict the peak particle velocity (PPV) that affects people and the surrounding environment during blasting. Four machine learning models, based on the k-nearest neighbors (kNN), classification and regression tree (CART), support vector regression (SVR), and particle swarm optimization (PSO)-SVR algorithms, were developed and compared in predicting the PPV. Mt. Yogmang, located in Changwon-si, Gyeongsangnam-do, was selected as the study area, and 1,048 blasting records were acquired to train the machine learning models. The blasting data consisted of hole length, burden, spacing, maximum charge per delay, powder factor, number of holes, ratio of emulsion, monitoring distance, and PPV. To evaluate the performance of the trained models, the mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE) were used. The PSO-SVR model showed the best performance, with an MAE, MSE, and RMSE of 0.0348, 0.0021, and 0.0458, respectively. Finally, a method was proposed to predict the degree of influence on the surrounding environment using the developed machine learning models.
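
For reference, the three error metrics named in this abstract follow directly from the residuals between measured and predicted values. The sketch below uses their standard definitions; the function name `regression_errors` and the sample numbers are hypothetical and are not taken from the paper's data.

```python
import math
from typing import Dict, Sequence


def regression_errors(y_true: Sequence[float],
                      y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute MAE, MSE, and RMSE for paired true and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    mse = sum(r * r for r in residuals) / n    # mean square error
    rmse = math.sqrt(mse)                      # root of the MSE
    return {"mae": mae, "mse": mse, "rmse": rmse}


# Invented PPV-like values, for illustration only.
measured = [0.10, 0.25, 0.40, 0.05]
predicted = [0.12, 0.20, 0.43, 0.05]
print(regression_errors(measured, predicted))
```

Because RMSE is the square root of MSE, the reported pair (MSE 0.0021, RMSE 0.0458) is internally consistent: 0.0458 squared is roughly 0.0021.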

Machine Learning Methods to Predict Vehicle Fuel Consumption

  • Ko, Kwangho
    • Journal of the Korea Society of Computer and Information / v.27 no.9 / pp.13-20 / 2022
  • This study proposes and analyzes machine learning (ML) models that predict vehicle fuel consumption (FC) in real time. Test drives were conducted with a car to measure vehicle speed, acceleration, road gradient, and FC for the training dataset. The ML models were trained with speed, acceleration, and road gradient as features and FC as the target. Two kinds of ML models were studied: regression models (linear regression and k-nearest neighbors regression) and classification models (k-nearest neighbors classifier, logistic regression, decision tree, random forest, and gradient boosting). The prediction accuracy for real-time FC is low, in the range of 0.5 to 0.6, and the classification models are more accurate than the regression models. The prediction error for total FC is very low, about 0.2 to 2.0%, and the regression models are more accurate than the classification models. This is because the accuracy score of the regression models is the coefficient of determination (R2), and predicted values cluster around the mean of the targets as this coefficient decreases. Therefore, regression models are suited to total FC prediction and classification models to real-time FC prediction.

Light-weight Classification Model for Android Malware through the Dimensional Reduction of API Call Sequence using PCA

  • Jeon, Dong-Ha;Lee, Soo-Jin
    • Journal of the Korea Society of Computer and Information / v.27 no.11 / pp.123-130 / 2022
  • Recently, studies on the detection and classification of Android malware based on API call sequences have been actively carried out. However, malware classification based on API call sequences has serious limitations, such as excessive time and resource consumption in malware analysis and learning model construction, due to the vast amount of data and the high dimensionality of the features. In this study, we analyzed various classification models, such as LightGBM, Random Forest, and k-Nearest Neighbors, after significantly reducing the feature dimension using PCA (Principal Component Analysis) on the CICAndMal2020 dataset, which contains vast API call information. The experimental results show that PCA significantly reduces the feature dimension while maintaining the characteristics of the original data and achieves efficient malware classification performance. Both binary and multi-class classification achieve higher accuracy than previous studies, even when the data characteristics are reduced to less than 1% of the total size.

Study on the Failure Diagnosis of Robot Joints Using Machine Learning (기계학습을 이용한 로봇 관절부 고장진단에 대한 연구)

  • Mi Jin Kim;Kyo Mun Ku;Jae Hong Shim;Hyo Young Kim;Kihyun Kim
    • Journal of the Semiconductor & Display Technology / v.22 no.4 / pp.113-118 / 2023
  • Maintenance of semiconductor equipment processes is crucial for the continuous growth of the semiconductor market. The process must always be kept in optimal condition to ensure a smooth supply of numerous parts. Additionally, it is imperative to monitor the status of the robots that play a central role in the process. Just as a person's many sensory organs indicate the condition of the body, robots have numerous sensors that serve the same purpose, and, as with human joints, a robot's condition can be detected first in its joints, which are its driving parts. Therefore, a normal-state test bed and an abnormal-state test bed using an aged reducer were constructed to simulate the joint, the driving part of the robot. Various sensors, such as vibration, torque, encoder, and temperature sensors, were attached to accurately diagnose robot failure, and the test bed was built as an integrated system that collects and controls data simultaneously in real time. After configuring the user screen and building a database from the collected data, the characteristic values of normal and abnormal data were analyzed, and machine learning was performed using the KNN (K-Nearest Neighbors) algorithm. This approach yielded 94% accuracy in failure diagnosis, supporting the reliability of both the test bed and the data it produced.

Exploring Time Series Data Information Extraction and Regression using DTW based kNN (DTW 거리 기반 kNN을 활용한 시계열 데이터 정보 추출 및 회귀 예측)

  • Hyeonjun Yang;Chaeguk Lim;Woohyuk Jung;Jihwan Woo
    • Information Systems Review / v.26 no.2 / pp.83-93 / 2024
  • This study proposes a preprocessing methodology based on Dynamic Time Warping (DTW) and k-Nearest Neighbors (kNN) to effectively represent time series data for predicting the completion quality of electroplating baths. The proposed DTW-based kNN preprocessing approach was applied to various regression models and compared. The results demonstrated a performance improvement of up to 43% in maximum RMSE and 24% in MAE compared to traditional decision tree models. Notably, when integrated with neural network-based regression models, the performance improvements were pronounced. The combined structure of the proposed preprocessing method and regression models appears suitable for situations with long time series data and limited data samples, reducing the risk of overfitting and enabling reasonable predictions even with scarce data. However, as the number of data samples increases, the computational load of the DTW and kNN algorithms also increases, indicating a need for future research to improve computational efficiency.
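
To make the combination concrete, the sketch below pairs a textbook dynamic-programming DTW distance with a k-nearest-neighbors lookup. The abstract does not say how the kNN output feeds the downstream regression models, so averaging the neighbors' targets here is an assumption, and the names `dtw_distance` and `knn_dtw_regress` and the toy series are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping steps.
            cost[i][j] = local + min(cost[i - 1][j],      # stay on b[j-1]
                                     cost[i][j - 1],      # stay on a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def knn_dtw_regress(train: List[Tuple[Sequence[float], float]],
                    query: Sequence[float], k: int = 2) -> float:
    """Average the targets of the k training series closest to query under DTW."""
    nearest = sorted(train, key=lambda pair: dtw_distance(pair[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Toy series of different lengths with invented quality targets.
train = [([1, 2, 3], 10.0), ([1, 2, 2, 3], 12.0), ([9, 9, 9], 100.0)]
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))      # -> 0.0
print(knn_dtw_regress(train, [1, 2, 3, 3], k=2))  # -> 11.0
```

Each query requires one DTW computation per training series, which illustrates why the abstract flags computational load as the sample count grows.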

A Hierarchical Bitmap-based Spatial Index use k-Nearest Neighbor Query Processing on the Wireless Broadcast Environment (무선방송환경에서 계층적 비트맵 기반 공간 색인을 이용한 k-최근접 질의처리)

  • Song, Doo-Hee;Park, Kwang-Jin
    • Journal of the Korea Society of Computer and Information / v.17 no.1 / pp.203-209 / 2012
  • Recently, k-nearest neighbors (k-NN) query methods for wireless broadcasting environments have been actively studied. The advantage of a wireless broadcasting environment is its scalability, which enables collective query processing for an unspecified number of users connected to the server. However, when existing k-NN query methods are applied in a wireless broadcasting environment, backtracking may occur and query processing time increases as a result. This paper proposes a hierarchical bitmap-based spatial index (HBI) to process k-NN queries efficiently in a wireless broadcasting environment. HBI reduces the bitmap size by using bitmap information together with a tree structure. The resulting shorter broadcast cycle reduces the client's tuning time and query processing time. In addition, since the locations of all objects can be determined from the bitmap information, the client can tune in selectively to only the data it needs. HBI was applied to k-NN queries in an experiment, and the performance evaluation showed the proposed technique to be effective.

Optimization of Warp-wide CUDA Implementation for Parallel Shifted Sort Algorithm (병렬 Shifted Sort 알고리즘의 Warp 단위 CUDA 구현 최적화)

  • Park, Taejung
    • Journal of Digital Contents Society / v.18 no.4 / pp.739-745 / 2017
  • This paper presents and discusses an implementation of the GPU shifted sort method for finding approximate k nearest neighbors that executes within a "warp", the minimum execution unit in the GPU parallel architecture. The paper also compares it with two other common nearest neighbor search methods, a GPU-based kd-tree and the ANN (Approximate Nearest Neighbor) library. The proposed implementation focuses on cases where k is small, i.e. 2, 4, 8, and 16, which can be handled efficiently within a warp, since applications very commonly use small values of k. The paper also discusses ways to optimize the implementation by improving memory management in a loop for the CUB open library and by adopting CUDA commands that are supported by the GPU hardware. In the tests, the proposed implementation shows a more than 16-fold speed-up over the other GPU-based methods, which suggests that the improvement would be greater for larger input data.

APMDI-CF: An Effective and Efficient Recommendation Algorithm for Online Users

  • Ya-Jun Leng;Zhi Wang;Dan Peng;Huan Zhang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.11 / pp.3050-3063 / 2023
  • Recommendation systems provide personalized products or services to online users by mining their past preferences. Collaborative filtering is a popular recommendation technique because it is easy to implement. However, with the rapid growth of the number of users in recommendation systems, collaborative filtering suffers from serious scalability and sparsity problems. To address these problems, a novel collaborative filtering recommendation algorithm is proposed. The proposed algorithm partitions the users using affinity propagation clustering and searches for the k nearest neighbors within the partition to which the active user belongs, which narrows the search range and improves real-time performance. When predicting ratings for the active user's unrated items, a mean deviation method is used to impute values for the neighbors' missing ratings, which reduces sparsity and safeguards recommendation quality. Experiments on two different datasets show that the proposed algorithm performs well in terms of both real-time performance and recommendation quality.

Generic Training Set based Multimanifold Discriminant Learning for Single Sample Face Recognition

  • Dong, Xiwei;Wu, Fei;Jing, Xiao-Yuan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.1 / pp.368-391 / 2018
  • Face recognition (FR) with a single sample per person (SSPP) is common in real-world face recognition applications. In this scenario, it is hard to predict the intra-class variations of query samples from gallery samples because training samples are insufficient. Inspired by the fact that similar faces have similar intra-class variations, we propose a virtual sample generating algorithm called k nearest neighbors based virtual sample generating (kNNVSG) to enrich the intra-class variation information of training samples. Furthermore, to use the intra-class variation information of the virtual samples generated by the kNNVSG algorithm, we propose an image set based multimanifold discriminant learning (ISMMDL) algorithm. The ISMMDL algorithm learns a projection matrix for each manifold, modeled by the local patches of the images of each class, aiming to minimize intra-manifold margins and maximize inter-manifold margins simultaneously in a low-dimensional feature space. Finally, by combining the kNNVSG and ISMMDL algorithms, we propose the k nearest neighbor virtual image set based multimanifold discriminant learning (kNNMMDL) approach for single sample face recognition (SSFR) tasks. Experimental results on the AR, Multi-PIE, and LFW face datasets demonstrate that our approach is promising for SSFR with expression, illumination, and disguise variations.