• Title/Summary/Keyword: k-Nearest Neighbor Classification

Classification of Soil Creep Hazard Class Using Machine Learning (기계학습기법을 이용한 땅밀림 위험등급 분류)

  • Lee, Gi Ha;Le, Xuan-Hien;Yeon, Min Ho;Seo, Jun Pyo;Lee, Chang Woo
    • Journal of Korean Society of Disaster and Security / v.14 no.3 / pp.17-27 / 2021
  • In this study, classification models were built using machine learning techniques to classify soil creep risk into three classes from A to C (A: risk, B: moderate, C: good). Six machine learning techniques were used: K-Nearest Neighbor, Support Vector Machine, Logistic Regression, Decision Tree, Random Forest, and Extreme Gradient Boosting. Their classification accuracy was analyzed using the nationwide soil creep field survey data from 2019 and 2020. All six methods achieved an accuracy of 0.9 or higher. Methods trained on numerical data performed better than methods based on the character data of the field survey evaluation table. Moreover, the methods trained on the data group reflecting expert opinion (R1~R4) had higher accuracy than those trained on the field survey evaluation score data group (C1~C4). Machine learning can be used as a tool for predicting soil creep if high-quality data are continuously secured and updated in the future.

A Method of Highspeed Similarity Retrieval based on Self-Organizing Maps (자기 조직화 맵 기반 유사화상 검색의 고속화 수법)

  • Oh, Kun-Seok;Yang, Sung-Ki;Bae, Sang-Hyun;Kim, Pan-Koo
    • The KIPS Transactions:PartB / v.8B no.5 / pp.515-522 / 2001
  • Feature-based similarity retrieval has become an important research issue in image database systems. The features of image data are useful for discriminating between images. In this paper, we propose a high-speed k-Nearest Neighbor search algorithm based on Self-Organizing Maps. A Self-Organizing Map (SOM) provides a mapping from high-dimensional feature vectors onto a two-dimensional space. The resulting topological feature map preserves the mutual relations (similarity) of the input data in feature space and clusters mutually similar feature vectors in neighboring nodes. Each node of the topological feature map holds a node vector and the similar images that are closest to that node vector. We implemented a k-NN search for similar-image classification that (1) accesses the topological feature map and (2) applies a pruning strategy for high-speed search. We evaluated the performance of our algorithm using color feature vectors extracted from images. Promising results were obtained in the experiments.
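The abstract does not spell out the pruning strategy, so the following is only a minimal sketch of one common way to prune a node-based k-NN search, not the authors' algorithm. It assumes the node vectors are already given (SOM training is omitted), uses Euclidean distance, and skips a node's bucket when the triangle inequality shows it cannot hold a closer item. The names `build_buckets` and `knn_search` are hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_buckets(node_vectors, items):
    """Assign each item to its closest node vector and record each bucket's radius."""
    buckets = [[] for _ in node_vectors]
    radii = [0.0] * len(node_vectors)
    for item in items:
        dists = [euclidean(item, node) for node in node_vectors]
        best = min(range(len(node_vectors)), key=dists.__getitem__)
        buckets[best].append(item)
        radii[best] = max(radii[best], dists[best])
    return buckets, radii


def knn_search(query, k, node_vectors, buckets, radii):
    """Exact k-NN search that skips buckets which cannot contain a closer item.

    Returns the k nearest items (closest first) and the number of buckets scanned.
    """
    node_dists = [euclidean(query, node) for node in node_vectors]
    order = sorted(range(len(node_vectors)), key=node_dists.__getitem__)
    heap = []  # max-heap emulated with negated distances: (-distance, item)
    scanned = 0
    for i in order:
        # Triangle inequality: no item in bucket i is closer than this bound.
        lower_bound = node_dists[i] - radii[i]
        if len(heap) == k and lower_bound >= -heap[0][0]:
            continue  # pruned: the bucket cannot improve the current top k
        scanned += 1
        for item in buckets[i]:
            d = euclidean(query, item)
            if len(heap) < k:
                heapq.heappush(heap, (-d, item))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, item))
    neighbours = [item for _, item in sorted(heap, key=lambda t: -t[0])]
    return neighbours, scanned


# Illustrative data only: two fixed node vectors and five 2-D feature vectors.
nodes = [(0.0, 0.0), (10.0, 10.0)]
images = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (9.0, 9.0), (10.0, 11.0)]
buckets, radii = build_buckets(nodes, images)
print(knn_search((0.2, 0.9), 2, nodes, buckets, radii))
```

Because the bound is conservative, the pruned search returns the same neighbours as a brute-force scan; the saving comes from the buckets that are never opened.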

Analyzing Key Variables in Network Attack Classification on NSL-KDD Dataset using SHAP (SHAP 기반 NSL-KDD 네트워크 공격 분류의 주요 변수 분석)

  • Sang-duk Lee;Dae-gyu Kim;Chang Soo Kim
    • Journal of the Society of Disaster Information / v.19 no.4 / pp.924-935 / 2023
  • Purpose: The central aim of this study is to leverage machine learning techniques for the classification of Intrusion Detection System (IDS) data, with a specific focus on identifying the variables responsible for enhancing overall performance. Method: First, we classified 'R2L (Remote to Local)' and 'U2R (User to Root)' attacks in the NSL-KDD dataset, which are difficult to detect due to class imbalance, using seven machine learning models, including Logistic Regression (LR) and K-Nearest Neighbor (KNN). Next, we applied SHapley Additive exPlanations (SHAP) to the two classification models that showed high performance, Random Forest (RF) and Light Gradient-Boosting Machine (LGBM), to check the importance of the variables that affect classification in each model. Result: For RF, the 'service' variable was confirmed to be the most important variable, and for LGBM, the 'dst_host_srv_count' variable. These pivotal variables serve as key factors capable of enhancing classification performance for each respective model. Conclusion: This paper identifies the optimal models, RF and LGBM, for classifying 'R2L' and 'U2R' attacks, and elucidates the crucial variables associated with each selected model.

Human activity recognition with analysis of angles between skeletal joints using a RGB-depth sensor

  • Ince, Omer Faruk;Ince, Ibrahim Furkan;Yildirim, Mustafa Eren;Park, Jang Sik;Song, Jong Kwan;Yoon, Byung Woo
    • ETRI Journal / v.42 no.1 / pp.78-89 / 2020
  • Human activity recognition (HAR) has become effective as a computer vision tool for video surveillance systems. In this paper, a novel biometric system that can detect human activities in 3D space is proposed. In order to implement HAR, joint angles obtained using an RGB-depth sensor are used as features. Because HAR is operated in the time domain, angle information is stored using the sliding kernel method. Haar-wavelet transform (HWT) is applied to preserve the information of the features before reducing the data dimension. Dimension reduction using an averaging algorithm is also applied to decrease the computational cost, which provides faster performance while maintaining high accuracy. Before the classification, a proposed thresholding method with inverse HWT is conducted to extract the final feature set. Finally, the K-nearest neighbor (k-NN) algorithm is used to recognize the activity with respect to the given data. The method compares favorably with the results using other machine learning algorithms.
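As a rough illustration of the signal-processing steps named in this abstract, the sketch below shows a single-level Haar wavelet transform, its inverse, and dimension reduction by averaging. It is an assumption-laden toy, not the paper's pipeline: the paper's sliding kernel and its proposed thresholding method are not described here, so the hard threshold on detail coefficients is a stand-in, and all function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length sequence."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / SQRT2 for a, b in pairs]  # scaled pairwise sums
    detail = [(a - b) / SQRT2 for a, b in pairs]  # scaled pairwise differences
    return approx, detail


def inverse_haar_step(approx, detail):
    """Rebuild the sequence from one level of Haar coefficients."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def hard_threshold(coeffs, limit):
    """Zero out coefficients whose magnitude is below `limit` (stand-in rule)."""
    return [c if abs(c) >= limit else 0.0 for c in coeffs]


def average_pool(values, width):
    """Reduce dimension by averaging consecutive groups of `width` values."""
    groups = [values[i:i + width] for i in range(0, len(values), width)]
    return [sum(g) / len(g) for g in groups]


# Illustrative joint-angle samples (degrees); the values are made up.
angles = [40.0, 42.0, 90.0, 90.5, 30.0, 35.0, 60.0, 60.0]
approx, detail = haar_step(angles)
smoothed = inverse_haar_step(approx, hard_threshold(detail, 1.0))
features = average_pool(smoothed, 2)
print(features)
```

The resulting shorter feature vector is the kind of input a k-NN classifier could then compare against labelled activity samples.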

Development of Freeway Traffic Incident Clearance Time Prediction Model by Accident Level (사고등급별 고속도로 교통사고 처리시간 예측모형 개발)

  • LEE, Soong-bong;HAN, Dong Hee;LEE, Young-Ihn
    • Journal of Korean Society of Transportation / v.33 no.5 / pp.497-507 / 2015
  • Nonrecurrent congestion on freeways is primarily caused by incidents, and the main cause of incidents is known to be traffic accidents. Therefore, accurate prediction of traffic incident clearance time is very important in accident management. Freeway traffic accident data from 2008 to 2014 were analyzed for this study. The KNN (K-Nearest Neighbor) algorithm was used to develop an incident clearance time prediction model from the historical traffic accident data. Analysis of the accident data shows that the accident level significantly affects the incident clearance time. For this reason, incident clearance time was categorized by accident level. Data were sorted by traffic volume, number of lanes, and time period to account for traffic conditions and roadway geometry. Factors affecting incident clearance time were analyzed from the extracted data to identify similar types of accidents. Lastly, the weights of the detailed factors were calculated for use in the distance metric. The weights were calculated by applying the standard method of the normal distribution, and the incident clearance time was then predicted. The model showed a lower prediction error (MAPE) than the models of previous studies. The improved model developed in this study is expected to contribute to efficient highway operation management when incidents occur.
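To make the idea of a weighted distance metric and the MAPE error measure concrete, here is a minimal sketch of k-NN regression over historical records. It is not the model from the paper: the feature layout, the weights, and the records are invented for illustration, and the prediction is simply the mean clearance time of the k closest records.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance in which each factor is scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def knn_predict(query, records, weights, k):
    """Average the clearance times of the k historical records closest to `query`.

    `records` is a list of (feature_vector, clearance_time_minutes) pairs.
    """
    nearest = sorted(records, key=lambda r: weighted_distance(query, r[0], weights))[:k]
    return sum(minutes for _, minutes in nearest) / len(nearest)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical records: (traffic volume level, number of lanes) -> minutes.
history = [
    ((1.0, 2.0), 30.0),
    ((1.0, 3.0), 50.0),
    ((5.0, 2.0), 90.0),
]
factor_weights = (1.0, 1.0)  # assumed weights, not values from the paper

print(knn_predict((1.0, 2.0), history, factor_weights, k=2))
print(mape([100.0, 200.0], [110.0, 180.0]))
```

Raising the weight of a factor makes differences in that factor count more when the neighbours are chosen, which is how the weighting shapes the prediction.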

SOMk-NN Search Algorithm for Content-Based Retrieval (내용기반 검색을 위한 SOMk-NN탐색 알고리즘)

  • O, Gun-Seok;Kim, Pan-Gu
    • Journal of KIISE:Databases / v.29 no.5 / pp.358-366 / 2002
  • Feature-based similarity retrieval has become an important research issue in image database systems. The features of image data are useful for discriminating between images. In this paper, we propose a high-speed k-Nearest Neighbor search algorithm based on Self-Organizing Maps. A Self-Organizing Map (SOM) provides a mapping from high-dimensional feature vectors onto a two-dimensional space and generates a topological feature map. The topological feature map preserves the mutual relations (similarities) of the input data in feature space and clusters mutually similar feature vectors in neighboring nodes. Therefore, each node of the topological feature map holds a node vector and the similar images that are closest to that node vector. We implemented a k-NN search for similar-image classification that (1) accesses the topological feature map and (2) applies a pruning strategy for high-speed search. We evaluated the performance of our algorithm using color feature vectors extracted from images. Promising results were obtained in the experiments.

A Study on Performance of ML Algorithms and Feature Extraction to detect Malware (멀웨어 검출을 위한 기계학습 알고리즘과 특징 추출에 대한 성능연구)

  • Ahn, Tae-Hyun;Park, Jae-Gyun;Kwon, Young-Man
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.18 no.1 / pp.211-216 / 2018
  • In this paper, we studied how to classify whether an unknown PE file is malware. In the classification problem of the malware detection domain, both feature extraction and the classifier are important. We therefore studied which features suit a classifier and which classifier suits the selected features, seeking a good combination of feature and classifier for detecting malware. We ran the experiments in two steps. In step one, we compared the accuracy of features using Opcode only, Win. API only, and both together. We found that the combined Opcode and Win. API feature is better than the others. In step two, we compared the AUC values of four classifiers: Bernoulli Naïve Bayes, K-nearest neighbor, Support Vector Machine, and Decision Tree. We found that Decision Tree is better than the others.

Biometrics Based on Multi-View Features of Teeth Using Principal Component Analysis (주성분분석을 이용한 치아의 다면 특징 기반 생체식별)

  • Chang, Chan-Wuk;Kim, Myung-Su;Shin, Young-Suk
    • Korean Journal of Cognitive Science / v.18 no.4 / pp.445-455 / 2007
  • We present a new biometric identification system based on multi-view features of teeth using principal component analysis (PCA). The multi-view features of teeth consist of the frontal view, the left side view, and the right side view. In this paper, we try to lay the foundations of dental biometrics for secure access in real-life environments. We took pictures of the three views of the teeth in a specially designed experimental environment and derived 42 principal components as the features for individual identification. Classification for individual identification is based on the nearest neighbor (NN) algorithm, using the distance between the multi-view teeth features and those of the rotated multi-view teeth. After rotating the test data by two degrees, the average identification performance is 95.2% on the left side view and 91.3% on the right side view.

Machine Learning Based Automatic Categorization Model for Text Lines in Invoice Documents

  • Shin, Hyun-Kyung
    • Journal of Korea Multimedia Society / v.13 no.12 / pp.1786-1797 / 2010
  • Automatic understanding of the contents of a document image is a very hard problem because it involves mathematically challenging problems that originate mainly from the over-determined system induced by the document segmentation process. In both academia and industry, there have been incessant and varied efforts to improve the core parts of content retrieval technologies by separating out segmentation-related issues using semi-structured documents, e.g., invoices. In this paper we propose classification models for text lines in invoice documents, in which text lines are classified into five categories according to their contents: purchase order header, invoice header, summary header, surcharge header, and purchase items. Our investigation concentrated on the performance of machine learning based models, divided into linear-discriminant-analysis (LDA) and non-LDA (logic based) groups. In the LDA group, naïve Bayesian, k-nearest neighbor, and SVM were used; in the non-LDA group, decision tree, random forest, and boosting were used. We describe the details of feature vector construction and the selection of the model and its parameters, including training and validation. We also present experimental results comparing training and classification error levels for the models employed.

Statistical Approach to Noisy Band Removal for Enhancement of HIRIS Image Classification

  • Huan, Nguyen Van;Kim, Hak-Il
    • Proceedings of the KSRS Conference / 2008.03a / pp.195-200 / 2008
  • The accuracy of classifying pixels in HIRIS images is usually degraded by noisy bands, since noisy bands may deform the typical shape of the spectral reflectance. This paper proposes a statistical method for noisy band removal that mainly makes use of the correlation coefficients between bands. Considering each band as a random variable, the correlation coefficient measures the strength and direction of the linear relationship between two random variables. While the correlation between two signal bands is high, the presence of a noisy band produces a low correlation due to ill-correlativeness and undirectedness. The correlation coefficient is applied as a measure for detecting noisy bands under a two-pass screening scheme. This method is independent of prior knowledge of the sensor or of the cause of the noise. The classification in this experiment uses the unsupervised k-nearest neighbor algorithm with the well-accepted Euclidean distance measure and the spectral angle mapper measure. This paper also proposes a hierarchical combination of these measures for spectral matching. Finally, a separability assessment based on the between-class and within-class scatter matrices is carried out to evaluate the performance.
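The two measures named in this abstract can be sketched in a few lines. The example below computes the Pearson correlation coefficient between bands and the spectral angle between two spectra. The flagging rule is an assumption made for illustration (a single pass that flags a band when its correlation with every adjacent band falls below a threshold); the paper's two-pass screening scheme is not described in the abstract, and the sample values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (nonzero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def flag_noisy_bands(bands, threshold):
    """Return indices of bands that correlate poorly with all adjacent bands.

    Simplified single-pass rule, assumed for illustration only.
    """
    flagged = []
    for i, band in enumerate(bands):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(bands)]
        best = max(pearson(band, bands[j]) for j in neighbours)
        if best < threshold:
            flagged.append(i)
    return flagged


def spectral_angle(s, t):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(a * b for a, b in zip(s, t))
    norms = math.sqrt(sum(a * a for a in s)) * math.sqrt(sum(b * b for b in t))
    return math.acos(max(-1.0, min(1.0, dot / norms)))  # clamp rounding error


# Each inner list holds one band's values over the same four pixels (made up).
bands = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.1],
    [3.0, 1.0, 4.0, 1.0],  # behaves unlike its neighbours
    [1.1, 2.0, 3.2, 4.0],
    [0.9, 2.1, 3.0, 4.2],
]
print(flag_noisy_bands(bands, threshold=0.9))
print(spectral_angle([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

The spectral angle ignores overall brightness, since scaling a spectrum does not change its direction, whereas Euclidean distance does not; that difference is one reason the two measures can usefully be combined.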