• Title/Summary/Keyword: XGBoost algorithm

Anomaly CAN Message Detection Using Heuristics and XGBoost (휴리스틱과 XGBoost 를 활용한 비정상 CAN 메시지 탐지)

  • Se-Rin Kim;Beom-Heon Youn;Hark-Su Cho
    • Proceedings of the Korea Information Processing Society Conference / 2024.05a / pp.362-363 / 2024
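  • As vehicles become more networked and connected, design-level vulnerabilities of the CAN (Controller Area Network) bus are emerging as security threats. Addressing these vulnerabilities and strengthening security calls for research on machine-learning-based intrusion detection systems. This paper proposes an anomaly classification methodology using XGBoost. In experiments on a dataset developed by the Hacking and Countermeasure Research Lab at Korea University, the initial model reached 96% accuracy. Adding TimeDiff (the interval between occurrences) and DataDiff (the difference value of the bytes) to the model raised accuracy by 3%. As future work, the authors plan to apply more sophisticated machine learning algorithms and data preprocessing techniques to develop a finer-grained model, and to use a vendor's CAN Database for more accurate data analysis. This is expected to enable a more reliable automotive network security system.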

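The TimeDiff and DataDiff features named in the abstract above can be sketched roughly as follows. The abstract does not say how they are computed, so this is only an illustration under stated assumptions: differences are taken against the previous message with the same CAN ID, the first message of each ID gets zeros, and the function and field names (`add_diff_features`, `time_diff`, `data_diff`) are hypothetical.

```python
from typing import Dict, List, Tuple

Message = Tuple[float, int, bytes]  # (timestamp in seconds, CAN ID, payload)


def add_diff_features(messages: List[Message]) -> List[dict]:
    """Compute per-message TimeDiff and DataDiff features.

    Assumption: both differences are taken relative to the previous
    message that carried the same CAN ID.
    """
    last_seen: Dict[int, Tuple[float, bytes]] = {}
    rows = []
    for ts, can_id, data in messages:
        prev = last_seen.get(can_id)
        if prev is None:
            # First occurrence of this ID: no previous message to compare with.
            time_diff = 0.0
            data_diff = [0] * len(data)
        else:
            prev_ts, prev_data = prev
            time_diff = ts - prev_ts
            # Byte-wise difference between current and previous payload.
            data_diff = [cur - old for old, cur in zip(prev_data, data)]
        rows.append({"can_id": can_id, "time_diff": time_diff, "data_diff": data_diff})
        last_seen[can_id] = (ts, data)
    return rows


example = [
    (0.00, 0x100, bytes([1, 2])),
    (0.01, 0x200, bytes([5])),
    (0.02, 0x100, bytes([3, 2])),
]
features = add_diff_features(example)
```

The resulting rows could be flattened into numeric columns and passed to a gradient-boosted classifier alongside the raw fields.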

API Feature Based Ensemble Model for Malware Family Classification (악성코드 패밀리 분류를 위한 API 특징 기반 앙상블 모델 학습)

  • Lee, Hyunjong;Euh, Seongyul;Hwang, Doosung
    • Journal of the Korea Institute of Information Security & Cryptology / v.29 no.3 / pp.531-539 / 2019
  • This paper proposes training features for malware family analysis and analyzes the multi-class classification performance of ensemble models. We construct training data by extracting API and DLL information from malware executables and use the Random Forest and XGBoost algorithms, both of which are based on decision trees. API, API-DLL, and DLL-CM features for malware detection and family classification are proposed by analyzing the API and DLL information frequently used by malware and converting high-dimensional features to low-dimensional ones. The proposed feature selection method offers data dimension reduction and fast learning. In the performance comparison, the malware detection rate is 93.0% for Random Forest, the accuracy on the malware family dataset is 92.0% for XGBoost, and the false positive rate on the malware family dataset including benign samples is about 3.5% for both Random Forest and XGBoost.

Darknet Traffic Detection and Classification Using Gradient Boosting Techniques (Gradient Boosting 기법을 활용한 다크넷 트래픽 탐지 및 분류)

  • Kim, Jihye;Lee, Soo Jin
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.2 / pp.371-379 / 2022
  • Darknet is built on anonymity and security, which leads to its continuous abuse for various crimes and illegal activities. It is therefore very important to detect and classify darknet traffic to prevent such misuse. This work proposes a novel approach that uses Gradient Boosting techniques for darknet traffic detection and classification. The XGBoost and LightGBM algorithms achieve a detection accuracy of 99.99% and a classification accuracy of over 99%, which is more than 3% higher in detection accuracy and over 13% higher in classification accuracy than previous research. In particular, the LightGBM algorithm outperformed XGBoost in detecting and classifying darknet traffic by reducing the learning time by about 1.6 times and the hyperparameter tuning time by more than 10 times.

A Study on Resolving Barriers to Entry into the Resell Market by Exploring and Predicting Price Increases Using the XGBoost Model (XGBoost 모형을 활용한 가격 상승 요인 탐색 및 예측을 통한 리셀 시장 진입 장벽 해소에 관한 연구)

  • Yoon, HyunSeop;Kang, Juyoung
    • The Journal of Society for e-Business Studies / v.26 no.3 / pp.155-174 / 2021
  • This study focuses on the emergence of Resell investment in the fashion market, one of several emerging investment techniques. The market is growing rapidly worldwide, and a Resell craze is currently taking place throughout Korea. We therefore use shoe data from StockX, a representative Resell site, to present basic guidelines to consumers and to break down barriers to entry into the Resell market. The study first describes the current state of the Resell craze based on information from various media outlets, and then presents the state of the Resell market and the research model through prior research. Raw data was collected and analyzed using the XGBoost algorithm and the Prophet model. The analysis identified the factors that affect the Resell market as well as the shoes suitable for it. Furthermore, historical data on shoes allowed us to predict future prices and thereby future profitability. With this information, consumers unfamiliar with the market can participate in it actively. The study also provides a variety of vital information regarding Resell investments, thus forming a fundamental guideline for the market and contributing to addressing entry barriers.

Prediction of Track Quality Index (TQI) Using Vehicle Acceleration Data based on Machine Learning (차량가속도데이터를 이용한 머신러닝 기반의 궤도품질지수(TQI) 예측)

  • Choi, Chanyong;Kim, Hunki;Kim, Young Cheul;Kim, Sang-su
    • Journal of the Korean Geosynthetics Society / v.19 no.1 / pp.45-53 / 2020
  • The railway industry increasingly attempts predictive analysis of measurement data using machine learning techniques. In this paper, the Track Quality Index (TQI) was predicted from vehicle acceleration data using machine learning methods. XGB (XGBoost) was the most accurate, at 85% across all data sets. Unlike the SVM model, which uses a single algorithm, the RF and XGB models, which are ensemble systems, were considered to have good prediction performance. For the Surface TQI, the z-axis acceleration is shown to be highly related to the vertical direction, in good agreement with previous studies. Because accuracy may vary depending on the machine learning model applied, an ensemble model is appropriate for predicting the track quality index from vehicle vibration acceleration data.

Indoor positioning method using WiFi signal based on XGboost (XGboost 기반의 WiFi 신호를 이용한 실내 측위 기법)

  • Hwang, Chi-Gon;Yoon, Chang-Pyo;Kim, Dae-Jin
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.1 / pp.70-75 / 2022
  • Accurately measuring location is necessary to provide a variety of services. The data for indoor positioning consists of RSSI values measured from WiFi devices through a smartphone application. The measured data becomes the raw data for machine learning. The feature data is the measured RSSI value, and the label is the name of the space where the measurement was taken. The goal is a machine learning technique that predicts the exact location from the WiFi signal alone by applying an efficient classification method. Ensemble is a technique for obtaining more accurate predictions from multiple models than from a single model, and includes bagging and boosting. Among them, boosting is a technique that adjusts the weight of a model through a modeling result based on sampled data, and it has various algorithms. This study uses XGBoost among these techniques and compares its performance with other ensemble techniques.
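The feature and label layout described in the abstract above (RSSI values as features, the name of the space as the label) might be prepared along these lines. This is a minimal sketch, not the authors' pipeline: the fixed access-point ordering, the -100 dBm fill value for access points missing from a scan, and the names `to_feature_vector` and `build_dataset` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

MISSING_RSSI = -100  # assumed placeholder (dBm) for an AP not heard in a scan


def to_feature_vector(scan: Dict[str, int], ap_order: List[str]) -> List[int]:
    """Turn one WiFi scan (AP name -> RSSI) into a fixed-length feature row."""
    return [scan.get(ap, MISSING_RSSI) for ap in ap_order]


def build_dataset(
    samples: List[Tuple[Dict[str, int], str]], ap_order: List[str]
) -> Tuple[List[List[int]], List[str]]:
    """Split (scan, room name) pairs into a feature matrix X and labels y."""
    X = [to_feature_vector(scan, ap_order) for scan, _ in samples]
    y = [room for _, room in samples]
    return X, y


ap_order = ["ap1", "ap2", "ap3"]
samples = [
    ({"ap1": -40, "ap3": -70}, "lobby"),
    ({"ap2": -55, "ap3": -60}, "lab"),
]
X, y = build_dataset(samples, ap_order)
```

A classifier such as a boosted tree ensemble would then be trained on `X` with the room names in `y` encoded as class indices.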

Comparison of Machine Learning Model Performance based on Observation Methods using Naked-eye and Visibility-meter (머신러닝을 이용한 안개 예측 시 목측과 시정계 계측 방법에 따른 모델 성능 차이 비교)

  • Changhyoun Park;Soon-hwan Lee
    • Journal of the Korean earth science society / v.44 no.2 / pp.105-118 / 2023
  • In this study, we predicted the presence of fog with a one-hour delay using the XGBoost DART machine learning algorithm for Andong, which had the highest occurrence of fog among inland stations from 2016 to 2020. We used six datasets: meteorological data, agricultural observation data, additional derived data, and their expanded data. The weather phenomenon numbers obtained through naked-eye observations and the visibility distances measured by visibility meters were classified as fog [1] or no-fog [0]. We set up twelve machine learning modeling experiments and used data from 2021 for model validation. We mainly evaluated model performance using recall and AUC-ROC, considering the harmful effects of fog on society and local communities. The combination of oversampled meteorological data features and the target induced by weather phenomenon numbers showed the best performance. This result highlights the importance of naked-eye observations in predicting fog using machine learning algorithms.
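The two evaluation metrics named in the abstract above, recall and AUC-ROC, can be computed for fog [1] / no-fog [0] labels as in the sketch below. It uses the standard definitions (recall as TP / (TP + FN), AUC as the fraction of positive-negative pairs ranked correctly, with ties counted as one half); the function names and toy data are illustrative and not taken from the paper.

```python
from typing import List


def recall(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of actual fog cases (label 1) that were predicted as fog."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc_roc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the probability that a fog case outscores a no-fog case."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]   # predicted fog probabilities
y_pred = [1, 1, 0, 0]           # scores thresholded at 0.5
```

Recall is a natural primary metric when a missed fog event is costlier than a false alarm, which matches the motivation stated in the abstract.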

Development of machine learning model for reefer container failure determination and cause analysis with unbalanced data (불균형 데이터를 갖는 냉동 컨테이너 고장 판별 및 원인 분석을 위한 기계학습 모형 개발)

  • Lee, Huiwon;Park, Sungho;Lee, Seunghyun;Lee, Seungjae;Lee, Kangbae
    • Journal of the Korea Convergence Society / v.13 no.1 / pp.23-30 / 2022
  • Reefer container failures cause large financial losses, but the current reefer container alarm system is inefficient. Studies using simulation data of refrigeration systems exist, but studies using actual operation data from reefer containers are lacking. This study therefore classified the causes of failure using actual reefer container operation data. The actual data were imbalanced, and the imbalance problem was addressed by comparing logistic regression with ENN-SMOTE and class weights against the 2-stage algorithm developed in this study. The 2-stage algorithm uses XGboost, LGBoost, and DNN to separate faults from normal operation in the first stage and to classify the causes of faults in the second stage. The model using LGBoost in the 2-stage algorithm performed best, with 99.16% accuracy. This study proposes a final model based on the 2-stage algorithm to address data imbalance, which is thought to be applicable to other industries.
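The control flow of a two-stage classifier like the one described above can be sketched as follows: a first model decides fault versus normal, and only samples flagged as faults reach a second model that assigns a cause. The rule-based stand-ins, the feature names (`temp_error`, `compressor_on`), and the cause labels are hypothetical placeholders for trained models and real data, which the abstract does not detail.

```python
from typing import Callable, Dict, List

Sample = Dict[str, float]


def two_stage_predict(
    samples: List[Sample],
    is_fault: Callable[[Sample], bool],
    fault_cause: Callable[[Sample], str],
) -> List[str]:
    """Stage 1 filters normal samples; stage 2 labels the cause of each fault."""
    results = []
    for sample in samples:
        if is_fault(sample):
            results.append(fault_cause(sample))
        else:
            results.append("normal")
    return results


# Toy stand-ins for the trained stage-1 and stage-2 models (assumptions).
def toy_is_fault(sample: Sample) -> bool:
    return abs(sample["temp_error"]) > 3.0


def toy_fault_cause(sample: Sample) -> str:
    return "compressor" if sample["compressor_on"] == 0 else "sensor"


readings = [
    {"temp_error": 0.5, "compressor_on": 1},
    {"temp_error": 6.0, "compressor_on": 0},
    {"temp_error": -4.2, "compressor_on": 1},
]
labels = two_stage_predict(readings, toy_is_fault, toy_fault_cause)
```

One plausible benefit of this split is that the second-stage model is trained only on fault samples, so the dominant normal class does not swamp the rarer cause classes.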

A Study on the Real-Time Risk Analysis of Heavy-Snow according to the Characteristics of Traffic and Area (교통과 지역의 특성에 따른 대설의 실시간 피해 위험도 분석 연구)

  • KwangRim, Ha;YongCheol, Jung;JinYoung, Yoo;JunHee, Lee
    • Journal of Korea Society of Industrial Information Systems / v.27 no.6 / pp.77-93 / 2022
  • In this study, we present an algorithm that analyzes risk by reflecting regional characteristics for factors affected by direct and indirect damage from heavy snow. Factors affected by heavy-snow damage across 29 regions are selected as influencing variables, and the concept of sensitivity is derived from their relationship with the amount of damage. A snow damage risk prediction model was developed using a machine learning (XGBoost) algorithm, with weather conditions (snow cover, humidity, temperature) and sensitivity as independent variables and the risk derived from changes in those variables as the dependent variable.

A Study on the Prediction of Apartment Sale Price Using Machine Learning : Focused on the Collection of Internal and External Data and Price Prediction of Korean Apartments (기계학습을 이용한 아파트 매매가격 예측 연구 : 한국 아파트의 내·외적 데이터 수집과 가격 예측 중심으로)

  • Ju, Jeong-Min;Kang, Sun-Mee;Choi, Ji-Wung;Han, Youngwoo
    • Proceedings of the Korea Information Processing Society Conference / 2020.11a / pp.956-959 / 2020
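  • This study collects internal and external data that characterize apartments and uses artificial intelligence techniques to build a system that predicts apartment prices. Specifically, feature selection was performed on the variables of the internal and external apartment data collected through web crawling, and real estate price prediction models were developed using a variety of artificial intelligence techniques. Machine learning algorithms including Linear Regression, Ridge, Xgboost, Lightgbm, and Catboost were used to build the price prediction models, and RMSE was used to compare their performance. The best-performing model was the Xgboost-based one, with the lowest RMSE of about 0.0366 and an accuracy of about 95.1% on the test data.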
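The RMSE-based model comparison described above amounts to computing the root mean squared error of each model's predictions on the same targets and choosing the lowest. The sketch below shows that step only; the model names are taken from the abstract, but the prediction values are made-up toy numbers, not results from the paper.

```python
import math
from typing import Dict, List


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean squared error between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def best_model(y_true: List[float], predictions: Dict[str, List[float]]) -> str:
    """Return the name of the model with the lowest RMSE."""
    return min(predictions, key=lambda name: rmse(y_true, predictions[name]))


# Toy targets and predictions (illustrative values only).
y_true = [1.0, 2.0, 3.0]
predictions = {
    "Ridge": [1.0, 2.0, 5.0],
    "Xgboost": [1.0, 2.0, 3.5],
}
winner = best_model(y_true, predictions)
```

Because RMSE depends on the scale of the target, values such as the reported 0.0366 are only comparable between models evaluated on the same, identically scaled target variable.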