• Title/Summary/Keyword: Random Forest Algorithm

Machine Learning Based Intrusion Detection Systems for Class Imbalanced Datasets (클래스 불균형 데이터에 적합한 기계 학습 기반 침입 탐지 시스템)

  • Cheong, Yun-Gyung;Park, Kinam;Kim, Hyunjoo;Kim, Jonghyun;Hyun, Sangwon
    • Journal of the Korea Institute of Information Security & Cryptology / v.27 no.6 / pp.1385-1395 / 2017
  • This paper aims to develop an IDS (Intrusion Detection System) that takes class-imbalanced datasets into account. To this end, we first built a set of training datasets from the Kyoto 2006+ dataset in which the amounts of normal data and abnormal (intrusion) data are not balanced. We then ran a number of tests to evaluate the effectiveness of machine learning techniques for detecting intrusions. Our evaluation results demonstrated that the Random Forest algorithm achieved the best performance.
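
The abstract above does not say how the class imbalance was handled during training. As a minimal sketch of one common option, inverse-frequency class weighting (the "balanced" heuristic, n_samples / (n_classes * class_count)) can be computed as follows. The label names and the 900/100 split are invented for illustration and are not taken from the Kyoto 2006+ dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced label set (illustrative only, not the paper's data).
labels = ["normal"] * 900 + ["attack"] * 100
weights = balanced_class_weights(labels)
print(weights)
```

The rarer class receives the larger weight, so errors on intrusion samples count for more during training.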

Prediction of dynamic soil properties coupled with machine learning algorithms

  • Dae-Hong Min;Hyung-Koo Yoon
    • Geomechanics and Engineering / v.37 no.3 / pp.253-262 / 2024
  • Dynamic properties are pivotal in soil analysis, yet their experimental determination is hampered by complex methodologies and the need for costly equipment. This study aims to predict dynamic soil properties from static properties that are relatively easy to obtain, employing machine learning techniques. The static properties considered include soil cohesion, friction angle, water content, specific gravity, and compressional strength. In contrast, the dynamic properties of interest are the velocities of compressional and shear waves. Data for this study are sourced from 26 boreholes, as detailed in a geotechnical investigation report database, comprising a total of 130 data points. An importance analysis, grounded in the random forest algorithm, is conducted to evaluate the significance of each dynamic property. This analysis informs the prediction of dynamic properties, prioritizing those static properties identified as most influential. The efficacy of these predictions is quantified using the coefficient of determination, which indicates exceptionally high reliability, with values reaching 0.99 in both training and testing phases when all input properties are considered. The conventional method of predicting dynamic properties from the Standard Penetration Test (SPT) is also applied, and its outcomes are compared with those of this technique. The error ratio decreased by approximately 0.95, thereby validating its reliability. This research marks a significant advancement in the indirect estimation of the relationship between static and dynamic soil properties through the application of machine learning techniques.
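
The coefficient of determination mentioned in the abstract above is conventionally defined as R^2 = 1 - SS_res / SS_tot. A minimal sketch of that calculation follows. The velocity values are invented for illustration and are not data from the paper, and the function name `r_squared` is an assumption.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical shear-wave velocities in m/s (illustrative only).
observed = [180.0, 220.0, 260.0, 310.0]
predicted = [185.0, 215.0, 265.0, 305.0]
print(round(r_squared(observed, predicted), 4))
```

A value of 1 means the predictions match the observations exactly, while a model that always predicts the mean scores 0.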

Crowdfunding Scams: The Profiles and Language of Deceivers

  • Lee, Seung-hun;Kim, Hyun-chul
    • Journal of the Korea Society of Computer and Information / v.23 no.3 / pp.55-62 / 2018
  • In this paper, we propose a model to detect crowdfunding scams, which have reportedly been occurring over the last several years, based on their project information and linguistic features. To this end, we first collect and analyze crowdfunding scam projects, and then reveal which specific project-related information and linguistic features are particularly useful in distinguishing scam projects from non-scams. Our proposed model, built with the selected features and the Random Forest machine learning algorithm, can successfully detect scam campaigns with 84.46% accuracy.

Contrast Media Side Effects Prediction Study using Artificial Intelligence Technique (인공지능 기법을 이용한 조영제 부작용 예측 연구)

  • Sang-Hyun Kim
    • Journal of the Korean Society of Radiology / v.17 no.3 / pp.423-431 / 2023
  • The purpose of this study is to analyze the factors affecting the classification of the severity of contrast media side effects based on the patient's body information using artificial intelligence techniques, to be used as basic data for reducing the severity of contrast medium side effects. The data comprised 606 examinees who reported no contrast medium side effects in the past-history survey, among 1,235 cases of contrast medium side effects from 58,000 CT scans performed at a general hospital in Seoul. Of the 606 records, 70% were used as a training set and the remaining 30% as a test set for validation. Age, BMI (Body Mass Index), GFR (Glomerular Filtration Rate), BUN (Blood Urea Nitrogen), GGT (Gamma-Glutamyl Transferase), AST (Aspartate Aminotransferase), and ALT (Alanine Aminotransferase) were used as independent variables, and the severity of contrast media side effects was used as the target variable. AUC (Area Under the Curve), CA (Classification Accuracy), F1, Precision, and Recall were computed for the AdaBoost, Tree, Neural Network, SVM, and Random Forest algorithms. AdaBoost and Random Forest showed the highest evaluation scores among the classification algorithms. The most influential factors in the predictions of all models were GFR, BMI, and GGT. It was found that differences in the amount of contrast media injected according to renal filtration function and obesity, and the presence or absence of metabolic syndrome, affected the severity of contrast medium side effects.
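
The evaluation indices named in the abstract above (CA, Precision, Recall, F1) can all be derived from confusion-matrix counts. The sketch below assumes a binary simplification of the severity target, which the abstract does not specify. The labels are invented and the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = severe side effect, 0 = mild (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For a multi-class severity scale, the same counts would be computed per class and then averaged. AUC is omitted here because it requires predicted scores rather than hard labels.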

Hybrid genetic-paired-permutation algorithm for improved VLSI placement

  • Ignatyev, Vladimir V.;Kovalev, Andrey V.;Spiridonov, Oleg B.;Kureychik, Viktor M.;Ignatyeva, Alexandra S.;Safronenkova, Irina B.
    • ETRI Journal / v.43 no.2 / pp.260-271 / 2021
  • This paper addresses very large-scale integration (VLSI) placement optimization, which is important because of the rapid development of VLSI design technologies. The goal of this study is to develop a hybrid algorithm for VLSI placement. The proposed algorithm includes a sequential combination of a genetic algorithm and an evolutionary algorithm. It is commonly known that local search algorithms, such as random forest, hill climbing, and variable neighborhoods, can be effectively applied to NP-hard problem-solving. They provide improved solutions, which are obtained after a global search. The scientific novelty of this research is based on the development of systems, principles, and methods for creating a hybrid (combined) placement algorithm. The principal difference in the proposed algorithm is that it obtains a set of alternative solutions in parallel and then selects the best one. Nonstandard genetic operators, based on problem knowledge, are used in the proposed algorithm. An investigational study shows an objective-function improvement of 13%. The time complexity of the hybrid placement algorithm is O(N^2).

Vehicle Headlight and Taillight Recognition in Nighttime using Low-Exposure Camera and Wavelet-based Random Forest (저노출 카메라와 웨이블릿 기반 랜덤 포레스트를 이용한 야간 자동차 전조등 및 후미등 인식)

  • Heo, Duyoung;Kim, Sang Jun;Kwak, Choong Sub;Nam, Jae-Yeal;Ko, Byoung Chul
    • Journal of Broadcast Engineering / v.22 no.3 / pp.282-294 / 2017
  • In this paper, we propose a novel intelligent headlight control (IHC) system that is robust to various road lights and to camera movement caused by vehicle driving. To detect candidate light blobs, the region of interest (ROI) is divided into a front ROI (FROI) and a back ROI (BROI) by considering the camera geometry based on a perspective range estimation model. Then, light blobs such as vehicle headlights and taillights, reflected light, and the surrounding road lighting are segmented using two different adaptive thresholding methods. Among the segmented blobs, taillights are detected first using a redness check and a random forest classifier based on Haar-like features. For headlight and taillight classification, we use the random forest instead of the popular support vector machine or convolutional neural networks, to support fast learning and testing in real-life applications. Pairing is performed using predefined geometric rules, such as vertical coordinate similarity and an association check between blobs. The proposed algorithm was successfully applied to various night-time driving sequences, and the results show that it performs better than recent related works.

Prediction of Customer Satisfaction Using RFE-SHAP Feature Selection Method (RFE-SHAP을 활용한 온라인 리뷰를 통한 고객 만족도 예측)

  • Olga Chernyaeva;Taeho Hong
    • Journal of Intelligence and Information Systems / v.29 no.4 / pp.325-345 / 2023
  • In the rapidly evolving domain of e-commerce, our study presents a cohesive approach to enhance customer satisfaction prediction from online reviews, aligning methodological innovation with practical insights. We integrate the RFE-SHAP feature selection with LDA topic modeling to streamline predictive analytics in e-commerce. This integration facilitates the identification of key features, specifically narrowing down from an initial set of 28 to an optimal subset of 14 features for the Random Forest algorithm. Our approach strategically mitigates the common issue of overfitting in models with an excess of features, leading to an improved accuracy rate of 84% in our Random Forest model. Central to our analysis is the understanding that certain aspects in review content, such as quality, fit, and durability, play a pivotal role in influencing customer satisfaction, especially in the clothing sector. We delve into explaining how each of these selected features impacts customer satisfaction, providing a comprehensive view of the elements most appreciated by customers. Our research makes significant contributions in two key areas. First, it enhances predictive modeling within the realm of e-commerce analytics by introducing a streamlined, feature-centric approach. This refinement in methodology not only bolsters the accuracy of customer satisfaction predictions but also sets a new standard for handling feature selection in predictive models. Second, the study provides actionable insights for e-commerce platforms, especially those in the clothing sector. By highlighting which aspects of customer reviews (such as quality, fit, and durability) most influence satisfaction, we offer a strategic direction for businesses to tailor their products and services.

Application study of random forest method based on Sentinel-2 imagery for surface cover classification in rivers - A case of Naeseong Stream - (하천 내 지표 피복 분류를 위한 Sentinel-2 영상 기반 랜덤 포레스트 기법의 적용성 연구 - 내성천을 사례로 -)

  • An, Seonggi;Lee, Chanjoo;Kim, Yongmin;Choi, Hun
    • Journal of Korea Water Resources Association / v.57 no.5 / pp.321-332 / 2024
  • Understanding the status of surface cover in riparian zones is essential for river management and flood disaster prevention. Traditional survey methods rely on expert interpretation of vegetation through vegetation mapping or indices. However, these methods are limited in their ability to accurately reflect dynamically changing river environments. Against this backdrop, this study applied the Random Forest method to satellite imagery to assess the distribution of vegetation in rivers over multiple years, focusing on the Naeseong Stream as a case study. Remote sensing data from Sentinel-2 imagery were combined with ground truth data on the Naeseong Stream surface cover in 2016. For the Random Forest machine learning algorithm, 1,000 samples per surface cover class were extracted from ten predetermined sampling areas and used for training, followed by validation. A sensitivity analysis, annual surface cover analysis, and accuracy assessment were conducted to evaluate the method's applicability. The results showed an accuracy of 85.1% based on the validation data. The sensitivity analysis indicated the highest efficiency with 30 trees, 800 samples, and the downstream river section. The surface cover analysis accurately reflected the actual river environment. The accuracy analysis identified 14.9% boundary and internal errors, with high accuracy observed in six categories, excluding scattered and herbaceous vegetation. Because this study focused on a single river, the surface cover classification method needs to be applied to multiple rivers to obtain more accurate and comprehensive data.

Evaluation of Hemiplegic Gait Using Accelerometer (가속도센서를 이용한 편마비성보행 평가)

  • Lee, Jun Seok;Park, Sooji;Shin, Hangsik
    • The Transactions of The Korean Institute of Electrical Engineers / v.66 no.11 / pp.1634-1640 / 2017
  • The study aims to distinguish hemiplegic gait from normal gait using a simple wearable device and a classification algorithm. To this end, we developed a wearable system equipped with a three-axis accelerometer and a three-axis gyroscope. The developed wearable system was verified in a clinical experiment in which twenty-one normal subjects and twenty-one patients undergoing stroke treatment participated. Based on the measured inertial signals, a random forest algorithm was used to classify hemiplegic gait. Four-fold cross-validation was applied to ensure the reliability of the results. To select optimal attributes, we applied the forward search algorithm with 10 repetitions, and the five most frequently selected attributes were chosen as the final attribute set. The results showed 95.2% accuracy in classifying hemiplegic gait versus normal gait and 77.4% accuracy in classifying hemiplegic-side versus normal gait.
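
The four-fold cross-validation described in the abstract above can be sketched as a simple index split. The abstract does not state whether folds were formed per subject or per gait sample, nor whether they were stratified. The sketch assumes one sample per subject (42 in total) and an unstratified random shuffle. The function name `k_fold_splits` and the seed are hypothetical.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 21 normal subjects + 21 patients = 42 samples, four folds.
splits = list(k_fold_splits(42, 4))
for train, test in splits:
    print(len(train), len(test))
```

Each sample appears in exactly one test fold, so every subject is used for validation once and for training three times.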

Electrical fire prediction model study using machine learning (기계학습을 통한 전기화재 예측모델 연구)

  • Ko, Kyeong-Seok;Hwang, Dong-Hyun;Park, Sang-June;Moon, Ga-Gyeong
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.11 no.6 / pp.703-710 / 2018
  • Although various efforts, such as accident analysis and inspection, are made every year to reduce electric fire accidents, there is no effective countermeasure because of the lack of an effective decision support system and of methods for utilizing the accumulated data. The purpose of this study is to develop an algorithm for predicting electric fires based on data such as electric safety inspection data, electric fire accident information, building information, and weather information. Through pre-processing, convergence, analysis, modeling, and verification of the data collected from institutions such as the Korea Electrical Safety Corporation, the Meteorological Administration, the Ministry of Land, Infrastructure, and Transport, and the Fire Defense Headquarters, we derive the factors influencing electric fires and develop prediction models. The influencing factors identified were insulation resistance value, humidity, wind speed, building deterioration (aging), floor space ratio, building coverage ratio, and building use. The accuracy of the prediction model using the random forest algorithm was 74.7%.