• Title/Summary/Keyword: Random Forest Classification

Machine Learning-Based Rapid Prediction Method of Failure Mode for Reinforced Concrete Column (기계학습 기반 철근콘크리트 기둥에 대한 신속 파괴유형 예측 모델 개발 연구)

  • Kim, Subin;Oh, Keunyeong;Shin, Jiuk
    • Journal of the Earthquake Engineering Society of Korea, v.28 no.2, pp.113-119, 2024
  • In existing reinforced concrete buildings with seismically deficient column details, the overall behavior depends on the failure type of the columns. This study aims to develop and validate a machine learning-based prediction model for column failure modes (shear, flexure-shear, and flexure). For this purpose, artificial neural network (ANN), K-nearest neighbor (KNN), decision tree (DT), and random forest (RF) models were trained on previously collected experimental data. With these four methods, we developed classification models that predict the column failure mode from six input variables: concrete compressive strength, steel yield strength, axial load ratio, height-to-depth aspect ratio, longitudinal reinforcement ratio, and transverse reinforcement ratio. The performance of each model was compared and verified by calculating accuracy, precision, recall, F1-score, and ROC. The RF model achieved the highest average value across these performance measures among the considered methods, and it predicts the shear failure mode conservatively. Thus, the RF model can rapidly predict column failure modes from simple column details.
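As a rough illustration of the metrics named in this abstract, the sketch below computes accuracy and per-class precision, recall, and F1-score for the three failure modes. The function names and the sample labels are hypothetical and are not taken from the paper; only the class names come from the abstract.

```python
from collections import Counter

# Failure-mode labels named in the abstract; the sample data below is made up.
CLASSES = ["shear", "flexure-shear", "flexure"]


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """Return {class: (precision, recall, f1)} using one-vs-rest counts."""
    pairs = Counter(zip(y_true, y_pred))
    metrics = {}
    for c in classes:
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = (precision, recall, f1)
    return metrics


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical test-set labels and predictions (not from the paper).
y_true = ["shear", "shear", "flexure-shear", "flexure-shear", "flexure", "flexure"]
y_pred = ["shear", "shear", "shear", "flexure-shear", "flexure", "flexure-shear"]

metrics = per_class_metrics(y_true, y_pred)
for name, (p, r, f1) in metrics.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy(y_true, y_pred):.2f}")
```

In this made-up sample the shear class has full recall but lower precision, which is one way a classifier can err on the conservative side for the most brittle failure mode.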

Classification Analysis for Unbalanced Data (불균형 자료에 대한 분류분석)

  • Kim, Dongah;Kang, Suyeon;Song, Jongwoo
    • The Korean Journal of Applied Statistics, v.28 no.3, pp.495-509, 2015
  • We study the unbalanced classification problem, in which the proportions of two groups differ significantly. It is usually more difficult to classify classes accurately in unbalanced data than in balanced data. When classification methods are applied to unbalanced data, most observations are likely to be assigned to the larger group because doing so minimizes the misclassification loss. However, misclassifying the smaller group as the larger group can cause a bigger loss in most real applications. We compare several classification methods for unbalanced data using sampling techniques (up-sampling and down-sampling). We also check the total loss of different classification methods when an asymmetric loss is applied to simulated and real data. We use the misclassification rate, G-mean, ROC, and AUC (area under the curve) for the performance comparison.
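The point about the larger group dominating can be sketched numerically. The example below, built on made-up data, shows a classifier that always predicts the majority class: its misclassification rate looks small, yet its G-mean (the geometric mean of sensitivity and specificity) is zero. The `upsample` helper is a hypothetical, minimal form of random up-sampling and is not the authors' procedure.

```python
import math
import random


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for a binary problem."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def upsample(rows, labels, minority, seed=0):
    """Randomly duplicate minority rows until both classes are the same size."""
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    majority_count = len(labels) - len(minority_rows)
    extra = [rng.choice(minority_rows)
             for _ in range(majority_count - len(minority_rows))]
    return rows + extra, labels + [minority] * len(extra)


# Made-up data: 95 majority (0) and 5 minority (1) observations.
y_true = [0] * 95 + [1] * 5
always_majority = [0] * 100

misclassification_rate = (
    sum(t != p for t, p in zip(y_true, always_majority)) / len(y_true)
)
score = g_mean(y_true, always_majority)
print(misclassification_rate, score)

rows = [[float(i)] for i in range(100)]
balanced_rows, balanced_labels = upsample(rows, y_true, minority=1)
print(balanced_labels.count(0), balanced_labels.count(1))
```

Down-sampling would instead discard majority rows until the class sizes match; which of the two works better depends on the data and the loss, which is what the study compares.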

An Application of Support Vector Machines to Customer Loyalty Classification of Korean Retailing Company Using R Language

  • Nguyen, Phu-Thien;Lee, Young-Chan
    • The Journal of Information Systems, v.26 no.4, pp.17-37, 2017
  • Purpose: Customer loyalty is the most important factor in customer relationship management (CRM), especially in the retailing industry, where customers have many options for where to spend their money. Classifying loyal customers from customer data can help retailing companies build more efficient marketing strategies and gain competitive advantages. This study aims to construct classification models that distinguish the loyal customers of a Korean retailing company using data mining techniques in the R language. Design/methodology/approach: To classify retailing customers, we used a combination of support vector machines (SVMs) and other machine learning (ML) classification algorithms, supported by recursive feature elimination (RFE). In particular, we first cleaned the dataset to remove outliers and impute missing values. We then used an RFE framework to select the most significant predictors. Finally, we constructed models with the classification algorithms, tuned their parameters, and compared their performance. Findings: The results reveal that ML classification techniques can work well with CRM data in the Korean retailing industry. Moreover, customer loyalty is affected not only by a single factor such as the net promoter score but also by purchase habits such as a preference for expensive goods or visits to multiple branches. We also show that, on the retailing customer dataset, the model constructed with the SVM algorithm performed better than the others. We expect that the models in this study can be used by other retailing companies to classify their customers so that they can focus their services on this potential VIP group. We also hope that the results of these ML algorithms in the R language will be useful to other researchers in selecting appropriate ML algorithms.

Analysis of Feature Importance of Ship's Berthing Velocity Using Classification Algorithms of Machine Learning (머신러닝 분류 알고리즘을 활용한 선박 접안속도 영향요소의 중요도 분석)

  • Lee, Hyeong-Tak;Lee, Sang-Won;Cho, Jang-Won;Cho, Ik-Soon
    • Journal of the Korean Society of Marine Environment & Safety, v.26 no.2, pp.139-148, 2020
  • The most important factor affecting the berthing energy generated when a ship berths is the berthing velocity. Thus, an accident may occur if the berthing velocity is extremely high. Several ship features influence the determination of the berthing velocity. However, previous studies have mostly focused on the size of the vessel. Therefore, the aim of this study is to analyze various features that influence berthing velocity and determine their respective importance. The data used in the analysis were based on the berthing velocity of ships at a jetty in Korea. Using the collected data, machine learning classification algorithms (decision tree, random forest, logistic regression, and perceptron) were compared and analyzed. The algorithms were evaluated with indexes derived from the confusion matrix. Perceptron demonstrated the best performance, and the feature importance was in the following order: DWT, jetty number, and state. Hence, when berthing a ship, the berthing velocity should be determined in consideration of various features, such as the size of the ship, position of the jetty, and loading condition of the cargo.

Identification of Bird Community Characteristics by Habitat Environment of Jeongmaek Using Self-organizing Map - Case Study Area Geumnamhonam and Honam, Hannamgeumbuk and Geumbuk, Naknam Jeongmaek, South Korea - (자기조직화지도를 활용한 정맥의 서식지 환경에 따른 조류 군집 특성 파악 - 금남호남 및 호남정맥, 한남금북 및 금북정맥, 낙남정맥을 대상으로 -)

  • Hwang, Jong-Kyeong;Kang, Te-han;Han, Seung-Woo;Cho, Hae-Jin;Nam, Hyung-Kyu;Kim, Su-Jin;Lee, Joon-Woo
    • Korean Journal of Environment and Ecology, v.35 no.4, pp.377-386, 2021
  • This study was conducted to provide basic data for habitat management and preservation of Jeongmaek. A total of 18 priority research areas were selected in consideration of terrain and habitat environment, and 54 fixed plots were selected for three habitat types: development, valley, and forest road and ridge. The survey was conducted in each season (May, August, and October), excluding the winter season, from 2016 to 2018. The distribution analysis of birds observed in each habitat type using a self-organizing map (SOM) classified them into a total of four groups (MRPP, A = 0.12, p < 0.005). The comparative analysis of the number of species, the number of individuals, and the species diversity index for each SOM group showed that all three were highest in group III (Kruskal-Wallis; number of species: χ² = 13.436, P < 0.005; number of individuals: χ² = 8.229, P < 0.05; species diversity index: χ² = 17.115, P < 0.005). Moreover, applying the land cover map to the random forest model and examining the indicator species of each group to identify the characteristics of the habitat environment showed differences in the ratio of habitat environments and in the indicator species among the four groups. The indicator species analysis identified a total of 18 bird species as indicator species in three groups, excluding group II. When the random forest model and indicator species analysis were applied to the four groups obtained with the SOM, the composition of the indicator species in each group was correlated with the habitat characteristics of that group. Moreover, the distribution patterns and densities of observed species were clearly distinguished according to the dominant habitat of each group. Applying the SOM, indicator species analysis, and random forest model together can yield useful results for characterizing bird habitats according to the habitat environment.
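The Kruskal-Wallis comparison reported in this abstract ranks all observations together and tests whether the rank sums differ between groups. The sketch below computes the H statistic (the value reported as χ²) in its basic form, with average ranks for ties but without the tie-correction factor. The function name and the counts are made up for illustration and do not reproduce the study's data.

```python
def kruskal_wallis_h(groups):
    """H statistic from pooled ranks (average ranks for ties, no tie correction)."""
    # Pool every value together with the index of the group it came from.
    pooled = sorted((v, gi) for gi, values in enumerate(groups) for v in values)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for _, gi in pooled[i:j]:
            rank_sums[gi] += avg_rank
        i = j
    weighted = sum(r * r / len(grp) for r, grp in zip(rank_sums, groups))
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Made-up species counts per plot for three hypothetical groups.
species_counts = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(species_counts)
print(round(h, 3))
```

H is compared against a chi-squared distribution with (number of groups - 1) degrees of freedom to obtain the P value.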

Data Mining-Aided Automatic Landslide Detection Using Airborne Laser Scanning Data in Densely Forested Tropical Areas

  • Mezaal, Mustafa Ridha;Pradhan, Biswajeet
    • Korean Journal of Remote Sensing, v.34 no.1, pp.45-74, 2018
  • Landslides are a natural hazard that threatens lives and property in many areas around the world. Landslides are difficult to recognize, particularly in rainforest regions. Thus, an accurate, detailed, and updated inventory map is required for landslide susceptibility, hazard, and risk analyses. The inconsistency in the results obtained with different feature selection techniques in the literature has highlighted the importance of evaluating these techniques. Thus, in this study, six feature selection techniques were evaluated. Very-high-resolution LiDAR point clouds and orthophotos were acquired simultaneously in a rainforest area of Cameron Highlands, Malaysia, by airborne laser scanning (LiDAR). A fuzzy-based segmentation parameter (FbSP) optimizer was used to optimize the segmentation parameters. Training samples were evaluated using a stratified random sampling method and set to 70% training samples. Two machine-learning algorithms, namely, Support Vector Machine (SVM) and Random Forest (RF), were used to evaluate the performance of each feature selection algorithm. The overall accuracies of the SVM and RF models revealed that three of the six algorithms ranked higher in landslide detection. Results indicated that the classification accuracies of the RF classifier were higher than those of the SVM classifier using either all features or only the optimal features. The proposed techniques performed well in detecting landslides in a rainforest area of Malaysia, and these techniques can be easily extended to similar regions.
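Stratified random sampling with a 70% training share, as mentioned in this abstract, draws the same fraction from each class so that the class ratio is preserved in both subsets. The following is a minimal sketch of that idea; the function name, the seed, and the label counts are assumptions for illustration, not details from the study.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=42):
    """Return (train_idx, test_idx), sampling train_fraction within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random order within the class
        cut = round(train_fraction * len(indices))
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


# Made-up labels for image segments: 20 landslide, 80 non-landslide.
labels = ["landslide"] * 20 + ["non-landslide"] * 80
train_idx, test_idx = stratified_split(labels)
train_landslides = sum(labels[i] == "landslide" for i in train_idx)
print(len(train_idx), len(test_idx), train_landslides)
```

A plain random split could by chance leave very few landslide segments in the training set; splitting per class avoids that.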

The Analysis of the Activity Patterns of Dog with Wearable Sensors Using Machine Learning

  • Hussain, Ali;Ali, Sikandar;Kim, Hee-Cheol
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2021.05a, pp.141-143, 2021
  • The activity patterns of animal species are difficult to assess, and the behavior of freely moving individuals cannot be assessed by direct observation. Understanding the activity patterns of animals such as dogs and cats has therefore become a major challenge. One approach for monitoring these behaviors is the continuous collection of data by human observers. In this study, we instead assess the activity patterns of dogs using data from wearable sensors, namely an accelerometer and a gyroscope. A wearable, sensor-based system is suitable for this purpose and can monitor dogs in real time. The basic purpose of this study was to develop a system that can detect activities based on the accelerometer and gyroscope signals. We propose a method based on data collected from 10 dogs of nine different breeds, of different sizes and ages, and of both sexes. We applied six state-of-the-art classifiers: random forest (RF), support vector machine (SVM), gradient boosting machine (GBM), XGBoost, k-nearest neighbors (KNN), and decision tree. The random forest showed a good classification result, achieving an accuracy of 86.73% in detecting the activities.

Discriminant analysis of grain flours for rice paper using fluorescence hyperspectral imaging system and chemometric methods

  • Seo, Youngwook;Lee, Ahyeong;Kim, Bal-Geum;Lim, Jongguk
    • Korean Journal of Agricultural Science, v.47 no.3, pp.633-644, 2020
  • Rice paper is an element of Vietnamese cuisine that can be used to wrap vegetables and meat. Rice and starch are the main ingredients of rice paper, and their mixing ratio is important for quality control. In a commercial factory, assessment of food safety and quantitative supply is a challenging issue. A rapid and non-destructive monitoring system is therefore necessary in commercial production systems to ensure the food safety of rice and starch flour for the rice paper wrap. In this study, fluorescence hyperspectral imaging technology was applied to classify grain flours. Using the 3D hypercube of fluorescence hyperspectral imaging (fHSI, 420 - 730 nm), spectral and spatial data and chemometric methods were applied to detect and classify flours. Eight flours (rice: 4, starch: 4) were prepared, and hyperspectral images were acquired in a 5 (L) × 5 (W) × 1.5 (H) cm container. Linear discriminant analysis (LDA), partial least squares discriminant analysis (PLSDA), support vector machine (SVM), classification and regression tree (CART), and random forest (RF) with a few preprocessing methods (multivariate scatter correction [MSC], 1st and 2nd derivative, and moving average) were applied to classify grain flours, and the accuracy was compared using a confusion matrix (accuracy and kappa coefficient). LDA with moving average showed the highest accuracy at A = 0.9362 (K = 0.9270). A 1D convolutional neural network (CNN) demonstrated a classification result of A = 0.94 and showed improved classification results between mimyeon flour (MF)1 and MF2 of 0.72 and 0.87, respectively. In this study, the potential of non-destructive detection and classification of grain flours using fHSI technology and machine learning methods was demonstrated.
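The accuracy (A) and kappa coefficient (K) used in this abstract can both be read off a confusion matrix. The sketch below computes overall accuracy and Cohen's kappa, which discounts the agreement expected by chance. The function name and the 2x2 matrix are made up for illustration and are not data from the study.

```python
def accuracy_and_kappa(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Made-up 2x2 confusion matrix (rice vs. starch), not data from the study.
confusion = [[45, 5],
             [10, 40]]
acc, kappa = accuracy_and_kappa(confusion)
print(round(acc, 4), round(kappa, 4))
```

Kappa is undefined when the expected agreement equals 1 (for example, when every sample falls in a single class), so a production version would need to guard against that division by zero.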

Adversarial Example Detection and Classification Model Based on the Class Predicted by Deep Learning Model (데이터 예측 클래스 기반 적대적 공격 탐지 및 분류 모델)

  • Ko, Eun-na-rae;Moon, Jong-sub
    • Journal of the Korea Institute of Information Security & Cryptology, v.31 no.6, pp.1227-1236, 2021
  • Adversarial attacks, one type of attack on deep learning classification models, add indistinguishable perturbations to input data and cause the model to misclassify it. There are various adversarial attack algorithms. Accordingly, many studies have been conducted to detect adversarial attacks, but few have classified which attack algorithm generated an adversarial input. If adversarial attacks can be classified, a more robust deep learning classification model can be built by analyzing the differences between attacks. In this paper, we propose a model that detects and classifies adversarial attacks by constructing a random forest classification model with input features extracted from a target deep learning model. Features are extracted from the output values of a hidden layer based on the class predicted by the target deep learning model. In experiments, the proposed model achieved 3.02% higher accuracy on clean data and 0.80% higher accuracy on adversarial data than previous studies, and it classified a new adversarial attack that previous studies had not classified.

Machine Learning Methods to Predict Vehicle Fuel Consumption

  • Ko, Kwangho
    • Journal of the Korea Society of Computer and Information, v.27 no.9, pp.13-20, 2022
  • This study proposes and analyzes machine learning (ML) models that predict vehicle fuel consumption (FC) in real time. Test driving was carried out with a car to measure vehicle speed, acceleration, road gradient, and FC for the training dataset. Various ML models were trained with speed, acceleration, and road gradient as features and FC as the target. Two kinds of ML models were used: regression models (linear regression and k-nearest neighbors regression) and classification models (k-nearest neighbors classifier, logistic regression, decision tree, random forest, and gradient boosting). The prediction accuracy for real-time FC is low, in the range of 0.5 ~ 0.6, and the classification models are more accurate than the regression models. The prediction error for total FC is very low, about 0.2 ~ 2.0%, and the regression models are more accurate than the classification models. This is attributed to the coefficient of determination (R2) used as the accuracy score: as the coefficient decreases, the predicted values are distributed around the mean of the targets. Therefore, regression models are suitable for total FC prediction and classification models are suitable for real-time FC prediction.
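The contrast between a low real-time score and a low total-FC error can be illustrated with the coefficient of determination. In the sketch below, a model that always predicts the mean of the targets has R2 = 0 for the individual samples, yet its summed prediction matches the measured total exactly. The function names and the sample values are made up for illustration and are not measurements from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def total_error_percent(y_true, y_pred):
    """Relative error of the summed prediction, in percent."""
    return abs(sum(y_pred) - sum(y_true)) / sum(y_true) * 100


# Made-up instantaneous fuel-consumption samples (arbitrary units).
measured = [1.0, 3.0, 2.0, 4.0]
mean_only = [2.5, 2.5, 2.5, 2.5]  # a model that always predicts the mean

r2 = r_squared(measured, mean_only)
total_err = total_error_percent(measured, mean_only)
print(r2, total_err)
```

Errors on individual samples cancel when they are summed, which is one reason a regression model with a modest per-sample score can still estimate the total well.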