• Title/Summary/Keyword: Random forest algorithm

RFA: Recursive Feature Addition Algorithm for Machine Learning-Based Malware Classification

  • Byeon, Ji-Yun;Kim, Dae-Ho;Kim, Hee-Chul;Choi, Sang-Yong
    • Journal of the Korea Society of Computer and Information / v.26 no.2 / pp.61-68 / 2021
  • Recently, various technologies that use machine learning to classify malicious code have been studied. To make machine learning effective, it is most important to extract features that distinguish malicious code from normal binaries. In this paper, we propose a recursive feature extraction method for machine learning. The proposed method selects the final feature set by recursively evaluating individual features to maximize machine learning performance. Specifically, at each stage we extract the best-performing feature among the individual features and then combine the extracted features. We extract features with the proposed method and apply them to machine learning algorithms such as Decision Tree, SVM, Random Forest, and KNN to validate that machine learning performance improves as the stages continue.
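
The stage-by-stage selection described in this abstract resembles greedy forward selection, which can be sketched as follows. This is only an illustration under assumptions, not the authors' implementation: the feature names and the `toy_score` function are hypothetical stand-ins for real malware features and a cross-validated classifier score.

```python
from typing import Callable, List, Sequence, Tuple


def recursive_feature_addition(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
) -> Tuple[List[str], List[float]]:
    """Greedy forward selection: at each stage, add the single feature that
    gives the best score when combined with the features already chosen."""
    selected: List[str] = []
    history: List[float] = []
    remaining = list(candidates)
    best_so_far = float("-inf")
    while remaining:
        # Score every remaining feature together with the current subset.
        trial_scores = [(score(selected + [f]), f) for f in remaining]
        stage_best, stage_feature = max(trial_scores)
        if stage_best <= best_so_far:
            break  # no candidate improves the score, so stop adding
        selected.append(stage_feature)
        remaining.remove(stage_feature)
        best_so_far = stage_best
        history.append(stage_best)
    return selected, history


def toy_score(features: List[str]) -> float:
    """Hypothetical scorer standing in for a classifier's validation accuracy."""
    gains = {"entropy": 0.20, "imports": 0.15, "section_count": 0.05, "file_size": 0.0}
    # A small per-feature penalty makes uninformative features hurt the score.
    return 0.5 + sum(gains[f] for f in features) - 0.01 * len(features)


selected, history = recursive_feature_addition(
    ["file_size", "entropy", "imports", "section_count"], toy_score
)
print(selected, history)
```

In practice `score` would train and validate a model such as a random forest on the given feature subset; the stopping rule used here (stop when no candidate improves the score) is one possible choice among several.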

Short-Term Water Quality Prediction of the Paldang Reservoir Using Recurrent Neural Network Models (순환신경망 모델을 활용한 팔당호의 단기 수질 예측)

  • Jiwoo Han;Yong-Chul Cho;Soyoung Lee;Sanghun Kim;Taegu Kang
    • Journal of Korean Society on Water Environment / v.39 no.1 / pp.46-60 / 2023
  • Climate change causes fluctuations in water quality in the aquatic environment, which can cause changes in water circulation patterns and severe adverse effects on aquatic ecosystems in the future. Therefore, research is needed to predict and respond to water quality changes caused by climate change in advance. In this study, we tried to predict the dissolved oxygen (DO), chlorophyll-a, and turbidity of the Paldang reservoir for about two weeks using long short-term memory (LSTM) and gated recurrent units (GRU), which are deep learning algorithms based on recurrent neural networks. The model was built based on real-time water quality data and meteorological data. The observation period was set from July to September in the summer of 2021 (Period 1) and from March to May in the spring of 2022 (Period 2). We tried to select an algorithm with optimal predictive power for each water quality parameter. In addition, to improve the predictive power of the model, an important variable extraction technique using random forest was used to select only the important variables as input variables. In both Periods 1 and 2, the predictive power after extracting important variables was further improved. Except for DO in Period 2, GRU was selected as the best model in all water quality parameters. This methodology can be useful for preventive water quality management by identifying the variability of water quality in advance and predicting water quality in a short period.

Inhalation Configuration Detection for COVID-19 Patient Secluded Observing using Wearable IoTs Platform

  • Sulaiman Sulmi Almutairi;Rehmat Ullah;Qazi Zia Ullah;Habib Shah
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.6 / pp.1478-1499 / 2024
  • Coronavirus disease (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). COVID-19 became an active epidemic disease due to its spread around the globe. The main causes of the spread are interaction and the transmission of droplets through coughing and sneezing. The spread can be minimized by isolating susceptible patients. However, isolation necessitates remote monitoring of the patient's breathing so that interactions, and therefore spread, are minimized. Thus, in this article, we offer a wearable-IoT-centered framework for remote monitoring and recognition of breathing patterns and for abnormal breath detection, so that the required oxygen level can be provided in time. We propose breathing time-series data acquisition based on wearable accelerometer and gyroscope sensors, temporal feature extraction, and machine learning algorithms for pattern detection and abnormality identification. The sensors send the data over Bluetooth, and the server receives it for further processing and recognition. We collect six breathing patterns from twenty subjects, and each pattern is recorded for about five minutes. We compare the prediction accuracies of all machine learning models under study (i.e., Random Forest, Gradient Boosting Tree, Decision Tree, and K-Nearest Neighbor). Our results show that normal breathing and bradypnea are the most correctly recognized breathing patterns. In some cases, however, the algorithms also recognize Kussmaul breathing well. Overall, the classification outcomes of Random Forest and Gradient Boosting Tree are better than those of the other two algorithms.
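
The abstract does not list which temporal features were extracted, so the following is only a rough sketch of how windowed features might be computed from a single sensor axis. The window size, step, sampling rate, synthetic signal, and the four features are all assumptions made for illustration.

```python
import math
from typing import Dict, List


def window_features(signal: List[float], size: int, step: int) -> List[Dict[str, float]]:
    """Slide a fixed-size window over a 1-D signal and summarize each window."""
    features = []
    for start in range(0, len(signal) - size + 1, step):
        window = signal[start:start + size]
        mean = sum(window) / size
        variance = sum((x - mean) ** 2 for x in window) / size
        # Count sign changes around the window mean as a rough rate proxy.
        centered = [x - mean for x in window]
        crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
        features.append({
            "mean": mean,
            "std": math.sqrt(variance),
            "range": max(window) - min(window),
            "mean_crossings": float(crossings),
        })
    return features


# Synthetic stand-in for one accelerometer axis: 10 s at an assumed 20 Hz,
# oscillating at 0.25 Hz (about 15 cycles per minute).
SAMPLE_RATE_HZ = 20
signal = [math.sin(2 * math.pi * 0.25 * t / SAMPLE_RATE_HZ) for t in range(200)]
rows = window_features(signal, size=100, step=50)
print(len(rows), rows[0])
```

Each row of `rows` could then serve as one input vector to a classifier such as a random forest, with the breathing pattern as the label.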

Selecting Stock by Value Investing based on Machine Learning: Focusing on Intrinsic Value (머신러닝 기반 가치투자를 통한 주식 종목 선정 연구: 내재가치를 중심으로)

  • Kim, Youn Seung;Yoo, Dong Hee
    • The Journal of Information Systems / v.32 no.1 / pp.179-199 / 2023
  • Purpose: This study builds a prediction model to find stocks that can reach intrinsic value among KOSPI- and KOSDAQ-listed companies in order to improve the stability and profitability of stock investment. Investment simulations are conducted to verify whether stock investment performance improves, by comparing the prediction model with random stock selection and the market indexes. Design/methodology/approach: Value investment theory and machine learning techniques are applied to build the model. Various experiments identify conditions such as the algorithm with the best predictive performance, the learning period, and the intrinsic value-reaching period. This study selects stocks through the prediction model trained with inventive variables, does not limit the holding period needed after purchase to reach the stocks' intrinsic value, and targets all KOSPI and KOSDAQ companies. The stock and financial data cover 21 years (2001-2021). Findings: In the experiments, the prediction model performed best with the random forest technique, a one-year learning period, and an intrinsic value-reaching period within one year. In the investment simulation, the cumulative return of the prediction model was up to 1.68 times higher than that of random stock selection and 17 times higher than that of the KOSPI index. The usefulness of the prediction model was confirmed in that the number of predicted stocks reaching intrinsic value was up to 70% higher than with random selection.

Classification of Parent Company's Downward Business Clients Using Random Forest: Focused on Value Chain at the Industry of Automobile Parts (랜덤포레스트를 이용한 모기업의 하향 거래처 기업의 분류: 자동차 부품산업의 가치사슬을 중심으로)

  • Kim, Teajin;Hong, Jeongshik;Jeon, Yunsu;Park, Jongryul;An, Teayuk
    • The Journal of Society for e-Business Studies / v.23 no.1 / pp.1-22 / 2018
  • The value chain has been utilized as a strategic tool to improve competitive advantage, mainly at the enterprise level and at the industrial level. However, in order to conduct value chain analysis at the enterprise level, the client companies of the parent company should be classified according to whether they belong to its value chain. Experts can readily establish a value chain for a single company, but building one that consists of multiple companies takes considerable cost and time. Thus, this study proposes a model that automatically classifies the companies that form a value chain based on actual transaction data. A total of 19 transaction attribute variables were extracted from the transaction data and processed into input data for a machine learning method. The proposed model was constructed using the Random Forest algorithm. The experiment was conducted on an automobile parts company. The experimental results demonstrate that the proposed model can classify the client companies of the parent company automatically with 92% accuracy, a 76% F1-score, and 94% AUC. The empirical study also confirms that a few transaction attributes, such as transaction concentration, transaction amount, and total sales per customer, are the main characteristics representing the companies that form a value chain.
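
The three metrics reported above measure different things, which is why accuracy can sit well above the F1-score when classes are imbalanced. The sketch below computes all three on made-up labels and scores; the data are hypothetical and unrelated to the study's transaction data.

```python
from typing import List, Tuple


def accuracy_and_f1(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Accuracy and F1-score for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced data: 3 positives and 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

accuracy, f1 = accuracy_and_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(accuracy, f1, auc)
```

On this toy data accuracy is higher than F1 because the many correctly rejected negatives raise accuracy but do not enter the F1 calculation.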

Assessment of changes on water quality and aquatic ecosystem health in Han river basin by additional dam release of stream maintenance flow (하천유지유량 추가 댐방류에 따른 한강유역의 수질 및 수생태계 건강성 변화 평가)

  • Woo, So Young;Kim, Seong Joon;Hwang, Sun Jin;Jung, Chung Gil
    • Journal of Korea Water Resources Association / v.52 no.spc2 / pp.777-789 / 2019
  • The purpose of this study is to evaluate changes in water quality and aquatic ecosystem health caused by the additional release of stream maintenance flow from multipurpose dams in the Han river basin ($34,148\,km^2$) using SWAT (Soil and Water Assessment Tool). The additional release periods were spring (April to June) and autumn (August to October), so that the changes could be evaluated against data from the aquatic ecosystem health survey. The amount of additional release was set proportional to the present dam release, and the maximum release amount was controlled so as not to exceed the officially notified stream maintenance flow from the dams. With 10 percent to 50 percent additional releases, the stream water quality concentrations (T-N, $NH_4$, T-P, and $PO_4-P$), except for $NO_3-N$, decreased in spring but increased in autumn. When the Random Forest algorithm was applied to the stream water quality results, the grades of the aquatic ecosystem health indices (FAI, TDI, and BMI) improved in both periods, especially in the downstream part of the basin. This study showed that the additional release of stream maintenance flow was more effective in spring than in autumn for improving water quality and the aquatic ecosystem.

Optimization of Input Features for Vegetation Classification Based on Random Forest and Sentinel-2 Image (랜덤포레스트와 Sentinel-2를 이용한 식생 분류의 입력특성 최적화)

  • LEE, Seung-Min;JEONG, Jong-Chul
    • Journal of the Korean Association of Geographic Information Studies / v.23 no.4 / pp.52-67 / 2020
  • In recent years, snow-covered land in the Arctic has been exposed every year as permafrost melts, and the Korea Geographic Information Institute (NGII) provides a polar spatial information service by establishing spatial information on the polar region. However, spatial information on vegetation, which is sensitive to climate change, is lacking. This research used multi-temporal Sentinel-2 images to perform land cover classification of Ny-Ålesund in Arctic Svalbard. In the pre-processing step, 10 bands and 6 vegetation spectral indices were generated from the multi-temporal Sentinel-2 images. The image-classification step consisted of extracting the vegetation area through 8-class land cover classification and then classifying the vegetation species. Random Forest was used as the image classification algorithm, with accuracy evaluated and feature importance calculated through Out-Of-Bag (OOB) samples. To identify the advantages of multi-temporal Sentinel-2 images for vegetation classification, the overall accuracy was compared according to the number of images stacked and the vegetation spectral indices used. Overall accuracy was 77% with single-date Sentinel-2 images but improved to 81% with multi-temporal Sentinel-2 images. The overall accuracy improved further, to about 83%, when the vegetation indices were also used in training. The most important spectral variables for distinguishing between vegetation classes are in the red, green, and shortwave infrared-1 (SWIR1) bands. This research can serve as a basic study for optimizing input characteristics when classifying vegetation in the polar regions.
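
Out-Of-Bag evaluation, mentioned in this abstract, relies on the fact that each tree in a random forest is trained on a bootstrap sample, so some training samples are left out of that tree and can be used to test it. A minimal sketch of the sampling step alone follows; the sample count and seed are arbitrary choices, and no classifier is trained here.

```python
import random
from typing import List, Tuple


def bootstrap_with_oob(n_samples: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw a bootstrap sample of indices (with replacement) and return it
    together with the out-of-bag indices that were never drawn."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so the run is repeatable
in_bag, out_of_bag = bootstrap_with_oob(1000, rng)
oob_fraction = len(out_of_bag) / 1000
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

For large sample counts the out-of-bag fraction approaches 1/e, roughly 37%, so each tree has a sizeable held-out set without a separate validation split.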

A study on EPB shield TBM face pressure prediction using machine learning algorithms (머신러닝 기법을 활용한 토압식 쉴드TBM 막장압 예측에 관한 연구)

  • Kwon, Kibeom;Choi, Hangseok;Oh, Ju-Young;Kim, Dongku
    • Journal of Korean Tunnelling and Underground Space Association / v.24 no.2 / pp.217-230 / 2022
  • The adequate control of TBM face pressure is of vital importance to maintain face stability by preventing face collapse and surface settlement. An EPB shield TBM excavates the ground by applying face pressure with the excavated soil in the pressure chamber. One of the challenges during the EPB shield TBM operation is the control of face pressure due to difficulty in managing the excavated soil. In this study, the face pressure of an EPB shield TBM was predicted using the geological and operational data acquired from a domestic TBM tunnel site. Four machine learning algorithms: KNN (K-Nearest Neighbors), SVM (Support Vector Machine), RF (Random Forest), and XGB (eXtreme Gradient Boosting) were applied to predict the face pressure. The model comparison results showed that the RF model yielded the lowest RMSE (Root Mean Square Error) value of 7.35 kPa. Therefore, the RF model was selected as the optimal machine learning algorithm. In addition, the feature importance of the RF model was analyzed to evaluate appropriately the influence of each feature on the face pressure. The water pressure indicated the highest influence, and the importance of the geological conditions was higher in general than that of the operation features in the considered site.
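
The RMSE used above to compare the models is the square root of the mean squared difference between observed and predicted values, so it is expressed in the same unit as the target (kPa here). A small sketch with made-up face pressure values, which are not taken from the study:

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical face pressures in kPa, for illustration only.
observed = [150.0, 160.0, 170.0, 180.0]
predicted = [153.0, 156.0, 170.0, 185.0]
error_kpa = rmse(observed, predicted)
print(f"RMSE = {error_kpa:.2f} kPa")
```

Comparing models then amounts to computing this value for each model's predictions on the same held-out data and choosing the smallest.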

Data Mining-Aided Automatic Landslide Detection Using Airborne Laser Scanning Data in Densely Forested Tropical Areas

  • Mezaal, Mustafa Ridha;Pradhan, Biswajeet
    • Korean Journal of Remote Sensing / v.34 no.1 / pp.45-74 / 2018
  • Landslides are a natural hazard that threatens lives and property in many areas around the world. Landslides are difficult to recognize, particularly in rainforest regions. Thus, an accurate, detailed, and updated inventory map is required for landslide susceptibility, hazard, and risk analyses. The inconsistency in the results obtained using different feature selection techniques in the literature has highlighted the importance of evaluating these techniques. Thus, in this study, six feature selection techniques were evaluated. Very-high-resolution LiDAR point clouds and orthophotos were acquired simultaneously in a rainforest area of Cameron Highlands, Malaysia, by airborne laser scanning (LiDAR). A fuzzy-based segmentation parameter (FbSP) optimizer was used to optimize the segmentation parameters. Training samples were selected using a stratified random sampling method, with 70% of the samples used for training. Two machine learning algorithms, namely Support Vector Machine (SVM) and Random Forest (RF), were used to evaluate the performance of each feature selection algorithm. The overall accuracies of the SVM and RF models revealed that three of the six algorithms ranked higher in landslide detection. Results indicated that the classification accuracies of the RF classifier were higher than those of the SVM classifier, whether using all features or only the optimal features. The proposed techniques performed well in detecting landslides in a rainforest area of Malaysia, and these techniques can be easily extended to similar regions.
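
Stratified random sampling with a 70% training share, as mentioned in this abstract, keeps the class proportions the same in the training and test sets. The sketch below shows one way to do this; the class names, class sizes, and seed are assumptions for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[Hashable], train_fraction: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split sample indices so that each class contributes the same
    fraction of its members to the training set."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)  # random selection within each class
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Hypothetical segment labels: 20 landslide and 80 non-landslide objects.
labels = ["landslide"] * 20 + ["non_landslide"] * 80
train_idx, test_idx = stratified_split(labels, train_fraction=0.7)
print(len(train_idx), len(test_idx))
```

A plain random split could by chance leave very few landslide samples in one of the sets, which is the problem stratification avoids when one class is rare.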

An exploration of the relationship between crime/victim characteristics and the victim's criminal damages: Variable selection based on random forest algorithm (범죄 및 피해자 특성과 범죄피해 내용의 관계 탐색: 랜덤포레스트 알고리즘에 기초한 변인선택)

  • Han, Yuhwa;Lee, Wooyeol
    • Korean Journal of Forensic Psychology / v.13 no.2 / pp.121-145 / 2022
  • The current study applied the random forest algorithm to Korean crime victim survey data collected biennially between 2010 and 2018 to explore the relationship between crime/victim characteristics and the victim's criminal damages. A total of 3,080 cases were analyzed, including gender, age (life cycle stage), type of crime, perpetrator acquisition, repeated victimization, psychological damage (depression, isolation, extreme fear, somatic symptoms, interpersonal problems, moving out to avoid people, suicidal impulses, suicide attempts), and emotional changes after victimization (changes in self-protection confidence, self-esteem, confidence in others, confidence in legal institutions, and respect for the Korean legal system/law). Because the data have features that make traditional statistical techniques difficult to apply, this study implemented random forest algorithms to predict crime and victim characteristics from the victim's criminal damages (psychological damage and emotional change) and selected good predictors using the VSURF function in the VSURF package for R. The analysis confirmed relationships between the type of crime and depression, extreme fear, somatic symptoms, and interpersonal problems; between perpetrator acquisition and somatic symptoms and interpersonal problems; and between repeated victimization and changes in respect for the Korean legal system/law. Gender and life cycle stage (youth/adult/elderly) were found to be related to extreme fear and changes in self-protection confidence, respectively. However, more empirical evidence should be aggregated before the results can be explained as meaningful. The results of this study suggest that it is necessary to enhance experts' knowledge and to educate them on cases concerning the relationship between crime/victim characteristics and criminal damage. Strengthening their interview strategy and knowledge of laws/rules is also needed to increase the effectiveness of the Korean victim assessment system.