• Title/Summary/Keyword: Random Forest

Comparative analysis of random forest on depression experiences of metropolitan and provincial residents (광역시·도민의 우울경험에 대한 Random Forest 비교분석)

  • Dong Su Lee;Yu Jeong Kim
    • Proceedings of the Korean Society of Computer Information Conference / 2023.07a / pp.321-324 / 2023
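  • This study sought to identify the importance of the variables through which personal factors and health status affect whether depression is experienced, comparing metropolitan cities with provinces. The data came from the 2021 Community Health Survey of the Korea Disease Control and Prevention Agency. For the metropolitan cities, 4,602 records were used, and for the provinces, 19,545 records. The data were analyzed with R 4.3.0 for Windows, using word frequency analysis and Random Forest, a machine learning technique. No overfitting problem arose between the train and test data, and the machine learning classification model performed at about the 94% level. The variable importance for the experience of depression differed between metropolitan cities and provinces. Approaching the causes of depression differently for residents of the two kinds of region should allow more efficient policy making.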

A Random Forest Algorithm-based Accident Prediction to Prevent Marine Pilot Occupational Accidents

  • Gokhan Camliyurt;Won Sik Kang;Daewon Kim;Sangwon Park;Youngsoo Park
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2022.06a / pp.415-416 / 2022
  • Marine pilot occupational accidents during transfer to/from the ship are at the top of the agenda after several safety campaigns by IMPA and individual attempts. There are multiple transfer methods for marine pilots, but the most common is the pilot cutter. This paper aims to predict marine pilot occupational accidents before they occur by using historical data. Since the problem depends on several variables, this paper develops a random forest model, using RStudio software, to predict marine pilot accidents before they happen.

A Mixed-effects Height-Diameter Model for Pinus densiflora Trees in Gangwon Province, Korea

  • Lee, Young Jin;Coble, Dean W.;Pyo, Jung Kee;Kim, Sung Ho;Lee, Woo Kyun;Choi, Jung Kee
    • Journal of Korean Society of Forest Science / v.98 no.2 / pp.178-182 / 2009
  • A new mixed-effects model was developed that predicts individual-tree total height for Pinus densiflora trees in Gangwon province as a function of individual-tree diameter (cm). The mixed-effects model contains two random-effects parameters. Maximum likelihood estimation was used to fit the model to 560 height-diameter observations of individual trees measured throughout Gangwon province in 2007 as part of the National Forest Inventory Program in Korea. The new model is an improvement over fixed-effects models because it can be calibrated to a local area, such as an inventory plot or individual stand. The new model also appears to be an improvement over the Forest Resources Evaluation and Prediction Program for the ten calibration trees used in this study. An example is provided that describes how to estimate the random-effects parameters using ten calibration trees.

Forest Vertical Structure Mapping from Bi-Seasonal Sentinel-2 Images and UAV-Derived DSM Using Random Forest, Support Vector Machine, and XGBoost

  • Young-Woong Yoon;Hyung-Sup Jung
    • Korean Journal of Remote Sensing / v.40 no.2 / pp.123-139 / 2024
  • Forest vertical structure is vital for comprehending ecosystems and biodiversity, in addition to fundamental forest information. Currently, the forest vertical structure is predominantly assessed via an in-situ method, which is not only difficult to apply to inaccessible locations or large areas but also costly and requires substantial human resources. Therefore, mapping systems based on remote sensing data have been actively explored. Recently, research on analyzing and classifying images using machine learning techniques has been actively conducted and applied to map the vertical structure of forests accurately. In this study, Sentinel-2 and digital surface model images were obtained on two different dates separated by approximately one month, and the spectral index and tree height maps were generated separately. Furthermore, according to the acquisition time, the input data were separated into Cases 1 and 2, which were then combined to generate Case 3. Using these data, forest vertical structure mapping models based on random forest, support vector machine, and extreme gradient boost (XGBoost) were generated. Consequently, nine models were generated, with the XGBoost model in Case 3 performing the best, with an average precision of 0.99 and an F1 score of 0.91. We confirmed that generating a forest vertical structure mapping model utilizing bi-seasonal data and an appropriate model can result in an accuracy of 90% or higher.
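As a reminder of how the precision and F1 figures quoted above relate to each other, here is a minimal sketch in plain Python. The function name and the counts are hypothetical and chosen only for illustration; they do not reproduce the study's data or its evaluation code.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 for one class from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for a single class (not taken from the paper).
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because F1 is a harmonic mean, a model can have very high precision and still a noticeably lower F1 when its recall is weaker, which is one way figures such as 0.99 and 0.91 can appear together.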

Applicability Evaluation of a Mixed Model for the Analysis of Repeated Inventory Data : A Case Study on Quercus variabilis Stands in Gangwon Region (반복측정자료 분석을 위한 혼합모형의 적용성 검토: 강원지역 굴참나무 임분을 대상으로)

  • Pyo, Jungkee;Lee, Sangtae;Seo, Kyungwon;Lee, Kyungjae
    • Journal of Korean Society of Forest Science / v.104 no.1 / pp.111-116 / 2015
  • The purpose of this study was to evaluate a mixed model of the dbh-height relation that contains a random effect. Data were obtained from a survey site for Quercus variabilis in the Gangwon region, and the same site was remeasured after three years. The mixed model was used to fit the fixed effect in the dbh-height relation for Quercus variabilis, with a random effect representing the correlation between survey periods. To evaluate the model with the random effect, the Akaike information criterion (AIC) was used, together with the variance-covariance matrix and residual of the repeated data. The estimated variance-covariance matrix and residual were -0.0291 and 0.1007, respectively. The model with the random effect (AIC = -215.5) has a lower AIC value than the model with the fixed effect (AIC = -154.4). Because the random effect associated with categorical data is used in the data fitting process, the model can be calibrated to fit a repeatedly measured site by obtaining measurements. Therefore, the results of this study could be a useful method for developing models from repeated measurements.
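The AIC comparison in the abstract above can be sketched in a few lines of Python. The helper names `aic` and `prefer_by_aic` are hypothetical; the formula AIC = 2k - 2 ln(L) is the standard definition, and the two AIC values are the ones reported in the abstract. How the authors computed them is not stated there.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def prefer_by_aic(aic_by_model: dict[str, float]) -> str:
    """Return the name of the model with the lowest AIC."""
    return min(aic_by_model, key=aic_by_model.get)


# AIC values as reported in the abstract.
reported = {"random effect": -215.5, "fixed effect": -154.4}
best = prefer_by_aic(reported)
delta = reported["fixed effect"] - reported["random effect"]
print(f"preferred model: {best}, AIC difference: {delta:.1f}")
```

Lower AIC indicates a better trade-off between fit and complexity, so the model with the random effect is preferred here.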

A Study on the Performance Evaluation of Machine Learning for Predicting the Number of Movie Audiences (영화 관객 수 예측을 위한 기계학습 기법의 성능 평가 연구)

  • Jeong, Chan-Mi;Min, Daiki
    • The Journal of Society for e-Business Studies / v.25 no.2 / pp.49-63 / 2020
  • Accurate early-stage box office prediction is crucial for the film industry to make better managerial decisions. With the aim of improving prediction performance, this paper evaluates the use of machine learning methods. We tested both classification- and regression-based methods, including k-NN, SVM, and Random Forest. We first evaluate the input variables, which shows that reputation-related information generated during the first two weeks after release is significant. Prediction test results show that regression-based methods provide lower prediction error, and Random Forest in particular outperforms the other machine learning methods. Regression-based methods have better predictive power when films have small box office earnings. On the other hand, classification-based methods work better for predicting large box office earnings.

Patterning Waterbird Assemblages on Rice Fields Using Self-Organizing Map and Random Forest (자기조직화지도(Self-organizing map)와 랜덤 포레스트 분석(Random forest)을 이용한 논습지에 도래하는 수조류 군집 특성 파악)

  • Nam, Hyung-Kyu;Choi, Seung-Hye;Yoo, Jeong-Chil
    • Korean Journal of Environmental Agriculture / v.34 no.3 / pp.168-177 / 2015
  • BACKGROUND: In recent years, there has been great concern regarding agricultural land uses and their importance for the conservation of biodiversity. Rice fields are unique managed wetlands for wildlife, especially waterbirds. Comprehensive monitoring of the waterbird assemblage was attempted to understand patterning changes in the rice ecosystem in South Korea. This rice ecosystem has been recognized as one of the most important for waterbird conservation. METHODS AND RESULTS: Biweekly monitoring was carried out for 4 years, from April 2009 to March 2010 and from April 2011 to March 2014. In total, 32 species of waterbirds were observed. Self-organizing map (SOM) and random forest were applied to the waterbird dataset to identify the characteristics of waterbird distribution. SOM and random forest analysis clearly classified the assemblages into four clusters and extracted ecological information from the waterbird dataset. Waterbird assemblages showed strong seasonality and habitat use according to waterbird group, such as shorebirds, herons, and waterfowl. CONCLUSION: Our results showed that the combination of SOM and random forest analysis could be useful for ecosystem assessment and management. Furthermore, we strongly suggest a strict management strategy for rice fields to conserve waterbirds. The strategy could be season- and species-specific.

API Feature Based Ensemble Model for Malware Family Classification (악성코드 패밀리 분류를 위한 API 특징 기반 앙상블 모델 학습)

  • Lee, Hyunjong;Euh, Seongyul;Hwang, Doosung
    • Journal of the Korea Institute of Information Security & Cryptology / v.29 no.3 / pp.531-539 / 2019
  • This paper proposes training features for malware family analysis and analyzes the multi-class classification performance of ensemble models. We construct training data by extracting API and DLL information from malware executables and use the Random Forest and XGBoost algorithms, which are based on decision trees. API, API-DLL, and DLL-CM features for malware detection and family classification are proposed by analyzing frequently used API and DLL information from malware and converting high-dimensional features to low-dimensional features. The proposed feature selection method provides the advantages of data dimension reduction and fast learning. In the performance comparison, the malware detection rate is 93.0% for Random Forest, the accuracy on the malware family dataset is 92.0% for XGBoost, and the false positive rate on the malware family dataset including benign samples is about 3.5% for Random Forest and XGBoost.

Imbalanced Data Improvement Techniques Based on SMOTE and Light GBM (SMOTE와 Light GBM 기반의 불균형 데이터 개선 기법)

  • Young-Jin, Han;In-Whee, Joe
    • KIPS Transactions on Computer and Communication Systems / v.11 no.12 / pp.445-452 / 2022
  • The class distribution of imbalanced data is an important issue in the digital world and a significant part of cybersecurity. Abnormal activity in imbalanced data should be found and the resulting problems solved. Although a system capable of tracking patterns in all transactions is needed, machine learning on disproportionate data, which typically contains abnormal patterns, can ignore minority classes and perform poorly on them, leaving predictive models inaccurately biased. In this paper, we predict target variables and improve accuracy by combining estimates using the Synthetic Minority Oversampling Technique (SMOTE) and the Light GBM algorithm as an approach to imbalanced datasets. Experimental results were compared with the logistic regression, decision tree, KNN, Random Forest, and XGBoost algorithms. Accuracy and recall were similar, but precision was 80.76% for Random Forest and 97.16% for Light GBM, and the F1-score was 84.67% for Random Forest and 91.96% for Light GBM. The experiment confirmed that Light GBM's performance was either similar, without deviation, or improved by up to 16% compared with the five algorithms.
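The core idea of SMOTE, creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, can be sketched with the standard library alone. This is an illustrative simplification, not the paper's pipeline: the function name `smote_sketch`, its parameters, and the sample data are all hypothetical, and a production workflow would more likely use an established implementation.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from a list of minority-class tuples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        # All minority samples except the chosen one.
        others = minority[:idx] + minority[idx + 1:]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority class (not data from the paper).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=5)
print(new_points)
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region the minority class already occupies instead of simply duplicating existing rows.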

Easy and Quick Survey Method to Estimate Quantitative Characteristics in the Thin Forests

  • Mirzaei, Mehrdad;Bonyad, Amir Eslam;Bijarpas, Mahboobeh Mohebi;Golmohamadi, Fatemeh
    • Journal of Forest and Environmental Science / v.31 no.2 / pp.73-77 / 2015
  • Acquiring accurate quantitative and qualitative information is necessary for the technical and scientific management of forest stands. In this study, stratified and systematic random sampling methods were used to estimate quantitative characteristics in the study area. The estimator $(E\%)^2 \times T$ was used to compare the systematic random and stratified sampling methods. A 100 percent inventory was carried out in an area of 400 hectares; characteristics such as tree density, crown cover (canopy), and basal area were measured. Tree density of stands was compared between the systematic random and stratified sampling methods. The findings reveal that the stratified sampling method gives a better representation of estimates than systematic random sampling.