• Title/Summary/Keyword: tree-based models

Effects of Geological Structure and Tree Density on the Forest Fire Patterns (지형구조와 나무밀도가 산불패턴에 미치는 영향)

  • Song, Hark-Soo;Kwon, Oh Sung;Lee, Sang-Hee
    • Korean Journal of Agricultural and Forest Meteorology / v.16 no.4 / pp.259-266 / 2014
  • Understanding forest fire patterns is necessary to comprehend the stability of forest ecosystems. Researchers have therefore proposed simulation models that mimic forest fire spread dynamics, which make it possible to predict forest damage in scenarios that are difficult to test experimentally at laboratory scale. However, many of these models do not consider complicated environmental factors such as fuel types, wind, and moisture. In this study, we propose a simple model that includes such factors, in particular the geomorphological structure of the forest and two types of fuel. The two fuels correspond to susceptible and resistant trees, which have different probabilities of transferring fire; the susceptible tree has the higher probability. The trees were randomly distributed in the simulation space at densities ranging from 0.5 (low) to 1.0 (high). Based on the number of burnt trees, we then carried out a sensitivity analysis to quantify how forest fire patterns are affected by the structure and the tree density. We believe that our model can be a useful tool for exploring forest fire spreading patterns.
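
A minimal Python sketch of a lattice fire model of this general kind, not the authors' implementation. It assumes a square grid, spread to the four nearest neighbours, ignition along the left edge, and reads "probability of transferring fire" as the chance that a tree of a given type ignites when a neighbour burns. The function name, the probabilities 0.9 and 0.3, and the 50/50 mix of tree types are hypothetical, and the geomorphological structure studied in the paper is left out.

```python
import random

EMPTY, SUSCEPTIBLE, RESISTANT, BURNT = 0, 1, 2, 3

def simulate_fire(size=50, density=0.7, frac_susceptible=0.5,
                  p_susceptible=0.9, p_resistant=0.3, seed=0):
    """Return (number of burnt trees, number of trees placed)."""
    rng = random.Random(seed)
    grid = [[EMPTY] * size for _ in range(size)]
    trees = 0
    # Place trees at random with the requested density and type mix.
    for r in range(size):
        for c in range(size):
            if rng.random() < density:
                is_susceptible = rng.random() < frac_susceptible
                grid[r][c] = SUSCEPTIBLE if is_susceptible else RESISTANT
                trees += 1
    # Ignite every tree in the left-most column.
    front = []
    for r in range(size):
        if grid[r][0] != EMPTY:
            grid[r][0] = BURNT
            front.append((r, 0))
    burnt = len(front)
    ignite_prob = {SUSCEPTIBLE: p_susceptible, RESISTANT: p_resistant}
    # Each newly burnt tree gets one chance to ignite each unburnt neighbour.
    while front:
        next_front = []
        for r, c in front:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < size and 0 <= nc < size and grid[nr][nc] in ignite_prob:
                    if rng.random() < ignite_prob[grid[nr][nc]]:
                        grid[nr][nc] = BURNT
                        burnt += 1
                        next_front.append((nr, nc))
        front = next_front
    return burnt, trees
```

Running `simulate_fire` over a range of `density` values and recording the burnt fraction is one simple way to set up a sensitivity analysis based on the number of burnt trees.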

Splitting Decision Tree Nodes with Multiple Target Variables (의사결정나무에서 다중 목표변수를 고려한)

  • 김성준
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.05a / pp.243-246 / 2003
  • Data mining is the process of discovering patterns useful for decision making from large amounts of data. It has recently received much attention in a wide range of business and engineering fields. Classifying a group into subgroups is one of the most important subjects in data mining. Tree-based methods, known as decision trees, provide an efficient way of finding classification models. The primary concern in tree learning is to minimize node impurity, which is evaluated using a target variable in the data set. However, there are situations where multiple target variables should be taken into account, for example in manufacturing process monitoring, marketing science, and clinical and health analysis. The purpose of this article is to present several methods for measuring node impurity that are applicable to data sets with multiple target variables. Numerical examples are given for illustration, with discussion.
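
The abstract does not say which impurity measures the article proposes. As an illustration only, the Python sketch below uses one simple possibility: a weighted sum of the Gini impurity computed separately for each target variable. The function names and the equal default weights are assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a single categorical target."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def multi_target_impurity(rows, weights=None):
    """Weighted sum of per-target Gini impurities.

    rows: list of tuples holding one label per target variable.
    weights: one weight per target (assumed equal if omitted).
    """
    if not rows:
        return 0.0
    n_targets = len(rows[0])
    weights = weights or [1.0 / n_targets] * n_targets
    return sum(w * gini([row[j] for row in rows]) for j, w in enumerate(weights))

def split_gain(parent, left, right):
    """Impurity reduction achieved by splitting parent into left and right."""
    n = len(parent)
    children = (len(left) * multi_target_impurity(left)
                + len(right) * multi_target_impurity(right)) / n
    return multi_target_impurity(parent) - children
```

For example, with two targets and the node `[("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]`, a split that separates the first two rows from the last two makes both children pure on both targets, so the gain equals the parent impurity.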

Contemporary review on the bifurcating autoregressive models : Overview and perspectives

  • Hwang, S.Y.
    • Journal of the Korean Data and Information Science Society / v.25 no.5 / pp.1137-1149 / 2014
  • Since the bifurcating autoregressive (BAR) model was developed by Cowan and Staudte (1986) to analyze cell lineage data, a lot of research has been directed to BAR and its generalizations. Based mainly on the author's works, this paper presents a contemporary review of the BAR model, giving an overview and perspectives. Specifically, the bifurcating structure is extended to multi-cast tree and branching tree structures. The AR(1) time series model of Cowan and Staudte (1986) is generalized to tree-structured random processes. Branching correlations between individuals sharing the same parent are introduced and discussed. Various methods for estimating parameters, and the related asymptotics, are also reviewed. The paper thus aims to give a contemporary overview of the BAR model and to offer some perspectives on future work in this area.

Machine Learning Approach to Blood Stasis Pattern Identification Based on Self-reported Symptoms (기계학습을 적용한 자기보고 증상 기반의 어혈 변증 모델 구축)

  • Kim, Hyunho;Yang, Seung-Bum;Kang, Yeonseok;Park, Young-Bae;Kim, Jae-Hyo
    • Korean Journal of Acupuncture / v.33 no.3 / pp.102-113 / 2016
  • Objectives : This study aims to develop and discuss a prediction model for the blood stasis pattern of traditional Korean medicine (TKM) using two machine learning algorithms: multiple logistic regression and the decision tree model. Methods : First, we reviewed the Korean, Chinese, and Japanese versions of the blood stasis (BS) questionnaire to make an integrated BS questionnaire of patient-reported outcomes. Patient-reported BS symptom data were acquired through human subject research. Next, the expert decisions of five Korean medicine doctors were also acquired, and supervised learning models were developed using multiple logistic regression and decision trees. Results : An integrated BS questionnaire with 24 items was developed. Multiple logistic regression models with accuracies of 0.92 (male) and 0.95 (female), validated by 10-fold cross-validation, were constructed. Decision tree modeling produced a male model with 8 decision nodes and a female model with 6 decision nodes. In both models, the symptoms 'recent physical trauma', 'chest pain', 'numbness', and 'menstrual disorder (female only)' were considered important factors. Conclusions : Because machine learning, especially supervised learning, can reveal and suggest important or essential factors among the many varied symptoms that make up a pattern identification, it can be a very useful tool in research on TKM diagnostics. With proper patient-reported outcomes or a well-structured database, it can also be applied to pre-screening solutions for the healthcare system in the Mibyoung stage.

Data Mining-Based Performance Prediction Technology of Geothermal Heat Pump System (지열 히트펌프 시스템의 데이터 마이닝 기반 성능 예측 기술)

  • Hwang, Min Hye;Park, Myung Kyu;Jun, In Ki;Sohn, Byonghu
    • Transactions of the KSME C: Technology and Education / v.4 no.1 / pp.27-34 / 2016
  • This preliminary study investigated data mining-based methods to assess and predict the performance of a geothermal heat pump (GHP) system. Data mining is a key process in knowledge discovery in databases (KDD), which includes five steps: 1) Selection; 2) Pre-processing; 3) Transformation; 4) Analysis (data mining); and 5) Interpretation/Evaluation. We used two analysis models, categorical and numerical decision tree models, to ascertain the patterns of performance (COP) and electrical consumption of the GHP system. Prior to applying the decision tree models, we statistically analyzed the measurement database to determine the effect of sampling intervals on the system performance. The analysis showed that data sampled at 10-min intervals gave the highest accuracy for the performance analysis, 97.7% against the actual dataset of the GHP system.

A Study on Time Series Cross-Validation Techniques for Enhancing the Accuracy of Reservoir Water Level Prediction Using Automated Machine Learning TPOT (자동기계학습 TPOT 기반 저수위 예측 정확도 향상을 위한 시계열 교차검증 기법 연구)

  • Bae, Joo-Hyun;Park, Woon-Ji;Lee, Seoro;Park, Tae-Seon;Park, Sang-Bin;Kim, Jonggun;Lim, Kyoung-Jae
    • Journal of The Korean Society of Agricultural Engineers / v.66 no.1 / pp.1-13 / 2024
  • This study assessed how far the accuracy of reservoir water level prediction models can be improved by employing automated machine learning models and efficient cross-validation methods for time-series data. Considering the inherent complexity and non-linearity of time-series data related to reservoir water levels, we proposed an optimized approach for model selection and training. The performance of twelve models was evaluated for the Obong Reservoir in Gangneung, Gangwon Province, using TPOT (Tree-based Pipeline Optimization Tool) and four cross-validation methods, which led to the determination of the optimal pipeline model. The pipeline model consisting of Extra Tree, Stacking Ridge Regression, and Simple Ridge Regression showed outstanding predictive performance for both training and test data, with an R² (coefficient of determination) and NSE (Nash-Sutcliffe Efficiency) exceeding 0.93. For predictions of water levels 12 hours ahead, the pipeline model selected through time-series split cross-validation accurately captured the change pattern of the time-series water level data during the test period, with an NSE exceeding 0.99. The methodology proposed in this study is expected to contribute greatly to the efficient generation of reservoir water level predictions in regions with high rainfall variability.
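
The abstract does not describe its four cross-validation methods in detail. As a hedged illustration, the Python sketch below shows an expanding-window scheme, one common form of time-series split cross-validation, in which every training set ends before its test set begins. The function name and the `gap` parameter are hypothetical.

```python
def time_series_splits(n_samples, n_splits=5, gap=0):
    """Yield (train_indices, test_indices) pairs in time order.

    The training window grows with each fold and always precedes the
    test window. gap leaves out that many samples between the two,
    for example to respect a forecasting horizon.
    """
    fold = n_samples // (n_splits + 1)
    if fold == 0:
        raise ValueError("too few samples for the requested number of splits")
    for k in range(1, n_splits + 1):
        train_end = k * fold
        test_start = train_end + gap
        test_end = min(test_start + fold, n_samples)
        if test_start >= test_end:
            continue  # no room left for a test window
        yield list(range(train_end)), list(range(test_start, test_end))
```

With 12 samples and 3 splits, the folds are train 0-2 / test 3-5, train 0-5 / test 6-8, and train 0-8 / test 9-11. Unlike shuffled k-fold, no fold trains on observations that come after the ones it is tested on.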

Probabilistic Risk Assessment Techniques for the Risk Analysis of Construction Projects (건설공사의 위험도분석을 위한 확률적 위험도 평가)

  • 조효남;임종권;박영빈
    • Proceedings of the Computational Structural Engineering Institute Conference / 1997.04a / pp.27-34 / 1997
  • In this paper, systematic and comprehensive approaches are suggested for applying quantitative PRA techniques, especially to risk events that cannot easily be evaluated quantitatively. In addition, dominant risk events are identified based on their occurrence frequency, assessed from both an actual survey of construction site conditions and statistical data on probable accidents. Practical FTA (Fault Tree Analysis) and ETA (Event Tree Analysis) models are used to assess the identified risks. When risk events lack statistical data, appropriate Bayesian models incorporating engineering judgement and test results are also introduced. Moreover, a fuzzy probability technique is used for the quantitative risk assessment of risk components that are difficult to evaluate quantitatively.
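
As a small illustration of how a fault tree combines event probabilities, the Python sketch below evaluates AND and OR gates under the assumption that the basic events are independent. The gate functions, the example tree, and its probabilities are hypothetical and are not taken from the paper.

```python
from math import prod

def and_gate(probs):
    """Output occurs only if every independent input event occurs."""
    return prod(probs)

def or_gate(probs):
    """Output occurs if at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probs)

# Hypothetical tree: the top event occurs if two safeguards both fail
# (probabilities 0.1 and 0.2) or a separate single failure occurs (0.05).
top_event = or_gate([and_gate([0.1, 0.2]), 0.05])
```

Here the AND gate gives 0.1 × 0.2 = 0.02, and the OR gate gives 1 − (1 − 0.02)(1 − 0.05) = 0.069.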

Selecting Machine Learning Model Based on Natural Language Processing for Shanghanlun Diagnostic System Classification (자연어 처리 기반 『상한론(傷寒論)』 변병진단체계(辨病診斷體系) 분류를 위한 기계학습 모델 선정)

  • Young-Nam Kim
    • 대한상한금궤의학회지 / v.14 no.1 / pp.41-50 / 2022
  • Objective : The purpose of this study is to explore the most suitable machine learning algorithm for Shanghanlun diagnostic system classification using natural language processing (NLP). Methods : A total of 201 data items were collected from 『Shanghanlun』 and 『Clinical Shanghanlun』; 'Taeyangbyeong-gyeolhyung' and 'Eumyangyeokchahunobokbyeong' were excluded to prevent oversampling or undersampling. The data were preprocessed with a Twitter Korean tokenizer and used to train logistic regression, ridge regression, lasso regression, naive Bayes classifier, decision tree, and random forest algorithms. The accuracy of the models was compared. Results : Ridge regression and the naive Bayes classifier showed an accuracy of 0.843, logistic regression and random forest showed an accuracy of 0.804, and the decision tree showed an accuracy of 0.745, while lasso regression showed an accuracy of 0.608. Conclusions : Ridge regression and the naive Bayes classifier are suitable NLP machine learning models for Shanghanlun diagnostic system classification.

Human Normalization Approach based on Disease Comparative Prediction Model between Covid-19 and Influenza

  • Janghwan Kim;Min-Yong Jung;Da-Yun Lee;Na-Hyeon Cho;Jo-A Jin;R. Young-Chul Kim
    • International Journal of Internet, Broadcasting and Communication / v.15 no.3 / pp.32-42 / 2023
  • COVID-19 has caused serious problems worldwide, including a pandemic from an unprecedented infection. Previous approaches produced medical vaccines and preemptive testing tools through medical engineering. However, medical systems and institutions are difficult to access where they are poor, owing to disparities between countries and regions. In advanced nations, the damage was even greater because high medical and examination costs kept people from going to the hospital. Therefore, from a software engineering perspective, we propose a learning model for determining coronavirus infection through symptom data-based software prediction models and tools. After a comparative analysis of various models (decision tree, Naive Bayes, KNN, multi-perceptron neural network), we chose the decision tree as the appropriate model. Because of a lack of data, additional survey data and overseas symptom data were applied and built into the judgment model. To protect against this, we also adopt a human normalization approach together with a traditional Korean medicine approach. By applying decision trees to collected and analyzed data, we expect it to be possible to distinguish coronavirus, flu, allergy, and cold without medical examination and diagnosis tools.

Performance Evaluation of Stacking Models Based on Random Forest, XGBoost, and LGBM for Wind Power Forecasting (Random Forest, XGBoost, LGBM 조합형 Stacking 모델을 이용한 풍력 발전량 예측 성능 평가)

  • Hui-Chan Kim;Dae-Young Kim;Bum-Suk Kim
    • Journal of Wind Energy / v.15 no.3 / pp.21-29 / 2024
  • Wind power is highly variable due to the intermittent nature of wind. This can lead to power grid instability and decreased efficiency. Therefore, it is necessary to improve wind power prediction performance to minimize the negative impact on the power system. Recently, wind power prediction using machine learning has gained popularity, and ensemble models have shown high prediction accuracy. RF, GB, XGB and LGBM are decision tree-based ensemble models with high predictive performance for wind power, but they suffer from over-fitting and a strong dependence on certain variables. The stacking model, however, can improve prediction performance by combining individual models and compensating for the shortcomings of each. In this study, the MAE of RF, XGB and LGBM is 310.42 kWh, 217.07 kWh and 265.20 kWh, respectively, while that of the stacking model based on RF, XGB and LGBM is 202.33 kWh. These results indicate that stacking models can improve prediction performance. Finally, the approach is expected to contribute to electricity supply and demand planning.
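
To put the reported errors side by side, the short Python sketch below computes the relative MAE reduction of the stacking model against each individual model. It uses only the four MAE values quoted in the abstract; the variable and function names are illustrative.

```python
# MAE values (kWh) as reported in the abstract.
mae_kwh = {"RF": 310.42, "XGB": 217.07, "LGBM": 265.20}
stacking_mae_kwh = 202.33

def relative_reduction(base, new):
    """Fraction by which new is lower than base."""
    return (base - new) / base

reductions = {name: relative_reduction(mae, stacking_mae_kwh)
              for name, mae in mae_kwh.items()}

for name, r in reductions.items():
    print(f"{name}: stacking MAE is {100 * r:.1f}% lower")
```

On these figures the stacking model's MAE is roughly 35% lower than RF, 24% lower than LGBM, and 7% lower than XGB, the best individual model.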