• Title/Summary/Keyword: gradient boosting


Comparison of Machine Learning-Based Greenhouse VPD Prediction Models (머신러닝 기반의 온실 VPD 예측 모델 비교)

  • Jang Kyeong Min;Lee Myeong Bae;Lim Jong Hyun;Oh Han Byeol;Shin Chang Sun;Park Jang Woo
    • KIPS Transactions on Software and Data Engineering / v.12 no.3 / pp.125-132 / 2023
  • In this study, we compared the performance of machine learning models for predicting the vapor pressure deficit (VPD) in greenhouses, which affects stomatal function, photosynthesis, and plant growth through nutrient absorption. For VPD prediction, we examined the correlation of VPD with environmental factors inside and outside the greenhouse and with the temporal features of the time series data, and confirmed how the highly correlated factors affect VPD. Before analyzing model performance, we checked the amount (1 day, 3 days, 7 days) and interval (20 minutes, 1 hour) of the time series data used for analysis in order to tune both. Finally, four machine learning prediction models (XGB Regressor, LGBM Regressor, Random Forest Regressor, etc.) were applied and their prediction performance was compared. When 1 day of data at 20-minute intervals was used, LGBM showed the best prediction performance, with an MAE of 0.008 and an RMSE of 0.011. In addition, the factor that most influenced the prediction of VPD 20 minutes ahead was the VPD from the previous 20 minutes (VPD_y__71) rather than the environmental factors. The results of this study can help increase crop productivity through VPD prediction and the prevention of greenhouse condensation and disease occurrence. In the future, the approach can be used not only to predict greenhouse environmental data but also in other fields, such as production prediction and smart farm control models.
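
The abstract reports MAE and RMSE and notes that the previous 20-minute VPD was the most influential feature. As a minimal sketch of that idea, the following builds a one-step lagged feature and scores a persistence baseline (predict the next value as the previous one). The readings and the helper names (`make_lagged_pairs`, `mae`, `rmse`) are hypothetical and are not taken from the study.

```python
import math

def make_lagged_pairs(series, lag=1):
    """Pair each value with the value `lag` steps earlier: (x[t-lag], x[t])."""
    return [(series[t - lag], series[t]) for t in range(lag, len(series))]

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

# Hypothetical VPD readings (kPa) at 20-minute intervals; not data from the study.
vpd = [0.80, 0.82, 0.85, 0.83, 0.88, 0.90]

pairs = make_lagged_pairs(vpd, lag=1)
lagged = [p[0] for p in pairs]   # VPD 20 minutes earlier (the lag feature)
target = [p[1] for p in pairs]   # VPD to be predicted

# Persistence baseline: the prediction is simply the lagged value.
baseline_mae = mae(target, lagged)
baseline_rmse = rmse(target, lagged)
print(f"MAE={baseline_mae:.4f} RMSE={baseline_rmse:.4f}")
```

A learned model such as a gradient-boosted regressor would take the lagged value as one input among several; the baseline above only gives a reference point for judging whether such a model adds anything.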

Remote Sensing based Algae Monitoring in Dams using High-resolution Satellite Image and Machine Learning (고해상도 위성영상과 머신러닝을 활용한 녹조 모니터링 기법 연구)

  • Jung, Jiyoung;Jang, Hyeon June;Kim, Sung Hoon;Choi, Young Don;Yi, Hye-Suk;Choi, Sunghwa
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.42-42 / 2022
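  • Algal bloom monitoring in watersheds still relies heavily on point-based monitoring through field water sampling, which makes it difficult to efficiently monitor and respond to blooms that spread widely across a water body depending on climate, flow velocity, water temperature, and other conditions. In addition, because observation data have been limited, earlier studies mostly computed derived indices closely related to algal blooms, such as NDVI, FGAI, and SEI, and mapped them to remote sensing data rather than using field-measured data. This study aims to improve the accuracy and efficiency of algal bloom monitoring. We first collected more than 7,000 algal bloom observations using algae measurement equipment, and then applied various machine learning techniques to map high-resolution satellite data from the same period to the field measurements and examined their effectiveness. The study area is the Yeongju Dam watershed, located on the upper Naeseongcheon stream of the Nakdong River. In the data collection stage, 7,291 algal measurements were taken on four occasions between February and September 2020 for area-based in-situ observation, and spectral data for a total of 13 bands (Bands 1 to 12, with Band 8 counted as both 8 and 8A) were extracted from Sentinel-2 data at the same times and locations. Next, the algae_monitoring Python library was built to apply the machine learning methods. The library is organized into five stages: 1) a data preparation stage that splits the data into a training set and a test set; 2) a model application stage in which Random Forest, Gradient Boosting Regression, or XGBoosting can be selected; 3) a performance test stage that checks the model results (R2, MSE, MAE, RMSE, NSE, KGE, etc.); 4) a visualization stage for the model results; and 5) an application stage that uses the selected model to convert satellite data into algae values. It is designed to be applicable to a wide range of watersheds, not only Yeongju Dam. In this case study, using a total of 14 inputs, namely 12 Sentinel-2 bands and weather data (air temperature and cloud ratio), Random Forest gave the highest overall goodness of fit among the machine learning methods, with a Nash-Sutcliffe Efficiency (NSE) of about 0.96 on the test set (0.99 on the training set). This confirms that learning from wide-area satellite data together with a sufficient body of field measurements can greatly improve the efficiency of algae monitoring analysis.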
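
The performance test stage lists NSE and KGE, two hydrology metrics that are less familiar than R2 or RMSE. The sketch below shows how they are commonly defined; it is not code from the algae_monitoring library, the function names are hypothetical, and the KGE shown is the original three-component form (correlation, variability ratio, bias ratio), which may differ from the variant the authors used.

```python
import math

def _mean(xs):
    return sum(xs) / len(xs)

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 - SSE / variance of observations about their mean."""
    m = _mean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - m) ** 2 for o in obs)
    return 1.0 - sse / sst

def kge(obs, sim):
    """Kling-Gupta Efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    mo, ms = _mean(obs), _mean(sim)
    so = math.sqrt(_mean([(o - mo) ** 2 for o in obs]))
    ss = math.sqrt(_mean([(s - ms) ** 2 for s in sim]))
    cov = _mean([(o - mo) * (s - ms) for o, s in zip(obs, sim)])
    r = cov / (so * ss)   # linear correlation
    alpha = ss / so       # variability ratio
    beta = ms / mo        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Hypothetical observed and simulated values; not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]

print(f"NSE={nse(observed, simulated):.3f} KGE={kge(observed, simulated):.3f}")
```

Both metrics equal 1 for a perfect fit, and NSE equals 0 when the model does no better than predicting the observed mean.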


Prediction of Residual Axillary Nodal Metastasis Following Neoadjuvant Chemotherapy for Breast Cancer: Radiomics Analysis Based on Chest Computed Tomography

  • Hyo-jae Lee;Anh-Tien Nguyen;Myung Won Song;Jong Eun Lee;Seol Bin Park;Won Gi Jeong;Min Ho Park;Ji Shin Lee;Ilwoo Park;Hyo Soon Lim
    • Korean Journal of Radiology / v.24 no.6 / pp.498-511 / 2023
  • Objective: To evaluate the diagnostic performance of chest computed tomography (CT)-based qualitative and radiomics models for predicting residual axillary nodal metastasis after neoadjuvant chemotherapy (NAC) for patients with clinically node-positive breast cancer. Materials and Methods: This retrospective study included 226 women (mean age, 51.4 years) with clinically node-positive breast cancer treated with NAC followed by surgery between January 2015 and July 2021. Patients were randomly divided into the training and test sets (4:1 ratio). The following predictive models were built: a qualitative CT feature model using logistic regression based on qualitative imaging features of axillary nodes from the pooled data obtained using the visual interpretations of three radiologists; three radiomics models using radiomics features from three (intranodal, perinodal, and combined) different regions of interest (ROIs) delineated on pre-NAC CT and post-NAC CT using a gradient-boosting classifier; and fusion models integrating clinicopathologic factors with the qualitative CT feature model (referred to as clinical-qualitative CT feature models) or with the combined ROI radiomics model (referred to as clinical-radiomics models). The area under the curve (AUC) was used to assess and compare the model performance. Results: Clinical N stage, biological subtype, and primary tumor response indicated by imaging were associated with residual nodal metastasis during the multivariable analysis (all P < 0.05). The AUCs of the qualitative CT feature model and radiomics models (intranodal, perinodal, and combined ROI models) according to post-NAC CT were 0.642, 0.812, 0.762, and 0.832, respectively. The AUCs of the clinical-qualitative CT feature model and clinical-radiomics model according to post-NAC CT were 0.740 and 0.866, respectively. Conclusion: CT-based predictive models showed good diagnostic performance for predicting residual nodal metastasis after NAC. Quantitative radiomics analysis may provide a higher level of performance than qualitative CT features models. Larger multicenter studies should be conducted to confirm their performance.
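
The models above are compared by AUC. As a rough illustration of what that number measures, the sketch below computes AUC as the fraction of (positive, negative) pairs in which the positive case receives the higher score, counting ties as half. The labels, scores, and function name are hypothetical; the study's own evaluation code is not shown in the abstract.

```python
def auc_pairwise(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = residual metastasis) and model scores; not study data.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]

print(f"AUC={auc_pairwise(labels, scores):.3f}")
```

The quadratic loop is fine for small test sets like the one described; a rank-based formulation gives the same value more efficiently on large ones.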

Probability Map of Migratory Bird Habitat for Rational Management of Conservation Areas - Focusing on Busan Eco Delta City (EDC) - (보존지역의 합리적 관리를 위한 철새 서식 확률지도 구축 - 부산 Eco Delta City (EDC)를 중심으로 -)

  • Kim, Geun Han;Kong, Seok Jun;Kim, Hee Nyun;Koo, Kyung Ah
    • Journal of the Korean Society of Environmental Restoration Technology / v.26 no.6 / pp.67-84 / 2023
  • In some areas of the Republic of Korea, the designation and management of conservation areas do not adequately reflect regional characteristics and often impose behavioral regulations without considering the local context. One prominent example is the Busan EDC area. As a result, conflicts may arise, including large-scale civil complaints, regarding the conservation and utilization of these areas. Therefore, for the efficient designation and management of protected areas, it is necessary to consider various ecosystem factors, changes in land use, and regional characteristics. In this study, we specifically focused on the Busan EDC area and applied machine learning techniques to analyze the habitat of regional species. Additionally, we employed Explainable Artificial Intelligence techniques to interpret the results of our analysis. To analyze the regional characteristics of the waterfront area in the Busan EDC district and the habitat of migratory birds, we used bird observations as dependent variables, distinguishing between presence and absence. The independent variables were constructed using land cover, elevation, slope, bridges, and river depth data. We utilized the XGBoost (eXtreme Gradient Boosting) model, known for its excellent performance in various fields, to predict the habitat probabilities of 11 bird species. Furthermore, we employed the SHapley Additive exPlanations technique, one of the representative methodologies of XAI, to analyze the relative importance and impact of the variables used in the model. The analysis results showed that in the EDC business district, as one moves closer to the river from the waterfront, the likelihood of bird habitat increases based on the overlapping habitat probabilities of the analyzed bird species. By synthesizing the major variables influencing the habitat of each species, key variables such as rivers, rice fields, fields, pastures, inland wetlands, tidal flats, orchards, cultivated lands, cliffs & rocks, elevation, lakes, and deciduous forests were identified as areas that can serve as habitats, shelters, resting places, and feeding grounds for birds. On the other hand, artificial structures such as bridges, railways, and other public facilities were found to have a negative impact on bird habitat. The development of a management plan for conservation areas based on the objective analysis presented in this study is expected to be extensively utilized in the future. It will provide diverse evidential materials for establishing effective conservation area management strategies.
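
The study uses SHapley Additive exPlanations to attribute each prediction to its input variables. The sketch below illustrates the underlying Shapley value definition by brute force on a toy scoring function. It is not the SHAP library and not the authors' model: the feature names, the toy function, and the choice to represent an "absent" feature by a baseline value are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value; the rest take the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # subset sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                s = set(s)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical habitat score over [river_proximity, wetland, bridge].
def toy_model(z):
    return 0.4 * z[0] + 0.3 * z[1] - 0.5 * z[2] + 0.2 * z[0] * z[1]

x = [1.0, 1.0, 1.0]          # the site being explained
baseline = [0.0, 0.0, 0.0]   # reference input

phi = shapley_values(toy_model, x, baseline)
print([round(v, 3) for v in phi])
```

The attributions sum to the difference between the prediction and the baseline prediction, and the interaction term is split evenly between the two features involved. Enumeration grows exponentially with the number of features, so practical tools rely on faster algorithms for tree models.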

Performance of Prediction Models for Diagnosing Severe Aortic Stenosis Based on Aortic Valve Calcium on Cardiac Computed Tomography: Incorporation of Radiomics and Machine Learning

  • Nam gyu Kang;Young Joo Suh;Kyunghwa Han;Young Jin Kim;Byoung Wook Choi
    • Korean Journal of Radiology / v.22 no.3 / pp.334-343 / 2021
  • Objective: We aimed to develop a prediction model for diagnosing severe aortic stenosis (AS) using computed tomography (CT) radiomics features of aortic valve calcium (AVC) and machine learning (ML) algorithms. Materials and Methods: We retrospectively enrolled 408 patients who underwent cardiac CT between March 2010 and August 2017 and had echocardiographic examinations (240 patients with severe AS on echocardiography [the severe AS group] and 168 patients without severe AS [the non-severe AS group]). Data were divided into a training set (312 patients) and a validation set (96 patients). Using non-contrast-enhanced cardiac CT scans, AVC was segmented, and 128 radiomics features for AVC were extracted. After feature selection was performed with three ML algorithms (least absolute shrinkage and selection operator [LASSO], random forests [RFs], and eXtreme Gradient Boosting [XGBoost]), model classifiers for diagnosing severe AS on echocardiography were developed in combination with three different model classifier methods (logistic regression, RF, and XGBoost). The performance (c-index) of each radiomics prediction model was compared with predictions based on AVC volume and score. Results: The radiomics scores derived from LASSO were significantly different between the severe AS and non-severe AS groups in the validation set (median, 1.563 vs. 0.197, respectively, p < 0.001). A radiomics prediction model based on feature selection by LASSO + model classifier by XGBoost showed the highest c-index of 0.921 (95% confidence interval [CI], 0.869-0.973) in the validation set. Compared to prediction models based on AVC volume and score (c-indexes of 0.894 [95% CI, 0.815-0.948] and 0.899 [95% CI, 0.820-0.951], respectively), eight and three of the nine radiomics prediction models, respectively, showed higher discrimination abilities for severe AS. However, the differences were not statistically significant (p > 0.05 for all). Conclusion: Models based on the radiomics features of AVC and ML algorithms may perform well for diagnosing severe AS, but the added value compared to AVC volume and score should be investigated further.

A Recidivism Prediction Model Based on XGBoost Considering Asymmetric Error Costs (비대칭 오류 비용을 고려한 XGBoost 기반 재범 예측 모델)

  • Won, Ha-Ram;Shim, Jae-Seung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.127-137 / 2019
  • Recidivism prediction has been studied continuously since the early 1970s, and it has become more important as crimes committed by recidivists steadily increase. Research became especially active in the 1990s, after the US and Canada adopted the 'Recidivism Risk Assessment Report' as a decisive criterion in trials and parole screening. In the same period, empirical studies on 'Recidivism Factors' also began in Korea. Most recidivism prediction studies so far have focused on the factors of recidivism or on prediction accuracy, but because recidivism prediction has an asymmetric error cost structure, it is also important to minimize the misclassification cost. In general, the cost of misclassifying a person who will not reoffend as a likely recidivist is lower than the cost of misclassifying a person who will reoffend as unlikely to do so: the former only adds monitoring costs, while the latter increases social and economic costs. Therefore, in this paper, we propose an XGBoost (eXtreme Gradient Boosting; XGB) based recidivism prediction model that considers asymmetric error costs. In the first step, XGB, which is recognized as a high-performance ensemble method in data mining, was applied, and its results were compared with those of other prediction models: LOGIT (logistic regression analysis), DT (decision trees), ANN (artificial neural networks), and SVM (support vector machines). In the next step, the classification threshold is optimized to minimize the total misclassification cost, defined as the weighted average of FNE (False Negative Error) and FPE (False Positive Error). To verify its usefulness, the model was applied to a real recidivism prediction dataset. The results confirmed that the XGB model not only showed better prediction accuracy than the other models but also reduced the misclassification cost most effectively.
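
The second step of the model, choosing a threshold that minimizes a weighted misclassification cost, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the cost is taken here as the weighted average of the false negative rate and the false positive rate, and the weights, data, and function names are hypothetical.

```python
def weighted_cost(labels, probs, threshold, w_fn, w_fp):
    """Weighted average of false negative rate and false positive rate."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p < threshold)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= threshold)
    fne = fn / n_pos   # share of actual recidivists that were missed
    fpe = fp / n_neg   # share of non-recidivists that were flagged
    return (w_fn * fne + w_fp * fpe) / (w_fn + w_fp)

def best_threshold(labels, probs, w_fn, w_fp, candidates):
    """Return the candidate threshold with the lowest weighted cost."""
    return min(candidates, key=lambda t: weighted_cost(labels, probs, t, w_fn, w_fp))

# Hypothetical labels (1 = reoffended) and predicted probabilities; not study data.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.9, 0.6, 0.35, 0.4, 0.3, 0.2, 0.1, 0.55]
candidates = [i / 10 for i in range(1, 10)]

# Missing a recidivist is assumed to cost five times as much as a false alarm.
asymmetric = best_threshold(labels, probs, w_fn=5, w_fp=1, candidates=candidates)
symmetric = best_threshold(labels, probs, w_fn=1, w_fp=1, candidates=candidates)
print(f"asymmetric={asymmetric} symmetric={symmetric}")
```

With equal weights the search settles on a mid-range threshold, while a heavier false negative weight pushes the threshold lower, so that more borderline cases are flagged.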