• Title/Summary/Keyword: gradient boosting


Predicting the Pre-Harvest Sprouting Rate in Rice Using Machine Learning (기계학습을 이용한 벼 수발아율 예측)

  • Ban, Ho-Young;Jeong, Jae-Hyeok;Hwang, Woon-Ha;Lee, Hyeon-Seok;Yang, Seo-Yeong;Choi, Myong-Goo;Lee, Chung-Keun;Lee, Ji-U;Lee, Chae Young;Yun, Yeo-Tae;Han, Chae Min;Shin, Seo Ho;Lee, Seong-Tae
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.22 no.4
    • /
    • pp.239-249
    • /
    • 2020
  • Rice flour varieties have been developed to replace wheat, and consumption of rice flour has been encouraged. Damage related to pre-harvest sprouting has occurred due to weather disasters during the ripening period. Thus, it is necessary to develop a pre-harvest sprouting rate prediction system to minimize this damage. Rice cultivation experiments were conducted from 2017 to 2019 with three rice flour varieties at six regions in Gangwon-do, Chungcheongbuk-do, and Gyeongsangbuk-do. The survey components were the heading date and the pre-harvest sprouting rate at the harvest date. Daily mean temperature, relative humidity, and rainfall were collected from the Automated Synoptic Observing System (ASOS) station with the same region name. A Gradient Boosting Machine (GBM), a machine learning model, was used to predict the pre-harvest sprouting rate, and the training input variables were mean temperature, relative humidity, and total rainfall. In addition, periods running from a given number of days after the heading date (DAH) to a subsequent day (DA2H) were tested to establish the period related to pre-harvest sprouting. The data were divided into a training set and a validation set for calibrating the period related to pre-harvest sprouting, and a test set for validation. The results for the training and validation sets showed the highest score for a period of 22 DAH and 24 DA2H. On the test set, the model tended to overpredict the pre-harvest sprouting rate in the range below 3.0%, but overall prediction performance was high (R2 = 0.76). Therefore, it is expected that the pre-harvest sprouting rate can be easily predicted from weather components for a specific period using machine learning.
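
The window-based inputs described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function name `window_features`, the dictionary layout, and the reading of the two period bounds as start and end offsets in days after heading are hypothetical and not taken from the paper, and the weather values are synthetic.

```python
from datetime import date, timedelta


def window_features(daily, heading_date, start_dah, end_dah):
    """Aggregate daily weather from start_dah to end_dah days after heading (inclusive).

    `daily` maps a date to a (mean_temp_c, rel_humidity_pct, rainfall_mm) tuple.
    The layout and the offset interpretation are assumptions made for this sketch.
    """
    days = [heading_date + timedelta(days=d) for d in range(start_dah, end_dah + 1)]
    rows = [daily[d] for d in days if d in daily]
    if not rows:
        raise ValueError("no weather records inside the window")
    n = len(rows)
    return {
        "mean_temp": sum(r[0] for r in rows) / n,   # average of daily mean temperature
        "mean_rh": sum(r[1] for r in rows) / n,     # average relative humidity
        "total_rain": sum(r[2] for r in rows),      # rainfall summed over the window
    }


# Synthetic data: 40 days of constant weather after a hypothetical heading date.
heading = date(2019, 8, 15)
daily = {heading + timedelta(days=i): (24.0, 80.0, 2.0) for i in range(40)}
features = window_features(daily, heading, 22, 24)
print(features)
```

The three aggregated values would then serve as the input row for a regression model such as a GBM, with the observed sprouting rate as the target.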

A Study on the Retrieval of River Turbidity Based on KOMPSAT-3/3A Images (KOMPSAT-3/3A 영상 기반 하천의 탁도 산출 연구)

  • Kim, Dahui;Won, You Jun;Han, Sangmyung;Han, Hyangsun
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.6_1
    • /
    • pp.1285-1300
    • /
    • 2022
  • Turbidity, the measure of the cloudiness of water, is used as an important index for water quality management. The turbidity can vary greatly in small river systems, which affects water quality in national rivers. Therefore, the generation of high-resolution spatial information on turbidity is very important. In this study, a turbidity retrieval model using the Korea Multi-Purpose Satellite-3 and -3A (KOMPSAT-3/3A) images was developed for high-resolution turbidity mapping of the Han River system based on the eXtreme Gradient Boosting (XGBoost) algorithm. To this end, the top of atmosphere (TOA) spectral reflectance was calculated from a total of 24 KOMPSAT-3/3A images and 150 Landsat-8 images. The Landsat-8 TOA spectral reflectance was cross-calibrated to the KOMPSAT-3/3A bands. The turbidity measured by the National Water Quality Monitoring Network was used as a reference dataset. The input variables were the TOA spectral reflectance at the locations of in situ turbidity measurement, the spectral indices (the normalized difference vegetation index, normalized difference water index, and normalized difference turbidity index), and the Moderate Resolution Imaging Spectroradiometer (MODIS)-derived atmospheric products (the atmospheric optical thickness, water vapor, and ozone). Furthermore, by analyzing the KOMPSAT-3/3A TOA spectral reflectance of different turbidities, a new spectral index, the new normalized difference turbidity index (nNDTI), was proposed and added as an input variable to the turbidity retrieval model. The XGBoost model showed excellent performance for the retrieval of turbidity, with a root mean square error (RMSE) of 2.70 NTU and a normalized RMSE (NRMSE) of 14.70% compared to in situ turbidity, and the nNDTI proposed in this study was the most important variable. 
The developed turbidity retrieval model was applied to the KOMPSAT-3/3A images to map high-resolution river turbidity, and it was possible to analyze the spatiotemporal variations of turbidity. Through this study, we could confirm that the KOMPSAT-3/3A images are very useful for retrieving high-resolution and accurate spatial information on the river turbidity.
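
The spectral indices named in this abstract all share the normalized-difference form, which can be sketched as below. The band pairings use common textbook definitions (NDVI from NIR and red, NDWI from green and NIR, NDTI from red and green) and are assumptions here; the abstract does not give the exact bands used, and the proposed nNDTI is not reproduced because its formula is not stated.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def spectral_indices(green, red, nir):
    """Compute three normalized-difference indices from TOA reflectance values.

    Band pairings follow widely used definitions and are assumed, not taken
    from the paper.
    """
    return {
        "ndvi": normalized_difference(nir, red),    # vegetation index
        "ndwi": normalized_difference(green, nir),  # water index
        "ndti": normalized_difference(red, green),  # turbidity index
    }


# Hypothetical reflectance values for one turbid-water pixel.
idx = spectral_indices(green=0.10, red=0.15, nir=0.05)
print(idx)
```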

Generation of Daily High-resolution Sea Surface Temperature for the Seas around the Korean Peninsula Using Multi-satellite Data and Artificial Intelligence (다종 위성자료와 인공지능 기법을 이용한 한반도 주변 해역의 고해상도 해수면온도 자료 생산)

  • Jung, Sihun;Choo, Minki;Im, Jungho;Cho, Dongjin
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.5_2
    • /
    • pp.707-723
    • /
    • 2022
  • Although satellite-based sea surface temperature (SST) is advantageous for monitoring large areas, spatiotemporal data gaps frequently occur due to various environmental or mechanical causes. Thus, it is crucial to fill in the gaps to maximize its usability. In this study, daily SST composite fields with a resolution of 4 km were produced through a two-step machine learning approach using polar-orbiting and geostationary satellite SST data. The first step was SST reconstruction based on the Data-Interpolating Convolutional Auto-Encoder (DINCAE) using multi-satellite-derived SST data. The second step improved the reconstructed SST, targeting in situ measurements, based on the light gradient boosting machine (LGBM) to finally produce daily SST composite fields. The DINCAE model was validated using random masks for 50 days, whereas the LGBM model was evaluated using leave-one-year-out cross-validation (LOYOCV). The SST reconstruction accuracy was high, resulting in an R2 of 0.98 and a root-mean-square error (RMSE) of 0.97℃. The accuracy increase from the second step was also high when compared to in situ measurements, resulting in an RMSE decrease of 0.21-0.29℃ and an MAE decrease of 0.17-0.24℃. The SST composite fields generated using all in situ data in this study were comparable with the existing data-assimilated SST composite fields. In addition, the LGBM model in the second step greatly reduced the overfitting, which was reported as a limitation in the previous study that used random forest. The spatial distribution of the corrected SST was similar to those of existing high-resolution SST composite fields, revealing that spatial details of oceanic phenomena such as fronts, eddies, and SST gradients were well simulated. This research demonstrated the potential to produce high-resolution seamless SST composite fields using multi-satellite data and artificial intelligence.
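
Leave-one-year-out cross-validation, as mentioned in this abstract, holds out all samples from one year per fold. A minimal sketch follows; the function name and the sample years are hypothetical, and the paper's actual data handling is not described in the abstract.

```python
def leave_one_year_out(years):
    """Yield (held_out_year, train_idx, test_idx) for each distinct year.

    `years` holds the observation year of every sample, in sample order.
    """
    for held_out in sorted(set(years)):
        train_idx = [i for i, y in enumerate(years) if y != held_out]
        test_idx = [i for i, y in enumerate(years) if y == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical sample years for six observations.
sample_years = [2019, 2019, 2020, 2021, 2021, 2021]
folds = list(leave_one_year_out(sample_years))
for year, train_idx, test_idx in folds:
    print(year, train_idx, test_idx)
```

Splitting by year rather than at random keeps temporally adjacent samples out of the training fold, which gives a stricter test of how a model generalizes to an unseen year.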

Estimation of High Resolution Sea Surface Salinity Using Multi Satellite Data and Machine Learning (다종 위성자료와 기계학습을 이용한 고해상도 표층 염분 추정)

  • Sung, Taejun;Sim, Seongmun;Jang, Eunna;Im, Jungho
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.5_2
    • /
    • pp.747-763
    • /
    • 2022
  • Ocean salinity affects ocean circulation on a global scale, and low-salinity water around coastal areas often has an impact on aquaculture and fisheries. Microwave satellite sensors (e.g., Soil Moisture Active Passive [SMAP]) have provided sea surface salinity (SSS) based on the dielectric characteristics of water associated with SSS and sea surface temperature (SST). In this study, a Light Gradient Boosting Machine (LGBM)-based model for generating high-resolution SSS from Geostationary Ocean Color Imager (GOCI) data was proposed, using the machine learning-based improved SMAP SSS by Jang et al. (2022) as reference data (SMAP SSS (Jang)). Three schemes with different input variables were tested, and scheme 3, with all variables including Multi-scale Ultra-high Resolution SST, yielded the best performance (coefficient of determination = 0.60, root mean square error = 0.91 psu). The proposed LGBM-based GOCI SSS had a spatiotemporal pattern similar to that of SMAP SSS (Jang), with much higher spatial resolution even in coastal areas, where SMAP SSS (Jang) was not available. In addition, when tested on the great flood that occurred in Southern China in August 2020, GOCI SSS simulated the spatial and temporal change of Changjiang Diluted Water well. This research demonstrated the potential of optical satellite data to generate high-resolution SSS in combination with the improved microwave-based SSS, especially in coastal areas.

Machine Learning Based MMS Point Cloud Semantic Segmentation (머신러닝 기반 MMS Point Cloud 의미론적 분할)

  • Bae, Jaegu;Seo, Dongju;Kim, Jinsoo
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.5_3
    • /
    • pp.939-951
    • /
    • 2022
  • The most important factor in designing autonomous driving systems is to recognize the exact location of the vehicle within the surrounding environment. To date, various sensors and navigation systems have been used for autonomous driving systems; however, all have limitations. Therefore, the need for high-definition (HD) maps that provide high-precision infrastructure information for safe and convenient autonomous driving is increasing. HD maps are drawn using three-dimensional point cloud data acquired through a mobile mapping system (MMS). However, this process requires manual work due to the large numbers of points and drawing layers, increasing the cost and effort associated with HD mapping. The objective of this study was to improve the efficiency of HD mapping by segmenting semantic information in an MMS point cloud into six classes: roads, curbs, sidewalks, medians, lanes, and other elements. Segmentation was performed using various machine learning techniques including random forest (RF), support vector machine (SVM), k-nearest neighbor (KNN), and gradient-boosting machine (GBM), and 11 variables including geometry, color, intensity, and other road design features. MMS point cloud data for a 130-m section of a five-lane road near Minam Station in Busan, were used to evaluate the segmentation models; the average F1 scores of the models were 95.43% for RF, 92.1% for SVM, 91.05% for GBM, and 82.63% for KNN. The RF model showed the best segmentation performance, with F1 scores of 99.3%, 95.5%, 94.5%, 93.5%, and 90.1% for roads, sidewalks, curbs, medians, and lanes, respectively. The variable importance results of the RF model showed high mean decrease accuracy and mean decrease gini for XY dist. and Z dist. variables related to road design, respectively. Thus, variables related to road design contributed significantly to the segmentation of semantic information. 
The results of this study demonstrate the applicability of segmentation of MMS point cloud data based on machine learning, and will help to reduce the cost and effort associated with HD mapping.
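
Per-class and average F1 scores like those reported in this abstract can be computed as in the sketch below. Treating the "average F1" as an unweighted (macro) mean over classes is an assumption, and the labels are invented for illustration.

```python
def f1_per_class(y_true, y_pred):
    """Return {class: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical point labels for three of the six classes.
y_true = ["road", "road", "curb", "lane", "lane", "lane"]
y_pred = ["road", "curb", "curb", "lane", "lane", "road"]
per_class = f1_per_class(y_true, y_pred)
print(per_class, macro_f1(y_true, y_pred))
```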

Estimation of Chlorophyll-a Concentration in Nakdong River Using Machine Learning-Based Satellite Data and Water Quality, Hydrological, and Meteorological Factors (머신러닝 기반 위성영상과 수질·수문·기상 인자를 활용한 낙동강의 Chlorophyll-a 농도 추정)

  • Soryeon Park;Sanghun Son;Jaegu Bae;Doi Lee;Dongju Seo;Jinsoo Kim
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.5_1
    • /
    • pp.655-667
    • /
    • 2023
  • Algal bloom outbreaks are frequently reported around the world, and serious water pollution problems arise every year in Korea. It is necessary to protect the aquatic ecosystem through continuous management and rapid response. Many studies using satellite images are being conducted to estimate the concentration of chlorophyll-a (Chl-a), an indicator of algal bloom occurrence. However, machine learning models have recently been used because it is difficult to accurately calculate Chl-a due to the spectral characteristics and atmospheric correction errors that change depending on the water system. It is necessary to consider the factors affecting algal bloom as well as the satellite spectral index. Therefore, this study constructed a dataset by combining water quality, hydrological, and meteorological factors with Sentinel-2 images. Two representative ensemble models, random forest and extreme gradient boosting (XGBoost), were used to predict the concentration of Chl-a at eight weirs located on the Nakdong River over the past five years. The R-squared score (R2), root mean square error (RMSE), and mean absolute error (MAE) were used as model evaluation indicators; for XGBoost, R2 was 0.80, RMSE was 6.612, and MAE was 4.457. Shapley additive explanations (SHAP) analysis showed that the water quality factors (suspended solids, biochemical oxygen demand, and dissolved oxygen) and the band ratio using red edge bands were of high importance in both models. Using varied input data was confirmed to help improve model performance, and the approach appears applicable to domestic and international algal bloom detection.
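
The three evaluation indicators named in this abstract can be computed as sketched below. R2 is taken here as the coefficient of determination, 1 - SS_res / SS_tot, which is an assumption (some studies report the squared correlation instead), and the values are synthetic.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2 (coefficient of determination), RMSE, and MAE."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Synthetic observed and predicted concentrations.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```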

Retrieval of Hourly Aerosol Optical Depth Using Top-of-Atmosphere Reflectance from GOCI-II and Machine Learning over South Korea (GOCI-II 대기상한 반사도와 기계학습을 이용한 남한 지역 시간별 에어로졸 광학 두께 산출)

  • Seyoung Yang;Hyunyoung Choi;Jungho Im
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.5_3
    • /
    • pp.933-948
    • /
    • 2023
  • Atmospheric aerosols not only have adverse effects on human health but also exert direct and indirect impacts on the climate system. Consequently, it is imperative to comprehend the characteristics and spatiotemporal distribution of aerosols. Numerous research endeavors have been undertaken to monitor aerosols, predominantly through the retrieval of aerosol optical depth (AOD) via satellite-based observations. Nonetheless, this approach primarily relies on a look-up table-based inversion algorithm, characterized by computationally intensive operations and associated uncertainties. In this study, a novel high-resolution AOD direct retrieval algorithm, leveraging machine learning, was developed using top-of-atmosphere reflectance data derived from the Geostationary Ocean Color Imager-II (GOCI-II), in conjunction with their differences from the past 30-day minimum reflectance, and meteorological variables from numerical models. The Light Gradient Boosting Machine (LGBM) technique was harnessed, and the resultant estimates underwent rigorous validation encompassing random, temporal, and spatial N-fold cross-validation (CV) using ground-based observation data from Aerosol Robotic Network (AERONET) AOD. The three CV results consistently demonstrated robust performance, yielding R2 = 0.70-0.80, RMSE = 0.08-0.09, and 75.2-85.1% of estimates within the expected error (EE). The Shapley Additive exPlanations (SHAP) analysis confirmed the substantial influence of reflectance-related variables on AOD estimation. A comprehensive examination of the spatiotemporal distribution of AOD in Seoul and Ulsan revealed that the developed LGBM model yielded results that are in close concordance with AERONET AOD over time, thereby confirming its suitability for AOD retrieval at high spatiotemporal resolution (i.e., hourly, 250 m). 
Furthermore, upon comparing data coverage, it was ascertained that the LGBM model enhanced data retrieval frequency by approximately 8.8% in comparison to the GOCI-II L2 AOD products, ameliorating issues associated with excessive masking over very illuminated surfaces that are often encountered in physics-based AOD retrieval processes.
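
The "within the expected error" percentage in this abstract is the share of retrievals that fall inside an envelope around the reference AOD. The sketch below assumes the envelope ±(0.05 + 0.15 × AOD), a convention widely used for satellite AOD validation over land; the abstract does not state which envelope the authors applied, so both coefficients are parameters, and the values are invented.

```python
def within_expected_error(reference, retrieved, offset=0.05, slope=0.15):
    """Fraction of retrievals inside reference +/- (offset + slope * reference).

    The default coefficients are an assumed convention, not taken from the paper.
    """
    hits = sum(
        1
        for ref, est in zip(reference, retrieved)
        if abs(est - ref) <= offset + slope * ref
    )
    return hits / len(reference)


# Hypothetical ground-based (reference) and model AOD values.
aeronet = [0.10, 0.20, 0.50, 1.00]
model = [0.12, 0.35, 0.55, 0.85]
fraction = within_expected_error(aeronet, model)
print(f"{fraction:.1%} within EE")
```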

A Study on Risk Parity Asset Allocation Model with XGBoost (XGBoost를 활용한 리스크패리티 자산배분 모형에 관한 연구)

  • Kim, Younghoon;Choi, HeungSik;Kim, SunWoong
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.135-149
    • /
    • 2020
  • Artificial intelligence is changing the world, and the financial market is no exception. Robo-advisors are being actively developed, making up for the weaknesses of traditional asset allocation methods and replacing the parts that are difficult for the traditional methods. They make automated investment decisions with artificial intelligence algorithms and are used with various asset allocation models such as the mean-variance model, the Black-Litterman model, and the risk parity model. The risk parity model is a typical risk-based asset allocation model that focuses on the volatility of assets. It avoids investment risk structurally, so it offers stability in the management of large funds and has been widely used in the financial field. XGBoost is a parallel tree-boosting method: an optimized gradient boosting model designed to be highly efficient and flexible. It not only handles billions of examples in limited-memory environments but is also very fast to train compared to traditional boosting methods. It is frequently used in various fields of data analysis and has many advantages. In this study, we therefore propose a new asset allocation model that combines the risk parity model and the XGBoost machine learning model. This model uses XGBoost to predict the risk of assets and applies the predicted risk to the process of covariance estimation. Estimation errors arise between the estimation period and the actual investment period because the optimized asset allocation model estimates the proportion of investments based on historical data. These estimation errors adversely affect the optimized portfolio performance. This study aims to improve the stability and portfolio performance of the model by predicting the volatility of the next investment period and reducing the estimation errors of the optimized asset allocation model. As a result, it narrows the gap between theory and practice and proposes a more advanced asset allocation model. 
In this study, we used Korean stock market price data for a total of 17 years, from 2003 to 2019, for the empirical test of the suggested model. The data sets are composed of the energy, finance, IT, industrial, material, telecommunication, utility, consumer, health care, and staple sectors. We accumulated the predicted values using a moving-window method with 1,000 in-sample and 20 out-of-sample observations, producing a total of 154 rebalancing back-testing results. We analyzed portfolio performance in terms of cumulative rate of return, and the long test period yielded a large number of samples. Compared with the traditional risk parity model, this experiment recorded improvements in both cumulative yield and reduction of estimation errors. The total cumulative return is 45.748%, about 5% higher than that of the risk parity model, and the estimation errors are reduced in 9 out of 10 industry sectors. The reduction of estimation errors increases the stability of the model and makes it easier to apply in practical investment. The results of the experiment showed improvement of portfolio performance by reducing the estimation errors of the optimized asset allocation model. Many financial models and asset allocation models are limited in practical investment because of the most fundamental question of whether the past characteristics of assets will continue into the future in a changing financial market. However, this study not only takes advantage of traditional asset allocation models but also supplements the limitations of traditional methods and increases stability by predicting the risks of assets with the latest algorithm. There are various studies on parametric estimation methods to reduce estimation errors in portfolio optimization. We also suggested a new method to reduce estimation errors in the optimized asset allocation model using machine learning. 
So this study is meaningful in that it proposes an advanced artificial intelligence asset allocation model for the fast-developing financial markets.
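
The moving-window back-test and the volatility-based weighting described in this abstract can be illustrated with the sketch below. It is a simplification under stated assumptions: the weights use plain inverse historical volatility and ignore correlations, whereas the paper feeds XGBoost-predicted risk into covariance estimation; the series length of 4,080 observations is a hypothetical value chosen only because it yields 154 windows; and the return series are invented.

```python
import statistics


def rolling_windows(n_obs, in_sample=1000, out_sample=20):
    """Return (train_start, train_end, test_end) index triples, stepping by out_sample."""
    windows = []
    start = 0
    while start + in_sample + out_sample <= n_obs:
        windows.append((start, start + in_sample, start + in_sample + out_sample))
        start += out_sample
    return windows


def inverse_volatility_weights(returns_by_asset):
    """Weight each asset by 1 / sample volatility, normalized to sum to 1.

    This is a naive stand-in for risk parity that ignores correlations.
    """
    inv_vol = {name: 1.0 / statistics.stdev(r) for name, r in returns_by_asset.items()}
    total = sum(inv_vol.values())
    return {name: v / total for name, v in inv_vol.items()}


windows = rolling_windows(4080)  # hypothetical number of trading days
print(len(windows), windows[0], windows[-1])

# Invented sector returns: the second series is twice as volatile as the first.
returns = {
    "energy": [0.01, -0.01, 0.01, -0.01],
    "utility": [0.02, -0.02, 0.02, -0.02],
}
weights = inverse_volatility_weights(returns)
print(weights)
```

In each window, the weights would be estimated on the in-sample slice and applied over the following out-of-sample slice before rebalancing.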

Suggestion of Urban Regeneration Type Recommendation System Based on Local Characteristics Using Text Mining (텍스트 마이닝을 활용한 지역 특성 기반 도시재생 유형 추천 시스템 제안)

  • Kim, Ikjun;Lee, Junho;Kim, Hyomin;Kang, Juyoung
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.3
    • /
    • pp.149-169
    • /
    • 2020
  • "The Urban Renewal New Deal project", one of the government's major national projects, is about developing underdeveloped areas by investing 50 trillion won in 100 locations in the first year and 500 over the next four years. This project is drawing keen attention from the media and local governments. However, the project model fails to reflect the original characteristics of each area because it divides project areas into five categories: "Our Neighborhood Restoration, Housing Maintenance Support Type, General Neighborhood Type, Central Urban Type, and Economic Base Type." The keywords for successful urban regeneration in Korea ("resident participation," "regional specialization," "ministerial cooperation," and "public-private cooperation") show that, when local governments propose urban regeneration projects to the government, it is most important to accurately understand the characteristics of the city and push ahead with the projects in a way that suits those characteristics, with the help of local residents and private companies. In addition, considering the gentrification problem, which is one of the side effects of urban regeneration projects, it is important to select and implement urban regeneration types suitable for the characteristics of the area. In order to supplement the limitations of the 'Urban Regeneration New Deal Project' methodology, this study aims to propose a system that recommends urban regeneration types suitable for urban regeneration sites by utilizing various machine learning algorithms, referring to the urban regeneration types of the '2025 Seoul Metropolitan Government Urban Regeneration Strategy Plan' promoted based on regional characteristics. There are four types of urban regeneration in Seoul: "Low-use Low-Level Development, Abandonment, Deteriorated Housing, and Specialization of Historical and Cultural Resources" (Shon and Park, 2017). 
In order to identify regional characteristics, approximately 100,000 text data were collected for 22 regions where the project was carried out for a total of four types of urban regeneration. Using the collected data, we extracted key keywords for each region according to the type of urban regeneration and conducted topic modeling to explore whether there were differences between types. As a result, it was confirmed that a number of topics related to real estate and the economy appeared in old residential areas, and in the case of declining and underdeveloped areas, topics reflecting the characteristics of areas where industrial activities were active in the past appeared. In the case of the historical and cultural resource area, since it is an area that contains traces of the past, many keywords related to the government appeared. Therefore, it was possible to confirm political topics and cultural topics resulting from various events. Finally, in the case of low-use and under-developed areas, many topics on real estate and accessibility emerged, indicating good accessibility; these areas mainly had the characteristics of regions where development is planned or is likely. Furthermore, a model was implemented that proposes urban regeneration types tailored to regional characteristics for regions other than Seoul. Machine learning technology was used to implement the model, and training data and test data were randomly extracted at an 8:2 ratio. In order to compare the performance of various models, the input variables were set in two ways, Count Vector and TF-IDF Vector, and five classifiers were applied: SVM (Support Vector Machine), Decision Tree, Random Forest, Logistic Regression, and Gradient Boosting. Performance was thus compared across a total of 10 models. The model with the highest performance was the Gradient Boosting method using TF-IDF Vector input data, and the accuracy was 97%. 
Therefore, the recommendation system proposed in this study is expected to recommend urban regeneration types based on the regional characteristics of new business sites in the process of carrying out urban regeneration projects.
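
The two input representations compared in this abstract, Count Vector and TF-IDF Vector, differ only in how term counts are weighted. The sketch below uses the textbook weighting tf × ln(N / df); this is an assumption, since the abstract does not name a library and common implementations apply smoothing and normalization on top. The documents are invented English stand-ins for the collected texts.

```python
import math
from collections import Counter


def count_vectors(docs):
    """Return (vocabulary, rows of raw term counts) for whitespace-tokenized docs."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    counts = [Counter(doc.split()) for doc in docs]
    return vocab, [[c[tok] for tok in vocab] for c in counts]


def tfidf_vectors(docs):
    """Return (vocabulary, rows of tf * ln(N / df)); unsmoothed textbook variant."""
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return vocab, [[row[j] * idf[j] for j in range(len(vocab))] for row in counts]


# Hypothetical documents standing in for regional text data.
docs = ["housing price housing", "history culture", "housing access"]
vocab, counts = count_vectors(docs)
_, tfidf = tfidf_vectors(docs)
print(vocab)
print(counts[0])
print(tfidf[0])
```

Terms that occur in many documents, such as "housing" here, receive a lower inverse document frequency than terms confined to a single document, which is what separates the two representations before they are passed to a classifier.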

A Smart Farm Environment Optimization and Yield Prediction Platform based on IoT and Deep Learning (IoT 및 딥 러닝 기반 스마트 팜 환경 최적화 및 수확량 예측 플랫폼)

  • Choi, Hokil;Ahn, Heuihak;Jeong, Yina;Lee, Byungkwan
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.12 no.6
    • /
    • pp.672-680
    • /
    • 2019
  • This paper proposes "A Smart Farm Environment Optimization and Yield Prediction Platform based on IoT and Deep Learning", which gathers bio-sensor data from farms, diagnoses the diseases of growing crops, and predicts the year's harvest. The platform collects all the information currently available, such as weather and soil microbes, optimizes the farm environment so that the crops can grow well, diagnoses crop diseases by using the leaves of the crops being grown on the farm, and predicts the year's harvest by using all the information on the farm. The results show that the average accuracy of the AEOM is about 15% higher than that of the RF and about 8% higher than that of the GBD. As the amount of data increases, its accuracy decreases less than that of the RF or GBD. Linear regression shows that the slope of accuracy is -3.641E-4 for the ReLU, -4.0710E-4 for the Sigmoid, and -7.4534E-4 for the step function. Therefore, as the amount of test data increases, the ReLU remains more accurate than the other two activation functions. The proposed platform manages the entire farm and, if introduced to actual farms, will greatly contribute to the development of smart farms in Korea.
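
The slope comparison in this abstract amounts to fitting a straight line to accuracy as a function of the amount of test data. A minimal sketch of the ordinary least-squares slope follows; the data points are invented and do not reproduce the paper's values.

```python
def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line through the points (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical test-set sizes and the accuracy measured at each size.
sizes = [1000, 2000, 3000, 4000]
accuracy = [0.90, 0.85, 0.80, 0.75]
slope = least_squares_slope(sizes, accuracy)
print(slope)
```

A slope closer to zero means accuracy degrades more slowly as the data volume grows, which is the basis on which the abstract ranks the three activation functions.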