• Title/Summary/Keyword: machine learning

Search Result 5,495, Processing Time 0.034 seconds

Comparative Assessment of Linear Regression and Machine Learning for Analyzing the Spatial Distribution of Ground-level NO2 Concentrations: A Case Study for Seoul, Korea (서울 지역 지상 NO2 농도 공간 분포 분석을 위한 회귀 모델 및 기계학습 기법 비교)

  • Kang, Eunjin;Yoo, Cheolhee;Shin, Yeji;Cho, Dongjin;Im, Jungho
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.6_1
    • /
    • pp.1739-1756
    • /
    • 2021
  • Atmospheric nitrogen dioxide (NO2) is mainly caused by anthropogenic emissions. It contributes to the formation of secondary pollutants and ozone through chemical reactions, and adversely affects human health. Although ground stations to monitor NO2 concentrations in real time are operated in Korea, they have a limitation that it is difficult to analyze the spatial distribution of NO2 concentrations, especially over the areas with no stations. Therefore, this study conducted a comparative experiment of spatial interpolation of NO2 concentrations based on two linear-regression methods(i.e., multi linear regression (MLR), and regression kriging (RK)), and two machine learning approaches (i.e., random forest (RF), and support vector regression (SVR)) for the year of 2020. Four approaches were compared using leave-one-out-cross validation (LOOCV). The daily LOOCV results showed that MLR, RK, and SVR produced the average daily index of agreement (IOA) of 0.57, which was higher than that of RF (0.50). The average daily normalized root mean square error of RK was 0.9483%, which was slightly lower than those of the other models. MLR, RK and SVR showed similar seasonal distribution patterns, and the dynamic range of the resultant NO2 concentrations from these three models was similar while that from RF was relatively small. The multivariate linear regression approaches are expected to be a promising method for spatial interpolation of ground-level NO2 concentrations and other parameters in urban areas.

A Comparative Evaluation of Multiple Meteorological Datasets for the Rice Yield Prediction at the County Level in South Korea (우리나라 시군단위 벼 수확량 예측을 위한 다종 기상자료의 비교평가)

  • Cho, Subin;Youn, Youjeong;Kim, Seoyeon;Jeong, Yemin;Kim, Gunah;Kang, Jonggu;Kim, Kwangjin;Cho, Jaeil;Lee, Yangwon
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.2
    • /
    • pp.337-357
    • /
    • 2021
  • Because the growth of paddy rice is affected by meteorological factors, the selection of appropriate meteorological variables is essential to build a rice yield prediction model. This paper examines the suitability of multiple meteorological datasets for the rice yield modeling in South Korea, 1996-2019, and a hindcast experiment for rice yield using a machine learning method by considering the nonlinear relationships between meteorological variables and the rice yield. In addition to the ASOS in-situ observations, we used CRU-JRA ver. 2.1 and ERA5 reanalysis. From the multiple meteorological datasets, we extracted the four common variables (air temperature, relative humidity, solar radiation, and precipitation) and analyzed the characteristics of each data and the associations with rice yields. CRU-JRA ver. 2.1 showed an overall agreement with the other datasets. While relative humidity had a rare relationship with rice yields, solar radiation showed a somewhat high correlation with rice yields. Using the air temperature, solar radiation, and precipitation of July, August, and September, we built a random forest model for the hindcast experiments of rice yields. The model with CRU-JRA ver. 2.1 showed the best performance with a correlation coefficient of 0.772. The solar radiation in the prediction model had the most significant importance among the variables, which is in accordance with the generic agricultural knowledge. This paper has an implication for selecting from multiple meteorological datasets for rice yield modeling.

Regeneration of a defective Railroad Surface for defect detection with Deep Convolution Neural Networks (Deep Convolution Neural Networks 이용하여 결함 검출을 위한 결함이 있는 철도선로표면 디지털영상 재 생성)

  • Kim, Hyeonho;Han, Seokmin
    • Journal of Internet Computing and Services
    • /
    • v.21 no.6
    • /
    • pp.23-31
    • /
    • 2020
  • This study was carried out to generate various images of railroad surfaces with random defects as training data to be better at the detection of defects. Defects on the surface of railroads are caused by various factors such as friction between track binding devices and adjacent tracks and can cause accidents such as broken rails, so railroad maintenance for defects is necessary. Therefore, various researches on defect detection and inspection using image processing or machine learning on railway surface images have been conducted to automate railroad inspection and to reduce railroad maintenance costs. In general, the performance of the image processing analysis method and machine learning technology is affected by the quantity and quality of data. For this reason, some researches require specific devices or vehicles to acquire images of the track surface at regular intervals to obtain a database of various railway surface images. On the contrary, in this study, in order to reduce and improve the operating cost of image acquisition, we constructed the 'Defective Railroad Surface Regeneration Model' by applying the methods presented in the related studies of the Generative Adversarial Network (GAN). Thus, we aimed to detect defects on railroad surface even without a dedicated database. This constructed model is designed to learn to generate the railroad surface combining the different railroad surface textures and the original surface, considering the ground truth of the railroad defects. The generated images of the railroad surface were used as training data in defect detection network, which is based on Fully Convolutional Network (FCN). To validate its performance, we clustered and divided the railroad data into three subsets, one subset as original railroad texture images and the remaining two subsets as another railroad surface texture images. In the first experiment, we used only original texture images for training sets in the defect detection model. And in the second experiment, we trained the generated images that were generated by combining the original images with a few railroad textures of the other images. Each defect detection model was evaluated in terms of 'intersection of union(IoU)' and F1-score measures with ground truths. As a result, the scores increased by about 10~15% when the generated images were used, compared to the case that only the original images were used. This proves that it is possible to detect defects by using the existing data and a few different texture images, even for the railroad surface images in which dedicated training database is not constructed.

Gridded Expansion of Forest Flux Observations and Mapping of Daily CO2 Absorption by the Forests in Korea Using Numerical Weather Prediction Data and Satellite Images (국지예보모델과 위성영상을 이용한 극상림 플럭스 관측의 공간연속면 확장 및 우리나라 산림의 일일 탄소흡수능 격자자료 산출)

  • Kim, Gunah;Cho, Jaeil;Kang, Minseok;Lee, Bora;Kim, Eun-Sook;Choi, Chuluong;Lee, Hanlim;Lee, Taeyun;Lee, Yangwon
    • Korean Journal of Remote Sensing
    • /
    • v.36 no.6_1
    • /
    • pp.1449-1463
    • /
    • 2020
  • As recent global warming and climate changes become more serious, the importance of CO2 absorption by forests is increasing to cope with the greenhouse gas issues. According to the UN Framework Convention on Climate Change, it is required to calculate national CO2 absorptions at the local level in a more scientific and rigorous manner. This paper presents the gridded expansion of forest flux observations and mapping of daily CO2 absorption by the forests in Korea using numerical weather prediction data and satellite images. To consider the sensitive daily changes of plant photosynthesis, we built a machine learning model to retrieve the daily RACA (reference amount of CO2 absorption) by referring to the climax forest in Gwangneung and adopted the NIFoS (National Institute of Forest Science) lookup table for the CO2 absorption by forest type and age to produce the daily AACA (actual amount of CO2 absorption) raster data with the spatial variation of the forests in Korea. In the experiment for the 1,095 days between Jan 1, 2013 and Dec 31, 2015, our RACA retrieval model showed high accuracy with a correlation coefficient of 0.948. To achieve the tier 3 daily statistics for AACA, long-term and detailed forest surveying should be combined with the model in the future.

Prediction of patent lifespan and analysis of influencing factors using machine learning (기계학습을 활용한 특허수명 예측 및 영향요인 분석)

  • Kim, Yongwoo;Kim, Min Gu;Kim, Young-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.147-170
    • /
    • 2022
  • Although the number of patent which is one of the core outputs of technological innovation continues to increase, the number of low-value patents also hugely increased. Therefore, efficient evaluation of patents has become important. Estimation of patent lifespan which represents private value of a patent, has been studied for a long time, but in most cases it relied on a linear model. Even if machine learning methods were used, interpretation or explanation of the relationship between explanatory variables and patent lifespan was insufficient. In this study, patent lifespan (number of renewals) is predicted based on the idea that patent lifespan represents the value of the patent. For the research, 4,033,414 patents applied between 1996 and 2017 and finally granted were collected from USPTO (US Patent and Trademark Office). To predict the patent lifespan, we use variables that can reflect the characteristics of the patent, the patent owner's characteristics, and the inventor's characteristics. We build four different models (Ridge Regression, Random Forest, Feed Forward Neural Network, Gradient Boosting Models) and perform hyperparameter tuning through 5-fold Cross Validation. Then, the performance of the generated models are evaluated, and the relative importance of predictors is also presented. In addition, based on the Gradient Boosting Model which have excellent performance, Accumulated Local Effects Plot is presented to visualize the relationship between predictors and patent lifespan. Finally, we apply Kernal SHAP (SHapley Additive exPlanations) to present the evaluation reason of individual patents, and discuss applicability to the patent evaluation system. This study has academic significance in that it cumulatively contributes to the existing patent life estimation research and supplements the limitations of existing patent life estimation studies based on linearity. It is academically meaningful that this study contributes cumulatively to the existing studies which estimate patent lifespan, and that it supplements the limitations of linear models. Also, it is practically meaningful to suggest a method for deriving the evaluation basis for individual patent value and examine the applicability to patent evaluation systems.

Preliminary Inspection Prediction Model to select the on-Site Inspected Foreign Food Facility using Multiple Correspondence Analysis (차원축소를 활용한 해외제조업체 대상 사전점검 예측 모형에 관한 연구)

  • Hae Jin Park;Jae Suk Choi;Sang Goo Cho
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.1
    • /
    • pp.121-142
    • /
    • 2023
  • As the number and weight of imported food are steadily increasing, safety management of imported food to prevent food safety accidents is becoming more important. The Ministry of Food and Drug Safety conducts on-site inspections of foreign food facilities before customs clearance as well as import inspection at the customs clearance stage. However, a data-based safety management plan for imported food is needed due to time, cost, and limited resources. In this study, we tried to increase the efficiency of the on-site inspection by preparing a machine learning prediction model that pre-selects the companies that are expected to fail before the on-site inspection. Basic information of 303,272 foreign food facilities and processing businesses collected in the Integrated Food Safety Information Network and 1,689 cases of on-site inspection information data collected from 2019 to April 2022 were collected. After preprocessing the data of foreign food facilities, only the data subject to on-site inspection were extracted using the foreign food facility_code. As a result, it consisted of a total of 1,689 data and 103 variables. For 103 variables, variables that were '0' were removed based on the Theil-U index, and after reducing by applying Multiple Correspondence Analysis, 49 characteristic variables were finally derived. We build eight different models and perform hyperparameter tuning through 5-fold cross validation. Then, the performance of the generated models are evaluated. The research purpose of selecting companies subject to on-site inspection is to maximize the recall, which is the probability of judging nonconforming companies as nonconforming. As a result of applying various algorithms of machine learning, the Random Forest model with the highest Recall_macro, AUROC, Average PR, F1-score, and Balanced Accuracy was evaluated as the best model. Finally, we apply Kernal SHAP (SHapley Additive exPlanations) to present the selection reason for nonconforming facilities of individual instances, and discuss applicability to the on-site inspection facility selection system. Based on the results of this study, it is expected that it will contribute to the efficient operation of limited resources such as manpower and budget by establishing an imported food management system through a data-based scientific risk management model.

Development of 1ST-Model for 1 hour-heavy rain damage scale prediction based on AI models (1시간 호우피해 규모 예측을 위한 AI 기반의 1ST-모형 개발)

  • Lee, Joonhak;Lee, Haneul;Kang, Narae;Hwang, Seokhwan;Kim, Hung Soo;Kim, Soojun
    • Journal of Korea Water Resources Association
    • /
    • v.56 no.5
    • /
    • pp.311-323
    • /
    • 2023
  • In order to reduce disaster damage by localized heavy rains, floods, and urban inundation, it is important to know in advance whether natural disasters occur. Currently, heavy rain watch and heavy rain warning by the criteria of the Korea Meteorological Administration are being issued in Korea. However, since this one criterion is applied to the whole country, we can not clearly recognize heavy rain damage for a specific region in advance. Therefore, in this paper, we tried to reset the current criteria for a special weather report which considers the regional characteristics and to predict the damage caused by rainfall after 1 hour. The study area was selected as Gyeonggi-province, where has more frequent heavy rain damage than other regions. Then, the rainfall inducing disaster or hazard-triggering rainfall was set by utilizing hourly rainfall and heavy rain damage data, considering the local characteristics. The heavy rain damage prediction model was developed by a decision tree model and a random forest model, which are machine learning technique and by rainfall inducing disaster and rainfall data. In addition, long short-term memory and deep neural network models were used for predicting rainfall after 1 hour. The predicted rainfall by a developed prediction model was applied to the trained classification model and we predicted whether the rain damage after 1 hour will be occurred or not and we called this as 1ST-Model. The 1ST-Model can be used for preventing and preparing heavy rain disaster and it is judged to be of great contribution in reducing damage caused by heavy rain.

Predicting the Effects of Rooftop Greening and Evaluating CO2 Sequestration in Urban Heat Island Areas Using Satellite Imagery and Machine Learning (위성영상과 머신러닝 활용 도시열섬 지역 옥상녹화 효과 예측과 이산화탄소 흡수량 평가)

  • Minju Kim;Jeong U Park;Juhyeon Park;Jisoo Park;Chang-Uk Hyun
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.5_1
    • /
    • pp.481-493
    • /
    • 2023
  • In high-density urban areas, the urban heat island effect increases urban temperatures, leading to negative impacts such as worsened air pollution, increased cooling energy consumption, and increased greenhouse gas emissions. In urban environments where it is difficult to secure additional green spaces, rooftop greening is an efficient greenhouse gas reduction strategy. In this study, we not only analyzed the current status of the urban heat island effect but also utilized high-resolution satellite data and spatial information to estimate the available rooftop greening area within the study area. We evaluated the mitigation effect of the urban heat island phenomenon and carbon sequestration capacity through temperature predictions resulting from rooftop greening. To achieve this, we utilized WorldView-2 satellite data to classify land cover in the urban heat island areas of Busan city. We developed a prediction model for temperature changes before and after rooftop greening using machine learning techniques. To assess the degree of urban heat island mitigation due to changes in rooftop greening areas, we constructed a temperature change prediction model with temperature as the dependent variable using the random forest technique. In this process, we built a multiple regression model to derive high-resolution land surface temperatures for training data using Google Earth Engine, combining Landsat-8 and Sentinel-2 satellite data. Additionally, we evaluated carbon sequestration based on rooftop greening areas using a carbon absorption capacity per plant. The results of this study suggest that the developed satellite-based urban heat island assessment and temperature change prediction technology using Random Forest models can be applied to urban heat island-vulnerable areas with potential for expansion.

Development of High-Resolution Fog Detection Algorithm for Daytime by Fusing GK2A/AMI and GK2B/GOCI-II Data (GK2A/AMI와 GK2B/GOCI-II 자료를 융합 활용한 주간 고해상도 안개 탐지 알고리즘 개발)

  • Ha-Yeong Yu;Myoung-Seok Suh
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.6_3
    • /
    • pp.1779-1790
    • /
    • 2023
  • Satellite-based fog detection algorithms are being developed to detect fog in real-time over a wide area, with a focus on the Korean Peninsula (KorPen). The GEO-KOMPSAT-2A/Advanced Meteorological Imager (GK2A/AMI, GK2A) satellite offers an excellent temporal resolution (10 min) and a spatial resolution (500 m), while GEO-KOMPSAT-2B/Geostationary Ocean Color Imager-II (GK2B/GOCI-II, GK2B) provides an excellent spatial resolution (250 m) but poor temporal resolution (1 h) with only visible channels. To enhance the fog detection level (10 min, 250 m), we developed a fused GK2AB fog detection algorithm (FDA) of GK2A and GK2B. The GK2AB FDA comprises three main steps. First, the Korea Meteorological Satellite Center's GK2A daytime fog detection algorithm is utilized to detect fog, considering various optical and physical characteristics. In the second step, GK2B data is extrapolated to 10-min intervals by matching GK2A pixels based on the closest time and location when GK2B observes the KorPen. For reflectance, GK2B normalized visible (NVIS) is corrected using GK2A NVIS of the same time, considering the difference in wavelength range and observation geometry. GK2B NVIS is extrapolated at 10-min intervals using the 10-min changes in GK2A NVIS. In the final step, the extrapolated GK2B NVIS, solar zenith angle, and outputs of GK2A FDA are utilized as input data for machine learning (decision tree) to develop the GK2AB FDA, which detects fog at a resolution of 250 m and a 10-min interval based on geographical locations. Six and four cases were used for the training and validation of GK2AB FDA, respectively. Quantitative verification of GK2AB FDA utilized ground observation data on visibility, wind speed, and relative humidity. Compared to GK2A FDA, GK2AB FDA exhibited a fourfold increase in spatial resolution, resulting in more detailed discrimination between fog and non-fog pixels. In general, irrespective of the validation method, the probability of detection (POD) and the Hanssen-Kuiper Skill score (KSS) are high or similar, indicating that it better detects previously undetected fog pixels. However, GK2AB FDA, compared to GK2A FDA, tends to over-detect fog with a higher false alarm ratio and bias.

Data-centric XAI-driven Data Imputation of Molecular Structure and QSAR Model for Toxicity Prediction of 3D Printing Chemicals (3D 프린팅 소재 화학물질의 독성 예측을 위한 Data-centric XAI 기반 분자 구조 Data Imputation과 QSAR 모델 개발)

  • ChanHyeok Jeong;SangYoun Kim;SungKu Heo;Shahzeb Tariq;MinHyeok Shin;ChangKyoo Yoo
    • Korean Chemical Engineering Research
    • /
    • v.61 no.4
    • /
    • pp.523-541
    • /
    • 2023
  • As accessibility to 3D printers increases, there is a growing frequency of exposure to chemicals associated with 3D printing. However, research on the toxicity and harmfulness of chemicals generated by 3D printing is insufficient, and the performance of toxicity prediction using in silico techniques is limited due to missing molecular structure data. In this study, quantitative structure-activity relationship (QSAR) model based on data-centric AI approach was developed to predict the toxicity of new 3D printing materials by imputing missing values in molecular descriptors. First, MissForest algorithm was utilized to impute missing values in molecular descriptors of hazardous 3D printing materials. Then, based on four different machine learning models (decision tree, random forest, XGBoost, SVM), a machine learning (ML)-based QSAR model was developed to predict the bioconcentration factor (Log BCF), octanol-air partition coefficient (Log Koa), and partition coefficient (Log P). Furthermore, the reliability of the data-centric QSAR model was validated through the Tree-SHAP (SHapley Additive exPlanations) method, which is one of explainable artificial intelligence (XAI) techniques. The proposed imputation method based on the MissForest enlarged approximately 2.5 times more molecular structure data compared to the existing data. Based on the imputed dataset of molecular descriptor, the developed data-centric QSAR model achieved approximately 73%, 76% and 92% of prediction performance for Log BCF, Log Koa, and Log P, respectively. Lastly, Tree-SHAP analysis demonstrated that the data-centric-based QSAR model achieved high prediction performance for toxicity information by identifying key molecular descriptors highly correlated with toxicity indices. Therefore, the proposed QSAR model based on the data-centric XAI approach can be extended to predict the toxicity of potential pollutants in emerging printing chemicals, chemical process, semiconductor or display process.