• Title/Summary/Keyword: Cross-Validation Approach

Search Result 134, Processing Time 0.028 seconds

Satellite-Based Cabbage and Radish Yield Prediction Using Deep Learning in Kangwon-do (딥러닝을 활용한 위성영상 기반의 강원도 지역의 배추와 무 수확량 예측)

  • Hyebin Park;Yejin Lee;Seonyoung Park
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.5_3
    • /
    • pp.1031-1042
    • /
    • 2023
  • In this study, a deep learning model was developed to predict the yield of cabbage and radish, one of the five major supply and demand management vegetables, using satellite images of Landsat 8. To predict the yield of cabbage and radish in Gangwon-do from 2015 to 2020, satellite images from June to September, the growing period of cabbage and radish, were used. Normalized difference vegetation index, enhanced vegetation index, lead area index, and land surface temperature were employed in this study as input data for the yield model. Crop yields can be effectively predicted using satellite images because satellites collect continuous spatiotemporal data on the global environment. Based on the model developed previous study, a model designed for input data was proposed in this study. Using time series satellite images, convolutional neural network, a deep learning model, was used to predict crop yield. Landsat 8 provides images every 16 days, but it is difficult to acquire images especially in summer due to the influence of weather such as clouds. As a result, yield prediction was conducted by splitting June to July into one part and August to September into two. Yield prediction was performed using a machine learning approach and reference models , and modeling performance was compared. The model's performance and early predictability were assessed using year-by-year cross-validation and early prediction. The findings of this study could be applied as basic studies to predict the yield of field crops in Korea.

Mixed dentition analysis using a multivariate approach (다변량 기법을 이용한 혼합치열기 분석법)

  • Seo, Seung-Hyun;An, Hong-Seok;Lee, Shin-Jae;Lim, Won Hee;Kim, Bong-Rae
    • The korean journal of orthodontics
    • /
    • v.39 no.2
    • /
    • pp.112-119
    • /
    • 2009
  • Objective: To develop a mixed dentition analysis method in consideration of the normal variation of tooth sizes. Methods: According to the tooth-size of the maxillary central incisor, maxillary 1st molar, mandibular central incisor, mandibular lateral incisor, and mandibular 1st molar, 307 normal occlusion subjects were clustered into the smaller and larger tooth-size groups. Multiple regression analyses were then performed to predict the sizes of the canine and premolars for the 2 groups and both genders separately. For a cross validation dataset, 504 malocclusion patients were assigned into the 2 groups. Then multiple regression equations were applied. Results: Our results show that the maximum errors of the predicted space for the canine, 1st and 2nd premolars were 0.71 and 0.82 mm residual standard deviation for the normal occlusion and malocclusion groups, respectively. For malocclusion patients, the prediction errors did not imply a statistically significant difference depending on the types of malocclusion nor the types of tooth-size groups. The frequency of prediction error more than 1 mm and 2 mm were 17.3% and 1.8%, respectively. The overall prediction accuracy was dramatically improved in this study compared to that of previous studies. Conclusions: The computer aided calculation method used in this study appeared to be more efficient.

Calibration of Portable Particulate Mattere-Monitoring Device using Web Query and Machine Learning

  • Loh, Byoung Gook;Choi, Gi Heung
    • Safety and Health at Work
    • /
    • v.10 no.4
    • /
    • pp.452-460
    • /
    • 2019
  • Background: Monitoring and control of PM2.5 are being recognized as key to address health issues attributed to PM2.5. Availability of low-cost PM2.5 sensors made it possible to introduce a number of portable PM2.5 monitors based on light scattering to the consumer market at an affordable price. Accuracy of light scatteringe-based PM2.5 monitors significantly depends on the method of calibration. Static calibration curve is used as the most popular calibration method for low-cost PM2.5 sensors particularly because of ease of application. Drawback in this approach is, however, the lack of accuracy. Methods: This study discussed the calibration of a low-cost PM2.5-monitoring device (PMD) to improve the accuracy and reliability for practical use. The proposed method is based on construction of the PM2.5 sensor network using Message Queuing Telemetry Transport (MQTT) protocol and web query of reference measurement data available at government-authorized PM monitoring station (GAMS) in the republic of Korea. Four machine learning (ML) algorithms such as support vector machine, k-nearest neighbors, random forest, and extreme gradient boosting were used as regression models to calibrate the PMD measurements of PM2.5. Performance of each ML algorithm was evaluated using stratified K-fold cross-validation, and a linear regression model was used as a reference. Results: Based on the performance of ML algorithms used, regression of the output of the PMD to PM2.5 concentrations data available from the GAMS through web query was effective. The extreme gradient boosting algorithm showed the best performance with a mean coefficient of determination (R2) of 0.78 and standard error of 5.0 ㎍/㎥, corresponding to 8% increase in R2 and 12% decrease in root mean square error in comparison with the linear regression model. Minimum 100 hours of calibration period was found required to calibrate the PMD to its full capacity. Calibration method proposed poses a limitation on the location of the PMD being in the vicinity of the GAMS. As the number of the PMD participating in the sensor network increases, however, calibrated PMDs can be used as reference devices to nearby PMDs that require calibration, forming a calibration chain through MQTT protocol. Conclusions: Calibration of a low-cost PMD, which is based on construction of PM2.5 sensor network using MQTT protocol and web query of reference measurement data available at a GAMS, significantly improves the accuracy and reliability of a PMD, thereby making practical use of the low-cost PMD possible.

Application of Artificial Neural Network for estimation of daily maximum snow depth in Korea (우리나라에서 일최심신적설의 추정을 위한 인공신경망모형의 활용)

  • Lee, Geon;Lee, Dongryul;Kim, Dongkyun
    • Journal of Korea Water Resources Association
    • /
    • v.50 no.10
    • /
    • pp.681-690
    • /
    • 2017
  • This study estimated the daily maximum snow depth using the Artificial Neural Network (ANN) model in Korean Peninsula. First, the optimal ANN model structure was determined through the trial-and-error approach. As a result, daily precipitation, daily mean temperature, and daily minimum temperature were chosen as the input data of the ANN. The number of hidden layer was set to 1 and the number of nodes in the hidden layer was set to 10. In case of using the observed value as the input data of the ANN model, the cross validation correlation coefficient was 0.87, which is higher than that of the case in which the daily maximum snow depth was spatially interpolated using the Ordinary Kriging method (0.40). In order to investigate the performance of the ANN model for estimating the daily maximum snow depth of the ungauged area, the input data of the ANN model was spatially interpolated using Ordinary Kriging. In this case, the correlation coefficient of 0.49 was obtained. The performance of the ANN model in mountainous areas above 200m above sea level was found to be somewhat lower than that in the rest of the study area. This result of this study implies that the ANN model can be used effectively for the accurate and immediate estimation of the maximum snow depth over the whole country.

Comparison of genome-wide association and genomic prediction methods for milk production traits in Korean Holstein cattle

  • Lee, SeokHyun;Dang, ChangGwon;Choy, YunHo;Do, ChangHee;Cho, Kwanghyun;Kim, Jongjoo;Kim, Yousam;Lee, Jungjae
    • Asian-Australasian Journal of Animal Sciences
    • /
    • v.32 no.7
    • /
    • pp.913-921
    • /
    • 2019
  • Objective: The objectives of this study were to compare identified informative regions through two genome-wide association study (GWAS) approaches and determine the accuracy and bias of the direct genomic value (DGV) for milk production traits in Korean Holstein cattle, using two genomic prediction approaches: single-step genomic best linear unbiased prediction (ss-GBLUP) and Bayesian Bayes-B. Methods: Records on production traits such as adjusted 305-day milk (MY305), fat (FY305), and protein (PY305) yields were collected from 265,271 first parity cows. After quality control, 50,765 single-nucleotide polymorphic genotypes were available for analysis. In GWAS for ss-GBLUP (ssGWAS) and Bayes-B (BayesGWAS), the proportion of genetic variance for each 1-Mb genomic window was calculated and used to identify informative genomic regions. Accuracy of the DGV was estimated by a five-fold cross-validation with random clustering. As a measure of accuracy for DGV, we also assessed the correlation between DGV and deregressed-estimated breeding value (DEBV). The bias of DGV for each method was obtained by determining regression coefficients. Results: A total of nine and five significant windows (1 Mb) were identified for MY305 using ssGWAS and BayesGWAS, respectively. Using ssGWAS and BayesGWAS, we also detected multiple significant regions for FY305 (12 and 7) and PY305 (14 and 2), respectively. Both single-step DGV and Bayes DGV also showed somewhat moderate accuracy ranges for MY305 (0.32 to 0.34), FY305 (0.37 to 0.39), and PY305 (0.35 to 0.36) traits, respectively. The mean biases of DGVs determined using the single-step and Bayesian methods were $1.50{\pm}0.21$ and $1.18{\pm}0.26$ for MY305, $1.75{\pm}0.33$ and $1.14{\pm}0.20$ for FY305, and $1.59{\pm}0.20$ and $1.14{\pm}0.15$ for PY305, respectively. Conclusion: From the bias perspective, we believe that genomic selection based on the application of Bayesian approaches would be more suitable than application of ss-GBLUP in Korean Holstein populations.

Classification of Convolvulaceae plants using Vis-NIR spectroscopy and machine learning (근적외선 분광법과 머신러닝을 이용한 메꽃과(Convolvulaceae) 식물의 분류)

  • Yong-Ho Lee;Soo-In Sohn;Sun-Hee Hong;Chang-Seok Kim;Chae-Sun Na;In-Soon Kim;Min-Sang Jang;Young-Ju Oh
    • Korean Journal of Environmental Biology
    • /
    • v.39 no.4
    • /
    • pp.581-589
    • /
    • 2021
  • Using visible-near infrared(Vis-NIR) spectra combined with machine learning methods, the feasibility of quick and non-destructive classification of Convolvulaceae species was studied. The main aim of this study is to classify six Convolvulaceae species in the field in different geographical regions of South Korea using a handheld spectrometer. Spectra were taken at 1.5 nm intervals from the adaxial side of the leaves in the Vis-NIR spectral region between 400 and 1,075 nm. The obtained spectra were preprocessed with three different preprocessing methods to find the best preprocessing approach with the highest classification accuracy. Preprocessed spectra of the six Convolvulaceae sp. were provided as input for the machine learning analysis. After cross-validation, the classification accuracy of various combinations of preprocessing and modeling ranged between 43.4% and 98.6%. The combination of Savitzky-Golay and Support vector machine methods showed the highest classification accuracy of 98.6% for the discrimination of Convolvulaceae sp. The growth stage of the plants, different measuring locations, and the scanning position of leaves on the plant were some of the crucial factors that affected the outcomes in this investigation. We conclude that Vis-NIR spectroscopy, coupled with suitable preprocessing and machine learning approaches, can be used in the field to effectively discriminate Convolvulaceae sp. for effective weed monitoring and management.

A Method for Extracting Equipment Specifications from Plant Documents and Cross-Validation Approach with Similar Equipment Specifications (플랜트 설비 문서로부터 설비사양 추출 및 유사설비 사양 교차 검증 접근법)

  • Jae Hyun Lee;Seungeon Choi;Hyo Won Suh
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.29 no.2
    • /
    • pp.55-68
    • /
    • 2024
  • Plant engineering companies create or refer to requirements documents for each related field, such as plant process/equipment/piping/instrumentation, in different engineering departments. The process-related requirements document includes not only a description of the process but also the requirements of the equipment or related facilities that will operate it. Since the authors and reviewers of the requirements documents are different, there is a possibility that inconsistencies may occur between equipment or parts design specifications described in different requirement documents. Ensuring consistency in these matters can increase the reliability of the overall plant design information. However, the amount of documents and the scattered nature of requirements for a same equipment and parts across different documents make it challenging for engineers to trace and manage requirements. This paper proposes a method to analyze requirement sentences and calculate the similarity of requirement sentences in order to identify semantically identical sentences. To calculate the similarity of requirement sentences, we propose a named entity recognition method to identify compound words for the parts and properties that are semantically central to the requirements. A method to calculate the similarity of the identified compound words for parts and properties is also proposed. The proposed method is explained using sentences in practical documents, and experimental results are described.

Monitoring Ground-level SO2 Concentrations Based on a Stacking Ensemble Approach Using Satellite Data and Numerical Models (위성 자료와 수치모델 자료를 활용한 스태킹 앙상블 기반 SO2 지상농도 추정)

  • Choi, Hyunyoung;Kang, Yoojin;Im, Jungho;Shin, Minso;Park, Seohui;Kim, Sang-Min
    • Korean Journal of Remote Sensing
    • /
    • v.36 no.5_3
    • /
    • pp.1053-1066
    • /
    • 2020
  • Sulfur dioxide (SO2) is primarily released through industrial, residential, and transportation activities, and creates secondary air pollutants through chemical reactions in the atmosphere. Long-term exposure to SO2 can result in a negative effect on the human body causing respiratory or cardiovascular disease, which makes the effective and continuous monitoring of SO2 crucial. In South Korea, SO2 monitoring at ground stations has been performed, but this does not provide spatially continuous information of SO2 concentrations. Thus, this research estimated spatially continuous ground-level SO2 concentrations at 1 km resolution over South Korea through the synergistic use of satellite data and numerical models. A stacking ensemble approach, fusing multiple machine learning algorithms at two levels (i.e., base and meta), was adopted for ground-level SO2 estimation using data from January 2015 to April 2019. Random forest and extreme gradient boosting were used as based models and multiple linear regression was adopted for the meta-model. The cross-validation results showed that the meta-model produced the improved performance by 25% compared to the base models, resulting in the correlation coefficient of 0.48 and root-mean-square-error of 0.0032 ppm. In addition, the temporal transferability of the approach was evaluated for one-year data which were not used in the model development. The spatial distribution of ground-level SO2 concentrations based on the proposed model agreed with the general seasonality of SO2 and the temporal patterns of emission sources.

Wildfire Severity Mapping Using Sentinel Satellite Data Based on Machine Learning Approaches (Sentinel 위성영상과 기계학습을 이용한 국내산불 피해강도 탐지)

  • Sim, Seongmun;Kim, Woohyeok;Lee, Jaese;Kang, Yoojin;Im, Jungho;Kwon, Chunguen;Kim, Sungyong
    • Korean Journal of Remote Sensing
    • /
    • v.36 no.5_3
    • /
    • pp.1109-1123
    • /
    • 2020
  • In South Korea with forest as a major land cover class (over 60% of the country), many wildfires occur every year. Wildfires weaken the shear strength of the soil, forming a layer of soil that is vulnerable to landslides. It is important to identify the severity of a wildfire as well as the burned area to sustainably manage the forest. Although satellite remote sensing has been widely used to map wildfire severity, it is often difficult to determine the severity using only the temporal change of satellite-derived indices such as Normalized Difference Vegetation Index (NDVI) and Normalized Burn Ratio (NBR). In this study, we proposed an approach for determining wildfire severity based on machine learning through the synergistic use of Sentinel-1A Synthetic Aperture Radar-C data and Sentinel-2A Multi Spectral Instrument data. Three wildfire cases-Samcheok in May 2017, Gangreung·Donghae in April 2019, and Gosung·Sokcho in April 2019-were used for developing wildfire severity mapping models with three machine learning algorithms (i.e., Random Forest, Logistic Regression, and Support Vector Machine). The results showed that the random forest model yielded the best performance, resulting in an overall accuracy of 82.3%. The cross-site validation to examine the spatiotemporal transferability of the machine learning models showed that the models were highly sensitive to temporal differences between the training and validation sites, especially in the early growing season. This implies that a more robust model with high spatiotemporal transferability can be developed when more wildfire cases with different seasons and areas are added in the future.

Predicting the Performance of Recommender Systems through Social Network Analysis and Artificial Neural Network (사회연결망분석과 인공신경망을 이용한 추천시스템 성능 예측)

  • Cho, Yoon-Ho;Kim, In-Hwan
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.4
    • /
    • pp.159-172
    • /
    • 2010
  • The recommender system is one of the possible solutions to assist customers in finding the items they would like to purchase. To date, a variety of recommendation techniques have been developed. One of the most successful recommendation techniques is Collaborative Filtering (CF) that has been used in a number of different applications such as recommending Web pages, movies, music, articles and products. CF identifies customers whose tastes are similar to those of a given customer, and recommends items those customers have liked in the past. Numerous CF algorithms have been developed to increase the performance of recommender systems. Broadly, there are memory-based CF algorithms, model-based CF algorithms, and hybrid CF algorithms which combine CF with content-based techniques or other recommender systems. While many researchers have focused their efforts in improving CF performance, the theoretical justification of CF algorithms is lacking. That is, we do not know many things about how CF is done. Furthermore, the relative performances of CF algorithms are known to be domain and data dependent. It is very time-consuming and expensive to implement and launce a CF recommender system, and also the system unsuited for the given domain provides customers with poor quality recommendations that make them easily annoyed. Therefore, predicting the performances of CF algorithms in advance is practically important and needed. In this study, we propose an efficient approach to predict the performance of CF. Social Network Analysis (SNA) and Artificial Neural Network (ANN) are applied to develop our prediction model. CF can be modeled as a social network in which customers are nodes and purchase relationships between customers are links. SNA facilitates an exploration of the topological properties of the network structure that are implicit in data for CF recommendations. An ANN model is developed through an analysis of network topology, such as network density, inclusiveness, clustering coefficient, network centralization, and Krackhardt's efficiency. While network density, expressed as a proportion of the maximum possible number of links, captures the density of the whole network, the clustering coefficient captures the degree to which the overall network contains localized pockets of dense connectivity. Inclusiveness refers to the number of nodes which are included within the various connected parts of the social network. Centralization reflects the extent to which connections are concentrated in a small number of nodes rather than distributed equally among all nodes. Krackhardt's efficiency characterizes how dense the social network is beyond that barely needed to keep the social group even indirectly connected to one another. We use these social network measures as input variables of the ANN model. As an output variable, we use the recommendation accuracy measured by F1-measure. In order to evaluate the effectiveness of the ANN model, sales transaction data from H department store, one of the well-known department stores in Korea, was used. Total 396 experimental samples were gathered, and we used 40%, 40%, and 20% of them, for training, test, and validation, respectively. The 5-fold cross validation was also conducted to enhance the reliability of our experiments. The input variable measuring process consists of following three steps; analysis of customer similarities, construction of a social network, and analysis of social network patterns. We used Net Miner 3 and UCINET 6.0 for SNA, and Clementine 11.1 for ANN modeling. The experiments reported that the ANN model has 92.61% estimated accuracy and 0.0049 RMSE. Thus, we can know that our prediction model helps decide whether CF is useful for a given application with certain data characteristics.