• Title/Summary/Keyword: cross-validation

Variation of Seasonal Groundwater Recharge Analyzed Using Landsat-8 OLI Data and a CART Algorithm (CART알고리즘과 Landsat-8 위성영상 분석을 통한 계절별 지하수함양량 변화)

  • Park, Seunghyuk;Jeong, Gyo-Cheol
    • The Journal of Engineering Geology
    • /
    • v.31 no.3
    • /
    • pp.395-432
    • /
    • 2021
  • Groundwater recharge rates vary widely by location and over time. They are difficult to measure directly and are thus often estimated using simulations. This study employed frequency and regression analysis and a classification and regression tree (CART) algorithm, a machine learning method, to estimate groundwater recharge. The CART algorithm considered the distribution of precipitation by subbasin (PCP), geomorphological data, indices of the relationship between vegetation and land use, and soil type. The geomorphological data were a digital elevation model (DEM), surface slope (SLOP), and surface aspect (ASPT); the indices were the perpendicular vegetation index (PVI), normalized difference vegetation index (NDVI), normalized difference tillage index (NDTI), and normalized difference residue index (NDRI). The spatio-temporal distribution of groundwater recharge from the SWAT-MOD-FLOW program was classified into four groups and randomly sampled in R to train a model, and groundwater recharge was then predicted by CART considering modified PVI, NDVI, NDTI, NDRI, PCP, and geomorphological data. To assess inter-rater reliability for the four-group groundwater recharge classification, the Kappa coefficient, overall accuracy, and confusion matrix were calculated using K-fold cross-validation. The model obtained a Kappa coefficient of 0.3-0.6 and an overall accuracy of 0.5-0.7, indicating that the proposed model for estimating groundwater recharge with respect to soil type and vegetation cover is quite reliable.
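
The agreement measures named in this abstract can be illustrated with a minimal sketch. It assumes a confusion matrix whose rows are reference classes and whose columns are predicted classes; the function names and the 2x2 toy matrix are hypothetical and are not taken from the paper, which used four recharge groups.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(n)) / total
    row_totals = [sum(cm[i]) for i in range(n)]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement expected from the marginal totals alone.
    p_expected = sum(row_totals[k] * col_totals[k] for k in range(n)) / total**2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy matrix (rows: reference class, columns: predicted class).
toy_cm = [[20, 5], [10, 15]]
print(round(overall_accuracy(toy_cm), 3), round(cohen_kappa(toy_cm), 3))
```

For the toy matrix, the overall accuracy is 0.7 and kappa is 0.4, up to floating-point rounding. The same functions work for any number of classes; in a K-fold setting they would be applied to the confusion matrix of each held-out fold.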

Tomato Crop Diseases Classification Models Using Deep CNN-based Architectures (심층 CNN 기반 구조를 이용한 토마토 작물 병해충 분류 모델)

  • Kim, Sam-Keun;Ahn, Jae-Geun
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.22 no.5
    • /
    • pp.7-14
    • /
    • 2021
  • Tomato crops are highly affected by tomato diseases, and if not prevented, a disease can cause severe losses for the agricultural economy. Therefore, there is a need for a system that quickly and accurately diagnoses various tomato diseases. In this paper, we propose a system that classifies nine diseases as well as healthy tomato plants by applying various pretrained deep learning-based CNN models trained on an ImageNet dataset. The tomato leaf image dataset obtained from PlantVillage is provided as input to ResNet, Xception, and DenseNet, which have deep learning-based CNN architectures. The proposed models were constructed by adding a top-level classifier to the basic CNN model, and they were trained by applying a 5-fold cross-validation strategy. All three of the proposed models were trained in two stages: transfer learning (which freezes the layers of the basic CNN model and then trains only the top-level classifiers), and fine-tuned learning (which sets the learning rate to a very small number and trains after unfreezing basic CNN layers). SGD, RMSprop, and Adam were applied as optimization algorithms. The experimental results show that the DenseNet CNN model to which the RMSprop algorithm was applied output the best results, with 98.63% accuracy.

A Proposal of Remaining Useful Life Prediction Model for Turbofan Engine based on k-Nearest Neighbor (k-NN을 활용한 터보팬 엔진의 잔여 유효 수명 예측 모델 제안)

  • Kim, Jung-Tae;Seo, Yang-Woo;Lee, Seung-Sang;Kim, So-Jung;Kim, Yong-Geun
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.22 no.4
    • /
    • pp.611-620
    • /
    • 2021
  • Maintenance practice has progressed from corrective and preventive maintenance toward condition-based maintenance, in which maintenance is performed at the optimum time based on the condition of the equipment. To find the optimal maintenance point, it is important to accurately understand the condition of the equipment, especially its remaining useful life (RUL). Thus, using simulation data (C-MAPSS), a model is proposed to predict the remaining useful life of a turbofan engine. The modeling process consisted of preprocessing and transforming the C-MAPSS dataset and then predicting the RUL. Data pre-processing was performed through piecewise RUL, moving average filters, and standardization. The remaining useful life was predicted using principal component analysis and the k-NN method. To derive the optimal performance, the number of principal components and the number of neighbors for the k-NN method were determined through 5-fold cross-validation. The validity of the prediction results was analyzed through a scoring function that considers the usefulness of early prediction and the unsuitability of late prediction. In addition, the usefulness of the RUL prediction model was demonstrated through comparison with the prediction performance of other, neural network-based algorithms.
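
Two of the pre-processing steps named in this abstract, piecewise RUL and a moving average filter, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the cap value, the window size, the trailing form of the filter, and the function names are all hypothetical.

```python
def piecewise_rul(n_cycles, cap):
    """Piecewise-linear RUL target for one engine run.

    Cycle t (1-indexed) has linear RUL n_cycles - t, clipped to `cap`
    so that early, healthy cycles share a constant target.
    """
    return [min(cap, n_cycles - t) for t in range(1, n_cycles + 1)]


def moving_average(values, window):
    """Trailing moving average; early entries average the values seen so far."""
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


print(piecewise_rul(5, 2))
print(moving_average([1, 2, 3, 4], 2))
```

The capped target reflects the common assumption that degradation is not observable early in a run, so the model is not asked to distinguish between very large RUL values.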

Discrimination model for cultivation origin of paper mulberry bast fiber and Hanji based on NIR and MIR spectral data combined with PLS-DA (닥나무 인피섬유와 한지의 원산지 판별모델 개발을 위한 NIR 및 MIR 스펙트럼 데이터의 PLS-DA 적용)

  • Jang, Kyung-Ju;Jung, So-Yoon;Go, In-Hee;Jeong, Seon-Hwa
    • Analytical Science and Technology
    • /
    • v.32 no.1
    • /
    • pp.7-16
    • /
    • 2019
  • The objective of this study was the development of a discrimination model for the cultivation origin of paper mulberry bast fiber and Hanji using near infrared (NIR) and mid infrared (MIR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA). Paper mulberry bast fiber was purchased in 10 different regions of Korea and used to make Hanji. PLS-DA was performed using pre-treated FT-NIR and FT-MIR spectral data for paper mulberry bast fiber and Hanji. PLS-DA of paper mulberry bast fiber and Hanji samples, using FT-NIR spectral data, showed 100% performance in cross-validation and the confusion matrix (accuracy, sensitivity, and specificity). The discrimination models showed four regional groups, with clearer separation and much better score plots in the NIR spectral data-based model than in the MIR spectral data-based model. Furthermore, the discrimination model based on the NIR spectral data of paper mulberry bast fiber had highly similar score morphology to that of the discrimination model based on the NIR spectral data of Hanji.

Panamax Second-hand Vessel Valuation Model (파나막스 중고선가치 추정모델 연구)

  • Lim, Sang-Seop;Lee, Ki-Hwan;Yang, Huck-Jun;Yun, Hee-Sung
    • Journal of Navigation and Port Research
    • /
    • v.43 no.1
    • /
    • pp.72-78
    • /
    • 2019
  • The second-hand ship market provides immediate access to the freight market for shipping investors. When acquiring second-hand vessels, a precise estimate of the price is crucial to the decision-making process because it directly affects the future capital cost burden on investors. Previous studies on the second-hand market have mainly focused on market efficiency, and the number of papers on the estimation of second-hand vessel values is very limited. This study proposes an artificial neural network model that has not been attempted in previous studies. Six factors that affect the second-hand ship price (freight, new-building price, orderbook, scrap price, age, and vessel size) were identified through a literature review. The data comprise 366 actual trading records of Panamax second-hand vessels reported to Clarkson between January 2016 and December 2018. Statistical filtering was carried out through correlation analysis and stepwise regression analysis, and three parameters (freight, age, and size) were selected. Ten-fold cross-validation was used to estimate the hyper-parameters of the artificial neural network model. The results confirmed that the performance of the artificial neural network model is better than that of simple stepwise regression analysis. The application of the statistical verification process and the artificial neural network model differentiates this paper from others. In addition, it is expected that a scientific model that satisfies both statistical rationality and accuracy of the results will contribute to real-life practice.

Comparison of genome-wide association and genomic prediction methods for milk production traits in Korean Holstein cattle

  • Lee, SeokHyun;Dang, ChangGwon;Choy, YunHo;Do, ChangHee;Cho, Kwanghyun;Kim, Jongjoo;Kim, Yousam;Lee, Jungjae
    • Asian-Australasian Journal of Animal Sciences
    • /
    • v.32 no.7
    • /
    • pp.913-921
    • /
    • 2019
  • Objective: The objectives of this study were to compare identified informative regions through two genome-wide association study (GWAS) approaches and determine the accuracy and bias of the direct genomic value (DGV) for milk production traits in Korean Holstein cattle, using two genomic prediction approaches: single-step genomic best linear unbiased prediction (ss-GBLUP) and Bayesian Bayes-B. Methods: Records on production traits such as adjusted 305-day milk (MY305), fat (FY305), and protein (PY305) yields were collected from 265,271 first parity cows. After quality control, 50,765 single-nucleotide polymorphic genotypes were available for analysis. In GWAS for ss-GBLUP (ssGWAS) and Bayes-B (BayesGWAS), the proportion of genetic variance for each 1-Mb genomic window was calculated and used to identify informative genomic regions. Accuracy of the DGV was estimated by a five-fold cross-validation with random clustering. As a measure of accuracy for DGV, we also assessed the correlation between DGV and deregressed-estimated breeding value (DEBV). The bias of DGV for each method was obtained by determining regression coefficients. Results: A total of nine and five significant windows (1 Mb) were identified for MY305 using ssGWAS and BayesGWAS, respectively. Using ssGWAS and BayesGWAS, we also detected multiple significant regions for FY305 (12 and 7) and PY305 (14 and 2), respectively. Both single-step DGV and Bayes DGV also showed somewhat moderate accuracy ranges for MY305 (0.32 to 0.34), FY305 (0.37 to 0.39), and PY305 (0.35 to 0.36) traits, respectively. The mean biases of DGVs determined using the single-step and Bayesian methods were $1.50{\pm}0.21$ and $1.18{\pm}0.26$ for MY305, $1.75{\pm}0.33$ and $1.14{\pm}0.20$ for FY305, and $1.59{\pm}0.20$ and $1.14{\pm}0.15$ for PY305, respectively. 
Conclusion: From the bias perspective, we believe that genomic selection based on the application of Bayesian approaches would be more suitable than application of ss-GBLUP in Korean Holstein populations.
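
The two evaluation quantities in this abstract, accuracy as a correlation and bias as a regression coefficient, can be sketched as below. The abstract does not state the direction of the regression; this sketch assumes the common convention of regressing DEBV on DGV, where a slope of 1 indicates no bias. The function names and toy values are hypothetical.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regression_slope(y, x):
    """Least-squares slope of y regressed on x (with an intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical validation-fold values: predicted DGV and observed DEBV.
dgv = [1.0, 2.0, 3.0, 4.0]
debv = [2.0, 4.0, 6.0, 8.0]
print(pearson_correlation(dgv, debv), regression_slope(debv, dgv))
```

In a five-fold scheme, both statistics would be computed on each held-out fold and then averaged, which is one way to obtain a mean and standard deviation of the kind reported above.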

Validity and reliability of the Korean version of the Quality of Recovery-40 questionnaire

  • Lee, Jun Ho;Kim, Deokkyu;Seo, Donghak;Son, Ji-seon;Kim, Dong-Chan
    • Korean Journal of Anesthesiology
    • /
    • v.71 no.6
    • /
    • pp.467-475
    • /
    • 2018
  • Background: The Quality of Recovery-40 (QoR-40) is a widely used, self-rated, and self-completed questionnaire for postoperative patients. The questionnaire is intended to elicit information from each patient regarding the quality of recovery during the postoperative period. It is noteworthy, however, that an official Korean version of the QoR-40 (QoR-40K) has not been established. The purpose of this study was to develop the QoR-40K through a translation and cultural adaptation process and to evaluate its validity and reliability. Methods: After pre-authorization from the original author of the QoR-40, the translation procedure was established and carried out based upon Beaton's recommendation to create a QoR-40K model comparable to the original English QoR-40. Two hundred surgical patients were enrolled, and each completed the questionnaire during the preoperative period, on the third day, and 1 month after surgery. The QoR-40K was compared with the visual analogue scale (VAS) and another health-related questionnaire, the Short-form Health Survey-36 (SF-36). The method of validation for QoR-40K included test-retest reliability, internal consistency, and level of responsiveness. Results: Spearman's correlation coefficient for test-retest reliability was 0.895 (P < 0.001), and Cronbach's alpha of the global QoR-40K on the third day after surgery was 0.956. A positive correlation was obtained between the QoR-40K and the mental component summary of SF-36 (${\rho}=0.474$, P < 0.001), and a negative correlation was observed between QoR-40K and VAS (${\rho}=-0.341$, P < 0.001). The standardized responsive mean of the total QoR-40K was 0.71. Conclusions: The QoR-40K was found to be as acceptable and reliable as the original English QoR-40 for Korean patients after surgery, despite the apparent differences in the respective patients' cultural backgrounds.
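
The internal-consistency statistic reported in this abstract, Cronbach's alpha, follows the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The sketch below assumes a rows-by-items score table and uses sample variances; the function name and toy scores are hypothetical and unrelated to the QoR-40K data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table with one row per respondent, one column per item."""
    n_items = len(scores[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy data: three respondents answering two items identically.
toy_scores = [[1, 1], [2, 2], [3, 3]]
print(cronbach_alpha(toy_scores))
```

When every item ranks respondents identically, as in the toy data, alpha reaches its maximum of 1; values near the reported 0.956 indicate that items vary together very closely.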

Validation of Segmental Multi-Frequency Bioelectrical Impedance Analysis based on the Segmental Bioelectrical Impedance analysis in the Elderly Population (분절임피던스를 기준한 분절다주파수 생체임피던스의 일치도 분석)

  • Tang, Sae-Jo;Kim, Jang-Hee;Eom, Jin Jong;Eom, Sunho;Kim, Hakkyun;Kim, Chul-Hyun
    • Journal of Platform Technology
    • /
    • v.9 no.2
    • /
    • pp.38-45
    • /
    • 2021
  • A frequently used bioimpedance analysis method in Korea is segmental multi-frequency BIA (SMF-BIA), but it does not determine segmental impedance directly. This study compared SMF-BIA determinations with direct segmental determinations to assess the accuracy and appropriateness of the segment parameters. To this end, 108 elderly individuals were measured. Segmental bioelectrical measurements obtained from an SMF-BIA device (Inbody S10) at 50 kHz were compared with those measured with a phase-sensitive single-frequency device (SF-BIA, bia-101, RJL / akern systems). Significant differences (%) between single- and multi-frequency determinations were found for the right upper limb (R = 35.5 ± 6.2%, P < 0.001; Xc = 2.7 ± 7.6%, P < 0.01), left upper limb (R = 33.9 ± 6.0%, P < 0.001; Xc = 2.8 ± 8.3%, P < 0.01), right lower limb (R = 18.6 ± 4.3%, P < 0.001; Xc = 25.8 ± 10.0%, P < 0.001), and left lower limb (R = 18.0 ± 4.7%, P < 0.001; Xc = 31.8%). Between the two BIA methods, the impedance measurements of the limbs and whole body showed a high correlation (RA: R = 0.950, LA: R = 0.949, RL: R = 0.899, LL: R = 0.88), and in the agreement test, the impedance values of the upper limbs and whole body also showed strong agreement (ICC > 0.9), but for Xc the correlation was weak. In conclusion, although the bioimpedance devices had significantly different characteristics and were inconsistent cross-sectionally, there was high population-level agreement in the upper and lower extremities in determining segmental resistance value changes. However, a large error was found for the trunk. Further studies are needed to reduce this error.

A Method for Prediction of Quality Defects in Manufacturing Using Natural Language Processing and Machine Learning (자연어 처리 및 기계학습을 활용한 제조업 현장의 품질 불량 예측 방법론)

  • Roh, Jeong-Min;Kim, Yongsung
    • Journal of Platform Technology
    • /
    • v.9 no.3
    • /
    • pp.52-62
    • /
    • 2021
  • Quality control is critical at manufacturing sites and is key to predicting the risk of quality defects before manufacturing. However, the reliability of manual quality control methods is affected by human and physical limitations because manufacturing processes vary across industries. These limitations become particularly obvious in domains with numerous manufacturing processes, such as the manufacture of major nuclear equipment. This study proposed a novel method for predicting the risk of quality defects by using natural language processing and machine learning. Production data collected over 6 years at a factory that manufactures main equipment installed in nuclear power plants were used. In the preprocessing stage of text data, a mapping method was applied to the word dictionary so that domain knowledge could be appropriately reflected, and a hybrid algorithm, which combined n-gram, Term Frequency-Inverse Document Frequency, and Singular Value Decomposition, was constructed for sentence vectorization. Next, in the experiment to classify the risky processes resulting in poor quality, k-fold cross-validation was applied to categorize cases from Unigram to cumulative Trigram. Furthermore, to achieve objective experimental results, Naive Bayes and Support Vector Machine were used as classification algorithms, and a maximum accuracy and F1-score of 0.7685 and 0.8641, respectively, were achieved. Thus, the proposed method is effective. The performance of the proposed method was compared with the votes of field engineers, and the results revealed that the proposed method outperformed the field engineers. Thus, the method can be implemented for quality control at manufacturing sites.

A Node2Vec-Based Gene Expression Image Representation Method for Effectively Predicting Cancer Prognosis (암 예후를 효과적으로 예측하기 위한 Node2Vec 기반의 유전자 발현량 이미지 표현기법)

  • Choi, Jonghwan;Park, Sanghyun
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.8 no.10
    • /
    • pp.397-402
    • /
    • 2019
  • Accurately predicting cancer prognosis to provide appropriate treatment strategies for patients is one of the critical challenges in bioinformatics. Many studies have proposed machine learning models to predict patients' outcomes based on their gene expression data. Gene expression data are high-dimensional numerical data covering about 17,000 genes, so earlier studies used feature selection or dimensionality reduction approaches to improve the performance of prognostic prediction models. These approaches, however, make it difficult for the predictive models to grasp any biological interaction between the selected genes because the feature selection and model training stages are performed independently. In this paper, we propose a novel two-dimensional image formatting approach for gene expression data to achieve feature selection and prognostic prediction effectively. Node2Vec is exploited to integrate a biological interaction network with gene expression data, and a convolutional neural network learns the integrated two-dimensional gene expression image data and predicts cancer prognosis. We evaluated our proposed model through double cross-validation and confirmed superior prognostic prediction accuracy compared with traditional machine learning models based on raw gene expression data. As our proposed approach is able to improve prediction models without the loss of information caused by feature selection steps, we expect it will contribute to the development of personalized medicine.
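
The "double cross-validation" mentioned in this abstract is often understood as nested cross-validation, in which an inner loop tunes the model on each outer training set and the outer loop estimates performance on data never used for tuning. The sketch below assumes that reading and only generates index splits; the function names are hypothetical, no shuffling or stratification is applied, and the paper's actual protocol may differ.

```python
def k_fold_indices(indices, k):
    """Yield (train, held_out) index lists for k folds of `indices`."""
    folds = [indices[i::k] for i in range(k)]  # interleaved, unshuffled folds
    for i in range(k):
        held_out = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


def nested_splits(n_samples, outer_k, inner_k):
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation.

    inner_splits partitions only outer_train, so outer_test samples never
    influence hyperparameter selection.
    """
    for outer_train, outer_test in k_fold_indices(list(range(n_samples)), outer_k):
        inner_splits = list(k_fold_indices(outer_train, inner_k))
        yield outer_train, outer_test, inner_splits


for outer_train, outer_test, inner_splits in nested_splits(12, 3, 2):
    print(outer_test, [validation for _, validation in inner_splits])
```

Keeping the outer test indices out of every inner split is the property that prevents the tuned model's reported accuracy from being optimistically biased.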