• Title/Summary/Keyword: Cross Validation

Search Result 1,001, Processing Time 0.026 seconds

Enhancement of Geomorphology Generation for the Front Land of Levee Using Aerial Photograph (항공영상을 연계한 하천 제외지의 지형분석 개선 기법)

  • Lee, Geun Sang;Lee, Hyun Seok;Hwang, Eui Ho;Koh, Deuk Koo
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.28 no.3D
    • /
    • pp.407-415
    • /
    • 2008
  • This study presents the methodology to link with aerial photos for advancing the accuracy of topographic survey data that is used to calculate water volume in urban stream. First, GIS spatial interpolation technique as Inverse Distance Weight (IDW) and Kriging was applied to construct the terrain morphology to the sand-bar and grass area using cross-sectional survey data, and also validation point data was used to estimate the accuracy of created topographic data. As the result of comparison, IDW ($d^{-2}_{ij}$, 2nd square number) in Sand-bar area and Kriging Spherical model in grass area showed more efficient results in the construction of topographic data of river boundary. But the differences among interpolation methods are very slight. Image classification method, Minimum Distance Method (MDM) was applied to extract sand-bar and grass area that are located to river boundary efficiently and the elevation value of extracted layers was allocated to the water level point value. Water volume with topographic data from aerial photos shows the advanced accuracy of 13% (in sand-bar) and 12% (in grass) compared to the water volume of original terrain data. Therefore, terrain analysis method in river linking with aerial photos is efficient to the monitoring about sand-bar and grass area that are located in the downstream of Dam in flooding season, and also it can be applied to calculate water volume efficiently.

Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models

  • Oh Beom Kwon;Solji Han;Hwa Young Lee;Hye Seon Kang;Sung Kyoung Kim;Ju Sang Kim;Chan Kwon Park;Sang Haak Lee;Seung Joon Kim;Jin Woo Kim;Chang Dong Yeo
    • Tuberculosis and Respiratory Diseases
    • /
    • v.86 no.3
    • /
    • pp.203-215
    • /
    • 2023
  • Background: Surgical resection is the standard treatment for early-stage lung cancer. Since postoperative lung function is related to mortality, predicted postoperative lung function is used to determine the treatment modality. The aim of this study was to evaluate the predictive performance of linear regression and machine learning models. Methods: We extracted data from the Clinical Data Warehouse and developed three sets: set I, the linear regression model; set II, machine learning models omitting the missing data: and set III, machine learning models imputing the missing data. Six machine learning models, the least absolute shrinkage and selection operator (LASSO), Ridge regression, ElasticNet, Random Forest, eXtreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM) were implemented. The forced expiratory volume in 1 second measured 6 months after surgery was defined as the outcome. Five-fold cross-validation was performed for hyperparameter tuning of the machine learning models. The dataset was split into training and test datasets at a 70:30 ratio. Implementation was done after dataset splitting in set III. Predictive performance was evaluated by R2 and mean squared error (MSE) in the three sets. Results: A total of 1,487 patients were included in sets I and III and 896 patients were included in set II. In set I, the R2 value was 0.27 and in set II, LightGBM was the best model with the highest R2 value of 0.5 and the lowest MSE of 154.95. In set III, LightGBM was the best model with the highest R2 value of 0.56 and the lowest MSE of 174.07. Conclusion: The LightGBM model showed the best performance in predicting postoperative lung function.

Developing the Self-Reporting Scale of Community Integration for the Person with Psychiatiric Disabilities (정신장애인의 자기보고식 지역사회통합 척도 개발)

  • Choi, Youn Jeong
    • 재활복지
    • /
    • v.16 no.3
    • /
    • pp.165-192
    • /
    • 2012
  • This study aims to develop a valid self-report scale for the community integration of persons with psychiatric disabilities. To this end, conducted were in-depth interviews with individuals with psychiatric disabilities, consultation with experts, and a survey. First, literature review and the in-depth interview with individuals with psychiatric disabilities were collected questionnaires regarding the community integration of persons with psychiatric disabilities. Second, preliminary research 1 focused on the selection and modification of the items collected in the first research. Final 44 items were selected by the verification of the importance and content-validity of items under the advices of professionals. Lastly, preliminaty research 2 applied cross-validation method to the data from 524 cases in order to verify the factor structure and concept-validity of the items. The result of exploratory factor analysis shows that 5 factor structures are the most appropriate, and the confirmatory factor analysis suggests that the Self-reporting Scale of Community Integration for the person with psychiatric disabilities consists of 27 questionnaires which compose 5sub-concepts such as'psychological integration','physical integration', 'social support', 'social integration', 'independence/self-actualization'. Moreover, this scale was significantly related to the 'Life Satisfaction scale for the person with psychiatric disabilities'. This proved concurrent validity of the scale.

Prediction accuracy of incisal points in determining occlusal plane of digital complete dentures

  • Kenta Kashiwazaki;Yuriko Komagamine;Sahaprom Namano;Ji-Man Park;Maiko Iwaki;Shunsuke Minakuchi;Manabu, Kanazawa
    • The Journal of Advanced Prosthodontics
    • /
    • v.15 no.6
    • /
    • pp.281-289
    • /
    • 2023
  • PURPOSE. This study aimed to predict the positional coordinates of incisor points from the scan data of conventional complete dentures and verify their accuracy. MATERIALS AND METHODS. The standard triangulated language (STL) data of the scanned 100 pairs of complete upper and lower dentures were imported into the computer-aided design software from which the position coordinates of the points corresponding to each landmark of the jaw were obtained. The x, y, and z coordinates of the incisor point (XP, YP, and ZP) were obtained from the maxillary and mandibular landmark coordinates using regression or calculation formulas, and the accuracy was verified to determine the deviation between the measured and predicted coordinate values. YP was obtained in two ways using the hamularincisive-papilla plane (HIP) and facial measurements. Multiple regression analysis was used to predict ZP. The root mean squared error (RMSE) values were used to verify the accuracy of the XP and YP. The RMSE value was obtained after crossvalidation using the remaining 30 cases of denture STL data to verify the accuracy of ZP. RESULTS. The RMSE was 2.22 for predicting XP. When predicting YP, the RMSE of the method using the HIP plane and facial measurements was 3.18 and 0.73, respectively. Cross-validation revealed the RMSE to be 1.53. CONCLUSION. YP and ZP could be predicted from anatomical landmarks of the maxillary and mandibular edentulous jaw, suggesting that YP could be predicted with better accuracy with the addition of the position of the lower border of the upper lip.

Application of Handheld Raman Spectroscopy for Pigment Identification of a Hanging Painting at Janggoksa Temple(Maitreya Buddha) (장곡사 미륵불 괘불탱의 채색 재료 분석을 위한 휴대용 라만 분광기의 적용성 연구)

  • LEE Na Ra;YOO Youngmi;KIM Sojin
    • Korean Journal of Heritage: History & Science
    • /
    • v.56 no.4
    • /
    • pp.216-228
    • /
    • 2023
  • The purpose of this study is to apply the handheld Raman spectrometer to identify the coloring materials used in a large Buddhist painting (of Maitreya Buddha) at Janggoksa Temple through cross-validation with HH-XRF. An in situ investigation was performed together with use of a digital microscope and HH-XRF analysis to verify the properties of pigments used in the gwaebul ("large Buddhist painting") via a non-destructive method. However, the identification of coloring materials composed of light elements and mixed or overlaid pigments is difficult using only non-destructive analysis data. Unlike in situ investigation, laboratory analysis often required samples yet the sampling is restricted to a small quantity due to the cultural heritage characteristic. Thus, it is necessary to develop a non-destructive in situ method to supplement the HH-XRF data. The large Buddhist painting at Janggoksa Temple was painted mainly using white, red, yellow, green, and blue colors. The Raman spectroscopy provides molecular information, while XRF spectroscopy provides information about elemental composition of the pigments. Analysis results identified various coloring materials: inorganic pigment, such as lead white, minium, cinnabar, and orpiment, as well as organic pigment such as gamboge and indigo. Therefore, it is possible to obtain more information for the identification of pigments; organic pigment and mixed or overlaid pigments, while at the same time minimizing the collection sample and simplifying the analysis procedure compared to previously used methods. The results of this study will be used as basic data for the analysis of painting cultural heritage through a non-destructive in situ method in the future.

Predicting blast-induced ground vibrations at limestone quarry from artificial neural network optimized by randomized and grid search cross-validation, and comparative analyses with blast vibration predictor models

  • Salman Ihsan;Shahab Saqib;Hafiz Muhammad Awais Rashid;Fawad S. Niazi;Mohsin Usman Qureshi
    • Geomechanics and Engineering
    • /
    • v.35 no.2
    • /
    • pp.121-133
    • /
    • 2023
  • The demand for cement and limestone crushed materials has increased many folds due to the tremendous increase in construction activities in Pakistan during the past few decades. The number of cement production industries has increased correspondingly, and so the rock-blasting operations at the limestone quarry sites. However, the safety procedures warranted at these sites for the blast-induced ground vibrations (BIGV) have not been adequately developed and/or implemented. Proper prediction and monitoring of BIGV are necessary to ensure the safety of structures in the vicinity of these quarry sites. In this paper, an attempt has been made to predict BIGV using artificial neural network (ANN) at three selected limestone quarries of Pakistan. The ANN has been developed in Python using Keras with sequential model and dense layers. The hyper parameters and neurons in each of the activation layers has been optimized using randomized and grid search method. The input parameters for the model include distance, a maximum charge per delay (MCPD), depth of hole, burden, spacing, and number of blast holes, whereas, peak particle velocity (PPV) is taken as the only output parameter. A total of 110 blast vibrations datasets were recorded from three different limestone quarries. The dataset has been divided into 85% for neural network training, and 15% for testing of the network. A five-layer ANN is trained with Rectified Linear Unit (ReLU) activation function, Adam optimization algorithm with a learning rate of 0.001, and batch size of 32 with the topology of 6-32-32-256-1. The blast datasets were utilized to compare the performance of ANN, multivariate regression analysis (MVRA), and empirical predictors. The performance was evaluated using the coefficient of determination (R2), mean absolute error (MAE), mean squared error (MSE), mean absolute percentage error (MAPE), and root mean squared error (RMSE)for predicted and measured PPV. To determine the relative influence of each parameter on the PPV, sensitivity analyses were performed for all input parameters. The analyses reveal that ANN performs superior than MVRA and other empirical predictors, andthat83% PPV is affected by distance and MCPD while hole depth, number of blast holes, burden and spacing contribute for the remaining 17%. This research provides valuable insights into improving safety measures and ensuring the structural integrity of buildings near limestone quarry sites.

A Study on the Prediction of Uniaxial Compressive Strength Classification Using Slurry TBM Data and Random Forest (이수식 TBM 데이터와 랜덤포레스트를 이용한 일축압축강도 분류 예측에 관한 연구)

  • Tae-Ho Kang;Soon-Wook Choi;Chulho Lee;Soo-Ho Chang
    • Tunnel and Underground Space
    • /
    • v.33 no.6
    • /
    • pp.547-560
    • /
    • 2023
  • Recently, research on predicting ground classification using machine learning techniques, TBM excavation data, and ground data is increasing. In this study, a multi-classification prediction study for uniaxial compressive strength (UCS) was conducted by applying random forest model based on a decision tree among machine learning techniques widely used in various fields to machine data and ground data acquired at three slurry shield TBM sites. For the classification prediction, the training and test data were divided into 7:3, and a grid search including 5-fold cross-validation was used to select the optimal parameter. As a result of classification learning for UCS using a random forest, the accuracy of the multi-classification prediction model was found to be high at both 0.983 and 0.982 in the training set and the test set, respectively. However, due to the imbalance in data distribution between classes, the recall was evaluated low in class 4. It is judged that additional research is needed to increase the amount of measured data of UCS acquired in various sites.

The Optimization of Ensembles for Bankruptcy Prediction (기업부도 예측 앙상블 모형의 최적화)

  • Myoung Jong Kim;Woo Seob Yun
    • Information Systems Review
    • /
    • v.24 no.1
    • /
    • pp.39-57
    • /
    • 2022
  • This paper proposes the GMOPTBoost algorithm to improve the performance of the AdaBoost algorithm for bankruptcy prediction in which class imbalance problem is inherent. AdaBoost algorithm has the advantage of providing a robust learning opportunity for misclassified samples. However, there is a limitation in addressing class imbalance problem because the concept of arithmetic mean accuracy is embedded in AdaBoost algorithm. GMOPTBoost can optimize the geometric mean accuracy and effectively solve the category imbalance problem by applying Gaussian gradient descent. The samples are constructed according to the following two phases. First, five class imbalance datasets are constructed to verify the effect of the class imbalance problem on the performance of the prediction model and the performance improvement effect of GMOPTBoost. Second, class balanced data are constituted through data sampling techniques to verify the performance improvement effect of GMOPTBoost. The main results of 30 times of cross-validation analyzes are as follows. First, the class imbalance problem degrades the performance of ensembles. Second, GMOPTBoost contributes to performance improvements of AdaBoost ensembles trained on imbalanced datasets. Third, Data sampling techniques have a positive impact on performance improvement. Finally, GMOPTBoost contributes to significant performance improvement of AdaBoost ensembles trained on balanced datasets.

MRI Predictors of Malignant Transformation in Patients with Inverted Papilloma: A Decision Tree Analysis Using Conventional Imaging Features and Histogram Analysis of Apparent Diffusion Coefficients

  • Chong Hyun Suh;Jeong Hyun Lee;Mi Sun Chung;Xiao Quan Xu;Yu Sub Sung;Sae Rom Chung;Young Jun Choi;Jung Hwan Baek
    • Korean Journal of Radiology
    • /
    • v.22 no.5
    • /
    • pp.751-758
    • /
    • 2021
  • Objective: Preoperative differentiation between inverted papilloma (IP) and its malignant transformation to squamous cell carcinoma (IP-SCC) is critical for patient management. We aimed to determine the diagnostic accuracy of conventional imaging features and histogram parameters obtained from whole tumor apparent diffusion coefficient (ADC) values to predict IP-SCC in patients with IP, using decision tree analysis. Materials and Methods: In this retrospective study, we analyzed data generated from the records of 180 consecutive patients with histopathologically diagnosed IP or IP-SCC who underwent head and neck magnetic resonance imaging, including diffusion-weighted imaging and 62 patients were included in the study. To obtain whole tumor ADC values, the region of interest was placed to cover the entire volume of the tumor. Classification and regression tree analyses were performed to determine the most significant predictors of IP-SCC among multiple covariates. The final tree was selected by cross-validation pruning based on minimal error. Results: Of 62 patients with IP, 21 (34%) had IP-SCC. The decision tree analysis revealed that the loss of convoluted cerebriform pattern and the 20th percentile cutoff of ADC were the most significant predictors of IP-SCC. With these decision trees, the sensitivity, specificity, accuracy, and C-statistics were 86% (18 out of 21; 95% confidence interval [CI], 65-95%), 100% (41 out of 41; 95% CI, 91-100%), 95% (59 out of 61; 95% CI, 87-98%), and 0.966 (95% CI, 0.912-1.000), respectively. Conclusion: Decision tree analysis using conventional imaging features and histogram analysis of whole volume ADC could predict IP-SCC in patients with IP with high diagnostic accuracy.

A Study on the Drug Classification Using Machine Learning Techniques (머신러닝 기법을 이용한 약물 분류 방법 연구)

  • Anmol Kumar Singh;Ayush Kumar;Adya Singh;Akashika Anshum;Pradeep Kumar Mallick
    • Advanced Industrial SCIence
    • /
    • v.3 no.2
    • /
    • pp.8-16
    • /
    • 2024
  • This paper shows the system of drug classification, the goal of this is to foretell the apt drug for the patients based on their demographic and physiological traits. The dataset consists of various attributes like Age, Sex, BP (Blood Pressure), Cholesterol Level, and Na_to_K (Sodium to Potassium ratio), with the objective to determine the kind of drug being given. The models used in this paper are K-Nearest Neighbors (KNN), Logistic Regression and Random Forest. Further to fine-tune hyper parameters using 5-fold cross-validation, GridSearchCV was used and each model was trained and tested on the dataset. To assess the performance of each model both with and without hyper parameter tuning evaluation metrics like accuracy, confusion matrices, and classification reports were used and the accuracy of the models without GridSearchCV was 0.7, 0.875, 0.975 and with GridSearchCV was 0.75, 1.0, 0.975. According to GridSearchCV Logistic Regression is the most suitable model for drug classification among the three-model used followed by the K-Nearest Neighbors. Also, Na_to_K is an essential feature in predicting the outcome.