• Title/Summary/Keyword: one-leave-out cross-validation

LS-SVM for large data sets

  • Park, Hongrak; Hwang, Hyungtae; Kim, Byungju
    • Journal of the Korean Data and Information Science Society / v.27 no.2 / pp.549-557 / 2016
  • In this paper we propose a multiclassification method for large data sets that ensembles least squares support vector machines (LS-SVM) trained on principal components instead of raw input vectors. We use the revised one-vs-all method for multiclassification, a voting scheme that combines several binary classifications. The revised one-vs-all method uses the hat matrix of the LS-SVM ensemble, which is obtained by ensembling LS-SVMs each trained on a random sample drawn from the whole large training data set. The leave-one-out cross validation (CV) function is used to choose the optimal values of the hyper-parameters that affect the performance of the multiclass LS-SVM ensemble. We present the generalized cross validation function to reduce the computational burden of the leave-one-out CV function. Experimental results from real data sets illustrate the performance of the proposed multiclass LS-SVM ensemble.
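The link between the hat matrix, leave-one-out CV, and generalized CV mentioned in the abstract above can be sketched on a much simpler model. This is a minimal illustration, not the authors' method: it swaps the LS-SVM ensemble for an ordinary least-squares line, which also has fitted values of the form y_hat = H y. All function names and the data are hypothetical.

```python
# Stand-in model: least-squares line y = a + b*x (NOT an LS-SVM).
# Its fitted values are linear in y (y_hat = H y), so leave-one-out
# residuals can be read off the diagonal of the hat matrix H.

def fit_line(xs, ys):
    """Return intercept a and slope b of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b

def leverages(xs):
    """Diagonal entries h_ii of the hat matrix for a line with intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1.0 / n + (x - x_bar) ** 2 / sxx for x in xs]

def loo_cv_shortcut(xs, ys):
    """Leave-one-out mean squared error computed from a single fit."""
    a, b = fit_line(xs, ys)
    h = leverages(xs)
    return sum(((y - (a + b * x)) / (1.0 - hi)) ** 2
               for x, y, hi in zip(xs, ys, h)) / len(xs)

def loo_cv_brute_force(xs, ys):
    """Leave-one-out mean squared error computed by refitting n times."""
    total = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total / len(xs)

def gcv(xs, ys):
    """Generalized CV: each h_ii is replaced by the average trace(H)/n."""
    a, b = fit_line(xs, ys)
    n = len(xs)
    mean_h = sum(leverages(xs)) / n
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return rss / n / (1.0 - mean_h) ** 2

# Made-up illustrative data.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 1.2, 1.9, 3.2, 3.8, 5.1]
```

For this stand-in model, `loo_cv_shortcut(xs, ys)` and `loo_cv_brute_force(xs, ys)` agree up to rounding, while `gcv(xs, ys)` needs only the trace of the hat matrix rather than every diagonal entry.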

Applicability study on urban flooding risk criteria estimation algorithm using cross-validation and SVM (교차검증과 SVM을 이용한 도시침수 위험기준 추정 알고리즘 적용성 검토)

  • Lee, Hanseung; Cho, Jaewoong; Kang, Hoseon; Hwang, Jeonggeun
    • Journal of Korea Water Resources Association / v.52 no.12 / pp.963-973 / 2019
  • This study reviews an urban flooding risk criteria estimation model that predicts risk criteria in areas where flood risk criteria have not been precalculated, using watershed characteristic data and limit rainfall based on damage history. The risk criteria estimation model was designed using a Support Vector Machine (SVM), a machine learning algorithm. The learning data consisted of regional limit rainfall and watershed characteristics, and were normalized before being applied to the SVM algorithm. We calculated the mean absolute error and standard deviation using Leave-One-Out and K-fold cross-validation and evaluated the performance of the model. In Leave-One-Out, models with a small standard deviation were selected as the optimal model; in K-fold, models with fewer folds were selected. The average accuracy of the selected models by rainfall duration is over 80%, suggesting that SVM can be used to estimate flooding risk criteria.
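The evaluation described in the abstract above (mean absolute error and its standard deviation under Leave-One-Out and K-fold cross-validation) can be sketched as follows. This is an illustrative sketch under stated assumptions: a one-nearest-neighbour predictor stands in for the SVM, folds are contiguous and unshuffled, and all names and data are hypothetical.

```python
from statistics import mean, pstdev

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def nearest_neighbor_predict(train_x, train_y, x):
    """Placeholder model (NOT an SVM): target of the closest training input."""
    j = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[j]

def cross_validate(xs, ys, k):
    """Return (mean, std) of the per-fold mean absolute error.

    Setting k == len(xs) gives leave-one-out cross-validation.
    """
    fold_mae = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        errors = [abs(ys[i] - nearest_neighbor_predict(train_x, train_y, xs[i]))
                  for i in fold]
        fold_mae.append(mean(errors))
    return mean(fold_mae), pstdev(fold_mae)

# Made-up illustrative data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]

loo_mean, loo_std = cross_validate(xs, ys, k=len(xs))  # leave-one-out
kf_mean, kf_std = cross_validate(xs, ys, k=3)          # 3-fold
```

Comparing the `(mean, std)` pairs across candidate models is one way to apply a selection rule like the one in the abstract, which favours a small standard deviation.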

Prediction of retention of uncharged solutes in nanofiltration by means of molecular descriptors

  • Nowaczyk, Alicja; Nowaczyk, Jacek; Koter, Stanislaw
    • Membrane and Water Treatment / v.1 no.3 / pp.181-192 / 2010
  • A linear quantitative structure-property relationship (QSPR) model is presented for the prediction of rejection in permeation through a membrane. The model was produced by applying the multiple linear regression (MLR) technique to a database consisting of retention data for 25 pesticides in 4 different membrane separation experiments. Of the 3224 physicochemical, topological and structural descriptors considered as inputs to the model, only 50 were selected using several elimination criteria. The physical meaning of the chosen descriptors is discussed in detail. The accuracy of the proposed MLR models is illustrated using the following evaluation techniques: leave-one-out cross validation, leave-many-out cross validation and Y-randomization.

Docking, CoMFA and CoMSIA Studies of a Series of N-Benzoylated Phenoxazines and Phenothiazines Derivatives as Antiproliferative Agents

  • Ghasemi, Jahan B.; Aghaee, Elham; Jabbari, Ali
    • Bulletin of the Korean Chemical Society / v.34 no.3 / pp.899-906 / 2013
  • Using conformations generated by docking analysis with the Gold algorithm, 3D-QSAR models (CoMFA and CoMSIA) were built for 39 N-benzoylated phenoxazines and phenothiazines, including their S-oxidized analogues. These molecules inhibit the polymerization of tubulin into microtubules and have therefore been studied for the development of antitumor drugs. A training set of 30 docked conformations gives leave-one-out (LOO) $q^2$ values of 0.756 and 0.617 and $r^2_{ncv}$ values of 0.988 and 0.956 for the CoMFA and CoMSIA models, respectively. The predictive ability and robustness of the models were evaluated by a test set, cross validation (leave-one-out and leave-ten-out), bootstrapping, and progressive scrambling. The all-orientation search (AOS) was used to find the best orientation and so minimize the effect of the initial orientation of the structures. The docking results confirmed the CoMFA and CoMSIA contour maps. The docking and 3D-QSAR studies were thoroughly interpreted and discussed, and they confirmed the experimental $pIC_{50}$ values.
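The leave-one-out $q^2$ statistic quoted in the abstract above is commonly defined as one minus the ratio of the predictive residual sum of squares (PRESS) to the total sum of squares of the observed activities. The sketch below assumes that common definition; the abstract does not state which variant the authors used, and the function name and data are hypothetical.

```python
def q2(observed, loo_predicted):
    """Cross-validated q^2 = 1 - PRESS / SS_total.

    observed      : measured activities (e.g. pIC50 values)
    loo_predicted : prediction for each compound from a model that was
                    fitted without that compound
    Assumption: SS_total uses the mean of all observed values.
    """
    mean_obs = sum(observed) / len(observed)
    press = sum((y - p) ** 2 for y, p in zip(observed, loo_predicted))
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - press / ss_total

# Made-up illustrative values, not data from the paper.
perfect = q2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])    # predictions match exactly
mean_only = q2([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])  # always predict the mean
```

Applying the same formula to fitted values from the full model, rather than leave-one-out predictions, gives the non-cross-validated $r^2$.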

Feasibility study of deep learning based radiosensitivity prediction model of National Cancer Institute-60 cell lines using gene expression

  • Kim, Euidam; Chung, Yoonsun
    • Nuclear Engineering and Technology / v.54 no.4 / pp.1439-1448 / 2022
  • Background: We investigated the feasibility of in vitro radiosensitivity prediction from gene expression using deep learning. Methods: Microarray gene expression data for the National Cancer Institute-60 (NCI-60) panel were acquired from the Gene Expression Omnibus. The clonogenic surviving fractions at an absorbed dose of 2 Gy (SF2) from previous publications were used to measure in vitro radiosensitivity. The radiosensitivity prediction model was based on a convolutional neural network. 6-fold cross-validation (CV) was applied to train and validate the model. Then, leave-one-out cross-validation (LOOCV) was applied, using the samples with large errors as a validation set, to determine whether the error came from the high bias of the folded CV. A prediction was defined as correct if its absolute error was < 0.01 or its relative error was < 10%. Results: Of the 174 triplicated samples of NCI-60, 171 samples were correctly predicted with the folded CV. Through an additional LOOCV, one more sample was correctly predicted, representing a prediction accuracy of 98.85% (172 out of 174 samples). The average relative and absolute errors of the 172 correctly predicted samples were 1.351±1.875% and 0.00596±0.00638, respectively. Conclusion: We demonstrated the feasibility of deep learning-based in vitro radiosensitivity prediction using gene expression.
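The correctness criterion and the reported accuracy in the abstract above can be checked with a short sketch. It assumes that the relative error is taken with respect to the measured SF2 value, which the abstract does not spell out; the function names and sample values are hypothetical.

```python
def is_correct(predicted, measured, abs_tol=0.01, rel_tol=0.10):
    """Apply the abstract's rule: absolute error < 0.01 OR relative error < 10%.

    Assumption: relative error is |predicted - measured| / |measured|,
    so the measured value must be nonzero.
    """
    abs_err = abs(predicted - measured)
    rel_err = abs_err / abs(measured)
    return abs_err < abs_tol or rel_err < rel_tol

def accuracy_percent(n_correct, n_total):
    """Share of correctly predicted samples, in percent."""
    return 100.0 * n_correct / n_total

# Counts taken from the abstract: 172 of 174 samples predicted correctly.
reported = round(accuracy_percent(172, 174), 2)
```

Under these assumptions, a prediction of 0.84 for a measured SF2 of 0.80 counts as correct through the relative criterion even though its absolute error exceeds 0.01.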

Spatial Prediction of Wind Speed Data (풍속 자료의 공간예측)

  • Jeong, Seung-Hwan; Park, Man-Sik; Kim, Kee-Whan
    • The Korean Journal of Applied Statistics / v.23 no.2 / pp.345-356 / 2010
  • In this paper, we introduce a linear regression model that takes a parametric spatial association structure into account and apply it to five-year averaged wind speed data measured at 460 meteorological monitoring stations in South Korea. The prediction map obtained from the model with spatial association parameters shows that inland areas have lower wind speeds than coastal regions. When the spatial linear regression model is compared with the classical one using one-leave-out cross-validation, the former outperforms the latter in terms of the similarity between the observations and the corresponding predictions and the coverage rate of the 95% prediction intervals.

Estimation of Flood Quantile in Ungauged Watersheds for Flood Damage Analysis Based on Flood Index of Natural Flow (미계측 유역의 홍수피해분석을 위한 자연유량의 홍수지표 기반 확률홍수량 산정)

  • Chae, Byung Seok; Choi, Si Jung; Ahn, Jae Hyun; Kim, Tae-Woong
    • KSCE Journal of Civil and Environmental Engineering Research / v.38 no.1 / pp.175-182 / 2018
  • In this study, flood quantiles were estimated at ungauged watersheds by adjusting the flood quantiles from the design rainfall-runoff analysis (DRRA) method based on regional frequency analysis. A comparison of flood frequency analysis (FFA) and DRRA showed that the DRRA method overestimated the flood quantiles by 52%. In addition, a practical method was suggested for constructing a flood index from natural flows so that regional frequency analysis (RFA) can be applied to ungauged watersheds. Considering the relationships among DRRA, FFA, and RFA, we derived an adjusting formula that can be applied to estimate flood quantiles at ungauged watersheds. We also employed a Leave-One-Out Cross-Validation scheme and a skill score to verify the proposed method. As a result, the proposed model increased the accuracy by 23.2% compared to the existing DRRA method.

Searching for Optimal Ensemble of Feature-classifier Pairs in Gene Expression Profile using Genetic Algorithm (유전알고리즘을 이용한 유전자발현 데이타상의 특징-분류기쌍 최적 앙상블 탐색)

  • 박찬호; 조성배
    • Journal of KIISE:Software and Applications / v.31 no.4 / pp.525-536 / 2004
  • A gene expression profile is numerical data on the gene expression levels of an organism, measured on a microarray. Generally, each specific tissue shows different expression levels in related genes, so disease can be classified from the gene expression profile. Because not all genes are related to disease, it is necessary to select the related genes, which is called feature selection, and to classify the selected genes properly. This paper proposes a GA-based method for searching for the optimal ensemble of feature-classifier pairs, composed from seven feature selection methods based on correlation, similarity, and information theory, and six representative classifiers. In experiments with leave-one-out cross validation on two cancer-related gene expression profiles, we find ensembles that are far superior to all individual feature-classifier pairs for the Lymphoma and Colon datasets.

Spatial merging of satellite based soil moisture and in-situ soil moisture using conditional merging technique (조건부 합성방법을 이용한 위성관측 토양수분과 지상관측 토양수분의 합성)

  • Lee, Jaehyeon; Choi, Minha; Kim, Dongkyun
    • Journal of Korea Water Resources Association / v.49 no.3 / pp.263-273 / 2016
  • This study applied the conditional merging (CM) spatial interpolation technique to obtain composite satellite and in-situ soil moisture data. For the analysis, hourly in-situ data sets from 24 gauges of the Rural Development Administration (RDA) of Korea and satellite soil moisture data retrieved from the Advanced Microwave Scanning Radiometer-Earth observing system (AMSR-E) were used. Leave-one-out cross validation was used to verify the performance of the CM method. The cross validation result was spatially interpolated to examine the spatial correlation of the CM method. The results of this study are as follows: (1) The CM method produced a better soil moisture map over the Korean Peninsula than AMSR-E did on more than 100 of the 113 days considered in the analysis. (2) The performance of the CM method was highly correlated with gauge density and was better on the western side of the Korean Peninsula, where the spatial gauge density is high. (3) Unlike the AMSR-E data, the performance of CM is not affected by the non-rainy season. Overall, the results indicate that the CM method can be applied to predict soil moisture at ungauged locations.

Multimodal Parametric Fusion for Emotion Recognition

  • Kim, Jonghwa
    • International journal of advanced smart convergence / v.9 no.1 / pp.193-201 / 2020
  • The main objective of this study is to investigate the impact of additional modalities on the performance of emotion recognition using speech, facial expression and physiological measurements. To compare different approaches, we designed a feature-based recognition system as a benchmark, which carries out linear supervised classification evaluated by leave-one-out cross-validation. For the classification of four emotions, bimodal fusion in our experiment improved recognition accuracy over the unimodal approach, while the performance of trimodal fusion varied strongly depending on the individual. Furthermore, we observed extremely high disparity between single-class recognition rates, and no single modality performed best in our experiment. Based on these observations, we developed a novel fusion method, called parametric decision fusion (PDF), which builds emotion-specific classifiers and exploits the advantages of a parametrized decision process. Using the PDF scheme, we achieved a 16% improvement in accuracy for subject-dependent recognition and 10% for subject-independent recognition compared to the best unimodal results.