• Title/Summary/Keyword: leave-one-out

On Rice Estimator in Simple Regression Models with Outliers (이상치가 존재하는 단순회귀모형에서 Rice 추정량에 관해서)

  • Park, Chun Gun
    • The Korean Journal of Applied Statistics
    • /
    • v.26 no.3
    • /
    • pp.511-520
    • /
    • 2013
  • Outlier detection and robust estimation are crucial in regression models with outliers. Studies in this area typically focus on detecting outliers and estimating the coefficients using the leave-one-out method. Our study introduces the Rice estimator, an error variance estimator that does not require estimating the coefficients. In particular, we compare the statistical properties of the Rice estimator with and without outliers in simple regression models.

First Order Difference-Based Error Variance Estimator in Nonparametric Regression with a Single Outlier

  • Park, Chun-Gun
    • Communications for Statistical Applications and Methods
    • /
    • v.19 no.3
    • /
    • pp.333-344
    • /
    • 2012
  • We consider some statistical properties of the first order difference-based error variance estimator in nonparametric regression models with a single outlier. So far, such difference-based estimators have rarely been discussed in the presence of outliers. We propose a first order difference-based estimator that uses the leave-one-out method to detect a single outlier, and we simulate the outlier detection in a nonparametric regression model with a single outlier. The outlier detection works well in these simulations. The results are also promising in nonparametric regression models with many outliers when some difference-based estimators are used.
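As a rough illustration of the idea in this abstract, the sketch below computes the first order difference-based (Rice) variance estimate, $\hat\sigma^2 = \frac{1}{2(n-1)}\sum_{i=2}^{n}(y_i - y_{i-1})^2$, and recomputes it with each observation deleted in turn. The detection rule (flag the observation whose deletion lowers the estimate the most), the function names, and the simulated data are assumptions made for this example; they are not necessarily the procedure used in the paper.

```python
import random


def rice_variance(y):
    """First order difference-based (Rice) estimate of the error variance.

    Assumes y is a list ordered by its design points.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    return sum((y[i] - y[i - 1]) ** 2 for i in range(1, n)) / (2 * (n - 1))


def leave_one_out_estimates(y):
    """Rice estimate recomputed with each observation deleted in turn."""
    return [rice_variance(y[:i] + y[i + 1:]) for i in range(len(y))]


def flag_outlier(y):
    """Hypothetical rule: index whose deletion gives the smallest estimate."""
    loo = leave_one_out_estimates(y)
    return min(range(len(y)), key=loo.__getitem__)


# Simulated data (an assumption): smooth trend plus small noise, one outlier.
random.seed(0)
n = 50
x = [i / (n - 1) for i in range(n)]
y = [2.0 * xi + random.gauss(0.0, 0.1) for xi in x]
y[20] += 5.0  # inject a single outlier

full_estimate = rice_variance(y)
flagged = flag_outlier(y)
clean_estimate = rice_variance(y[:flagged] + y[flagged + 1:])
print(flagged, full_estimate, clean_estimate)
```

Deleting the outlier removes both large adjacent differences at once, whereas deleting any other point leaves at least the two large differences around the outlier in place, which is why this simple rule can isolate a single outlier.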

Applicability study on urban flooding risk criteria estimation algorithm using cross-validation and SVM (교차검증과 SVM을 이용한 도시침수 위험기준 추정 알고리즘 적용성 검토)

  • Lee, Hanseung;Cho, Jaewoong;Kang, Hoseon;Hwang, Jeonggeun
    • Journal of Korea Water Resources Association
    • /
    • v.52 no.12
    • /
    • pp.963-973
    • /
    • 2019
  • This study reviews an urban flooding risk criteria estimation model that uses watershed characteristic data and limit rainfall based on damage history to predict risk criteria in areas where flood risk criteria have not been precalculated. The risk criteria estimation model was designed using the Support Vector Machine (SVM), a machine learning algorithm. The learning data consisted of regional limit rainfall and watershed characteristics, and were normalized before being applied to the SVM algorithm. We calculated the mean absolute error and standard deviation using Leave-One-Out and K-fold cross-validation and evaluated the performance of the model. In Leave-One-Out, models with a small standard deviation were selected as the optimal models, and in K-fold, models with fewer folds were selected. The average accuracy of the selected models by rainfall duration is over 80%, suggesting that SVM can be used to estimate flooding risk criteria.

LS-SVM for large data sets

  • Park, Hongrak;Hwang, Hyungtae;Kim, Byungju
    • Journal of the Korean Data and Information Science Society
    • /
    • v.27 no.2
    • /
    • pp.549-557
    • /
    • 2016
  • In this paper we propose a multiclassification method for large data sets that ensembles least squares support vector machines (LS-SVM) using principal components instead of raw input vectors. We use the revised one-vs-all method for multiclassification, a voting scheme based on combining several binary classifications. The revised one-vs-all method uses the hat matrix of the LS-SVM ensemble, which is obtained by ensembling LS-SVMs each trained on a random sample from the whole large training data set. The leave-one-out cross validation (CV) function is used to choose the optimal values of the hyper-parameters that affect the performance of the multiclass LS-SVM ensemble. We present the generalized cross validation function to reduce the computational burden of the leave-one-out CV function. Experimental results from real data sets are then obtained to illustrate the performance of the proposed multiclass LS-SVM ensemble.
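The role of the hat matrix in leave-one-out CV can be sketched on a much simpler model than the one in the paper. The example below swaps in ordinary simple linear regression (not an LS-SVM ensemble) and uses the standard identity for least-squares fits that are linear in the responses: the leave-one-out residual equals $e_i / (1 - h_{ii})$, where $h_{ii}$ is the $i$-th diagonal element of the hat matrix. Generalized cross validation replaces each $h_{ii}$ with the average $\mathrm{tr}(H)/n$. All function names and the toy data are assumptions made for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope


def loo_cv_shortcut(x, y):
    """Leave-one-out CV score from a single fit, using hat-matrix diagonals."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    a, b = fit_line(x, y)
    total = 0.0
    for xi, yi in zip(x, y):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx  # hat-matrix diagonal h_ii
        total += ((yi - (a + b * xi)) / (1.0 - h)) ** 2
    return total / n


def loo_cv_brute(x, y):
    """Leave-one-out CV score by refitting n times (for comparison)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (a + b * x[i])) ** 2
    return total / n


def gcv(x, y):
    """Generalized CV: every h_ii is replaced by tr(H)/n."""
    n = len(x)
    a, b = fit_line(x, y)
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    trace_h = 2.0  # two fitted parameters in simple linear regression
    return (rss / n) / (1.0 - trace_h / n) ** 2


# Toy data (an assumption for this sketch).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.1, 0.9, 2.2, 2.8, 4.1, 5.3]
print(loo_cv_shortcut(x, y), loo_cv_brute(x, y), gcv(x, y))
```

The shortcut needs one fit instead of $n$ refits, and GCV goes a step further by needing only the trace of the hat matrix rather than its individual diagonal elements, which is the kind of saving the abstract refers to.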

Docking, CoMFA and CoMSIA Studies of a Series of N-Benzoylated Phenoxazines and Phenothiazines Derivatives as Antiproliferative Agents

  • Ghasemi, Jahan B.;Aghaee, Elham;Jabbari, Ali
    • Bulletin of the Korean Chemical Society
    • /
    • v.34 no.3
    • /
    • pp.899-906
    • /
    • 2013
  • Using conformations generated from docking analysis with the Gold algorithm, 3D-QSAR models (CoMFA and CoMSIA) were built for 39 N-benzoylated phenoxazines and phenothiazines, including their S-oxidized analogues. These molecules inhibit the polymerization of tubulin into microtubules and have therefore been studied for the development of antitumor drugs. With a training set of 30 docked conformations, the CoMFA and CoMSIA models give leave-one-out (LOO) $q^2$ values of 0.756 and 0.617, and non-cross-validated $r^2_{ncv}$ values of 0.988 and 0.956, respectively. The predictive ability and robustness of the models were evaluated by a test set, cross validation (leave-one-out and leave-ten-out), bootstrapping, and progressive scrambling. The all-orientation search (AOS) was used to find the best orientation and minimize the effect of the initial orientation of the structures. The docking results confirmed the CoMFA and CoMSIA contour maps. The docking and 3D-QSAR studies were thoroughly interpreted and discussed and confirmed the experimental $pIC_{50}$ values.
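For readers unfamiliar with the notation, the leave-one-out $q^2$ is conventionally defined as below. The abstract does not state the formula, so this is the usual QSAR convention rather than a quotation from the paper; $\hat{y}_{(i)}$ denotes the prediction for compound $i$ from a model fitted without it, and $\bar{y}$ is the mean activity of the training set.

$$
q^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_{(i)} \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
$$

The non-cross-validated $r^2$ has the same form with $\hat{y}_{(i)}$ replaced by the fitted value $\hat{y}_i$ from the model trained on all $n$ compounds, which is why it is typically larger than $q^2$.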

Classification for intraclass correlation pattern by principal component analysis

  • Chung, Hie-Choon;Han, Chien-Pai
    • Journal of the Korean Data and Information Science Society
    • /
    • v.21 no.3
    • /
    • pp.589-595
    • /
    • 2010
  • In discriminant analysis, we consider an intraclass correlation pattern by principal component analysis. We assume that the two populations are equally likely and the costs of misclassification are equal. In this situation, we consider two procedures, the test procedure and the proportion procedure, for selecting the principal components in classification. We compare the regular classification method with the two proposed procedures. We consider two methods for estimating the error rate: the leave-one-out method and the bootstrap method.

Prediction of retention of uncharged solutes in nanofiltration by means of molecular descriptors

  • Nowaczyk, Alicja;Nowaczyk, Jacek;Koter, Stanislaw
    • Membrane and Water Treatment
    • /
    • v.1 no.3
    • /
    • pp.181-192
    • /
    • 2010
  • A linear quantitative structure-property relationship (QSPR) model is presented for the prediction of rejection in permeation through a membrane. The model was produced by applying the multiple linear regression (MLR) technique to a database consisting of retention data for 25 pesticides in 4 different membrane separation experiments. Of the 3224 physicochemical, topological and structural descriptors considered as inputs to the model, only 50 were selected using several elimination criteria. The physical meaning of the chosen descriptors is discussed in detail. The accuracy of the proposed MLR models is illustrated using the following evaluation techniques: leave-one-out cross validation, leave-many-out cross validation, and Y-randomization.

Data-Adaptive ECOC for Multicategory Classification

  • Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society
    • /
    • v.19 no.1
    • /
    • pp.25-36
    • /
    • 2008
  • Error Correcting Output Codes (ECOC) can improve generalization performance when applied to multicategory classification problems. In this study we propose a new criterion for selecting the hyperparameters in the ECOC scheme. Instead of the margins of the data, we propose to use the probability of misclassification error, since it makes the criterion simple. Using this, we obtain an upper bound on the leave-one-out error of the OVA (one-vs-all) method. Our experiments on real and synthetic data indicate that the bound leads to good estimates of the parameters.

Web-based Image Retrieval and Classification System using Sketch Query (스케치 질의를 통한 웹기반 영상 검색과 분류 시스템)

  • 이상봉;고병철;변혜란
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.7_8
    • /
    • pp.703-712
    • /
    • 2003
  • With the explosive growth in the number and size of imaging technologies, Content-Based Image Retrieval (CBIR) has attracted the interest of researchers in the fields of digital libraries, image processing, and database systems. In general, with query-by-image, the user has to select an image from the database as the query, even if it is not exactly the one desired. In contrast, since the query-by-sketch approach lets the user draw a query shape as desired, it can provide a higher-level search interface than query-by-image. As a result, query-by-sketch has been widely used. In this paper, we propose a Java-based image retrieval system that consists of sketch query and image classification. We use two features, the color histogram and Haar wavelet coefficients, to search for similar images. The Leave-One-Out method is then used to classify the database images. The classification categories are photo & painting, city & nature, and sub-classification of nature images. By using the sketch query and image classification, we can offer a convenient image retrieval interface to the user and also reduce the search time.

Spatial Prediction of Wind Speed Data (풍속 자료의 공간예측)

  • Jeong, Seung-Hwan;Park, Man-Sik;Kim, Kee-Whan
    • The Korean Journal of Applied Statistics
    • /
    • v.23 no.2
    • /
    • pp.345-356
    • /
    • 2010
  • In this paper, we introduce a linear regression model that takes the parametric spatial association structure into account and apply it to five-year averaged wind speed data measured at 460 meteorological monitoring stations in South Korea. From the prediction map obtained by the model with spatial association parameters, we can see that inland areas have lower wind speeds than coastal regions. When the spatial linear regression model is compared with the classical one using leave-one-out cross-validation, the former outperforms the latter in terms of the similarity between the observations and the corresponding predictions and the coverage rate of the 95% prediction intervals.