• Title/Summary/Keyword: Statistical Model Validation

Search results: 261 items (processing time: 0.025 s)

Prediction of the compressive strength of fly ash geopolymer concrete using gene expression programming

  • Alkroosh, Iyad S.;Sarker, Prabir K.
    • Computers and Concrete / Vol. 24, No. 4 / pp. 295-302 / 2019
  • Evolutionary algorithms based on conventional statistical methods such as regression and classification have been widely used in data mining applications. This work applies gene expression programming (GEP) to predict the compressive strength of fly ash geopolymer concrete, which is gaining increasing interest as an environmentally friendly alternative to Portland cement concrete. Based on 56 test results from the existing literature, a model was obtained relating the compressive strength of fly ash geopolymer concrete to the mix design parameters that significantly influence it. The predictions of the model in training and validation were evaluated. The coefficient of determination ($R^2$), mean ($\mu$) and standard deviation ($\sigma$) were 0.89, 1.0 and 0.12, respectively, for the training set, and 0.89, 0.99 and 0.13, respectively, for the validation set. The prediction error of the model was also evaluated and found to be very low. This indicates that the predictions of the GEP model are in close agreement with the experimental results, suggesting that GEP is a promising method for predicting the compressive strength of fly ash geopolymer concrete.

Some Issues on Criterion for Kolmogorov-Smirnov Test in Credit Rating Model Validation

  • 박용석;홍종선
    • Communications for Statistical Applications and Methods / Vol. 15, No. 6 / pp. 1013-1026 / 2008
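  • The Kolmogorov-Smirnov (K-S) statistic is widely used as a goodness-of-fit test of the discriminatory power of credit rating models. The criterion generally applied when judging a model's discriminatory power with the K-S statistic is a level of $0.3$ to $0.4$, which is much larger than the sample-size-dependent critical value of the K-S test statistic. In this paper, we examine the validity of this common criterion through simulation. The simulation shows that K-S statistics computed from the results of most credit rating models developed in Korea exceed the criterion currently applied. Consequently, almost any credit rating model could be interpreted as having good discriminatory power. We propose alternative critical values that depend on the sample size, the default rate, and the Type II error rate.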
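
As a rough illustration of the statistic named in the abstract above: for a scorecard, the two-sample K-S statistic is the largest gap between the empirical score distributions of good and bad accounts. The sketch below is not the authors' simulation; the function name `ks_statistic` and the toy scores are assumptions made only for this example.

```python
from bisect import bisect_right

def ks_statistic(good_scores, bad_scores):
    """Two-sample K-S statistic: max |F_good(s) - F_bad(s)| over observed scores."""
    good = sorted(good_scores)
    bad = sorted(bad_scores)
    max_gap = 0.0
    # The empirical CDFs only change at observed scores, so checking those suffices.
    for s in set(good) | set(bad):
        f_good = bisect_right(good, s) / len(good)  # share of good accounts with score <= s
        f_bad = bisect_right(bad, s) / len(bad)     # share of bad accounts with score <= s
        max_gap = max(max_gap, abs(f_good - f_bad))
    return max_gap

# Hypothetical toy scores (higher = more creditworthy); not data from the paper.
good = [620, 640, 660, 700, 720, 750]
bad = [540, 580, 600, 630, 650]
ks = ks_statistic(good, bad)
print(f"K-S statistic: {ks:.3f}")
```

The resulting value can then be compared with whichever threshold is in use, whether the conventional 0.3 to 0.4 level or a critical value that depends on sample size and default rate.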

Semiparametric Regression Splines in Matched Case-Control Studies

  • Kim, In-Young;Carroll, Raymond J.;Cohen, Noah
    • The Korean Statistical Society: Conference Proceedings / Proceedings of the Korean Statistical Society 2003 Spring Conference / pp. 167-170 / 2003
  • We develop semiparametric methods for matched case-control studies using regression splines. Three methods are developed: an approximate cross-validation scheme to estimate the smoothing parameter inherent in regression splines, as well as Monte Carlo Expectation Maximization (MCEM) and Bayesian methods to fit the regression spline model. We compare the approximate cross-validation, MCEM and Bayesian approaches using simulation, showing that they appear approximately equally efficient, with the approximate cross-validation method being computationally the most convenient. An example from equine epidemiology that motivated the work is used to demonstrate our approaches.

Risk Prediction Using Genome-Wide Association Studies on Type 2 Diabetes

  • Choi, Sungkyoung;Bae, Sunghwan;Park, Taesung
    • Genomics & Informatics / Vol. 14, No. 4 / pp. 138-148 / 2016
  • The success of genome-wide association studies (GWASs) has enabled us to improve risk assessment and provide novel genetic variants for diagnosis, prevention, and treatment. However, most variants discovered by GWASs have been reported to have very small effect sizes on complex human diseases, which has been a big hurdle in building risk prediction models. Recently, many statistical approaches based on penalized regression have been developed to solve the "large p and small n" problem. In this report, we evaluated the performance of several statistical methods for predicting a binary trait: stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN). We first built a prediction model by combining variable selection and prediction methods for type 2 diabetes using Affymetrix Genome-Wide Human SNP Array 5.0 from the Korean Association Resource project. We assessed the risk prediction performance using area under the receiver operating characteristic curve (AUC) for the internal and external validation datasets. In the internal validation, SLR-LASSO and SLR-EN tended to yield more accurate predictions than other combinations. During the external validation, the SLR-SLR and SLR-EN combinations achieved the highest AUC of 0.726. We propose these combinations as a potentially powerful risk prediction model for type 2 diabetes.
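
The AUC used above to compare prediction models can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The following is a minimal sketch of that pairwise definition, not the authors' pipeline; the function name `auc` and the toy labels and scores are assumptions for illustration.

```python
def auc(labels, scores):
    """AUC as P(score of a random case > score of a random control); ties count 1/2."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Hypothetical predicted risks for 3 cases (label 1) and 3 controls (label 0).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_auc = auc(labels, scores)
print(f"AUC: {toy_auc:.3f}")
```

The pairwise loop is quadratic in sample size; it is written this way for readability, and a rank-based computation would be preferable for large validation sets.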

A Study on Deriving the Statistical Weight Estimation Formula for an Aircraft Wing

  • 김석범;정한규;황호연
    • Journal of the Korean Society for Aeronautical and Space Sciences / Vol. 46, No. 1 / pp. 32-40 / 2018
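  • This paper studies a method for deriving statistical weight estimation formulas, which are mainly used in the conceptual design stage. The method was implemented as a program in Microsoft Excel and verified by applying it to jet airliners. A database was built with reference to the variables of existing weight estimation formulas, and it was used to model a weight estimation formula for the wing of a jet airliner. To address the overfitting problem, the model was evaluated using K-fold cross validation.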
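
The K-fold cross validation mentioned in the abstract above can be sketched as follows. This is an illustrative outline only: the paper's implementation is in Microsoft Excel and its regression variables are not given here, so the one-variable least-squares fit, the function names `fit_line` and `k_fold_rmse`, and the synthetic data are all assumptions.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def k_fold_rmse(xs, ys, k=5, seed=0):
    """Average held-out RMSE over k folds (each point is held out exactly once)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint index groups
    rmses = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(ys[i] - (a + b * xs[i])) ** 2 for i in held_out]
        rmses.append((sum(sq_err) / len(sq_err)) ** 0.5)
    return sum(rmses) / k

# Synthetic, exactly linear data (not aircraft data): held-out error should be ~0.
xs = [float(i) for i in range(20)]
ys = [2.0 + 3.0 * x for x in xs]
cv_rmse = k_fold_rmse(xs, ys, k=5)
print(f"5-fold CV RMSE: {cv_rmse:.2e}")
```

Because every fold is scored on data the fit never saw, a formula that only memorizes the database shows a held-out error noticeably larger than its training error, which is the overfitting signal the evaluation is meant to expose.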

Kernel-Trick Regression and Classification

  • Huh, Myung-Hoe
    • Communications for Statistical Applications and Methods / Vol. 22, No. 2 / pp. 201-207 / 2015
  • The support vector machine (SVM) is a well-known kernel-trick supervised learning tool. This study proposes a working scheme for kernel-trick regression and classification (KtRC) as an SVM alternative. KtRC fits the model on a number of random subsamples and selects the best model. Empirical examples and a simulation study indicate that KtRC's performance is comparable to that of SVM.

Estimation of Future Daily Wind Speed over South Korea Using the CGCM3 Model

  • 함희정
    • 산업기술연구 (Journal of Industrial Technology) / Vol. 33, No. A / pp. 41-48 / 2013
  • A statistical downscaling methodology has been developed to investigate future daily wind speeds over South Korea. The methodology includes calibration of the statistical downscaling model using large-scale atmospheric variables from NCEP/NCAR reanalysis data, validation of the model for the calibration period, and estimation of future wind speed based on the general circulation model (GCM) outputs of scenario A1B of the CGCM3. Under scenario A1B of the CGCM3 model, the potential impacts of climate change on daily surface wind speed in South Korea are relatively small (+/- 1 m/s).

A Study on Validation of Accelerated Model for Pneumatic Cylinder

  • 강보식;김형의;장무성;송창섭
    • Transactions of the Korean Society of Mechanical Engineers A / Vol. 33, No. 10 / pp. 1139-1143 / 2009
  • Pneumatic cylinders are widely used as key components in various industrial fields, such as automated production lines. Recently, attention has turned to reducing the development period and cost of pneumatic cylinders, so the demand for research on accelerated life testing of pneumatic cylinders has grown. In this research, after completing an accelerated life test of a pneumatic cylinder, we evaluate the validity of the acceleration model by statistically comparing the model's predicted values with life data acquired under real operating conditions. In addition, we develop new acceleration model equations to predict the life of a pneumatic cylinder under operating conditions.

Semi-supervised learning using similarity and dissimilarity

  • Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society / Vol. 22, No. 1 / pp. 99-105 / 2011
  • We propose a semi-supervised learning algorithm based on a form of regularization that incorporates similarity and dissimilarity penalty terms. Our approach uses a graph-based encoding of similarity and dissimilarity. We also present a model-selection method that uses cross-validation to choose the hyperparameters affecting the performance of the proposed method. Simulations using two types of data sets demonstrate that the proposed method is promising.

Robust varying coefficient model using L1 regularization

  • Hwang, Changha;Bae, Jongsik;Shim, Jooyong
    • Journal of the Korean Data and Information Science Society / Vol. 27, No. 4 / pp. 1059-1066 / 2016
  • In this paper, we propose a robust version of varying coefficient models based on regularized regression with L1 regularization. We use the iteratively reweighted least squares procedure to solve the L1-regularized objective function of the varying coefficient model in locally weighted regression form. It provides efficient computation of the coefficient function estimates and variable selection for a given value of the smoothing variable. We present a generalized cross validation function and an Akaike information type criterion for model selection. Applications of the proposed model are illustrated through artificial examples and a real example of predicting the effect of the input variables and the smoothing variable on the output.