• Title/Summary/Keyword: Statistical Model Validation

Development of a High-Resolution Near-Surface Air Temperature Downscale Model (고해상도 지상 기온 상세화 모델 개발)

  • Lee, Doo-Il;Lee, Sang-Hyun;Jeong, Hyeong-Se;Kim, Yeon-Hee
    • Atmosphere / v.31 no.5 / pp.473-488 / 2021
  • A new physical/statistical diagnostic downscale model has been developed to improve near-surface air temperature forecasts. The model includes a series of physical and statistical correction methods that account for unresolved topographic and land-use effects as well as statistical bias errors in a low-resolution atmospheric model. Operational temperature forecasts of the Local Data Assimilation and Prediction System (LDAPS) were downscaled to 100 m resolution for three months; the downscaled forecasts were used to validate the model's physical and statistical correction methods and to compare its performance with forecasts of the Korea Meteorological Administration Post-processing (KMAP) system. The validation results showed positive impacts of the corrections for unresolved topographic and urban effects (topographic height correction, valley cold air pool effect, mountain internal boundary layer formation effect, urban land-use effect) in complex terrain areas. In addition, the statistical bias correction of the LDAPS model was effective in reducing forecast errors of near-surface temperatures. The new high-resolution downscale model showed better agreement with observations from 584 Korean meteorological monitoring stations than KMAP did, supporting the importance of the new physical and statistical correction methods. The new physical/statistical diagnostic downscale model can be a useful tool for improving near-surface temperature forecasts and diagnostics over complex terrain areas.

Fault Prediction Using Statistical and Machine Learning Methods for Improving Software Quality

  • Malhotra, Ruchika;Jain, Ankita
    • Journal of Information Processing Systems / v.8 no.2 / pp.241-262 / 2012
  • An understanding of quality attributes is relevant for a software organization that aims to deliver high software reliability. An empirical assessment of metrics to predict the quality attributes is essential in order to gain insight into the quality of software in the early phases of software development and to ensure corrective actions. In this paper, we build models to estimate fault proneness using Object Oriented CK metrics and QMOOD metrics. We apply one statistical method and six machine learning methods to build the models. The proposed models are validated using a dataset collected from Open Source software. The results are analyzed using the Area Under the Curve (AUC) obtained from Receiver Operating Characteristics (ROC) analysis. The results show that the models built using the random forest and bagging methods outperformed all the other models. Hence, based on these results it is reasonable to claim that quality models have a significant relevance with Object Oriented metrics and that machine learning methods have a comparable performance with statistical methods.
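As an illustration of the evaluation measure named in this abstract, the sketch below computes AUC as the fraction of (faulty, non-faulty) pairs that a model ranks correctly, with ties counted as half. The function name `roc_auc` and the labels and scores are hypothetical and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive (faulty) example
    receives a higher score than a randomly chosen negative one; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical fault labels (1 = faulty class) and predicted fault-proneness scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the measure is convenient for comparing classifiers without fixing a decision threshold.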

Estimating the compressive strength of HPFRC containing metallic fibers using statistical methods and ANNs

  • Perumal, Ramadoss;Prabakaran, V.
    • Advances in concrete construction / v.10 no.6 / pp.479-488 / 2020
  • Experimental and numerical work was carried out on high-performance fiber-reinforced concrete (HPFRC) with w/cm ratios ranging from 0.25 to 0.40, fiber volume fractions (Vf) of 0-1.5%, and 10% silica fume replacement. The improvements obtained for HPFRC were moderate in compressive strength and significant in flexural strength. Empirical equations were developed for the compressive and flexural strengths of HPFRC as functions of fiber volume fraction, and a relation between flexural strength and compressive strength with R=0.78 was developed. Because of the complex mix proportions and the non-linear relationship between mix proportions and properties, models with reliable predictive capability have not been developed, and research on HPFRC has been empirical. Given the inadequacy of present methods, a back-propagation neural network (BP-NN) was employed in this paper to estimate the 28-day compressive strength of HPFRC mixes. The BP-NN model was built to capture the highly non-linear relationship between the mix proportions and their properties. This paper describes the data sets collected, the training of the ANNs, and a comparison with the experimental results obtained for various mixtures. From statistical analysis of the collected data, a multiple linear regression (MLR) model with $R^2=0.78$ was developed for predicting the compressive strength of HPFRC mixes, with an average absolute error (AAE) of 6.5%. When the data sets were validated with the NNs, the error was within 2% of the actual values. The ANN model provided a significant degree of accuracy and reliability compared with the MLR model. The ANN approach is practical and can be used effectively to estimate the 28-day compressive strength of fibrous concrete mixes.
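The two fit measures quoted in this abstract can be sketched as follows. The abstract does not define AAE, so the code assumes the common form, the mean of |measured - predicted| / measured expressed as a percentage. The strength values are hypothetical, not data from the paper.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def average_absolute_error_pct(actual, predicted):
    """Assumed AAE definition: mean absolute relative error, in percent."""
    rel_errors = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(rel_errors) / len(rel_errors)


# Hypothetical 28-day compressive strengths in MPa (measured vs. model output).
measured = [60.0, 70.0, 80.0, 90.0]
predicted = [63.0, 68.0, 82.0, 87.0]

r2 = r_squared(measured, predicted)
aae = average_absolute_error_pct(measured, predicted)
print(f"R^2 = {r2:.3f}, AAE = {aae:.2f}%")
```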

Doubly penalized kernel method for heteroscedastic autoregressive data

  • Cho, Dae-Hyeon;Shim, Joo-Yong;Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society / v.21 no.1 / pp.155-162 / 2010
  • In this paper we propose a doubly penalized kernel method that uses kernel machines to estimate the mean function and the variance function simultaneously for heteroscedastic autoregressive data. We also present a model selection method that employs cross-validation techniques to choose the hyper-parameters that affect the performance of the proposed method. Simulated examples are provided to indicate the usefulness of the proposed method for estimating the mean and variance functions.

A Comparison Study on Statistical Modeling Methods (통계모델링 방법의 비교 연구)

  • Noh, Yoojeong
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.5 / pp.645-652 / 2016
  • The statistical modeling of input random variables is necessary in reliability analysis, reliability-based design optimization, and statistical validation and calibration of analysis models of mechanical systems. Statistical modeling methods include the Akaike Information Criterion (AIC), the corrected AIC (AICc), the Bayesian Information Criterion (BIC), Maximum Likelihood Estimation (MLE), and the Bayesian method. These methods select the best-fitting distribution among candidate models by calculating their likelihood function values from a given data set. Some methods also take the number of data points or the number of parameters into account when identifying the distribution type. However, engineers in the field have difficulty selecting a statistical modeling method for their experimental data because they lack knowledge of these methods. In this study, commonly used statistical modeling methods were compared using statistical simulation tests, and their advantages and disadvantages were then analyzed. In the simulation tests, various types of distribution were assumed as populations, and samples of different sizes were generated randomly from them. Real engineering data were used to verify each statistical modeling method.
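A minimal sketch of likelihood-based distribution selection, using the standard definitions AIC = 2k - 2 ln L, AICc = AIC + 2k(k+1)/(n-k-1), and BIC = k ln n - 2 ln L. The two candidate distributions (normal and exponential), the helper names, and the sample are assumptions chosen for illustration; the paper's candidate set and data are not given in the abstract.

```python
import math


def normal_loglik(data):
    """Maximized log-likelihood of a normal fit (MLE mean and variance); k = 2."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1), 2


def exponential_loglik(data):
    """Maximized log-likelihood of an exponential fit (MLE rate); k = 1."""
    n = len(data)
    rate = n / sum(data)
    return n * math.log(rate) - rate * sum(data), 1


def criteria(loglik, k, n):
    """Information criteria; lower values indicate a preferred model."""
    aic = 2 * k - 2 * loglik
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = k * math.log(n) - 2 * loglik
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Hypothetical positive-valued sample, tightly clustered around 10.
sample = [9.8, 10.1, 10.4, 9.6, 10.0, 10.3, 9.9, 10.2]
candidates = {"normal": normal_loglik, "exponential": exponential_loglik}

scores = {}
for name, fit in candidates.items():
    loglik, k = fit(sample)
    scores[name] = criteria(loglik, k, len(sample))

# For each criterion, pick the candidate with the smallest value.
best = {c: min(scores, key=lambda m: scores[m][c]) for c in ("AIC", "AICc", "BIC")}
print(best)
```

The AICc correction term grows as the sample size approaches the number of parameters, which is one reason the criteria can disagree on small samples.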

Predictive Modeling of the Growth and Survival of Listeria monocytogenes Using a Response Surface Model

  • Jin, Sung-Sik;Jin, Yong-Guo;Yoon, Ki-Sun;Woo, Gun-Jo;Hwang, In-Gyun;Bahk, Gyung-Jin;Oh, Deog-Hwan
    • Food Science and Biotechnology / v.15 no.5 / pp.715-720 / 2006
  • This study was performed to develop a predictive model for the growth kinetics of Listeria monocytogenes in tryptic soy broth (TSB) using a response surface model with a combination of potassium lactate (PL), temperature, and pH. The growth parameters, specific growth rate (SGR) and lag time (LT), were obtained by fitting the data to the Gompertz equation, which showed a good fit ($R^2 \geq 0.9192$). The polynomial model was identified as an appropriate secondary model for SGR and LT based on the coefficient of determination for the developed model ($R^2 = 0.97$ for SGR and $R^2 = 0.86$ for LT). The values calculated using the developed secondary model indicated that the growth kinetics of L. monocytogenes depended on storage temperature, pH, and PL. Finally, the predictive model was validated using statistical indicators such as the coefficient of determination, mean square error, bias factor, and accuracy factor. Validation demonstrated that the overall predictions agreed well with the observed data. However, the model developed for SGR showed better predictive ability than the model developed for LT, as can be seen from its statistical validation indices, with the exception of the bias factor ($B_f$ was 0.6 for SGR and 0.97 for LT).
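The bias and accuracy factors mentioned above are commonly computed in predictive microbiology from the base-10 logarithm of the predicted-to-observed ratio. The sketch below assumes those common definitions, since the abstract does not state the formulas it used, and the growth-rate values are hypothetical.

```python
import math


def bias_factor(observed, predicted):
    """B_f = 10 ** mean(log10(predicted / observed)); 1.0 means no systematic bias."""
    logs = [math.log10(p / o) for o, p in zip(observed, predicted)]
    return 10 ** (sum(logs) / len(logs))


def accuracy_factor(observed, predicted):
    """A_f = 10 ** mean(|log10(predicted / observed)|); 1.0 means perfect agreement."""
    logs = [abs(math.log10(p / o)) for o, p in zip(observed, predicted)]
    return 10 ** (sum(logs) / len(logs))


# Hypothetical specific growth rates (1/h): one exact, one over-, one under-prediction.
observed = [0.10, 0.20, 0.40]
predicted = [0.10, 0.40, 0.20]

bf = bias_factor(observed, predicted)
af = accuracy_factor(observed, predicted)
print(f"B_f = {bf:.3f}, A_f = {af:.3f}")
```

In this example the over- and under-predictions cancel in the bias factor but not in the accuracy factor, which is why the two indices are usually reported together.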

Partially linear support vector orthogonal quantile regression with measurement errors

  • Hwang, Changha
    • Journal of the Korean Data and Information Science Society / v.26 no.1 / pp.209-216 / 2015
  • Quantile regression models with covariate measurement errors have received a great deal of attention in both the theoretical and the applied statistical literature. Considerable effort has been devoted to developing effective estimation methods for such quantile regression models. In this paper we propose a partially linear support vector orthogonal quantile regression model for use in the presence of covariate measurement errors. We also provide a generalized approximate cross-validation method for choosing the hyperparameters and the ratios of the error variances, which affect the performance of the proposed model. The proposed model is evaluated through simulations.

APPLICATION AND CROSS-VALIDATION OF SPATIAL LOGISTIC MULTIPLE REGRESSION FOR LANDSLIDE SUSCEPTIBILITY ANALYSIS

  • LEE SARO
    • Proceedings of the KSRS Conference / 2004.10a / pp.302-305 / 2004
  • The aim of this study is to apply and cross-validate a spatial logistic multiple-regression model at Boun, Korea, using a Geographic Information System (GIS). Landslide locations in the Boun area were identified by interpretation of aerial photographs and field surveys. Maps of the topography, soil type, forest cover, geology, and land-use were constructed from a spatial database. The factors that influence landslide occurrence, such as slope, aspect, and curvature of topography, were calculated from the topographic database. Texture, material, drainage, and effective soil thickness were extracted from the soil database, and type, diameter, and density of forest were extracted from the forest database. Lithology was extracted from the geological database, and land-use was classified from the Landsat TM satellite image. Landslide susceptibility was analyzed from the landslide-occurrence factors using logistic multiple-regression methods. For validation and cross-validation, the result of the analysis was applied both to the study area, Boun, and to another area, Youngin, Korea. The validation and cross-validation results showed satisfactory agreement between the susceptibility map and the existing data on landslide locations. The GIS was used to analyze the vast amount of data efficiently, and statistical programs were used to maintain specificity and accuracy.

Statistical Model Validation and Calibration Techniques (통계적 모델 검증 및 보정 기술)

  • Yun, Byeong-Dong;Yun, Heon-Jun;Park, Jeong-Ho
    • Journal of the KSME / v.54 no.2 / pp.52-57 / 2014
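  • This article introduces in detail, stage by stage, the statistical techniques needed to perform statistical model validation and calibration, which is becoming increasingly important as computer-aided engineering (CAE) technology advances. It also presents the difficulties expected when applying these techniques to actual product development, along with directions for future research.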

Penalized logistic regression models for determining the discharge of dyspnea patients (호흡곤란 환자 퇴원 결정을 위한 벌점 로지스틱 회귀모형)

  • Park, Cheolyong;Kye, Myo Jin
    • Journal of the Korean Data and Information Science Society / v.24 no.1 / pp.125-133 / 2013
  • In this paper, penalized binary logistic regression models are employed as statistical models for determining the discharge of 668 patients with a chief complaint of dyspnea, based on the results of 11 blood tests. Specifically, the ridge model based on the $L^2$ penalty and the Lasso model based on the $L^1$ penalty are considered. Their prediction accuracy is compared with that of the logistic regression model using all 11 explanatory variables and the logistic regression model using variables chosen by a variable selection method. The results show that, based on 10-fold cross-validation, the ridge logistic regression model has the best prediction accuracy among the 4 models.
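For reference, the two penalized estimators named in this abstract are usually written as below. The notation is an assumption made for illustration and is not taken from the paper: $y_i \in \{0,1\}$ is the discharge indicator, $x_i$ is the vector of 11 blood test results for patient $i$, $\ell$ is the binomial log-likelihood, and $\lambda \geq 0$ is a tuning parameter that is typically chosen by cross-validation.

```latex
\ell(\beta_0, \beta) = \sum_{i=1}^{668} \Big[ y_i \,(\beta_0 + x_i^{\top}\beta)
    - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]

\hat{\beta}^{\text{ridge}} = \arg\min_{\beta_0,\,\beta}
    \Big\{ -\ell(\beta_0, \beta) + \lambda \sum_{j=1}^{11} \beta_j^{2} \Big\}
\qquad
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta_0,\,\beta}
    \Big\{ -\ell(\beta_0, \beta) + \lambda \sum_{j=1}^{11} |\beta_j| \Big\}
```

The squared penalty shrinks all coefficients toward zero without removing any, whereas the absolute-value penalty can set some coefficients exactly to zero and so also performs variable selection.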