• Title/Summary/Keyword: Statistical Model Validation

Statistical Diagnosis and Application of High-Fidelity Wargame Models for Analysis (분석용 정밀 워게임모형의 통계적 진단 및 활용)

  • 김윤태;고원;박혜련
    • Proceedings of the Korean Statistical Society Conference / 2004.11a / pp.117-121 / 2004
  • 분석용 정밀 워게임 시뮬레이션 모형에서는 '모형운영 결과와 실제(또는 실험) 결과를 비교' 하는 통상적인 타당성 척도의 적용이 불가능함에 따라 워게임모형 운영환경에 적합한 새로운 개념의 타당성 척도로서 VEA(Validity for Exploratory Analysis), VSA(Validity subject to Assumption) 등의 개념을 도입하고 이를 탐색적으로 점검하는 방안을 제시한다. 분석용 워게임모형 활용에 있어 또 하나의 걸림돌은 1)시나리오 및 상황의 가변성, 2)무기체계 및 장비 성능에 대한 불확실성, 3)묘사범위 제한 및 논리의 부정확성으로 인한 오류 등으로 엄청난 불확실성(uncertainty)을 기본적으로 내포함에 따라 구체적 의사결정을 위한 종합적 결론 도출이 어렵다는 점이다. 본 연구에서는 이를 메타모델(Meta model) 즉 워게임모형 입출력 자료의 관계를 묘사한 통계적 모형을 구축하고 이를 기반으로 다양한 불확실성 하에서 관심변수간의 관계를 종합적으로 도출하고자 하는 '관련공간모의(Relevant Simulation)' 방안을 제시한다. 이와 같은 방안들은 SVAP(Statistical Validation and Aggregation Procedure)라는 하나의 종합된 절차로서 제시된다.

A Study on Exploration of the Recommended Model of Decision Tree to Predict a Hard-to-Measure Measurement in Anthropometric Survey (인체측정조사에서 측정곤란부위 예측을 위한 의사결정나무 추천 모형 탐지에 관한 연구)

  • Choi, J.H.;Kim, S.K.
    • The Korean Journal of Applied Statistics / v.22 no.5 / pp.923-935 / 2009
  • This study aims to explore a recommended decision tree model for predicting a hard-to-measure measurement in an anthropometric survey. We carry out a cross-validation experiment to obtain the recommended decision tree model. Three decision tree split rules are used: CHAID, Exhaustive CHAID, and CART. CART gives the best result on the real-world data.
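
The comparison of split rules by cross-validation can be sketched as follows. The sketch assumes k-fold cross-validation that scores each candidate model by mean squared error on held-out folds. CHAID and Exhaustive CHAID are not implemented here. Instead, a one-split, CART-style regression stump (the split is chosen to minimise within-node squared error) is compared with a mean-only baseline. The data and helper names are hypothetical.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_stump(xs, ys):
    """CART-style regression stump: one split on x minimising the within-node SSE."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between tied x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    if best is None:  # all x identical: fall back to the mean
        return fit_mean(xs, ys)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def cv_mse(fit, xs, ys, k=5, seed=0):
    """k-fold cross-validated mean squared error of a fitting procedure."""
    sq_err, count = 0.0, 0
    for test in kfold_indices(len(xs), k, seed):
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            sq_err += (model(xs[i]) - ys[i]) ** 2
            count += 1
    return sq_err / count


xs = list(range(40))
ys = [10.0 if x < 20 else 20.0 for x in xs]  # hypothetical step-shaped target
print("mean-only CV MSE:", cv_mse(fit_mean, xs, ys))
print("stump CV MSE:    ", cv_mse(fit_stump, xs, ys))
```

The same `cv_mse` wrapper accepts any `fit` function. Competing tree-growing rules can therefore be ranked on identical folds.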

Application of Statistical and Machine Learning Techniques for Habitat Potential Mapping of Siberian Roe Deer in South Korea

  • Lee, Saro;Rezaie, Fatemeh
    • Proceedings of the National Institute of Ecology of the Republic of Korea / v.2 no.1 / pp.1-14 / 2021
  • The study aims to prepare Siberian roe deer habitat potential maps for South Korea using three geographic information system (GIS)-based models. These are frequency ratio (FR), a bivariate statistical approach, and two machine learning algorithms: convolutional neural network (CNN) and long short-term memory (LSTM). Field observations identified 741 locations as roe deer habitat. The dataset was split 70:30 into model-construction and validation sets. Using the FR model, 10 influential factors were selected for modelling: altitude, valley depth, slope height, topographic position index (TPI), topographic wetness index (TWI), normalized difference water index, drainage density, road density, radar intensity, and morphological feature. Variable importance analysis showed that TPI, TWI, altitude, and valley depth had a higher impact on the predictions. The area under the receiver operating characteristic (ROC) curve was used to assess the prediction accuracy of the three models. All models performed similarly. However, the LSTM model had somewhat higher predictive ability than the FR and CNN models, with accuracies of 76% in training and 73% in validation. The LSTM potential map was categorized into five classes (very low, low, moderate, high, and very high) with proportions of 19.70%, 19.81%, 19.31%, 19.86%, and 21.31%, respectively. The resulting potential maps may be valuable for monitoring and preserving Siberian roe deer habitats.
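
The FR model and the AUC evaluation above can be sketched as follows. This is a minimal sketch that assumes the standard FR definition: the share of presence points falling in a factor class, divided by the share of study-area cells in that class. Under this assumption, a cell's potential index is the sum of its class FR values across factors, and AUC is computed as the probability that a random presence cell outscores a random absence cell. The toy grid, the class names and the helper names are hypothetical.

```python
from collections import Counter


def frequency_ratio(cell_class, presence_cells):
    """FR of each class = (share of presence points in the class) / (share of cells in the class).

    cell_class: {cell_id: class label of one conditioning factor}
    presence_cells: cell ids where the species was observed
    """
    area = Counter(cell_class.values())
    hits = Counter(cell_class[c] for c in presence_cells)
    n_cells, n_hits = len(cell_class), len(presence_cells)
    return {k: (hits[k] / n_hits) / (area[k] / n_cells) for k in area}


def potential_index(cell, factor_maps, fr_tables):
    """Habitat potential index of a cell = sum of FR values of its class in every factor."""
    return sum(fr_tables[f][factor_maps[f][cell]] for f in factor_maps)


def roc_auc(scores, labels):
    """AUC = P(random presence scores above random absence); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy grid: 10 cells, one factor (TPI class), 5 presence points.
tpi = {i: ("ridge" if i < 5 else "valley") for i in range(10)}
presence = [0, 5, 6, 7, 8]
fr_tpi = frequency_ratio(tpi, presence)  # ridge 0.4, valley 1.6
scores = [potential_index(c, {"tpi": tpi}, {"tpi": fr_tpi}) for c in range(10)]
labels = [1 if c in presence else 0 for c in range(10)]
print(fr_tpi, roc_auc(scores, labels))
```

An FR above 1 marks a class where presences are over-represented relative to its area. An AUC of 0.5 corresponds to a score that ranks presences no better than chance.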

Kernel Poisson Regression for Longitudinal Data

  • Shim, Joo-Yong;Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society / v.19 no.4 / pp.1353-1360 / 2008
  • An estimation procedure is introduced for nonlinear mixed-effect Poisson regression in longitudinal studies, where data from different subjects are independent whereas data from the same subject are correlated. The proposed procedure provides estimates of the mean function of the response variables, where the canonical parameter is related to the input vector in a nonlinear form. A generalized cross-validation function is introduced to choose optimal hyperparameters in the procedure. Experimental results are then presented to show the performance of the proposed estimation procedure.
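
As a point of reference for the generalized cross-validation (GCV) step above, here is a sketch for the simpler Gaussian-response case. The paper applies GCV to Poisson mixed-effect regression, and its exact GCV function is not given in the abstract. This is therefore an assumed analogue, not the authors' procedure. The sketch uses RBF kernel ridge regression with hat matrix H = K (K + λI)^-1, scored by GCV(λ) = (RSS/n) / (1 − tr(H)/n)². The hyperparameters λ and γ are chosen by grid search. The data and function names are hypothetical.

```python
import math


def rbf_kernel(xs, gamma):
    """Gram matrix of the Gaussian (RBF) kernel on scalar inputs."""
    return [[math.exp(-gamma * (a - b) ** 2) for b in xs] for a in xs]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (fine for small, well-conditioned matrices)."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcv_score(xs, ys, lam, gamma):
    """GCV(lam) = (RSS / n) / (1 - tr(H) / n)^2 for kernel ridge regression."""
    n = len(xs)
    k = rbf_kernel(xs, gamma)
    reg = [[k[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    hat = matmul(k, invert(reg))  # H = K (K + lam I)^-1
    fitted = [sum(h * y for h, y in zip(row, ys)) for row in hat]
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    trace = sum(hat[i][i] for i in range(n))
    return (rss / n) / (1.0 - trace / n) ** 2


def select_by_gcv(xs, ys, lams, gammas):
    """Grid search; returns (best GCV score, lambda, gamma)."""
    return min((gcv_score(xs, ys, l, g), l, g) for l in lams for g in gammas)


xs = [i / 10 for i in range(20)]
ys = [math.sin(3 * x) + 0.1 * math.cos(17 * x) for x in xs]  # deterministic toy data
score, lam, gamma = select_by_gcv(xs, ys, [1e-3, 1e-2, 1e-1, 1.0], [1.0, 10.0])
print(f"GCV picks lambda={lam}, gamma={gamma} (score {score:.4g})")
```

GCV replaces n separate refits of leave-one-out with a single fit per candidate, using the trace of the hat matrix as an effective-degrees-of-freedom correction.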

Semiparametric kernel logistic regression with longitudinal data

  • Shim, Joo-Yong;Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society / v.23 no.2 / pp.385-392 / 2012
  • Logistic regression is a well-known binary classification method in the field of statistical learning. Mixed-effect regression models are widely used to analyze correlated data such as those found in longitudinal studies. We consider kernel extensions of logistic regression with semiparametric fixed effects and parametric random effects. Estimation is performed by the penalized likelihood method based on the kernel trick, with a focus on efficient computation and effective hyperparameter selection. Cross-validation techniques are employed to select the optimal hyperparameters. Numerical results are then presented to indicate the performance of the proposed procedure.
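
For orientation, the plain (fixed-effects only) penalized likelihood for kernel logistic regression is written out below. The paper's semiparametric fixed effects and parametric random effects are deliberately left out. This is therefore a simplified reference form, not the authors' model. Here $\lambda$ and any kernel parameters in $K$ are the hyperparameters that cross-validation would tune, and the intercept $b$ is assumed to be unpenalized.

```latex
\[
\min_{b,\; f \in \mathcal{H}_K} \; -\sum_{i=1}^{n} \Big[ y_i \eta_i - \log\big(1 + e^{\eta_i}\big) \Big]
  + \frac{\lambda}{2} \lVert f \rVert_{\mathcal{H}_K}^{2},
\qquad \eta_i = b + f(\mathbf{x}_i), \quad y_i \in \{0, 1\}
\]
\[
f(\mathbf{x}) = \sum_{j=1}^{n} \alpha_j K(\mathbf{x}_j, \mathbf{x})
\quad \Longrightarrow \quad
\lVert f \rVert_{\mathcal{H}_K}^{2} = \boldsymbol{\alpha}^{\top} \mathbf{K} \boldsymbol{\alpha},
\qquad K_{ij} = K(\mathbf{x}_i, \mathbf{x}_j)
\]
```

The second line is the representer theorem ("kernel trick"). It reduces the infinite-dimensional problem to finding $n$ coefficients $\alpha_j$ plus the intercept $b$.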

A biomechanical model of lower extremity for seated operators (착좌시 하지 동작의 생체역학적 모델)

  • 황규성;이동춘;최재호
    • Journal of the Ergonomics Society of Korea / v.11 no.1 / pp.81-92 / 1992
  • A two-dimensional static biomechanical model of the lower extremity in the seated posture was developed to assess the muscular activity of the lower extremity required for a variety of foot pedal operations. We found that the double linear optimization method previously used for modelling articulated body segments does not reasonably predict the forces generated by biarticular muscles. A revised double linear optimization scheme was therefore used to account for the synergistic effects of biarticular muscles in our model, assuming that muscle forces are distributed in proportion to physiological cross-sectional area. The model incorporated three rigid body segments with six muscles to represent the lower extremity. For model validation, three male subjects performed experiments in which the EMG activity of six lower extremity muscles was measured. Predicted muscle forces were compared with the corresponding EMG amplitudes, and no statistically significant difference was found. The developed model can be used to design and assess pedals and other foot-operated tools.
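
The proportional-to-PCSA assumption mentioned above can be illustrated for a single joint. If every synergist carries the same stress σ, then F_i = σ·PCSA_i. The moment balance Σ r_i F_i = M then fixes σ = M / Σ r_i PCSA_i. This sketch does not reproduce the paper's revised double linear optimization. The muscle names, moment arms and PCSA values are hypothetical, illustrative numbers rather than measured data.

```python
def share_moment_by_pcsa(joint_moment_nm, muscles):
    """Split a required joint moment among synergists assuming equal muscle stress.

    With F_i = sigma * PCSA_i, the moment balance sum(r_i * F_i) = M
    gives sigma = M / sum(r_i * PCSA_i).
    """
    lever = sum(m["moment_arm_m"] * m["pcsa_cm2"] for m in muscles)
    sigma = joint_moment_nm / lever  # common stress in N/cm^2
    return {m["name"]: sigma * m["pcsa_cm2"] for m in muscles}


# Hypothetical knee extensors; moment arms and PCSAs are illustrative, not measured values.
knee_extensors = [
    {"name": "vasti", "moment_arm_m": 0.040, "pcsa_cm2": 100.0},
    {"name": "rectus_femoris", "moment_arm_m": 0.045, "pcsa_cm2": 30.0},
]
forces = share_moment_by_pcsa(50.0, knee_extensors)
print({k: round(v, 1) for k, v in forces.items()})
```

Because the force ratio between muscles is fixed by their PCSA ratio, this rule removes the indeterminacy of having more muscles than equilibrium equations at the joint.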

Two-Stage Logistic Regression for Cancer Classification and Prediction from Copy-Number Changes in cDNA Microarray-Based Comparative Genomic Hybridization

  • Kim, Mi-Jung
    • The Korean Journal of Applied Statistics / v.24 no.5 / pp.847-859 / 2011
  • cDNA microarray-based comparative genomic hybridization (CGH) data include low-intensity spots, so a statistical strategy is needed to detect subtle differences between cancer classes. In this study, genes showing a high frequency of alteration in one of the classes were selected from a set of pre-selected genes that show relatively large variation between genes compared with the total variation. Using the copy-number changes of the selected genes, this study proposes a statistical approach that improves the prediction of patients' classes by pre-classifying patients with similar genetic alteration scores. A two-stage logistic regression model (TLRM) is proposed to pre-classify homogeneous patients and predict patients' classes for cancer prediction. In this model, a decision tree (DT) is combined with logistic regression on the set of informative genes. TLRM was built on cDNA microarray-based CGH data from the Cancer Metastasis Research Center (CMRC) at Yonsei University. It predicted the patients' clinical diagnoses with perfect matches, except for one patient, among the patients classified as high-risk and low-risk. This is where prediction performance matters most, because clinical treatment demands high sensitivity and specificity. Accuracy validated by leave-one-out cross-validation (LOOCV) was 83.3%, whereas the comparison methods, CART and DT, performed worse than TLRM.
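
The LOOCV accuracy reported above follows a simple loop. The model is refitted on all patients but one, the held-out patient is predicted, and this is repeated for every patient. In the sketch below, a nearest-centroid classifier stands in for TLRM, which is not reproduced here. The patient rows and labels are hypothetical copy-number scores.

```python
def fit_nearest_centroid(rows, labels):
    """Toy classifier: assign a sample to the class with the closest mean profile."""
    centroids = {}
    for cls in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]

    def predict(row):
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))

    return predict


def loocv_accuracy(rows, labels, fit):
    """Leave-one-out CV: refit on n-1 samples, score the held-out one, repeat n times."""
    correct = 0
    for i in range(len(rows)):
        model = fit(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:])
        correct += model(rows[i]) == labels[i]
    return correct / len(rows)


# Hypothetical copy-number scores for two genes in six patients.
rows = [[0.1, 0.0], [0.2, 0.1], [0.0, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
labels = ["low-risk"] * 3 + ["high-risk"] * 3
print("LOOCV accuracy:", loocv_accuracy(rows, labels, fit_nearest_centroid))
```

LOOCV is attractive for small cohorts like microarray studies, because every model is trained on n − 1 patients and no data are wasted on a fixed hold-out set.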

Validation of nuclide depletion capabilities in Monte Carlo code MCS

  • Ebiwonjumi, Bamidele;Lee, Hyunsuk;Kim, Wonkyeong;Lee, Deokjung
    • Nuclear Engineering and Technology / v.52 no.9 / pp.1907-1916 / 2020
  • In this work, the depletion capability implemented in the Monte Carlo code MCS is investigated for predicting the isotopic compositions of spent nuclear fuel (SNF). The capability is validated by comparing MCS calculation results with post-irradiation examination (PIE) data obtained from one pressurized water reactor (PWR). The depletion analysis is performed with the ENDF/B-VII.1 library and a fuel assembly model. The transmutation equation is solved by the Chebyshev Rational Approximation Method (CRAM) with a depletion chain of 3820 isotopes. Eighteen actinides and 19 fission products are analyzed in 14 SNF samples. The effect of statistical uncertainties on the calculated number densities is discussed. On average, most of the analyzed actinides and fission products are predicted within ±6% of the experimental values. MCS depletion results are also compared with those of other depletion codes, based on information publicly reported in the literature, and this code-to-code analysis shows comparable accuracy. Overall, the results demonstrate that the depletion capability in MCS can be reliably applied to predicting the SNF isotopic inventory.
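
The "within ±6% on average" statement above can be made concrete with a short sketch. It assumes the deviation is expressed as (C/E − 1)×100, where C is the calculated and E the measured quantity. It also assumes the deviation is averaged over samples for each nuclide. The C and E values are hypothetical and are not the paper's PIE data.

```python
def deviation_pct(calculated, measured):
    """Percent deviation of calculation from experiment, (C/E - 1) * 100."""
    return (calculated / measured - 1.0) * 100.0


def mean_deviation_by_nuclide(results):
    """results: {nuclide: [(C, E), ...] over SNF samples} -> mean % deviation per nuclide."""
    return {
        nuc: sum(deviation_pct(c, e) for c, e in pairs) / len(pairs)
        for nuc, pairs in results.items()
    }


def within_tolerance(mean_devs, tol_pct=6.0):
    """Nuclides whose average deviation lies inside +/- tol_pct."""
    return sorted(n for n, d in mean_devs.items() if abs(d) <= tol_pct)


# Hypothetical C and E values (arbitrary units), NOT the paper's PIE data.
results = {
    "U-235": [(1.02, 1.00), (0.99, 1.00)],
    "Pu-239": [(1.05, 1.00), (1.04, 1.00)],
    "Cs-137": [(1.10, 1.00), (1.08, 1.00)],
}
devs = mean_deviation_by_nuclide(results)
print(devs, within_tolerance(devs))
```

Averaging signed deviations can hide sample-to-sample scatter. A fuller comparison would also report the spread of C/E for each nuclide.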

Prediction of movie audience numbers using hybrid model combining GLS and Bass models (GLS와 Bass 모형을 결합한 하이브리드 모형을 이용한 영화 관객 수 예측)

  • Kim, Bokyung;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.447-461 / 2018
  • Domestic film industry sales are increasing every year. Theaters are the primary sales channel for movies, and theater audience numbers affect the value of additional selling rights. Theater audience numbers are therefore an important factor directly linked to movie industry sales. In this paper, we consider a hybrid model that combines a multiple linear regression model and the Bass model to predict the audience number for a specific day. Combining the two models corrects the regression prediction using the Bass model prediction. Three films with different release dates were used in the analysis. All-subsets regression is used to generate every possible combination of predictors, and each candidate model is estimated five times with 5-fold cross-validation. The prediction from the model with the smallest root mean square error (RMSE) is then combined with the Bass model prediction to obtain the final predicted value. It was confirmed that, when past data are available, the weight of the Bass model increases and the correction is added to the predicted value.
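
The hybrid idea above can be sketched under stated assumptions. The Bass model's cumulative audience is N(t) = m(1 − e^{−(p+q)t}) / (1 + (q/p)e^{−(p+q)t}), and daily audience is N(t) − N(t−1). The abstract does not state how the two predictions are weighted. The sketch therefore assumes a convex combination whose weight on the Bass prediction is chosen by minimizing RMSE over past days. That weighting rule, the parameter values and the stand-in series are all hypothetical.

```python
import math


def bass_cumulative(t, m, p, q):
    """Cumulative adopters N(t) = m * F(t) under the Bass diffusion model."""
    e = math.exp(-(p + q) * t)
    return m * (1.0 - e) / (1.0 + (q / p) * e)


def bass_daily(day, m, p, q):
    """Audience on day `day` (1-based) = N(day) - N(day - 1)."""
    return bass_cumulative(day, m, p, q) - bass_cumulative(day - 1, m, p, q)


def hybrid_prediction(regression_pred, bass_pred, w):
    """Convex combination with weight w on the Bass prediction (an assumed form)."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return (1.0 - w) * regression_pred + w * bass_pred


def rmse(pred, actual):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pred, actual)) / len(actual))


def choose_weight(reg_preds, bass_preds, actual, grid=21):
    """Pick w on an evenly spaced grid to minimise RMSE over past days (hypothetical rule)."""
    candidates = [i / (grid - 1) for i in range(grid)]
    return min(candidates, key=lambda w: rmse(
        [hybrid_prediction(r, b, w) for r, b in zip(reg_preds, bass_preds)], actual))


m, p, q = 5_000_000, 0.03, 0.40  # hypothetical market size and Bass coefficients
bass = [bass_daily(d, m, p, q) for d in range(1, 8)]
regression = [b * 1.1 for b in bass]  # stand-in for regression predictions
observed = [b * 1.02 for b in bass]   # stand-in for observed audiences
w = choose_weight(regression, bass, observed)
print("weight on Bass:", w, "day-7 hybrid:", hybrid_prediction(regression[-1], bass[-1], w))
```

Here p acts as the innovation (external-influence) coefficient and q as the imitation (word-of-mouth) coefficient. Their ratio controls how sharply daily attendance peaks after release.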

Derivation of IDF Curve by the Simulation of Hourly Precipitation using Nonhomogeneous Markov Chain Model (비동질성 Markov 모형에 의한 시간강수량 모의발생을 이용한 IDF 곡선의 유도)

  • Moon, Young-Il;Choi, Byung-Kyu;Oh, Tae-Suk
    • Proceedings of the Korean Society of Hazard Mitigation Conference (한국방재학회:학술대회논문집) / 2008.02a / pp.501-504 / 2008
  • A non-homogeneous Markov model capable of simulating hourly rainfall series is developed to estimate reliable hydrological variables. The proposed approach is applied to simulate hourly rainfall series in Korea. The simulated rainfall is used to estimate design rainfall. To validate the model, it is compared with observations in terms of how well it reproduces the underlying distributions of the data. The simulated rainfall series reproduce statistical attributes similar to those of the observations. In particular, the maximum value gradually increases as the number of simulations increases.
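
A minimal sketch of a non-homogeneous Markov rainfall generator is given below. It assumes a two-state (dry/wet) chain whose transition probabilities vary with the hour of day, and exponentially distributed depths in wet hours. The paper's actual state definition, seasonal structure and depth distribution are not given in the abstract. The probabilities below are therefore hypothetical. The sketch also extracts the annual maximum rainfall for a given duration. For an IDF curve, these maxima would then be fitted to a frequency distribution for each duration, a step not shown here.

```python
import random


def simulate_hourly_rain(hours, p_wet_given_dry, p_wet_given_wet, mean_depth_mm, seed=0):
    """Two-state (dry/wet) Markov chain whose transition probabilities depend on hour of day.

    p_wet_given_dry[h], p_wet_given_wet[h]: probability that hour h is wet given the
    previous hour was dry / wet (24 values each, so the chain is non-homogeneous in time).
    Wet-hour depths are drawn from an exponential distribution with mean mean_depth_mm[h].
    """
    rng = random.Random(seed)
    wet = False
    series = []
    for t in range(hours):
        h = t % 24
        p = p_wet_given_wet[h] if wet else p_wet_given_dry[h]
        wet = rng.random() < p
        series.append(rng.expovariate(1.0 / mean_depth_mm[h]) if wet else 0.0)
    return series


def annual_maxima(series, duration_h, hours_per_year=8760):
    """Maximum rainfall total over any `duration_h` consecutive hours, per simulated year."""
    maxima = []
    for start in range(0, len(series) - hours_per_year + 1, hours_per_year):
        year = series[start:start + hours_per_year]
        window = sum(year[:duration_h])
        best = window
        for i in range(duration_h, len(year)):
            window += year[i] - year[i - duration_h]  # slide the window by one hour
            best = max(best, window)
        maxima.append(best)
    return maxima


# Hypothetical hour-of-day probabilities: afternoon hours start rain more often.
p_dry = [0.02 if h < 12 else 0.05 for h in range(24)]
p_wet = [0.60] * 24
depth = [1.5 if h < 12 else 3.0 for h in range(24)]
series = simulate_hourly_rain(10 * 8760, p_dry, p_wet, depth, seed=42)
for d in (1, 3, 6):
    print(d, "h annual maxima (mm):", [round(x, 1) for x in annual_maxima(series, d)])
```

Running more simulated years gives the extreme tail more chances to be sampled. This is consistent with the observation that the simulated maximum grows as the number of simulations increases.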
