• 제목/요약/키워드: Cross-Validation Approach

검색결과 129건 처리시간 0.022초

후보점과 대표점 교차검증에 의한 순차적 실험계획 (Candidate Points and Representative Cross-Validation Approach for Sequential Sampling)

  • 김승원;정재준;이태희
    • 대한기계학회논문집A
    • /
    • 제31권1호
    • /
    • pp.55-61
    • /
    • 2007
  • Recently simulation model becomes an essential tool for analysis and design of a system but it is often expensive and time consuming as it becomes complicate to achieve reliable results. Therefore, high-fidelity simulation model needs to be replaced by an approximate model, the so-called metamodel. Metamodeling techniques include 3 components of sampling, metamodel and validation. Cross-validation approach has been proposed to provide sequnatially new sample point based on cross-validation error but it is very expensive because cross-validation must be evaluated at each stage. To enhance the cross-validation of metamodel, sequential sampling method using candidate points and representative cross-validation is proposed in this paper. The candidate and representative cross-validation approach of sequential sampling is illustrated for two-dimensional domain. To verify the performance of the suggested sampling technique, we compare the accuracy of the metamodels for various mathematical functions with that obtained by conventional sequential sampling strategies such as maximum distance, mean squared error, and maximum entropy sequential samplings. Through this research we team that the proposed approach is computationally inexpensive and provides good prediction performance.

다학제적 접근을 고려한 과학기술논문 상호검증 방법 (Cross-Validation method for Science and Technology Research Paper considering Interdisciplinary Approach)

  • 한영신
    • 공학교육연구
    • /
    • 제18권5호
    • /
    • pp.3-10
    • /
    • 2015
  • Researchers in science and technology has broadened the scope of research in order to solve complex problems, academic exchange has also been actively carried out. If the paper which is a mean of interdisciplinary approach has a limited term and the formula, it can act as barriers to access for many researchers in various fields. This paper proposes a cross-validation method for eliminating documentary barriers based on discrete event system formalism. We expect that our proposed method will improve a cross-validation considering researchers in another fields.

A convenient approach for penalty parameter selection in robust lasso regression

  • Kim, Jongyoung;Lee, Seokho
    • Communications for Statistical Applications and Methods
    • /
    • 제24권6호
    • /
    • pp.651-662
    • /
    • 2017
  • We propose an alternative procedure to select penalty parameter in $L_1$ penalized robust regression. This procedure is based on marginalization of prior distribution over the penalty parameter. Thus, resulting objective function does not include the penalty parameter due to marginalizing it out. In addition, its estimating algorithm automatically chooses a penalty parameter using the previous estimate of regression coefficients. The proposed approach bypasses cross validation as well as saves computing time. Variable-wise penalization also performs best in prediction and variable selection perspectives. Numerical studies using simulation data demonstrate the performance of our proposals. The proposed methods are applied to Boston housing data. Through simulation study and real data application we demonstrate that our proposals are competitive to or much better than cross-validation in prediction, variable selection, and computing time perspectives.

Multinomial Kernel Logistic Regression via Bound Optimization Approach

  • Shim, Joo-Yong;Hong, Dug-Hun;Kim, Dal-Ho;Hwang, Chang-Ha
    • Communications for Statistical Applications and Methods
    • /
    • 제14권3호
    • /
    • pp.507-516
    • /
    • 2007
  • Multinomial logistic regression is probably the most popular representative of probabilistic discriminative classifiers for multiclass classification problems. In this paper, a kernel variant of multinomial logistic regression is proposed by combining a Newton's method with a bound optimization approach. This formulation allows us to apply highly efficient approximation methods that effectively overcomes conceptual and numerical problems of standard multiclass kernel classifiers. We also provide the approximate cross validation (ACV) method for choosing the hyperparameters which affect the performance of the proposed approach. Experimental results are then presented to indicate the performance of the proposed procedure.

Semi-supervised learning using similarity and dissimilarity

  • Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society
    • /
    • 제22권1호
    • /
    • pp.99-105
    • /
    • 2011
  • We propose a semi-supervised learning algorithm based on a form of regularization that incorporates similarity and dissimilarity penalty terms. Our approach uses a graph-based encoding of similarity and dissimilarity. We also present a model-selection method which employs cross-validation techniques to choose hyperparameters which affect the performance of the proposed method. Simulations using two types of dat sets demonstrate that the proposed method is promising.

Semisupervised support vector quantile regression

  • Seok, Kyungha
    • Journal of the Korean Data and Information Science Society
    • /
    • 제26권2호
    • /
    • pp.517-524
    • /
    • 2015
  • Unlabeled examples are easier and less expensive to be obtained than labeled examples. In this paper semisupervised approach is used to utilize such examples in an effort to enhance the predictive performance of nonlinear quantile regression problems. We propose a semisupervised quantile regression method named semisupervised support vector quantile regression, which is based on support vector machine. A generalized approximate cross validation method is used to choose the hyper-parameters that affect the performance of estimator. The experimental results confirm the successful performance of the proposed S2SVQR.

Semiparametric Regression Splines in Matched Case-Control Studies

  • Kim, In-Young;Carroll, Raymond J.;Cohen, Noah
    • 한국통계학회:학술대회논문집
    • /
    • 한국통계학회 2003년도 춘계 학술발표회 논문집
    • /
    • pp.167-170
    • /
    • 2003
  • We develop semiparametric methods for matched case-control studies using regression splines. Three methods are developed: an approximate crossvalidation scheme to estimate the smoothing parameter inherent in regression splines, as well as Monte Carlo Expectation Maximization (MCEM) and Bayesian methods to fit the regression spline model. We compare the approximate cross-validation approach, MCEM and Bayesian approaches using simulation, showing that they appear approximately equally efficient, with the approximate cross-validation method being computationally the most convenient. An example from equine epidemiology that motivated the work is used to demonstrate our approaches.

  • PDF

자동기계학습 TPOT 기반 저수위 예측 정확도 향상을 위한 시계열 교차검증 기법 연구 (A Study on Time Series Cross-Validation Techniques for Enhancing the Accuracy of Reservoir Water Level Prediction Using Automated Machine Learning TPOT)

  • 배주현;박운지;이서로;박태선;박상빈;김종건;임경재
    • 한국농공학회논문집
    • /
    • 제66권1호
    • /
    • pp.1-13
    • /
    • 2024
  • This study assessed the efficacy of improving the accuracy of reservoir water level prediction models by employing automated machine learning models and efficient cross-validation methods for time-series data. Considering the inherent complexity and non-linearity of time-series data related to reservoir water levels, we proposed an optimized approach for model selection and training. The performance of twelve models was evaluated for the Obong Reservoir in Gangneung, Gangwon Province, using the TPOT (Tree-based Pipeline Optimization Tool) and four cross-validation methods, which led to the determination of the optimal pipeline model. The pipeline model consisting of Extra Tree, Stacking Ridge Regression, and Simple Ridge Regression showed outstanding predictive performance for both training and test data, with an R2 (Coefficient of determination) and NSE (Nash-Sutcliffe Efficiency) exceeding 0.93. On the other hand, for predictions of water levels 12 hours later, the pipeline model selected through time-series split cross-validation accurately captured the change pattern of time-series water level data during the test period, with an NSE exceeding 0.99. The methodology proposed in this study is expected to greatly contribute to the efficient generation of reservoir water level predictions in regions with high rainfall variability.

Machine Learning Based Hybrid Approach to Detect Intrusion in Cyber Communication

  • Neha Pathak;Bobby Sharma
    • International Journal of Computer Science & Network Security
    • /
    • 제23권11호
    • /
    • pp.190-194
    • /
    • 2023
  • By looking the importance of communication, data delivery and access in various sectors including governmental, business and individual for any kind of data, it becomes mandatory to identify faults and flaws during cyber communication. To protect personal, governmental and business data from being misused from numerous advanced attacks, there is the need of cyber security. The information security provides massive protection to both the host machine as well as network. The learning methods are used for analyzing as well as preventing various attacks. Machine learning is one of the branch of Artificial Intelligence that plays a potential learning techniques to detect the cyber-attacks. In the proposed methodology, the Decision Tree (DT) which is also a kind of supervised learning model, is combined with the different cross-validation method to determine the accuracy and the execution time to identify the cyber-attacks from a very recent dataset of different network attack activities of network traffic in the UNSW-NB15 dataset. It is a hybrid method in which different types of attributes including Gini Index and Entropy of DT model has been implemented separately to identify the most accurate procedure to detect intrusion with respect to the execution time. The different DT methodologies including DT using Gini Index, DT using train-split method and DT using information entropy along with their respective subdivision such as using K-Fold validation, using Stratified K-Fold validation are implemented.

Semiparametric support vector machine for accelerated failure time model

  • Hwang, Chang-Ha;Shim, Joo-Yong
    • Journal of the Korean Data and Information Science Society
    • /
    • 제21권4호
    • /
    • pp.765-775
    • /
    • 2010
  • For the accelerated failure time (AFT) model a lot of effort has been devoted to develop effective estimation methods. AFT model assumes a linear relationship between the logarithm of event time and covariates. In this paper we propose a semiparametric support vector machine to consider situations where the functional form of the effect of one or more covariates is unknown. The proposed estimating equation can be computed by a quadratic programming and a linear equation. We study the effect of several covariates on a censored response variable with an unknown probability distribution. We also provide a generalized approximate cross-validation method for choosing the hyper-parameters which affect the performance of the proposed approach. The proposed method is evaluated through simulations using the artificial example.