• Title/Summary/Keyword: 강건모형 (robust model)

Regression diagnostics for response transformations in a partial linear model (부분선형모형에서 반응변수변환을 위한 회귀진단)

  • Seo, Han Son;Yoon, Min
    • Journal of the Korean Data and Information Science Society / v.24 no.1 / pp.33-39 / 2013
  • When the response variable in a partial linear model is transformed, outliers can distort the estimate of the transformation parameter, just as in linear models. Addressing this requires both estimating the transformation parameter and detecting outliers, but both are difficult because of the arbitrariness of the nonparametric function in the partial linear model. This study suggests procedures for transforming the response variable that are robust to outliers in partial linear models, combining estimation of the nonparametric function with outlier detection methods such as a sequential test and maximum trimmed likelihood estimation. The effectiveness of the proposed methods is verified and compared through a simulation study and examples.

Fast robust variable selection using VIF regression in large datasets (대형 데이터에서 VIF회귀를 이용한 신속 강건 변수선택법)

  • Seo, Han Son
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.463-473 / 2018
  • Variable selection algorithms for linear regression models on large datasets are considered. Many algorithms have been proposed that focus on speed and robustness. Among them, variance inflation factor (VIF) regression is fast and accurate because it uses a streamwise regression approach. However, VIF regression is susceptible to outliers because it estimates the model by least squares. To make the algorithm robust, a criterion based on a weighted estimator has been proposed, as has a robust VIF regression. This article suggests a fast and robust variable selection method that applies VIF regression after detecting and removing potential outliers. A simulation study and an analysis of a dataset are conducted to compare the suggested method with other methods.

Robust estimation of sparse vector autoregressive models (희박 벡터 자기 회귀 모형의 로버스트 추정)

  • Kim, Dongyeong;Baek, Changryong
    • The Korean Journal of Applied Statistics / v.35 no.5 / pp.631-644 / 2022
  • This paper considers robust estimation of the sparse vector autoregressive model (sVAR), which is useful in high-dimensional time series analysis. First, we generalize the result of Xu et al. (2008), showing that the adaptive lasso is robust in sVAR as well. However, the adaptive lasso in sVAR performs poorly as the number and size of outliers increase. We therefore propose new robust estimation methods for sVAR based on least absolute deviation (LAD) and Huber estimation. Simulation results show that the proposed methods provide more accurate estimation, which in turn yields better forecasting performance when outliers exist. In addition, we applied the proposed methods to power usage data and confirmed that the data contain non-negligible outliers and that robust estimation taking such outliers into account improves forecasting.

An Outlier Detection Method in Penalized Spline Regression Models (벌점 스플라인 회귀모형에서의 이상치 탐지방법)

  • Seo, Han Son;Song, Ji Eun;Yoon, Min
    • The Korean Journal of Applied Statistics / v.26 no.4 / pp.687-696 / 2013
  • The detection and examination of outliers are important parts of data analysis because outliers in the data may have a detrimental effect on statistical analysis. Outlier detection methods have been discussed by many authors. In this article, we propose applying Hadi and Simonoff's (1993) method to a penalized spline regression model to detect multiple outliers. Simulated and real data sets are used to illustrate the proposed procedure and to compare it with penalized spline regression and robust penalized spline regression.

Robust Extrapolation Design Criteria under the Uncertainty of Model and Error Structure (모형과 오차구조의 불확실성하에서의 강건 외삽 실험설계)

  • Jang, Dae-Heung;Kim, Youngil
    • The Korean Journal of Applied Statistics / v.28 no.3 / pp.561-571 / 2015
  • When we consider an optimal design for predicting the response at a point outside the design region, the design criterion used to select the support points must be chosen with great care. For the selected design criterion to be valid, the assumed model and its accompanying error structure must be assumed to hold beyond the design region. We therefore modify existing design criteria, such as extrapolation-optimality, to suit these situations, and propose several maximin approaches. Simple and quadratic regression models are tested to find the basic characteristics of these maximin approaches. The main findings are discussed in the conclusion.

A Discrete Feature Vector for Endpoint Detection of Speech with Hidden Markov Model (숨은마코프모형을 이용하는 음성 끝점 검출을 위한 이산 특징벡터)

  • Lee, Jei-Ky;Oh, Chang-Hyuck
    • The Korean Journal of Applied Statistics / v.21 no.6 / pp.959-967 / 2008
  • This paper suggests a discrete feature vector for detecting speech segments that is robust across various noise levels and computationally inexpensive, and demonstrates these properties with real speech data. The suggested feature is a one-dimensional vector that represents the slope of short-term energies and is discretized into three values to reduce the computational burden in the HMM. In experiments with speech data, the method using the suggested feature vector performed well even in noisy environments.

Domain-agnostic Pre-trained Language Model for Tabular Data (도메인 변화에 강건한 사전학습 표 언어모형)

  • Cho, Sanghyun;Choi, Jae-Hoon;Kwon, Hyuk-Chul
    • Annual Conference on Human and Language Technology / 2021.10a / pp.346-349 / 2021
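  • In table machine reading comprehension, the knowledge a language model needs and the structural form of tables vary by domain, which causes a larger performance drop than with text data. To build a pre-trained table language model that is robust to such domain shifts, this paper proposes a method for constructing pre-training data by selecting meaningful table data, together with an adversarial training method. To detect tables in the extracted data that carry no structural information and serve only to decorate web documents, we defined heuristic rules to identify HEAD data and select table data. We also applied adversarial training between general table data, which carries structural information, and infobox data, which carries knowledge about entities. Compared with training on the unrefined data, training on the refined data increased F1 by 3.45 and EM by 4.14 on the KorQuAD table data, and increased F1 by 19.38 and EM by 4.22 on the Spec table question-answering data.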

The Maximin Robust Design for the Uncertainty of Parameters of Michaelis-Menten Model (Michaelis-Menten 모형의 모수의 불확실성에 대한 Maximin 타입의 강건 실험)

  • Kim, Youngil;Jang, Dae-Heung;Yi, Seongbaek
    • The Korean Journal of Applied Statistics / v.27 no.7 / pp.1269-1278 / 2014
  • Although the D-optimality criterion has become very popular for designing experiments with nonlinear models because of the theoretical foundations it provides, a critical drawback is that the criterion depends on the unknown parameters of the nonlinear model. Some nonlinear models, however, turn out to be partially nonlinear in the sense that the optimal design depends on only a subset of the parameters. It has been widely believed that the maximin approach to finding a design robust to parameter uncertainty is not guaranteed to succeed in nonlinear models. The maximin approach can nevertheless succeed for a partially nonlinear model, because the optimal design often depends on only one unknown parameter value, which is easier to handle than the full set of parameters. We examine the maximin approach for the Michaelis-Menten model with respect to D- and $D_s$-optimality.

Robust Designs of the Second Order Response Surface Model in a Mixture (2차 혼합물 반응표면 모형에서의 강건한 실험 설계)

  • Lim, Yong-Bin
    • The Korean Journal of Applied Statistics / v.20 no.2 / pp.267-280 / 2007
  • Various single-valued design optimality criteria such as D-, G-, and V-optimality are often used to construct optimal experimental designs for mixture experiments in a constrained region R, where lower and upper bound constraints are imposed on the ingredient proportions. Even though such designs are optimal in the strict sense of the particular criterion used, their prediction capability over the constrained region is known to be unsatisfactory (Vining et al., 1993; Khuri et al., 1999). We assume a quadratic polynomial model as the mixture response surface model and are interested in finding efficient designs in the constrained design space for a mixture. In this paper, we build an expanded list of candidate design points by adding interior points to the extreme vertices, edge midpoints, constrained face centroids and the overall centroid. We then propose a design that is robust with respect to D-optimality, G-optimality, V-optimality and distance-based U-optimality. We compare the scaled prediction variance quantile plots (SPVQP) of the robust designs with those of the designs recommended in Khuri et al. (1999) and Vining et al. (1993), using the well-known examples of a four-component fertilizer experiment and McLean and Anderson's Railroad Flare Experiment. The robust designs turned out to be superior to the recommended designs.

Doubly-robust Q-estimation in observational studies with high-dimensional covariates (고차원 관측자료에서의 Q-학습 모형에 대한 이중강건성 연구)

  • Lee, Hyobeen;Kim, Yeji;Cho, Hyungjun;Choi, Sangbum
    • The Korean Journal of Applied Statistics / v.34 no.3 / pp.309-327 / 2021
  • Dynamic treatment regimes (DTRs) are decision-making rules designed to provide personalized treatment to individuals in multi-stage randomized trials. Unlike classical methods, in which all individuals are prescribed the same type of treatment, DTRs prescribe patient-tailored treatments that take into account individual characteristics that may change over time. The Q-learning method, one of the regression-based algorithms for finding optimal treatment rules, has become increasingly popular because it is easy to implement. However, the performance of the Q-learning algorithm relies heavily on correct specification of the Q-function for the response, especially in observational studies. In this article, we examine a number of doubly robust weighted least-squares estimating methods for Q-learning in high-dimensional settings, where treatment models for the propensity score and penalization for sparse estimation are also investigated. We further consider flexible ensemble machine learning methods for the treatment model to achieve double robustness, so that the optimal decision rule can be correctly estimated as long as at least one of the outcome model and the treatment model is correctly specified. Extensive simulation studies show that the proposed methods work well with practical sample sizes. The practical utility of the proposed methods is demonstrated with a real data example.