Search | Korea Science

Simultaneous outlier detection and variable selection via difference-based regression model and stochastic search variable selection

Park, Jong Suk;Park, Chun Gun;Lee, Kyeong Eun
- Communications for Statistical Applications and Methods, v.26 no.2, pp.149-161, 2019
In this article, we propose an approach to simultaneous variable selection and outlier detection. First, we identify candidate outliers using properties of the intercept estimator in a difference-based regression model, and incorporate this information into the multiple regression model by adding mean-shift parameters. Second, we use stochastic search variable selection to choose the best model from the model that includes the outlier candidates as predictors. Finally, we evaluate the method using simulations and real data analysis, with promising results. The method still needs further development to yield robust estimates, and we also plan to extend it to the nonparametric regression model for simultaneous outlier detection and variable selection.
https://doi.org/10.29220/CSAM.2019.26.2.149
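
As a hedged illustration of the mean-shift formulation mentioned in the abstract above, one common way to write such a model is sketched here. The notation ($\gamma_i$, $\theta_j$, $\delta_j$, $\tau_j$, $c_j$) is an assumption for illustration and is not taken from the article; the mixture prior shown is the standard stochastic search variable selection form, which may differ from the authors' exact specification.

$$
y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \gamma_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2), \qquad i = 1, \dots, n,
$$

where $\gamma_i \neq 0$ marks observation $i$ as an outlier. Treating each candidate $\gamma_i$ as the coefficient of an indicator predictor, every coefficient $\theta_j$ (a $\beta_j$ or a candidate $\gamma_i$) can be given the mixture prior

$$
\theta_j \mid \delta_j \sim (1 - \delta_j)\, N(0, \tau_j^2) + \delta_j\, N(0, c_j^2 \tau_j^2), \qquad \delta_j \in \{0, 1\},
$$

so that $\delta_j = 1$ selects the predictor or flags the outlier.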

On the Residual Empirical Distribution Function of Stochastic Regression with Correlated Errors

Zakeri, Issa-Fakhre;Lee, Sangyeol
- Communications for Statistical Applications and Methods, v.8 no.1, pp.291-297, 2001
For a stochastic regression model in which the errors are assumed to form a stationary linear process, we show that the difference between the empirical distribution functions of the errors and the estimates of those errors converges uniformly in probability to zero at the rate of $o_{p}(n^{-1/2})$ as the sample size $n$ increases.

A Study on Stochastic Estimation of Monthly Runoff by Multiple Regression Analysis (다중회귀분석에 의한 하천 월 유출량의 추계학적 추정에 관한 연구)

김태철;정하우
- Magazine of the Korean Society of Agricultural Engineers, v.22 no.3, pp.75-87, 1980
Most hydrologic phenomena are complex, organic products of multiple causes such as climatic and hydro-geological factors. A significant correlation with run-off in a river basin can therefore be expected in advance, and the effect of each of these causal and associated factors (independent variables: present-month rainfall, previous-month run-off, evapotranspiration, relative humidity, etc.) on present-month run-off (the dependent variable) can be determined by multiple regression analysis. Functions relating the independent and dependent variables should be fitted repeatedly until a satisfactory, optimal combination of independent variables is obtained. The reliability of the estimated function should be tested against statistical criteria such as analysis of variance, the coefficient of determination, and significance tests of the regression coefficients before the first-estimated multiple regression model in historical sequence is fixed. Some error between observed and estimated run-off still remains. The error arises because the model is an inadequate description of the system and because the data in the record are only a sample from a population of monthly discharge observations, so that estimates of the model parameters are subject to sampling error. Since this error, a deviation from the multiple regression plane, cannot be explained by the first-estimated multiple regression equation, it can be treated as a random error governed by the laws of chance. The variance left unexplained by the multiple regression equation can be handled by a stochastic approach: the random error is simulated by multiplying a random normal variate by the standard error of estimate. Finally, a hybrid model for estimating monthly run-off in nonhistorical sequence is obtained by combining the deterministic component of the multiple regression equation with the stochastic component of the random errors.
Monthly run-off at Naju station in the Yong-San river basin is estimated by the multiple regression model and the hybrid model. Observed and estimated run-off are compared, and the multiple regression model is compared with existing estimation methods such as the Gajiyama formula, the tank model, and the Thomas-Fiering model. The results are as follows. (1) The optimal function for estimating monthly run-off in historical sequence is the multiple linear regression equation in overall-month unit, $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10$. About 85% of the total variance of monthly run-off is explained by this equation, and its coefficient of determination ($R^2$) is 0.843. This means that monthly run-off in historical sequence can be estimated with high significance from a short observation record using the above equation. (2) The optimal function for estimating monthly run-off in nonhistorical sequence is the hybrid model, which combines the multiple linear regression equation in overall-month unit with a stochastic component, $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10 + S_y \cdot t$. The remaining 15% of unexplained variance of monthly run-off is accounted for by adding the stochastic process, and somewhat more reliable statistical characteristics of monthly run-off in nonhistorical sequence are obtained. Because of the random component, the estimated monthly run-off in nonhistorical sequence exhibits extreme values (maximum and minimum) that do not appear in the observed run-off. (3) The "frequency best fit coefficient" ($R^2_f$) of the multiple linear regression equation is 0.847, the same value as that of the Gajiyama formula. This implies that the multiple linear regression equation and the Gajiyama formula are both theoretically reasonable functions.
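
The hybrid model described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coefficients are the ones quoted in the abstract, but the function names, the value passed for the standard error of estimate `s_y`, and the input series are hypothetical, and the abstract does not state units or how negative values are handled.

```python
import random


def regression_runoff(p_n, q_prev, e_n):
    """Deterministic part: the multiple linear regression equation quoted in the abstract.

    p_n    present-month rainfall
    q_prev previous-month run-off
    e_n    present-month evapotranspiration
    """
    return 0.788 * p_n + 0.130 * q_prev - 0.273 * e_n - 0.10


def hybrid_runoff(p_n, q_prev, e_n, s_y, rng):
    """Hybrid model: regression estimate plus a stochastic component s_y * t.

    s_y is the standard error of estimate (value not given in the abstract,
    so it is a caller-supplied assumption); t is a standard normal variate.
    """
    t = rng.gauss(0.0, 1.0)
    return regression_runoff(p_n, q_prev, e_n) + s_y * t


def simulate(rainfall, evapotranspiration, q0, s_y, seed=0):
    """Generate a run-off sequence, feeding each month's estimate into the next."""
    rng = random.Random(seed)
    q_prev = q0
    series = []
    for p_n, e_n in zip(rainfall, evapotranspiration):
        q_n = hybrid_runoff(p_n, q_prev, e_n, s_y, rng)
        series.append(q_n)
        q_prev = q_n
    return series
```

Setting `s_y=0.0` reduces the simulation to the purely deterministic regression sequence, which is a convenient check. The sketch does not clip negative run-off values, which a practical implementation would probably need to address.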

INFERENCE AFTER STOCHASTIC REGRESSION IMPUTATION UNDER RESPONSE MODEL

Kim, Jae-Kwang;Kim, Yong-Dai
- Journal of the Korean Statistical Society, v.32 no.2, pp.103-119, 2003
Properties of stochastic regression imputation are discussed under the uniform within-cell response model. A variance estimator is proposed and its asymptotic properties are discussed. A limited simulation study is also presented.
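
For readers unfamiliar with the technique named in this abstract, the following is a minimal sketch of stochastic regression imputation within a single imputation cell. It is not the authors' procedure: the function names are hypothetical, a simple one-predictor least-squares fit is assumed, and the random residual is drawn by resampling observed residuals (drawing from a fitted normal distribution is another common choice).

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def stochastic_regression_impute(xs, ys, seed=0):
    """Fill missing y values (None) with the regression prediction plus a random residual.

    The regression is fitted on respondents only; each imputed value adds a
    residual drawn at random from the respondents' observed residuals.
    """
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = fit_line([x for x, _ in observed], [y for _, y in observed])
    residuals = [y - (a + b * x) for x, y in observed]
    return [
        y if y is not None else a + b * x + rng.choice(residuals)
        for x, y in zip(xs, ys)
    ]
```

Adding the random residual, rather than imputing the fitted value alone, is what keeps the variability of the imputed values closer to that of the observed ones.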

Statistical Methods to Control Response Bias in Nursing Activity Surveys (간호활동시간 조사 시 응답편이 통제를 위한 통계적 접근 방안)

Lim, Ji-Young;Park, Chang-Gi
- Journal of Korean Academy of Nursing, v.42 no.1, pp.48-55, 2012
Purpose: The aim of this study was to compare statistical methods to control response bias in nursing activity surveys. Methods: Data were collected at a medical unit of a general hospital. The number of nursing activities and consumed activity time were measured using self-report questionnaires. Descriptive statistics were used to identify general characteristics of the units. Average, Z-standardization, gamma regression, finite mixture model, and stochastic frontier model were adopted to estimate true activity time controlling for response bias. Results: The nursing activity time data were highly skewed and had non-normal distributions. Among the 4 different methods, only gamma regression and the stochastic frontier model controlled response bias effectively, and the estimated total nursing activity time did not exceed total work time. However, in gamma regression, the estimated total nursing activity time was too small to use in real clinical settings. Thus, the stochastic frontier model was the most appropriate method to control response bias when compared with the other methods. Conclusion: According to these results, we recommend the use of a stochastic frontier model to estimate true nursing activity time when using self-report surveys.
https://doi.org/10.4040/jkan.2012.42.1.48

A Dynamic-Stochastic Model for Air Pollutant Concentration (大氣汚染濃度에 관한 動的確率모델)

김해경
- Journal of Korean Society for Atmospheric Environment, v.7 no.3, pp.156-168, 1991
The purpose of this paper is to develop a stochastic model for predicting daily sulphur dioxide ($SO_2$) concentrations in an urban area (Seoul). To this end, the influence of meteorological parameters on $SO_2$ concentrations is investigated by a statistical analysis of the 24-hr averaged $SO_2$ levels in the Seoul area during 1989 $\sim$ 1990. The annual fluctuations of the regression trend, periodicity and dependence of the daily concentration are also analyzed. Based on these, a nonlinear regression transfer function model for the prediction of daily $SO_2$ concentrations is derived. A statistical procedure for using the model to predict the concentration level is also proposed.

A Stochastic Model for Air Pollutant Concentration (大氣汚染濃度에 관한 確率모델)

김해경
- Journal of Korean Society for Atmospheric Environment, v.7 no.2, pp.127-136, 1991
This paper is concerned with the development and application of a stochastic model for daily sulphur dioxide ($SO_2$) concentrations in an urban area (Seoul). To this end, the characteristics of the regression trend, periodicity and dependence of the daily $SO_2$ concentration are investigated by a statistical analysis of the daily average $SO_2$ values measured in the Seoul area during 1989 $\sim$ 1990. Based on these, a nonlinear regression time series model for the prediction of daily $SO_2$ concentrations is derived. A statistical procedure for using the model to predict the concentration level is also proposed.

Design of the optimal inputs for parameter estimation in linear dynamic systems (선형계통의 파라미터 추정을 위한 최적 입력의 설계)

양흥석;이석원;정찬수
- 제어로봇시스템학회:학술대회논문집, 1986.10a, pp.73-77, 1986
The optimal input design problem for a linear regression model with constrained output variance is considered. It is shown that the optimal input signal for the linear regression model can also be realized as an ARMA process. Monte Carlo simulation results show that the optimal stochastic input leads to comparatively better estimation accuracy than a white input signal.

Stochastic Fatigue Life Assessment based on Bayesian-inference (베이지언 추론에 기반한 확률론적 피로수명 평가)

Park, Myong-Jin;Kim, Yooil
- Journal of the Society of Naval Architects of Korea, v.56 no.2, pp.161-167, 2019
In general, fatigue analysis is performed using a deterministic model to estimate the optimal parameters. However, a deterministic model cannot clearly describe the physical phenomena of fatigue failure, which involve many sources of uncertainty. This research therefore compares the deterministic model with stochastic models. First, one deterministic S-N curve was derived by the ordinary least squares technique, and two P-S-N curves were estimated through a Bayesian linear regression model and Markov chain Monte Carlo simulation. Second, the distributions of long-term fatigue damage and fatigue life were predicted using the parameters obtained from the three methodologies and the long-term stress distribution.
https://doi.org/10.3744/SNAK.2019.56.2.161

Herd behavior and volatility in financial markets

Park, Beum-Jo
- Journal of the Korean Data and Information Science Society, v.22 no.6, pp.1199-1215, 2011
Relaxing an unrealistic assumption of a representative percolation model, this paper demonstrates that herd behavior leads to a high increase in volatility but not trading volume, in contrast with information flows that give rise to increases in both volatility and trading volume. Although detecting herd behavior has posed a great challenge due to its empirical difficulty, this paper proposes a new methodology for detecting trading days with herding. Furthermore, this paper suggests a herd-behavior-stochastic-volatility model, which accounts for herding in financial markets. Strong evidence in favor of the model specification over the standard stochastic volatility model is based on empirical application with high frequency data in the Korean equity market, strongly supporting the intuition that herd behavior causes excess volatility. In addition, this research indicates that strong persistence in volatility, which is a prevalent feature in financial markets, is likely attributed to herd behavior rather than news.

Search Result 68, Processing Time 0.026 seconds
