• Title/Summary/Keyword: Regression estimators

On inference of multivariate means under ranked set sampling

  • Rochani, Haresh;Linder, Daniel F.;Samawi, Hani;Panchal, Viral
    • Communications for Statistical Applications and Methods / v.25 no.1 / pp.1-13 / 2018
  • In many studies, a researcher attempts to describe a population in which units are measured for multiple outcomes, or responses. In this paper, we present an efficient procedure based on ranked set sampling to estimate and perform hypothesis testing on a multivariate mean. The method ranks units on an auxiliary covariate, which is assumed to be correlated with the multivariate response, in order to improve the efficiency of the estimation. We show that the proposed estimators developed under this sampling scheme are unbiased, have smaller variance in the multivariate sense, and are asymptotically Gaussian. We also demonstrate that the efficiency of the multivariate regression estimator can be improved by using ranked set sampling. A bootstrap routine is developed in the statistical software R to perform inference when the sample size is small. We use a simulation study to investigate the performance of the method under known conditions and apply the method to the biomarker data collected in the China Health and Nutrition Survey (CHNS 2009).
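
    The sampling scheme summarized in this abstract can be illustrated with a minimal Python sketch of balanced ranked set sampling, in which units are ranked on the auxiliary covariate and the response is measured only for the selected unit. This is not the authors' R routine: the function names (`ranked_set_sample`, `mean_vector`), the toy population, and the set size are all assumptions made for illustration.

```python
import random


def ranked_set_sample(population, set_size, cycles, rng):
    """Draw a balanced ranked set sample of responses.

    population: list of (x, y) pairs, where x is the auxiliary covariate
    used only for ranking and y is the response stored as a tuple.
    """
    sample = []
    for _ in range(cycles):
        for rank in range(set_size):
            # Draw a fresh set of units and rank it on the covariate alone.
            units = rng.sample(population, set_size)
            units.sort(key=lambda unit: unit[0])
            # Measure the response only for the unit holding this rank.
            sample.append(units[rank][1])
    return sample


def mean_vector(responses):
    """Componentwise mean of a list of equal-length response tuples."""
    n = len(responses)
    return tuple(sum(column) / n for column in zip(*responses))


# Hypothetical toy population: two responses correlated with the covariate x.
rng = random.Random(0)
population = []
for _ in range(5000):
    x = rng.gauss(0.0, 1.0)
    y = (2.0 * x + rng.gauss(0.0, 1.0), -x + rng.gauss(0.0, 1.0))
    population.append((x, y))

rss = ranked_set_sample(population, set_size=4, cycles=25, rng=rng)
print(len(rss), mean_vector(rss))
```

    Each cycle uses one measured unit per rank, so the sample size is the set size times the number of cycles. Only the cheap covariate has to be observed for the units that are ranked but not measured.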

Effect of outliers on the variable selection by the regularized regression

  • Jeong, Junho;Kim, Choongrak
    • Communications for Statistical Applications and Methods / v.25 no.2 / pp.235-243 / 2018
  • Many studies exist on the influence of one or a few observations on estimators in a variety of statistical models under the "large n, small p" setup; however, diagnostic issues in regression models have rarely been studied in a high-dimensional setup. In high-dimensional data, the influence of observations is more serious because the sample size n is significantly smaller than the number of variables p. Here, we investigate the influence of observations on the least absolute shrinkage and selection operator (LASSO) estimates, suggested by Tibshirani (Journal of the Royal Statistical Society, Series B, 73, 273-282, 1996), and the influence of observations on the variables selected by the LASSO in the high-dimensional setup. We also derive an analytic expression for the influence of the k-th observation on the LASSO estimates in simple linear regression. Numerical studies based on artificial and real data are presented for illustration. The numerical results show that the influence of observations on the LASSO estimates and on the variables selected by the LASSO is more severe in the high-dimensional setup than in the usual "large n, small p" setup.
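
    For the simple linear regression case mentioned in this abstract, the influence of a single observation can be illustrated numerically by refitting without it. The sketch below is not the analytic expression derived in the paper. It assumes a no-intercept model with objective (1/(2n)) Σ (y_i − b·x_i)² + λ|b|, whose minimizer is a soft-thresholded least-squares slope. The function names and the toy data are hypothetical.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_slope(x, y, lam):
    """LASSO slope for a no-intercept simple regression (assumes sum x_i^2 > 0).

    Minimizes (1/(2n)) * sum((y_i - b * x_i)^2) + lam * |b|.
    """
    n = len(x)
    sxy = sum(xi * yi for xi, yi in zip(x, y)) / n
    sxx = sum(xi * xi for xi in x) / n
    return soft_threshold(sxy, lam) / sxx


def deletion_influence(x, y, lam):
    """Absolute change in the LASSO slope when each observation is deleted."""
    full = lasso_slope(x, y, lam)
    changes = []
    for k in range(len(x)):
        x_k = x[:k] + x[k + 1:]
        y_k = y[:k] + y[k + 1:]
        changes.append(abs(full - lasso_slope(x_k, y_k, lam)))
    return changes


# Hypothetical data: the last observation is an outlier in the response.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 3.0, -10.0]
influence = deletion_influence(x, y, lam=0.1)
print(influence.index(max(influence)))
```

    In this toy data set, deleting the outlying last observation changes the slope the most, which is the case-deletion notion of influence that the abstract refers to.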

Exploring modern machine learning methods to improve causal-effect estimation

  • Kim, Yeji;Choi, Taehwa;Choi, Sangbum
    • Communications for Statistical Applications and Methods / v.29 no.2 / pp.177-191 / 2022
  • This paper addresses the use of machine learning methods for causal estimation of treatment effects from observational data. Although randomized experimental trials are the gold standard for revealing potential causal relationships, observational studies are another rich source for investigating exposure effects, for example in research on the comparative effectiveness and safety of treatments, where the causal effect can be identified if the covariates contain all confounding variables. In this context, statistical regression models for the expected outcome and the probability of treatment are often imposed, and these can be combined to yield more efficient and robust causal estimators. Recently, targeted maximum likelihood estimation and causal random forests have been proposed and extensively studied for the use of data-adaptive regression in the estimation of causal inference parameters. Machine learning methods are a natural choice in these settings to improve the quality of the final estimate of the treatment effect. We explore how the design and training of several machine learning algorithms can be adapted for causal inference and study their finite-sample performance through simulation experiments under various scenarios. An application to percutaneous coronary intervention (PCI) data shows that these adaptations can improve on simple linear regression-based methods.
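
    One standard way to combine an outcome model with a treatment-probability model is the augmented inverse probability weighting (AIPW) estimator of the average treatment effect. The sketch below shows that estimator only. It is not the targeted maximum likelihood or causal forest procedure studied in the paper. It assumes the fitted values `mu1`, `mu0` (predicted outcomes under treatment and control) and `e` (predicted treatment probabilities) have already been produced by some regression or machine learning model; all names and numbers are hypothetical.

```python
def aipw_ate(y, a, mu1, mu0, e):
    """Augmented inverse probability weighting estimate of the ATE.

    y: observed outcomes; a: treatment indicators (1 treated, 0 control);
    mu1, mu0: fitted outcomes under treatment and control;
    e: fitted treatment probabilities, assumed strictly between 0 and 1.
    """
    total = 0.0
    for yi, ai, m1, m0, ei in zip(y, a, mu1, mu0, e):
        # Outcome-model contrast plus inverse-probability-weighted residuals.
        total += (m1 - m0
                  + ai * (yi - m1) / ei
                  - (1 - ai) * (yi - m0) / (1 - ei))
    return total / len(y)


# Hypothetical fitted values for four units.
y = [3.0, 1.0, 4.0, 2.0]
a = [1, 0, 1, 0]
mu1 = [3.0, 3.0, 4.0, 4.0]
mu0 = [1.0, 1.0, 2.0, 2.0]
e = [0.5, 0.5, 0.5, 0.5]
print(aipw_ate(y, a, mu1, mu0, e))
```

    When the outcome model fits the observed outcomes exactly, the residual terms vanish and the estimate reduces to the average of `mu1 - mu0`. When the outcome model is set to zero, it reduces to the plain inverse probability weighting estimate.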

A comparison study of inverse censoring probability weighting in censored regression (중도절단 회귀모형에서 역절단확률가중 방법 간의 비교연구)

  • Shin, Jungmin;Kim, Hyungwoo;Shin, Seung Jun
    • The Korean Journal of Applied Statistics / v.34 no.6 / pp.957-968 / 2021
  • Inverse censoring probability weighting (ICPW) is a popular technique in survival data analysis. In applications of the ICPW technique such as censored regression, it is crucial to estimate the censoring probability accurately. In this article, a simulation study is undertaken to see how the censoring probability estimate influences model performance in censored regression using the ICPW scheme. We compare three censoring probability estimators: the Kaplan-Meier (KM) estimator, the Cox proportional hazards model estimator, and the local KM estimator. For the local KM estimator, we propose reducing the predictor dimension to avoid the curse of dimensionality and consider two popular dimension reduction tools: principal component analysis and sliced inverse regression. We find that the Cox proportional hazards model estimator shows the best performance as a censoring probability estimator in both mean and median censored regressions.
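
    The weighting idea in this abstract can be sketched with the simplest of the three estimators, the Kaplan-Meier estimate of the censoring distribution. In the minimal Python sketch below, each uncensored observation receives weight 1/G(T−), where G is the estimated probability of remaining uncensored, and censored observations receive weight zero. The function names, the naive handling of ties between events and censorings, and the toy data are assumptions for illustration and are not taken from the paper.

```python
def censoring_survival_left_limit(times, events):
    """Kaplan-Meier estimate of G(t-) = P(C >= t) at each observed time.

    events[i] is 1 if times[i] is an observed event and 0 if it is censored.
    Censoring plays the role of the 'event' for this Kaplan-Meier curve.
    """
    distinct = sorted(set(times))
    g_minus = []
    for t in times:
        surv = 1.0
        for s in distinct:
            if s >= t:
                break  # left limit: only times strictly before t contribute
            at_risk = sum(1 for u in times if u >= s)
            censored = sum(1 for u, d in zip(times, events) if u == s and d == 0)
            surv *= 1.0 - censored / at_risk
        g_minus.append(surv)
    return g_minus


def icpw_weights(times, events):
    """Weight 1 / G(T-) for uncensored observations, 0 for censored ones."""
    g_minus = censoring_survival_left_limit(times, events)
    return [1.0 / g if d == 1 else 0.0 for d, g in zip(events, g_minus)]


# Hypothetical data: the second observation is censored at time 2.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 0, 1, 1]
weights = icpw_weights(times, events)
# ICPW estimate of the mean survival time.
weighted_mean = sum(w * t for w, t in zip(weights, times)) / len(times)
print(weights, weighted_mean)
```

    The uncensored observations after time 2 are up-weighted to stand in for the censored one. The same weights could multiply the loss terms of a mean or median regression, which is the censored regression use the abstract describes.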

Stability of Construction Cost-variability Factor Rankings from Professionals' Perspective: Evidence from Dar es Salaam -Tanzania

  • Shabani, Neema;Mselle, Justine;Sanga, Samwel Alananga;Kanuti, Arbogasti Isidori
    • Journal of Construction Engineering and Project Management / v.8 no.2 / pp.17-33 / 2018
  • This study investigates the stability of professionals' cost-variability factor rankings across different levels of cost variability and response scenarios. Descriptive statistics are used to examine the stability of the rankings of 20 cost-variability factors, and a multinomial logistic (MNL) regression model is used to examine the stability of these factors across three cost-variability levels. The descriptive statistics indicate that professionals' factor rankings are stable only for external factors. The MNL regression results suggest that 8 of the 20 evaluated factors are unstable determinants of lower cost-variability levels. These factors are "risk associated with the project", "personal bias and poor professionalism of the estimators", "limited time available to complete the project", "lack of skills and experience by estimator", "geographical location of projects", "incomplete & rush designs for estimate", "unforeseen or unexpected site constraints", and "high class bidders for the contractors". Similarly, lack of experience and large project size were also observed to be unstable. These observations suggest that professionals' views on pre-tender cost-variability factors yield unstable factor rankings and hence should not be relied upon as the only mechanism for mitigating cost-related risks in construction projects.

Robust tests for heteroscedasticity using outlier detection methods (이상치 탐지법을 이용한 강건 이분산 검정)

  • Seo, Han Son;Yoon, Min
    • The Korean Journal of Applied Statistics / v.29 no.3 / pp.399-408 / 2016
  • Heteroscedasticity needs to be detected in a regression analysis because it invalidates the standard inference procedure. Diagnostics for heteroscedasticity may be distorted when both outliers and heteroscedasticity exist. Available methods for detecting heteroscedasticity in the presence of outliers usually use robust estimators or separate the outliers from the data. Several approaches have been suggested to identify outliers in the heteroscedasticity problem. In this article, conventional tests for heteroscedasticity are modified by using a sequential outlier detection method to separate outliers from contaminated data. The performance of the proposed method is compared with that of the original tests through a Monte Carlo study and examples.

Comparison study on kernel type estimators of discontinuous log-variance (불연속 로그분산함수의 커널추정량들의 비교 연구)

  • Huh, Jib
    • Journal of the Korean Data and Information Science Society / v.25 no.1 / pp.87-95 / 2014
  • In the regression model, Kang and Huh (2006) studied the estimation of the discontinuous variance function using the Nadaraya-Watson estimator with the squared residuals. The local linear estimator of the log-variance function, which may take any real value, was proposed by Huh (2013) based on the kernel-weighted local likelihood of the $\chi^2$-distribution. Chen et al. (2009) estimated the continuous variance function using a local linear fit with the log-squared residuals. In this paper, the estimator of the discontinuous log-variance function itself or its derivative is considered using the estimator of Chen et al. (2009). Numerical work investigates the performance of the estimators with simulated examples.
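
    The first estimator mentioned in this abstract, a Nadaraya-Watson smoother applied to squared residuals, can be sketched as follows. This minimal Python example assumes a Gaussian kernel, a fixed bandwidth, and residuals that have already been computed from some mean fit. It does not implement the local-likelihood or log-squared-residual estimators, nor any treatment of the discontinuity. All names and values are hypothetical.

```python
import math


def nadaraya_watson(x0, xs, values, bandwidth):
    """Kernel-weighted average of values at the point x0 (Gaussian kernel)."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def variance_estimate(x0, xs, residuals, bandwidth):
    """Estimate the variance function at x0 by smoothing squared residuals."""
    squared = [r * r for r in residuals]
    return nadaraya_watson(x0, xs, squared, bandwidth)


# Hypothetical design points and residuals from a previously fitted mean.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
residuals = [0.1, -0.2, 0.15, -0.6, 0.7]
print(variance_estimate(0.2, xs, residuals, bandwidth=0.2))
print(variance_estimate(0.9, xs, residuals, bandwidth=0.2))
```

    Because the estimate is a weighted average of squared residuals, it is never negative. Taking its logarithm gives a plug-in log-variance value, which differs from the direct log-variance estimators compared in the paper.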

Effects of Exchange Rate, GDP, ODI on Export to the East Asia: Application of the Panel FMOLS Approach (환율, GDP, 해외직접투자가 한국의 대동아시아 수출에 미치는 영향: 패널 FMOLS기법의 적용)

  • Kim, Chang-Beom
    • International Commerce and Information Review / v.14 no.3 / pp.307-322 / 2012
  • The purpose of this paper is to examine the determinants of exports to the East Asia region using panel unit root tests, a panel cointegration framework, a panel VECM (vector error correction model), and panel FMOLS (fully modified OLS). Different panel unit root tests confirm that the data series are integrated processes with unit roots. When applying cointegration tests to long-run effects in aggregate panel data, a primary concern is to construct the estimators in a way that does not constrain the transitional dynamics to be similar among the countries of the panel. The regression equations are estimated by various panel cointegration estimators. The panel causality results reveal that exchange rates have unidirectional effects on exports and GDP, and that there is bidirectional causality between exports and GDP. Also, the panel FMOLS tests overwhelmingly reject the null hypothesis of a zero coefficient. The panel cointegrating vectors show that exports have a positive relationship with GDP and ODI (overseas direct investment).

Comparison of Spatial Small Area Estimators Based on Neighborhood Information Systems (이웃정보시스템을 이용한 공간 소지역 추정량 비교)

  • Kim, Jeong-Suk;Hwang, Hee-Jin;Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.21 no.5 / pp.855-866 / 2008
  • Recently, many small area estimation methods based on lattice data analysis have been studied and are known to perform well. When lattice data, which are mainly used for small area estimation, are analyzed, the choice of a better neighborhood information system is very important for the efficiency of the analysis. Lee and Shin (2008) recently compared and analyzed several neighborhood information systems based on GIS methods. In this paper, we evaluate the effect of the various neighborhood information systems suggested by Lee and Shin (2008). MSE, coverage, calibration, and regression methods are used to compare the estimators. The number of unemployed persons in the Economic Active Population Survey (2001) is used for the comparison.

Comparison of GEE Estimators Using Imputation Methods (대체방법별 GEE추정량 비교)

  • 김동욱;노영화
    • The Korean Journal of Applied Statistics / v.16 no.2 / pp.407-426 / 2003
  • We consider the missing covariates problem in the generalized estimating equations (GEE) model. If a covariate is partially missing, the GEE estimator cannot be calculated. In this paper, we study the performance of seven imputation methods for handling missing covariates in GEE models, and we investigate the properties of the GEE estimators after the missing covariates are imputed for ordinal repeated-measurement data. The seven imputation methods are i) naive deletion, ii) sample average imputation, iii) row average imputation, iv) cross-wave regression imputation, v) carry-over imputation, vi) Bayesian bootstrap, and vii) approximate Bayesian bootstrap. A Monte Carlo simulation is used to compare the performance of these methods. For the mechanism generating the missing data, we assume ignorable nonresponse. Furthermore, we generate missing covariates with and without considering wave nonresponse patterns.