• Title/Abstract/Keyword: Sufficient dimension reduction

Search results: 38 items (processing time: 0.028 seconds)

Integrated Partial Sufficient Dimension Reduction with Heavily Unbalanced Categorical Predictors

  • Yoo, Jae-Keun
    • The Korean Journal of Applied Statistics
    • /
    • Vol. 23, No. 5
    • /
    • pp.977-985
    • /
    • 2010
  • In this paper, we propose an approach to conduct partial sufficient dimension reduction with heavily unbalanced categorical predictors. For this, we consider integrated categorical predictors and investigate conditions under which the integrated categorical predictor is fully informative for partial sufficient dimension reduction. For illustration, the proposed approach is implemented with optimal partial sliced inverse regression in simulation studies and a data analysis.

Tutorial: Dimension reduction in regression with a notion of sufficiency

  • Yoo, Jae Keun
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 23, No. 2
    • /
    • pp.93-103
    • /
    • 2016
  • In the paper, we discuss dimension reduction of predictors ${\mathbf{X}}{\in}{{\mathbb{R}}^p}$ in a regression of $Y{\mid}{\mathbf{X}}$ with a notion of sufficiency, called sufficient dimension reduction. In sufficient dimension reduction, the original predictors ${\mathbf{X}}$ are replaced by a lower-dimensional linear projection without loss of information on selected aspects of the conditional distribution. Depending on the aspect, the central subspace, the central mean subspace and the central $k^{th}$-moment subspace are defined and investigated as primary interests. The relationships among the three subspaces, and their behavior under non-singular transformations of ${\mathbf{X}}$, are then studied. We discuss two conditions that guarantee the existence of the three subspaces; these constrain the marginal distribution of ${\mathbf{X}}$ and the conditional distribution of $Y{\mid}{\mathbf{X}}$. A general approach to estimating them is also introduced, along with an explanation of the conditions commonly assumed in most sufficient dimension reduction methodologies.

Tutorial: Methodologies for sufficient dimension reduction in regression

  • Yoo, Jae Keun
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 23, No. 2
    • /
    • pp.105-117
    • /
    • 2016
  • In the paper, as a sequel to the first tutorial, we discuss sufficient dimension reduction methodologies used to estimate the central subspace (sliced inverse regression, sliced average variance estimation), the central mean subspace (ordinary least squares, principal Hessian directions, iterative Hessian transformation), and the central $k^{th}$-moment subspace (covariance method). Large-sample tests to determine the structural dimensions of the three target subspaces are derived for most of the methodologies; in addition, a permutation test, which does not require large-sample distributions, is introduced and can be applied to all of the methodologies discussed in the paper. Theoretical relationships among the sufficient dimension reduction methodologies are also investigated, and a real data analysis is presented for illustration purposes. A seeded dimension reduction approach is then introduced to apply the methodologies to large $p$, small $n$ regressions.
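
The slicing idea behind sliced inverse regression, the first methodology named above, can be sketched in a few lines. The single-index model, sample size, and slice count below are illustrative assumptions, not taken from the tutorial: the response depends on the predictors only through one index, and SIR recovers that direction from the eigen-decomposition of the covariance of slice means of the standardized predictors.

```python
import numpy as np

def sir_directions(X, Y, n_slices=5, d=1):
    """Minimal sketch of sliced inverse regression (Li, 1991)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Standardize the predictors: Z = Sigma^{-1/2} (X - mean).
    Sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ inv_sqrt

    # Slice the response into roughly equal-sized groups and
    # accumulate the weighted covariance of the slice means.
    order = np.argsort(Y)
    M = np.zeros((p, p))
    for h in np.array_split(order, n_slices):
        m = Z[h].mean(axis=0)
        M += (len(h) / n) * np.outer(m, m)

    # Leading eigenvectors of M span the estimate in the Z scale;
    # map them back to the original X scale.
    w, v = np.linalg.eigh(M)
    return inv_sqrt @ v[:, ::-1][:, :d]

# Toy single-index model: Y depends on X only through b'X.
rng = np.random.default_rng(1)
n, p = 1000, 6
b = np.array([1.0, -1.0, 0, 0, 0, 0]) / np.sqrt(2)
X = rng.standard_normal((n, p))
Y = (X @ b) + 0.2 * rng.standard_normal(n)

bhat = sir_directions(X, Y).ravel()
bhat /= np.linalg.norm(bhat)
print(abs(bhat @ b))  # close to 1: the estimated direction aligns with b
```

The estimated direction is identified only up to sign and scale, which is why alignment is measured by the absolute cosine between the estimate and the true basis vector.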

On hierarchical clustering in sufficient dimension reduction

  • Yoo, Chaeyeon;Yoo, Younju;Um, Hye Yeon;Yoo, Jae Keun
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 27, No. 4
    • /
    • pp.431-443
    • /
    • 2020
  • The K-means clustering algorithm has had successful application in sufficient dimension reduction. Unfortunately, the algorithm lacks reproducibility and nestedness, which will be discussed in this paper. These are clear deficits of the K-means clustering algorithm; the hierarchical clustering algorithm, by contrast, has both reproducibility and nestedness, but an intensive comparison between the K-means and hierarchical clustering algorithms has not yet been done in a sufficient dimension reduction context. In this paper, we rigorously study the two clustering algorithms for two popular sufficient dimension reduction methodologies, the inverse mean and clustering mean methods, through intensive numerical studies. Simulation studies and two real data examples confirm that the hierarchical clustering algorithm has a potential advantage over the K-means algorithm.
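
The nestedness property contrasted above can be checked directly: cutting one hierarchical dendrogram at $k$ and $k+1$ clusters always yields nested partitions, while separately run K-means solutions need not nest. A minimal sketch on illustrative data (not from the paper), using SciPy's agglomerative clustering:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Three well-separated Gaussian groups in the plane (illustrative data).
X = np.vstack([rng.normal(c, 0.3, size=(30, 2))
               for c in ([0, 0], [4, 0], [2, 4])])

# Build one dendrogram and cut it at two and at three clusters.
Z = linkage(X, method="ward")
labels2 = fcluster(Z, t=2, criterion="maxclust")
labels3 = fcluster(Z, t=3, criterion="maxclust")

# Nestedness: every 3-cluster group lies entirely inside one 2-cluster group.
nested = all(len(set(labels2[labels3 == g])) == 1 for g in set(labels3))
print(nested)  # True by construction of the dendrogram
```

Running K-means twice with k = 2 and k = 3 offers no such guarantee, since the two solutions are fit independently; this is the nestedness deficit the paper discusses.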

Fused inverse regression with multi-dimensional responses

  • Cho, Youyoung;Han, Hyoseon;Yoo, Jae Keun
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 28, No. 3
    • /
    • pp.267-279
    • /
    • 2021
  • A regression with multi-dimensional responses is quite common nowadays in the so-called big data era. In such regression, to relieve the curse of dimensionality due to the high dimension of the responses, dimension reduction of the predictors is essential in analysis. Sufficient dimension reduction provides effective tools for the reduction, but there are few sufficient dimension reduction methodologies for multivariate regression. To fill this gap, we propose two new fused slice-based inverse regression methods. The proposed approaches are robust to the number of clusters or slices and improve the estimation results over existing methods by fusing many kernel matrices. Numerical studies comparing the proposed methods with existing ones are presented, and real data analysis confirms the practical usefulness of the proposed methods.

Iterative projection of sliced inverse regression with fused approach

  • Han, Hyoseon;Cho, Youyoung;Yoo, Jae Keun
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 28, No. 2
    • /
    • pp.205-215
    • /
    • 2021
  • Sufficient dimension reduction is a useful dimension reduction tool in regression, and sliced inverse regression (Li, 1991) is one of the most popular sufficient dimension reduction methodologies. In spite of its popularity, it is known to be sensitive to the number of slices. To overcome this shortcoming, the so-called fused sliced inverse regression was proposed by Cook and Zhang (2014). Unfortunately, neither existing method can be applied directly to large $p$, small $n$ regression, in which dimension reduction is desperately needed. In this paper, we newly propose seeded sliced inverse regression and seeded fused sliced inverse regression to overcome this deficit by adopting the iterative projection approach (Cook et al., 2007). Numerical studies are presented to study their asymptotic estimation behaviors, and real data analysis confirms their practical usefulness in high-dimensional data analysis.

Method-Free Permutation Predictor Hypothesis Tests in Sufficient Dimension Reduction

  • Lee, Kyungjin;Oh, Suji;Yoo, Jae Keun
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 20, No. 4
    • /
    • pp.291-300
    • /
    • 2013
  • In this paper, we propose method-free permutation predictor hypothesis tests in the context of sufficient dimension reduction. Unlike an existing method-free bootstrap approach, predictor hypotheses are evaluated based on p-values, which typical statistical practitioners may prefer. Numerical studies validate the developed theories, and a real data application is provided.
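
The permutation idea behind such predictor tests can be sketched generically. The statistic, model, and settings below are illustrative stand-ins, not the authors' method-free construction: to test that a predictor is irrelevant, its column is permuted repeatedly, and the observed statistic is compared with the permuted replicates to produce a p-value.

```python
import numpy as np

def permutation_pvalue(stat_fn, X, Y, j, n_perm=500, seed=0):
    """Permutation p-value for the hypothesis that predictor j is
    irrelevant: permute column j, recompute the statistic, and count
    how often the permuted statistic reaches the observed one."""
    rng = np.random.default_rng(seed)
    observed = stat_fn(X, Y)
    count = 0
    for _ in range(n_perm):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        if stat_fn(Xp, Y) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Illustration with a simple stand-in statistic: the absolute marginal
# correlation of predictor j with Y.
rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3))
Y = 2 * X[:, 0] + rng.standard_normal(200)

stat0 = lambda X, Y: abs(np.corrcoef(X[:, 0], Y)[0, 1])
stat2 = lambda X, Y: abs(np.corrcoef(X[:, 2], Y)[0, 1])
p_active = permutation_pvalue(stat0, X, Y, j=0)    # small: X1 matters
p_inactive = permutation_pvalue(stat2, X, Y, j=2)  # large: X3 is noise
print(p_active, p_inactive)
```

The add-one correction in the returned ratio keeps the p-value strictly positive, a standard convention for Monte Carlo permutation tests.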

Intensive numerical studies of optimal sufficient dimension reduction with singularity

  • Yoo, Jae Keun;Gwak, Da-Hae;Kim, Min-Sun
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 24, No. 3
    • /
    • pp.303-315
    • /
    • 2017
  • Yoo (2015, Statistics and Probability Letters, 99, 109-113) derives theoretical results in optimal sufficient dimension reduction with a singular inner-product matrix. The results are promising, but Yoo (2015) presents only one simulation study, so an evaluation of its practical usefulness based on numerical studies is necessary. This paper studies the asymptotic behaviors of Yoo (2015) through various simulation models and presents a real data example that focuses on ordinary least squares. Intensive numerical studies show that the $\chi^2$ test by Yoo (2015) outperforms the existing optimal sufficient dimension reduction method. The basis estimation by the former can be theoretically sub-optimal; however, there are no notable differences from that by the latter. This investigation confirms the practical usefulness of Yoo (2015).

A Note on Bootstrapping in Sufficient Dimension Reduction

  • Yoo, Jae Keun;Jeong, Sun
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 22, No. 3
    • /
    • pp.285-294
    • /
    • 2015
  • A permutation test is a popular and attractive alternative for deriving asymptotic distributions of dimension test statistics in sufficient dimension reduction methodologies; however, recent studies show that a bootstrapping technique can also be used. We consider two types of bootstrapping dimension determination, namely partial and whole bootstrapping procedures. Numerical studies compare the permutation test and the two bootstrapping procedures; subsequently, a real data application is presented. By considering the two bootstrapping procedures in addition to the existing permutation test, one has more supporting evidence for the dimension estimation of the central subspace, allowing it to be determined more convincingly.
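
The whole-bootstrapping idea can be sketched with a deliberately simple estimator. Ordinary least squares is used here as a stand-in basis estimator of a one-dimensional central mean subspace; the model and settings are illustrative, not from the paper. Each bootstrap resample refits the direction, and the spread of subspace distances to the full-data estimate indicates how stably that dimension is supported.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 300, 5
b = np.array([1.0, 1.0, 0, 0, 0]) / np.sqrt(2)
X = rng.standard_normal((n, p))
Y = np.exp(X @ b) + 0.2 * rng.standard_normal(n)

def ols_direction(X, Y):
    # OLS estimates a basis of a one-dimensional central mean subspace
    # (up to scale) under linearity conditions, e.g. normal predictors.
    beta = np.linalg.lstsq(X - X.mean(0), Y - Y.mean(), rcond=None)[0]
    return beta / np.linalg.norm(beta)

full = ols_direction(X, Y)

# Whole bootstrapping: refit on resampled data and record how far each
# bootstrap direction falls from the full-data one (1 - |cos angle|).
dists = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    bhat = ols_direction(X[idx], Y[idx])
    dists.append(1 - abs(bhat @ full))

print(np.mean(dists))  # small average distance: a stable 1-dim estimate
```

In actual dimension determination, such bootstrap distances would be computed for each candidate dimension and compared, with small variability supporting that candidate.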

Two variations of cross-distance selection algorithm in hybrid sufficient dimension reduction

  • Jae Keun Yoo
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 30, No. 2
    • /
    • pp.179-189
    • /
    • 2023
  • Hybrid sufficient dimension reduction (SDR) methods, which take a weighted mean of the kernel matrices of two different SDR methods (Ye and Weiss, 2003), require heavy computation and time consumption due to bootstrapping. To avoid this, Park et al. (2022) recently developed the so-called cross-distance selection (CDS) algorithm. In this paper, two variations of the original CDS algorithm are proposed, depending on how fully and equally covk-SAVE is treated in the selection procedure. In one variation, called the larger CDS algorithm, covk-SAVE is utilized equally and fairly alongside the other two candidates, SIR-SAVE and covk-DR, but a random selection is necessary for the final choice. In the other, called the smaller CDS algorithm, SIR-SAVE and covk-DR are utilized while covk-SAVE is ruled out completely. Numerical studies confirm that the original CDS algorithm is better than, or competes quite well with, the two proposed variations. A real data example is presented to compare and interpret the decisions made by the three CDS algorithms in practice.