• Title/Summary/Keyword: Sufficient dimension reduction

Search results: 38

Fused sliced inverse regression in survival analysis

  • Yoo, Jae Keun
    • Communications for Statistical Applications and Methods
    • /
    • v.24 no.5
    • /
    • pp.533-541
    • /
    • 2017
  • Sufficient dimension reduction (SDR) replaces the original p-dimensional predictors with a lower-dimensional, linearly transformed predictor. Sliced inverse regression (SIR) has the longest history and is the most popular of the SDR methodologies. The critical weakness of SIR is its known sensitivity to the number of slices. Recently, fused sliced inverse regression was developed to overcome this deficiency; it combines SIR kernel matrices constructed from various choices of the number of slices. In this paper, fused sliced inverse regression and SIR are compared to show that the former has a practical advantage over the latter in survival regression. Numerical studies confirm this, and a real-data example is presented.
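The slice-fusing idea summarized in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes numpy, equal-size slicing by the sorted response, and simple summation of the SIR kernel matrices over a grid of slice counts.

```python
import numpy as np

def sir_kernel(X, y, n_slices):
    """SIR kernel matrix: between-slice covariance of the whitened predictors."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    L = np.linalg.cholesky(np.cov(Xc, rowvar=False))
    A = np.linalg.inv(L).T            # whitening matrix: Cov(Xc @ A) = I
    Z = Xc @ A
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)       # slice mean of the whitened predictors
        M += (len(idx) / n) * np.outer(m, m)
    return M, A

def fused_sir(X, y, slice_grid=(3, 4, 5, 6), d=1):
    """Fused SIR: sum the SIR kernels over several slice numbers, then take
    the leading d eigenvectors, mapped back to the original predictor scale."""
    p = X.shape[1]
    M_fused = np.zeros((p, p))
    for h in slice_grid:
        M_h, A = sir_kernel(X, y, h)
        M_fused += M_h
    _, vecs = np.linalg.eigh(M_fused)   # eigenvalues in ascending order
    return A @ vecs[:, ::-1][:, :d]
```

Because the fused kernel pools information across the slice grid, the estimated directions are less sensitive to any single choice of the number of slices, which is the practical advantage the paper exploits.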

Case studies: Statistical analysis of contributions of vitamins and phytochemicals to antioxidant activities in plant-based multivitamins through generalized partially double-index model

  • Yoo, Jae Keun;Kwon, Oran
    • Communications for Statistical Applications and Methods
    • /
    • v.23 no.3
    • /
    • pp.251-258
    • /
    • 2016
  • It is important to verify the identity of plant-based multivitamins, which are marketed under a natural concept and popular for daily consumption, because they are easily purchased in markets with imperfect information. For this study, a generalized partially double-index model (GPDIM) was employed as the main statistical method to identify the contributions of vitamins and phytochemicals to antioxidant potential, using data on antioxidant capacities and chemical fingerprinting. A bootstrapping approach via sufficient dimension reduction was adopted to estimate the two unknown coefficient vectors in the GPDIM. After estimating the coefficient vectors, fifth-order polynomial regressions were fitted on the two indices to measure the contributions of vitamins and phytochemicals.

A concise overview of principal support vector machines and its generalization

  • Jungmin Shin;Seung Jun Shin
    • Communications for Statistical Applications and Methods
    • /
    • v.31 no.2
    • /
    • pp.235-246
    • /
    • 2024
  • In high-dimensional data analysis, sufficient dimension reduction (SDR) has been considered an attractive tool for reducing the dimensionality of predictors while preserving regression information. The principal support vector machine (PSVM) (Li et al., 2011) offers a unified approach to both linear and nonlinear SDR. This article comprehensively explores a variety of SDR methods based on the PSVM, which we call principal machines (PM) for SDR. The PM achieves SDR by solving a sequence of convex optimizations akin to popular supervised learning methods, such as the support vector machine, logistic regression, and quantile regression, to name a few. This makes the PM straightforward to handle and extend in both theoretical and computational aspects, as we show throughout this article.
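To make the "sequence of convex optimizations" concrete, here is a minimal sketch of the principal-machine idea with logistic regression as the machine: dichotomize the response at several cutoffs, fit one convex problem per cutoff, and pool the slopes. This is an illustration under simplifying assumptions (gradient-ascent fitting, quantile cutoffs, numpy only), not the implementation of Li et al. (2011).

```python
import numpy as np

def logistic_slope(Z, t, steps=500, lr=1.0):
    """Slope of a logistic regression of binary t on Z, fit by gradient ascent."""
    n, p = Z.shape
    Za = np.hstack([np.ones((n, 1)), Z])     # prepend an intercept column
    w = np.zeros(p + 1)
    for _ in range(steps):
        pr = 1.0 / (1.0 + np.exp(-np.clip(Za @ w, -30, 30)))
        w += lr * Za.T @ (t - pr) / n        # gradient of the mean log-likelihood
    return w[1:]                             # drop the intercept

def principal_logistic_sdr(X, y, n_cuts=5, d=1):
    """Principal-machine SDR: dichotomize y at several cutoffs, fit one
    logistic regression per cutoff on the whitened predictors, and
    eigen-decompose the sum of slope outer products."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    L = np.linalg.cholesky(np.cov(Xc, rowvar=False))
    A = np.linalg.inv(L).T                   # whitening: Cov(Xc @ A) = I
    Z = Xc @ A
    M = np.zeros((p, p))
    for q in np.linspace(0.2, 0.8, n_cuts):
        t = (y > np.quantile(y, q)).astype(float)
        M += np.outer(logistic_slope(Z, t), logistic_slope(Z, t))
    _, vecs = np.linalg.eigh(M)
    return A @ vecs[:, ::-1][:, :d]          # leading d directions, original scale
```

Swapping the inner solver for a hinge-loss (SVM) or check-loss (quantile) fit changes only `logistic_slope`, which is exactly the modularity the article attributes to the PM framework.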

Overview of estimating the average treatment effect using dimension reduction methods (차원축소 방법을 이용한 평균처리효과 추정에 대한 개요)

  • Mijeong Kim
    • The Korean Journal of Applied Statistics
    • /
    • v.36 no.4
    • /
    • pp.323-335
    • /
    • 2023
  • In causal analysis of high-dimensional data, it is important to reduce the dimension of the covariates and transform them appropriately in order to control confounders that affect the treatment and the potential outcomes. The augmented inverse probability weighting (AIPW) method is mainly used for estimating the average treatment effect (ATE). The AIPW estimator is obtained from an estimated propensity score and an estimated outcome model. The ATE estimator can be inconsistent, or can have a large asymptotic variance, when the propensity score and outcome model are estimated by parametric methods that include all covariates, especially for high-dimensional data. For this reason, ATE estimation using an appropriate dimension reduction method and a semiparametric model for high-dimensional data is attracting attention. Semiparametric methods or sparse sufficient dimension reduction methods can be used to reduce the dimension for estimating the propensity score and the outcome model. Recently, another method has been proposed that uses neither the propensity score nor outcome regression: after reducing the dimension of the covariates, the ATE can be estimated using matching. Among the studies on ATE estimation methods for high-dimensional data, four recently proposed studies are introduced, and the interpretation of the estimated ATE is discussed.
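The AIPW estimator mentioned in this abstract has a standard closed form. The sketch below (numpy, with the nuisance estimates passed in as arrays) is illustrative and not tied to any of the four reviewed papers:

```python
import numpy as np

def aipw_ate(y, t, e_hat, m1_hat, m0_hat):
    """Augmented inverse probability weighting estimate of the ATE.

    y      : observed outcomes
    t      : binary treatment indicator (0/1)
    e_hat  : estimated propensity scores P(T = 1 | X)
    m1_hat : estimated outcome regression E[Y | T = 1, X]
    m0_hat : estimated outcome regression E[Y | T = 0, X]
    """
    aug1 = m1_hat + t * (y - m1_hat) / e_hat              # augmented term for T = 1
    aug0 = m0_hat + (1 - t) * (y - m0_hat) / (1 - e_hat)  # augmented term for T = 0
    return float(np.mean(aug1 - aug0))
```

The estimator is doubly robust: it is consistent if either the propensity score or the outcome model is correctly specified, which is what makes the choice of dimension reduction for the nuisance estimates so consequential.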

Using noise filtering and sufficient dimension reduction method on unstructured economic data (노이즈 필터링과 충분차원축소를 이용한 비정형 경제 데이터 활용에 대한 연구)

  • Jae Keun Yoo;Yujin Park;Beomseok Seo
    • The Korean Journal of Applied Statistics
    • /
    • v.37 no.2
    • /
    • pp.119-138
    • /
    • 2024
  • Text indicators are increasingly valuable in economic forecasting, but are often hindered by noise and high dimensionality. This study explores post-processing techniques, specifically noise filtering and dimensionality reduction, to normalize text indicators and enhance their utility through empirical analysis. Predictive target variables for the empirical analysis include the monthly leading index cyclical variations, BSI (business survey index) all-industry sales performance, and BSI all-industry sales outlook, as well as the quarterly real GDP SA (seasonally adjusted) growth rate and real GDP YoY (year-on-year) growth rate. The study applies the Hodrick-Prescott filter, which is widely used in econometrics for noise filtering, and employs sufficient dimension reduction, a nonparametric dimensionality reduction methodology, in conjunction with the unstructured text data. The analysis reveals that noise filtering of text indicators significantly improves predictive accuracy for both monthly and quarterly variables, particularly when the dataset is large. Moreover, applying dimensionality reduction further enhances predictive performance. These findings imply that post-processing techniques such as noise filtering and dimensionality reduction are crucial for enhancing the utility of text indicators and can contribute to improving the accuracy of economic forecasts.
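The Hodrick-Prescott filter used for noise filtering here has a simple closed-form solution. A minimal numpy sketch (dense linear solve, suitable for short series; not the study's code) is:

```python
import numpy as np

def hp_filter(y, lam=1600.0):
    """Hodrick-Prescott filter: decompose a series into trend and cycle.

    Minimizes sum((y - tau)^2) + lam * sum((second differences of tau)^2),
    whose closed form is tau = (I + lam * D'D)^{-1} y, where D is the
    (n-2) x n second-difference matrix.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]   # second-difference stencil
    trend = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
    return trend, y - trend                # (trend, cycle)
```

The conventional penalty lam = 1600 is for quarterly data; monthly indicators typically use a much larger value, so the penalty should be tuned to the sampling frequency of the target series.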

Dimension reduction for right-censored survival regression: transformation approach

  • Yoo, Jae Keun;Kim, Sung-Jin;Seo, Bi-Seul;Shin, Hyejung;Sim, Su-Ah
    • Communications for Statistical Applications and Methods
    • /
    • v.23 no.3
    • /
    • pp.259-268
    • /
    • 2016
  • High-dimensional survival data with large numbers of predictors have become more common. The analysis of such data is facilitated if the dimension of the predictors is adequately reduced. Recent studies show that sliced inverse regression (SIR) is an effective dimension reduction tool in high-dimensional survival regression; however, its implementation is hindered by a double-categorization procedure. Under right censoring, this problem can be overcome by transforming the observed survival time and the censoring status into a single variable. This provides more flexibility in the categorization, so the applicability of SIR is enhanced. Numerical studies show that the proposed transformation approach performs as well as, or better than, the usual SIR application under both balanced and highly unbalanced censoring. A real-data example also confirms its practical usefulness, so the proposed approach should be an effective and valuable addition for statistical practitioners.

Model-based inverse regression for mixture data

  • Choi, Changhwan;Park, Chongsun
    • Communications for Statistical Applications and Methods
    • /
    • v.24 no.1
    • /
    • pp.97-113
    • /
    • 2017
  • This paper proposes a method for sufficient dimension reduction (SDR) of mixture data. We consider mixture data containing more than one component, each with a distinct central subspace. We adapt the model-based sliced inverse regression (MSIR) approach to mixture data in a simple and intuitive manner, employing mixture probabilistic principal component analysis (MPPCA) to estimate each central subspace and to cluster the data points. Results from simulation studies and a real data set show that our method satisfactorily recovers the appropriate central subspaces and is robust to the number of slices chosen. Discussions of root selection, estimation accuracy, classification, and initial-value issues of MPPCA, together with related simulation results, are also provided.

Feature Analysis of Multi-Channel Time Series EEG Based on Incremental Model (점진적 모델에 기반한 다채널 시계열 데이터 EEG의 특징 분석)

  • Kim, Sun-Hee;Yang, Hyung-Jeong;Ng, Kam Swee;Jeong, Jong-Mun
    • The KIPS Transactions:PartB
    • /
    • v.16B no.1
    • /
    • pp.63-70
    • /
    • 2009
  • BCI technology is to control communication systems or machines by brain signal among biological signals followed by signal processing. For the implementation of BCI systems, it is required that the characteristics of brain signal are learned and analyzed in real-time and the learned characteristics are applied. In this paper, we detect feature vector of EEG signal on left and right hand movements based on incremental approach and dimension reduction using the detected feature vector. In addition, we show that the reduced dimension can improve the classification performance by removing unnecessary features. The processed data including sufficient features of input data can reduce the time of processing and boost performance of classification by removing unwanted features. Our experiments using K-NN classifier show the proposed approach 5% outperforms the PCA based dimension reduction.

Investigating SIR, DOC and SAVE for the Polychotomous Response

  • Lee, Hak-Bae;Lee, Hee-Min
    • Communications for Statistical Applications and Methods
    • /
    • v.19 no.3
    • /
    • pp.501-506
    • /
    • 2012
  • This paper investigates the central subspace in relation to SIR, DOC, and SAVE when the response takes more than two values. The subspaces constructed by SIR, DOC, and SAVE are investigated and compared; the SAVE paradigm is the most comprehensive. In addition, SAVE coincides with the central subspace when the conditional distribution of the predictors given the response is normal.
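For a polychotomous response, the SAVE kernel compared in this paper takes a particularly simple form, because the response categories themselves serve as the slices. A minimal numpy sketch (not the paper's code):

```python
import numpy as np

def save_kernel(X, y):
    """SAVE kernel for a categorical response: weighted sum of
    (I - Cov(Z | Y = s))^2 over response categories s, with Z the
    whitened predictors; its leading eigenvectors estimate the
    central subspace directions."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    L = np.linalg.cholesky(np.cov(Xc, rowvar=False))
    Z = Xc @ np.linalg.inv(L).T           # whitened predictors, Cov(Z) = I
    M = np.zeros((p, p))
    for s in np.unique(y):
        Zs = Z[y == s]
        A = np.eye(p) - np.cov(Zs, rowvar=False)
        M += (len(Zs) / n) * (A @ A)
    return M
```

Unlike SIR, this kernel reacts to conditional-variance differences across categories, which is why SAVE can recover symmetric directions that the SIR kernel misses and why its subspace is the most comprehensive of the three.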

Case study: application of fused sliced average variance estimation to near-infrared spectroscopy of biscuit dough data (Fused sliced average variance estimation의 실증분석: 비스킷 반죽의 근적외분광분석법 분석 자료로의 적용)

  • Um, Hye Yeon;Won, Sungmin;An, Hyoin;Yoo, Jae Keun
    • The Korean Journal of Applied Statistics
    • /
    • v.31 no.6
    • /
    • pp.835-842
    • /
    • 2018
  • The so-called sliced average variance estimation (SAVE) is a popular methodology in the sufficient dimension reduction literature. In practice, SAVE is sensitive to the number of slices. To overcome this, a fused SAVE (FSAVE) was recently proposed, which combines the kernel matrices obtained from various numbers of slices. In this paper, we consider practical applications of FSAVE to large p, small n data. For this, near-infrared spectroscopy data on biscuit dough are analyzed. This case study confirms the usefulness of FSAVE in high-dimensional data analysis by showing that the results from FSAVE are superior to those of existing analyses.