• Title/Summary/Keyword: dimension-reduction


Applications of response dimension reduction in large p-small n problems

  • Kim, Minjee;Yoo, Jae Keun
    • Communications for Statistical Applications and Methods / v.31 no.2 / pp.191-202 / 2024
  • The goal of this paper is to show how response dimension reduction facilitates multivariate regression analysis with high-dimensional responses. Multivariate regression, characterized by multi-dimensional response variables, is increasingly prevalent across diverse fields such as repeated measures, longitudinal studies, and functional data analysis. One of the key challenges in analyzing such data is managing the response dimension, which can complicate the analysis through a rapid increase in the number of parameters. Although response dimension reduction methods have been developed, there are no practically useful illustrations of their use for various types of data, such as so-called large p-small n data. This paper fills this gap by showing how response dimension reduction can enhance the analysis of high-dimensional response data, thereby assisting statistical practitioners and contributing to advances in multiple scientific domains.

On hierarchical clustering in sufficient dimension reduction

  • Yoo, Chaeyeon;Yoo, Younju;Um, Hye Yeon;Yoo, Jae Keun
    • Communications for Statistical Applications and Methods / v.27 no.4 / pp.431-443 / 2020
  • The K-means clustering algorithm has been applied successfully in sufficient dimension reduction. Unfortunately, the algorithm lacks reproducibility and nestedness, as discussed in this paper. These are clear deficits of the K-means algorithm. The hierarchical clustering algorithm has both properties, but an intensive comparison of the K-means and hierarchical clustering algorithms in a sufficient dimension reduction context has not yet been done. In this paper, we rigorously study the two clustering algorithms for two popular sufficient dimension reduction methodologies, the inverse mean and clustering mean methods, through intensive numerical studies. Simulation studies and two real data examples confirm that the hierarchical clustering algorithm has a potential advantage over the K-means algorithm.
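
Reproducibility and nestedness can be illustrated with a small agglomerative clustering of a scalar response, as might be used to form slices. This is a minimal sketch, assuming single linkage on a one-dimensional response; the linkage, the distance, and the helper names `single_linkage_merges` and `cut_tree` are illustrative choices, not the paper's implementation. Because the merge sequence involves no randomness, repeated runs give identical clusters, and cutting one tree at k and k-1 clusters gives nested partitions. K-means with random starts guarantees neither.

```python
def single_linkage_merges(values):
    """Agglomerative single-linkage clustering of scalar values.

    Returns the merge history as (left, right) pairs of frozensets of
    indices. No randomness is involved: ties go to the first pair
    scanned, so repeated runs produce identical output.
    """
    clusters = [frozenset([i]) for i in range(len(values))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(abs(values[i] - values[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


def cut_tree(n, merges, k):
    """Replay the merge history until exactly k clusters remain."""
    clusters = {frozenset([i]) for i in range(n)}
    for left, right in merges:
        if len(clusters) <= k:
            break
        clusters -= {left, right}
        clusters.add(left | right)
    return clusters


y = [0.1, 0.2, 1.0, 1.1, 5.0, 5.2]
merges = single_linkage_merges(y)
three = cut_tree(len(y), merges, 3)
two = cut_tree(len(y), merges, 2)
# Nestedness: every cluster of the 3-cluster cut lies inside a 2-cluster one.
print(all(any(c <= d for d in two) for c in three))  # True
```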

On robustness in dimension determination in fused sliced inverse regression

  • Yoo, Jae Keun;Cho, Yoo Na
    • Communications for Statistical Applications and Methods / v.25 no.5 / pp.513-521 / 2018
  • The goal of sufficient dimension reduction (SDR) is to replace the original p-dimensional predictors with a lower-dimensional, linearly transformed predictor. Sliced inverse regression (SIR) (Li, Journal of the American Statistical Association, 86, 316-342, 1991) is one of the most popular SDR methods because of its applicability and simple implementation in practice. However, SIR may yield different dimension reduction results for different numbers of slices, which, despite its popularity, is a clear deficit. To overcome this, fused sliced inverse regression was recently proposed. That study shows that the dimension-reduced predictors are robust to the number of slices, but it does not investigate how robust the dimension determination is. This paper proposes a permutation-based dimension determination for fused sliced inverse regression and compares it with that of SIR to investigate robustness to the number of slices in dimension determination. Numerical studies confirm this robustness, and a real data example is presented.
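
The dependence on the number of slices is easiest to see in a compact SIR estimator. The sketch below whitens the predictors, sorts the response into H slices of roughly equal size, forms the kernel M = sum over h of p_h m_h m_h^T from the slice means m_h of the whitened predictors, and takes its leading eigenvector. As an assumption about the fused method, several slice counts are fused by summing their SIR kernels. The function names, the slice counts, and the power-iteration eigensolver are illustrative. The sketch estimates a single direction and does not implement the permutation dimension determination proposed in the paper.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with a = L L^T for a symmetric positive definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def back_solve_transpose(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def sir_kernel(Z, y, n_slices):
    """M = sum_h p_h m_h m_h^T, with m_h the mean of whitened Z in slice h."""
    n, p = len(Z), len(Z[0])
    order = sorted(range(n), key=lambda i: y[i])  # slice on the sorted response
    M = [[0.0] * p for _ in range(p)]
    for h in range(n_slices):
        idx = order[h * n // n_slices:(h + 1) * n // n_slices]
        m = [sum(Z[i][j] for i in idx) / len(idx) for j in range(p)]
        w = len(idx) / n  # slice proportion p_h
        for a in range(p):
            for b in range(p):
                M[a][b] += w * m[a] * m[b]
    return M


def leading_eigenvector(M, iters=500):
    """Power iteration for the top eigenvector of a symmetric PSD matrix."""
    v = [1.0] * len(M)
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in M]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def sir_direction(X, y, slice_counts=(5,)):
    """One SIR direction; several slice counts give a fused (summed) kernel."""
    n, p = len(X), len(X[0])
    mu = [sum(r[j] for r in X) / n for j in range(p)]
    S = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in X) / n
          for b in range(p)] for a in range(p)]
    L = cholesky(S)
    # Whitening: z = L^{-1}(x - mu) has identity sample covariance.
    Z = [forward_solve(L, [r[j] - mu[j] for j in range(p)]) for r in X]
    M = [[0.0] * p for _ in range(p)]
    for H in slice_counts:  # assumed fusion rule: sum the kernels
        K = sir_kernel(Z, y, H)
        M = [[M[a][b] + K[a][b] for b in range(p)] for a in range(p)]
    eta = leading_eigenvector(M)
    beta = back_solve_transpose(L, eta)  # back to x scale: beta = L^{-T} eta
    norm = math.sqrt(sum(b * b for b in beta))
    return [b / norm for b in beta]


rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(400)]
# Single-index model: y depends on x only through x1 + 0.5 * x2.
y = [r[0] + 0.5 * r[1] + 0.2 * rng.gauss(0, 1) for r in X]
for counts in [(5,), (2, 3, 4, 5, 6)]:  # plain SIR with H = 5, then fused
    print(counts, [round(b, 2) for b in sir_direction(X, y, counts)])
```

Passing a single slice count recovers ordinary SIR, so the two runs above can be compared directly. The sign of an estimated direction is arbitrary.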

Comparative Study of Dimension Reduction Methods for Highly Imbalanced Overlapping Churn Data

  • Lee, Sujee;Koo, Bonhyo;Jung, Kyu-Hwan
    • Industrial Engineering and Management Systems / v.13 no.4 / pp.454-462 / 2014
  • Retaining customers who are likely to churn is one of the most important issues in customer relationship management, so companies try to predict churning customers from their large-scale, high-dimensional data. This study focuses on handling large data sets by reducing their dimensionality. We consider six dimension reduction methods: principal component analysis (PCA), factor analysis (FA), locally linear embedding (LLE), local tangent space alignment (LTSA), locality preserving projections (LPP), and a deep auto-encoder. In our experiments, each method is applied to the training data, a classification model is built on the mapped data, and performance is measured by hit rate to compare the dimension reduction methods. PCA performs well despite its simplicity, and the deep auto-encoder gives the best overall performance. These results can be explained by the characteristics of churn prediction data, which are highly correlated and overlap across classes. We also propose a simple out-of-sample extension for the nonlinear dimension reduction methods LLE and LTSA that utilizes these characteristics of the data.
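
Hit rate is commonly computed in churn modeling as the share of actual churners among the customers ranked highest by the model's churn score. The sketch below assumes that definition, with a 10% targeting fraction by default; the paper may define the cut-off differently, and the function and parameter names are illustrative.

```python
def hit_rate(scores, labels, top_fraction=0.1):
    """Share of actual churners (label 1) among the top-scored customers.

    scores: predicted churn scores (higher means more likely to churn).
    labels: 1 for an actual churner, 0 otherwise.
    top_fraction: fraction of customers targeted, ranked by score.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")
    n_top = max(1, round(len(scores) * top_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:n_top]) / n_top


scores = [0.9, 0.8, 0.1, 0.7, 0.2, 0.05, 0.3, 0.6, 0.4, 0.5]
labels = [1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
print(hit_rate(scores, labels, top_fraction=0.3))  # 0.6666666666666666
```

Comparing the hit rate with the overall churn rate (0.3 in this toy data) shows how much better the targeted group is than random selection.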

A Classification Method Using Data Reduction

  • Uhm, Daiho;Jun, Sung-Hae;Lee, Seung-Joo
    • International Journal of Fuzzy Logic and Intelligent Systems / v.12 no.1 / pp.1-5 / 2012
  • Data reduction is widely used in data mining to make analysis convenient. Principal component analysis (PCA) and factor analysis (FA) are popular techniques; they reduce the number of variables to avoid the curse of dimensionality, whereby computing time increases exponentially with the number of variables. Consequently, many dimension reduction methods have been published. Data augmentation is another approach to analyzing data efficiently, and the support vector machine (SVM) is a representative technique for dimension augmentation: it maps the original data into a high-dimensional feature space to find the optimal decision plane. Both data reduction and augmentation have been used to solve diverse problems in data analysis. In this paper, we compare the strengths and weaknesses of dimension reduction and augmentation for classification and propose a classification method based on data reduction. We carry out comparative experiments to verify the performance of the proposed method.
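
How PCA reduces the number of variables can be shown with a small sketch: center the data, form the sample covariance matrix, take its leading eigenvector (here by power iteration, an illustrative choice of eigensolver), and project each observation onto it. The helper name is hypothetical; in practice a linear algebra library would compute all components at once.

```python
import math


def first_principal_component(X, iters=1000):
    """Leading PC direction, its variance, and the 1-D scores of X."""
    n, p = len(X), len(X[0])
    mu = [sum(r[j] for r in X) / n for j in range(p)]
    C = [[r[j] - mu[j] for j in range(p)] for r in X]  # centered data
    S = [[sum(c[a] * c[b] for c in C) / (n - 1) for b in range(p)]
         for a in range(p)]  # sample covariance matrix
    v = [1.0] * p
    for _ in range(iters):  # power iteration toward the top eigenvector
        w = [sum(S[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T S v: the variance captured by this component.
    var = sum(v[a] * S[a][b] * v[b] for a in range(p) for b in range(p))
    scores = [sum(c[j] * v[j] for j in range(p)) for c in C]
    return v, var, scores


# Two perfectly correlated variables collapse onto a single component.
X = [[t, 2 * t] for t in range(-3, 4)]
direction, variance, scores = first_principal_component(X)
print([round(d, 4) for d in direction], round(variance, 4))  # [0.4472, 0.8944] 23.3333
```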

A Note on Bootstrapping in Sufficient Dimension Reduction

  • Yoo, Jae Keun;Jeong, Sun
    • Communications for Statistical Applications and Methods / v.22 no.3 / pp.285-294 / 2015
  • A permutation test is a popular and attractive alternative to deriving the asymptotic distributions of dimension test statistics in sufficient dimension reduction methodologies; however, recent studies show that bootstrapping can also be used. We consider two types of bootstrapping dimension determination: partial and whole bootstrapping procedures. Numerical studies compare the permutation test and the two bootstrapping procedures, and a real data application is presented. With two bootstrapping procedures in addition to the existing permutation test, one has more supporting evidence for estimating the dimension of the central subspace, so that the dimension can be determined more convincingly.
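
The two resampling ideas can be contrasted generically. For the simplest dimension hypothesis, d = 0 (the response carries no information about the predictors), permuting the response destroys any dependence and yields a null reference distribution. Resampling (x, y) pairs with replacement instead approximates the sampling distribution of the statistic itself. Here the squared sample correlation of one predictor with the response stands in for an SDR dimension test statistic. That substitution, the function names, and the replicate counts are assumptions, and the paper's partial and whole bootstrapping procedures are not reproduced.

```python
import random


def squared_correlation(x, y):
    """Squared Pearson correlation; a stand-in for a dimension test statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def permutation_pvalue(x, y, stat, n_perm=499, seed=0):
    """P-value for H0: y independent of x, obtained by permuting y."""
    rng = random.Random(seed)
    observed = stat(x, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # permutation breaks any x-y dependence
        if stat(x, y_perm) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # the observed value counts once


def bootstrap_replicates(x, y, stat, n_boot=499, seed=0):
    """Statistic recomputed on (x, y) pairs resampled with replacement."""
    rng = random.Random(seed)
    n = len(x)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        reps.append(stat([x[i] for i in idx], [y[i] for i in idx]))
    return reps


rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(50)]
y = [xi + 0.5 * rng.gauss(0, 1) for xi in x]
p_value = permutation_pvalue(x, y, squared_correlation)
reps = sorted(bootstrap_replicates(x, y, squared_correlation))
interval = (reps[12], reps[486])  # rough 95% percentile interval
```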

Comprehensive studies of Grassmann manifold optimization and sequential candidate set algorithm in a principal fitted component model

  • Lee, Chaeyoung;Yoo, Jae Keun
    • Communications for Statistical Applications and Methods / v.29 no.6 / pp.721-733 / 2022
  • In this paper, we compare parameter estimation by Grassmann manifold optimization and by a sequential candidate set algorithm in a structured principal fitted component (PFC) model. The structured PFC model generalizes the form of the covariance matrix of the random error to relax the limitations caused by an overly simple covariance structure. However, unlike other PFC models, the structured PFC model has no closed-form parameter estimates for dimension reduction, so numerical computation is needed. This computation can be done through Grassmann manifold optimization or the sequential candidate set algorithm. We conducted numerical studies comparing the two methods through sequential dimension tests and trace correlations, which measure performance in determining the dimension and in estimating the basis, respectively. We conclude that Grassmann manifold optimization outperforms the sequential candidate set algorithm in dimension determination, while the sequential candidate set algorithm is better in basis estimation. Applying both methods to real data led to the same conclusion.
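
The trace correlation used to score basis estimation compares the estimated and true subspaces through their orthogonal projections. For two d-dimensional subspaces with projection matrices P_A and P_B, the sketch computes tr(P_A P_B)/d, which equals 1 when the subspaces coincide and 0 when they are orthogonal; conventions differ on whether its square root is reported. The helper names are illustrative, and bases are passed as lists of column vectors.

```python
import math


def orthonormalize(columns):
    """Gram-Schmidt: orthonormal basis for the span of independent columns."""
    basis = []
    for v in columns:
        w = list(v)
        for q in basis:
            c = sum(a * b for a, b in zip(w, q))
            w = [a - c * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        basis.append([a / norm for a in w])
    return basis


def trace_correlation(A, B):
    """tr(P_A P_B) / d for two d-dimensional subspaces given by basis columns."""
    QA, QB = orthonormalize(A), orthonormalize(B)
    if len(QA) != len(QB):
        raise ValueError("subspaces must have the same dimension")
    # With orthonormal bases, tr(P_A P_B) = sum over i, j of (qa_i . qb_j)^2.
    total = sum(sum(a * b for a, b in zip(qa, qb)) ** 2
                for qa in QA for qb in QB)
    return total / len(QA)


true_basis = [[1, 0, 0], [0, 1, 0]]
estimate = [[1, 1, 0], [1, -1, 0.1]]  # nearly the same plane
print(round(trace_correlation(true_basis, estimate), 4))  # 0.9975
```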

Incremental Linear Discriminant Analysis for Streaming Data Using the Minimum Squared Error Solution

  • Lee, Gyeong-Hoon;Park, Cheong Hee
    • Journal of KIISE / v.45 no.1 / pp.69-75 / 2018
  • For streaming data, where samples arrive sequentially over time, dimension reduction methods based on batch learning are difficult to apply. Therefore, incremental dimension reduction methods for streaming data have been studied. In this paper, we propose an incremental linear discriminant analysis method using the minimum squared error solution. Instead of computing scatter matrices directly, the proposed method incrementally updates the projection direction for dimension reduction using the information in each newly arriving sample. Experimental results demonstrate that the proposed method is more efficient than previously proposed incremental dimension reduction methods.
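
One classical way to update a minimum squared error (least-squares) discriminant one sample at a time, without recomputing scatter matrices, is recursive least squares based on the Sherman-Morrison identity. The sketch below assumes ±1 class targets, a constant 1 appended to each sample as a bias term, and a ridge penalty `lam` that initializes the inverse matrix `P`. It is not necessarily the paper's update rule, and it still stores a d × d matrix. After every sample its weight vector equals the batch ridge solution on the data seen so far, which the usage example checks; all names are hypothetical.

```python
import random


def rls_update(w, P, x, t):
    """One recursive least-squares step for a new sample (x, t).

    Maintains w = (lam*I + sum x x^T)^{-1} sum t x exactly; P is the
    inverse matrix in parentheses, updated by Sherman-Morrison.
    """
    d = len(w)
    Px = [sum(P[i][j] * x[j] for j in range(d)) for i in range(d)]
    gain = [v / (1.0 + sum(x[i] * Px[i] for i in range(d))) for v in Px]
    err = t - sum(w[i] * x[i] for i in range(d))  # prediction error
    w = [w[i] + gain[i] * err for i in range(d)]
    # P symmetric, so x^T P = (P x)^T and P - gain (P x)^T stays symmetric.
    P = [[P[i][j] - gain[i] * Px[j] for j in range(d)] for i in range(d)]
    return w, P


def solve(A, b):
    """Gaussian elimination with partial pivoting (for the batch check)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


rng = random.Random(7)
lam, d = 1.0, 3
w = [0.0] * d
P = [[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
stream = []
for _ in range(200):  # samples arrive one at a time
    label = rng.choice([1.0, -1.0])
    x = [2 * label + rng.gauss(0, 1), 2 * label + rng.gauss(0, 1), 1.0]
    stream.append((x, label))
    w, P = rls_update(w, P, x, label)

# Batch ridge solution on the same data, for comparison.
A = [[(lam if i == j else 0.0) + sum(x[i] * x[j] for x, _ in stream)
      for j in range(d)] for i in range(d)]
b = [sum(t * x[i] for x, t in stream) for i in range(d)]
w_batch = solve(A, b)
accuracy = sum((sum(wi * xi for wi, xi in zip(w, x)) > 0) == (t > 0)
               for x, t in stream) / len(stream)
```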

Defect Severity-based Dimension Reduction Model using PCA

  • Kwon, Ki Tae;Lee, Na-Young
    • Journal of Software Assessment and Valuation / v.15 no.1 / pp.79-86 / 2019
  • Dimension reduction of software data identifies commonalities among elements and extracts important features. It thereby reduces complexity through simplification, resolves multicollinearity problems, and reduces redundancy through redundancy and noise detection. In this study, we propose a defect severity-based dimension reduction model and apply it to a NASA dataset with defect severity information. We verify the number of dimensions (columns) that affect defect severity, and then compare and analyze the dimensions of the data before and after reduction. In our experiments, the PC4 dataset could be reduced to 2 to 3 dimensions.
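
Deciding how many principal components to retain, as in reducing PC4 to 2 to 3 dimensions, is often done with a cumulative explained-variance threshold or, for PCA on a correlation matrix, Kaiser's eigenvalue-greater-than-one rule. The abstract does not state which criterion was used, so both rules, the 90% default threshold, the helper names, and the eigenvalues below are assumptions for illustration.

```python
def components_for_variance(eigenvalues, threshold=0.9):
    """Smallest k whose top-k eigenvalues explain >= threshold of the variance."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    cumulative = 0.0
    for k, v in enumerate(vals, start=1):
        cumulative += v
        if cumulative / total >= threshold:
            return k
    return len(vals)


def kaiser_count(eigenvalues):
    """Kaiser rule for correlation-matrix PCA: keep eigenvalues above 1."""
    return sum(1 for v in eigenvalues if v > 1.0)


eigs = [2.2, 1.3, 1.0, 0.3, 0.2]  # hypothetical eigenvalues, not from the paper
print(components_for_variance(eigs, 0.85), kaiser_count(eigs))  # 3 2
```

The two rules can disagree on the same spectrum, which is one reason a reported dimension is often given as a range.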

Classification of Microarray Gene Expression Data by MultiBlock Dimension Reduction

  • Oh, Mi-Ra;Kim, Seo-Young;Kim, Kyung-Sook;Baek, Jang-Sun;Son, Young-Sook
    • Communications for Statistical Applications and Methods / v.13 no.3 / pp.567-576 / 2006
  • In this paper, we apply multiblock dimension reduction methods to the classification of tumors based on microarray gene expression data. The procedure involves clustering selected genes, multiblock dimension reduction, and classification using linear discriminant analysis and quadratic discriminant analysis.
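
The three-stage procedure can be sketched under simplifying assumptions. The gene blocks are taken as given rather than found by clustering, each block is summarized by its first principal component score, and a two-class Fisher linear discriminant with equal priors is fitted to the block scores. Quadratic discriminant analysis and the paper's specific multiblock methods are not reproduced; all function names and the simulated data are hypothetical.

```python
import math
import random


def block_scores(X, blocks, iters=200):
    """Reduce each block of gene columns to its first principal component score."""
    n = len(X)
    features = [[] for _ in range(n)]
    for cols in blocks:
        p = len(cols)
        mu = [sum(row[j] for row in X) / n for j in cols]
        C = [[row[j] - m for j, m in zip(cols, mu)] for row in X]  # centered block
        S = [[sum(c[a] * c[b] for c in C) for b in range(p)] for a in range(p)]
        v = [1.0] * p
        for _ in range(iters):  # power iteration for the block's top eigenvector
            w = [sum(S[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        for i, c in enumerate(C):
            features[i].append(sum(ci * vi for ci, vi in zip(c, v)))
    return features


def solve(A, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def fit_fisher_lda(F, labels):
    """Two-class Fisher LDA; returns a function mapping features to 0 or 1."""
    k = len(F[0])
    means, Sw = {}, [[0.0] * k for _ in range(k)]
    for g in (0, 1):
        rows = [f for f, lab in zip(F, labels) if lab == g]
        means[g] = [sum(r[j] for r in rows) / len(rows) for j in range(k)]
        for r in rows:  # pooled within-class scatter
            dev = [r[j] - means[g][j] for j in range(k)]
            for a in range(k):
                for b in range(k):
                    Sw[a][b] += dev[a] * dev[b]
    w = solve(Sw, [means[1][j] - means[0][j] for j in range(k)])
    cut = sum(w[j] * (means[0][j] + means[1][j]) / 2 for j in range(k))
    return lambda f: int(sum(wj * fj for wj, fj in zip(w, f)) > cut)


rng = random.Random(3)
labels = [i % 2 for i in range(40)]
# Genes 0-2 carry a shared class signal; genes 3-5 are pure noise.
X = [[3 * c + rng.gauss(0, 0.5) for _ in range(3)] +
     [rng.gauss(0, 1) for _ in range(3)] for c in labels]
F = block_scores(X, blocks=[[0, 1, 2], [3, 4, 5]])
classify = fit_fisher_lda(F, labels)
train_accuracy = sum(classify(f) == c for f, c in zip(F, labels)) / len(F)
```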