• Title/Summary/Keyword: large p-small n data

Applications of response dimension reduction in large p-small n problems

  • Minjee Kim;Jae Keun Yoo
    • Communications for Statistical Applications and Methods / v.31 no.2 / pp.191-202 / 2024
  • The goal of this paper is to show how response dimension reduction facilitates multivariate regression analysis with high-dimensional responses. Multivariate regression, characterized by multi-dimensional response variables, is increasingly prevalent across diverse fields such as repeated measures, longitudinal studies, and functional data analysis. One of the key challenges in analyzing such data is managing the response dimensions, which can complicate the analysis due to an exponential increase in the number of parameters. Although response dimension reduction methods have been developed, there is no practically useful illustration for various types of data, such as so-called large p-small n data. This paper aims to fill this gap by showcasing how response dimension reduction can enhance the analysis of high-dimensional response data, thereby assisting statistical practitioners and contributing to advances in multiple scientific domains.

Case study: application of fused sliced average variance estimation to near-infrared spectroscopy of biscuit dough data (Fused sliced average variance estimation의 실증분석: 비스킷 반죽의 근적외분광분석법 분석 자료로의 적용)

  • Um, Hye Yeon;Won, Sungmin;An, Hyoin;Yoo, Jae Keun
    • The Korean Journal of Applied Statistics / v.31 no.6 / pp.835-842 / 2018
  • Sliced average variance estimation (SAVE) is a popular methodology in the sufficient dimension reduction literature. In practice, SAVE is sensitive to the number of slices. To overcome this, a fused SAVE (FSAVE) was recently proposed, which combines the kernel matrices obtained from various numbers of slices. In this paper, we consider practical applications of FSAVE to large p-small n data. For this, near-infrared spectroscopy of biscuit dough data is analyzed. In this case study, the usefulness of FSAVE in high-dimensional data analysis is confirmed by showing that the result from FSAVE is superior to existing analysis results.

Estimation of Gini-Simpson index for SNP data

  • Kang, Joonsung
    • Journal of the Korean Data and Information Science Society / v.28 no.6 / pp.1557-1564 / 2017
  • We consider high-dimensional low sample size (HDLSS) genomic sequences whose response categories are unordered. When constructing an appropriate test statistic in this model, the classical multivariate analysis of variance (MANOVA) approach might not be useful owing to the very large number of parameters and the very small sample size. For these reasons, we present a pseudo marginal model based upon the Gini-Simpson index estimated via a Bayesian approach. In view of the small sample size, we consider the permutation distribution generated by every possible n! (equally likely) permutation of the joined sample observations across G groups (of sizes $n_1, \ldots, n_G$). We simulate data and apply the false discovery rate (FDR) and the positive false discovery rate (pFDR) with the associated proposed test statistics to the data. We also analyze real SARS data and compute the FDR and pFDR. The FDR and pFDR procedures, along with the associated test statistics for each gene, control the FDR and pFDR respectively at any level $\alpha$ for the set of p-values by using the exact conditional permutation theory.
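
For orientation, the Gini-Simpson index of a set of unordered categories with proportions $p_c$ is $1 - \sum_c p_c^2$. The following is a minimal sketch of the simple plug-in (sample-proportion) version only; the paper estimates the index via a Bayesian approach, which is not reproduced here. The function name `gini_simpson` and the example data are illustrative assumptions.

```python
from collections import Counter


def gini_simpson(labels):
    """Plug-in Gini-Simpson index: 1 - sum_c p_c^2 over observed categories.

    Assumption: p_c is estimated by the raw sample proportion; this is not
    the Bayesian estimator described in the abstract.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one observation")
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical nucleotide calls at one site for four sequences.
site = ["A", "A", "C", "G"]
print(gini_simpson(site))
```

A site where every sequence carries the same category gives an index of 0, and the index grows as the categories become more evenly spread.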

Effect of outliers on the variable selection by the regularized regression

  • Jeong, Junho;Kim, Choongrak
    • Communications for Statistical Applications and Methods / v.25 no.2 / pp.235-243 / 2018
  • Many studies exist on the influence of one or a few observations on estimators in a variety of statistical models under the "large n, small p" setup; however, diagnostic issues in regression models have rarely been studied in a high-dimensional setup. In high-dimensional data, the influence of observations is more serious because the sample size n is significantly less than the number of variables p. Here, we investigate the influence of observations on the least absolute shrinkage and selection operator (LASSO) estimates, suggested by Tibshirani (Journal of the Royal Statistical Society, Series B, 73, 273-282, 1996), and the influence of observations on the variables selected by the LASSO in the high-dimensional setup. We also derive an analytic expression for the influence of the kth observation on LASSO estimates in simple linear regression. Numerical studies based on artificial data and real data are done for illustration. Numerical results show that the influence of observations on the LASSO estimates and on the variables selected by the LASSO is more severe in the high-dimensional setup than in the usual "large n, small p" setup.
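
To make the idea of case influence on a LASSO estimate concrete, here is a minimal sketch for one predictor without an intercept, where the LASSO solution has the well-known closed form of a soft-thresholded cross-product. It measures influence by simply refitting without observation k; it is not the analytic expression derived in the paper. The function names, the penalty scaling, and the toy data are assumptions made for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return exactly 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_1d(x, y, lam):
    """Minimizer of 0.5 * sum((y_i - b * x_i)^2) + lam * |b| (no intercept)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return soft_threshold(sxy, lam) / sxx


def deletion_influence(x, y, lam, k):
    """Change in the estimate when observation k is dropped (refit, not analytic)."""
    x_k = x[:k] + x[k + 1:]
    y_k = y[:k] + y[k + 1:]
    return lasso_1d(x, y, lam) - lasso_1d(x_k, y_k, lam)


# Toy data: the last response is an outlier.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 3.0, 20.0]
for k in range(len(x)):
    print(k, deletion_influence(x, y, 1.0, k))
```

With a larger penalty (for example `lam = 20.0` on this toy data), the coefficient is nonzero on the full sample but exactly zero once the outlier is removed, so a single observation can decide whether the variable is selected.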

Iterative projection of sliced inverse regression with fused approach

  • Han, Hyoseon;Cho, Youyoung;Yoo, Jae Keun
    • Communications for Statistical Applications and Methods / v.28 no.2 / pp.205-215 / 2021
  • Sufficient dimension reduction is a useful dimension reduction tool in regression, and sliced inverse regression (Li, 1991) is one of the most popular sufficient dimension reduction methodologies. In spite of its popularity, it is known to be sensitive to the number of slices. To overcome this shortcoming, the so-called fused sliced inverse regression was proposed by Cook and Zhang (2014). Unfortunately, the two existing methods have no direct application to large p-small n regression, in which dimension reduction is desperately needed. In this paper, we propose seeded sliced inverse regression and seeded fused sliced inverse regression to overcome this deficit by adopting the iterative projection approach (Cook et al., 2007). Numerical studies are presented to study their asymptotic estimation behaviors, and real data analysis confirms their practical usefulness in high-dimensional data analysis.

A Study on the Power Comparison between Logistic Regression and Offset Poisson Regression for Binary Data

  • Kim, Dae-Youb;Park, Heung-Sun
    • Communications for Statistical Applications and Methods / v.19 no.4 / pp.537-546 / 2012
  • In this paper, Poisson regression with offset and logistic regression are compared via simulations with respect to power for analyzing binary data. The Poisson distribution can be used as an approximation of the binomial distribution when n is large and p is small; however, we investigate whether the same conditions hold for the power of significance tests between logistic regression and offset Poisson regression. The result is that when the offset size is large for rare events, offset Poisson regression has power similar to logistic regression, and it has acceptable power even with a moderate prevalence rate. However, with a small offset size (< 10), offset Poisson regression should be used with caution for rare events or common events. These results would be good guidelines for users who want to use offset Poisson regression models for binary data.
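
The premise that a Poisson distribution with mean np approximates a binomial distribution when n is large and p is small can be checked numerically. The sketch below compares the two probability mass functions pointwise; it illustrates only this approximation, not the power simulations of the paper. The function names and the chosen (n, p) pairs are illustrative assumptions.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def max_abs_gap(n, p, kmax=20):
    """Largest pointwise gap between Binomial(n, p) and Poisson(n * p) for k <= kmax."""
    lam = n * p
    return max(
        abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(min(n, kmax) + 1)
    )


# Both settings have mean n * p = 2, but only the first has large n and small p.
print(max_abs_gap(1000, 0.002))
print(max_abs_gap(10, 0.2))
```

The gap is much smaller in the first setting, which matches the stated condition for the approximation.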

The Design of New Optical Switching Networks for Efficient Data Transmission in BcN (BcN 망에서 효율적인 데이터 전송을 위한 새로운 개념의 광 교환망 설계)

  • Lee SeoungYoung;Park Hong-Shik
    • Journal of the Institute of Electronics Engineers of Korea TC / v.42 no.12 / pp.31-36 / 2005
  • In this paper, we propose a new optical switching system as an infrastructure of the BcN, in which high traffic volume is expected due to multimedia services such as P2P services. The JET protocol, the most popular protocol in the OBS (Optical Burst Switching) research area, has a high burst blocking probability, and the resulting low throughput at the TCP layer hinders its commercialization in real networks. To reduce the high blocking rate in OBS networks, we segment a large network into small networks and perform burst scheduling to avoid burst loss. By using the proposed scheme, Internet providers can reduce network deployment costs in metro networks as well as in large mesh core networks.

Tutorial: Methodologies for sufficient dimension reduction in regression

  • Yoo, Jae Keun
    • Communications for Statistical Applications and Methods / v.23 no.2 / pp.105-117 / 2016
  • In this paper, as a sequel to the first tutorial, we discuss sufficient dimension reduction methodologies used to estimate the central subspace (sliced inverse regression, sliced average variance estimation), the central mean subspace (ordinary least squares, principal Hessian direction, iterative Hessian transformation), and the central $k^{th}$-moment subspace (covariance method). Large-sample tests to determine the structural dimensions of the three target subspaces are well derived in most of the methodologies; however, a permutation test (which does not require large-sample distributions) is introduced. The test can be applied to the methodologies discussed in the paper. Theoretical relationships among the sufficient dimension reduction methodologies are also investigated, and real data analysis is presented for illustration purposes. A seeded dimension reduction approach is then introduced so that the methodologies can be applied to large p-small n regressions.

Effect of different arch widths on the accuracy of three intraoral scanners

  • Kaewbuasa, Narin;Ongthiemsak, Chakree
    • The Journal of Advanced Prosthodontics / v.13 no.4 / pp.205-215 / 2021
  • PURPOSE. The purpose of this study was to compare the accuracy of three intraoral scanner (IOS) systems with three different dental arch widths. MATERIALS AND METHODS. Three dental models with different intermolar widths (small, medium, and large) were attached to metal bars of different lengths (30, 40, and 50 mm). The bars were measured with a coordinate measuring machine and used as references. Three IOSs were compared: TRIOS 3 (TRI), True Definition (TD), and Dental Wings (DW). The relative length and angular deviation of both ends of the metal bars from the scan data set (n = 15) were calculated and analyzed. RESULTS. Comparing among scanners in terms of trueness, the relative length deviations of DW in the small (1.28%) and medium (1.08%) arches were significantly higher than those of TRI (0.46% and 0.48%) and TD (0.33% and 0.18%). The angular deviations of DW in the small (1.75°) and medium (1.83°) arches were also significantly greater than those of TRI (0.63° and 0.40°) and TD (0.55° and 0.89°). Comparing within scanner, the large arch of DW showed better accuracy than the other arch sizes (P < .05). On the other hand, the larger arch of TD presented a greater tendency of angular deviation in terms of trueness. No significant differences were found in terms of trueness between the arch widths of the TRI group. CONCLUSION. The different widths of the dental arches can affect the accuracy of some intraoral scanners in full arch scans.

Identifying the Effect of Personal, Foodservice and Organizational Characteristics on Foodservice Managers' Job Satisfaction by the Contract Management Company Scale (위탁급식업체 규모에 따른 급식관리자 직무만족에 영향을 미치는 개인, 급식소 및 조직특성 분석)

  • Han, Jeong-Hye;Yi, Na-Young;Hong, Wan-Soo
    • Korean Journal of Community Nutrition / v.14 no.2 / pp.216-228 / 2009
  • The purpose of the study was to investigate the influences of contract foodservice managers' personal characteristics, foodservice characteristics and organizational characteristics on job satisfaction, including interpersonal relationships, self-actualization and promotion opportunity categories. A survey was administered to four hundred contract foodservice managers of five large companies and five small/medium companies in the Seoul and Kyungin areas. The final response rate was 66% (N=265), and the data were analyzed using SPSS for Windows (ver. 12.0). The respondents were 76.1% female, had an average age of 28.8 years, and 73.0% were regular workers. The contract foodservices mostly had profit and loss contracts (69.1%), single menu types (59.6%) and buffet serving styles (37.7%). There were significant differences in job satisfaction by some personal characteristic variables (gender, marital status, age, education, position, work hours, period of working for the present company, and payroll per year) and foodservice characteristic variables (type of contract and charge of food costs). Among the three job satisfaction categories, foodservice managers reported the highest satisfaction with interpersonal relationships, followed by self-actualization and promotion opportunity, in both large companies and small/medium companies. However, foodservice managers of large companies tended to be more satisfied with their promotion opportunities than foodservice managers of small/medium companies (p<0.05). Work hours, number of meals served per day, being male, workload, communication with the clients, relationship with co-workers, obvious role and autonomy were significant factors that increased job satisfaction in contract foodservices of large companies. On the other hand, relationships with co-workers and being male were significant factors that increased job satisfaction in contract foodservices of small/medium companies. This research suggests that contract foodservice companies need to understand the characteristics of their managers, foodservices and organizations to enhance the job satisfaction of foodservice managers and to develop specified human resource management strategies that can be applied to each company scale.