• Title/Summary/Keyword: group Lasso


Study on the Anthropometric and Body Composition Indices for Prediction of Cold and Heat Pattern

  • Mun, Sujeong;Park, Kihyun;Lee, Siwoo
    • The Journal of Korean Medicine
    • /
    • v.42 no.4
    • /
    • pp.185-196
    • /
    • 2021
  • Objectives: Many symptoms of cold and heat patterns are related to the thermoregulation of the body. Thus, we aimed to study the association of cold and heat patterns with anthropometry/body composition. Methods: The cold and heat patterns of 2000 individuals aged 30-55 years were evaluated using a self-administered questionnaire. Results: Among the anthropometric and body composition variables, body mass index (-0.37, 0.39) and fat mass index (-0.35, 0.38) had the highest correlation coefficients with the cold and heat pattern scores after adjustment for age and sex in the cold-heat group, while the correlation coefficients were relatively lower in the non-cold-heat group. In the cold-heat group, the most parsimonious model for the cold pattern with the variables selected by the best subset method and Lasso included sex, body mass index, waist-hip ratio, and extracellular water/total body water (adjusted R2 = 0.324), and the model for heat pattern additionally included age (adjusted R2 = 0.292). Conclusions: The variables related to obesity and water balance were the most useful for predicting cold and heat patterns. Further studies are required to improve the performance of prediction models.
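The lasso variable-selection step described in this abstract can be sketched in a few lines. The predictors and the cold-pattern score below are synthetic stand-ins for the paper's sex, BMI, waist-hip ratio, and ECW/TBW variables, not its data:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n = 200
# Four hypothetical predictors standing in for sex, BMI, waist-hip ratio, ECW/TBW
X = rng.normal(size=(n, 4))
# Hypothetical cold-pattern score: only the BMI-like and water-balance-like
# columns carry signal
y = -0.6 * X[:, 1] + 0.4 * X[:, 3] + rng.normal(scale=0.5, size=n)

model = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(model.coef_)  # indices of predictors the lasso retains
```

Cross-validated lasso of this kind selects a parsimonious subset automatically, which is what lets the paper compare it against best-subset search.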

Network-based regularization for analysis of high-dimensional genomic data with group structure (그룹 구조를 갖는 고차원 유전체 자료 분석을 위한 네트워크 기반의 규제화 방법)

  • Kim, Kipoong;Choi, Jiyun;Sun, Hokeun
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.6
    • /
    • pp.1117-1128
    • /
    • 2016
  • In genetic association studies with high-dimensional genomic data, regularization procedures based on penalized likelihood are often applied to identify genes or genetic regions associated with diseases or traits. A network-based regularization procedure can utilize biological network information (such as genetic pathways and signaling pathways in genetic association studies) and offers selection performance superior to other regularization procedures such as lasso and elastic-net. However, network-based regularization is limited in that it cannot be applied to high-dimensional genomic data with a group structure. In this article, we propose combining data dimension reduction techniques, such as principal component analysis and partial least squares, with network-based regularization for the analysis of high-dimensional genomic data with a group structure. The selection performance of the proposed method was evaluated by extensive simulation studies. The proposed method was also applied to real DNA methylation data generated from the Illumina Infinium HumanMethylation27K BeadChip, where methylation beta values of around 20,000 CpG sites over 12,770 genes were compared between 123 ovarian cancer patients and 152 healthy controls. This analysis also identified a few cancer-related genes.
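A much-simplified sketch of the idea of combining dimension reduction with penalized regression: each gene's block of CpG sites is summarized by its first principal component, and a penalized model is then fit on the gene-level scores. The data are synthetic, and an ordinary lasso stands in for the network-based penalty, which requires pathway adjacency information not reproduced here:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, p_per_gene, n_genes = 150, 10, 30
# Hypothetical methylation beta values: each "gene" contributes a block of CpG sites
X = rng.uniform(size=(n, p_per_gene * n_genes))
# Hypothetical case/control label driven by the first gene's CpG block
y = (X[:, :p_per_gene].mean(axis=1) + 0.1 * rng.normal(size=n) > 0.5).astype(int)

# Summarize each gene's CpG block by its first principal component score
scores = np.column_stack(
    [PCA(n_components=1).fit_transform(X[:, g*p_per_gene:(g+1)*p_per_gene]).ravel()
     for g in range(n_genes)]
)
# Penalized regression on gene-level scores (lasso stands in for the network penalty)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(scores, y)
```

Replacing the first principal component with a partial-least-squares score, or the lasso with a network-informed penalty, follows the same two-stage pattern.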

ADMM for least square problems with pairwise-difference penalties for coefficient grouping

  • Park, Soohee;Shin, Seung Jun
    • Communications for Statistical Applications and Methods
    • /
    • v.29 no.4
    • /
    • pp.441-451
    • /
    • 2022
  • In the era of big data, scalability is a crucial issue in learning models. Among many others, the Alternating Direction Method of Multipliers (ADMM, Boyd et al., 2011) algorithm has gained great popularity for solving large-scale problems efficiently. In this article, we propose applying the ADMM algorithm to solve the least square problem penalized by the pairwise-difference penalty, which is frequently used to identify group structures among coefficients. The ADMM algorithm enables us to solve the high-dimensional problem efficiently in a unified fashion and thus allows us to employ several different types of penalty functions, such as LASSO, Elastic Net, SCAD, and MCP, for the penalized problem. Additionally, the ADMM algorithm extends naturally to distributed computation and real-time updates, both desirable when dealing with large amounts of data.
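The ADMM updates for a pairwise-difference (generalized-lasso-type) penalty can be written down directly. Below is a minimal illustration on synthetic data using the L1 version of the penalty; it is not the authors' implementation and omits refinements such as over-relaxation or adaptive step sizes:

```python
import numpy as np
from itertools import combinations

def soft(v, t):
    """Elementwise soft-thresholding operator."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_pairwise(X, y, lam=1.0, rho=1.0, n_iter=200):
    """ADMM sketch for (1/2)||y - Xb||^2 + lam * sum_{j<k} |b_j - b_k|."""
    n, p = X.shape
    pairs = list(combinations(range(p), 2))
    D = np.zeros((len(pairs), p))           # pairwise-difference operator, z = Db
    for r, (j, k) in enumerate(pairs):
        D[r, j], D[r, k] = 1.0, -1.0
    b, z, u = np.zeros(p), np.zeros(len(pairs)), np.zeros(len(pairs))
    A = np.linalg.inv(X.T @ X + rho * D.T @ D)  # cached for the b-update
    for _ in range(n_iter):
        b = A @ (X.T @ y + rho * D.T @ (z - u))  # b-update: ridge-like solve
        z = soft(D @ b + u, lam / rho)           # z-update: soft-thresholding
        u = u + D @ b - z                        # dual update
    return b

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
true_b = np.array([2.0, 2.0, -1.0, -1.0])  # two coefficient groups
y = X @ true_b + 0.1 * rng.normal(size=100)
b_hat = admm_pairwise(X, y, lam=2.0)
```

Swapping the soft-thresholding in the z-update for the SCAD or MCP proximal operator yields the other penalties the abstract mentions, which is the "unified fashion" the paper exploits.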

Development of a Machine-Learning Predictive Model for First-Grade Children at Risk for ADHD (머신러닝 분석을 활용한 초등학교 1학년 ADHD 위험군 아동 종단 예측모형 개발)

  • Lee, Dongmee;Jang, Hye In;Kim, Ho Jung;Bae, Jin;Park, Ju Hee
    • Korean Journal of Childcare and Education
    • /
    • v.17 no.5
    • /
    • pp.83-103
    • /
    • 2021
  • Objective: This study aimed to develop a longitudinal predictive model that identifies first-grade children who are at risk for ADHD and to investigate the factors that predict the probability of belonging to the at-risk group for ADHD by using machine learning. Methods: The data of 1,445 first-grade children from the 1st, 3rd, 6th, 7th, and 8th waves of the Korean Children's Panel were analyzed. The outcome variable was membership in the at-risk or non-risk group for ADHD, as classified by the CBCL DSM-ADHD scale. Prenatal as well as developmental factors during infancy and early childhood were used as input factors. Results: The model that best classified the at-risk and non-risk groups for ADHD was the LASSO model. The input factors that increased the probability of being in the at-risk group for ADHD were temperament of negative emotionality, communication abilities, gross motor skills, social competence, and academic readiness. Conclusion/Implications: The outcomes indicate that children who showed specific risk indicators during infancy and early childhood are likely to be classified as at risk for ADHD when entering elementary school. The results may enable parents and clinicians to identify children with ADHD early by observing early signs and thus provide interventions as early as possible.
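The LASSO classification step can be illustrated with L1-penalized logistic regression. The eight predictors below are synthetic stand-ins for panel variables such as negative emotionality or communication ability, not the Korean Children's Panel data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 400
# Hypothetical early-childhood predictors plus uninformative noise columns
X = rng.normal(size=(n, 8))
# Hypothetical at-risk label generated from three of the predictors
logit = 1.2 * X[:, 0] - 0.8 * X[:, 1] - 0.6 * X[:, 4]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

lasso_clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
acc = cross_val_score(lasso_clf, X, y, cv=5).mean()  # cross-validated accuracy
```

The L1 penalty shrinks the coefficients of uninformative inputs toward zero, which is why the fitted model doubles as a screen for which early indicators matter.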

Comparison of covariance thresholding methods in gene set analysis

  • Park, Sora;Kim, Kipoong;Sun, Hokeun
    • Communications for Statistical Applications and Methods
    • /
    • v.29 no.5
    • /
    • pp.591-601
    • /
    • 2022
  • In gene set analysis with microarray expression data, a group of genes such as a gene regulatory pathway or a signaling pathway is often tested for the presence of either differentially expressed (DE) or differentially co-expressed (DC) genes between two biological conditions. Recently, a statistical test based on covariance estimation has been proposed to identify DC genes. In particular, covariance regularization by hard thresholding improved the power of the test when the proportion of DC genes within a biological pathway is relatively small. In this article, we compare covariance thresholding methods using four different regularization penalties: lasso, hard, smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP). In our extensive simulation studies, we found that both SCAD and MCP thresholding methods can outperform the hard thresholding method when the proportion of DC genes is extremely small and the number of genes in a biological pathway is much greater than the sample size. We also applied the four thresholding methods to three different microarray gene expression data sets, related to mutant p53 transcriptional activity and to epithelial and stromal breast cancer, to compare the genetic pathways identified by each method.
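The hard and SCAD thresholding operators compared above can be written down directly, applied elementwise to the off-diagonal entries of a covariance estimate. The matrix below is a toy example, and the SCAD rule follows the standard three-piece form with a = 3.7:

```python
import numpy as np

def hard_threshold(S, t):
    """Hard thresholding: keep entries with |s| > t, zero the rest, keep diagonal."""
    T = np.where(np.abs(S) > t, S, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T

def scad_threshold(S, t, a=3.7):
    """SCAD thresholding applied elementwise, preserving the diagonal."""
    A = np.abs(S)
    soft = np.sign(S) * np.maximum(A - t, 0.0)            # |s| <= 2t: soft
    mid = ((a - 1) * S - np.sign(S) * a * t) / (a - 2)    # 2t < |s| <= at
    T = np.where(A <= 2 * t, soft, np.where(A <= a * t, mid, S))  # |s| > at: keep
    np.fill_diagonal(T, np.diag(S))
    return T

# Toy covariance: one weak (0.05) and one strong (0.9) off-diagonal entry
S = np.array([[1.0, 0.05, 0.9],
              [0.05, 1.0, 0.0],
              [0.9, 0.0, 1.0]])
T_hard = hard_threshold(S, 0.1)
T_scad = scad_threshold(S, 0.1)
```

Unlike hard thresholding, SCAD transitions smoothly between shrinking small entries and leaving large ones untouched, which is the behavior the comparison in the abstract turns on.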

The Prediction Ability of Genomic Selection in the Wheat Core Collection

  • Yuna Kang;Changsoo Kim
    • Proceedings of the Korean Society of Crop Science Conference
    • /
    • 2022.10a
    • /
    • pp.235-235
    • /
    • 2022
  • Genomic selection is a promising tool for plant and animal breeding that uses genome-wide molecular marker data to capture large- and small-effect quantitative trait loci and predict the genetic value of selection candidates. Genomic selection has previously been shown to achieve higher prediction accuracy than conventional marker-assisted selection (MAS) for quantitative traits. In this study, the prediction accuracy of 10 agricultural traits was compared in a wheat core collection of 567 accessions. We used a cross-validation approach to train and validate prediction accuracy and to evaluate the effects of training population size and training model. Regarding the training model, all models except SVM among the six models used (GBLUP, LASSO, BayesA, RKHS, SVM, RF) achieved a prediction accuracy of 0.4 or more for most traits. For traits such as days to heading and days to maturity, the prediction accuracy was very high, over 0.8. Regarding the training population, prediction accuracy increased with the number of training individuals for all traits. Prediction accuracy also differed with the genetic composition of the training population, regardless of its size. All training models were verified through 5-fold cross-validation. To verify the prediction ability of the training population of the wheat core collection, we compared actual phenotypes and genomic estimated breeding values in a breeding population of 35 individuals. Of the 10 individuals with the earliest days to heading, 5 were selected through genomic selection, as were 6 of the 10 individuals with the latest days to heading. Therefore, we confirmed that genomic selection makes it possible to select individuals for target traits from genotype alone within a shorter period of time.
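A GBLUP-style genomic prediction can be sketched with ridge regression on a marker matrix, scoring 5-fold cross-validated accuracy as the correlation between observed phenotype and genomic estimated breeding value (GEBV). The marker counts, effect sizes, and implied heritability below are illustrative assumptions, not the study's wheat data:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(4)
n, m = 300, 500  # hypothetical accessions x markers
# Marker matrix coded 0/1/2 (minor-allele counts)
X = rng.integers(0, 3, size=(n, m)).astype(float)
# Sparse true marker effects plus environmental noise
effects = np.zeros(m)
effects[rng.choice(m, 20, replace=False)] = rng.normal(scale=0.5, size=20)
y = X @ effects + rng.normal(scale=1.0, size=n)  # phenotype, e.g. days to heading

# Ridge regression on markers is equivalent to GBLUP; 5-fold CV accuracy is
# scored as corr(observed phenotype, predicted GEBV)
gebv = cross_val_predict(Ridge(alpha=100.0), X, y, cv=5)
accuracy = np.corrcoef(y, gebv)[0, 1]
```

Swapping `Ridge` for a lasso or a Bayesian shrinkage prior gives the other models in the abstract's comparison; the cross-validation harness stays the same.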


Hierarchically penalized sparse principal component analysis (계층적 벌점함수를 이용한 주성분분석)

  • Kang, Jongkyeong;Park, Jaeshin;Bang, Sungwan
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.1
    • /
    • pp.135-145
    • /
    • 2017
  • Principal component analysis (PCA) describes the variation of multivariate data in terms of a set of uncorrelated variables. Since each principal component is a linear combination of all variables and the loadings are typically non-zero, it is difficult to interpret the derived principal components. Sparse principal component analysis (SPCA) is a specialized technique using the elastic net penalty function to produce sparse loadings in principal component analysis. When data are structured by groups of variables, it is desirable to select variables in a grouped manner. In this paper, we propose a new PCA method to improve variable selection performance when variables are grouped, which not only selects important groups but also removes unimportant variables within identified groups. To incorporate group information into model fitting, we consider a hierarchical lasso penalty instead of the elastic net penalty in SPCA. Real data analyses demonstrate the performance and usefulness of the proposed method.
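A rough sketch of sparse loadings on grouped variables using scikit-learn's `SparsePCA` (which uses an L1 penalty on the loadings). Note this is the plain sparse PCA baseline the paper starts from, not its hierarchically penalized method, and the grouped data are synthetic:

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(5)
n = 100
# Two latent factors, each driving its own group of four observed variables
f1, f2 = rng.normal(size=(2, n))
X = np.column_stack([np.outer(f1, np.ones(4)) + 0.1 * rng.normal(size=(n, 4)),
                     np.outer(f2, np.ones(4)) + 0.1 * rng.normal(size=(n, 4))])

spca = SparsePCA(n_components=2, alpha=1.0, random_state=0).fit(X)
loadings = spca.components_  # sparse: each component should load on one group
```

The hierarchical lasso penalty the paper proposes would additionally zero out unimportant variables within a selected group, a behavior plain SPCA with an elastic-net-style penalty does not guarantee.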