• Title/Summary/Keyword: lasso

Search Result 173

Controlling the false discovery rate in sparse VHAR models using knockoffs (KNOCKOFF를 이용한 성근 VHAR 모형의 FDR 제어)

  • Minsu, Park;Jaewon, Lee;Changryong, Baek
    • The Korean Journal of Applied Statistics / v.35 no.6 / pp.685-701 / 2022
  • The false discovery rate (FDR) is widely used in high-dimensional inference because it is a more liberal criterion than the family-wise error rate (FWER), which controls Type I errors very conservatively. This paper proposes a method for estimating sparse VHAR models that controls the FDR by adapting the knockoff procedure introduced by Barber and Candès (2015). We also compare the knockoff approach with the conventional adaptive Lasso (AL) through an extensive simulation study. AL shows sparsistency and decent forecasting performance, but it does not control the FDR satisfactorily; specifically, AL tends to estimate zero coefficients as non-zero. The knockoff approach, on the other hand, keeps the FDR below the desired level, but it selects models that are too sparse when the sample size is small. Its performance improves dramatically as the sample size increases and the model becomes sparser.
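The selection step of a knockoff filter can be sketched as follows. This is a minimal illustration of the knockoff+ threshold from Barber and Candès (2015), not the VHAR-specific procedure of the paper above; the feature statistics `w` and the target level `q` are hypothetical values chosen for the example, and the construction of the knockoff variables themselves is not shown.

```python
def knockoff_plus_threshold(w, q):
    """Smallest t among the nonzero |W_j| whose estimated FDP is at most q."""
    candidates = sorted({abs(v) for v in w if v != 0})
    for t in candidates:
        negatives = sum(1 for v in w if v <= -t)  # proxy for false discoveries
        positives = sum(1 for v in w if v >= t)   # features that would be selected
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")  # no threshold meets the target level


def select(w, q):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, v in enumerate(w) if v >= t]


# Hypothetical statistics: large positive values suggest a real signal.
w = [5.0, 4.2, -0.3, 3.1, 0.0, -1.0, 2.5, 0.4, 6.0, 3.8]
selected_loose = select(w, q=0.25)
selected_strict = select(w, q=0.10)
```

With only ten statistics, the `1 +` term in the numerator means the strict level `q=0.10` cannot be met, so nothing is selected. This illustrates, in a toy setting, how a knockoff filter can return a very sparse model when little information is available.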

A Study of the Application of Machine Learning Methods in the Low-GloSea6 Weather Prediction Solution (Low-GloSea6 기상 예측 소프트웨어의 머신러닝 기법 적용 연구)

  • Hye-Sung Park;Ye-Rin, Cho;Dae-Yeong Shin;Eun-Ok Yun;Sung-Wook Chung
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.5 / pp.307-314 / 2023
  • As supercomputing and hardware technologies advance, climate prediction models are improving. The Korean Meteorological Administration adopted GloSea5 from the UK Met Office and now operates an updated GloSea6 tailored to Korean weather. Universities and research institutions use Low-GloSea6 on smaller servers, improving accessibility and research efficiency. In this paper, profiling Low-GloSea6 on smaller servers identified the tri_sor_dp_dp subroutine in tri_sor.F90, part of the atmospheric model, as a CPU-intensive hotspot. Applying linear regression, a type of machine learning, to this function showed promise. After removing outliers, the linear regression model achieved an RMSE of 2.7665e-08 and an MAE of 1.4958e-08, outperforming Lasso and ElasticNet regression methods. This suggests the potential for machine learning in optimizing identified hotspots during Low-GloSea6 execution.
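For readers unfamiliar with the two error metrics quoted above, the sketch below shows how RMSE and MAE are usually computed. The data are made-up numbers for illustration only and have no connection to the Low-GloSea6 experiments.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the errors."""
    n = len(y_true)
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / n


# Hypothetical targets and predictions with a single large miss.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
example_rmse = rmse(y_true, y_pred)
example_mae = mae(y_true, y_pred)
```

Because RMSE squares each error before averaging, a single large miss raises it more than it raises MAE, which is why the two metrics are often reported together.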

Study on Failure Classification of Missile Seekers Using Inspection Data from Production and Manufacturing Phases (생산 및 제조 단계의 검사 데이터를 이용한 유도탄 탐색기의 고장 분류 연구)

  • Ye-Eun Jeong;Kihyun Kim;Seong-Mok Kim;Youn-Ho Lee;Ji-Won Kim;Hwa-Young Yong;Jae-Woo Jung;Jung-Won Park;Yong Soo Kim
    • Journal of Korean Society of Industrial and Systems Engineering / v.47 no.2 / pp.30-39 / 2024
  • This study introduces a novel approach for identifying potential failure risks in missile manufacturing by leveraging Quality Inspection Management (QIM) data, addressing the challenges of a dataset with 666 variables and class imbalance. Using SMOTE for data augmentation and Lasso regression for dimensionality reduction, followed by a Random Forest model, yields 99.40% accuracy in classifying missiles with a high likelihood of failure. Such measures enable the preemptive identification of missiles at heightened risk of failure, thereby reducing the risk of field failures and extending missile life. The combination of Lasso regression and Random Forest is used to pinpoint critical variables and test items that significantly affect failure, with particular emphasis on variables related to performance and connection resistance. Moreover, the research highlights the potential for broadening data-driven decision-making within quality control systems, including refining maintenance strategies and adjusting control limits for essential test items.

Comparison of radiomics prediction models for lung metastases according to four semiautomatic segmentation methods in soft-tissue sarcomas of the extremities

  • Heesoon Sheen;Han-Back Shin;Jung Young Kim
    • Journal of the Korean Physical Society / v.80 / pp.247-256 / 2022
  • Our objective was to investigate radiomics signatures and prediction models defined by four segmentation methods using 2-[18F]fluoro-2-deoxy-d-glucose positron emission tomography (18F-FDG PET) imaging of lung metastases of soft-tissue sarcomas (STSs). For this purpose, three fixed threshold methods using the standardized uptake value (SUV) and gradient-based edge detection (ED) were used for tumor delineation on the PET images of STSs. The Dice coefficients (DCs) of the segmentation methods were compared. Least absolute shrinkage and selection operator (LASSO) regression, Spearman's rank correlation, and Friedman's ANOVA test were used for selection and validation of radiomics features. The developed radiomics models were assessed using receiver operating characteristic (ROC) curves and confusion matrices. According to the results, the DC values showed the biggest difference between SUV40% and other segmentation methods (DC: 0.55 and 0.59). Grey-level run-length matrix_run-length nonuniformity (GLRLM_RLNU) was a common radiomics signature extracted by all segmentation methods. The multivariable logistic regression of ED showed the highest area under the ROC curve (AUC), sensitivity, specificity, and accuracy (AUC: 0.88, sensitivity: 0.85, specificity: 0.74, accuracy: 0.81). In our research, the ED method was able to derive a significant radiomics model. GLRLM_RLNU, which was selected by all segmentation methods as a meaningful feature, was considered a clear radiomics feature associated with tumor heterogeneity and aggressiveness. Our results show that radiomics signatures have the potential to uncover tumor characteristics.
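The Dice coefficient used above to compare segmentations can be sketched as follows. This is a generic illustration that represents each segmentation as a set of voxel coordinates; the two toy masks are hypothetical and are not taken from the study.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two sets of voxel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical 2-D masks given as (row, column) voxel coordinates.
mask_threshold = {(0, 0), (0, 1), (1, 0), (1, 1)}
mask_edge = {(0, 1), (1, 1), (2, 1)}
overlap = dice_coefficient(mask_threshold, mask_edge)
```

A value of 1 means the two delineations coincide exactly and 0 means they do not overlap at all, so values such as the 0.55 and 0.59 reported above indicate only partial agreement.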

The Doubly Regularized Quantile Regression

  • Choi, Ho-Sik;Kim, Yong-Dai
    • Communications for Statistical Applications and Methods / v.15 no.5 / pp.753-764 / 2008
  • The $L_1$ regularized estimator in quantile problems conducts parameter estimation and model selection simultaneously and has been shown to perform well. However, it has a drawback: when there are several highly correlated variables, it tends to pick only a few of them. To compensate, the proposed method adopts a doubly regularized framework that mixes $L_1$ and $L_2$ norms. As a result, the proposed method can select significant variables and encourage highly correlated variables to be selected together. One of the most appealing features of the new algorithm is that it constructs the entire solution path of the doubly regularized quantile estimator. We investigate its performance through simulations and real data analysis.
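One common way to write a doubly regularized quantile regression objective is shown below. The notation ($\beta_0$, $\beta$, $\lambda_1$, $\lambda_2$, $\tau$) and the factor $1/2$ on the ridge term are assumptions made for illustration; the abstract does not give the exact parameterization used in the paper.

```latex
\min_{\beta_0,\,\beta}\;
  \sum_{i=1}^{n} \rho_\tau\!\left(y_i - \beta_0 - x_i^{\top}\beta\right)
  + \lambda_1 \lVert \beta \rVert_1
  + \frac{\lambda_2}{2} \lVert \beta \rVert_2^2,
\qquad
\rho_\tau(u) = u\,\bigl(\tau - I(u < 0)\bigr)
```

Here $\rho_\tau$ is the check loss for the $\tau$-th quantile. The $L_1$ term sets some coefficients exactly to zero, while the $L_2$ term shrinks the coefficients of highly correlated variables toward each other, which encourages them to enter the model together.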

Prediction of Quantitative Traits Using Common Genetic Variants: Application to Body Mass Index

  • Bae, Sunghwan;Choi, Sungkyoung;Kim, Sung Min;Park, Taesung
    • Genomics & Informatics / v.14 no.4 / pp.149-159 / 2016
  • With the success of genome-wide association studies (GWASs), many candidate loci for complex human diseases have been reported in the GWAS catalog. Recently, many disease prediction models based on penalized regression or statistical learning methods have been proposed using candidate causal variants from significant single-nucleotide polymorphisms of GWASs. However, there have been only a few systematic studies comparing existing methods. In this study, we first constructed risk prediction models, such as stepwise linear regression (SLR), least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN), using a GWAS chip and GWAS catalog. We then compared prediction accuracy by calculating the mean square error (MSE) on body mass index data from the Korea Association Resource (KARE). Our results show that SLR provides a smaller MSE than the other methods, while the numbers of selected variables in each model were similar.

Relative Error Prediction via Penalized Regression (벌점회귀를 통한 상대오차 예측방법)

  • Jeong, Seok-Oh;Lee, Seo-Eun;Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.28 no.6 / pp.1103-1111 / 2015
  • This paper presents a new prediction method that combines relative error with penalized regression. The proposed method is a fully data-driven procedure that is fast, simple, and easy to implement. A real data analysis and simulation results are given to show that the proposed approach works in practice.

Two-Stage Penalized Composite Quantile Regression with Grouped Variables

  • Bang, Sungwan;Jhun, Myoungshic
    • Communications for Statistical Applications and Methods / v.20 no.4 / pp.259-270 / 2013
  • This paper considers a penalized composite quantile regression (CQR) that performs variable selection in the linear model with grouped variables. An adaptive sup-norm penalized CQR (ASCQR) is proposed to select variables in a grouped manner, and the consistency and oracle property of the resulting estimator are derived under some regularity conditions. To improve the efficiency of estimation and variable selection, this paper suggests the two-stage penalized CQR (TSCQR), which uses the ASCQR to select relevant groups in the first stage and the adaptive lasso penalized CQR to select important variables in the second stage. Simulation studies are conducted to illustrate the finite sample performance of the proposed methods.

A note on standardization in penalized regressions

  • Lee, Sangin
    • Journal of the Korean Data and Information Science Society / v.26 no.2 / pp.505-516 / 2015
  • We consider sparse high-dimensional linear regression models. Penalized regressions have been used as effective methods for variable selection and estimation in high-dimensional models. In penalized regressions, it is common practice to standardize the variables and then fit the penalized model to the standardized variables. Finally, the estimated coefficients are transformed back to the scale of the original variables. However, this procedure produces a slightly different solution from that of the corresponding original penalized problem. In this paper, we investigate issues concerning the standardization of variables in penalized regressions and formulate the definition of the standardized penalized estimator. In addition, we compare the original penalized estimator with the standardized penalized estimator through simulation studies and real data analysis.
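The effect of standardization can be seen in a toy case. The sketch below assumes a single centered predictor, no intercept, and the lasso objective (1/2n)·||y − xβ||² + λ|β|, for which the solution has a closed form via soft-thresholding. The data, the penalty level, and the function names are hypothetical and are not taken from the paper.

```python
import math


def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning 0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_1d(x, y, lam):
    """Closed-form lasso coefficient for one centered predictor."""
    n = len(x)
    cov = sum(a * b for a, b in zip(x, y)) / n
    var = sum(a * a for a in x) / n
    return soft_threshold(cov, lam) / var


def lasso_1d_standardized(x, y, lam):
    """Scale x to unit variance, fit, then map back to the original scale."""
    n = len(x)
    scale = math.sqrt(sum(a * a for a in x) / n)
    z = [a / scale for a in x]
    return lasso_1d(z, y, lam) / scale


# Hypothetical centered data with least-squares slope 0.5.
x = [-4.0, -2.0, 0.0, 2.0, 4.0]
y = [-2.0, -1.0, 0.0, 1.0, 2.0]
lam = 0.5
beta_original = lasso_1d(x, y, lam)
beta_standardized = lasso_1d_standardized(x, y, lam)
```

In this toy setting the standardized fit is equivalent to solving the original problem with penalty λ multiplied by the standard deviation of x, so the two coefficients differ unless that standard deviation equals 1.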

A study on the Characteristic of Piezoelectric Transformer for the Fluorescent Lamp ballast (형광등을 점타용 압전트랜스포머의 특성에 관한 연구)

  • 이용우;윤광희;류주현;서성제
    • Proceedings of the Korean Institute of Electrical and Electronic Material Engineers Conference / 1999.05a / pp.621-625 / 1999
  • The Rosen-type piezoelectric transformer for LCD backlights, which operates at high voltage and low current, may not be successfully used to light general fluorescent lamps because they require low voltage and high current. In this study, a piezoelectric transformer with a width vibration mode, operating at low voltage and high current, was designed for fluorescent lamp ballast applications. The step-up ratio and efficiency as functions of the load resistance indicated that the transformer can be used effectively in electronic ballasts for low-profile fluorescent lamps.
