• Title/Summary/Keyword: Lasso regularization

Search Result 17, Processing Time 0.023 seconds

Time delay estimation algorithm using Elastic Net (Elastic Net를 이용한 시간 지연 추정 알고리즘)

  • Jun-Seok Lim;Keunwa Lee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.42 no.4
    • /
    • pp.364-369
    • /
    • 2023
  • Time-delay estimation between two receivers is a technique that has been applied in a variety of fields, from underwater acoustics to room acoustics and robotics. There are two types of time delay estimation techniques: one that estimates the amount of time delay from the correlation between receivers, and the other that parametrically models the time delay between receivers and estimates the parameters by system recognition. The latter has the characteristic that only a small fraction of the system's parameters are directly related to the delay. This characteristic can be exploited to improve the accuracy of the estimation by methods such as Lasso regularization. However, in the case of Lasso regularization, the necessary information is lost. In this paper, we propose a method using Elastic Net that adds Ridge regularization to Lasso regularization to compensate for this. Comparing the proposed method with the conventional Generalized Cross Correlation (GCC) method and the method using Lasso regularization, we show that the estimation variance is very small even for white Gaussian signal sources and colored signal sources.

An Application of the Clustering Threshold Gradient Descent Regularization Method for Selecting Genes in Predicting the Survival Time of Lung Carcinomas

  • Lee, Seung-Yeoun;Kim, Young-Chul
    • Genomics & Informatics
    • /
    • v.5 no.3
    • /
    • pp.95-101
    • /
    • 2007
  • In this paper, we consider the variable selection methods in the Cox model when a large number of gene expression levels are involved with survival time. Deciding which genes are associated with survival time has been a challenging problem because of the large number of genes and relatively small sample size (n<

Weighted Least Absolute Deviation Lasso Estimator

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods
    • /
    • v.18 no.6
    • /
    • pp.733-739
    • /
    • 2011
  • The linear absolute shrinkage and selection operator(Lasso) method improves the low prediction accuracy and poor interpretation of the ordinary least squares(OLS) estimate through the use of $L_1$ regularization on the regression coefficients. However, the Lasso is not robust to outliers, because the Lasso method minimizes the sum of squared residual errors. Even though the least absolute deviation(LAD) estimator is an alternative to the OLS estimate, it is sensitive to leverage points. We propose a robust Lasso estimator that is not sensitive to outliers, heavy-tailed errors or leverage points.

A Study on Regularization Methods to Evaluate the Sediment Trapping Efficiency of Vegetative Filter Strips (식생여과대 유사 저감 효율 산정을 위한 정규화 방안)

  • Bae, JooHyun;Han, Jeongho;Yang, Jae E;Kim, Jonggun;Lim, Kyoung Jae;Jang, Won Seok
    • Journal of The Korean Society of Agricultural Engineers
    • /
    • v.61 no.6
    • /
    • pp.9-19
    • /
    • 2019
  • Vegetative Filter Strip (VFS) is the best management practice which has been widely used to mitigate water pollutants from agricultural fields by alleviating runoff and sediment. This study was conducted to improve an equation for estimating sediment trapping efficiency of VFS using several different regularization methods (i.e., ordinary least squares analysis, LASSO, ridge regression analysis and elastic net). The four different regularization methods were employed to develop the sediment trapping efficiency equation of VFS. Each regularization method indicated high accuracy in estimating the sediment trapping efficiency of VFS. Among the four regularization methods, the ridge method showed the most accurate results according to $R^2$, RMSE and MAPE which were 0.94, 7.31% and 14.63%, respectively. The equation developed in this study can be applied in watershed-scale hydrological models in order to estimate the sediment trapping efficiency of VFS in agricultural fields for an effective watershed management in Korea.

How to improve oil consumption forecast using google trends from online big data?: the structured regularization methods for large vector autoregressive model

  • Choi, Ji-Eun;Shin, Dong Wan
    • Communications for Statistical Applications and Methods
    • /
    • v.29 no.1
    • /
    • pp.41-51
    • /
    • 2022
  • We forecast the US oil consumption level taking advantage of google trends. The google trends are the search volumes of the specific search terms that people search on google. We focus on whether proper selection of google trend terms leads to an improvement in forecast performance for oil consumption. As the forecast models, we consider the least absolute shrinkage and selection operator (LASSO) regression and the structured regularization method for large vector autoregressive (VAR-L) model of Nicholson et al. (2017), which select automatically the google trend terms and the lags of the predictors. An out-of-sample forecast comparison reveals that reducing the high dimensional google trend data set to a low-dimensional data set by the LASSO and the VAR-L models produces better forecast performance for oil consumption compared to the frequently-used forecast models such as the autoregressive model, the autoregressive distributed lag model and the vector error correction model.

Adaptive lasso in sparse vector autoregressive models (Adaptive lasso를 이용한 희박벡터자기회귀모형에서의 변수 선택)

  • Lee, Sl Gi;Baek, Changryong
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.1
    • /
    • pp.27-39
    • /
    • 2016
  • This paper considers variable selection in the sparse vector autoregressive (sVAR) model where sparsity comes from setting small coefficients to exact zeros. In the estimation perspective, Davis et al. (2015) showed that the lasso type of regularization method is successful because it provides a simultaneous variable selection and parameter estimation even for time series data. However, their simulations study reports that the regular lasso overestimates the number of non-zero coefficients, hence its finite sample performance needs improvements. In this article, we show that the adaptive lasso significantly improves the performance where the adaptive lasso finds the sparsity patterns superior to the regular lasso. Some tuning parameter selections in the adaptive lasso are also discussed from the simulations study.

Effect of outliers on the variable selection by the regularized regression

  • Jeong, Junho;Kim, Choongrak
    • Communications for Statistical Applications and Methods
    • /
    • v.25 no.2
    • /
    • pp.235-243
    • /
    • 2018
  • Many studies exist on the influence of one or few observations on estimators in a variety of statistical models under the "large n, small p" setup; however, diagnostic issues in the regression models have been rarely studied in a high dimensional setup. In the high dimensional data, the influence of observations is more serious because the sample size n is significantly less than the number variables p. Here, we investigate the influence of observations on the least absolute shrinkage and selection operator (LASSO) estimates, suggested by Tibshirani (Journal of the Royal Statistical Society, Series B, 73, 273-282, 1996), and the influence of observations on selected variables by the LASSO in the high dimensional setup. We also derived an analytic expression for the influence of the k observation on LASSO estimates in simple linear regression. Numerical studies based on artificial data and real data are done for illustration. Numerical results showed that the influence of observations on the LASSO estimates and the selected variables by the LASSO in the high dimensional setup is more severe than that in the usual "large n, small p" setup.

Multiple Group Testing Procedures for Analysis of High-Dimensional Genomic Data

  • Ko, Hyoseok;Kim, Kipoong;Sun, Hokeun
    • Genomics & Informatics
    • /
    • v.14 no.4
    • /
    • pp.187-195
    • /
    • 2016
  • In genetic association studies with high-dimensional genomic data, multiple group testing procedures are often required in order to identify disease/trait-related genes or genetic regions, where multiple genetic sites or variants are located within the same gene or genetic region. However, statistical testing procedures based on an individual test suffer from multiple testing issues such as the control of family-wise error rate and dependent tests. Moreover, detecting only a few of genes associated with a phenotype outcome among tens of thousands of genes is of main interest in genetic association studies. In this reason regularization procedures, where a phenotype outcome regresses on all genomic markers and then regression coefficients are estimated based on a penalized likelihood, have been considered as a good alternative approach to analysis of high-dimensional genomic data. But, selection performance of regularization procedures has been rarely compared with that of statistical group testing procedures. In this article, we performed extensive simulation studies where commonly used group testing procedures such as principal component analysis, Hotelling's $T^2$ test, and permutation test are compared with group lasso (least absolute selection and shrinkage operator) in terms of true positive selection. Also, we applied all methods considered in simulation studies to identify genes associated with ovarian cancer from over 20,000 genetic sites generated from Illumina Infinium HumanMethylation27K Beadchip. We found a big discrepancy of selected genes between multiple group testing procedures and group lasso.

Improvement of inspection system for common crossings by track side monitoring and prognostics

  • Sysyn, Mykola;Nabochenko, Olga;Kovalchuk, Vitalii;Gruen, Dimitri;Pentsak, Andriy
    • Structural Monitoring and Maintenance
    • /
    • v.6 no.3
    • /
    • pp.219-235
    • /
    • 2019
  • Scheduled inspections of common crossings are one of the main cost drivers of railway maintenance. Prognostics and health management (PHM) approach and modern monitoring means offer many possibilities in the optimization of inspections and maintenance. The present paper deals with data driven prognosis of the common crossing remaining useful life (RUL) that is based on an inertial monitoring system. The problem of scheduled inspections system for common crossings is outlined and analysed. The proposed analysis of inertial signals with the maximal overlap discrete wavelet packet transform (MODWPT) and Shannon entropy (SE) estimates enable to extract the spectral features. The relevant features for the acceleration components are selected with application of Lasso (Least absolute shrinkage and selection operator) regularization. The features are fused with time domain information about the longitudinal position of wheels impact and train velocities by multivariate regression. The fused structural health (SH) indicator has a significant correlation to the lifetime of crossing. The RUL prognosis is performed on the linear degradation stochastic model with recursive Bayesian update. Prognosis testing metrics show the promising results for common crossing inspection scheduling improvement.

Network-based regularization for analysis of high-dimensional genomic data with group structure (그룹 구조를 갖는 고차원 유전체 자료 분석을 위한 네트워크 기반의 규제화 방법)

  • Kim, Kipoong;Choi, Jiyun;Sun, Hokeun
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.6
    • /
    • pp.1117-1128
    • /
    • 2016
  • In genetic association studies with high-dimensional genomic data, regularization procedures based on penalized likelihood are often applied to identify genes or genetic regions associated with diseases or traits. A network-based regularization procedure can utilize biological network information (such as genetic pathways and signaling pathways in genetic association studies) with an outstanding selection performance over other regularization procedures such as lasso and elastic-net. However, network-based regularization has a limitation because cannot be applied to high-dimension genomic data with a group structure. In this article, we propose to combine data dimension reduction techniques such as principal component analysis and a partial least square into network-based regularization for the analysis of high-dimensional genomic data with a group structure. The selection performance of the proposed method was evaluated by extensive simulation studies. The proposed method was also applied to real DNA methylation data generated from Illumina Innium HumanMethylation27K BeadChip, where methylation beta values of around 20,000 CpG sites over 12,770 genes were compared between 123 ovarian cancer patients and 152 healthy controls. This analysis was also able to indicate a few cancer-related genes.