• Title/Summary/Keyword: Lasso

Penalized maximum likelihood estimation with symmetric log-concave errors and LASSO penalty

  • Seo-Young, Park;Sunyul, Kim;Byungtae, Seo
    • Communications for Statistical Applications and Methods / v.29 no.6 / pp.641-653 / 2022
  • Penalized least squares methods are important tools for simultaneously selecting variables and estimating parameters in linear regression. Penalized maximum likelihood can serve the same purpose when the error distribution is assumed to belong to a certain parametric family. However, relying on a particular parametric family risks misspecification, which undermines estimation accuracy. To give the error distribution sufficient flexibility, we propose using a symmetric log-concave error distribution with a LASSO penalty. We provide a feasible algorithm for estimating both the nonparametric and parametric components of the proposed model. Numerical studies show that the proposed method produces more efficient estimators than some existing methods, with similar variable selection performance.
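
For orientation, the penalized least squares baseline that this abstract starts from can be sketched as follows. This is a minimal illustration of LASSO by cyclic coordinate descent, not the authors' log-concave algorithm; the function names, the `1/(2n)` scaling of the loss, and the toy data are assumptions made for the example.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * sum((y - X b)^2) + lam * sum(|b|).

    X is a list of rows (no intercept column); y is a list of responses.
    Hypothetical helper written for illustration only.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes beta[j].
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Toy orthogonal design: y depends on the first column only.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3, 3, -3, -3]
beta = lasso_coordinate_descent(X, y, lam=0.5)
```

On this toy design the first coefficient is shrunk from 3 toward zero by the penalty and the second is set exactly to zero, which is the simultaneous estimation and selection behavior the abstract refers to.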

Machine Learning Prediction of Economic Effects of Busan's Strategic Industry through Ridge Regression and Lasso Regression (릿지 회귀와 라쏘 회귀 모형에 의한 부산 전략산업의 지역경제 효과에 대한 머신러닝 예측)

  • Yi, Chae-Deug
    • Journal of Korea Port Economic Association / v.37 no.1 / pp.197-215 / 2021
  • This paper analyzes machine learning predictions of the economic effects of Busan's strategic industries on employment and income, using Ridge regression and Lasso regression models with regularization terms. According to the Ridge and Lasso estimates for employment, the intelligence information service industry, such as the service platform, contents, and smart finance industries, and the global tourism industry, such as MICE and specialized tourism, are predicted to influence employment, in that order. However, the Ridge and Lasso regression models show that the future transportation machine industry does not significantly increase employment or income, since it is a primitive investment industry. The Ridge estimates for income show that the intelligence information service industry and the global tourism industry are also predicted to influence income, in that order. According to the Lasso estimates for income, four strategic industries (life care, smart maritime, intelligence machine, and clean tech) do not influence income. Furthermore, the future transportation machine industry may influence income negatively, since it is a primitive investment industry. Thus, the economic objectives and priorities of industrial policy must be selected appropriately.
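
As background for the Ridge and Lasso comparison in this abstract, the two estimators differ only in the penalty term. The notation below ($y$, $X$, $\beta$, $\lambda$, $n$) and the $1/(2n)$ scaling are assumptions chosen for illustration, not taken from the paper.

$$
\hat\beta^{\text{ridge}} = \arg\min_{\beta} \frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_2^2,
\qquad
\hat\beta^{\text{lasso}} = \arg\min_{\beta} \frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1 .
$$

In the special case $X^\top X / n = I$, with $b = X^\top y / n$ the least squares estimate, both have closed forms:

$$
\hat\beta^{\text{ridge}}_j = \frac{b_j}{1 + 2\lambda},
\qquad
\hat\beta^{\text{lasso}}_j = \operatorname{sign}(b_j)\,\max\bigl(\lvert b_j\rvert - \lambda,\ 0\bigr).
$$

Ridge shrinks every coefficient proportionally and never sets one exactly to zero, whereas Lasso sets any coefficient with $\lvert b_j\rvert \le \lambda$ to zero. This is one plausible reading of why only the Lasso models report industries with no influence on income.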

Analysis of multi-center bladder cancer survival data using variable-selection method of multi-level frailty models (다수준 프레일티모형 변수선택법을 이용한 다기관 방광암 생존자료분석)

  • Kim, Bohyeon;Ha, Il Do;Lee, Donghwan
    • Journal of the Korean Data and Information Science Society / v.27 no.2 / pp.499-510 / 2016
  • Selecting relevant variables is very important in regression models for survival analysis. In this paper, we introduce a penalized variable-selection procedure for multi-level frailty models based on the "frailtyHL" R package (Ha et al., 2012). The estimation procedure is based on the penalized hierarchical likelihood, and three penalty functions (LASSO, SCAD and HL) are considered. The proposed methods are illustrated with multi-country/multi-center bladder cancer survival data from the EORTC in Belgium. We compare the results of the three variable-selection methods and discuss their advantages and disadvantages. In particular, the data analysis shows that the SCAD and HL methods select important variables better than the LASSO method.

Bayesian analysis of latent factor regression model (내재된 인자회귀모형의 베이지안 분석법)

  • Kyung, Minjung
    • The Korean Journal of Applied Statistics / v.33 no.4 / pp.365-377 / 2020
  • We discuss latent factor regression, which constructs a common structure inherent among explanatory variables to resolve multicollinearity and uses the resulting factors as regressors in a linear model of a response variable. We use Bayesian estimation with a LASSO prior with a large penalty parameter to construct a significant factor loading matrix of intrinsic interest among infinite latent structures. The estimated factor loading matrix, together with the other estimated parameters, can be inversely transformed into linear parameters for each explanatory variable and used as a prediction model for new observations. We apply the proposed method to the Product Service Management data of HBAT and observe that it constructs the same factors as general common factor analysis for a fixed number of factors. The calculated MSE of the predicted values from the Bayesian latent factor regression model is also smaller than that of the common factor regression model.

Backlight Compensation by Using a Novel Region of Interest Extraction Method (새로운 관심영역 추출 방법을 이용한 역광보정)

  • Seong, Joon Mo;Lee, Seong Shin;Lee, Songwook
    • KIPS Transactions on Software and Data Engineering / v.6 no.6 / pp.321-328 / 2017
  • We have implemented a technique that corrects the brightness, saturation, and contrast of an image according to the degree of light, and further compensates for backlight. Backlight compensation can be done automatically or manually. Manual backlight compensation requires selecting the region of interest (ROI), which can be done by connecting the outline of the desired object. We let users select the region precisely with a new magnetic lasso tool. The previous lasso tool has the disadvantage that the start point and the end point must be connected. The proposed lasso tool, in contrast, can select the region of interest without connecting the start point and the end point. We can automatically obtain various backlight compensation results by adjusting the number of k-means clusters used for texture extraction and the threshold value used for binarization.

Tracing the breeding farm of domesticated pig using feature selection (Sus scrofa)

  • Kwon, Taehyung;Yoon, Joon;Heo, Jaeyoung;Lee, Wonseok;Kim, Heebal
    • Asian-Australasian Journal of Animal Sciences / v.30 no.11 / pp.1540-1549 / 2017
  • Objective: Increasing food safety demands in the animal product market have created a need for a system to trace the food distribution process, from the manufacturer to the retailer, and genetic traceability is an effective method to trace the origin of animal products. In this study, we successfully achieved the farm tracing of 6,018 multi-breed pigs, using single nucleotide polymorphism (SNP) markers strictly selected through least absolute shrinkage and selection operator (LASSO) feature selection. Methods: We performed farm tracing of domesticated pig (Sus scrofa) from SNP markers and selected the most relevant features for accurate prediction. Considering the multi-breed composition of our data, we performed feature selection using LASSO penalization on 4,002 SNPs that are shared between breeds, which also include 179 SNPs with small between-breed differences. The 100 highest-scored features were extracted from iterative simulations and then evaluated using machine-learning-based classifiers. Results: We selected 1,341 SNPs from over 45,000 SNPs through iterative LASSO feature selection, to minimize between-breed differences. We subsequently selected the 100 highest-scored SNPs from iterative scoring, and observed high statistical measures in cross-validated classification of breeding farms using only these SNPs. Conclusion: The study represents a successful application of LASSO feature selection to multi-breed pig SNP data for tracing farm information, and it provides a valuable method and a basis for further research on genetic traceability.

Efficient Compression Algorithm with Limited Resource for Continuous Surveillance

  • Yin, Ling;Liu, Chuanren;Lu, Xinjiang;Chen, Jiafeng;Liu, Caixing
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.11 / pp.5476-5496 / 2016
  • Energy efficiency of resource-constrained wireless sensor networks is critical in applications such as real-time monitoring/surveillance. To improve energy efficiency and reduce energy consumption, time series data can be compressed before transmission. However, most compression algorithms for time series data were developed only for univariate scenarios, while in practice there are often multiple sensor nodes in one application and the collected data is actually a multivariate time series. In this paper, we propose to compress time series data by Lasso (least absolute shrinkage and selection operator) approximation. We show that our approach can be naturally extended to compress multivariate time series data. Our extension is novel since it constructs an optimal projection of the original multivariate series in which the best energy efficiency can be realized. The two algorithms are named ULasso (Univariate Lasso) and MLasso (Multivariate Lasso), and we also provide practical guidance for parameter selection. Finally, an empirical evaluation is carried out on several publicly available real-world data sets from different application domains. We quantify algorithm performance by measuring the approximation error, compression ratio, and computational complexity. The results show that the compression performance of ULasso and MLasso is superior or at least equivalent to that of LTC and PLAMlis. In particular, MLasso can significantly reduce smooth multivariate time series data without breaking the major trends and important changes of the sensor network system.

Time Delay Estimation Using LASSO (Least Absolute Selection and Shrinkage Operator) (LASSO를 사용한 시간 지연 추정 알고리즘)

  • Lim, Jun-Seok;Pyeon, Yong-Guk;Choi, Seok-Im
    • The Journal of Korean Institute of Communications and Information Sciences / v.39B no.10 / pp.715-721 / 2014
  • For decades, many researchers have studied time delay estimation (TDE) methods for signals at two different receivers. Channel estimation based TDE is one of the typical TDE methods. It models the time delay between the two received signals as the impulse response of a channel between the two receivers. In general, this impulse response is sparse. However, most conventional TDE algorithms do not exploit the sparsity. In this paper, we propose a TDE method that takes the sparsity into consideration. The performance comparison shows that the proposed algorithm improves the estimation accuracy by 10 dB for a white Gaussian source. In addition, even for a colored source, the proposed algorithm does not exhibit the estimation threshold effect.
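
The idea in this abstract, treating the delay as a sparse impulse response and estimating it with an $\ell_1$ penalty, can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the authors' algorithm: the solver (ISTA, a proximal gradient method), the function name `estimate_delay_ista`, the penalty value, and the synthetic white Gaussian test signal are all chosen for the example.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: the proximal map of t * |z|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def estimate_delay_ista(x, y, max_delay, lam, n_iter=300):
    """Fit y[i] ~ sum_k h[k] * x[i - k] with an l1 penalty on h.

    Returns (delay, h), where delay is the index of the largest |h[k]|.
    Hypothetical helper; minimizes (1/(2n)) * ||y - X h||^2 + lam * ||h||_1.
    """
    n = len(y)
    taps = max_delay + 1
    # Column k of the design matrix is x delayed by k samples (zero-padded).
    cols = [[x[i - k] if i >= k else 0.0 for i in range(n)] for k in range(taps)]
    # The trace of X^T X / n upper-bounds its largest eigenvalue,
    # so 1 / trace is a safe (conservative) gradient step size.
    lipschitz = sum(sum(v * v for v in c) for c in cols) / n
    step = 1.0 / lipschitz
    h = [0.0] * taps
    for _ in range(n_iter):
        residual = [y[i] - sum(h[k] * cols[k][i] for k in range(taps))
                    for i in range(n)]
        grad = [-sum(cols[k][i] * residual[i] for i in range(n)) / n
                for k in range(taps)]
        # Gradient step on the squared error, then shrink toward zero.
        h = [soft_threshold(h[k] - step * grad[k], step * lam)
             for k in range(taps)]
    delay = max(range(taps), key=lambda k: abs(h[k]))
    return delay, h


# Synthetic example: the second receiver sees the source 3 samples late.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(200)]
true_delay = 3
y = [(x[i - true_delay] if i >= true_delay else 0.0)
     + 0.05 * random.gauss(0.0, 1.0) for i in range(200)]
delay, h = estimate_delay_ista(x, y, max_delay=8, lam=0.05)
```

The penalty drives most taps of `h` to zero or near zero, so the delay can be read off as the position of the dominant tap. A larger `lam` gives a sparser response at the cost of more shrinkage on the dominant tap.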

A Study on Domestic Drama Rating Prediction (국내 드라마 시청률 예측 및 영향요인 분석)

  • Kang, Suyeon;Jeon, Heejeong;Kim, Jihye;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.28 no.5 / pp.933-949 / 2015
  • Audience rating competition in the domestic drama market has increased recently due to the introduction of commercial broadcasting and the diversification of channels. There is now a need for thorough study and analysis of audience ratings. In particular, the drama rating is an important measure that producers and advertisers use to estimate advertisement costs. In this paper, we study drama rating prediction models using various data mining techniques such as linear regression, LASSO regression, random forest, and gradient boosting. The analysis results show that initial drama ratings are affected by structural elements such as the broadcasting station and broadcasting time. Average drama ratings are also influenced by earlier public opinion, such as the number of internet searches about the drama.

Detection of multiple change points using penalized least square methods: a comparative study between ℓ0 and ℓ1 penalty (벌점-최소제곱법을 이용한 다중 변화점 탐색)

  • Son, Won;Lim, Johan;Yu, Donghyeon
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1147-1154 / 2016
  • In this paper, we numerically compare two penalized least squares methods, the $\ell_0$-penalized method and fused lasso regression (FLR, $\ell_1$ penalization), for finding multiple change points in a signal. We find that the $\ell_0$-penalized method performs better than FLR, which produces many false detections in some cases, as theory predicts. In addition, the computation of the $\ell_0$-penalized method relies on dynamic programming and is as efficient as that of FLR.
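
The dynamic programming mentioned in this abstract can be illustrated with a basic optimal-partitioning recursion. The sketch below is a plain $O(n^2)$ version written for clarity; the function name `l0_change_points`, the piecewise-constant mean model, and the toy signal are assumptions for the example, and the paper's implementation may differ.

```python
def l0_change_points(signal, lam):
    """Return change-point indices minimizing SSE + lam * (number of change points).

    The signal is modeled as piecewise constant. A change point at index t
    means a new constant segment starts at signal[t].
    """
    n = len(signal)
    # Prefix sums give each segment's squared error in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(signal):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def seg_cost(a, b):
        """Sum of squared deviations from the mean over signal[a:b]."""
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    # best[t] = optimal penalized cost of signal[:t];
    # prev[t] = start index of the last segment in that optimum.
    best = [0.0] * (n + 1)
    best[0] = -lam  # offsets the penalty charged for the first segment
    prev = [0] * (n + 1)
    for t in range(1, n + 1):
        candidates = ((best[a] + seg_cost(a, t) + lam, a) for a in range(t))
        best[t], prev[t] = min(candidates)

    # Backtrack through the stored segment starts.
    change_points = []
    t = n
    while t > 0:
        a = prev[t]
        if a > 0:
            change_points.append(a)
        t = a
    return sorted(change_points)


signal = [0, 0, 0, 0, 5, 5, 5, 5, 1, 1, 1, 1]
print(l0_change_points(signal, lam=1.0))  # prints [4, 8]
```

Because the $\ell_0$ penalty charges a fixed price per change point regardless of jump size, it does not shrink the segment means, whereas the $\ell_1$ penalty of the fused lasso penalizes the jump magnitudes themselves.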