• Title/Summary/Keyword: sparse regression

Search results: 55

Analysis of internet addiction in Korean adolescents using sparse partial least-squares regression (희소 부분 최소 제곱법을 이용한 우리나라 청소년 인터넷 중독 자료 분석)

  • Han, Jeongseop;Park, Soobin;Lee, onghwan
    • The Korean Journal of Applied Statistics / v.31 no.2 / pp.253-263 / 2018
  • Internet addiction in adolescents is an important social issue. In this study, sparse partial least-squares regression (SPLS) was applied to internet addiction data from Korean adolescent samples. The internet addiction score and various clinical and psychopathological features were collected from self-reported questionnaires and analyzed. We considered three PLS methods and compared their performance in terms of prediction and sparsity. We found that the SPLS method with the hierarchical-likelihood penalty performed best; in addition, two aggression-related features, AQ and BSAS, are important for discriminating and explaining the latent features of the SPLS model.
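
To make the idea of sparsifying a PLS direction concrete, here is a minimal single-component Python sketch that applies a lasso-type soft-thresholding step to the ordinary PLS weight vector. It illustrates only the generic sparsity mechanism: the hierarchical-likelihood penalty that the paper finds best is not reproduced, and the data, the `lam` tuning value, and the function names are hypothetical.

```python
# Minimal single-component sparse PLS sketch (soft-thresholded weight vector).
import numpy as np

def soft_threshold(v, lam):
    """Element-wise soft thresholding: sign(v) * max(|v| - lam, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_pls_one_component(X, y, lam=0.3):
    """Return a sparse weight vector w, latent score t, and regression slope q."""
    Xc = X - X.mean(axis=0)                            # center predictors
    yc = y - y.mean()                                  # center response
    w = Xc.T @ yc                                      # ordinary PLS direction (covariance with y)
    w = soft_threshold(w, lam * np.max(np.abs(w)))     # sparsify the direction
    if np.all(w == 0):
        raise ValueError("Penalty too large: all weights shrunk to zero.")
    w = w / np.linalg.norm(w)                          # normalize surviving weights
    t = Xc @ w                                         # latent score
    q = (t @ yc) / (t @ t)                             # regress y on the latent score
    return w, t, q

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))                     # e.g. questionnaire features (synthetic)
    y = X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.5, size=200)
    w, t, q = sparse_pls_one_component(X, y, lam=0.3)
    print("nonzero weights at indices:", np.flatnonzero(w))
```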

A note on standardization in penalized regressions

  • Lee, Sangin
    • Journal of the Korean Data and Information Science Society / v.26 no.2 / pp.505-516 / 2015
  • We consider sparse high-dimensional linear regression models. Penalized regressions have been used as effective methods for variable selection and estimation in high-dimensional models. In penalized regression it is common practice to standardize the variables, fit a penalized model to the standardized variables, and then transform the estimated coefficients back to the scale of the original variables. However, this procedure produces a slightly different solution from the corresponding penalized problem posed directly on the original variables. In this paper, we investigate issues arising from the standardization of variables in penalized regressions and formally define the standardized penalized estimator. In addition, we compare the original penalized estimator with the standardized penalized estimator through simulation studies and real data analysis.
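
The standardize-fit-recover workflow that the paper scrutinizes can be written out in a few lines. The sketch below (synthetic data, arbitrary penalty level) shows the usual practice of fitting a lasso on standardized predictors and mapping the coefficients back to the original scale; as the abstract notes, this is not exactly the same as solving the penalized problem on the original variables.

```python
# Standardize, fit a lasso, then recover coefficients on the original scale.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10)) * rng.uniform(0.5, 5.0, size=10)   # predictors on unequal scales
y = 2.0 * X[:, 0] - 1.0 * X[:, 4] + rng.normal(size=100)

mu, sigma = X.mean(axis=0), X.std(axis=0)
Z = (X - mu) / sigma                          # standardized predictors

fit = Lasso(alpha=0.1).fit(Z, y)              # penalized fit on the standardized scale

beta_std = fit.coef_
beta_orig = beta_std / sigma                  # slopes mapped back to the original scale
intercept_orig = fit.intercept_ - np.sum(beta_std * mu / sigma)

print("standardized coefficients:", beta_std)
print("original-scale coefficients:", beta_orig, "intercept:", intercept_orig)
```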

Why Gabor Frames? Two Fundamental Measures of Coherence and Their Role in Model Selection

  • Bajwa, Waheed U.;Calderbank, Robert;Jafarpour, Sina
    • Journal of Communications and Networks / v.12 no.4 / pp.289-307 / 2010
  • The problem of model selection arises in a number of contexts, such as subset selection in linear regression, estimation of structures in graphical models, and signal denoising. This paper studies non-asymptotic model selection for the general case of arbitrary (random or deterministic) design matrices and arbitrary nonzero entries of the signal. In this regard, it generalizes the notion of incoherence in the existing literature on model selection and introduces two fundamental measures of coherence among the columns of a design matrix, termed the worst-case coherence and the average coherence. It utilizes these two measures of coherence to provide an in-depth analysis of a simple, model-order agnostic one-step thresholding (OST) algorithm for model selection, and proves that OST is feasible for exact as well as partial model selection as long as the design matrix obeys an easily verifiable property, termed the coherence property. One of the key insights offered by the ensuing analysis is that OST can successfully carry out model selection even when methods based on convex optimization, such as the lasso, fail due to rank deficiency of submatrices of the design matrix. In addition, the paper establishes that if the design matrix has reasonably small worst-case and average coherence, then OST performs near-optimally when either (i) the energy of any nonzero entry of the signal is close to the average signal energy per nonzero entry or (ii) the signal-to-noise ratio in the measurement system is not too high. Finally, two other key contributions of the paper are that (i) it provides bounds on the average coherence of Gaussian matrices and Gabor frames, and (ii) it extends the results on model selection using OST to low-complexity, model-order agnostic recovery of sparse signals with arbitrary nonzero entries. In particular, this part of the analysis implies that an Alltop Gabor frame together with OST can successfully carry out model selection and recovery of sparse signals, irrespective of the phases of the nonzero entries, even if the number of nonzero entries scales almost linearly with the number of rows of the Alltop Gabor frame.
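
As a rough illustration of the quantities involved, the sketch below computes the worst-case and average coherence of a column-normalized design matrix and runs one-step thresholding (OST) on a synthetic sparse signal. The threshold rule here is a simple heuristic, not the constant derived in the paper, and the Gaussian design is only one of the settings the paper covers.

```python
# Worst-case / average coherence and one-step thresholding (OST) on a random design.
import numpy as np

def coherences(X):
    """Worst-case and average coherence of the column-normalized design matrix."""
    Xn = X / np.linalg.norm(X, axis=0)
    G = Xn.T @ Xn                                  # Gram matrix of normalized columns
    np.fill_diagonal(G, 0.0)
    worst = np.max(np.abs(G))                      # largest off-diagonal inner product
    avg = np.max(np.abs(G.sum(axis=1))) / (X.shape[1] - 1)   # largest averaged off-diagonal sum
    return worst, avg

def ost(X, y, threshold):
    """Keep the columns whose correlation with y exceeds the threshold."""
    Xn = X / np.linalg.norm(X, axis=0)
    scores = np.abs(Xn.T @ y)
    return np.flatnonzero(scores > threshold)

rng = np.random.default_rng(2)
n, p, k = 128, 512, 5
X = rng.normal(size=(n, p)) / np.sqrt(n)           # Gaussian design, roughly unit-norm columns
beta = np.zeros(p); beta[:k] = 1.0
y = X @ beta + 0.05 * rng.normal(size=n)

print("worst-case / average coherence:", coherences(X))
print("selected support:", ost(X, y, threshold=0.5 * np.max(np.abs(X.T @ y))))
```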

A Robust Method for Partially Occluded Face Recognition

  • Xu, Wenkai;Lee, Suk-Hwan;Lee, Eung-Joo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.7 / pp.2667-2682 / 2015
  • Due to the wide application of face recognition (FR) in information security, surveillance, access control, and other areas, it has received significantly increased attention from both the academic and industrial communities over the past several decades. However, partial face occlusion is one of the most challenging problems in face recognition. In this paper, a novel method based on the linear regression-based classification (LRC) algorithm is proposed to address this problem. After all images are downsampled and divided into several blocks, an evaluator for each block determines the clear (unoccluded) blocks of the test face image using linear regression. The remaining uncontaminated blocks are then used for partially occluded face recognition. Furthermore, an improved distance-based evidence fusion approach is proposed that decides in favor of the class with the smallest average of the corresponding minimum distances. Since the occlusion-removal process uses a simple linear regression approach, the overall computational cost is approximately equal to that of LRC and much lower than that of sparse representation-based classification (SRC) and extended SRC (eSRC). Experimental results on both the AR face database and the extended Yale B face database demonstrate the effectiveness of the proposed method for partially occluded face recognition, with satisfactory performance. In comparison with conventional methods (eigenfaces+NN, Fisherfaces+NN) and state-of-the-art methods (LRC, SRC, and eSRC), the proposed method shows better performance and robustness.
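
A compact sketch of the block-wise LRC idea is given below: each probe block is regressed on the corresponding blocks of every class's gallery, blocks whose best residual is unusually large are treated as occluded and dropped, and the remaining residuals are averaged per class. The block-rejection rule (a residual quantile) and the data layout are illustrative assumptions, not the paper's exact block evaluator or its distance-based evidence fusion.

```python
# Block-wise linear regression classification with a simple occluded-block filter.
import numpy as np

def block_residuals(probe_block, gallery_blocks):
    """Least-squares residual of one probe block against each class's gallery blocks.

    gallery_blocks: dict class_id -> array of shape (block_dim, n_images_per_class)
    """
    res = {}
    for cls, B in gallery_blocks.items():
        beta, *_ = np.linalg.lstsq(B, probe_block, rcond=None)
        res[cls] = np.linalg.norm(probe_block - B @ beta)
    return res

def classify(probe_blocks, gallery, occlusion_quantile=0.5):
    """probe_blocks: list of block vectors; gallery: list of per-block dicts (one per block)."""
    per_block = [block_residuals(pb, g) for pb, g in zip(probe_blocks, gallery)]
    # Blocks whose best (smallest) residual is unusually large are treated as occluded.
    best = np.array([min(r.values()) for r in per_block])
    keep = best <= np.quantile(best, occlusion_quantile)
    classes = per_block[0].keys()
    # Average the residuals of the retained (clean) blocks for each class.
    scores = {c: np.mean([per_block[i][c] for i in range(len(per_block)) if keep[i]])
              for c in classes}
    return min(scores, key=scores.get)
```

In use, `gallery` would be built by downsampling each training face, cutting it into the same blocks as the probe, and stacking each class's vectorized blocks as the columns of `B`.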

Spatio-temporal Load Forecasting Considering Aggregation Features of Electricity Cells and Uncertainties in Input Variables

  • Zhao, Teng;Zhang, Yan;Chen, Haibo
    • Journal of Electrical Engineering and Technology / v.13 no.1 / pp.38-50 / 2018
  • Spatio-temporal load forecasting (STLF) is a foundation for building the prediction-based power map, which can be a useful tool for the visualization and tendency assessment of urban energy use. Constructing one point-forecasting model for each electricity cell in the geographic space is possible; however, it is inadvisable and insufficient, considering the aggregation features of electricity cells and the uncertainties in input variables. This paper presents a new STLF method, with a data-driven framework consisting of three subroutines: multi-level clustering of cells considering their aggregation features, load regression for each category of cells based on SLS-SVRNs (sparse least-squares support vector regression networks), and interval forecasting of spatio-temporal load with sampled blind numbers. An area of Pudong, Shanghai, is taken as the region of study. Results of multi-level clustering show that electricity cells in the same category are clustered in geographic space to some extent, which reveals the spatial aggregation feature of cells. For cellular load regression, a comparison with three other forecasting methods indicates the higher accuracy of the proposed method in point-forecasting of spatio-temporal load. Furthermore, results of interval load forecasting demonstrate that the proposed prediction-interval construction method can effectively convey the uncertainties in input variables.
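
The core regressor behind the SLS-SVRN models, least-squares support vector regression, reduces to solving a single linear system in the dual variables. The sketch below shows that system on toy load-like data; the sparsification step, the multi-level clustering, and the blind-number interval construction from the paper are not reproduced, and the kernel width and regularization constant are arbitrary illustrative choices.

```python
# Minimal least-squares support vector regression (LS-SVR) via its dual linear system.
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def lssvr_fit(X, y, C=10.0, gamma=0.5):
    """Solve the LS-SVR system [[0, 1^T], [1, K + I/C]] [b; alpha] = [0; y]."""
    n = len(y)
    K = rbf_kernel(X, X, gamma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / C
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                     # bias b, dual coefficients alpha

def lssvr_predict(Xtrain, alpha, b, Xnew, gamma=0.5):
    return rbf_kernel(Xnew, Xtrain, gamma) @ alpha + b

# Toy hourly-load-style example with hypothetical inputs (hour of day, temperature index).
rng = np.random.default_rng(3)
X = rng.uniform(0, 24, size=(200, 2))
y = 50 + 10 * np.sin(X[:, 0] / 24 * 2 * np.pi) + 0.5 * X[:, 1] + rng.normal(size=200)
b, alpha = lssvr_fit(X, y)
print(lssvr_predict(X, alpha, b, X[:5]))
```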

Prediction of compressive strength of GGBS based concrete using RVM

  • Prasanna, P.K.;Ramachandra Murthy, A.;Srinivasu, K.
    • Structural Engineering and Mechanics / v.68 no.6 / pp.691-700 / 2018
  • Ground granulated blast furnace slag (GGBS) is a by-product of the iron and steel industries that is useful in the design and development of high-quality cement paste/mortar and concrete. This paper investigates the applicability of a relevance vector machine (RVM) based regression model to predict the compressive strength of various GGBS-based concrete mixes. Compressive strength data for various GGBS-based concrete mixes were obtained by considering the effects of water-binder ratio and steel fibres. RVM is a machine learning technique that employs Bayesian inference to obtain parsimonious solutions for regression and classification. The RVM is an extension of the support vector machine that couples probabilistic classification and regression, and it is established on a Bayesian formulation of a linear model with an appropriate prior that results in a sparse representation. The compressive strength model was developed in MATLAB for training and prediction. About 70% of the data was used for development of the RVM model and the remaining 30% for validation. The predicted compressive strength for GGBS-based concrete mixes is found to be in very good agreement with the corresponding experimental observations.
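
As a hedged stand-in for the paper's MATLAB RVM model, the sketch below uses scikit-learn's ARDRegression (sparse Bayesian linear regression) over an RBF kernel basis, which behaves much like a relevance vector machine and prunes most basis functions. The feature columns (water-binder ratio, GGBS fraction, steel-fibre content), the data, the kernel width, and the 70/30 split seed are all hypothetical; only the split proportion follows the abstract.

```python
# RVM-style strength prediction: ARD (sparse Bayesian) regression on an RBF kernel basis.
import numpy as np
from sklearn.linear_model import ARDRegression
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.uniform([0.3, 0.0, 0.0], [0.6, 0.6, 1.5], size=(120, 3))   # w/b ratio, GGBS fraction, fibre %
strength = 80 - 60 * X[:, 0] + 15 * X[:, 1] + 5 * X[:, 2] + rng.normal(scale=2, size=120)

# 70% of the data for model development, 30% for validation, as in the paper.
Xtr, Xva, ytr, yva = train_test_split(X, strength, test_size=0.3, random_state=0)

Ktr = rbf_kernel(Xtr, Xtr, gamma=5.0)        # kernel basis centred on the training points
model = ARDRegression().fit(Ktr, ytr)        # ARD prior drives most basis weights to zero

Kva = rbf_kernel(Xva, Xtr, gamma=5.0)
pred, std = model.predict(Kva, return_std=True)
print("validation RMSE:", np.sqrt(np.mean((pred - yva) ** 2)))
print("mean predictive std:", std.mean())
print("retained (relevance) vectors:", int(np.sum(np.abs(model.coef_) > 1e-6)))
```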

Group Contribution Method and Support Vector Regression based Model for Predicting Physical Properties of Aromatic Compounds (Group Contribution Method 및 Support Vector Regression 기반 모델을 이용한 방향족 화합물 물성치 예측에 관한 연구)

  • Kang, Ha Yeong;Oh, Chang Bo;Won, Yong Sun;Liu, J. Jay;Lee, Chang Jun
    • Journal of the Korean Society of Safety / v.36 no.1 / pp.1-8 / 2021
  • To simulate a process model in the field of chemical engineering, it is very important to identify the physical properties of novel materials as well as existing materials. However, it is difficult to measure these physical properties through experiments because of the potential risk and cost. To address this, the present study aims to develop a property prediction model based on the group contribution method for aromatic chemical compounds containing benzene rings, since the benzene rings of aromatic materials have a significant impact on their physical properties. To establish the prediction model, 42 important functional groups that determine the physical properties are considered, and the numbers of functional groups in 147 aromatic chemical compounds are counted to prepare a dataset. Support vector regression is employed to build a prediction model that can handle sparse and high-dimensional data. To verify the efficacy of this approach, the results are compared with those of previous studies; although the datasets differ, the comparison indicates enhanced performance in this study. Moreover, there are few reports on predicting the physical properties of aromatic compounds. This study provides an effective method to estimate the physical properties of unknown chemical compounds and contributes toward reducing the experimental effort required to measure physical properties.
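
The modelling pipeline described above, encoding each compound as functional-group counts and regressing the property with SVR, can be sketched as follows. The tiny 4-group encoding, the counts, and the property values are made-up placeholders, not the paper's 42-group, 147-compound dataset, and the SVR hyperparameters are arbitrary.

```python
# Group-contribution-style features (functional-group counts) fed to support vector regression.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical count vectors; columns might be e.g. -CH3, -OH, benzene ring, -Cl.
X = np.array([
    [1, 0, 1, 0],
    [2, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 2, 0],
    [3, 0, 1, 1],
])
y = np.array([383.8, 455.0, 404.9, 528.0, 445.0])    # illustrative property values (e.g. in K)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.5))
model.fit(X, y)
print(model.predict([[1, 0, 1, 1]]))                 # predicted property for a new count vector
```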

A small review and further studies on the LASSO

  • Kwon, Sunghoon;Han, Sangmi;Lee, Sangin
    • Journal of the Korean Data and Information Science Society / v.24 no.5 / pp.1077-1088 / 2013
  • High-dimensional data analysis arises in almost all scientific areas, has evolved with the development of computing power, and has encouraged penalized estimation methods that play important roles in statistical learning. Over the past years, various penalized estimators have been developed, and the least absolute shrinkage and selection operator (LASSO) proposed by Tibshirani (1996) has shown outstanding ability, holding a central place in the development of penalized estimation. In this paper, we first introduce a number of recent advances in high-dimensional data analysis using the LASSO. The topics include various statistical problems such as variable selection and grouped or structured variable selection under sparse high-dimensional linear regression models. Several unsupervised learning methods, including inverse covariance matrix estimation, are presented. In addition, we address further studies on new applications that may establish a guideline on how to use the LASSO for the statistical challenges of high-dimensional data analysis.
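
For readers new to the method surveyed here, a minimal example of fitting the LASSO in a p > n setting, with the penalty chosen by cross-validation, looks like this (synthetic data, arbitrary dimensions):

```python
# LASSO with cross-validated penalty selection in a high-dimensional (p > n) setting.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(5)
n, p = 100, 200                                # more predictors than observations
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:5] = [3, -2, 1.5, -1, 0.5]
y = X @ beta + rng.normal(size=n)

fit = LassoCV(cv=5).fit(X, y)                  # penalty chosen by 5-fold cross-validation
print("chosen penalty:", fit.alpha_)
print("selected variables:", np.flatnonzero(fit.coef_))
```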

Two Dimensional Slow Feature Discriminant Analysis via L2,1 Norm Minimization for Feature Extraction

  • Gu, Xingjian;Shu, Xiangbo;Ren, Shougang;Xu, Huanliang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.7 / pp.3194-3216 / 2018
  • Slow Feature Discriminant Analysis (SFDA) is a supervised feature extraction method inspired by a biological mechanism. In this paper, a novel method called Two Dimensional Slow Feature Discriminant Analysis via $L_{2,1}$ norm minimization ($2DSFDA-L_{2,1}$) is proposed. $2DSFDA-L_{2,1}$ integrates $L_{2,1}$ norm regularization and a 2D statistically uncorrelated constraint to extract discriminant features. First, $L_{2,1}$ norm regularization promotes row-sparsity of the projection matrix, so that feature selection and subspace learning are performed simultaneously. Second, uncorrelated features with minimum redundancy are effective for classification; we define a 2D statistically uncorrelated model in which each row (or column) is independent. Third, we provide a feasible solution by transforming the proposed $L_{2,1}$ nonlinear model into a linear regression form. Additionally, $2DSFDA-L_{2,1}$ is extended to a bilateral projection version called $BSFDA-L_{2,1}$, whose advantage is that an image can be represented with far fewer coefficients. Experimental results on three face databases demonstrate that the proposed $2DSFDA-L_{2,1}/BSFDA-L_{2,1}$ achieves competitive performance.
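
The row-sparsity mechanism at the heart of the method, an $L_{2,1}$ penalty on the projection matrix, can be illustrated with the small iteratively reweighted least-squares sketch below for a multi-output regression. This shows only the regularizer: the slow-feature discriminant objective, the 2D uncorrelated constraint, and the bilateral $BSFDA-L_{2,1}$ extension are not reproduced, and the data and penalty level are arbitrary.

```python
# Row-sparse multi-output regression: min ||Y - XW||_F^2 + lam * ||W||_{2,1}, solved by IRLS.
import numpy as np

def l21_regression(X, Y, lam=1.0, n_iter=50, eps=1e-8):
    """Return a row-sparse coefficient matrix W via iteratively reweighted least squares."""
    W = np.linalg.lstsq(X, Y, rcond=None)[0]          # warm start: ordinary least squares
    for _ in range(n_iter):
        row_norms = np.linalg.norm(W, axis=1) + eps
        D = np.diag(1.0 / (2.0 * row_norms))          # reweighting induced by the L2,1 term
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
    return W

rng = np.random.default_rng(6)
X = rng.normal(size=(300, 40))
W_true = np.zeros((40, 3)); W_true[:4] = rng.normal(size=(4, 3))   # only 4 informative rows
Y = X @ W_true + 0.1 * rng.normal(size=(300, 3))
W = l21_regression(X, Y, lam=5.0)
print("rows kept (norm > 0.05):", np.flatnonzero(np.linalg.norm(W, axis=1) > 0.05))
```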

Relevance vector based approach for the prediction of stress intensity factor for the pipe with circumferential crack under cyclic loading

  • Ramachandra Murthy, A.;Vishnuvardhan, S.;Saravanan, M.;Gandhic, P.
    • Structural Engineering and Mechanics / v.72 no.1 / pp.31-41 / 2019
  • Structural integrity assessment of piping components is of paramount importance for remaining life prediction, residual strength evaluation, and in-service inspection planning. Accurate prediction of these quantities requires a reliable fracture parameter. One such parameter is the stress intensity factor (SIF), which is generally preferred for high-strength materials and can be evaluated using linear elastic fracture mechanics principles. Employing the available analytical and numerical procedures for fracture analysis of piping components takes a considerable amount of time and effort. In view of this, as an alternative to analytical and finite element analysis, a model based on the relevance vector machine (RVM) is developed to predict the SIF of a part-through crack in a piping component under fatigue loading. RVM is a probabilistic regression method established on a Bayesian formulation of a linear model with an appropriate prior that results in a sparse representation. The model for SIF prediction is developed using MATLAB, with 70% of the data used for development of the RVM model and the rest used for validation. The predicted SIF is found to be in good agreement with the corresponding analytical solution and can be used for damage-tolerant analysis of structural components.
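
Since the abstract only names the RVM machinery, the sketch below writes out the sparse Bayesian (Tipping-style) hyperparameter updates on hypothetical SIF-like data with a 70/30 development/validation split. The inputs (crack depth ratio, load amplitude), the synthetic response, and all numerical settings are illustrative assumptions, not the paper's experimental or analytical data.

```python
# Minimal relevance vector regression: RBF basis plus evidence-maximization updates.
import numpy as np

def rbf_design(X, centers, gamma=0.5):
    d2 = np.sum(X**2, 1)[:, None] + np.sum(centers**2, 1)[None, :] - 2 * X @ centers.T
    return np.exp(-gamma * d2)

def rvm_fit(Phi, t, n_iter=200):
    """Return posterior mean weights and per-basis precisions (large precision => pruned)."""
    N, M = Phi.shape
    alpha = np.ones(M)                         # per-weight precisions
    beta = 1.0 / np.var(t)                     # noise precision
    for _ in range(n_iter):
        Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
        mu = beta * Sigma @ Phi.T @ t
        gamma = 1.0 - alpha * np.diag(Sigma)   # effective number of well-determined weights
        alpha = np.clip(gamma / (mu**2 + 1e-12), 1e-6, 1e6)
        beta = np.clip((N - gamma.sum()) / (np.sum((t - Phi @ mu) ** 2) + 1e-12), 1e-6, 1e8)
    return mu, alpha

rng = np.random.default_rng(7)
X = rng.uniform([0.1, 10.0], [0.8, 100.0], size=(80, 2))      # crack depth ratio a/t, load amplitude
y = 30 * X[:, 0] ** 1.5 * np.sqrt(X[:, 1]) / 10 + rng.normal(scale=0.5, size=80)

Xs = (X - X.mean(0)) / X.std(0)                               # standardize inputs for the RBF basis
split = int(0.7 * len(X))                                     # 70% development / 30% validation
Phi_tr = rbf_design(Xs[:split], Xs[:split])
mu, alpha = rvm_fit(Phi_tr, y[:split])
Phi_va = rbf_design(Xs[split:], Xs[:split])
print("validation RMSE:", np.sqrt(np.mean((Phi_va @ mu - y[split:]) ** 2)))
print("relevance vectors retained:", int(np.sum(alpha < 1e3)))
```

Basis functions whose precision is driven to the upper clip value contribute essentially nothing to the prediction, which is how the sparse "relevance vector" behaviour emerges.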