• Title/Summary/Keyword: least absolute shrinkage and selection operator (LASSO)


Risk Prediction Using Genome-Wide Association Studies on Type 2 Diabetes

  • Choi, Sungkyoung;Bae, Sunghwan;Park, Taesung
    • Genomics & Informatics
    • /
    • v.14 no.4
    • /
    • pp.138-148
    • /
    • 2016
  • The success of genome-wide association studies (GWASs) has enabled us to improve risk assessment and provide novel genetic variants for diagnosis, prevention, and treatment. However, most variants discovered by GWASs have been reported to have very small effect sizes on complex human diseases, which has been a big hurdle in building risk prediction models. Recently, many statistical approaches based on penalized regression have been developed to solve the "large p and small n" problem. In this report, we evaluated the performance of several statistical methods for predicting a binary trait: stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN). We first built a prediction model by combining variable selection and prediction methods for type 2 diabetes using Affymetrix Genome-Wide Human SNP Array 5.0 from the Korean Association Resource project. We assessed the risk prediction performance using area under the receiver operating characteristic curve (AUC) for the internal and external validation datasets. In the internal validation, SLR-LASSO and SLR-EN tended to yield more accurate predictions than other combinations. During the external validation, the SLR-SLR and SLR-EN combinations achieved the highest AUC of 0.726. We propose these combinations as a potentially powerful risk prediction model for type 2 diabetes.
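The LASSO named in this abstract adds an L1 penalty to the regression loss, which can shrink some coefficients exactly to zero and so performs variable selection. The following is a minimal sketch of that idea for the squared-error case, using coordinate descent with soft-thresholding. It is not the authors' pipeline (the study fits models for a binary trait); the function names, the toy data, and the penalty value `lam` are illustrative assumptions.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            rho /= n
            norm /= n
            beta[j] = soft_threshold(rho, lam) / norm if norm > 0 else 0.0
    return beta


# Toy data (hypothetical): two orthogonal features, the second has a tiny effect.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0 * a + 0.1 * b for a, b in X]

beta = lasso_coordinate_descent(X, y, lam=0.5)
print(beta)  # the weak second coefficient is shrunk to exactly zero
```

With a penalty larger than the weak feature's signal, that coefficient is removed while the strong one is only shrunk, which is the selection behaviour that makes LASSO attractive in "large p and small n" settings.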

Intelligent System for the Prediction of Heart Diseases Using Machine Learning Algorithms with Anew Mixed Feature Creation (MFC) technique

  • Rawia Elarabi;Abdelrahman Elsharif Karrar;Murtada El-mukashfi El-taher
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.5
    • /
    • pp.148-162
    • /
    • 2023
  • Classification systems can significantly assist the medical sector by allowing for the precise and quick diagnosis of diseases. As a result, both doctors and patients will save time. One way to identify risk variables is to use machine learning algorithms. Non-surgical technologies, such as machine learning, are trustworthy and effective in distinguishing healthy patients from heart-disease patients, and they save time and effort. The goal of this study is to create a medical intelligent decision support system based on machine learning for the diagnosis of heart disease. We used a mixed feature creation (MFC) technique to generate new features from the UCI Cleveland Cardiology dataset. We select the most suitable features by using the Least Absolute Shrinkage and Selection Operator (LASSO), Recursive Feature Elimination with Random Forest feature selection (RFE-RF), and the best features of both LASSO and RFE-RF (BLR). Cross-validation and grid-search methods are used to optimize the parameters of the estimators used in applying these algorithms. Classifier performance metrics, including classification accuracy, specificity, sensitivity, precision, and F1-score, are reported for each classification model along with execution time and RMSE, and the results are presented independently for comparison. Our proposed work finds the best potential outcome across all available prediction models and improves the system's performance, allowing physicians to diagnose heart patients more accurately.

Applied linear and nonlinear statistical models for evaluating strength of Geopolymer concrete

  • Prem, Prabhat Ranjan;Thirumalaiselvi, A.;Verma, Mohit
    • Computers and Concrete
    • /
    • v.24 no.1
    • /
    • pp.7-17
    • /
    • 2019
  • The complex phenomenon of bond formation in geopolymer is not well understood and is therefore difficult to model. This paper presents applied statistical models for evaluating the compressive strength of geopolymer. The models studied are divided into three categories: linear regression [least absolute shrinkage and selection operator (LASSO) and elastic net], tree regression [decision tree and bagging tree], and kernel methods [support vector regression (SVR), kernel ridge regression (KRR), Gaussian process regression (GPR), and relevance vector machine (RVM)]. The performance of the methods is compared in terms of error indices, computational effort, convergence, and residuals. Based on the present study, kernel-based methods (GPR and KRR) are recommended for evaluating the compressive strength of geopolymer concrete.

Efficient Neural Network for Downscaling climate scenarios

  • Moradi, Masha;Lee, Taesam
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2018.05a
    • /
    • pp.157-157
    • /
    • 2018
  • A reliable and accurate downscaling model that can provide climate change information, obtained from global climate models (GCMs), at finer resolution has always been of great interest to researchers. To achieve such a model, linear methods have been widely studied in the past decades. However, nonlinear methods can also be beneficial for solving the downscaling problem. Therefore, this study explored the applicability of several nonlinear machine learning techniques, namely neural network (NN), extreme learning machine (ELM), and ELM autoencoder (ELM-AE), as well as a linear method, least absolute shrinkage and selection operator (LASSO), to build a reliable temperature downscaling model. ELM is an efficient learning algorithm for generalized single-layer feed-forward neural networks (SLFNs). Its excellent training speed and good generalization capability make ELM an efficient solution for SLFNs compared to traditional time-consuming learning methods such as back propagation (BP). However, due to its shallow architecture, ELM may not capture all of the nonlinear relationships between input features. To address this issue, ELM-AE was tested in the current study for temperature downscaling.


Improvement of inspection system for common crossings by track side monitoring and prognostics

  • Sysyn, Mykola;Nabochenko, Olga;Kovalchuk, Vitalii;Gruen, Dimitri;Pentsak, Andriy
    • Structural Monitoring and Maintenance
    • /
    • v.6 no.3
    • /
    • pp.219-235
    • /
    • 2019
  • Scheduled inspections of common crossings are one of the main cost drivers of railway maintenance. The prognostics and health management (PHM) approach and modern monitoring tools offer many possibilities for optimizing inspections and maintenance. The present paper deals with data-driven prognosis of the common crossing remaining useful life (RUL) based on an inertial monitoring system. The problem of the scheduled inspection system for common crossings is outlined and analysed. The proposed analysis of inertial signals with the maximal overlap discrete wavelet packet transform (MODWPT) and Shannon entropy (SE) estimates makes it possible to extract the spectral features. The relevant features for the acceleration components are selected by applying Lasso (least absolute shrinkage and selection operator) regularization. The features are fused with time-domain information about the longitudinal position of the wheel impact and train velocities by multivariate regression. The fused structural health (SH) indicator has a significant correlation with the lifetime of the crossing. The RUL prognosis is performed with a linear degradation stochastic model with recursive Bayesian update. Prognosis testing metrics show promising results for improving common crossing inspection scheduling.

Predictive Model of Optimal Continuous Positive Airway Pressure for Obstructive Sleep Apnea Patients with Obesity by Using Machine Learning (비만 폐쇄수면무호흡 환자에서 기계학습을 통한 적정양압 예측모형)

  • Kim, Seung Soo;Yang, Kwang Ik
    • Journal of Sleep Medicine
    • /
    • v.15 no.2
    • /
    • pp.48-54
    • /
    • 2018
  • Objectives: The aim of this study was to develop a model for predicting the optimal continuous positive airway pressure (CPAP) for obstructive sleep apnea (OSA) patients with obesity by using machine learning. Methods: We retrospectively investigated the medical records of 162 OSA patients who had obesity [body mass index (BMI) ≥ 25] and had undergone a successful CPAP titration study. We randomly divided the data into a training set (90%) and a test set (10%). We built a random forest model and a least absolute shrinkage and selection operator (lasso) regression model to predict the optimal pressure by using the training set, and then applied our models and previously reported equations to the test set. To compare the fit of the models, we used the correlation coefficient (CC) and the mean absolute error (MAE). Results: The random forest model showed the best performance {CC 0.78 [95% confidence interval (CI) 0.43-0.93], MAE 1.20}. The lasso regression model also showed an improved result [CC 0.78 (95% CI 0.42-0.93), MAE 1.26] compared to the Hoffstein equation [CC 0.68 (95% CI 0.23-0.89), MAE 1.34] and Choi's equation [CC 0.72 (95% CI 0.30-0.90), MAE 1.40]. Conclusions: Our random forest model and lasso model ($26.213 + 0.084 \times \text{BMI} + 0.004 \times \text{apnea-hypopnea index} + 0.004 \times \text{oxygen desaturation index} - 0.215 \times \text{mean oxygen saturation}$) showed improved performance compared to the previously reported equations. Further study of other subgroups or phenotypes of OSA is required.
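The lasso equation reported in this abstract can be evaluated directly. The sketch below transcribes its coefficients and adds a helper for the mean absolute error (MAE) used in the comparison. The function names and the example patient values are hypothetical, and the input units (for instance, mean oxygen saturation in percent) are an assumption, since the abstract does not state them.

```python
def predict_cpap_lasso(bmi, ahi, odi, mean_spo2):
    """Evaluate the lasso equation quoted in the abstract.

    bmi: body mass index; ahi: apnea-hypopnea index;
    odi: oxygen desaturation index; mean_spo2: mean oxygen saturation
    (assumed here to be given in percent).
    """
    return 26.213 + 0.084 * bmi + 0.004 * ahi + 0.004 * odi - 0.215 * mean_spo2


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical patient, for illustration only.
predicted = predict_cpap_lasso(bmi=30, ahi=40, odi=35, mean_spo2=93)
print(round(predicted, 3))  # about 9.038
```

In this equation a higher BMI or a higher event index raises the predicted pressure, while a higher mean oxygen saturation lowers it.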

A study of Predicting International Gasoline Prices based on Multiple Linear Regression with Economic Indicators (경제지표를 활용한 다중선형회귀 모델 기반 국제 휘발유 가격 예측)

  • Myeongeun Han;Jiyeon Kim;Hyunhee Lee;Sein Kim;Minseo Park
    • The Journal of the Convergence on Culture Technology
    • /
    • v.10 no.1
    • /
    • pp.159-164
    • /
    • 2024
  • The domestic petroleum market is highly sensitive to changes in international oil prices, so it is important to identify and respond to those changes. In particular, it is necessary to clearly understand the factors causing price fluctuations of gasoline, which is consumed in large quantities. International gasoline prices are influenced by global factors such as gasoline supplies, geopolitical events, and fluctuations in the U.S. dollar. However, previous studies have focused only on gasoline supplies. In this study, we explore the causal relationship between economic indicators and international gasoline prices using various machine learning-based regression models. First, we collect data on various global economic indicators. Second, we perform data preprocessing. Third, we build models using multiple linear regression, Ridge regression, and Lasso (Least Absolute Shrinkage and Selection Operator) regression. As a result, our multiple linear regression model showed the highest accuracy, 96.73%, on the test sets. We expect that our proposed model will be helpful for domestic economic stability and energy policy decisions.

Time Delay Estimation Using LASSO (Least Absolute Selection and Shrinkage Operator) (LASSO를 사용한 시간 지연 추정 알고리즘)

  • Lim, Jun-Seok;Pyeon, Yong-Guk;Choi, Seok-Im
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.39B no.10
    • /
    • pp.715-721
    • /
    • 2014
  • For decades, many researchers have studied time delay estimation (TDE) methods for signals at two different receivers. Channel estimation based TDE is one of the typical TDE methods. It models the time delay between two received signals as an impulse response of a channel between the two receivers. In general, this impulse response is sparse. However, most conventional TDE algorithms do not exploit the sparsity. In this paper, we propose a TDE method that takes the sparsity into consideration. The performance comparison shows that the proposed algorithm improves the estimation accuracy by 10 dB for a white Gaussian source. In addition, even for a colored source, the proposed algorithm does not show the estimation threshold effect.
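The idea of treating the delay as a sparse impulse response can be sketched as follows: regress the second received signal on delayed copies of the first with an L1 penalty, then read the delay off the largest estimated tap. The abstract does not describe the authors' algorithm in detail, so this is only an illustrative sketch under assumptions: it uses a plain iterative soft-thresholding (ISTA) solver, a noiseless synthetic signal, and hypothetical names and parameter values (`estimate_delay_lasso`, `lam`, `n_taps`).

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    return max(abs(z) - gamma, 0.0) * (1.0 if z >= 0 else -1.0)


def estimate_delay_lasso(x, y, n_taps, lam=0.05, n_iter=300):
    """Fit y[i] ~ sum_k h[k] * x[i - k] with an L1 penalty on h (ISTA)."""
    n = len(x)
    # Column k of the design matrix is x delayed by k samples (zero-padded).
    cols = [[x[i - k] if i >= k else 0.0 for i in range(n)] for k in range(n_taps)]
    # Safe step size: the trace of the scaled Gram matrix bounds its largest eigenvalue.
    lipschitz_bound = sum(sum(v * v for v in col) for col in cols) / n
    step = 1.0 / lipschitz_bound
    h = [0.0] * n_taps
    for _ in range(n_iter):
        resid = [y[i] - sum(cols[k][i] * h[k] for k in range(n_taps)) for i in range(n)]
        grad = [-sum(cols[k][i] * resid[i] for i in range(n)) / n for k in range(n_taps)]
        h = [soft_threshold(h[k] - step * grad[k], step * lam) for k in range(n_taps)]
    # The estimated delay is the index of the largest-magnitude tap.
    delay = max(range(n_taps), key=lambda k: abs(h[k]))
    return delay, h


# Synthetic example (hypothetical): white Gaussian source, true delay of 3 samples.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
true_delay = 3
y = [x[i - true_delay] if i >= true_delay else 0.0 for i in range(200)]

delay, h = estimate_delay_lasso(x, y, n_taps=10)
print(delay)
```

The L1 penalty pushes the taps that do not correspond to the true delay toward zero, which is the sparsity that the abstract says conventional estimators leave unused.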

Pure additive contribution of genetic variants to a risk prediction model using propensity score matching: application to type 2 diabetes

  • Park, Chanwoo;Jiang, Nan;Park, Taesung
    • Genomics & Informatics
    • /
    • v.17 no.4
    • /
    • pp.47.1-47.12
    • /
    • 2019
  • The achievements of genome-wide association studies have suggested ways to predict diseases, such as type 2 diabetes (T2D), using single-nucleotide polymorphisms (SNPs). Most T2D risk prediction models have used SNPs in combination with demographic variables. However, it is difficult to evaluate the pure additive contribution of genetic variants to classically used demographic models. Since prediction models include some heritable traits, such as body mass index, the contribution of SNPs using unmatched case-control samples may be underestimated. In this article, we propose a method that uses propensity score matching to avoid underestimation by matching case and control samples, thereby determining the pure additive contribution of SNPs. To illustrate the proposed propensity score matching method, we used SNP data from the Korea Association Resources project and reported SNPs from the genome-wide association study catalog. We selected various SNP sets via stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and the elastic-net (EN) algorithm. Using these SNP sets, we made predictions using SLR, LASSO, and EN as logistic regression modeling techniques. The accuracy of the predictions was compared in terms of area under the receiver operating characteristic curve (AUC). The contribution of SNPs to T2D was evaluated by the difference in the AUC between models using only demographic variables and models that included the SNPs. The largest difference among our models showed that the AUC of the model using genetic variants with demographic variables could be 0.107 higher than that of the corresponding model using only demographic variables.
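The evaluation described in this abstract compares the AUC of a model using only demographic variables with that of a model that also includes SNPs. A minimal sketch of that comparison is given below, computing AUC as the fraction of case-control pairs in which the case receives the higher score (ties count one half). The labels and scores are made-up toy values, not data from the study, and the function name `auc` is an illustrative choice.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    cases = [s for l, s in zip(labels, scores) if l == 1]
    controls = [s for l, s in zip(labels, scores) if l == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy example (hypothetical scores): 1 = case, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
scores_demographic = [0.60, 0.40, 0.70, 0.50, 0.30, 0.45]
scores_with_snps = [0.80, 0.55, 0.90, 0.50, 0.20, 0.40]

auc_demo = auc(labels, scores_demographic)
auc_full = auc(labels, scores_with_snps)
# The difference is the quantity the abstract uses to measure the SNP contribution.
print(auc_demo, auc_full, auc_full - auc_demo)
```

The pairwise form is quadratic in sample size, which is fine for a sketch; a rank-based formula gives the same value more efficiently on large datasets.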

Spatial Hedonic Modeling using Geographically Weighted LASSO Model (GWL을 적용한 공간 헤도닉 모델링)

  • Jin, Chanwoo;Lee, Gunhak
    • Journal of the Korean Geographical Society
    • /
    • v.49 no.6
    • /
    • pp.917-934
    • /
    • 2014
  • The geographically weighted regression (GWR) model has been widely used to estimate spatially heterogeneous real estate prices. The GWR model, however, is limited in selecting different price determinants over space and by the restricted number of observations available for local estimation. As an alternative, the geographically weighted LASSO (GWL) model has recently been introduced and has received growing interest. In this paper, we explore various local price determinants for real estate by utilizing the GWL and examine its applicability to forecasting real estate prices. To do this, we developed three hedonic models, OLS, GWR, and GWL, focusing on the sales prices of apartments in Seoul, and compared them in terms of model fit, prediction, and multicollinearity. As a result, the local models appeared to be better than the global OLS on the whole, and in particular, the GWL appeared to be more explanatory and predictive than the other models. Moreover, the GWL provided spatially varying sets of price determinants free of multicollinearity. The GWL helps select significant sets of independent variables from a high-dimensional dataset, and hence will be a useful technique for large and complex spatial big data.
