• 제목/요약/키워드: least absolute shrinkage and selection operator(LASSO)

검색결과 34건 처리시간 0.027초

Risk Prediction Using Genome-Wide Association Studies on Type 2 Diabetes

  • Choi, Sungkyoung;Bae, Sunghwan;Park, Taesung
    • Genomics & Informatics
    • /
    • 제14권4호
    • /
    • pp.138-148
    • /
    • 2016
  • The success of genome-wide association studies (GWASs) has enabled us to improve risk assessment and provide novel genetic variants for diagnosis, prevention, and treatment. However, most variants discovered by GWASs have been reported to have very small effect sizes on complex human diseases, which has been a big hurdle in building risk prediction models. Recently, many statistical approaches based on penalized regression have been developed to solve the "large p and small n" problem. In this report, we evaluated the performance of several statistical methods for predicting a binary trait: stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN). We first built a prediction model by combining variable selection and prediction methods for type 2 diabetes using Affymetrix Genome-Wide Human SNP Array 5.0 from the Korean Association Resource project. We assessed the risk prediction performance using area under the receiver operating characteristic curve (AUC) for the internal and external validation datasets. In the internal validation, SLR-LASSO and SLR-EN tended to yield more accurate predictions than other combinations. During the external validation, the SLR-SLR and SLR-EN combinations achieved the highest AUC of 0.726. We propose these combinations as a potentially powerful risk prediction model for type 2 diabetes.

Intelligent System for the Prediction of Heart Diseases Using Machine Learning Algorithms with Anew Mixed Feature Creation (MFC) technique

  • Rawia Elarabi;Abdelrahman Elsharif Karrar;Murtada El-mukashfi El-taher
    • International Journal of Computer Science & Network Security
    • /
    • 제23권5호
    • /
    • pp.148-162
    • /
    • 2023
  • Classification systems can significantly assist the medical sector by allowing for the precise and quick diagnosis of diseases. As a result, both doctors and patients will save time. A possible way for identifying risk variables is to use machine learning algorithms. Non-surgical technologies, such as machine learning, are trustworthy and effective in categorizing healthy and heart-disease patients, and they save time and effort. The goal of this study is to create a medical intelligent decision support system based on machine learning for the diagnosis of heart disease. We have used a mixed feature creation (MFC) technique to generate new features from the UCI Cleveland Cardiology dataset. We select the most suitable features by using Least Absolute Shrinkage and Selection Operator (LASSO), Recursive Feature Elimination with Random Forest feature selection (RFE-RF) and the best features of both LASSO RFE-RF (BLR) techniques. Cross-validated and grid-search methods are used to optimize the parameters of the estimator used in applying these algorithms. and classifier performance assessment metrics including classification accuracy, specificity, sensitivity, precision, and F1-Score, of each classification model, along with execution time and RMSE the results are presented independently for comparison. Our proposed work finds the best potential outcome across all available prediction models and improves the system's performance, allowing physicians to diagnose heart patients more accurately.

Applied linear and nonlinear statistical models for evaluating strength of Geopolymer concrete

  • Prem, Prabhat Ranjan;Thirumalaiselvi, A.;Verma, Mohit
    • Computers and Concrete
    • /
    • 제24권1호
    • /
    • pp.7-17
    • /
    • 2019
  • The complex phenomenon of the bond formation in geopolymer is not well understood and therefore, difficult to model. This paper present applied statistical models for evaluating the compressive strength of geopolymer. The applied statistical models studied are divided into three different categories - linear regression [least absolute shrinkage and selection operator (LASSO) and elastic net], tree regression [decision and bagging tree] and kernel methods (support vector regression (SVR), kernel ridge regression (KRR), Gaussian process regression (GPR), relevance vector machine (RVM)]. The performance of the methods is compared in terms of error indices, computational effort, convergence and residuals. Based on the present study, kernel based methods (GPR and KRR) are recommended for evaluating compressive strength of Geopolymer concrete.

Efficient Neural Network for Downscaling climate scenarios

  • Moradi, Masha;Lee, Taesam
    • 한국수자원학회:학술대회논문집
    • /
    • 한국수자원학회 2018년도 학술발표회
    • /
    • pp.157-157
    • /
    • 2018
  • A reliable and accurate downscaling model which can provide climate change information, obtained from global climate models (GCMs), at finer resolution has been always of great interest to researchers. In order to achieve this model, linear methods widely have been studied in the past decades. However, nonlinear methods also can be potentially beneficial to solve downscaling problem. Therefore, this study explored the applicability of some nonlinear machine learning techniques such as neural network (NN), extreme learning machine (ELM), and ELM autoencoder (ELM-AE) as well as a linear method, least absolute shrinkage and selection operator (LASSO), to build a reliable temperature downscaling model. ELM is an efficient learning algorithm for generalized single layer feed-forward neural networks (SLFNs). Its excellent training speed and good generalization capability make ELM an efficient solution for SLFNs compared to traditional time-consuming learning methods like back propagation (BP). However, due to its shallow architecture, ELM may not capture all of nonlinear relationships between input features. To address this issue, ELM-AE was tested in the current study for temperature downscaling.

  • PDF

Improvement of inspection system for common crossings by track side monitoring and prognostics

  • Sysyn, Mykola;Nabochenko, Olga;Kovalchuk, Vitalii;Gruen, Dimitri;Pentsak, Andriy
    • Structural Monitoring and Maintenance
    • /
    • 제6권3호
    • /
    • pp.219-235
    • /
    • 2019
  • Scheduled inspections of common crossings are one of the main cost drivers of railway maintenance. Prognostics and health management (PHM) approach and modern monitoring means offer many possibilities in the optimization of inspections and maintenance. The present paper deals with data driven prognosis of the common crossing remaining useful life (RUL) that is based on an inertial monitoring system. The problem of scheduled inspections system for common crossings is outlined and analysed. The proposed analysis of inertial signals with the maximal overlap discrete wavelet packet transform (MODWPT) and Shannon entropy (SE) estimates enable to extract the spectral features. The relevant features for the acceleration components are selected with application of Lasso (Least absolute shrinkage and selection operator) regularization. The features are fused with time domain information about the longitudinal position of wheels impact and train velocities by multivariate regression. The fused structural health (SH) indicator has a significant correlation to the lifetime of crossing. The RUL prognosis is performed on the linear degradation stochastic model with recursive Bayesian update. Prognosis testing metrics show the promising results for common crossing inspection scheduling improvement.

비만 폐쇄수면무호흡 환자에서 기계학습을 통한 적정양압 예측모형 (Predictive Model of Optimal Continuous Positive Airway Pressure for Obstructive Sleep Apnea Patients with Obesity by Using Machine Learning)

  • 김승수;양광익
    • Journal of Sleep Medicine
    • /
    • 제15권2호
    • /
    • pp.48-54
    • /
    • 2018
  • Objectives: The aim of this study was to develop a predicting model for the optimal continuous positive airway pressure (CPAP) for obstructive sleep apnea (OSA) patient with obesity by using a machine learning. Methods: We retrospectively investigated the medical records of 162 OSA patients who had obesity [body mass index (BMI) ≥ 25] and undertaken successful CPAP titration study. We divided the data to a training set (90%) and a test set (10%), randomly. We made a random forest model and a least absolute shrinkage and selection operator (lasso) regression model to predict the optimal pressure by using the training set, and then applied our models and previous reported equations to the test set. To compare the fitness of each models, we used a correlation coefficient (CC) and a mean absolute error (MAE). Results: The random forest model showed the best performance {CC 0.78 [95% confidence interval (CI) 0.43-0.93], MAE 1.20}. The lasso regression model also showed the improved result [CC 0.78 (95% CI 0.42-0.93), MAE 1.26] compared to the Hoffstein equation [CC 0.68 (95% CI 0.23-0.89), MAE 1.34] and the Choi's equation [CC 0.72 (95% CI 0.30-0.90), MAE 1.40]. Conclusions: Our random forest model and lasso model ($26.213+0.084{\times}BMI+0.004{\times}$apnea-hypopnea index+$0.004{\times}oxygen$ desaturation index-$0.215{\times}mean$ oxygen saturation) showed the improved performance compared to the previous reported equations. The further study for other subgroup or phenotype of OSA is required.

경제지표를 활용한 다중선형회귀 모델 기반 국제 휘발유 가격 예측 (A study of Predicting International Gasoline Prices based on Multiple Linear Regression with Economic Indicators)

  • 한명은;김지연;이현희;김세인;박민서
    • 문화기술의 융합
    • /
    • 제10권1호
    • /
    • pp.159-164
    • /
    • 2024
  • 국내 석유 시장은 국제 석유 가격의 변동에 매우 민감하기 때문에 그 변동성에 대한 파악과 대처가 중요하다. 특히, 높은 소비량을 보이는 휘발유의 가격이 어떠한 요인에 인해 변화하는지 명확하게 파악하는 것이 필요하다. 국제 휘발유 가격은 휘발유 수급, 지정학적 사건, 미국 달러화 가치 변동 등 글로벌 요인에 영향을 받는다. 그러나 기존의 연구들은 휘발유의 수급에만 초점에 맞추어 진행하였다는 한계가 존재한다. 본 연구에서는 다양한 머신러닝 기반의 회귀 모델을 활용하여 거시적 경제지표와 국제 휘발유 가격 간의 인과관계를 탐색한다. 첫째, 다양한 세계 경제지표 데이터를 수집한다. 둘째, 데이터 전처리를 진행한다. 셋째, 다중선형회귀, Ridge 회귀, Lasso(Least Absolute Shrinkage and Selection Operator) 회귀 모델을 활용하여 모델링한다. 실험 결과, 테스트 데이터 셋에서 다중선형회귀 모델이 가장 높은 정확도(97.3%)를 보였다. 우리는 국제 휘발유 가격의 예측은 국내 경제 안정성과 에너지 정책 결정에 도움이 될 수 있을 것으로 기대한다.

LASSO를 사용한 시간 지연 추정 알고리즘 (Time Delay Estimation Using LASSO (Least Absolute Selection and Shrinkage Operator))

  • 임준석;편용국;최석임
    • 한국통신학회논문지
    • /
    • 제39B권10호
    • /
    • pp.715-721
    • /
    • 2014
  • 두 개 센서에 도래하는 신호 간의 시간 지연을 추정 방법에는 여러 가지가 존재한다. 그 중에서 채널 추정 기법을 기반으로 한 방법의 경우는 두 센서에 입력되는 서로 다른 신호간의 상대적인 지연을 채널의 임펄스 응답처럼 추정하도록 되어 있다. 이 경우에는 해당 채널의 특성이 희박 채널의 특성을 가지고 있다. 기존의 방법들은 채널의 희박성을 이용하지 못하고 있는 방법이 대부분이다. 본 논문에서는 채널의 희박성을 이용하기 위하여 희박신호 최적화 방법의 하나인 LASSO 최적화를 사용한 시간 지연 추정 방법을 제안한다. 제안한 방법을 기존의 방법과 비교하여, 백색 가우시안 신호원에서는 약 10dB 이상의 성능 개선 결과를 보이고, 유색 신호원에서도 갑자기 추정성능이 열하되는 현상이 없음을 보인다.

Pure additive contribution of genetic variants to a risk prediction model using propensity score matching: application to type 2 diabetes

  • Park, Chanwoo;Jiang, Nan;Park, Taesung
    • Genomics & Informatics
    • /
    • 제17권4호
    • /
    • pp.47.1-47.12
    • /
    • 2019
  • The achievements of genome-wide association studies have suggested ways to predict diseases, such as type 2 diabetes (T2D), using single-nucleotide polymorphisms (SNPs). Most T2D risk prediction models have used SNPs in combination with demographic variables. However, it is difficult to evaluate the pure additive contribution of genetic variants to classically used demographic models. Since prediction models include some heritable traits, such as body mass index, the contribution of SNPs using unmatched case-control samples may be underestimated. In this article, we propose a method that uses propensity score matching to avoid underestimation by matching case and control samples, thereby determining the pure additive contribution of SNPs. To illustrate the proposed propensity score matching method, we used SNP data from the Korea Association Resources project and reported SNPs from the genome-wide association study catalog. We selected various SNP sets via stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and the elastic-net (EN) algorithm. Using these SNP sets, we made predictions using SLR, LASSO, and EN as logistic regression modeling techniques. The accuracy of the predictions was compared in terms of area under the receiver operating characteristic curve (AUC). The contribution of SNPs to T2D was evaluated by the difference in the AUC between models using only demographic variables and models that included the SNPs. The largest difference among our models showed that the AUC of the model using genetic variants with demographic variables could be 0.107 higher than that of the corresponding model using only demographic variables.

GWL을 적용한 공간 헤도닉 모델링 (Spatial Hedonic Modeling using Geographically Weighted LASSO Model)

  • 진찬우;이건학
    • 대한지리학회지
    • /
    • 제49권6호
    • /
    • pp.917-934
    • /
    • 2014
  • 지리가중회귀 모델(GWR)은 국지적으로 이질적인 부동산 가격을 추정할 수 있는 도구로 폭넓게 활용되어 왔다. 그럼에도 불구하고 GWR은 공간적으로 이질적인 가격결정요인의 선택이나 국지적 추정에서의 관측치 수의 제한 등과 같은 한계를 가지고 있다. 본 연구는 이러한 한계를 극복하기 위한 대안으로 최근 주목받고 있는 지리가중라소 모델(GWL)을 이용하여 국지적으로 다양한 부동산 가격결정요인들을 탐색하고, 부동산 가격 추정에 있어서 GWL 모델의 적용가능성을 살펴보고자 한다. 이를 위해 서울시 아파트 가격을 대상으로 OLS, GWR, GWL의 헤도닉 모델을 구축하였으며, 모델의 설명력, 예측력, 다중공선성 측면에서 이들을 비교 분석하였다. 그 결과, 전역적 모델에 비해 국지적 모델이 전체적인 설명력, 예측력이 우수한 것으로 나타났으며, 특히 국지적 모델 중 GWL 모델은 다중공선성 문제를 자동적으로 해결하면서 공간적으로 이질적인 가격 결정요인 집합들을 도출하였고, 다른 모델들에 비해 상당히 높은 설명력과 예측력을 보여주고 있다. 본 연구에서 적용한 GWL 모델은 고차원의 데이터셋에서 유의미한 독립 변수들을 효율적으로 선정하는데 직접적인 도움을 줌으로써 부동산과 같이 대용량의 복잡한 구조를 가진 공간 빅데이터를 위한 유용한 분석 기법으로 활용될 수 있을 것이다.

  • PDF