• Title/Summary/Keyword: LASSO regression


Development of Evaluation Model of Pumping and Drainage Station Using Performance Degradation Factors (농업기반시설물 양·배수장의 성능저하 요인분석 및 성능평가 모델 개발)

  • Lee, Jonghyuk;Lee, Sangik;Jeong, Youngjoon;Lee, Jemyung;Yoon, Seongsoo;Park, Jinseon;Lee, Byeongjoon;Lee, Joongu;Choi, Won
    • Journal of The Korean Society of Agricultural Engineers / v.61 no.4 / pp.75-86 / 2019
  • Natural disasters caused by abnormal climate have recently become more frequent, and damage to aged agricultural infrastructure is increasing rapidly. Because agricultural infrastructure facilities are in contact with water throughout the year and are large in number, it is important to build a maintenance management system. In particular, the current maintenance management system for pumping and drainage stations is limited by a lack of objectivity and a shortage of management personnel. The purpose of this study is to develop a performance evaluation model using the factors related to performance degradation of pumping and drainage facilities and to predict the performance of the facilities in response to climate change. We focused on the pumping and drainage stations belonging to each climatic zone defined by the Korea geographical climatic classification system. The performance evaluation model was developed using three statistical models: POLS, RE, and LASSO. LASSO was selected for the performance evaluation model because it resolved the multicollinearity problem between variables and showed the smallest MSE. To predict performance degradation due to climate change, the climate change response variables were classified into three categories: climate exposure, sensitivity, and adaptive capacity. Performance degradation was then predicted for each facility using the developed performance evaluation model and the climate change response variables.
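The model-selection step described in this abstract (LASSO chosen for handling multicollinearity and for its MSE) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the synthetic data, the penalty value `lam=0.1`, and the hand-written coordinate descent solver `lasso_cd` are not taken from the paper, and plain least squares stands in for the paper's POLS and RE models.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t (the lasso proximal step)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=500):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual that leaves out feature j's current contribution.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

def mse(X, y, beta):
    """Mean squared prediction error of a linear model without intercept."""
    return float(np.mean((y - X @ beta) ** 2))

# Synthetic data (hypothetical): x2 is almost a copy of x1, x3 is pure noise.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + 0.5 * rng.normal(size=n)

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

ols = np.linalg.lstsq(X_train, y_train, rcond=None)[0]
lasso = lasso_cd(X_train, y_train, lam=0.1)

print("OLS coefficients:  ", ols, "test MSE:", mse(X_test, y_test, ols))
print("LASSO coefficients:", lasso, "test MSE:", mse(X_test, y_test, lasso))
```

Comparing the two coefficient vectors shows how the L1 penalty treats the near-duplicate predictors, and the held-out MSE values give the kind of criterion the abstract reports using for model choice.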

Prediction of the Probability of Job Loss due to Digitalization and Comparison by Industry: Using Machine Learning Methods

  • Park, Heedae;Lee, Kiyoul
    • Journal of Korea Trade / v.25 no.5 / pp.110-128 / 2021
  • Purpose - The purpose of this study is to analyze the possibility that individual jobs will be substituted as a result of the technological development represented by the Fourth Industrial Revolution, considering the different effects of digital transformation on the labor market. Design/methodology - To estimate the substitution probability, this study used two data sets: the job characteristics data for individual occupations provided by KEIS and the information on occupational substitution status provided by Frey and Osborne (2013). In total, 665 occupations were considered in this study. Of these, 80 occupations had data with labels of substitution status. The primary goal of estimation was to predict the degree of substitution for 607 of 665 occupations (excluding 58 with markers). Three methods were used: principal component analysis, an unsupervised machine learning method, and Ridge and Lasso, which are supervised learning methods. After extracting significant variables with these methods, this study carried out logistic regression to estimate the probability of substitution for each occupation. Findings - The probability of substitution for other occupational groups did not vary significantly across individual models, and the rank order of the probabilities across occupational groups was similar across models. The mean substitution probability across the three methods was 45.3%. The highest value was obtained using the PCA method, and the lowest value was derived from the LASSO method. The average substitution probability of the trading industry was 45.1%, very similar to the overall average. Originality/value - This study is significant in that it estimates the job substitution probability using various machine learning methods. The results of substitution probability estimation were compared by industry sector. In addition, this study attempts to compare the trade business with the industry sector.
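The second stage described above, logistic regression on pre-selected variables to obtain a substitution probability per occupation, might look like the following sketch. The data are synthetic and the sample sizes, learning rate, and function names (`fit_logistic`, `predict_proba`) are hypothetical; the variable-selection stage (PCA, Ridge, or Lasso) is assumed to have already produced the three feature columns.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping real scores to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Plain gradient descent on the mean logistic loss (intercept included)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
        w -= lr * grad
    return w

def predict_proba(X, w):
    """Predicted probability of the positive class for each row of X."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return sigmoid(Xb @ w)

# Hypothetical data: labeled occupations with 3 pre-selected job characteristics.
rng = np.random.default_rng(2)
X_labeled = rng.normal(size=(80, 3))
signal = X_labeled[:, 0] - X_labeled[:, 1] + 0.5 * rng.normal(size=80)
y_labeled = (signal > 0).astype(float)
X_unlabeled = rng.normal(size=(10, 3))

w = fit_logistic(X_labeled, y_labeled)
probs = predict_proba(X_unlabeled, w)
print(np.round(probs, 3))  # estimated substitution probability per unlabeled occupation
```

Averaging `probs` within an occupational group or industry would give group-level figures of the kind the abstract compares across methods.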

Analysis of cycle racing ranking using statistical prediction models (통계적 예측모형을 활용한 경륜 경기 순위 분석)

  • Park, Gahee;Park, Rira;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.30 no.1 / pp.25-39 / 2017
  • Over 5 million people participate in cycle racing betting and its revenue is more than 2 trillion won. This study predicts the ranking of cycle racing using various statistical analyses and identifies important variables which have influence on ranking. We propose competitive ranking prediction models using various classification and regression methods. Our model can predict rankings with low misclassification rates most of the time. We found that the ranking increases as the grade of a racer decreases and as overall scores increase. Inversely, we can observe that the ranking decreases when the grade of a racer increases, race number four is given, and the ranking of the last race of a racer decreases. We also found that prediction accuracy can be improved when we use centered data per race instead of raw data. However, the real profit from the future data was not high when we applied our prediction model because our model can predict only low-return events well.

A study of Predicting International Gasoline Prices based on Multiple Linear Regression with Economic Indicators (경제지표를 활용한 다중선형회귀 모델 기반 국제 휘발유 가격 예측)

  • Myeongeun Han;Jiyeon Kim;Hyunhee Lee;Sein Kim;Minseo Park
    • The Journal of the Convergence on Culture Technology / v.10 no.1 / pp.159-164 / 2024
  • The domestic petroleum market is highly sensitive to changes in international oil prices, so it is important to identify and respond to those changes. In particular, it is necessary to clearly understand the factors causing price fluctuations in gasoline, which is heavily consumed. International gasoline prices are influenced by global factors such as gasoline supplies, geopolitical events, and fluctuations in the U.S. dollar. However, previous studies have focused only on gasoline supplies. In this study, we explore the causal relationship between economic indicators and international gasoline prices using various machine learning-based regression models. First, we collect data on various global economic indicators. Second, we perform data preprocessing. Third, we build models using multiple linear regression, Ridge regression, and Lasso (Least Absolute Shrinkage and Selection Operator) regression. As a result, the multiple linear regression model showed the highest accuracy, 96.73%, on the test sets. We expect that our proposed model will be helpful for domestic economic stability and energy policy decisions.
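A comparison between multiple linear regression and Ridge regression on a held-out test set can be sketched as follows. This is an illustration under stated assumptions: the data are synthetic, the abstract does not say which accuracy metric was used (R² is assumed here), and `ridge_fit` and `r_squared` are hypothetical helper names.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (X'X + lam*I)^{-1} X'y, no intercept."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def r_squared(X, y, beta):
    """Coefficient of determination of the linear predictions X @ beta."""
    resid = y - X @ beta
    centered = y - y.mean()
    return float(1.0 - (resid @ resid) / (centered @ centered))

# Synthetic stand-in for standardized economic indicators and a price series.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 4))
true_beta = np.array([1.5, -2.0, 0.0, 0.7])
y = X @ true_beta + 0.3 * rng.normal(size=120)
X_train, X_test = X[:90], X[90:]
y_train, y_test = y[:90], y[90:]

# lam = 0 reproduces ordinary multiple linear regression.
for lam in (0.0, 1.0, 10.0):
    beta = ridge_fit(X_train, y_train, lam)
    print(f"lam={lam:5.1f}  test R^2={r_squared(X_test, y_test, beta):.4f}")
```

Setting `lam=0.0` gives the unpenalized fit, so the same function covers both the multiple linear regression baseline and the Ridge variants.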

Prediction of Quantitative Traits Using Common Genetic Variants: Application to Body Mass Index

  • Bae, Sunghwan;Choi, Sungkyoung;Kim, Sung Min;Park, Taesung
    • Genomics & Informatics / v.14 no.4 / pp.149-159 / 2016
  • With the success of the genome-wide association studies (GWASs), many candidate loci for complex human diseases have been reported in the GWAS catalog. Recently, many disease prediction models based on penalized regression or statistical learning methods were proposed using candidate causal variants from significant single-nucleotide polymorphisms of GWASs. However, there have been only a few systematic studies comparing existing methods. In this study, we first constructed risk prediction models, such as stepwise linear regression (SLR), least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN), using a GWAS chip and GWAS catalog. We then compared the prediction accuracy by calculating the mean square error (MSE) value on data from the Korea Association Resource (KARE) with body mass index. Our results show that SLR provides a smaller MSE value than the other methods, while the numbers of selected variables in each model were similar.
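For reference, the LASSO and Elastic-Net estimators and the MSE criterion named in this abstract are commonly written as below. The scaling (the factor 1/(2n) and the mixing parameter α) follows one widespread convention and is an assumption; the paper may use a different parameterization.

```latex
\hat{\beta}_{\mathrm{LASSO}}
  = \arg\min_{\beta}\; \frac{1}{2n}\lVert y - X\beta \rVert_2^2
    + \lambda \lVert \beta \rVert_1

\hat{\beta}_{\mathrm{EN}}
  = \arg\min_{\beta}\; \frac{1}{2n}\lVert y - X\beta \rVert_2^2
    + \lambda \left( \alpha \lVert \beta \rVert_1
    + \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 \right),
  \qquad 0 \le \alpha \le 1

\mathrm{MSE}
  = \frac{1}{n_{\mathrm{test}}} \sum_{i=1}^{n_{\mathrm{test}}}
    \left( y_i - x_i^{\top}\hat{\beta} \right)^2
```

With α = 1 the Elastic-Net objective reduces to the LASSO objective, and with α = 0 it reduces to a ridge penalty.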

Relative Error Prediction via Penalized Regression (벌점회귀를 통한 상대오차 예측방법)

  • Jeong, Seok-Oh;Lee, Seo-Eun;Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.28 no.6 / pp.1103-1111 / 2015
  • This paper presents a new prediction method based on relative error combined with penalized regression. The proposed method consists of fully data-driven procedures that are fast, simple, and easy to implement. A real data analysis example and some simulation results are given to show that the proposed approach works in practice.

Two-Stage Penalized Composite Quantile Regression with Grouped Variables

  • Bang, Sungwan;Jhun, Myoungshic
    • Communications for Statistical Applications and Methods / v.20 no.4 / pp.259-270 / 2013
  • This paper considers a penalized composite quantile regression (CQR) that performs a variable selection in the linear model with grouped variables. An adaptive sup-norm penalized CQR (ASCQR) is proposed to select variables in a grouped manner; in addition, the consistency and oracle property of the resulting estimator are also derived under some regularity conditions. To improve the efficiency of estimation and variable selection, this paper suggests the two-stage penalized CQR (TSCQR), which uses the ASCQR to select relevant groups in the first stage and the adaptive lasso penalized CQR to select important variables in the second stage. Simulation studies are conducted to illustrate the finite sample performance of the proposed methods.

Factors contributing to the Increase of ADHD in Korea (한국 사회의 ADHD 증가 요인 분석)

  • Soo-Kyeong Kim;Hyon Hee Kim
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.456-457 / 2023
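  • As the number of patients with ADHD (attention deficit hyperactivity disorder) increases, attention and concentration are emerging as a social issue. However, research on the understanding of ADHD and its contributing factors remains insufficient. Building on research indicating that general anesthesia in childhood affects the occurrence of ADHD, this study explored the factors behind the increase in ADHD using correlation analysis, linear regression, Lasso Regression, Support Vector Regression, Deep Neural Network, Ensemble, and Random Forest Regression. The results suggest that children who are more likely to be exposed to general anesthesia may also be more likely to develop ADHD.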

Multiple Group Testing Procedures for Analysis of High-Dimensional Genomic Data

  • Ko, Hyoseok;Kim, Kipoong;Sun, Hokeun
    • Genomics & Informatics / v.14 no.4 / pp.187-195 / 2016
  • In genetic association studies with high-dimensional genomic data, multiple group testing procedures are often required to identify disease- or trait-related genes or genetic regions, where multiple genetic sites or variants are located within the same gene or genetic region. However, statistical testing procedures based on individual tests suffer from multiple testing issues such as control of the family-wise error rate and dependent tests. Moreover, the main interest in genetic association studies is detecting only a few genes associated with a phenotype outcome among tens of thousands of genes. For this reason, regularization procedures, in which a phenotype outcome is regressed on all genomic markers and the regression coefficients are estimated from a penalized likelihood, have been considered a good alternative approach to the analysis of high-dimensional genomic data. However, the selection performance of regularization procedures has rarely been compared with that of statistical group testing procedures. In this article, we performed extensive simulation studies in which commonly used group testing procedures such as principal component analysis, Hotelling's $T^2$ test, and the permutation test are compared with group lasso (least absolute shrinkage and selection operator) in terms of true positive selection. We also applied all methods considered in the simulation studies to identify genes associated with ovarian cancer from over 20,000 genetic sites generated from the Illumina Infinium HumanMethylation27K Beadchip. We found a large discrepancy in the selected genes between the multiple group testing procedures and group lasso.
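Group lasso selects or discards whole groups of coefficients, which is why it suits gene-level selection. A minimal sketch of the group-wise shrinkage step (the proximal operator of the L2-norm penalty) is shown below; the coefficient values, the threshold, and the function name `group_soft_threshold` are hypothetical and not taken from the paper.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrinks the whole group, or zeroes it."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

# Hypothetical coefficients for two genes (groups) of genetic sites.
weak_gene = np.array([0.1, -0.2, 0.1])
strong_gene = np.array([3.0, 4.0])

print(group_soft_threshold(weak_gene, 1.0))    # whole group removed
print(group_soft_threshold(strong_gene, 1.0))  # group kept, shrunk toward zero
```

A group whose joint norm falls below the threshold is dropped entirely, so all sites of a gene enter or leave the model together.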

Improvement of inspection system for common crossings by track side monitoring and prognostics

  • Sysyn, Mykola;Nabochenko, Olga;Kovalchuk, Vitalii;Gruen, Dimitri;Pentsak, Andriy
    • Structural Monitoring and Maintenance / v.6 no.3 / pp.219-235 / 2019
  • Scheduled inspections of common crossings are one of the main cost drivers of railway maintenance. The prognostics and health management (PHM) approach and modern monitoring means offer many possibilities for optimizing inspections and maintenance. The present paper deals with data-driven prognosis of the common crossing remaining useful life (RUL) based on an inertial monitoring system. The problem of the scheduled inspection system for common crossings is outlined and analysed. The proposed analysis of inertial signals with the maximal overlap discrete wavelet packet transform (MODWPT) and Shannon entropy (SE) estimates enables extraction of the spectral features. The relevant features for the acceleration components are selected by applying Lasso (least absolute shrinkage and selection operator) regularization. The features are fused with time-domain information about the longitudinal position of wheel impact and train velocities by multivariate regression. The fused structural health (SH) indicator has a significant correlation with the lifetime of the crossing. The RUL prognosis is performed with a linear degradation stochastic model with recursive Bayesian update. Prognosis testing metrics show promising results for improving common crossing inspection scheduling.
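One way to turn band-wise signal energies into a Shannon entropy feature is sketched below. It assumes the entropy is computed over the normalized energy distribution across frequency bands; the paper's exact definition on MODWPT coefficients may differ, and the band energies and the function name `shannon_entropy` are hypothetical.

```python
import numpy as np

def shannon_entropy(band_energies):
    """Shannon entropy (bits) of the normalized energy distribution across bands."""
    e = np.asarray(band_energies, dtype=float)
    p = e / e.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(np.sum(p * np.log2(1.0 / p)))

print(shannon_entropy([1, 1, 1, 1]))  # energy spread evenly over 4 bands
print(shannon_entropy([4, 0, 0, 0]))  # energy concentrated in one band
```

Evenly spread energy gives the maximum entropy for the number of bands, while energy concentrated in a single band gives zero, so the value summarizes how broadband an acceleration signal is.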