• Title/Summary/Keyword: LASSO regression

Big Data Study about the Effects of Weather Factors on Food Poisoning Incidence (기상요인과 식중독 발병의 연관성에 대한 빅 데이터 분석)

  • Park, Ji-Ae;Kim, Jang-Mook;Lee, Ho-Sung;Lee, He-Jin
    • Journal of Digital Convergence
    • /
    • v.14 no.3
    • /
    • pp.319-327
    • /
    • 2016
  • This study combines big data on weather variation and health care from January 1, 2011 to December 31, 2014 to examine how weather factors influence the incidence of food poisoning, with the aim of supporting national health prevention. Logistic and Lasso logistic regression analyses were performed in R. The main causative germs of food poisoning were classified, and incidence was examined separately for bacterial and viral germs. The logistic regression results showed that the incidence of bacterial food poisoning was affected by the average temperature, the deviation in the amount of sunshine, and the deviation of temperature. The weather factors affecting the incidence of viral food poisoning were the minimum vapor pressure, the deviation in the amount of sunshine, and the deviation of temperature. This study confirmed the correlation between meteorological factors and the incidence of food poisoning. It also found that even when the incidence from both causes was influenced by the same weather factor, the effect could run in opposite directions depending on the characteristics of the germs.

Youtube Mukbang and Online Delivery Orders: Analysis of Impacts and Predictive Model (유튜브 먹방과 온라인 배달 주문: 영향력 분석과 예측 모형)

  • Choi, Sarah;Lee, Sang-Yong Tom
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.4
    • /
    • pp.119-133
    • /
    • 2022
  • One of the most important current features of the food-related industry is the growth of food delivery services. Another notable food-related trend, which arrived with YouTube, is the popularity of Mukbang, which refers to content that records eating. Against this background, this study focused on two things. First, we examined the impact of YouTube Mukbang and the sentiments of Mukbang comments on the number of related food deliveries. Next, we built a predictive model of chicken delivery orders using machine learning methods. We used YouTube Mukbang comment data and weather-related data as the main independent variables. The dependent variable is the number of delivery orders of fried chicken. The data cover the period from June 3, 2015 to September 30, 2019, and a total of 1,580 observations were used. For the predictive modeling, we used machine learning methods such as linear regression, ridge, lasso, random forest, and gradient boosting. We found that YouTube Mukbang and the sentiment of its comments have an impact on the number of delivery orders. The prediction model built with Mukbang data performed better than the existing models without Mukbang data. We also suggest managerial implications for the food delivery service industry.

Non-Contrast Cine Cardiac Magnetic Resonance Derived-Radiomics for the Prediction of Left Ventricular Adverse Remodeling in Patients With ST-Segment Elevation Myocardial Infarction

  • Xin A;Mingliang Liu;Tong Chen;Feng Chen;Geng Qian;Ying Zhang;Yundai Chen
    • Korean Journal of Radiology
    • /
    • v.24 no.9
    • /
    • pp.827-837
    • /
    • 2023
  • Objective: To investigate the predictive value of radiomics features based on cardiac magnetic resonance (CMR) cine images for left ventricular adverse remodeling (LVAR) after acute ST-segment elevation myocardial infarction (STEMI). Materials and Methods: We conducted a retrospective, single-center, cohort study involving 244 patients (randomly split into 170 and 74 for training and testing, respectively) having an acute STEMI (88.5% males, 57.0 ± 10.3 years of age) who underwent CMR examination at one week and six months after percutaneous coronary intervention. LVAR was defined as a 20% increase in left ventricular end-diastolic volume 6 months after acute STEMI. Radiomics features were extracted from the one-week CMR cine images using least absolute shrinkage and selection operator (LASSO) regression analysis. The predictive performance of the selected features was evaluated using receiver operating characteristic curve analysis and the area under the curve (AUC). Results: Nine radiomics features with non-zero coefficients were included in the LASSO regression of the radiomics score (RAD score). Infarct size (odds ratio [OR]: 1.04 (1.00-1.07); P = 0.031) and RAD score (OR: 3.43 (2.34-5.28); P < 0.001) were independent predictors of LVAR. The RAD score predicted LVAR, with an AUC (95% confidence interval [CI]) of 0.82 (0.75-0.89) in the training set and 0.75 (0.62-0.89) in the testing set. Combining the RAD score with infarct size yielded favorable performance in predicting LVAR, with an AUC of 0.84 (0.72-0.95). Moreover, the addition of the RAD score to the left ventricular ejection fraction (LVEF) significantly increased the AUC from 0.68 (0.52-0.84) to 0.82 (0.70-0.93) (P = 0.018), which was also comparable to the prediction provided by the combined microvascular obstruction, infarct size, and LVEF with an AUC of 0.79 (0.65-0.94) (P = 0.727). Conclusion: Radiomics analysis using non-contrast cine CMR can predict LVAR after STEMI independently and incrementally to LVEF and may provide an alternative to traditional CMR parameters.
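
    The abstract above reports a radiomics score built from the features that keep non-zero LASSO coefficients. One common way to write such a model is sketched here, assuming an L1-penalised logistic regression for a binary LVAR outcome. The abstract does not state the exact formulation, so the symbols $x_{ij}$ (feature $j$ for patient $i$), $y_i \in \{0,1\}$, and $\lambda$ (penalty strength) are illustrative assumptions.

    ```latex
    \begin{aligned}
    (\hat\beta_0, \hat\beta)
      &= \arg\min_{\beta_0,\,\beta}\;
         -\frac{1}{n}\sum_{i=1}^{n}
         \Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
               - \log\!\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
         + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert, \\[4pt]
    \text{RAD score}_i
      &= \hat\beta_0 + \sum_{j\,:\,\hat\beta_j \neq 0} \hat\beta_j\, x_{ij}.
    \end{aligned}
    ```

    Under this reading, a larger $\lambda$ drives more coefficients to exactly zero, which is how the penalty leaves a small feature subset such as the nine features reported.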

Modelling the deflection of reinforced concrete beams using the improved artificial neural network by imperialist competitive optimization

  • Li, Ning;Asteris, Panagiotis G.;Tran, Trung-Tin;Pradhan, Biswajeet;Nguyen, Hoang
    • Steel and Composite Structures
    • /
    • v.42 no.6
    • /
    • pp.733-745
    • /
    • 2022
  • This study proposed a robust artificial intelligence (AI) model based on the social behaviour of the imperialist competitive algorithm (ICA) and an artificial neural network (ANN) for modelling the deflection of reinforced concrete beams, abbreviated as the ICA-ANN model. The ICA was used to adjust and optimize the parameters of an ANN model (i.e., weights and biases), aiming to improve the accuracy of the ANN model in modelling the deflection of reinforced concrete beams. A total of 120 experimental datasets of reinforced concrete beams were employed for this purpose. Applied load, tensile reinforcement strength, and the reinforcement percentage were used to simulate the deflection of reinforced concrete beams. In addition, five other AI models, namely ANN, SVM (support vector machine), GLMNET (lasso and elastic-net regularized generalized linear models), CART (classification and regression tree), and KNN (k-nearest neighbours), were used for a comprehensive assessment of the proposed model (i.e., ICA-ANN). Comparison of the derived results with the experimental findings demonstrates that, among the developed models, the ICA-ANN model approximates the deflection of reinforced concrete beams in the most reliable and robust manner.

A Pre-processing Process Using TadGAN-based Time-series Anomaly Detection (TadGAN 기반 시계열 이상 탐지를 활용한 전처리 프로세스 연구)

  • Lee, Seung Hoon;Kim, Yong Soo
    • Journal of Korean Society for Quality Management
    • /
    • v.50 no.3
    • /
    • pp.459-471
    • /
    • 2022
  • Purpose: The purpose of this study was to increase prediction accuracy for an anomaly interval identified using an artificial intelligence-based time series anomaly detection technique by establishing a pre-processing process. Methods: Significant variables were extracted by applying feature selection techniques, and anomalies were derived using the TadGAN time series anomaly detection algorithm. After applying machine learning and deep learning methodologies using normal section data (excluding anomaly sections), the explanatory power of the anomaly sections was demonstrated through performance comparison. Results: Among the machine learning methodologies, performance was best when SHAP and TadGAN were applied; among the deep learning methodologies, performance was excellent when the chi-square test and TadGAN were applied. Compared with papers that applied a conventional methodology to the same data, performance improved significantly: MLR 15%, Random Forest 24%, XGBoost 30%, Lasso Regression 73%, LSTM 17%, and GRU 19%. Conclusion: When applied to unsupervised anomaly detection on unlabeled data in fields such as cyber security, finance, behavior pattern analysis, and SNS, the proposed process is expected to demonstrate the accuracy and explanatory power of the detected anomaly sections and to improve model performance.

Quality Prediction Model for Manufacturing Process of Free-Machining 303-series Stainless Steel Small Rolling Wire Rods (쾌삭 303계 스테인리스강 소형 압연 선재 제조 공정의 생산품질 예측 모형)

  • Seo, Seokjun;Kim, Heungseob
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.44 no.4
    • /
    • pp.12-22
    • /
    • 2021
  • This article proposes a machine learning model, i.e., a classifier, for predicting the production quality of free-machining 303-series stainless steel (STS303) small rolling wire rods according to the operating conditions of the manufacturing process. To develop the classifier, manufacturing data for 37 operating variables were collected from the manufacturing execution system (MES) of Company S, and 12 types of derived variables were generated based on a literature review and interviews with field experts. The research comprised data preprocessing, exploratory data analysis, feature selection, machine learning modeling, and the evaluation of alternative models. In the preprocessing stage, missing values and outliers are removed, and oversampling with SMOTE (synthetic minority oversampling technique) is applied to resolve data imbalance. Features are selected by the variable importance of LASSO (least absolute shrinkage and selection operator) regression, extreme gradient boosting (XGBoost), and random forest models. Finally, logistic regression, support vector machine (SVM), random forest, and XGBoost are developed as classifiers to predict whether products will be adequate or defective under new operating conditions. The optimal hyper-parameters for each model are investigated by grid search and random search based on k-fold cross-validation. In the experiment, XGBoost showed relatively high predictive performance compared to the other models, with an accuracy of 0.9929, specificity of 0.9372, F1-score of 0.9963, and logarithmic loss of 0.0209. The classifier developed in this study is expected to improve productivity by enabling effective management of the manufacturing process for STS303 small rolling wire rods.
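
    The abstract above reports accuracy, specificity, F1-score, and logarithmic loss. As a minimal sketch of how these four metrics relate to a classifier's predicted probabilities, the following uses a tiny hand-made example. The labels, probabilities, 0.5 threshold, and the choice of 1 as the positive class are illustrative assumptions, not data or settings from the study.

    ```python
    import math

    def confusion_counts(y_true, y_pred):
        """Return (tp, fp, tn, fn), treating label 1 as the positive class."""
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        return tp, fp, tn, fn

    def accuracy(tp, fp, tn, fn):
        return (tp + tn) / (tp + fp + tn + fn)

    def specificity(tp, fp, tn, fn):
        # Share of true negatives among all actual negatives.
        return tn / (tn + fp)

    def f1_score(tp, fp, tn, fn):
        # Harmonic mean of precision and recall, written in count form.
        return 2 * tp / (2 * tp + fp + fn)

    def log_loss(y_true, y_prob, eps=1e-15):
        """Mean negative log-likelihood of the true labels."""
        total = 0.0
        for t, p in zip(y_true, y_prob):
            p = min(max(p, eps), 1 - eps)  # avoid log(0)
            total += t * math.log(p) + (1 - t) * math.log(1 - p)
        return -total / len(y_true)

    # Hypothetical labels (1 = positive class) and predicted probabilities.
    y_true = [1, 1, 1, 0]
    y_prob = [0.9, 0.8, 0.4, 0.2]
    y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

    counts = confusion_counts(y_true, y_pred)
    print(accuracy(*counts), specificity(*counts), f1_score(*counts))
    print(log_loss(y_true, y_prob))
    ```

    On imbalanced data, accuracy and F1 can stay high while specificity lags, which is one reason to report several metrics together as the abstract does.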

A Study on the Prediction Models of Used Car Prices Using Ensemble Model And SHAP Value: Focus on Feature of the Vehicle Type (앙상블 모델과 SHAP Value를 활용한 국내 중고차 가격 예측 모델에 관한 연구: 차종 특성을 중심으로)

  • Seungjun Yim;Joungho Lee;Choonho Ryu
    • Journal of Service Research and Studies
    • /
    • v.14 no.1
    • /
    • pp.27-43
    • /
    • 2024
  • The market share of online platform services in the used car market continues to expand. Used car online platform services provide users with vehicle specifications, accident history, inspection details, detailed options, and prices of used cars. The SUV share of the domestic automobile market was more than 50% in 2023, and sales of hybrid vehicles doubled compared to the previous year. These vehicle types are also gaining popularity in the used car market. Prior research has proposed used car price prediction models by running machine learning models on all vehicles or on vehicles by brand. However, although the popularity of SUVs and hybrid vehicles in the domestic market continues to rise, it was difficult to find a study that proposed a used car price prediction model for these vehicle types. This study selects a used car price prediction model by vehicle type using vehicle specifications and options for sedans, SUVs, and hybrid vehicles produced by domestic brands. After selecting features with the Lasso regression model, a feature selection method, ensemble models were run sequentially on the same samples, and the best model for each vehicle type was selected. As a result, the CBR model was selected as the best model for all vehicle types, and the contribution and direction of the features were confirmed by visualizing Tree SHAP values for the best model of each vehicle type. This study is expected to offer sales officials who use online platform services a used car price prediction model by vehicle type, confirm the contribution and direction of features, and help solve problems caused by information asymmetry between them.

Investigation on the Key Parameters for the Strengthening Behavior of Biopolymer-based Soil Treatment (BPST) Technology (바이오폴리머-흙 처리(BPST) 기술의 강도 발현 거동에 대한 주요 영향인자 분석에 관한 연구)

  • Lee, Hae-Jin;Cho, Gye-Chum;Chang, Ilhan
    • Land and Housing Review
    • /
    • v.12 no.3
    • /
    • pp.109-119
    • /
    • 2021
  • Global warming caused by greenhouse gas emissions has rapidly increased the size and frequency of abnormal climate events and, accordingly, of geotechnical engineering hazards. Biopolymer-based soil treatment (BPST) has been implemented in geotechnical engineering in recent years as an alternative that reduces the carbon footprint. Furthermore, thermo-gelating biopolymers, including agar gum, gellan gum, and xanthan gum, are known to strengthen soils noticeably. However, a detailed evaluation of the correlation between the factors that significantly influence the strengthening behavior of BPST has not yet been carried out. In this study, machine learning regression analysis was performed using UCS (unconfined compressive strength) data for BPST tested in the laboratory to evaluate the factors influencing the strengthening behavior of gellan gum-treated soil mixtures. General linear regression, Ridge, and Lasso were used as linear regression methods; the key factors influencing the behavior of BPST were determined from RMSE (root mean squared error) and regression coefficient values. The results showed that the concentration of biopolymer and the content of clay have the most significant influence on the strength of BPST.
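
    The abstract above names Lasso among its linear regression methods and uses coefficient values and RMSE to identify key factors. As a rough, self-contained sketch of how an L1 penalty can push a weak predictor's coefficient to exactly zero, the following implements Lasso by coordinate descent in plain Python. The data are synthetic (not the study's UCS measurements), the three feature columns are hypothetical, and the penalty `alpha=0.5` is an arbitrary illustrative choice.

    ```python
    import math
    import random

    def soft_threshold(z, gamma):
        """Shrink z toward zero by gamma; values in [-gamma, gamma] become 0."""
        if z > gamma:
            return z - gamma
        if z < -gamma:
            return z + gamma
        return 0.0

    def lasso_coordinate_descent(X, y, alpha, n_iter=200):
        """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept).

        X is a list of rows; columns are assumed roughly centred and scaled.
        """
        n, p = len(X), len(X[0])
        w = [0.0] * p
        residual = list(y)  # equals y - Xw while w is all zeros
        col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
        for _ in range(n_iter):
            for j in range(p):
                if col_sq[j] == 0.0:
                    continue
                # Covariance of column j with the residual that excludes w[j].
                rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j])
                          for i in range(n)) / n
                new_wj = soft_threshold(rho, alpha) / col_sq[j]
                delta = new_wj - w[j]
                if delta != 0.0:
                    for i in range(n):
                        residual[i] -= X[i][j] * delta
                    w[j] = new_wj
        return w

    def rmse(X, y, w):
        """Root mean squared error of the linear prediction Xw."""
        errors = [y[i] - sum(xj * wj for xj, wj in zip(X[i], w))
                  for i in range(len(y))]
        return math.sqrt(sum(e * e for e in errors) / len(errors))

    # Synthetic data: two informative columns and one pure-noise column.
    rng = random.Random(0)
    X, y = [], []
    for _ in range(200):
        row = [rng.gauss(0, 1) for _ in range(3)]
        X.append(row)
        y.append(3.0 * row[0] + 1.5 * row[1] + rng.gauss(0, 0.1))

    w_ols = lasso_coordinate_descent(X, y, alpha=0.0)    # no penalty
    w_lasso = lasso_coordinate_descent(X, y, alpha=0.5)  # L1 penalty

    print("unpenalised:", w_ols, "RMSE:", rmse(X, y, w_ols))
    print("lasso:      ", w_lasso, "RMSE:", rmse(X, y, w_lasso))
    ```

    With the penalty switched on, the noise column's coefficient is set to zero while the two informative coefficients shrink but keep their ordering. The cost is a higher training RMSE, which is the trade-off between sparsity and fit that such comparisons weigh.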

A Study on the Prediction Models of Used Car Prices for Domestic Brands Using Machine Learning (머신러닝을 활용한 브랜드별 국내 중고차 가격 예측 모델에 관한 연구)

  • Seungjun Yim;Joungho Lee;Choonho Ryu
    • Journal of Service Research and Studies
    • /
    • v.13 no.3
    • /
    • pp.105-126
    • /
    • 2023
  • The domestic used car market continues to grow along with used car online platform services. These services disclose vehicle specifications, accident history, inspection history, and detailed options to consumers. Most preceding studies predicted used car prices using vehicle specifications and some vehicle options. Those studies confirmed a nonlinear relationship between used car prices and some specification variables, and researchers accordingly tried to address the nonlinearity by running machine learning models. In general, regression-based machine learning models had the advantage of revealing the actual influence and direction of variables, but the disadvantage of weaker cost-function results than decision tree-based machine learning models. This study attempted to predict used car prices for six domestic brands by utilizing both vehicle specifications and vehicle options, and in doing so tried to combine the advantages of the two types of machine learning models. To this end, we sequentially ran a regression-based machine learning model and a decision tree-based machine learning model. As a result of the analysis, the practical influence and direction of the variables for each brand were identified, and the best tree-based machine learning model was selected. The implications of this study are as follows. It will help buyers and sellers who use used car online platform services to predict approximate used car prices. It is also hoped that it will help solve the problem caused by information inequality among users of these services.

Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models

  • Oh Beom Kwon;Solji Han;Hwa Young Lee;Hye Seon Kang;Sung Kyoung Kim;Ju Sang Kim;Chan Kwon Park;Sang Haak Lee;Seung Joon Kim;Jin Woo Kim;Chang Dong Yeo
    • Tuberculosis and Respiratory Diseases
    • /
    • v.86 no.3
    • /
    • pp.203-215
    • /
    • 2023
  • Background: Surgical resection is the standard treatment for early-stage lung cancer. Since postoperative lung function is related to mortality, predicted postoperative lung function is used to determine the treatment modality. The aim of this study was to evaluate the predictive performance of linear regression and machine learning models. Methods: We extracted data from the Clinical Data Warehouse and developed three sets: set I, the linear regression model; set II, machine learning models omitting the missing data; and set III, machine learning models imputing the missing data. Six machine learning models, the least absolute shrinkage and selection operator (LASSO), Ridge regression, ElasticNet, Random Forest, eXtreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM), were implemented. The forced expiratory volume in 1 second measured 6 months after surgery was defined as the outcome. Five-fold cross-validation was performed for hyperparameter tuning of the machine learning models. The dataset was split into training and test datasets at a 70:30 ratio. Imputation was done after dataset splitting in set III. Predictive performance was evaluated by R² and mean squared error (MSE) in the three sets. Results: A total of 1,487 patients were included in sets I and III and 896 patients were included in set II. In set I, the R² value was 0.27, and in set II, LightGBM was the best model with the highest R² value of 0.5 and the lowest MSE of 154.95. In set III, LightGBM was the best model with the highest R² value of 0.56 and the lowest MSE of 174.07. Conclusion: The LightGBM model showed the best performance in predicting postoperative lung function.
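
    The abstract above describes a 70:30 train/test split, handling of missing data, and evaluation by R² and MSE. The following is a minimal sketch of those pieces in plain Python. The helper names, the mean-imputation strategy, and the toy numbers are illustrative assumptions; the abstract does not say which imputation method or software was used.

    ```python
    import random

    def train_test_split(rows, test_fraction=0.3, seed=0):
        """Shuffle row indices and split them into (train, test) lists."""
        rng = random.Random(seed)
        idx = list(range(len(rows)))
        rng.shuffle(idx)
        n_test = round(len(rows) * test_fraction)
        test = [rows[i] for i in idx[:n_test]]
        train = [rows[i] for i in idx[n_test:]]
        return train, test

    def impute_with_train_mean(train_vals, test_vals):
        """Fill None entries using the mean of observed TRAINING values only.

        Computing the fill value after the split keeps test-set information
        out of the training step.
        """
        observed = [v for v in train_vals if v is not None]
        fill = sum(observed) / len(observed)
        return ([fill if v is None else v for v in train_vals],
                [fill if v is None else v for v in test_vals])

    def mse(y_true, y_pred):
        """Mean squared error."""
        return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

    def r2(y_true, y_pred):
        """Coefficient of determination: 1 - SS_res / SS_tot."""
        mean_y = sum(y_true) / len(y_true)
        ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
        ss_tot = sum((t - mean_y) ** 2 for t in y_true)
        return 1.0 - ss_res / ss_tot

    # Toy usage with made-up numbers (not patient data).
    train_rows, test_rows = train_test_split(list(range(10)))
    train_x, test_x = impute_with_train_mean([1.0, None, 3.0], [None, 5.0])
    print(len(train_rows), len(test_rows))  # sizes of the two partitions
    print(train_x, test_x)
    print(mse([1, 2, 3], [1, 2, 4]), r2([1, 2, 3], [2, 2, 2]))
    ```

    A model that always predicts the mean of the outcome scores R² = 0, so values such as 0.27 or 0.56 can be read as the share of outcome variance explained beyond that baseline.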