• Title/Summary/Keyword: multiple linear regression models

Development of Medical Cost Prediction Model Based on the Machine Learning Algorithm (머신러닝 알고리즘 기반의 의료비 예측 모델 개발)

  • Han Bi KIM;Dong Hoon HAN
    • Journal of Korea Artificial Intelligence Association / v.1 no.1 / pp.11-16 / 2023
  • Accurate hospital case modeling and prediction are crucial for efficient healthcare. In this study, we demonstrate the implementation of regression analysis methods in machine learning systems utilizing mathematical statistics and machine learning techniques. The developed machine learning model includes Bayesian linear, artificial neural network, decision tree, decision forest, and linear regression analysis models. Through the application of these algorithms, corresponding regression models were constructed and analyzed. The results suggest the potential of leveraging machine learning systems for medical research. The experiment aimed to create an Azure Machine Learning Studio tool for the speedy evaluation of multiple regression models. The tool facilitates the comparison of 5 types of regression models in a unified experiment and presents assessment results with performance metrics. Evaluation of the regression machine learning models highlighted the advantages of boosted decision tree regression and decision forest regression in hospital case prediction. These findings could lay the groundwork for the deliberate development of new directions in medical data processing and decision making. Furthermore, potential avenues for future research may include exploring methods such as clustering, classification, and anomaly detection in healthcare systems.

Safety Performance Models of Improvement Projects of Frequent Traffic Accident Locations (사고잦은곳 개선사업의 안전성과 모형)

  • Park, Byung-Ho;Park, Gil-Su;Kim, Tae-Young
    • Journal of the Korean Society of Safety / v.25 no.2 / pp.89-94 / 2010
  • This study deals with traffic accidents in relation to improvement projects at frequent accident locations. The objective is to analyze the impact of the improvements on accident reduction. To this end, the study gives particular attention to developing models based on data from 70 improved intersections. The main results are as follows. First, 4 statistically significant multiple linear regression accident models (total, side right-angle, rear end and side stripe accident) were developed. Second, the analysis showed reductions in total accidents from sight-distance and turning traffic flow improvements, in side right-angle accidents from sight-distance, over-speed and lane operation improvements, in rear end accidents from turning traffic flow, signal and lane operation improvements, and in side stripe accidents from traffic impedance improvements. Finally, the above 4 models were evaluated to be statistically significant through correlation analysis and a paired-sample t-test.

Traffic Accident Density Models Reflecting the Characteristics of the Traffic Analysis Zone in Cheongju (존별 특성을 반영한 교통사고밀도 모형 - 청주시 사례를 중심으로 -)

  • Kim, Kyeong Yong;Beck, Tea Hun;Lim, Jin Kang;Park, Byung Ho
    • International Journal of Highway Engineering / v.17 no.6 / pp.75-83 / 2015
  • PURPOSES : This study deals with traffic accidents classified by traffic analysis zone. The purpose is to develop accident density models using zonal traffic and socioeconomic data. METHODS : The traffic accident density models are developed through multiple linear regression analysis. In this study, three multiple linear models were developed. The dependent variable was traffic accident density, which is a measure of the relative distribution of traffic accidents. The independent variables were various traffic and socioeconomic variables. CONCLUSIONS : Three traffic accident density models were developed, and all models were statistically significant. In the transportation-based model, road length, trip production volume, intersections, van ratio, and number of vehicles per person were found to be positively related to accidents. In the socioeconomic-based model, residential and commercial area ratio and transportation vulnerability ratio were found to affect accidents. In the integrated model, the major arterial road ratio, trip production volume, intersections, van ratio, commercial ratio, and number of companies were also found to be related to accidents.

Particle size distributions and concentrations above radiators in indoor environments: Exploratory results from Xi'an, China

  • Chen, Xi;Li, Angui
    • Environmental Engineering Research / v.20 no.3 / pp.237-245 / 2015
  • Particulate matter in indoor environments has caused public concern in recent years. The objective of this research is to explore the influence of radiators on particle size distributions and concentrations. The particle size distributions and concentrations above radiators and in the adjacent indoor air are monitored in forty-two indoor environments in Xi'an, China. The temperatures, relative humidities and air velocities are also measured. The particle size distributions above radiators at ten locations are analyzed. The results show that the functional difference of indoor environments has little impact on the particle size distributions above radiators. The effects of the environmental parameters (particle concentrations in the adjacent indoor air, temperatures, relative humidities and air velocities) on particle concentrations above radiators are then assessed by applying multiple linear regression analysis. Three multiple linear regression models are established to predict the concentrations of $PM_{10}$, $PM_{2.5}$ and $PM_1$ above radiators.

Development of the Algorithm for Optimizing Wavelength Selection in Multiple Linear Regression

  • Hoeil Chung
    • Near Infrared Analysis / v.1 no.1 / pp.1-7 / 2000
  • A convenient algorithm for optimizing wavelength selection in multiple linear regression (MLR) has been developed. MOP (MLR Optimization Program) has been developed to test all possible MLR calibration models in a given spectral range and find an optimal MLR model with external validation capability. MOP generates calibration models from all possible combinations of wavelengths, and simultaneously calculates SEC (Standard Error of Calibration) and SEV (Standard Error of Validation) by predicting samples in a validation data set. Finally, with SEC and SEV determined, it calculates another parameter called SAD, the sum of SEC, SEV, and the absolute difference between SEC and SEV: SAD = SEC + SEV + |SEC - SEV|. SAD is a useful parameter for finding an optimal calibration model without over-fitting because it simultaneously evaluates SEC, SEV, and the difference in error between calibration and validation. The calibration model corresponding to the smallest SAD value is chosen as the optimum because its errors in both calibration and validation are minimal as well as similar in scale. To evaluate the capability of MOP, the determination of benzene content in unleaded gasoline has been examined. MOP successfully found the optimal calibration model and showed better calibration and independent prediction performance than conventional MLR calibration.
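
The exhaustive search described in this abstract can be sketched as follows. This is a minimal illustration, not the MOP program itself: the function names, the toy data, and the use of plain root-mean-square error for SEC and SEV (without any degrees-of-freedom correction) are all assumptions made for the example.

```python
from itertools import combinations
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_mlr(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Xd = [[1.0] + list(row) for row in X]
    p = len(Xd[0])
    XtX = [[sum(r[i] * r[j] for r in Xd) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(Xd, y)) for i in range(p)]
    return solve(XtX, Xty)

def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]

def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def take_columns(X, idx):
    return [[row[i] for i in idx] for row in X]

def sad(sec, sev):
    """SAD = SEC + SEV + |SEC - SEV|, as defined in the abstract."""
    return sec + sev + abs(sec - sev)

def select_wavelengths(X_cal, y_cal, X_val, y_val, k):
    """Try every k-wavelength subset; return (smallest SAD, chosen column indices)."""
    best = None
    for idx in combinations(range(len(X_cal[0])), k):
        coef = fit_mlr(take_columns(X_cal, idx), y_cal)
        sec = rmse(y_cal, predict(coef, take_columns(X_cal, idx)))  # SEC taken as RMSE (assumption)
        sev = rmse(y_val, predict(coef, take_columns(X_val, idx)))  # SEV taken as RMSE (assumption)
        score = sad(sec, sev)
        if best is None or score < best[0]:
            best = (score, idx)
    return best

# Made-up absorbances at 4 wavelengths; the response depends only on columns 0 and 2.
X_cal = [[1.0, 0.5, 2.0, 0.3], [2.0, 0.1, 1.0, 0.9], [3.0, 0.7, 4.0, 0.2],
         [4.0, 0.3, 3.0, 0.8], [5.0, 0.9, 6.0, 0.1], [6.0, 0.2, 5.0, 0.7]]
X_val = [[1.5, 0.4, 2.5, 0.6], [2.5, 0.8, 1.5, 0.4],
         [3.5, 0.6, 3.5, 0.5], [4.5, 0.2, 5.0, 0.3]]
y_cal = [1 + 2 * r[0] + 3 * r[2] for r in X_cal]
y_val = [1 + 2 * r[0] + 3 * r[2] for r in X_val]

best = select_wavelengths(X_cal, y_cal, X_val, y_val, k=2)
print(best)
```

Because SAD equals twice the larger of SEC and SEV, minimizing it penalizes a model whose validation error is much worse than its calibration error, which is the over-fitting guard the abstract describes. The search cost grows combinatorially with the number of wavelengths, so a sketch like this is only practical for small spectral ranges.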

Development of Statistical Model and Neural Network Model for Tensile Strength Estimation in Laser Material Processing of Aluminum Alloy (알루미늄 합금의 레이저 가공에서 인장 강도 예측을 위한 회귀 모델 및 신경망 모델의 개발)

  • Park, Young-Whan;Rhee, Se-Hun
    • Journal of the Korean Society for Precision Engineering / v.24 no.4 s.193 / pp.93-101 / 2007
  • Aluminum alloy, one of the light materials, has been considered for lightweight vehicle bodies, which makes welding technology very important. In aluminum laser welding, the strength of the welded part is reduced by porosity, underfill, and magnesium loss. To overcome these problems, laser welding of aluminum with filler wire was suggested. In this study, experiments on laser welding of AA5182 aluminum alloy with AA5356 filler wire were performed for varying process parameters such as laser power, welding speed and wire feed rate. The tensile strength was measured to assess the weldability of laser welding with filler wire. Models to estimate tensile strength were proposed using three regression models and one neural network model. The regression models were a multiple linear regression model, a second order polynomial regression model, and a multiple nonlinear regression model. A neural network model with 2 hidden layers of 5 and 3 nodes, respectively, was investigated to find the most suitable model for the system. Estimation performance was evaluated for each model using the average error rate. Among the three regression models, the second order polynomial regression model had the best estimation performance. Among all models, the neural network model had the best estimation performance.
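
As a rough illustration of two ingredients named in this abstract, the sketch below expands three process parameters into second order polynomial terms and computes an average error rate. Both function names are hypothetical, and the error-rate formula (mean absolute percentage error) is an assumption, since the abstract does not define it.

```python
from itertools import combinations_with_replacement

def second_order_features(x):
    """Return the linear terms followed by all squares and pairwise products.

    For inputs (laser power, welding speed, wire feed rate) this gives
    3 linear terms plus 6 second order terms.
    """
    x = list(x)
    return x + [a * b for a, b in combinations_with_replacement(x, 2)]

def average_error_rate(actual, predicted):
    """Mean of |actual - predicted| / |actual|, in percent (assumed definition)."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Illustrative values only, not data from the paper.
features = second_order_features([1, 2, 3])
rate = average_error_rate([100.0, 200.0], [110.0, 190.0])
print(features, rate)
```

A second order polynomial regression is then an ordinary multiple linear regression fitted on the expanded features, which is why it can capture curvature and interactions that the plain linear model cannot.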

Accident Analysis of 3-legged and 4-legged Roundabouts (3지와 4지 회전교차로의 사고분석)

  • Park, Min-Kyu;Park, Byung-Ho
    • Journal of the Korean Society of Safety / v.27 no.3 / pp.161-166 / 2012
  • This study deals with accidents at roundabouts. The objective is to analyze the traffic accidents that occurred at 3-legged and 4-legged roundabouts through the developed models. In developing the multiple linear regression models, this study uses the number of traffic accidents as the dependent variable and variables such as geometric structure, traffic characteristics and others as the independent variables. The correlation and multicollinearity of the variables were analyzed using SPSS 17.0. The main results are as follows. First, the R-squared values of the developed models were 0.851 (3-leg) and 0.689 (4-leg), respectively. Second, the independent variables in the 3-legged roundabout accident model were traffic volume and number of crosswalks, and the variables in the 4-legged roundabout model were traffic volume and signal. Finally, the paired t-test shows that the predicted values and observed values are not statistically different.
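
The paired t-test used here to compare predicted and observed accident counts can be sketched as below. The function name and the sample values are made up for illustration, and the sketch returns only the test statistic and degrees of freedom; obtaining a p-value would require a t-distribution table or a statistics package.

```python
import math
import statistics

def paired_t_statistic(observed, predicted):
    """Return (t, degrees_of_freedom) for a paired t-test on two equal-length samples."""
    if len(observed) != len(predicted):
        raise ValueError("samples must have the same length")
    diffs = [o - p for o, p in zip(observed, predicted)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation; must be nonzero
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Illustrative counts only, not data from the paper.
observed = [3, 5, 2, 4, 6]
predicted = [2.5, 5.5, 2.5, 3.5, 6.0]
t, df = paired_t_statistic(observed, predicted)
print(t, df)
```

A t value close to zero, relative to the critical value for the given degrees of freedom, is what supports a conclusion that predicted and observed values are not statistically different.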

Is it Possible to Predict the ADI of Pesticides using the QSAR Approach?

  • Kim, Jae Hyoun
    • Journal of Environmental Health Sciences / v.38 no.6 / pp.550-560 / 2012
  • Objectives: QSAR methodology was applied to explain two different sets of acceptable daily intake (ADI) data of 74 pesticides proposed by both the USEPA and WHO in terms of setting guidelines for food and drinking water. Methods: A subset of calculated descriptors was selected from Dragon® software. QSARs were then developed utilizing a statistical technique, genetic algorithm-multiple linear regression (GA-MLR). The differences in each specific model in the prediction of the ADI of the pesticides were discussed. Results: The stepwise multiple linear regression analysis resulted in a statistically significant QSAR model with five descriptors. Resultant QSAR models were robust, showing good utility across multiple classes of pesticide compounds. The applicability domain was also defined. The proposed models were robust and satisfactory. Conclusions: The QSAR model could be a feasible and effective tool for predicting ADI and for the comparison of $logADI_{EPA}$ to $logADI_{WHO}$. The statistical results agree with the fact that USEPA focuses on more subtle endpoints than does WHO.

Developing Rear-End Collision Models of Roundabouts in Korea (국내 회전교차로의 추돌사고 모형 개발)

  • Park, Byung Ho;Beak, Tae Hun
    • Journal of the Korean Society of Safety / v.29 no.6 / pp.151-157 / 2014
  • This study deals with rear-end collisions at roundabouts. The purpose of this study is to develop accident models of rear-end collisions in Korea. To this end, the study gives particular attention to developing appropriate models using Poisson, negative binomial, ZAM, multiple linear and nonlinear regression models, and statistical analysis tools. The main results are as follows. First, the Vuong statistics and overdispersion parameters indicate that ZIP is the most appropriate model among the count data models. Second, RMSE, MPB, MAD and correlation coefficient tests show that the multiple nonlinear model is the most suitable for the rear-end collision data. Finally, independent variables such as traffic volume, ratio of heavy vehicles, number of circulatory roadway lanes, number of crosswalks and stop line are adopted in the optimal model.
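
To show why a zero-inflated model can suit accident counts with many zeros, the sketch below compares a plain Poisson distribution with a zero-inflated Poisson. It assumes that ZIP in this abstract has its usual meaning of zero-inflated Poisson; the function names and parameter values are illustrative and are not taken from the paper.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson.

    With probability pi the count is a structural zero; otherwise it
    follows a Poisson distribution with mean lam.
    """
    if k == 0:
        return pi + (1.0 - pi) * math.exp(-lam)
    return (1.0 - pi) * poisson_pmf(k, lam)

# Illustrative parameters: mean 2 collisions, 30% structural zeros.
lam, pi = 2.0, 0.3
for k in range(4):
    print(k, round(poisson_pmf(k, lam), 4), round(zip_pmf(k, lam, pi), 4))
```

The zero-inflated version places extra probability mass on zero and scales down every positive count by the same factor, which is the pattern a Vuong test is typically used to check against a plain Poisson or negative binomial fit.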

Clustering Observations for Detecting Multiple Outliers in Regression Models

  • Seo, Han-Son;Yoon, Min
    • The Korean Journal of Applied Statistics / v.25 no.3 / pp.503-512 / 2012
  • Detecting outliers in a linear regression model eventually fails when similar observations are classified differently in a sequential process. In such circumstances, identifying clusters and applying certain methods to the clustered data can prevent a failure to detect outliers and is computationally efficient because the data are reduced. In this paper, we suggest implementing a clustering procedure for this purpose and provide examples that illustrate the suggested procedure applied to the Hadi-Simonoff (1993) method, the reverse Hadi-Simonoff method, and the Gentleman-Wilk (1975) method.