• Title/Summary/Keyword: logistic model


A comparison of Multilayer Perceptron with Logistic Regression for the Risk Factor Analysis of Type 2 Diabetes Mellitus (제2형 당뇨병의 위험인자 분석을 위한 다층 퍼셉트론과 로지스틱 회귀 모델의 비교)

  • 서혜숙;최진욱;이홍규
    • Journal of Biomedical Engineering Research / v.22 no.4 / pp.369-375 / 2001
  • The statistical regression model is one of the most frequently used methods of clinical analysis. It rests on the basic assumptions of linearity, additivity, and normally distributed data. However, most biological data in the medical field are nonlinear and unevenly distributed. To overcome this discrepancy between the assumptions of the statistical model and actual biological data, we propose a new analytical method based on an artificial neural network. The newly developed multilayer perceptron (MLP) is trained on a data set of 120 cases (60 normal, 60 patients). On test data, it shows a discrimination power of 0.76. Diabetic risk factors were also identified from the MLP model and from the logistic regression model. The significant risk factors identified by the MLP model were postprandial glucose level (PP2), sex (male), fasting blood sugar (FBS) level, age, SBP, AC, and WHR. Those from the regression model were sex (male), PP2, age, and FBS. Combined risk factors can also be identified using the MLP model: total cholesterol and body weight, which is consistent with the results of other clinical studies. From this experiment we learned that the MLP can be applied to combined risk factor analysis of biological data, which the conventional statistical method cannot provide.


Developing a Pedestrian Satisfaction Prediction Model Based on Machine Learning Algorithms (기계학습 알고리즘을 이용한 보행만족도 예측모형 개발)

  • Lee, Jae Seung;Lee, Hyunhee
    • Journal of Korea Planning Association / v.54 no.3 / pp.106-118 / 2019
  • Developing a pedestrian navigation service that provides optimal routes based on pedestrian satisfaction levels requires a prediction model that can estimate a pedestrian's satisfaction level under given conditions. The aim of the present study is therefore to develop a pedestrian satisfaction prediction model based on three machine learning algorithms: Logistic Regression, Random Forest, and Artificial Neural Network. The 2009, 2012, 2013, 2014, and 2015 Pedestrian Satisfaction Survey data from Seoul, Korea are used to train and test the models. The Random Forest model shows the best prediction performance of the three (Accuracy: 0.798, Recall: 0.906, Precision: 0.842, F1 Score: 0.873, AUC: 0.795). The Artificial Neural Network ranks second (Accuracy: 0.773, Recall: 0.917, Precision: 0.811, F1 Score: 0.868, AUC: 0.738), and the Logistic Regression model ranks third (Accuracy: 0.764, Recall: 1.000, Precision: 0.764, F1 Score: 0.868, AUC: 0.575). The precision score of the Random Forest model implies that approximately 84.2% of pedestrians may be satisfied when walking the areas suggested by the Random Forest model.
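The F1 score quoted above is the harmonic mean of precision and recall. As a minimal sketch (the function name `f1_score` is an illustrative choice, not something taken from the paper), the Random Forest figure can be recomputed from the reported precision and recall:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract for the Random Forest model.
rf_f1 = f1_score(precision=0.842, recall=0.906)
print(f"Random Forest F1: {rf_f1:.3f}")
```

Because the inputs are already rounded to three decimals, recomputed values for the other models can differ from the published ones in the last digit.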

Modeling Growth of Canopy Heights and Stem Diameters in Soybeans at Different Groundwater Level (지하 수위가 다른 조건에서 콩의 초장과 경태 모델링)

  • Choi, Jin-Young;Kim, Dong-Hyun;Kwon, Soon-Hong;Choi, Won-Sik;Kim, Jong-Soon
    • Journal of the Korean Society of Industry Convergence / v.20 no.5 / pp.395-404 / 2017
  • Cultivating soybeans in rice paddy fields reduces labor costs and increases yield. Soybeans, however, are highly susceptible to excessive soil water in paddy fields. A controlled drainage system can adjust the groundwater level (GWL) and control soil moisture content, improving the soil environment for optimum crop growth. The objective of this study was to fit soybean growth data (canopy height and stem diameter) with the Gompertz model and the Logistic model at different GWLs and to validate those models. The soybean, Daewon cultivar, was grown in lysimeters with controlled GWL (20 cm and 40 cm). The soil textures were silt loam and sandy loam. Canopy height and stem diameter were measured from the 20th day after seeding until harvest. The Gompertz and Logistic models were fitted to the growth data, and the growth rate and maximum growth value were estimated for each. For canopy height, the $R^2$ and RMSE were 0.99 and 1.58 for the Gompertz model and 0.99 and 1.33 for the Logistic model, respectively. A large discrepancy appeared at the full maturity stage (R8), when plants have shed a substantial amount of leaves. Regardless of soil texture, the maximum growth values at 40 cm GWL were greater than those at 20 cm GWL. The growth rates were larger in silt loam. For stem diameter, the $R^2$ and RMSE were 0.96 and 0.27 for the Gompertz model and 0.96 and 0.26 for the Logistic model, respectively. Unlike canopy height, stem diameter did not decrease significantly in the R8 stage. At both GWLs, the maximum growth values and the growth rates in silt loam were all larger than those in sandy loam. In conclusion, the Gompertz model and the Logistic model both fit the canopy heights and stem diameters of soybeans well. These growth models can provide valuable information for the development of precision water management systems.
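The two goodness-of-fit measures reported above can be sketched as follows. This assumes the common definitions (RMSE as the root of the mean squared residual, $R^2$ as one minus the ratio of residual to total sum of squares); the abstract does not state which variants the authors used, and the sample values are hypothetical, not data from the study.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observations and model predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical canopy heights in cm (illustration only, not from the paper).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 31.0, 39.0]

fit_rmse = rmse(observed, predicted)
fit_r2 = r_squared(observed, predicted)
print(f"RMSE = {fit_rmse:.2f}, R^2 = {fit_r2:.3f}")
```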

Estimation of growth curve in Hanwoo steers using progeny test records

  • Yun, Jae-Woong;Park, Se-Yeong;Park, Hu-Rak;Eum, Seung-Hoon;Roh, Seung-Hee;Seo, Jakyeom;Cho, Seong-Keun;Kim, Byeong-Woo
    • Korean Journal of Agricultural Science / v.43 no.4 / pp.623-633 / 2016
  • A total of 6,973 steer growth records from the progeny test data of Hanwoo breeding bulls, collected from 1989 to 2015, were analyzed to identify the most appropriate growth curve among three growth curve models (Gompertz, Logistic, and von Bertalanffy). The Gompertz growth curve model equation was $W_t=990.5e^{-2.7479e^{-0.00241t}}$, the Logistic growth curve model equation was $W_t=772(1+8.3314e^{-0.00475t})^{-1}$, and the von Bertalanffy growth curve model equation was $W_t=1196.4(1-0.646e^{-0.00162t})^3$. The Gompertz model parameters A, b, and k were estimated to be $990.5 \pm 10.27$, $2.7479 \pm 0.0068$, and $0.00241 \pm 0.000028$, respectively. The inflection point age was estimated to be 421 days, and the weight at the inflection point was 365.3 kg. The Logistic model parameters A, b, and k were estimated to be $772.0 \pm 4.12$, $8.3314 \pm 0.0453$, and $0.00475 \pm 0.000033$, respectively. The inflection point age was estimated to be 445 days, and the weight at the inflection point was 385.0 kg. The von Bertalanffy model parameters A, b, and k were estimated to be $1196.4 \pm 18.39$, $0.646 \pm 0.0010$, and $0.00162 \pm 0.000027$, respectively. The inflection point age was estimated to be 405 days, and the weight at the inflection point was 352.0 kg. The mature body weight was 1196.4 kg for the von Bertalanffy model, 990.5 kg for the Gompertz model, and 772.0 kg for the Logistic model. The difference between actual and estimated weights was similar in the Logistic model and the von Bertalanffy model. The difference between market weight and estimated market weight was the lowest in the Gompertz model. The growth curve using the von Bertalanffy model showed the lowest mean square error.
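The three fitted curves can be written down directly from the equations in the abstract. The sketch below uses the standard closed-form inflection points for these parameterizations (age $\ln(b)/k$ and weight $A/e$ for Gompertz, $\ln(b)/k$ and $A/2$ for Logistic, $\ln(3b)/k$ and $8A/27$ for von Bertalanffy); the function and dictionary names are illustrative, not the authors'. With the rounded parameters from the abstract, the results land within a few days and a few kilograms of the reported values, a gap that presumably comes from rounding in the published estimates.

```python
import math

def gompertz(t, A, b, k):
    """W_t = A * exp(-b * exp(-k t))"""
    return A * math.exp(-b * math.exp(-k * t))

def logistic(t, A, b, k):
    """W_t = A / (1 + b * exp(-k t))"""
    return A / (1.0 + b * math.exp(-k * t))

def von_bertalanffy(t, A, b, k):
    """W_t = A * (1 - b * exp(-k t))^3"""
    return A * (1.0 - b * math.exp(-k * t)) ** 3

CURVES = {
    "gompertz": gompertz,
    "logistic": logistic,
    "von_bertalanffy": von_bertalanffy,
}

# (A, b, k) point estimates as given in the abstract.
PARAMS = {
    "gompertz": (990.5, 2.7479, 0.00241),
    "logistic": (772.0, 8.3314, 0.00475),
    "von_bertalanffy": (1196.4, 0.646, 0.00162),
}

def inflection_point(model, A, b, k):
    """Return (age in days, weight in kg) at the curve's inflection point."""
    if model == "gompertz":
        return math.log(b) / k, A / math.e
    if model == "logistic":
        return math.log(b) / k, A / 2.0
    if model == "von_bertalanffy":
        return math.log(3.0 * b) / k, 8.0 * A / 27.0
    raise ValueError(f"unknown model: {model}")

inflections = {name: inflection_point(name, *p) for name, p in PARAMS.items()}

for name, (age, weight) in inflections.items():
    print(f"{name}: inflection at {age:.0f} days, {weight:.1f} kg")
```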

Estimating small area proportions with kernel logistic regressions models

  • Shim, Jooyong;Hwang, Changha
    • Journal of the Korean Data and Information Science Society / v.25 no.4 / pp.941-949 / 2014
  • The unit-level logistic regression model with mixed effects has been used for estimating small area proportions; it treats the spatial effects as random effects and assumes linearity between the logistic link and the covariates. However, when the functional form of the relationship between the logistic link and the covariates is not linear, it may lead to biased estimators of the small area proportions. In this paper, we relax the linearity assumption and propose two types of kernel-based logistic regression models for estimating small area proportions. We also demonstrate the efficiency of our proposed models using simulated data and real data.

Generalized Partially Linear Additive Models for Credit Scoring

  • Shim, Ju-Hyun;Lee, Young-K.
    • The Korean Journal of Applied Statistics / v.24 no.4 / pp.587-595 / 2011
  • Credit scoring is an objective and automatic system for assessing the credit risk of each customer. The logistic regression model is one of the popular credit scoring methods for predicting default probability; however, despite its advantages of interpretability and low computation cost, it may not detect possible nonlinear features of the predictors. In this paper, we propose using a generalized partially linear model as an alternative to logistic regression. We also introduce modern ensemble techniques such as bagging, boosting, and random forests. We compare these methods via a simulation study and illustrate them on a German credit dataset.

Generalization of Road Network using Logistic Regression

  • Park, Woojin;Huh, Yong
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.37 no.2 / pp.91-97 / 2019
  • In automatic map generalization, the formalization of cartographic principles is important. This study proposes and evaluates a selection method for road network generalization that analyzes existing maps using reverse engineering and formalizes the selection rules for the road network. Existing maps at 1:5,000 and 1:25,000 scales are compared, and the criteria for selecting road network data and the relative importance of each network object are determined and analyzed using Töpfer's Radical Law as well as the logistic regression model. The selection model derived from the analysis is applied to the test data, and road network data for the 1:25,000 scale map are generated from the digital topographic map at 1:5,000 scale. The selected road network is compared with the existing road network data at 1:25,000 scale for a qualitative and quantitative evaluation. The result indicates that more than 80% of road objects are matched to existing data.
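In its basic form, Töpfer's Radical Law estimates how many objects to retain on the derived map as the source count multiplied by the square root of the ratio of the scale denominators. The sketch below assumes that basic form only; the paper may use a variant with additional constants, and the function name and the object count of 1,000 are hypothetical.

```python
import math

def radical_law(n_source: float, source_denominator: float, target_denominator: float) -> float:
    """Basic Radical Law: n_target = n_source * sqrt(M_source / M_target)."""
    return n_source * math.sqrt(source_denominator / target_denominator)

# Hypothetical count of 1,000 road objects on the 1:5,000 map,
# generalized to the 1:25,000 scale discussed in the abstract.
retained = radical_law(1000, 5_000, 25_000)
print(f"Objects to retain: about {retained:.0f}")
```

Under these assumptions, roughly 45% of the source objects would be kept when moving from 1:5,000 to 1:25,000.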

Statistical micro matching using a multinomial logistic regression model for categorical data

  • Kim, Kangmin;Park, Mingue
    • Communications for Statistical Applications and Methods / v.26 no.5 / pp.507-517 / 2019
  • Statistical matching is a method of combining multiple sources of data that are extracted or surveyed from the same population. It can be used in situations where the variables of interest are not jointly observed. It is a low-cost, high-benefit approach because synthetic data can be created from existing sources. In this paper, we propose several statistical micro matching methods using a multinomial logistic regression model for the case in which all variables of interest are categorical or categorized, which is common in sample surveys. Under the conditional independence assumption (CIA), a mixed statistical matching method, which is useful when auxiliary information is not available, is proposed. We also propose a statistical matching method with auxiliary information that reduces the bias of the conventional matching methods suggested under the CIA. The proposed micro matching methods and conventional ones are compared through a simulation study. The simulation study shows that the suggested matching methods outperform the existing ones, especially when the CIA does not hold.

Exploring interaction using 3-D residual plots in logistic regression model (3차원 잔차산점도를 이용한 로지스틱회귀모형에서 교호작용의 탐색)

  • Kahng, Myung-Wook
    • Journal of the Korean Data and Information Science Society / v.25 no.1 / pp.177-185 / 2014
  • Under bivariate normal distribution assumptions, interaction and quadratic terms are needed in the logistic regression model with two predictors. However, depending on the correlation coefficient and the variances of the two conditional distributions, the interaction and quadratic terms may not be necessary. Although the need for these terms can be determined by comparing the two scatter plots, this approach is less useful for interaction terms. We explore the structure and usefulness of the 3-D residual plot as a tool for dealing with interaction in logistic regression models. If the predictors have an interaction effect, a 3-D residual plot can show the effect. This is illustrated with simulated and real data.

Suppression for Logistic Regression Model (로지스틱 회귀모형에서의 SUPPRESSION)

  • Hong C. S.;Kim H. I.;Ham J. H.
    • The Korean Journal of Applied Statistics / v.18 no.3 / pp.701-712 / 2005
  • Suppression in logistic regression models has been discussed less than suppression in linear regression models because, among many other reasons, the sum of squares for regression (SSR) and the coefficient of determination ($R^2$) can be defined in various ways. Based on four kinds of $R^2$ (two that are most preferred and two proposed by Liao & McGee (2003)), four kinds of SSR are derived so that suppression in logistic models can be explained. Data sets fitted to logistic models are generated by the Monte Carlo method. We explore when suppression happens and compare it with suppression in linear regression models.