• Title/Abstract/Keywords: logistic model

1,927 results (search time: 0.026 s)

A comparison of Multilayer Perceptron with Logistic Regression for the Risk Factor Analysis of Type 2 Diabetes Mellitus

  • 서혜숙;최진욱;이홍규
    • 대한의용생체공학회:의공학회지 / Vol. 22, No. 4 / pp. 369-375 / 2001
  • The statistical regression model is one of the most frequently used methods of clinical analysis. It assumes linearity, additivity, and normally distributed data. However, most biological data in the medical field are nonlinear and unevenly distributed. To overcome this discrepancy between the assumptions of statistical models and actual biological data, we propose a new analytical method based on an artificial neural network. The newly developed multilayer perceptron (MLP) was trained on 120 records (60 normal subjects and 60 patients) and achieved a discrimination power of 0.76 on the test data. Diabetic risk factors were identified with both the MLP model and the logistic regression model. The significant risk factors identified by the MLP model were postprandial glucose level (PP2), sex (male), fasting blood sugar (FBS) level, age, SBP, AC, and WHR; those identified by the regression model were sex (male), PP2, age, and FBS. The MLP model can also identify combined risk factors, here total cholesterol and body weight, which is consistent with the results of other clinical studies. This experiment shows that an MLP can be applied to combined risk factor analysis of biological data, which conventional statistical methods cannot provide.

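As a minimal sketch of how a logistic regression model expresses risk factors, the Python code below fits the model by batch gradient ascent and converts each coefficient to an odds ratio, $e^{\beta_j}$. It is not the authors' code and does not reproduce their MLP; the toy data, learning rate, and epoch count are hypothetical choices for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=20000):
    """Fit a logistic regression by batch gradient ascent on the mean log-likelihood.

    X: list of predictor rows, y: list of 0/1 outcomes.
    Returns [intercept, coef_1, ..., coef_p].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

def odds_ratios(w):
    # exp(coefficient): multiplicative change in the odds per one-unit increase
    return [math.exp(c) for c in w[1:]]

# Hypothetical toy data: one standardized risk-factor score and a 0/1 diagnosis
X = [[-2.0], [-1.0], [0.0], [1.0], [2.0], [0.5], [-0.5]]
y = [0, 0, 1, 1, 1, 0, 1]
w = fit_logistic(X, y)
print("coefficients:", w, "odds ratios:", odds_ratios(w))
```

An odds ratio above 1 marks a factor associated with higher odds of the outcome; this additive-in-the-logit structure is exactly what the MLP in the study is meant to go beyond.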

Developing a Pedestrian Satisfaction Prediction Model Based on Machine Learning Algorithms

  • 이제승;이현희
    • 국토계획 / Vol. 54, No. 3 / pp. 106-118 / 2019
  • A pedestrian navigation service that recommends optimal routes based on pedestrian satisfaction requires a prediction model that can estimate a pedestrian's satisfaction level under given conditions. The aim of this study is therefore to develop a pedestrian satisfaction prediction model based on three machine learning algorithms: Logistic Regression, Random Forest, and Artificial Neural Network. The 2009, 2012, 2013, 2014, and 2015 Pedestrian Satisfaction Survey data for Seoul, Korea, are used to train and test the models. The Random Forest model shows the best prediction performance of the three (Accuracy: 0.798, Recall: 0.906, Precision: 0.842, F1 Score: 0.873, AUC: 0.795). The Artificial Neural Network ranks second (Accuracy: 0.773, Recall: 0.917, Precision: 0.811, F1 Score: 0.868, AUC: 0.738), and the Logistic Regression model ranks third (Accuracy: 0.764, Recall: 1.000, Precision: 0.764, F1 Score: 0.868, AUC: 0.575). The precision of the Random Forest model implies that approximately 84.2% of pedestrians would be satisfied walking in the areas the model suggests.
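The reported metrics are linked by standard definitions; for example, F1 is the harmonic mean of precision and recall, so the Random Forest's precision (0.842) and recall (0.906) reproduce its F1 of 0.873. The sketch below computes these metrics from confusion-matrix counts. The Logistic Regression figures (recall 1.000, with precision equal to accuracy at 0.764) are consistent with a model that predicts "satisfied" for almost every case; the counts in the second example imitate that situation and are hypothetical, not the study's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def f1_score(precision, recall):
    # F1 is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# Reported Random Forest precision and recall
print(round(f1_score(0.842, 0.906), 3))  # prints 0.873

# Hypothetical counts: every case predicted "satisfied", 76.4% actually satisfied
print(classification_metrics(tp=764, fp=236, fn=0, tn=0))
```

With no negative predictions, recall is 1.0 and precision collapses to the share of satisfied cases, which is also why accuracy alone says little when one class dominates.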

Modeling Growth of Canopy Heights and Stem Diameters in Soybeans at Different Groundwater Level

  • 최진영;김동현;권순홍;최원식;김종순
    • 한국산업융합학회 논문집 / Vol. 20, No. 5 / pp. 395-404 / 2017
  • Cultivating soybeans in rice paddy fields reduces labor costs and increases yield. Soybeans, however, are highly susceptible to excess soil water in paddy fields. A controlled drainage system can adjust the groundwater level (GWL) and control soil moisture content, improving the soil environment for optimum crop growth. The objective of this study was to fit soybean growth data (canopy height and stem diameter) at different GWLs using the Gompertz and Logistic models and to validate those models. Soybeans of the Daewon cultivar were grown in lysimeters with controlled GWLs (20 cm and 40 cm) on silt loam and sandy loam soils. Canopy height and stem diameter were measured from 20 days after seeding until harvest. The Gompertz and Logistic models were fitted to the growth data, and the growth rate and maximum growth value were estimated for each. For canopy height, $R^2$ and RMSE were 0.99 and 1.58 for the Gompertz model and 0.99 and 1.33 for the Logistic model, respectively. A large discrepancy appeared at the full maturity stage (R8), when the plants had shed a substantial amount of leaves. Regardless of soil texture, the maximum growth values at 40 cm GWL were greater than those at 20 cm GWL, and the growth rates were larger in silt loam. For stem diameter, $R^2$ and RMSE were 0.96 and 0.27 for the Gompertz model and 0.96 and 0.26 for the Logistic model, respectively. Unlike canopy height, stem diameter did not decrease significantly at the R8 stage. At both GWLs, the maximum growth values and growth rates in silt loam were larger than those in sandy loam. In conclusion, both the Gompertz and Logistic models fit the canopy heights and stem diameters of soybeans well. These growth models can provide valuable information for developing a precision water management system.
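The abstract reports goodness of fit as $R^2$ and RMSE. A minimal Python sketch of the two growth curves and both measures follows. It assumes the common three-parameter forms $A\,e^{-b e^{-kt}}$ (Gompertz) and $A/(1+b e^{-kt})$ (Logistic), where A is the maximum growth value and k the growth-rate constant; the study's exact parameterization is not given here, and the observations and parameter values are hypothetical, not measurements from the study.

```python
import math

def gompertz(t, A, b, k):
    # A: asymptotic (maximum) size, b: displacement, k: growth-rate constant
    return A * math.exp(-b * math.exp(-k * t))

def logistic(t, A, b, k):
    # Symmetric S-curve whose inflection lies at half the asymptote A
    return A / (1.0 + b * math.exp(-k * t))

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r_squared(obs, pred):
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

# Hypothetical canopy heights (cm) by days after seeding; NOT data from the study
days = [20, 35, 50, 65, 80, 95, 110]
heights = [12.0, 25.0, 48.0, 70.0, 82.0, 86.0, 87.0]

# Hypothetical candidate parameters (A, b, k) for each model
for name, model, params in [("Gompertz", gompertz, (88.0, 9.0, 0.055)),
                            ("Logistic", logistic, (87.0, 40.0, 0.075))]:
    pred = [model(t, *params) for t in days]
    print(name, round(r_squared(heights, pred), 3), round(rmse(heights, pred), 2))
```

In practice the parameters would be estimated by nonlinear least squares rather than set by hand; the functions above only evaluate a given fit.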

Estimation of growth curve in Hanwoo steers using progeny test records

  • Yun, Jae-Woong;Park, Se-Yeong;Park, Hu-Rak;Eum, Seung-Hoon;Roh, Seung-Hee;Seo, Jakyeom;Cho, Seong-Keun;Kim, Byeong-Woo
    • 농업과학연구 / Vol. 43, No. 4 / pp. 623-633 / 2016
  • A total of 6,973 growth records of Hanwoo steers from the progeny-test data of Hanwoo breeding bulls, collected from 1989 to 2015, were analyzed to identify the most appropriate of three growth curve models (Gompertz, Logistic, and von Bertalanffy). The fitted Gompertz growth curve was $W_t=990.5\,e^{-2.7479\,e^{-0.00241t}}$, the Logistic growth curve was $W_t=772\,(1+8.3314\,e^{-0.00475t})^{-1}$, and the von Bertalanffy growth curve was $W_t=1196.4\,(1-0.646\,e^{-0.00162t})^{3}$. The Gompertz model parameters A, b, and k were estimated to be $990.5\pm10.27$, $2.7479\pm0.0068$, and $0.00241\pm0.000028$, respectively; the age at the inflection point was estimated to be 421 days and the weight at the inflection point 365.3 kg. The Logistic model parameters A, b, and k were estimated to be $772.0\pm4.12$, $8.3314\pm0.0453$, and $0.00475\pm0.000033$, respectively; the age at the inflection point was estimated to be 445 days and the weight at the inflection point 385.0 kg. The von Bertalanffy model parameters A, b, and k were estimated to be $1196.4\pm18.39$, $0.646\pm0.0010$, and $0.00162\pm0.000027$, respectively; the age at the inflection point was estimated to be 405 days and the weight at the inflection point 352.0 kg. The mature body weight was 1196.4 kg for the von Bertalanffy model, 990.5 kg for the Gompertz model, and 772.0 kg for the Logistic model. The differences between actual and estimated weights were similar for the Logistic and von Bertalanffy models. The difference between actual and estimated market weight was lowest for the Gompertz model. The von Bertalanffy model showed the lowest mean square error.
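For the three growth curves in this entry, setting the second derivative of $W_t$ to zero gives closed-form inflection points: $t^*=\ln b/k$ with $W^*=A/e$ (Gompertz), $t^*=\ln b/k$ with $W^*=A/2$ (Logistic), and $t^*=\ln(3b)/k$ with $W^*=8A/27$ (von Bertalanffy). The short Python sketch below evaluates them from the published estimates, assuming $t$ is age in days. The values computed this way (about 419, 446, and 408 days) differ by a few days or kilograms from those reported in the abstract, which may reflect rounding of the published parameters or a different computation in the original study.

```python
import math

def gompertz_inflection(A, b, k):
    # W(t) = A * exp(-b * exp(-k t)); inflection where b * exp(-k t) = 1
    return math.log(b) / k, A / math.e

def logistic_inflection(A, b, k):
    # W(t) = A / (1 + b * exp(-k t)); inflection where b * exp(-k t) = 1
    return math.log(b) / k, A / 2

def von_bertalanffy_inflection(A, b, k):
    # W(t) = A * (1 - b * exp(-k t))**3; inflection where b * exp(-k t) = 1/3
    return math.log(3 * b) / k, 8 * A / 27

# Parameter estimates (A, b, k) as reported in the abstract; t in days, W in kg
fits = {
    "Gompertz": gompertz_inflection(990.5, 2.7479, 0.00241),
    "Logistic": logistic_inflection(772.0, 8.3314, 0.00475),
    "von Bertalanffy": von_bertalanffy_inflection(1196.4, 0.646, 0.00162),
}
for name, (t_inf, w_inf) in fits.items():
    print(f"{name}: inflection at {t_inf:.0f} days, {w_inf:.1f} kg")
```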

Estimating small area proportions with kernel logistic regression models

  • Shim, Jooyong;Hwang, Changha
    • Journal of the Korean Data and Information Science Society / Vol. 25, No. 4 / pp. 941-949 / 2014
  • A unit-level logistic regression model with mixed effects has been used to estimate small area proportions; it treats the spatial effects as random effects and assumes that the logit is linear in the covariates. However, when that relationship is not linear, the model may yield biased estimators of the small area proportions. In this paper, we relax the linearity assumption and propose two types of kernel-based logistic regression models for estimating small area proportions. We also demonstrate the efficiency of our proposed models using simulated and real data.
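To illustrate how a kernel relaxes the linearity assumption, the sketch below fits a kernel logistic regression in which the logit is $f(x)=b+\sum_i \alpha_i K(x_i,x)$ with a Gaussian (RBF) kernel, by gradient ascent on the log-likelihood penalized by $\tfrac{\lambda}{2}\alpha^{\top}K\alpha$. It omits the random area effects of the mixed model and is not either of the two models proposed in the paper; the kernel width, penalty, learning rate, and data are assumptions chosen for illustration. The toy data have a response probability that rises and then falls in x, which a logit linear in x cannot represent.

```python
import math

def rbf(a, b, gamma=0.5):
    # Gaussian (RBF) kernel for scalar inputs
    return math.exp(-gamma * (a - b) ** 2)

def fit_kernel_logistic(xs, ys, lam=0.01, lr=1.0, epochs=2000, gamma=0.5):
    """Kernel logistic regression: logit p(x) = b + sum_i alpha_i K(x_i, x).

    Maximizes (1/n) * [sum_i log p(y_i | x_i) - (lam / 2) alpha' K alpha]
    by plain gradient ascent and returns a probability function.
    """
    n = len(xs)
    K = [[rbf(xi, xj, gamma) for xj in xs] for xi in xs]
    alpha, b = [0.0] * n, 0.0
    for _ in range(epochs):
        f = [b + sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        resid = [y - 1.0 / (1.0 + math.exp(-fi)) for y, fi in zip(ys, f)]
        # Gradient w.r.t. alpha is K (resid - lam * alpha) because K is symmetric
        grad = [sum(K[i][j] * (resid[j] - lam * alpha[j]) for j in range(n))
                for i in range(n)]
        alpha = [a + lr * g / n for a, g in zip(alpha, grad)]
        b += lr * sum(resid) / n

    def predict_proba(x):
        z = b + sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, xs))
        return 1.0 / (1.0 + math.exp(-z))
    return predict_proba

# Hypothetical data: the event is likely only for moderate values of x
xs = [-3.0, -2.5, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 2.5, 3.0]
ys = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
predict = fit_kernel_logistic(xs, ys)
print([round(predict(x), 2) for x in (-3.0, 0.0, 3.0)])
```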

Generalized Partially Linear Additive Models for Credit Scoring

  • Shim, Ju-Hyun;Lee, Young-K.
    • 응용통계연구 / Vol. 24, No. 4 / pp. 587-595 / 2011
  • Credit scoring is an objective, automated system for assessing the credit risk of each customer. The logistic regression model is a popular credit scoring method for predicting the default probability; despite its interpretability and low computational cost, however, it may fail to detect nonlinear effects of predictors. In this paper, we propose a generalized partially linear model as an alternative to logistic regression. We also introduce modern ensemble techniques such as bagging, boosting, and random forests. We compare these methods in a simulation study and illustrate them with a German credit dataset.
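A generalized partially linear model keeps some predictors linear in the logit and lets another enter through a smooth function, $\operatorname{logit} p = \beta_0 + \beta_1 x + g(z)$. The sketch below approximates $g$ with a linear spline (hinge) basis and fits both this model and a fully linear logistic model by gradient ascent, on hypothetical data where default risk is U-shaped in $z$. The spline approximation, knot positions, and data are assumptions for illustration, not the estimator used in the paper.

```python
import math

def fit_logistic(rows, ys, lr=0.5, epochs=5000):
    """Logistic regression by batch gradient ascent; each row starts with a 1."""
    n, w = len(rows), [0.0] * len(rows[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for r, y in zip(rows, ys):
            z = sum(wj * v for wj, v in zip(w, r))
            err = y - 1.0 / (1.0 + math.exp(-z))
            for j, v in enumerate(r):
                grad[j] += err * v
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def log_lik(w, rows, ys):
    # log p(y | z) = y * z - log(1 + e^z) for the linear predictor z
    total = 0.0
    for r, y in zip(rows, ys):
        z = sum(wj * v for wj, v in zip(w, r))
        total += y * z - math.log1p(math.exp(z))
    return total

# Hypothetical data: x enters linearly, default risk is U-shaped in z
zs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
data = [(x, z, int(abs(z) == 2.0 or (abs(z) == 1.5 and x == 1)))
        for x in (0, 1) for z in zs]
ys = [d for _, _, d in data]

linear_rows = [[1.0, x, z] for x, z, _ in data]
# Linear-spline (hinge) basis for g(z) with hypothetical knots at -1 and 1
spline_rows = [[1.0, x, z, max(0.0, z - 1.0), max(0.0, -1.0 - z)] for x, z, _ in data]

ll_linear = log_lik(fit_logistic(linear_rows, ys), linear_rows, ys)
ll_spline = log_lik(fit_logistic(spline_rows, ys), spline_rows, ys)
print(round(ll_linear, 2), round(ll_spline, 2))
```

On data like these, the spline model should reach a clearly higher log-likelihood than the linear one, which is the kind of nonlinear effect the abstract says plain logistic regression may miss.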

Generalization of Road Network using Logistic Regression

  • Park, Woojin;Huh, Yong
    • 한국측량학회지 / Vol. 37, No. 2 / pp. 91-97 / 2019
  • In automatic map generalization, formalizing cartographic principles is important. This study proposes and evaluates a selection method for road network generalization that analyzes existing maps by reverse engineering and formalizes the selection rules for the road network. Existing maps at 1:5,000 and 1:25,000 scales are compared, and the selection criteria for the road network data and the relative importance of each network object are determined and analyzed using Töpfer's Radical Law and a logistic regression model. The selection model derived from this analysis is applied to test data, generating road network data for a 1:25,000 map from a 1:5,000 digital topographic map. The selected road network is compared with the existing 1:25,000 road network data for qualitative and quantitative evaluation. The results show that more than 80% of road objects match the existing data.
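Töpfer's Radical Law estimates how many objects to keep when the scale changes: $n_f = n_a\sqrt{M_a/M_f}$, where $n_a$ is the number of objects on the source map and $M_a$, $M_f$ are the scale denominators of the source and derived maps. Going from 1:5,000 to 1:25,000 keeps about $\sqrt{0.2}\approx 44.7\%$ of the objects. The sketch below combines that count with a ranking by predicted importance; using a fitted logistic regression's retention probability as the ranking score is one plausible reading of the method, not the paper's confirmed procedure, and the road IDs and probabilities are hypothetical.

```python
import math

def topfer_count(n_source, source_denominator, target_denominator):
    # Radical Law: n_f = n_a * sqrt(M_a / M_f), with M the scale denominators
    return round(n_source * math.sqrt(source_denominator / target_denominator))

def select_roads(importance, n_keep):
    """Keep the n_keep road objects with the highest importance score.

    importance maps a road ID to a score; here it stands in for the retention
    probability a fitted logistic regression would assign (hypothetical).
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:n_keep]

# Hypothetical retention probabilities for four road objects on the 1:5,000 map
importance = {"r1": 0.9, "r2": 0.2, "r3": 0.7, "r4": 0.4}
n_keep = topfer_count(len(importance), 5_000, 25_000)  # 4 * sqrt(0.2) = 1.79 -> 2
print(n_keep, select_roads(importance, n_keep))
```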

Statistical micro matching using a multinomial logistic regression model for categorical data

  • Kim, Kangmin;Park, Mingue
    • Communications for Statistical Applications and Methods / Vol. 26, No. 5 / pp. 507-517 / 2019
  • Statistical matching is a method of combining multiple data sources extracted or surveyed from the same population. It can be used when variables of interest are not jointly observed, and because it creates synthetic data from existing sources, it is a low-cost way to obtain substantial benefits. In this paper, we propose several statistical micro matching methods using a multinomial logistic regression model for the case, common in sample surveys, where all variables of interest are categorical or categorized. Under the conditional independence assumption (CIA), we propose a mixed statistical matching method that is useful when auxiliary information is not available. We also propose a statistical matching method with auxiliary information that reduces the bias of conventional matching methods developed under the CIA. The proposed and conventional micro matching methods are compared in a simulation study, which shows that the proposed methods outperform the existing ones, especially when the CIA does not hold.
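Under the CIA, $P(Y,Z\mid X)=P(Y\mid X)\,P(Z\mid X)$, so a recipient record that carries X and Y can receive a value of Z drawn from $P(Z\mid X)$ estimated on the donor file. The sketch below does this with a multinomial logit (softmax) model. The coefficients are hypothetical stand-ins for a model fitted on the donor file, and random-draw imputation is only one simple matching rule, not the mixed or auxiliary-information methods proposed in the paper.

```python
import math
import random

def softmax_probs(x, coefs):
    """P(Z = c | x) under a multinomial logit model.

    coefs maps each category c to (intercept, [slopes]); the baseline category
    has all-zero coefficients.
    """
    scores = {c: b0 + sum(b * xi for b, xi in zip(bs, x)) for c, (b0, bs) in coefs.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

def impute_z(recipients, coefs, rng):
    """Micro matching under the CIA: draw Z for each recipient from P(Z | X)."""
    matched = []
    for rec in recipients:
        probs = softmax_probs(rec["x"], coefs)
        cats = list(probs)
        z = rng.choices(cats, weights=[probs[c] for c in cats], k=1)[0]
        matched.append({**rec, "z": z})
    return matched

# Hypothetical coefficients, standing in for a model fitted on the donor file
coefs = {"low": (0.0, [0.0]), "mid": (0.5, [1.0]), "high": (-1.0, [2.0])}
# Hypothetical recipient file: X observed alongside Y, Z missing
recipients = [{"id": 1, "x": [0.0], "y": "A"}, {"id": 2, "x": [1.0], "y": "B"}]
matched = impute_z(recipients, coefs, random.Random(0))
print(matched)
```

Because Z is drawn using X alone, any association between Y and Z beyond what X explains is lost, which is why the CIA matters and why the paper's bias-reducing variant uses auxiliary information.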

Exploring interaction using 3-D residual plots in logistic regression model

  • 강명욱
    • Journal of the Korean Data and Information Science Society / Vol. 25, No. 1 / pp. 177-185 / 2014
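  • In a logistic regression model, the explanatory variables alone sometimes do not explain the response adequately, and transformed terms such as quadratic or interaction terms are needed. When there are two explanatory variables whose conditional distribution is bivariate normal, the logistic regression model should in general include quadratic and interaction terms. Depending on the variances and correlation coefficients of the conditional distributions, however, the quadratic and interaction terms may become unnecessary. The variances and correlations can be roughly judged from a scatter plot, but it is not easy to judge whether an interaction term is needed. In this paper, we present a method for exploring interactions using 3-D residual plots and apply it to real data.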
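The statement that bivariate normal conditional distributions call for quadratic and interaction terms follows from the log-odds of two normal densities. Writing $\mathbf{x}\mid Y=j \sim N(\boldsymbol{\mu}_j,\Sigma_j)$ and $\pi_j=P(Y=j)$ for $j=0,1$ (standard notation, not taken from the paper):

$$
\log\frac{P(Y=1\mid \mathbf{x})}{P(Y=0\mid \mathbf{x})}
= c - \tfrac{1}{2}\,\mathbf{x}^{\top}\left(\Sigma_1^{-1}-\Sigma_0^{-1}\right)\mathbf{x}
+ \left(\Sigma_1^{-1}\boldsymbol{\mu}_1-\Sigma_0^{-1}\boldsymbol{\mu}_0\right)^{\top}\mathbf{x},
$$

$$
c = \log\frac{\pi_1}{\pi_0} - \tfrac{1}{2}\log\frac{|\Sigma_1|}{|\Sigma_0|}
- \tfrac{1}{2}\,\boldsymbol{\mu}_1^{\top}\Sigma_1^{-1}\boldsymbol{\mu}_1
+ \tfrac{1}{2}\,\boldsymbol{\mu}_0^{\top}\Sigma_0^{-1}\boldsymbol{\mu}_0 .
$$

With two predictors and $\Sigma_1^{-1}-\Sigma_0^{-1}=\begin{pmatrix}a_{11}&a_{12}\\ a_{12}&a_{22}\end{pmatrix}$, the second-order part is $-\tfrac{1}{2}\left(a_{11}x_1^2+2a_{12}x_1x_2+a_{22}x_2^2\right)$, so the interaction coefficient is $-a_{12}$. The interaction term therefore drops out when the off-diagonal entries of $\Sigma_1^{-1}$ and $\Sigma_0^{-1}$ coincide, and all second-order terms drop out when $\Sigma_1=\Sigma_0$, leaving an ordinary linear logit.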

Suppression for Logistic Regression Model

  • 홍종선;김호일;함주형
    • 응용통계연구 / Vol. 18, No. 3 / pp. 701-712 / 2005
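  • Suppression has been discussed less for the logistic regression model than for linear regression, partly because the regression sum of squares, or coefficient of determination, is not uniquely defined. We derive regression sums of squares from two preferred coefficients of determination among the many that exist and from the two adjusted coefficients of determination proposed by Liao and McGee (2003), and use them to explain suppression in the logistic regression model. Using simulated data, we examine when suppression occurs and compare the results with suppression in the linear regression model.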
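Because the logistic model has no unique $R^2$, any suppression check depends on the chosen measure. The sketch below uses McFadden's pseudo-$R^2$, $1-\ell_{\text{model}}/\ell_{\text{null}}$, as an assumed example (it is not necessarily one of the four coefficients used in the paper), together with the enhancement criterion from linear regression, under which the two-predictor fit explains more than the sum of the one-predictor fits. The responses and log-likelihood values are hypothetical.

```python
import math

def mcfadden_r2(ll_model, ll_null):
    # McFadden's pseudo-R^2: 1 - logL(model) / logL(intercept-only model)
    return 1.0 - ll_model / ll_null

def null_log_lik(ys):
    # Intercept-only model: every fitted probability equals the sample mean
    p = sum(ys) / len(ys)
    return sum(math.log(p) if y else math.log(1.0 - p) for y in ys)

def shows_enhancement(r2_joint, r2_x1, r2_x2):
    # Enhancement criterion: the joint fit exceeds the sum of the separate fits
    return r2_joint > r2_x1 + r2_x2

ys = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical binary responses
ll0 = null_log_lik(ys)
# Hypothetical maximized log-likelihoods for models with x1, x2, and both
r2_x1 = mcfadden_r2(-5.2, ll0)
r2_x2 = mcfadden_r2(-5.4, ll0)
r2_both = mcfadden_r2(-4.0, ll0)
print(round(r2_x1, 3), round(r2_x2, 3), round(r2_both, 3),
      shows_enhancement(r2_both, r2_x1, r2_x2))
```

Swapping in a different $R^2$ definition changes only `mcfadden_r2`, which mirrors the paper's point that conclusions about suppression can depend on that choice.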