• Title/Abstract/Keywords: Logistic models

Search results: 804 items (processing time: 0.026 s)

A Study on the Power Comparison between Logistic Regression and Offset Poisson Regression for Binary Data

  • Kim, Dae-Youb;Park, Heung-Sun
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 19, No. 4
    • /
    • pp.537-546
    • /
    • 2012
  • In this paper, Poisson regression with offset and logistic regression are compared by simulation with respect to power for analyzing binary data. The Poisson distribution can be used as an approximation of the binomial distribution when n is large and p is small; we investigate whether the same conditions hold for the power of significance tests in logistic regression and offset Poisson regression. The result is that when the offset size is large, offset Poisson regression has power similar to logistic regression for rare events, and it has acceptable power even with a moderate prevalence rate. However, with a small offset size (< 10), offset Poisson regression should be used with caution for rare events or common events. These results can serve as guidelines for users who want to use offset Poisson regression models for binary data.
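
The approximation mentioned in this abstract can be checked numerically. The following is a minimal sketch and is not taken from the paper: the function names and the two parameter settings (n = 1000, p = 0.005 and n = 10, p = 0.5, both with mean 5) are illustrative assumptions.

```python
from math import exp, lgamma, log

def binom_pmf(k, n, p):
    """Binomial(n, p) probability of k successes, computed on the log scale."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))

def poisson_pmf(k, lam):
    """Poisson(lam) probability of k events, computed on the log scale."""
    return exp(-lam + k * log(lam) - lgamma(k + 1))

def max_pmf_gap(n, p):
    """Largest absolute difference between Binomial(n, p) and Poisson(n * p)
    probabilities over k = 0..n."""
    lam = n * p
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1))

# Hypothetical settings: large n with small p, versus small n with large p.
gap_rare = max_pmf_gap(1000, 0.005)
gap_common = max_pmf_gap(10, 0.5)
print(f"large n, small p: max gap = {gap_rare:.5f}")
print(f"small n, large p: max gap = {gap_common:.5f}")
```

With large n and small p the two distributions nearly coincide, while with small n and large p the gap is much larger. This concerns only the distributions themselves; the paper's question about the power of the tests is a separate matter that this sketch does not address.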

Geographically weighted kernel logistic regression for small area proportion estimation

  • Shim, Jooyong;Hwang, Changha
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 27, No. 2
    • /
    • pp.531-538
    • /
    • 2016
  • In this paper we deal with small area estimation for the case in which the response variables take binary values. Mixed effects models, which treat the spatial effects as random effects, have been extensively studied for small area estimation. However, when the spatial information of each area is given specifically as coordinates, it is popular to use geographically weighted logistic regression, which incorporates the spatial information by assuming that the regression parameters vary spatially across areas. In this paper, we relax the linearity assumption and propose a geographically weighted kernel logistic regression for estimating small area proportions using the basic principle of kernel machines. Numerical studies have been carried out to compare the performance of the proposed method with other methods in estimating small area proportions.

Semiparametric kernel logistic regression with longitudinal data

  • Shim, Joo-Yong;Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 23, No. 2
    • /
    • pp.385-392
    • /
    • 2012
  • Logistic regression is a well-known binary classification method in the field of statistical learning. Mixed-effect regression models are widely used for the analysis of correlated data such as those found in longitudinal studies. We consider kernel extensions with semiparametric fixed effects and parametric random effects for logistic regression. The estimation is performed through the penalized likelihood method based on the kernel trick, and our focus is on efficient computation and effective hyperparameter selection. For the selection of optimal hyperparameters, cross-validation techniques are employed. Numerical results are then presented to indicate the performance of the proposed procedure.

지하 수위가 다른 조건에서 콩의 초장과 경태 모델링 (Modeling Growth of Canopy Heights and Stem Diameters in Soybeans at Different Groundwater Level)

  • 최진영;김동현;권순홍;최원식;김종순
    • 한국산업융합학회 논문집
    • /
    • Vol. 20, No. 5
    • /
    • pp.395-404
    • /
    • 2017
  • Cultivating soybeans in rice paddy fields reduces labor costs and increases the yield. Soybeans, however, are highly susceptible to excessive soil water in paddy fields. A controlled drainage system can adjust the groundwater level (GWL) and control soil moisture content, improving the soil environment for optimum crop growth. The objective of this study was to fit soybean growth data (canopy height and stem diameter) with the Gompertz model and the Logistic model at different GWLs and to validate those models. The soybean, Daewon cultivar, was grown on lysimeters with controlled GWL (20cm and 40cm). The soil textures were silt loam and sandy loam. The canopy height and stem diameter were measured from the 20th day after seeding until harvest. The Gompertz and Logistic models were fitted to the growth data, and the growth rate and maximum growth value were estimated for each. For the canopy height, the $R^2$ and RMSE were 0.99 and 1.58 for the Gompertz model and 0.99 and 1.33 for the Logistic model, respectively. A large discrepancy was shown in the full maturity stage (R8), where plants have shed a substantial amount of leaves. Regardless of soil texture, the maximum growth values at 40cm GWL were greater than the values at 20cm GWL. The growth rates were larger in silt loam. For the stem diameter, the $R^2$ and RMSE were 0.96 and 0.27 for the Gompertz model and 0.96 and 0.26 for the Logistic model, respectively. Unlike the canopy height, the stem diameter in the R8 stage did not decrease significantly. At both GWLs, the maximum growth values and the growth rates in silt loam were all larger than the values in sandy loam. In conclusion, the Gompertz model and the Logistic model both fit the canopy heights and stem diameters of soybeans well. These growth models can provide invaluable information for the development of precision water management systems.
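
The two growth curves named in this abstract can be written compactly. The following is a minimal sketch under one common three-parameter form of each curve; the abstract does not state which parameterization the authors used, and the names `K` (maximum growth value), `r` (growth rate), `t0` (time shift), and the numeric values are hypothetical.

```python
from math import exp

def logistic_growth(t, K, r, t0):
    """Logistic curve: symmetric S-shape that reaches K / 2 at t = t0."""
    return K / (1.0 + exp(-r * (t - t0)))

def gompertz_growth(t, K, r, t0):
    """Gompertz curve: asymmetric S-shape that reaches K / e at t = t0."""
    return K * exp(-exp(-r * (t - t0)))

# Hypothetical parameters, not estimates from the study.
K, r, t0 = 100.0, 0.1, 50.0
for t in (20, 50, 80, 120):
    print(t, round(logistic_growth(t, K, r, t0), 2), round(gompertz_growth(t, K, r, t0), 2))
```

Both curves level off at `K`, which is why a fitted `K` can be read as a maximum growth value. They differ in where the inflection point falls: at half of `K` for the logistic curve and at `K / e` for the Gompertz curve.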

Evaluating seismic liquefaction potential using multivariate adaptive regression splines and logistic regression

  • Zhang, Wengang;Goh, Anthony T.C.
    • Geomechanics and Engineering
    • /
    • Vol. 10, No. 3
    • /
    • pp.269-284
    • /
    • 2016
  • Simplified techniques based on in situ testing methods are commonly used to assess seismic liquefaction potential. Many of these simplified methods were developed by analyzing liquefaction case histories from which the liquefaction boundary (limit state) separating two categories (the occurrence or non-occurrence of liquefaction) is determined. As the liquefaction classification problem is highly nonlinear in nature, it is difficult to develop a comprehensive model using conventional modeling techniques that take into consideration all the independent variables, such as the seismic and soil properties. In this study, a modification of the Multivariate Adaptive Regression Splines (MARS) approach based on Logistic Regression (LR), termed LR_MARS, is used to evaluate seismic liquefaction potential based on actual field records. Three different LR_MARS models were used to analyze three different field liquefaction databases, and the results are compared with neural network approaches. The developed spline functions and the limit state functions obtained reveal that the LR_MARS models can capture and describe the intrinsic, complex relationship between seismic parameters, soil parameters, and the liquefaction potential without having to make any assumptions about the underlying relationship between the various variables. Considering its computational efficiency, simplicity of interpretation, predictive accuracy, data-driven and adaptive nature, and ability to map the interaction between variables, the use of the LR_MARS model in assessing seismic liquefaction potential is promising.

소프트웨어 신뢰성 예측을 위한 객체지향 척도 분석 (Analysis of Object-Oriented Metrics to Predict Software Reliability)

  • 이양규
    • 한국신뢰성학회지:신뢰성응용연구
    • /
    • Vol. 16, No. 1
    • /
    • pp.48-55
    • /
    • 2016
  • Purpose: The purpose of this study is to identify the object-oriented metrics which have a strong impact on the reliability and fault-proneness of software products. The reliability and fault-proneness of a software product are closely related to the design properties of class diagrams such as coupling between objects and depth of inheritance tree. Methods: This study has empirically validated the object-oriented metrics to determine which metrics best predict fault-proneness. We have tested the metrics using logistic regressions and artificial neural networks. The results are then compared and validated by ROC curves. Results: The artificial neural network models show better results in sensitivity, specificity and correctness than the logistic regression models. Among the object-oriented metrics, several estimate fault-proneness better than the others: CBO (coupling between objects), DIT (depth of inheritance), LCOM (lack of cohesive methods), and RFC (response for class). In addition to the object-oriented metrics, the LOC (lines of code) metric has also proven to be a good factor for determining the fault-proneness of software products. Conclusion: In order to develop fault-free and reliable software products on time and within budget, assuring quality in the initial phases of the software development process is crucial. Since object-oriented metrics can be measured in the early phases, it is important to make the key metrics of software design as good as possible.

양돈폐수의 영양염류 제거를 위한 녹조류 Chlorella vulgaris 성장 모형의 비교 (Comparison of Models to Describe Growth of Green Algae Chlorella vulgaris for Nutrient Removal from Piggery Wastewater)

  • 임병란;주티담롱판;박기영
    • 한국농공학회논문집
    • /
    • Vol. 52, No. 6
    • /
    • pp.19-26
    • /
    • 2010
  • Batch experiments were conducted to investigate the growth and nutrient removal performance of the microalga Chlorella vulgaris in piggery wastewater with different concentrations of pollutants, and common growth models (logistic, Gompertz and Richards) were applied to compare microalgal growth parameters. Removal of nitrogen (N) and phosphorus (P) by Chlorella vulgaris correlated with biomass increase, implying that nutrient uptake is coupled with microalgal growth. The higher the levels of suspended solids (SS), COD and ammonia nitrogen in the wastewater, the worse the growth of Chlorella vulgaris, showing that growth inhibition occurs at higher concentrations of those pollutants. The growth parameters were estimated by non-linear regression of the three growth curves for comparative analysis. In terms of R square, the growth parameters were determined more accurately with population as the variable than with the logarithm of population. The Richards model gave a better fit than the logistic and Gompertz models. However, the Richards model showed some complexity and sensitivity in calculation. In the cases tested, both the logistic and Gompertz equations were adequate to describe the growth of microalgae on piggery wastewater and were easy to apply.

부산시 교통사고예측모형의 개발 (Development of Traffic Accident Forecasting Model in Pusan)

  • 이일병;임현정
    • 대한교통학회지
    • /
    • Vol. 10, No. 3
    • /
    • pp.103-122
    • /
    • 1992
  • The objective of this research is to develop a traffic accident forecasting model using traffic accident data in Pusan from 1963 to 1991 and then to make short-term forecasts ('93~'94) of traffic accidents in Pusan. In this research, several forecasting models are developed. They include a multiple regression model, a time-series ARIMA model, a Logistic curve model, and a Gompertz curve model. Among them, the model that performs best in forecasting accuracy is selected as the traffic accident forecasting model. The results of this research are as follows. 1. The existing Smeed model, which was developed for foreign countries, explains only 47.8% of traffic accident deaths in Korea. 2. A nonlinear regression model ($R^2$=0.9432) and a Logistic curve model appear to be the best forecasting models for the number of traffic accidents, and a Logistic curve model is the most significant in predicting accident deaths and injuries. 3. The forecasts of traffic accidents in Pusan are as follows: in 1993, 31,180 accidents, 430 deaths, and 29,680 injuries are predicted; in 1994, 33,710 accidents, 431 deaths, and 30,510 injuries are predicted. Therefore, preventive measures against traffic accidents are certainly required.


객체지향 메트릭을 이용한 결함 예측 모형의 실험적 비교 (A Comparative Experiment of Software Defect Prediction Models using Object Oriented Metrics)

  • 김윤규;김태연;채흥석
    • 한국정보과학회논문지:컴퓨팅의 실제 및 레터
    • /
    • Vol. 15, No. 8
    • /
    • pp.596-600
    • /
    • 2009
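  • To support the efficient management of software through verification and validation, many defect prediction models based on object-oriented metrics have been proposed. The proposed models were developed mainly with logistic regression analysis, and their reported defect prediction accuracy was 60-70%. In this paper, to confirm the effectiveness of the existing defect prediction models, we conducted an experiment on Eclipse 3.3 using a method similar to that of the developed models. In the experiment, the accuracy of the models was about 40%, which is much lower than the claimed predictive power. In addition, simple logistic regression showed higher predictive power than multiple logistic regression.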

기계학습 알고리즘을 이용한 보행만족도 예측모형 개발 (Developing a Pedestrian Satisfaction Prediction Model Based on Machine Learning Algorithms)

  • 이제승;이현희
    • 국토계획
    • /
    • Vol. 54, No. 3
    • /
    • pp.106-118
    • /
    • 2019
  • In order to develop a pedestrian navigation service that provides optimal pedestrian routes based on pedestrian satisfaction levels, a prediction model is required that can estimate a pedestrian's satisfaction level under a given condition. Thus, the aim of the present study is to develop a pedestrian satisfaction prediction model based on three machine learning algorithms: Logistic Regression, Random Forest, and Artificial Neural Network models. The 2009, 2012, 2013, 2014, and 2015 Pedestrian Satisfaction Survey Data in Seoul, Korea are used to train and test the machine learning models. As a result, the Random Forest model shows the best prediction performance among the three (Accuracy: 0.798, Recall: 0.906, Precision: 0.842, F1 Score: 0.873, AUC: 0.795). The Artificial Neural Network ranks second (Accuracy: 0.773, Recall: 0.917, Precision: 0.811, F1 Score: 0.868, AUC: 0.738), and the Logistic Regression model ranks third (Accuracy: 0.764, Recall: 1.000, Precision: 0.764, F1 Score: 0.868, AUC: 0.575). The precision score of the Random Forest model implies that approximately 84.2% of pedestrians may be satisfied if they walk in the areas suggested by the Random Forest model.
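
For readers less familiar with the metrics quoted in this abstract, the following minimal sketch shows how accuracy, precision, recall, and F1 follow from a confusion matrix. The function name and the counts are hypothetical and are not the study's data. They were chosen to show one situation in which recall is 1.000 and precision equals accuracy, namely a classifier that labels every case as positive.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts: 1000 cases, 764 truly positive, all predicted positive.
m = classification_metrics(tp=764, fp=236, tn=0, fn=0)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

In such a case the accuracy simply equals the share of positive cases in the data, so it says little about how well the model separates the classes.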