• Title/Summary/Keyword: logistic model

Suppression for Logistic Regression Model (로지스틱 회귀모형에서의 SUPPRESSION)

  • Hong C. S.;Kim H. I.;Ham J. H.
    • The Korean Journal of Applied Statistics / v.18 no.3 / pp.701-712 / 2005
  • Suppression in logistic regression models has been debated less than in linear regression models because, among many other reasons, the sum of squares for regression (SSR) and the coefficient of determination ($R^2$) can be defined in various ways. Based on four kinds of $R^2$ (two that are most preferred and two proposed by Liao & McGee (2003)), four kinds of SSR are derived so that suppression in logistic models can be explained. Many data sets fitted to logistic models are generated by the Monte Carlo method. We explore when suppression occurs and compare the results with those for linear regression models.

Log-density Ratio with Two Predictors in a Logistic Regression Model (로지스틱 회귀모형에서 이변량 정규분포에 근거한 로그-밀도비)

  • Kahng, Myung Wook;Yoon, Jae Eun
    • The Korean Journal of Applied Statistics / v.26 no.1 / pp.141-149 / 2013
  • We present methods for studying the log-density ratio, which enables the selection of the predictors and of the form in which they enter the logistic regression model. Under bivariate normal distributional assumptions, we investigate the form of the log-density ratio as a function of two predictors. If the two covariance matrices are equal, then the crossproduct and quadratic terms are not needed. If the variables are uncorrelated, we do not need the crossproduct terms, but we still need the linear and quadratic terms. We also explore other conditions under which the crossproduct and quadratic terms are not needed in the logistic regression model.
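
The conditions in this abstract can be read off the standard log-density ratio of two bivariate normal densities. The derivation below is a sketch under the assumption that the predictor vector $x = (x_1, x_2)^\top$ is distributed $N(\mu_k, \Sigma_k)$ in group $k \in \{0, 1\}$; the symbols $\mu_k$, $\Sigma_k$, $f_k$, and $c$ are notation introduced here for illustration and are not taken from the paper.

```latex
\begin{align*}
\log\frac{f_1(x)}{f_0(x)}
  &= -\tfrac12 \log\frac{|\Sigma_1|}{|\Sigma_0|}
     - \tfrac12 (x-\mu_1)^\top \Sigma_1^{-1} (x-\mu_1)
     + \tfrac12 (x-\mu_0)^\top \Sigma_0^{-1} (x-\mu_0) \\
  &= c
     + \left(\Sigma_1^{-1}\mu_1 - \Sigma_0^{-1}\mu_0\right)^\top x
     - \tfrac12\, x^\top \left(\Sigma_1^{-1} - \Sigma_0^{-1}\right) x ,
\\[4pt]
c &= -\tfrac12 \log\frac{|\Sigma_1|}{|\Sigma_0|}
     - \tfrac12\, \mu_1^\top \Sigma_1^{-1} \mu_1
     + \tfrac12\, \mu_0^\top \Sigma_0^{-1} \mu_0 .
\end{align*}
```

When $\Sigma_1 = \Sigma_0$, the matrix $\Sigma_1^{-1} - \Sigma_0^{-1}$ is zero and only the linear terms remain. When both covariance matrices are diagonal (uncorrelated predictors in each group), the off-diagonal entries of that matrix vanish, which removes the $x_1 x_2$ term but keeps $x_1^2$ and $x_2^2$.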

A Study on the Optimal Release Time Decision of a Developed Software by using Logistic Testing Effort Function (로지스틱 테스트 노력함수를 이용한 소프트웨어의 최적인도시기 결정에 관한 연구)

  • Che, Gyu-Shik;Kim, Yong-Kyung
    • Journal of Information Technology Applications and Management / v.12 no.2 / pp.1-13 / 2005
  • This paper proposes a software-reliability growth model (SRGM) incorporating the amount of testing effort expended during the software testing phase that follows development. The time-dependent behavior of testing effort expenditures is described by a logistic curve. Assuming that the error detection rate with respect to the amount of testing effort spent during the testing phase is proportional to the current error content, a software-reliability growth model is formulated as a nonhomogeneous Poisson process. Using this model, a method of data analysis for software reliability measurement is developed. After defining software reliability, this paper studies the relations between testing time and reliability and between the duration following failure fixing and reliability. SRGMs in the literature have used the exponential, Rayleigh, or Weibull curve to describe the amount of testing effort during the software testing phase. However, in some software development environments it might not be appropriate to represent the testing-effort consumption curve by one of these previously proposed curves. Therefore, this paper shows that a logistic testing-effort function can adequately express a software development/testing effort curve and that it gives good predictive capability on real failure data.
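
As a rough illustration of the kind of model this abstract describes, the sketch below pairs a logistic testing-effort curve with the mean value function of a nonhomogeneous Poisson process. The functional forms W(t) = N / (1 + A·exp(-αt)) and m(t) = a·(1 - exp(-r·(W(t) - W(0)))) are a common textbook formulation that follows from the proportionality assumption; they are assumed here rather than taken from the paper, and every parameter name and value is hypothetical.

```python
import math


def logistic_effort(t, total_effort, shape, rate):
    """Cumulative testing effort W(t) = N / (1 + A * exp(-alpha * t))."""
    return total_effort / (1.0 + shape * math.exp(-rate * t))


def expected_faults(t, initial_faults, detection_rate, total_effort, shape, rate):
    """Mean value function m(t) = a * (1 - exp(-r * (W(t) - W(0)))).

    This solves (dm/dt) / w(t) = r * (a - m(t)) with m(0) = 0, i.e. detection
    per unit of testing effort is proportional to the remaining error content.
    """
    w_t = logistic_effort(t, total_effort, shape, rate)
    w_0 = logistic_effort(0.0, total_effort, shape, rate)
    return initial_faults * (1.0 - math.exp(-detection_rate * (w_t - w_0)))


# Hypothetical parameter values, chosen only to make the curve visible.
PARAMS = dict(
    initial_faults=100.0,   # a: expected total error content
    detection_rate=0.05,    # r: detection rate per unit of effort
    total_effort=60.0,      # N: total effort eventually consumed
    shape=20.0,             # A: logistic shape constant
    rate=0.4,               # alpha: effort consumption rate
)

# Expected cumulative faults detected at t = 0, 5, ..., 30.
curve = [expected_faults(t, **PARAMS) for t in range(0, 31, 5)]

if __name__ == "__main__":
    for t, m in zip(range(0, 31, 5), curve):
        print(f"t={t:2d}  expected faults detected={m:6.2f}")
```

The curve starts at zero, rises fastest while testing effort is being spent fastest, and levels off below the assumed total error content as the logistic effort curve saturates.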

Categorized the Contribution evasion through Health Insurance contribution evasion expected model (건강보험 체납예측모형을 통한 체납세대의 유형화 및 특성)

  • 이애경;최인덕
    • Health Policy and Management / v.14 no.2 / pp.78-98 / 2004
  • The purpose of this study was to categorize contribution evasion and to develop a prediction model for contribution arrears in the National Health Care System. A modified logistic regression model was used to model non-payment. Using this model, we identified non-payment types and their typical characteristics. First, the sex and age groups that are able to take part in economic activity had the largest effect. Non-payment probability also differed according to whether households had income and property. In particular, people who did not own a house or a car showed a high non-payment probability, whereas disease and reduction characteristics (rare diseases, reductions for seniors, disabilities, number of medical treatments) had little effect on the probability. For households whose predicted probability exceeded the classification threshold of the logistic regression model (the suggested model for predicting non-payment), the reason for non-payment was mostly moral hazard. Economic hardship was a major reason for non-payment, but moral laxity was the larger one. However, it would be careless to conclude that moral hazard is the only reason; the problem also needs to be examined from a family-based sociological perspective. Members' reasons for non-payment can be classified as economic, demographic, and psychological, and loss of the will to work may be another reason. We therefore analyzed information on the family composition of members. In conclusion, we found that family conflict leads to non-payment and makes members' conversion to the National Basic Livelihood Protection System difficult.

Analysis on the Survivor's Pension Payment with Logistic Regression Model (로지스틱 회귀모형을 이용한 유족연금 수급 분석)

  • Kim, Mi-Jung;Kim, Jin-Hyung
    • The Korean Journal of Applied Statistics / v.21 no.2 / pp.183-200 / 2008
  • Research on efficient management of the National Pension has been emphasized as society trends toward aging and a low birth rate. In this article, we suggest a statistical model for effective classification and prediction of the reserve for the survivor's pension in Korea. A logistic regression model is used; the correct classification rate and the distribution of the posterior probability for the reserve of the survivor's pension are investigated and compared with the results from general logistic models. The predictive model is also assessed with a lift graph, ROC curve, and K-S statistic. As an application of the suggested model, we propose strategies for reducing financial risks in managing and planning the pension.
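
Two of the assessment tools named in this abstract, the ROC curve and the K-S statistic, can be computed directly from predicted probabilities. The sketch below is a minimal illustration, not the authors' procedure: the function names are hypothetical and the scores and labels are made-up toy values, not pension data.

```python
def ks_statistic(scores, labels):
    """Largest gap between the empirical score CDFs of the two classes."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for threshold in sorted(set(scores)):
        cdf_pos = sum(s <= threshold for s in pos) / len(pos)
        cdf_neg = sum(s <= threshold for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a random
    positive case scores higher than a random negative case (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predicted probabilities and true classes (1 = event, 0 = no event).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

if __name__ == "__main__":
    print("K-S statistic:", ks_statistic(scores, labels))
    print("AUC:", auc(scores, labels))
```

Both measures depend only on how the model ranks cases, so they complement the correct classification rate, which depends on a chosen probability cutoff.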

Value Weighted Regularized Logistic Regression Model (속성값 기반의 정규화된 로지스틱 회귀분석 모델)

  • Lee, Chang-Hwan;Jung, Mina
    • Journal of KIISE / v.43 no.11 / pp.1270-1274 / 2016
  • Logistic regression is widely used for predicting and estimating the relationship among variables. We propose a new logistic regression model, the value weighted logistic regression, which uses a fine-grained weighting method that assigns adapted weights to each feature value. A gradient approach is used to obtain the optimal weights of the feature values. Experiments were conducted on several data sets from the UCI machine learning repository, and the results revealed that the proposed method achieves a meaningful improvement in prediction accuracy.

Development of Forecasting Model for the Initial Sale of Apartment Using Data Mining: The Case of Unsold Apartment Complex in Wirye New Town (데이터 마이닝을 이용한 아파트 초기계약 예측모형 개발: 위례 신도시 미분양 아파트 단지를 사례로)

  • Kim, Ji Young;Lee, Sang-Kyeong
    • Journal of Digital Convergence / v.16 no.12 / pp.217-229 / 2018
  • This paper applies data mining techniques such as decision trees, neural networks, and logistic regression to an unsold apartment complex in Wirye New Town and develops a model that forecasts the result of the initial sale contract for each housing unit. The raw data are divided into training data and test data. On the training data, predictive performance ranks in the order neural network, decision tree, and logistic regression. On the test data, by contrast, logistic regression is the best model. This means that logistic regression adapts to new data better than the neural network, which was optimized for the training data. The determinants of initial sale are the floor level, the direction, the location of the unit, the proximity of the electricity and generator room, the subscriber's residential region, and the type of subscription. This suggests that using two models together is more effective in exploring the determinants of initial sales. This paper contributes to the development of the convergence field by expanding the scope of data mining.

A Study on Quality Control Using Data Mining in Steel Continuous Casting Process (철강 연주공정에서 데이터마이닝을 이용한 품질제어 방법에 관한 연구)

  • Kim, Jae-Kyeong;Kwon, Taeck-Sung;Choi, Il-Young;Kim, Hyea-Kyeong;Kim, Min-Yong
    • Journal of Information Technology Services / v.10 no.3 / pp.113-126 / 2011
  • The smelting and continuous casting of steel are important processes that determine the quality of steel products. In particular, most quality defects occur during solidification in the continuous casting process. Although quality control techniques such as six sigma, SQC, and TQM can be applied to the continuous casting process to improve the quality of steel products, these techniques do not provide real-time analysis that identifies the causes of defects. To solve this problem, we developed a detection model using a decision tree that identifies abnormal transactions having a coarse grain structure. We compared the proposed model with models using a neural network and logistic regression. Experiments on steel data showed that the proposed model outperformed both the neural network model and the logistic regression model. Thus, we expect that the suggested model will help control the quality of steel products in real time in the continuous casting process.

Development of a Probability Prediction Model for Tropical Cyclone Genesis in the Northwestern Pacific using the Logistic Regression Method

  • Choi, Ki-Seon;Kang, Ki-Ryong;Kim, Do-Woo;Kim, Tae-Ryong
    • Journal of the Korean earth science society / v.31 no.5 / pp.454-464 / 2010
  • A probability prediction model for tropical cyclone (TC) genesis in the Northwestern Pacific area was developed using the logistic regression method. A total of five predictors were used in this model: the lower-level relative vorticity, vertical wind shear, mid-level relative humidity, upper-level equivalent potential temperature, and sea surface temperature (SST). The values of the four predictors other than SST were obtained from the difference in spatially averaged values between May and January, and the time average of the Niño-3.4 index from February to April was used to capture the SST effect. In predicting the TC genesis frequency from June to December during 1951 to 2007, the model was capable of predicting that 21 (22) years had higher (lower) frequency than the normal year. The analysis of real data indicated that the number of years with higher (lower) frequency of TC genesis was 28 (29). The overall predictability was about 75%, and the model reliability was also verified statistically through cross-validation analysis.
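
The "about 75%" figure in this abstract can be checked with simple arithmetic. The sketch below assumes that the 21 and 22 years are the higher- and lower-frequency years the model classified correctly, out of the 28 and 29 observed; that reading is an interpretation of the abstract, and the variable names are hypothetical.

```python
# Period covered by the prediction experiment (inclusive).
first_year, last_year = 1951, 2007
total_years = last_year - first_year + 1

# Counts quoted in the abstract; treating them as correct predictions
# is an assumption made for this check.
predicted_high, predicted_low = 21, 22
observed_high, observed_low = 28, 29

# The observed counts should account for every year in the period.
assert observed_high + observed_low == total_years

hit_rate = (predicted_high + predicted_low) / total_years
hit_rate_high = predicted_high / observed_high
hit_rate_low = predicted_low / observed_low

if __name__ == "__main__":
    print(f"years in period:        {total_years}")
    print(f"overall hit rate:       {hit_rate:.1%}")
    print(f"higher-frequency years: {hit_rate_high:.1%}")
    print(f"lower-frequency years:  {hit_rate_low:.1%}")
```

Under this reading, 43 of 57 years are classified correctly, which agrees with the overall predictability reported in the abstract.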

Comparison of nomogram construction methods using chronic obstructive pulmonary disease (만성 폐쇄성 폐질환을 이용한 노모그램 구축과 비교)

  • Seo, Ju-Hyun;Lee, Jea-Young
    • The Korean Journal of Applied Statistics / v.31 no.3 / pp.329-342 / 2018
  • A nomogram is a statistical tool that visualizes the risk factors of a disease and thereby helps untrained people understand them. This study used risk factors of chronic obstructive pulmonary disease (COPD) to compare nomograms built with a logistic regression model and with a naïve Bayesian classifier model. Data from the 6th Korean National Health and Nutrition Examination Survey (2013-2015) were analyzed. First, we selected 6 risk factors for COPD. We then constructed nomograms using the logistic regression model and the naïve Bayesian classifier model, and compared the nomograms constructed with the two methods to find out which method is more appropriate. The receiver operating characteristic curve and the calibration plot were used to verify each nomogram.
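
One common way to turn a fitted logistic regression into nomogram point scales is to rescale each factor's contribution to the log-odds so that the most influential factor spans 0 to 100 points. The sketch below illustrates that convention only; it is not the construction used in the paper, the abstract does not list the six risk factors, and the factor names, coefficients, and value ranges here are invented for illustration.

```python
def nomogram_points(coefficients, value_ranges):
    """Maximum points per factor, scaled so the widest log-odds span is 100.

    coefficients: factor name -> logistic regression coefficient
    value_ranges: factor name -> (lowest value, highest value) of the factor
    """
    spans = {
        name: abs(coefficients[name]) * (high - low)
        for name, (low, high) in value_ranges.items()
    }
    widest = max(spans.values())
    return {name: 100.0 * span / widest for name, span in spans.items()}


# Hypothetical coefficients and ranges, not estimates from any real data.
coefficients = {"age": 0.05, "smoking": 0.8, "bmi": -0.04}
value_ranges = {"age": (40, 80), "smoking": (0, 1), "bmi": (15, 35)}

points = nomogram_points(coefficients, value_ranges)

if __name__ == "__main__":
    for name, value in points.items():
        print(f"{name:8s} max points: {value:5.1f}")
```

In this toy setup, age spans the largest log-odds range and therefore receives the full 100-point axis, while the other two factors receive proportionally shorter axes. The sign of a coefficient only decides which end of a factor's axis carries the points.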