• Title/Summary/Keyword: logistic regression model (로지스틱 회귀 모형)

Analysis of Horse Races: Prediction of Winning Horses in Horse Races Using Statistical Models (서울 경마 경기 우승마 예측 모형 연구)

  • Choe, Hyemin; Hwang, Nayoung; Hwang, Chankyoung; Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.28 no.6 / pp.1133-1146 / 2015
  • The horse racing industry accounts for the largest share of the domestic legal gambling industry. However, statistical analysis of horse races is limited compared with other sports. We propose prediction models for winning horses using data mining techniques such as logistic regression, linear regression, and random forest. The horse race data come from the Korea Racing Authority; we use horse racing reports and information on racehorses, jockeys, and horse trainers. We consider two models, one based on ranks and one based on time records. The analysis results show that rank prediction is affected by information on racehorses and by the number of wins of racehorses and jockeys. We place wagers on the last month of races based on our prediction models, which yield substantial profits.

An Application of Support Vector Machines to Personal Credit Scoring: Focusing on Financial Institutions in China (Support Vector Machines을 이용한 개인신용평가 : 중국 금융기관을 중심으로)

  • Ding, Xuan-Ze; Lee, Young-Chan
    • Journal of Industrial Convergence / v.16 no.4 / pp.33-46 / 2018
  • Personal credit scoring is an effective tool that helps banks make profitable loan-granting decisions. Recently, many classification algorithms and models have been used in personal credit scoring. Personal credit scoring techniques are usually divided into statistical and non-statistical methods. Statistical methods include linear regression, discriminant analysis, logistic regression, and decision trees. Non-statistical methods include linear programming, neural networks, genetic algorithms, and support vector machines. However, there is no consistent conclusion about which method is best for developing a credit scoring model. In this paper, we compare the performance of the most common scoring techniques, namely logistic regression, neural networks, and support vector machines, using personal credit data from a financial institution in China. Specifically, we build the three models, classify the customers, and compare the results. According to the results, the support vector machine performs better than logistic regression and neural networks.

Comparison of Behavior Patterns between First and Repeated Offenders in Driving While Intoxicated (DWI) (음주운전 초.재범자 특성 비교)

  • Jeong, Cheol-U; Jang, Myeong-Sun
    • Journal of Korean Society of Transportation / v.27 no.3 / pp.149-160 / 2009
  • The purpose of this study is to compare the behavior patterns of first and repeated offenders in DWI, and to develop models of BAC (blood alcohol concentration) using multiple regression analysis and a model of repeated DWI conviction using logistic regression analysis. The main results are as follows. First, repeated offenders have more criminal and traffic accident records than first offenders, and unlicensed drivers have higher BAC than licensed drivers. Second, a multiple regression model of BAC was developed, and it revealed that criminal records and driving distance were important factors. Third, a model of repeated DWI conviction was developed, and it revealed that traffic accident records, license status, and criminal records were the most important factors.

Binary regression model using skewed generalized t distributions (기운 일반화 t 분포를 이용한 이진 데이터 회귀 분석)

  • Kim, Mijeong
    • The Korean Journal of Applied Statistics / v.30 no.5 / pp.775-791 / 2017
  • We frequently encounter binary data in real life. Logistic, probit, cauchit, and complementary log-log models are often used for binary data analysis. To analyze binary data, Liu (2004) proposed the robit model, in which the inverse of the cdf of Student's t distribution is used as the link function. Kim et al. (2008) also proposed a generalized t-link model to make the binary regression model more flexible. More flexible skewed distributions allow more flexible link functions in generalized linear models. In this sense, we propose a binary regression model using the skewed generalized t distributions introduced in Theodossiou (1998). We implement the proposed models in R using the glm function included in base R and the R sgt package. We also analyze the Pima Indian data using the proposed model in R.
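
Note (illustrative sketch, not taken from the paper): the four standard links named in the abstract differ only in which cdf maps the linear predictor to a success probability. The Python snippet below, with hypothetical function names, evaluates the inverse links side by side. The cauchit link is the special case of a t-based link with 1 degree of freedom; the robit and skewed generalized t links of the paper are not implemented here.

```python
import math

def inv_logit(eta):
    # Logistic cdf: inverse of the logit link.
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    # Standard normal cdf: inverse of the probit link.
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cauchit(eta):
    # Standard Cauchy cdf (Student's t with 1 degree of freedom).
    return 0.5 + math.atan(eta) / math.pi

def inv_cloglog(eta):
    # Inverse of the complementary log-log link; asymmetric around eta = 0.
    return 1.0 - math.exp(-math.exp(eta))

# Hypothetical registry used only for this illustration.
INVERSE_LINKS = {
    "logit": inv_logit,
    "probit": inv_probit,
    "cauchit": inv_cauchit,
    "cloglog": inv_cloglog,
}

def success_probabilities(eta):
    """Return P(y = 1) under each link for a linear predictor value eta."""
    return {name: f(eta) for name, f in INVERSE_LINKS.items()}

if __name__ == "__main__":
    for eta in (-5.0, 0.0, 5.0):
        print(eta, success_probabilities(eta))
```

At a linear predictor of -5 the cauchit probability is far larger than the logit or probit one, which is the heavy-tail behavior that motivates t-based links for data with outlying observations.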

Development of model for prediction of land sliding at steep slopes (급경사지 붕괴 예측을 위한 모형 개발)

  • Park, Ki-Byung; Joo, Yong-Sung; Park, Dug-Keun
    • Journal of the Korean Data and Information Science Society / v.22 no.4 / pp.691-699 / 2011
  • Land sliding is a well-known natural disaster. As part of the effort to reduce damage from land sliding, many researchers have worked on improving prediction ability. However, because previous studies were conducted mostly by non-statisticians, the previously proposed models were hardly statistically justifiable. In this paper, we predict the probability of land sliding using a logistic regression model. Since most of the explanatory variables under consideration were correlated, we propose the final model after a backward elimination process.

Undecided inference using logistic regression for credit evaluation (신용평가에서 로지스틱 회귀를 이용한 미결정자 추론)

  • Hong, Chong-Sun; Jung, Min-Sub
    • Journal of the Korean Data and Information Science Society / v.22 no.2 / pp.149-157 / 2011
  • Undecided inference can be regarded as a missing data problem under mechanisms such as MAR (missing at random) and MNAR (missing not at random). Under the MAR assumption, undecided inference uses a logistic regression model: the probability of default for the undecided group is obtained with the regression coefficient vectors for the decided group and compared with the probability of default for the decided group. Under the MNAR assumption, undecided inference uses a logistic regression model with an additional random feature vector. Simulation results based on two real data sets are obtained and compared. Under the MAR assumption, the misclassification rates are not much different from the rate for the raw data. However, the misclassification rates under the MNAR assumption are lower than those under the MAR assumption, and as the proportion of the undecided group increases, the misclassification rates decrease.

Analysis of Stress level of Korean Household Members due to Household Debt (한국국민의 가계 금융부채에 대한 체감도 분석)

  • Oh, Man-Suk; Hyun, Seung-Me
    • The Korean Journal of Applied Statistics / v.22 no.2 / pp.297-307 / 2009
  • Korean household debt is one of the main sources of the current financial crisis. This paper studies the impact of household attributes, such as type of housing (owned or rented), education, age, average monthly income of the head of household, and area of residence, on the stress level of household members due to household debt. We analyze a real data set collected by KB Kookmin Bank in 2004. We treat low and high stress level as a binary response variable and use a logistic regression model with the attributes of household members as explanatory variables. A simple but well-fitting model is selected by backward elimination based on the likelihood statistic for the goodness-of-fit test, and the impact of the attributes on the stress level is studied from the parameter estimates of the selected model. We also perform a similar analysis on a binary response variable that distinguishes households with no debt from the rest. According to the analysis, the stress level tends to be low for households with self-owned houses, high average monthly income, low education level, and young members.
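
Note (illustrative sketch, not taken from the paper): one step of likelihood-based backward elimination can be expressed as a likelihood-ratio comparison between the current model and each model with a single term removed. The Python snippet below assumes that the log-likelihoods have already been obtained from fitted logistic regressions and that every candidate term costs exactly 1 degree of freedom; the function names, the 0.05 threshold, and all numeric values are hypothetical.

```python
import math

def lr_test_1df(loglik_full, loglik_reduced):
    """Likelihood-ratio statistic and p-value for dropping one 1-df term."""
    g = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # For a chi-square variable with 1 degree of freedom,
    # P(X > g) = erfc(sqrt(g / 2)).
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value

def backward_step(loglik_full, loglik_without, alpha=0.05):
    """Return the least significant term to drop, or None to stop.

    loglik_without maps each term name to the log-likelihood of the
    model refitted without that term.
    """
    drop, largest_p = None, alpha
    for term, loglik_reduced in loglik_without.items():
        _, p_value = lr_test_1df(loglik_full, loglik_reduced)
        if p_value > largest_p:
            drop, largest_p = term, p_value
    return drop

if __name__ == "__main__":
    # Hypothetical log-likelihoods, made up for illustration only.
    full = -100.0
    without = {"education": -100.5, "income": -110.0, "age": -101.0}
    print(backward_step(full, without))
```

Repeating the step, refitting after each removal, until it returns None gives a full backward elimination pass. A categorical attribute with more than two levels would need a chi-square reference distribution with more degrees of freedom than this sketch handles.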

Analysis on the Survivor's Pension Payment with Logistic Regression Model (로지스틱 회귀모형을 이용한 유족연금 수급 분석)

  • Kim, Mi-Jung; Kim, Jin-Hyung
    • The Korean Journal of Applied Statistics / v.21 no.2 / pp.183-200 / 2008
  • Research on efficient management of the National Pension has been emphasized as society trends toward aging and low birth rates. In this article, we suggest a statistical model for effective classification and prediction of the reserve for the survivor's pension in Korea. A logistic regression model is incorporated; the correct classification rate and the distribution of the posterior probability for the reserve of the survivor's pension are investigated and compared with the results from general logistic models. The predictive model is also assessed with a lift graph, the ROC curve, and the K-S statistic. As an application of the suggested model, we suggest strategies for reducing financial risks in managing and planning the pension.
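
Note (illustrative sketch, not taken from the paper): the ROC curve and the K-S statistic mentioned in the abstract can both be computed from the same threshold sweep over predicted scores. The Python snippet below uses hypothetical function names and made-up scores; it assumes labels coded 1 (event) and 0 (non-event), with at least one of each.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs from sweeping the threshold high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all observations tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

def ks_statistic(points):
    """Largest gap between the two class-wise cumulative score distributions."""
    return max(abs(tpr - fpr) for fpr, tpr in points)

if __name__ == "__main__":
    # Made-up scores and labels, for illustration only.
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
    labels = [1, 1, 0, 1, 0, 0]
    pts = roc_points(scores, labels)
    print(auc(pts), ks_statistic(pts))
```

With these made-up values, 8 of the 9 event/non-event pairs are ranked correctly, so the area under the curve is 8/9, and the largest gap between the true positive and false positive rates is 2/3.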