• Title/Summary/Keyword: logistic discriminant analysis (로지스틱판별분석)

Implementation of Mahalanobis-Taguchi System for the Election of Major League Baseball Hitters to the Hall of Fame (메이저리그 타자들의 명예의 전당 입성과 탈락에 대한 Mahalanobis-Taguchi System의 적용과 비교)

  • Kim, Su Whan;Park, Changsoon
    • The Korean Journal of Applied Statistics / v.26 no.2 / pp.223-236 / 2013
  • Various statistical classification methods for predicting election to the Major League Baseball Hall of Fame are implemented and their accuracies compared. Seventeen independent variables are selected from the data on candidates eligible for the Hall of Fame, and well-known classification methods such as discriminant analysis and logistic regression are applied alongside the recently proposed Mahalanobis-Taguchi system (MTS). The MTS classified more accurately than the other methods because it is especially efficient when multivariate data do not form directionally distinct groups according to their attributes.
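
The core quantity in the MTS is a scaled Mahalanobis distance measured from a reference group. The following is a minimal sketch of that computation only, not the authors' implementation: the two-variable data, the function names, and the choice of reference group are all hypothetical.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_reference(rows):
    """Return per-variable (mean, std) and the inverse correlation matrix of the reference group."""
    n, k = len(rows), len(rows[0])
    stats = []
    for j in range(k):
        col = [row[j] for row in rows]
        m = sum(col) / n
        s = math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
        stats.append((m, s))
    z = [[(row[j] - stats[j][0]) / stats[j][1] for j in range(k)] for row in rows]
    corr = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(k)] for a in range(k)]
    return stats, invert(corr)

def scaled_md(row, stats, corr_inv):
    """Scaled Mahalanobis distance: z' R^{-1} z divided by the number of variables k."""
    k = len(row)
    z = [(row[j] - stats[j][0]) / stats[j][1] for j in range(k)]
    return sum(z[a] * corr_inv[a][b] * z[b] for a in range(k) for b in range(k)) / k

# Hypothetical two-variable reference group (not data from the paper).
reference = [(1.0, 2.0), (2.0, 2.5), (3.0, 4.1), (4.0, 3.8), (5.0, 6.2), (6.0, 5.9)]
stats, corr_inv = fit_reference(reference)

reference_mds = [scaled_md(r, stats, corr_inv) for r in reference]
candidate_md = scaled_md((6.0, 1.0), stats, corr_inv)  # breaks the correlation pattern
print(sum(reference_mds) / len(reference_mds), candidate_md)
```

By construction the reference group's scaled distances average (n-1)/n, so an observation whose distance is far above 1 stands out regardless of the direction in which it deviates.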

Dropout Prediction Modeling and Investigating the Feasibility of Early Detection in e-Learning Courses (일반대학에서 교양 e-러닝 강좌의 중도탈락 예측모형 개발과 조기 판별 가능성 탐색)

  • You, Ji Won
    • The Journal of Korean Association of Computer Education / v.17 no.1 / pp.1-12 / 2014
  • Since students' behaviors during e-learning are automatically stored in the LMS (Learning Management System), LMS log data convey valuable information about students' engagement. The purpose of this study is to develop a prediction model of e-learning course dropout using LMS log data. Log data of 578 college students who registered for e-learning courses at a traditional university were used for the logistic regression analysis. The results showed that attendance and study time were significant predictors of dropout, and the model distinguished dropouts from completers of e-learning courses with 96% accuracy. Furthermore, the feasibility of early detection of dropouts using the model was discussed.
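
As an illustration of the kind of model described here, the sketch below fits a logistic regression by plain gradient descent to two predictors and reports classification accuracy. The eight rows, the variable scaling, and the learning-rate settings are hypothetical and are not taken from the study's LMS data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, steps=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, xi, cutoff=0.5):
    """Classify as dropout (1) when the predicted probability reaches the cutoff."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= cutoff else 0

# Hypothetical rows: (attendance rate, study time as a fraction of the course maximum).
X = [(0.2, 0.1), (0.3, 0.2), (0.1, 0.3), (0.4, 0.1),
     (0.9, 0.8), (0.8, 0.7), (0.7, 0.9), (0.95, 0.6)]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = dropped out, 0 = completed

w, b = fit_logistic(X, y)
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
print(w, b, accuracy)
```

Negative fitted weights mean that higher attendance and longer study time lower the predicted dropout probability. Accuracy measured on the training rows, as here, overstates performance on new students.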

A Study on the Fraud Detection of Industrial Accident Compensation Insurance (산재보험 부정수급 식별모형에 관한 연구)

  • Ham, Seung-O;Hong, Jeong-Sik
    • Proceedings of the Korean Operations and Management Science Society Conference / 2008.10a / pp.342-345 / 2008
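  • When an industrial accident occurs, the injured worker receives various benefits through the Korea Workers' Compensation and Welfare Service (근로복지공단). This paper applies data mining to industrial accident claims that were found to be fraudulent during review or after benefits were paid, in order to discover the types of fraudulent claims. The study covers a total of 61,536 injured workers who filed initial medical care applications at four branch offices in the Seoul area over eight years (2000 to 2007), and uses eight selected independent variables that affect the dependent variable. To build the most efficient fraud detection model, various techniques such as decision tree analysis and logistic regression are applied and their results compared, and misclassification costs are applied to derive the model with the optimal classification cut-off. The analysis showed that logistic regression was the more effective model for discovering types of industrial accident insurance fraud. In addition, with a cut-off of 0.01, four variables (duration of medical care, type of industry, medical institution, and type of accident) were selected as having a large influence on detecting fraudulent claims.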
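
To show why applying misclassification costs can push the cut-off far below 0.5, the sketch below picks the cut-off that minimizes total cost on a handful of scored claims. The probabilities, labels, candidate cut-offs, and the 1:99 cost ratio are hypothetical; the abstract does not state the costs the authors used.

```python
def total_cost(probs, labels, cutoff, cost_fp, cost_fn):
    """Sum the cost of false positives and false negatives at a given cut-off."""
    cost = 0.0
    for p, y in zip(probs, labels):
        flagged = p >= cutoff
        if flagged and y == 0:
            cost += cost_fp   # legitimate claim flagged for review
        elif not flagged and y == 1:
            cost += cost_fn   # fraudulent claim missed
    return cost

def best_cutoff(probs, labels, candidates, cost_fp, cost_fn):
    """Return the candidate cut-off with the lowest total misclassification cost."""
    return min(candidates, key=lambda c: total_cost(probs, labels, c, cost_fp, cost_fn))

def bayes_cutoff(cost_fp, cost_fn):
    """Cost-minimizing cut-off for calibrated probabilities, assuming correct decisions cost nothing."""
    return cost_fp / (cost_fp + cost_fn)

# Hypothetical model outputs and true labels (1 = fraudulent claim).
probs = [0.002, 0.004, 0.008, 0.015, 0.02, 0.03, 0.05, 0.2]
labels = [0, 0, 0, 1, 0, 1, 0, 1]
candidates = [0.01, 0.05, 0.1, 0.5]

chosen = best_cutoff(probs, labels, candidates, cost_fp=1, cost_fn=99)
print(chosen, bayes_cutoff(1, 99))
```

When fraud is rare and a missed case is much more expensive than an unnecessary review, a very small cut-off minimizes cost, because almost every predicted probability lies well below 0.5.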

Customer Churning Analysis by Using Data Mining in Credit Card Market (신용카드 시장에서 데이터마이닝을 이용한 이탈고객 분석)

  • 이건창;정남호;신경식
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.06a / pp.421-444 / 2001
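  • One of the main reasons data mining techniques have recently attracted attention is that they help a company retain and manage its existing customers effectively by identifying their characteristics. In the credit card industry in particular, where a 5% increase in customer retention is reported to bring a 120% increase in profitability, retaining and managing existing customers is as important as acquiring new ones. Identifying customers who rarely use a card after it is issued, or who churn easily, is especially important to credit card companies from a cost-saving standpoint. However, little research has so far been done on which attributes characterize customers who churn easily. This study therefore analyzes customer status in the credit card market using well-known data mining techniques: artificial neural networks, logistic regression, and C5.0. To this end, an empirical analysis was conducted on customers who joined and customers who churned at one credit card company over the most recent four years (since March 1997). The analysis found that there are attributes that distinguish customers who keep their cards from those who churn, and on this basis the study proposes marketing strategies for credit card companies.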

A Study for Improving the Performance of Data Mining Using Ensemble Techniques (앙상블기법을 이용한 다양한 데이터마이닝 성능향상 연구)

  • Jung, Yon-Hae;Eo, Soo-Heang;Moon, Ho-Seok;Cho, Hyung-Jun
    • Communications for Statistical Applications and Methods / v.17 no.4 / pp.561-574 / 2010
  • We studied the performance of eight data mining algorithms, including decision trees, logistic regression, LDA, QDA, neural networks, and SVM, and their combinations with two ensemble techniques, bagging and boosting. We used 13 data sets with binary responses. Sensitivity, specificity, and misclassification error were used as criteria for comparison.
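
The three comparison criteria can be computed from a confusion matrix as in the sketch below. The label vectors and the function name are hypothetical and serve only to make the definitions concrete.

```python
def binary_metrics(y_true, y_pred):
    """Return sensitivity, specificity, and misclassification error for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)          # share of actual positives detected
    specificity = tn / (tn + fp)          # share of actual negatives detected
    error = (fp + fn) / len(y_true)       # share of all cases misclassified
    return sensitivity, specificity, error

# Hypothetical true labels and predictions from some classifier.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

sensitivity, specificity, error = binary_metrics(y_true, y_pred)
print(sensitivity, specificity, error)
```

Reporting sensitivity and specificity alongside the overall error matters for binary responses with unequal class sizes, where a low error rate can hide poor detection of the smaller class.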

An Integrated Algorithm for Corporate Bankruptcy Prediction (기업부도예측을 위한 통합알고리즘)

  • Bae Jae-Gwon;Kim Jin-Hwa
    • Proceedings of the Korea Intelligent Information System Society Conference / 2006.06a / pp.195-202 / 2006
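  • To predict corporate bankruptcy more effectively, this study proposes an integrated model that combines statistical and artificial intelligence methods. Specifically, it proposes the Voting with Performance & Weights from ANN (WP-ANN) integrated model, which combines five methods: multivariate discriminant analysis and logistic regression, the most widely used statistical models, together with artificial neural networks, rule induction, and Bayesian networks, which are artificial intelligence methods that have recently come into wide use. In the experiments, the proposed WP-ANN integrated model showed the best predictive accuracy when compared with the single models (multivariate discriminant analysis, logistic regression, artificial neural networks, rule induction, and Bayesian networks). The study thus shows that the WP-ANN integrated model predicts corporate bankruptcy more accurately than the existing models.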
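
The general idea of combining several classifiers by weighted voting can be sketched as follows. This is a simplification and not the WP-ANN procedure itself: the abstract says the weights come from an ANN, whereas here they are fixed numbers supplied by hand, and the model names, weights, and predictions are hypothetical.

```python
def weighted_vote(predictions, weights, threshold=0.5):
    """Combine 0/1 predictions into one label using normalized weights."""
    total = sum(weights[name] for name in predictions)
    score = sum(weights[name] * predictions[name] for name in predictions) / total
    label = 1 if score >= threshold else 0
    return label, score

# Hypothetical predictions for one firm (1 = predicted bankrupt).
predictions = {"mda": 1, "logit": 0, "ann": 1, "rule_induction": 0, "bayes_net": 0}

# Hypothetical weights; in WP-ANN these would be derived from a neural network.
weights = {"mda": 0.8, "logit": 0.3, "ann": 0.9, "rule_induction": 0.3, "bayes_net": 0.3}

label, score = weighted_vote(predictions, weights)
majority = 1 if sum(predictions.values()) > len(predictions) / 2 else 0
print(label, score, majority)
```

In this example the weighted vote and a simple majority vote disagree, which is the situation in which the choice of weights decides the ensemble's output.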

Classification Analysis for the Prediction of Underground Cultural Assets (매장문화재 예측을 위한 통계적 분류 분석)

  • Yu, Hye-Kyung;Lee, Jin-Young;Na, Jong-Hwa
    • Journal of Korea Society of Industrial Information Systems / v.14 no.3 / pp.106-113 / 2009
  • Various statistical classification methods have been used to build prediction models of underground cultural assets in our country. Among them, linear discriminant analysis, logistic regression, decision trees, neural networks, and support vector machines are used in this paper. We introduce the basic concepts of these classification methods and apply them to real data from I city. As a result, five different prediction models are suggested. The models are compared by the correct classification rates of the fitted models. To assess the applicability of the suggested models to new data sets, simulations are carried out. R packages and programs are used in the real data analyses and simulations. In particular, the detailed R procedures are provided for other analysts in related areas.

Development for City Bus Driver's Accident Occurrence Prediction Model Based on Digital Tachometer Records (디지털 운행기록에 근거한 시내버스 운전자의 사고발생 예측모형 개발)

  • Kim, Jung-yeul;Kum, Ki-jung
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.15 no.1 / pp.1-15 / 2016
  • This study aims to develop a model that identifies city bus drivers who are likely to cause an accident, based on their actual driving records. For this purpose, significant variables related to traffic accidents are drawn from the driving records of drivers who have caused an accident and of those who have not, and classification models are developed by discriminant analysis and logistic regression analysis and compared for accuracy. In addition, the developed models are applied to other drivers' driving records to verify their accuracy. The variable in which deceleration ($X_{deceleration}$) and acceleration to the right ($Y_{right}$) act simultaneously was identified as the optimal factor for classifying drivers who had caused an accident. The prediction model by discriminant analysis classified drivers who had caused an accident at a rate of up to 62.8%, and the prediction model by logistic regression analysis at a rate of up to 76.7%. In addition, verification of the models' predictive power showed an accuracy rate of 84.1%.
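
For readers unfamiliar with the discriminant analysis that is compared against logistic regression here, the sketch below implements a two-variable Fisher linear discriminant with a midpoint threshold, which assumes equal priors and equal misclassification costs. The driving-record values and variable meanings are hypothetical and are not taken from the study.

```python
def mean(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, m):
    """2x2 within-class scatter matrix: sum of outer products of deviations."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s

def fit_lda(class0, class1):
    """Return the discriminant direction w = Sw^{-1}(m1 - m0) and the midpoint threshold."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[a][b] + s1[a][b] for b in range(2)] for a in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det], [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[a][0] * diff[0] + inv[a][1] * diff[1] for a in range(2)]
    mid = [(m0[j] + m1[j]) / 2 for j in range(2)]
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold

def classify(w, threshold, x):
    """Assign class 1 when the projection onto w reaches the threshold."""
    return 1 if w[0] * x[0] + w[1] * x[1] >= threshold else 0

# Hypothetical counts of (hard deceleration events, rightward acceleration events).
no_accident = [(1, 1), (2, 1), (1, 2), (2, 2)]
accident = [(4, 4), (5, 4), (4, 5), (5, 5)]

w, threshold = fit_lda(no_accident, accident)
print(w, threshold, classify(w, threshold, (2, 2)), classify(w, threshold, (4, 4)))
```

Unlike logistic regression, this rule relies on the two groups sharing a common covariance structure, which is one reason the two methods can classify the same drivers at different rates.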

Classification of a binary group variable with dependence structure (종속구조를 가진 집단변수의 판별-분류에 관한 연구)

  • 황선영;나은정
    • The Korean Journal of Applied Statistics / v.11 no.1 / pp.177-184 / 1998
  • Most research on discrimination and classification analysis has been directed at situations where the data consist of independent observations. In practice, however, a dependence structure between objects often exists, particularly in time series data. This article addresses such a case and is concerned with the problem of classifying a new object when the dependence can be modelled by a discrete time series via conditional autologistic transition probabilities.
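
For orientation, one common first-order form of an autologistic transition probability for a binary series is shown below. The abstract does not give the authors' exact specification, so the first-order dependence and the symbols $\alpha$ and $\beta$ are assumptions made for illustration.

```latex
P(Y_t = 1 \mid Y_{t-1} = y_{t-1})
  = \frac{\exp(\alpha + \beta\, y_{t-1})}{1 + \exp(\alpha + \beta\, y_{t-1})},
  \qquad y_{t-1} \in \{0, 1\}
```

With $\beta = 0$ the previous state has no effect and the observations are independent, which is the standard setting the abstract contrasts with.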

An Application of Support Vector Machines to Personal Credit Scoring: Focusing on Financial Institutions in China (Support Vector Machines을 이용한 개인신용평가 : 중국 금융기관을 중심으로)

  • Ding, Xuan-Ze;Lee, Young-Chan
    • Journal of Industrial Convergence / v.16 no.4 / pp.33-46 / 2018
  • Personal credit scoring is an effective tool for banks to make profitable decisions on granting loans. Recently, many classification algorithms and models have been used in personal credit scoring. Personal credit scoring technology is usually divided into statistical and non-statistical methods. Statistical methods include linear regression, discriminant analysis, logistic regression, and decision trees. Non-statistical methods include linear programming, neural networks, genetic algorithms, and support vector machines. However, no consistent conclusion has been drawn about which method is best for developing a credit scoring model. In this paper, we compare the performance of the most common scoring techniques, namely logistic regression, neural networks, and support vector machines, using personal credit data from a financial institution in China. Specifically, we build the three models, classify the customers, and compare the results. According to the results, the support vector machine performs better than logistic regression and neural networks.