• Title/Summary/Keyword: misclassification cost


Credit Score Modelling in A Two-Phase Mathematical Programming (두 단계 수리계획 접근법에 의한 신용평점 모델)

  • Sung Chang Sup;Lee Sung Wook
    • Proceedings of the Korean Operations and Management Science Society Conference / 2002.05a / pp.1044-1051 / 2002
  • This paper proposes a two-phase mathematical programming approach to the credit scoring problem that takes the classification gap into account, so as to compensate for any theoretical shortcomings. Phase 1 uses linear programming (LP) to make the associated decisions: granting credit to an applicant, denying credit, or seeking additional information before making the final decision. Phase 2 uses mixed-integer programming (MIP) to find a cut-off value that minimizes the misclassification penalty (cost) incurred by granting credit to a 'bad' loan applicant or denying credit to a 'good' one. The approach is expected to find appropriate classification scores and a cut-off value with respect to deviation and misclassification cost, respectively. Statistical discriminant analysis methods have commonly been used for classification problems in credit scoring. In recent years, much theoretical research has focused on applying mathematical programming techniques to discriminant problems. It has been reported that mathematical programming techniques can outperform statistical discriminant techniques in some applications, although they may suffer from some theoretical shortcomings. The proposed two-phase approach is evaluated with line data and loan applicant data against three benchmark approaches: Fisher's linear discriminant function, logistic regression, and some other existing mathematical programming approaches. The results show that the proposed approach outperforms the statistical approaches and, in some cases, marginally outperforms both the statistical approaches and the other existing mathematical programming approaches.


A Recidivism Prediction Model Based on XGBoost Considering Asymmetric Error Costs (비대칭 오류 비용을 고려한 XGBoost 기반 재범 예측 모델)

  • Won, Ha-Ram;Shim, Jae-Seung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.127-137 / 2019
  • Recidivism prediction has been studied continuously since the early 1970s, and it has become more important as crimes committed by recidivists steadily increase. Research became especially active in the 1990s, after the US and Canada adopted the 'Recidivism Risk Assessment Report' as a decisive criterion in trials and parole screening. In the same period, empirical studies on 'Recidivism Factors' also began in Korea. Most recidivism prediction studies so far have focused on the factors of recidivism or on prediction accuracy, but it is also important to minimize the misclassification cost, because recidivism prediction has an asymmetric error cost structure. In general, the cost of wrongly classifying a person who will not reoffend as a recidivist is lower than the cost of wrongly classifying a person who will reoffend as a non-recidivist: the former only adds monitoring costs, while the latter incurs social and economic costs. This paper therefore proposes an XGBoost (eXtreme Gradient Boosting; XGB) based recidivism prediction model that considers asymmetric error costs. In the first step, XGB, which is recognized as a high-performance ensemble method in data mining, is applied, and its results are compared with those of other prediction models: LOGIT (logistic regression analysis), DT (decision trees), ANN (artificial neural networks), and SVM (support vector machines). In the next step, the classification threshold is optimized to minimize the total misclassification cost, defined as the weighted average of FNE (False Negative Error) and FPE (False Positive Error). To verify its usefulness, the model was applied to a real recidivism prediction dataset. The results confirmed that the XGB model not only achieved better prediction accuracy than the other models but also reduced the misclassification cost most effectively.
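The threshold-optimization step described in this abstract can be illustrated with a minimal sketch. It assumes predicted probabilities are already available from some classifier, and it uses a plain grid search over candidate thresholds. The function names, the toy data, and the cost weights `c_fn` and `c_fp` are hypothetical and are not taken from the paper.

```python
def total_cost(y_true, p_hat, threshold, c_fn, c_fp):
    """Weighted misclassification cost per case at a given threshold.

    c_fn and c_fp are assumed cost weights for false negatives and
    false positives; the paper's actual weights are not given here.
    """
    fn = sum(1 for y, p in zip(y_true, p_hat) if y == 1 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, p_hat) if y == 0 and p >= threshold)
    return (c_fn * fn + c_fp * fp) / len(y_true)


def best_threshold(y_true, p_hat, c_fn, c_fp, steps=100):
    """Return the first grid threshold with the lowest total cost."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda t: total_cost(y_true, p_hat, t, c_fn, c_fp))


# Toy data (made up): 1 = reoffends, 0 = does not reoffend.
y_true = [0, 0, 1, 0, 0, 1, 1, 1]
p_hat = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.80, 0.90]

t_sym = best_threshold(y_true, p_hat, c_fn=1.0, c_fp=1.0)
t_asym = best_threshold(y_true, p_hat, c_fn=5.0, c_fp=1.0)
print(t_sym, t_asym)
```

On this toy data, weighting false negatives more heavily pushes the chosen threshold down, so more cases are flagged as positive.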

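Integration of Customer Response Prediction Models for Direct Marketing Using Case-Based Reasoning (사례기반추론을 이용한 다이렉트 마케팅의 고객반응예측모형의 통합)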

  • Hong, Taeho;Park, Jiyoung
    • The Journal of Information Systems / v.18 no.3 / pp.375-399 / 2009
  • In this study, we propose an integrated model that combines logistic regression, artificial neural networks, and support vector machines (SVM) with case-based reasoning (CBR). Predicting respondents in direct marketing is a binary classification problem, like bankruptcy prediction, intrusion detection systems (IDS), and churn management. To solve this binary problem, we employed logistic regression, artificial neural networks, SVM, and CBR. CBR is a problem-solving technique that shows significant promise for improving the effectiveness of complex and unstructured decision making, and it produced excellent results in this study. Experimental results show that the classification accuracy of the integrated model using CBR is superior to that of logistic regression, artificial neural networks, and SVM. When the customer response model is applied to predict respondents in direct marketing, misclassification must also be considered from a profit/cost point of view.


Economic Design of a Two-Sided Two-Stage Screening Procedure with a Prescribed Outgoing Quality

  • Kwon, Hyuck-Moo;Bai, Do-Sun
    • Journal of Korean Institute of Industrial Engineers / v.22 no.1 / pp.17-36 / 1996
  • An economic two-stage screening procedure is presented for the case where both lower and upper specification limits are given on the performance variable. A screening variable that is highly correlated with the performance variable is used first to decide whether an item should be accepted, rejected, or left undecided. The performance variable is then used to classify the undecided items. The two variables are assumed to be jointly normally distributed. A cost model is constructed from six cost components: the inspection costs of the screening and performance variables, and the costs caused by type I and type II misclassification errors related to the lower and upper specification limits. Optimal cutoff values on the screening variable are determined so that the average outgoing quality exceeds a prespecified level. Solution methods are provided for both the known-parameter and unknown-parameter cases.
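The two-stage decision flow in this abstract can be sketched as follows. This shows only the classification logic, not the paper's cost model or its method for finding optimal cutoffs. It assumes the screening cutoffs form a nested pair of intervals (an inner accept band and an outer band beyond which items are rejected). All names and numbers are hypothetical.

```python
def classify_item(x, measure_y, accept_band, decide_band, spec_limits):
    """Two-stage, two-sided screening of a single item.

    x            : screening-variable measurement (stage 1)
    measure_y    : callable returning the performance variable; it is
                   called only when stage 1 leaves the item undecided
    accept_band  : (low, high) on x; inside it, accept at stage 1
    decide_band  : (low, high) on x; outside it, reject at stage 1
    spec_limits  : (lower, upper) specification limits on y
    Returns (decision, number_of_stages_used).
    """
    a_lo, a_hi = accept_band
    d_lo, d_hi = decide_band
    lsl, usl = spec_limits
    if a_lo <= x <= a_hi:
        return "accept", 1
    if x < d_lo or x > d_hi:
        return "reject", 1
    # Undecided on the screening variable: inspect the performance variable.
    y = measure_y()
    return ("accept" if lsl <= y <= usl else "reject"), 2


# Hypothetical cutoffs, for illustration only.
bands = dict(accept_band=(4.0, 6.0), decide_band=(2.0, 8.0), spec_limits=(3.0, 7.0))
print(classify_item(5.0, lambda: 5.0, **bands))  # decided at stage 1
print(classify_item(7.0, lambda: 5.0, **bands))  # needs stage 2
```

Passing the second measurement as a callable reflects that the performance variable is inspected only for undecided items, which is where its inspection cost would be incurred.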


An Economic Design of a Screening and Process Monitoring Procedure for a Normal Model (정규모형하에서의 선별검사 및 공정감시 절차의 경제적 설계)

  • Kwon, Hyuck-Moo;Hong, Sung-Hoon;Lee, Min-Koo;Kim, Sang-Boo
    • Journal of Korean Institute of Industrial Engineers / v.26 no.3 / pp.200-205 / 2000
  • An economic process monitoring procedure is presented that uses a surrogate variable for the case where the performance variable is dichotomous. Every item is inspected on the surrogate variable, which determines whether it is accepted or rejected. When an item is rejected, the number of consecutively accepted items preceding it is compared with a predetermined number r to decide whether the fraction nonconforming has shifted. The conditional distribution of the surrogate variable given the performance variable is assumed to be normal. A cost model is constructed that includes the costs of inspection, misclassification, illegal signal, undetected out-of-control state, and correction. Methods of finding the optimum number r and the screening limit are provided. Numerical studies on the effects of the cost coefficients are also performed.
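The monitoring rule described here can be sketched as a short loop. The abstract says only that the run of accepted items is compared with r, so two directions are assumptions in this sketch: an item is accepted when its surrogate value is at or below the screening limit, and a shift is signalled when the preceding run of accepted items is shorter than r. The names are hypothetical.

```python
def monitor(surrogate_values, screening_limit, r):
    """Screen each item and apply a run-length rule at every rejection.

    Assumptions (not stated in the abstract): accept when
    x <= screening_limit; signal a shift when fewer than r items
    were accepted in a row before the rejected item.
    Returns a list of (index, decision, shift_signalled).
    """
    run = 0  # consecutive accepted items since the last rejection
    log = []
    for i, x in enumerate(surrogate_values):
        if x <= screening_limit:
            run += 1
            log.append((i, "accept", False))
        else:
            log.append((i, "reject", run < r))
            run = 0
    return log


# Made-up surrogate measurements.
for row in monitor([1.0, 1.2, 0.9, 5.0, 1.1, 4.8], screening_limit=3.0, r=2):
    print(row)
```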


Cost-sensitive Learning for Credit Card Fraud Detection (신용카드 사기 검출을 위한 비용 기반 학습에 관한 연구)

  • Park Lae-Jeong
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.5 / pp.545-551 / 2005
  • The main objective of fraud detection is to minimize the costs or losses incurred due to fraudulent transactions. However, because of the nature of the problem, namely a highly skewed, overlapping class distribution and non-uniform misclassification costs, it is practically difficult to generate a classifier that is near-optimal in terms of classification costs at a desired operating range of rejection rates. This paper defines a performance measure that reflects a classifier's costs at a specific operating range and offers a cost-sensitive learning approach that trains classifiers suitable for real-world credit card fraud detection by directly optimizing the performance measure with evolutionary programming. The experimental results demonstrate that, compared to other training methods, the proposed approach provides an effective way of training cost-sensitive classifiers for successful fraud detection.

An Intelligent Intrusion Detection Model Based on Support Vector Machines and the Classification Threshold Optimization for Considering the Asymmetric Error Cost (비대칭 오류비용을 고려한 분류기준값 최적화와 SVM에 기반한 지능형 침입탐지모형)

  • Lee, Hyeon-Uk;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.157-173 / 2011
  • With the recent explosion in Internet use, malicious attacks on and hacking of networked systems occur frequently. Such intrusions can cause fatal damage to government agencies, public offices, and companies that operate various systems. For these reasons, there is growing interest in and demand for intrusion detection systems (IDS), the security systems that detect, identify, and respond appropriately to unauthorized or abnormal activities. The intrusion detection models used in conventional IDS are generally designed by modeling experts' implicit knowledge of network intrusions or hackers' abnormal behaviors. These models perform well under normal situations but perform poorly when they meet a new or unknown attack pattern. For this reason, several recent studies have tried to adopt artificial intelligence techniques that can respond proactively to unknown threats. Artificial neural networks (ANNs) in particular have been widely applied in prior studies because of their superior prediction accuracy. However, ANNs have some intrinsic limitations, such as the risk of overfitting, the need for a large sample size, and the difficulty of understanding the prediction process (i.e., the black box problem). As a result, the most recent studies on IDS have started to adopt the support vector machine (SVM), a classification technique that is more stable and powerful than ANNs and is known for relatively high predictive power and generalization capability. Against this background, this study proposes a novel intelligent intrusion detection model that uses SVM as the classification model in order to improve the predictive ability of IDS. The model is also designed to consider the asymmetric error cost by optimizing the classification threshold. There are two common forms of error in intrusion detection. The first is the False-Positive Error (FPE), in which a wrong judgment may result in unnecessary fixes. The second is the False-Negative Error (FNE), which mainly misjudges malware as a normal program. Compared to FPE, FNE is more fatal. Thus, when considering the total cost of misclassification in IDS, it is more reasonable to assign a heavier weight to FNE than to FPE. We therefore designed the proposed model to optimize the classification threshold so as to minimize the total misclassification cost. Conventional SVM cannot be used for this purpose because it is designed to generate a discrete output (i.e., a class). To resolve this problem, we used the revised SVM technique proposed by Platt (2000), which can generate probability estimates. To validate the practical applicability of the model, we applied it to a real-world dataset for network intrusion detection. The experimental dataset was collected from the IDS sensor of an official institution in Korea from January to June 2010. We collected 15,000 log records in total and selected 1,000 samples from them by random sampling. The SVM model was also compared with logistic regression (LOGIT), decision trees (DT), and ANN to confirm the superiority of the proposed model. LOGIT and DT were run using PASW Statistics v18.0, and ANN was run using Neuroshell 4.0. For SVM, LIBSVM v2.90, a freeware package for training SVM classifiers, was used. Empirical results showed that the proposed SVM-based model outperformed all the other comparative models in detecting network intrusions in terms of accuracy. They also showed that the model reduced the total misclassification cost compared to the ANN-based intrusion detection model. It is therefore expected that the intrusion detection model proposed in this paper would not only enhance the performance of IDS but also lead to better management of FNE.
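To show how a discrete-output classifier can be made to produce probability estimates, here is a rough sketch of sigmoid calibration in the spirit of Platt's method. It fits p = 1 / (1 + exp(A*f + B)) to raw decision values f by plain gradient descent on the log loss. This is a simplification: Platt's procedure uses a different optimizer and smoothed targets, and this is not what LIBSVM runs internally. The decision values and labels below are made up.

```python
import math


def fit_sigmoid(scores, labels, lr=0.1, iters=2000):
    """Fit A, B in p = 1 / (1 + exp(A*f + B)) by gradient descent.

    For this parameterization the log-loss gradient per sample is
    (y - p) * f with respect to A and (y - p) with respect to B.
    """
    A, B = 0.0, 0.0
    n = len(scores)
    for _ in range(iters):
        g_a = g_b = 0.0
        for f, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(A * f + B))
            g_a += (y - p) * f
            g_b += (y - p)
        A -= lr * g_a / n
        B -= lr * g_b / n
    return A, B


def probability(f, A, B):
    """Calibrated probability of the positive class for decision value f."""
    return 1.0 / (1.0 + math.exp(A * f + B))


# Made-up SVM decision values; 1 = intrusion, 0 = normal (slightly overlapping).
scores = [-2.0, -1.5, -1.0, -0.5, 0.3, -0.2, 0.5, 1.0, 1.5, 2.0]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
A, B = fit_sigmoid(scores, labels)

# With probabilities available, the cut-off no longer has to be the
# SVM's default decision boundary; 0.3 is an arbitrary illustrative value
# that flags more cases as intrusions.
flagged = [probability(f, A, B) >= 0.3 for f in scores]
print(A, B, flagged)
```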

Aggregating Prediction Outputs of Multiple Classification Techniques Using Mixed Integer Programming (다수의 분류 기법의 예측 결과를 결합하기 위한 혼합 정수 계획법의 사용)

  • Jo, Hongkyu;Han, Ingoo
    • Journal of Intelligence and Information Systems / v.9 no.1 / pp.71-89 / 2003
  • Although many studies demonstrate that one technique outperforms the others for a given data set, there is often no way to tell a priori which technique will be most effective for a given classification problem. Alternatively, it has been suggested that a better approach to classification problems might be to integrate several different forecasting techniques. This study proposes a methodology for linearly combining different classification techniques. The methodology finds the optimal combining weights and computes the weighted average of the different techniques' outputs. It is formulated as a mixed integer program whose objective is to minimize the total misclassification cost, defined as the weighted sum of the two types of misclassification. To simplify the solution process, the cutoff value is fixed and the threshold function is removed. The mixed integer program is solved with the branch and bound method. The results show that the proposed methodology classifies more accurately than any of the individual techniques, and that it predicts significantly better than both the individual techniques and the other combining methods.
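The idea of choosing combining weights to minimize a weighted misclassification cost at a fixed cutoff can be sketched for two classifiers. This sketch swaps the paper's mixed integer program and branch and bound for a simple grid search over one weight. The outputs, costs, and names are hypothetical toy values.

```python
def combined_cost(w, out_a, out_b, y_true, cutoff=0.5, c_fn=1.0, c_fp=1.0):
    """Total misclassification cost of the blend w*a + (1-w)*b at a fixed cutoff."""
    cost = 0.0
    for a, b, y in zip(out_a, out_b, y_true):
        pred = 1 if w * a + (1 - w) * b >= cutoff else 0
        if y == 1 and pred == 0:
            cost += c_fn  # false negative
        elif y == 0 and pred == 1:
            cost += c_fp  # false positive
    return cost


def best_weight(out_a, out_b, y_true, steps=20, **kw):
    """Grid search over w in [0, 1]; a stand-in for the MIP in the paper."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda w: combined_cost(w, out_a, out_b, y_true, **kw))


# Toy outputs of two classifiers on six cases (made up).
out_a = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]
out_b = [0.6, 0.3, 0.9, 0.4, 0.1, 0.7]
y_true = [1, 1, 1, 0, 0, 0]

w = best_weight(out_a, out_b, y_true)
print(w, combined_cost(w, out_a, out_b, y_true))
```

In this toy data each classifier alone makes two errors, while a suitable blend makes none. Real data would not usually separate this cleanly.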


Forecasting evaluation via parametric bootstrap for threshold-INARCH models

  • Kim, Deok Ryun;Hwang, Sun Young
    • Communications for Statistical Applications and Methods / v.27 no.2 / pp.177-187 / 2020
  • This article is concerned with the issue of forecasting and evaluation of threshold-asymmetric volatility models for time series of count data. In particular, threshold integer-valued models with conditional Poisson and conditional negative binomial distributions are highlighted. Based on the parametric bootstrap method, some evaluation measures are discussed in terms of one-step ahead forecasting. A parametric bootstrap procedure is explained from which directional measure, magnitude measure and expected cost of misclassification are discussed to evaluate competing models. The cholera data in Bangladesh from 1988 to 2016 is analyzed as a real application.

Decision Tree Induction with Imbalanced Data Set: A Case of Health Insurance Bill Audit in a General Hospital (불균형 데이터 집합에서의 의사결정나무 추론: 종합 병원의 건강 보험료 청구 심사 사례)

  • Hur, Joon;Kim, Jong-Woo
    • Information Systems Review / v.9 no.1 / pp.45-65 / 2007
  • In the medical industry, the health insurance bill audit is a unique and essential process in general hospitals. The process is very important not only for a hospital's profit but also for its reputation. At large general hospitals in particular, many workers, including analysts and nurses, are engaged in the health insurance bill audit process. This paper introduces a case in which decision tree induction techniques were used to find reducible health insurance bill cases at a large general hospital in Korea. When supervised learning methods were applied, one of the major problems was data imbalance in the health insurance bill audit data: the bill audit dataset contained many normal (passing) cases and a relatively small number of reduction cases. To resolve the problem, this study combines well-known methods for imbalanced data sets, including oversampling of rare cases, undersampling of major cases, and adjusting the misclassification cost, in several ways to find appropriate decision trees that satisfy the required conditions of the health insurance bill audit situation.
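Two of the rebalancing methods named in this abstract, oversampling rare cases and undersampling major cases, can be sketched with random resampling on a binary label. This is a generic illustration, not the study's actual procedure. The function names and the toy records are hypothetical.

```python
import random


def random_oversample(rows, labels, minority, seed=0):
    """Duplicate random minority rows until both classes are the same size."""
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    majority_count = sum(1 for y in labels if y != minority)
    n_extra = majority_count - len(minority_rows)
    extra = [rng.choice(minority_rows) for _ in range(n_extra)]
    return rows + extra, labels + [minority] * len(extra)


def random_undersample(rows, labels, minority, seed=0):
    """Keep all minority rows and an equally sized random subset of the rest."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    keep = sorted(minority_idx + rng.sample(majority_idx, len(minority_idx)))
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy audit data: 8 normal (0) bills and 2 reduction (1) bills.
rows = [[i] for i in range(10)]
labels = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]

_, over_labels = random_oversample(rows, labels, minority=1)
_, under_labels = random_undersample(rows, labels, minority=1)
print(over_labels.count(0), over_labels.count(1))
print(under_labels.count(0), under_labels.count(1))
```

The third method in the abstract, adjusting the misclassification cost, changes the learner's objective instead of the data, so it is not covered by this sketch.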