• Title/Summary/Keyword: Misclassification probability


A Model for Effective Customer Classification Using LTV and Churn Probability : Application of Holistic Profit Method (고객의 이탈 가능성과 LTV를 이용한 고객등급화 모형개발에 관한 연구)

  • Lee, HoonYoung;Yang, JooHwan;Ryu, Chi Hun
    • Journal of Intelligence and Information Systems, v.12 no.4, pp.109-126, 2006
  • Effective customer classification is essential for successful customer relationship management. Typical customer rating is carried out by proportionally allocating customers to classes in terms of their lifetime values. However, since this method does not accurately reflect the homogeneity within a class or the heterogeneity between classes, misclassification can cause many problems. This paper suggests a new method of rating customers using the Holistic profit technique and validates it using customer data provided by an insurance company. Holistic profit is one of the methods used for deciding the cutoff score in screening loan applications. By rating customers with the proposed technique, insurance companies could effectively perform customer relationship management and diverse marketing activities.
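The cutoff-score idea that Holistic profit borrows from loan screening can be sketched minimally as follows; the scoring data and the per-customer profit and cost figures below are hypothetical illustrations, not values from the paper:

```python
def best_cutoff(scored, profit_good, cost_bad):
    """Scan candidate cutoffs and keep the one with the highest total profit.

    scored: list of (score, is_good) pairs; customers with score >= cutoff
    are accepted. Accepting a good customer earns `profit_good`; accepting
    a bad one loses `cost_bad`.
    """
    cutoffs = sorted({s for s, _ in scored})
    best = None
    for c in cutoffs:
        profit = sum(profit_good if good else -cost_bad
                     for s, good in scored if s >= c)
        if best is None or profit > best[1]:
            best = (c, profit)
    return best

# hypothetical customer scores: (score, is_good_customer)
data = [(0.9, True), (0.8, True), (0.6, False), (0.5, True), (0.3, False)]
cutoff, profit = best_cutoff(data, profit_good=100, cost_bad=150)
```

Here the cutoff 0.8 wins because accepting the 0.6-scored bad customer costs more than the 0.5-scored good customer earns.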


A Study on the GK2A/AMI Image Based Cold Water Detection Using Convolutional Neural Network (합성곱신경망을 활용한 천리안위성 2A호 영상 기반의 동해안 냉수대 감지 연구)

  • Park, Sung-Hwan;Kim, Dae-Sun;Kwon, Jae-Il
    • Korean Journal of Remote Sensing, v.38 no.6_2, pp.1653-1661, 2022
  • In this study, cold water and normal water were classified based on Geo-Kompsat 2A images. Daily mean surface temperature products provided by the National Meteorological Satellite Center (NMSC) were used, and a convolutional neural network (CNN) deep learning technique was applied as the classification algorithm. Cold water occurrence data provided by the National Institute of Fisheries Science (NIFS) from 2019 to 2022 were used as the cold water class. As a result of training, the probability of detection was 82.5% and the false alarm ratio was 54.4%. Through misclassification analysis, it was confirmed that cloud areas and more accurate training data should be considered in the future.
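The two reported metrics follow directly from confusion-matrix counts; the counts below are illustrative only (chosen to roughly match the reported probability of detection):

```python
def detection_scores(tp, fp, fn):
    """Probability of detection (POD) and false alarm ratio (FAR)
    from confusion-matrix counts."""
    pod = tp / (tp + fn)   # detected cold-water events / all actual events
    far = fp / (tp + fp)   # false alarms / all positive predictions
    return pod, far

# hypothetical counts: 33 hits, 40 false alarms, 7 misses
pod, far = detection_scores(tp=33, fp=40, fn=7)
```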

An Analysis of Noise Robustness for Multilayer Perceptrons and Its Improvements (다층퍼셉트론의 잡음 강건성 분석 및 향상 방법)

  • Oh, Sang-Hoon
    • The Journal of the Korea Contents Association, v.9 no.1, pp.159-166, 2009
  • In this paper, we analyze the noise robustness of MLPs (multilayer perceptrons) by deriving the probability density function (p.d.f.) of output nodes under additive input noise and the misclassification ratio as an integral of the p.d.f. We also propose linear preprocessing methods to improve noise robustness. As a preprocessing stage for MLPs, we consider ICA (independent component analysis) and PCA (principal component analysis). After analyzing the noise reduction effect of PCA and ICA from the viewpoint of SNR (signal-to-noise ratio), we verify the preprocessing effects through simulations of handwritten-digit recognition problems.
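The PCA preprocessing idea, keeping only the high-variance signal directions so that isotropic additive noise in the discarded directions is attenuated, can be sketched on a synthetic rank-1 signal (all data below is made up, not from the paper):

```python
import numpy as np

def pca_denoise(X, k):
    """Project data onto its top-k principal components and back.

    Signal concentrated in a few directions is kept, while additive noise
    in the discarded directions is removed, raising the SNR.
    """
    Xc = X - X.mean(axis=0)
    # principal axes = right singular vectors of the centred data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                      # (d, k) projection basis
    return (Xc @ W) @ W.T + X.mean(axis=0)

rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 1)) @ rng.normal(size=(1, 10))  # rank-1 signal
noisy = signal + 0.3 * rng.normal(size=signal.shape)           # additive noise
cleaned = pca_denoise(noisy, k=1)
err_before = np.mean((noisy - signal) ** 2)
err_after = np.mean((cleaned - signal) ** 2)
```

Because the true signal occupies one direction, projecting onto a single component discards most of the noise energy, so the error against the clean signal drops.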

The Unified Framework for AUC Maximizer

  • Jun, Jong-Jun;Kim, Yong-Dai;Han, Sang-Tae;Kang, Hyun-Cheol;Choi, Ho-Sik
    • Communications for Statistical Applications and Methods, v.16 no.6, pp.1005-1012, 2009
  • The area under the curve (AUC) is commonly used as a summary measure of the receiver operating characteristic (ROC) curve, which displays the performance of a set of binary classifiers for all feasible ratios of the costs associated with the true positive rate (TPR) and false positive rate (FPR). In the bipartite ranking problem, where one has to compare two different observations and decide which one is "better", the AUC measures the probability that the ranking score of a randomly chosen sample from one class is larger than that of a randomly chosen sample from the other class. Hence, the function that maximizes the AUC of a bipartite ranking problem differs from the function that maximizes accuracy (minimizes the misclassification error rate) in a binary classification problem. In this paper, we develop a unified framework for AUC maximizers, including support vector machines based on maximizing a large margin and logistic regression based on estimating the posterior probability. Moreover, we develop an efficient algorithm for the proposed unified framework. Numerical results show that the proposed unified framework can treat various methodologies successfully.
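The probabilistic reading of the AUC described above has a direct empirical form, the fraction of positive-negative pairs ranked correctly; a minimal sketch with made-up scores:

```python
def empirical_auc(pos_scores, neg_scores):
    """Empirical AUC: P(score of a random positive > score of a random
    negative), counting ties as 1/2."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# hypothetical ranking scores for the two classes
auc = empirical_auc([0.9, 0.7, 0.4], [0.6, 0.3])
```

Of the six positive-negative pairs, five are ranked correctly, so the AUC is 5/6.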

An Intelligent Intrusion Detection Model Based on Support Vector Machines and the Classification Threshold Optimization for Considering the Asymmetric Error Cost (비대칭 오류비용을 고려한 분류기준값 최적화와 SVM에 기반한 지능형 침입탐지모형)

  • Lee, Hyeon-Uk;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems, v.17 no.4, pp.157-173, 2011
  • As Internet use has exploded recently, malicious attacks and hacking against networked systems occur frequently. This means that fatal damage can be caused by these intrusions in government agencies, public offices, and companies operating various systems. For such reasons, there is growing interest in and demand for intrusion detection systems (IDS), the security systems for detecting, identifying, and responding appropriately to unauthorized or abnormal activities. The intrusion detection models applied in conventional IDS are generally designed by modeling experts' implicit knowledge of network intrusions or hackers' abnormal behaviors. These kinds of intrusion detection models perform well under normal situations. However, they show poor performance when they meet new or unknown patterns of network attacks. For this reason, several recent studies try to adopt various artificial intelligence techniques, which can proactively respond to unknown threats. In particular, artificial neural networks (ANNs) have been popular in prior studies because of their superior prediction accuracy. However, ANNs have some intrinsic limitations such as the risk of overfitting, the requirement of a large sample size, and the lack of transparency of the prediction process (the black-box problem). As a result, the most recent studies on IDS have started to adopt the support vector machine (SVM), a classification technique that is more stable and powerful than ANNs and known for relatively high predictive power and generalization capability. Against this background, this study proposes a novel intelligent intrusion detection model that uses SVM as the classification model in order to improve the predictive ability of IDS. Our model is also designed to consider asymmetric error costs by optimizing the classification threshold. Generally, there are two common forms of error in intrusion detection. The first is the False-Positive Error (FPE), which misjudges normal activity as an intrusion and may result in unnecessary remediation. The second is the False-Negative Error (FNE), which misjudges a malicious program as normal. Compared to FPE, FNE is more fatal. Thus, when considering the total cost of misclassification in IDS, it is more reasonable to assign heavier weights to FNE than to FPE. Therefore, we designed our proposed intrusion detection model to optimize the classification threshold so as to minimize the total misclassification cost. In this case, conventional SVM cannot be applied because it is designed to generate discrete output (i.e., a class). To resolve this problem, we used the revised SVM technique proposed by Platt (2000), which is able to generate probability estimates. To validate the practical applicability of our model, we applied it to a real-world dataset for network intrusion detection. The experimental dataset was collected from the IDS sensor of an official institution in Korea from January to June 2010. We collected 15,000 log records in total and selected 1,000 samples from them by random sampling. In addition, the SVM model was compared with logistic regression (LOGIT), decision trees (DT), and ANN to confirm the superiority of the proposed model. LOGIT and DT were tested using PASW Statistics v18.0, and ANN using Neuroshell 4.0. For SVM, LIBSVM v2.90, a freeware tool for training SVM classifiers, was used. Empirical results showed that our proposed model based on SVM outperformed all the other comparative models in detecting network intrusions from the accuracy perspective. They also showed that our model reduced the total misclassification cost compared to the ANN-based intrusion detection model. As a result, it is expected that the intrusion detection model proposed in this paper will not only enhance the performance of IDS, but also lead to better management of FNE.
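The threshold-optimization step can be sketched as follows, assuming Platt-style probability estimates are already available; the probabilities, labels, and cost weights below are hypothetical, with FNE weighted more heavily as the paper argues:

```python
def optimal_threshold(probs_labels, cost_fn, cost_fp):
    """Pick the classification threshold on intrusion probabilities that
    minimises the total misclassification cost.

    probs_labels: (estimated intrusion probability, true label) pairs,
    where label 1 = intrusion. Cases with prob >= threshold are flagged.
    """
    candidates = sorted({p for p, _ in probs_labels} | {0.0, 1.0})

    def total_cost(t):
        cost = 0.0
        for p, y in probs_labels:
            flagged = p >= t
            if y == 1 and not flagged:
                cost += cost_fn   # missed intrusion (false negative)
            elif y == 0 and flagged:
                cost += cost_fp   # false alarm (false positive)
        return cost

    return min(candidates, key=total_cost)

# hypothetical Platt-style probability estimates with asymmetric costs
data = [(0.95, 1), (0.7, 1), (0.55, 0), (0.4, 1), (0.2, 0)]
t = optimal_threshold(data, cost_fn=10.0, cost_fp=1.0)
```

With misses ten times as costly as false alarms, the optimal threshold drops to 0.4, accepting one false alarm rather than missing the 0.4-probability intrusion.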

Discrimination of Natural Earthquakes and Explosions in Spectral Domain (주파수 영역에서의 인공지진과 자연지진의 식별)

  • 김성균;김명수
    • Economic and Environmental Geology, v.36 no.3, pp.201-212, 2003
  • Recently, the earthquake detection capability in the Kyungsang Basin of the southeastern Korean Peninsula has greatly improved as seismic stations, including the seismic network of KIGAM (Korea Institute of Geoscience and Mineral Resources), have significantly increased. However, a large number of signals from explosions are also recorded because of frequent medium-to-large chemical explosions, so discrimination between natural earthquakes and explosions in the Basin has become an important issue. High-frequency local records from 43 earthquakes and 43 explosions of comparable magnitude were selected to establish a reliable discrimination technique for the Basin. Several discrimination techniques in the spectral domain, using spectral amplitude ratios among Pg, Sg, and Lg waves, are examined with the selected data. Among them, the Pg/Lg spectral ratio method appears to be a good technique for improving discrimination power. Multivariate discriminant analysis is also applied to the Pg/Lg spectral ratios. The discrimination power of the Pg/Lg ratios for distance-corrected three-component records shows a distinct improvement over uncorrected vertical-component records. In the frequency band of 4 to 14 Hz, the Pg/Lg spectral ratio for distance-corrected three-component records provides discrimination with a total misclassification probability of only 0.89%.
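A Pg/Lg spectral amplitude ratio in the 4-14 Hz band can be sketched with NumPy's FFT; the synthetic sinusoids below stand in for real windowed Pg and Lg phase segments and are illustrative only:

```python
import numpy as np

def band_spectral_ratio(pg, lg, fs, f_lo=4.0, f_hi=14.0):
    """Pg/Lg spectral amplitude ratio over a frequency band.

    pg, lg: windowed waveform segments for the Pg and Lg phases;
    fs: sampling rate in Hz. The 4-14 Hz band follows the paper.
    """
    freqs = np.fft.rfftfreq(len(pg), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    pg_amp = np.abs(np.fft.rfft(pg))[band]
    lg_amp = np.abs(np.fft.rfft(lg))[band]
    return float(pg_amp.sum() / lg_amp.sum())

fs = 100.0                                # 100 Hz sampling rate
t = np.arange(0, 2.0, 1.0 / fs)
pg = 2.0 * np.sin(2 * np.pi * 8.0 * t)    # stronger 8 Hz Pg energy
lg = 1.0 * np.sin(2 * np.pi * 8.0 * t)    # weaker Lg energy
ratio = band_spectral_ratio(pg, lg, fs)
```

For these toy waveforms the Pg window carries twice the in-band amplitude, so the ratio comes out at 2.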

Validation of the International Classification of Diseases 10th Edition Based Injury Severity Score(ICISS) (ICD-10을 이용한 ICISS의 타당도 평가)

  • Jung, Ku-Young;Kim, Chang-Yup;Kim, Yong-Ik;Shin, Young-Soo;Kim, Yoon
    • Journal of Preventive Medicine and Public Health, v.32 no.4, pp.538-545, 1999
  • Objective: To compare the predictive power of the International Classification of Diseases 10th Edition (ICD-10) based Injury Severity Score (ICISS) with the Trauma and Injury Severity Score (TRISS) and the ICD 9th Edition Clinical Modification (ICD-9CM) based ICISS as injury severity measures. Methods: An ICD-10 version of the survival risk ratios (SRRs) was derived from 47,750 trauma patients from 35 emergency centers over 1 year. The predictive power of TRISS, ICD-9CM based ICISS, and ICD-10 based ICISS was compared in a group of 367 severely injured patients admitted to two university hospitals. Predictive power was compared using measures of discrimination (disparity, sensitivity, specificity, misclassification rates, and ROC curve analysis) and calibration (Hosmer-Lemeshow goodness-of-fit statistics), all calculated by logistic regression. Results: ICD-10 based ICISS showed lower performance than TRISS and ICD-9CM based ICISS. When age and the Revised Trauma Score (RTS) were incorporated into the survival probability model, however, the ICD-10 based ICISS full model showed predictive power similar to the TRISS and ICD-9CM based ICISS full models. ICD-10 based ICISS had some disadvantages in predicting outcomes among patients with intracranial injuries, but this weakness was largely compensated for by incorporating age and RTS in the model. Conclusions: The ICISS methodology can be extended to ICD-10 as a standard injury severity measure in place of TRISS, especially when age and RTS are incorporated in the model. In patients with intracranial injuries, the predictive power of ICD-10 based ICISS was relatively low because of differences between the ICD-10 and ICD-9CM classification systems.
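The ICISS statistic itself is simply the product of the survival risk ratios of all of a patient's injury diagnoses; a minimal sketch with a hypothetical SRR table (the ICD-10 codes and ratios below are illustrative, not from the study's derivation set):

```python
def iciss(diagnosis_codes, srr):
    """ICISS survival probability: the product of the survival risk
    ratios (SRRs) of every injury diagnosis a patient carries."""
    p = 1.0
    for code in diagnosis_codes:
        p *= srr[code]
    return p

# hypothetical ICD-10 injury codes with hypothetical SRRs
srr_table = {"S06.5": 0.85, "S27.3": 0.92, "S72.0": 0.97}
p_survival = iciss(["S06.5", "S27.3"], srr_table)
```

Each additional serious diagnosis multiplies in a ratio below 1, so multiply-injured patients get lower predicted survival.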


Critical Literature Review on Exposure Assessment Methods for Metalworking Fluids in Epidemiological Cancer Study (금속가공유 노출과 암 발생위험역학조사에서 금속가공유 노출 평가 방법에 대한 고찰)

  • Park, Donguk
    • Journal of Korean Society of Occupational and Environmental Hygiene, v.17 no.4, pp.282-288, 2007
  • Many epidemiological studies have established relationships between metalworking fluid (MWF) exposure and cancer risk in various tissues, but the risks associated with each MWF type (straight, soluble, synthetic, and semi-synthetic) have not yet been fully elucidated. Epidemiological studies have assessed MWF exposure using qualitative, ordinal, or semi-quantitative surrogate variables (e.g., whether a worker was exposed to MWFs, exposure level such as high or low, job status, and duration of employment). Such exposure assessment methods fundamentally fail to account for the intensity of MWF exposure and are always subject to exposure misclassification, making it difficult to identify not only the overall risk of MWF exposure but also the type-specific risks. Some epidemiological studies have related cumulative exposure levels for each MWF type to cancer risk; however, those results all came from a single large automobile plant where the data needed to estimate past exposure by MWF type (quantitative exposure measurements, work histories, types of MWF handled, etc.) were well documented. Therefore, in epidemiological studies of the general population or of industries where MWF exposure data are scarce and records of use characteristics are absent or insufficient, it is impossible to identify type-specific MWF risks. It is necessary to develop an exposure probability matrix that can be used generally to estimate the probability of past exposure to each MWF type.

Feasibility Evaluation of High-Tech New Product Development Projects Using Support Vector Machines

  • Shin, Teak-Soo;Noh, Jeon-Pyo
    • Proceedings of the Korea Intelligent Information System Society Conference, 2005.11a, pp.241-250, 2005
  • New product development (NPD) is defined as the transformation of a market opportunity and a set of assumptions about product technology into a product available for sale. Managers charged with project selection decisions in the NPD process, such as go/no-go choices and specific resource allocation decisions, face a complicated problem, and the ability to develop successful new products has been identified as a major determinant in sustaining a firm's competitive advantage. The purpose of this study is to develop a new evaluation model for NPD project selection in the high-tech industry using support vector machines (SVM). The evaluation model is developed in two phases. In the first phase, a binary (go/no-go) classification prediction model, i.e., an SVM for high-tech NPD project selection, is developed. In the second phase, using the predicted output value of the SVM, a feasibility grade is calculated for the final NPD project decision. In this study, the feasibility grades are divided into three levels. We assume that the frequency of NPD project cases is symmetrically distributed across the feasibility grades, and misclassification errors are partially minimized by the multiple grades; however, the number of grade levels can be changed according to a firm's NPD strategy. Our proposed feasibility grade method is more reasonable for NPD decision problems because it considers, in particular, the risk factor of NPD from the viewpoint of future NPD success probability. In our empirical study using Korean NPD cases, the SVM significantly outperformed ANN and logistic regression benchmark models in hit ratio, and the feasibility grades generated from the predicted output value of the SVM showed that they can offer a useful guideline for NPD project selection.
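The grade-assignment step can be sketched as simple thresholding of the SVM's probability-like output; the grade names and cutoff values below are hypothetical, and as the abstract notes a firm could choose different levels:

```python
def feasibility_grade(svm_output, cut_hi=0.7, cut_lo=0.3):
    """Map a probability-like SVM output to one of three feasibility
    grades. Cutoffs are hypothetical and strategy-dependent."""
    if svm_output >= cut_hi:
        return "go"
    if svm_output >= cut_lo:
        return "conditional"
    return "no-go"

grades = [feasibility_grade(v) for v in (0.9, 0.5, 0.1)]
```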


Improving Classification Accuracy in Hierarchical Trees via Greedy Node Expansion

  • Byungjin Lim;Jong Wook Kim
    • Journal of the Korea Society of Computer and Information, v.29 no.6, pp.113-120, 2024
  • With the advancement of information and communication technology, we can easily generate various forms of data in our daily lives. To efficiently manage such a large amount of data, systematic classification into categories is essential. For effective search and navigation, data is organized into a tree-like hierarchical structure known as a category tree, which is commonly seen in news websites and Wikipedia. As a result, various techniques have been proposed to classify large volumes of documents into the terminal nodes of category trees. However, document classification methods using category trees face a problem: as the height of the tree increases, the number of terminal nodes multiplies exponentially, which increases the probability of misclassification and ultimately leads to a reduction in classification accuracy. Therefore, in this paper, we propose a new node expansion-based classification algorithm that satisfies the classification accuracy required by the application, while enabling detailed categorization. The proposed method uses a greedy approach to prioritize the expansion of nodes with high classification accuracy, thereby maximizing the overall classification accuracy of the category tree. Experimental results on real data show that the proposed technique provides improved performance over naive methods.
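The greedy expansion idea can be sketched as follows, under the simplifying assumption that each node carries a single estimated classification accuracy and is expanded only while that accuracy meets the application's requirement; the category tree and accuracy table below are made up:

```python
import heapq

def greedy_expand(children, acc, root, required_acc):
    """Expand category-tree nodes greedily, highest estimated accuracy
    first, so fine-grained categories are added without dropping below
    the required classification accuracy.

    children: node -> list of child categories; acc: node -> estimated
    accuracy of routing documents through that node.
    """
    leaves = []                          # final categorisation nodes
    heap = [(-acc[root], root)]          # max-heap via negated accuracy
    while heap:
        neg_a, node = heapq.heappop(heap)
        kids = children.get(node, [])
        if kids and -neg_a >= required_acc:
            for c in kids:               # accurate enough: go finer
                heapq.heappush(heap, (-acc[c], c))
        else:
            leaves.append(node)          # stop: leaf, or accuracy too low
    return sorted(leaves)

# hypothetical category tree and per-node accuracy estimates
tree = {"root": ["news", "sports"], "news": ["politics", "economy"]}
accuracy = {"root": 0.95, "news": 0.90, "sports": 0.70,
            "politics": 0.80, "economy": 0.85}
final = greedy_expand(tree, accuracy, "root", required_acc=0.85)
```

With a 0.85 requirement, "news" (0.90) is expanded into its subcategories while "sports" (0.70) is kept coarse, illustrating how high-accuracy nodes are prioritised for finer categorisation.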