• Title/Summary/Keyword: Classification Problem

Feature Selection Effect of Classification Tree Using Feature Importance : Case of Credit Card Customer Churn Prediction (특성중요도를 활용한 분류나무의 입력특성 선택효과 : 신용카드 고객이탈 사례)

  • Yoon Hanseong
    • Journal of Korea Society of Digital Industry and Information Management, v.20 no.2, pp.1-10, 2024
  • To predict credit card customer churn accurately through data analysis, a model can be constructed with various machine learning algorithms, including decision trees. Feature importance has been used in several application areas to select input features that improve the performance of data analysis models. This paper investigates a method of using feature importance calculated by the mean decrease in impurity (MDI) method, and its effects, in the credit card customer churn prediction problem with classification trees. Compared with several random feature selections from the case data, a set of input features selected for their higher feature importance shows higher predictive power. The method can be an efficient way to classify and choose the input features needed to improve prediction performance. The method organized in this paper can serve as an option for selecting input features by feature importance when composing and using classification trees, in credit card customer churn prediction and elsewhere.
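
The MDI-based selection described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the tiny Gini-based tree, the toy data, and the feature names (`monthly_transactions`, `inactive_months`, `noise_code`) are all assumptions made up for illustration. Each feature's importance is the sample-weighted impurity decrease summed over the tree nodes that split on it.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (impurity decrease, feature index, threshold) of the best split, or None."""
    n, parent, best = len(y), gini(y), None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best

def mdi_importance(X, y, max_depth=3, total=None, importance=None):
    """Grow a tree greedily and accumulate weighted impurity decreases per feature."""
    if total is None:
        total, importance = len(y), [0.0] * len(X[0])
    split = best_split(X, y) if max_depth > 0 else None
    if split is None or split[0] <= 1e-12:
        return importance  # pure node, no usable split, or depth limit reached
    gain, j, t = split
    importance[j] += gain * len(y) / total
    for keep in (lambda v: v <= t, lambda v: v > t):
        idx = [i for i in range(len(y)) if keep(X[i][j])]
        mdi_importance([X[i] for i in idx], [y[i] for i in idx],
                       max_depth - 1, total, importance)
    return importance

# Hypothetical toy data: one row per customer, label 1 = churned.
names = ["monthly_transactions", "inactive_months", "noise_code"]
X = [[40, 0, 7], [35, 1, 2], [50, 1, 9], [45, 2, 4], [42, 0, 10],
     [8, 2, 5], [12, 3, 8], [44, 4, 1], [10, 5, 6], [38, 6, 3]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

importance = mdi_importance(X, y)
normalized = [v / sum(importance) for v in importance]
ranked = sorted(range(len(names)), key=lambda j: normalized[j], reverse=True)
selected = [names[j] for j in ranked[:2]]  # keep the two highest-ranked features
print(selected)
```

A comparison in the spirit of the abstract would retrain the tree on `selected` and on several random feature subsets of the same size, then compare predictive performance on held-out data.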

An Active Learning-based Method for Composing Training Document Set in Bayesian Text Classification Systems (베이지언 문서분류시스템을 위한 능동적 학습 기반의 학습문서집합 구성방법)

  • 김제욱;김한준;이상구
    • Journal of KIISE:Software and Applications, v.29 no.12, pp.966-978, 2002
  • There are two important problems in improving text classification systems based on a machine learning approach. The first, called the "selection problem", is how to select a minimum number of informative documents from a given document collection. The second, called the "composition problem", is how to reorganize the selected training documents so that they fit the adopted learning method. The former problem is addressed by "active learning" algorithms, and the latter by "boosting" algorithms. This paper proposes a new learning method, called AdaBUS, which proactively solves both problems in the context of Naive Bayes classification systems. The proposed method constructs a more accurate classification hypothesis by increasing the variance in the "weak" hypotheses that determine the final classification hypothesis. Consequently, the proposed algorithm yields a perturbation effect that makes the boosting algorithm work properly. Through an empirical experiment using the Reuters-21578 document collection, we show that the AdaBUS algorithm improves the Naive Bayes-based classification system more significantly than other conventional learning methods.

A case of corporate failure prediction

  • Shin, Kyung-Shik;Jo, Hongkyu;Han, Ingoo
    • Proceedings of the Korean Operations and Management Science Society Conference, 1996.10a, pp.199-202, 1996
  • Although numerous studies demonstrate that one technique outperforms the others for a given data set, there is often no way to tell a priori which of these techniques will be most effective for a specific problem. Alternatively, it has been suggested that a better approach to classification problems might be to integrate several different forecasting techniques by combining their results. The issue of interest is how to integrate different modeling techniques to increase prediction performance. This paper proposes the post-model integration method, in which integration is performed after the individual techniques produce their own outputs, by finding the best combination of the results of each method. To obtain the optimal or near-optimal combination of different prediction techniques, Genetic Algorithms (GAs) are applied, which are particularly suitable for multi-parameter optimization problems with an objective function subject to numerous hard and soft constraints. This study applied three individual classification techniques (discriminant analysis, logit and neural networks) as base models in the corporate failure prediction context. The results of the composite prediction were compared with those of the individual models. Preliminary results suggest that the use of integrated methods will offer improved performance in business classification problems.
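
The post-model integration idea can be sketched as a small genetic algorithm that searches for combination weights over the outputs of already-trained base models. The abstract does not give the encoding, operators, or fitness function, so everything below is an assumption: a weight vector per individual, arithmetic crossover, Gaussian mutation, truncation selection, accuracy as fitness, and made-up scores for three hypothetical base models.

```python
import random

def accuracy(weights, scores, labels, cutoff=0.5):
    """Share of cases where the weighted score falls on the correct side of the cutoff."""
    hits = 0
    for row, label in zip(scores, labels):
        combined = sum(w * s for w, s in zip(weights, row))
        hits += int((combined >= cutoff) == bool(label))
    return hits / len(labels)

def normalise(weights):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def evolve(scores, labels, pop_size=30, generations=40, seed=0):
    """Search for combination weights with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    k = len(scores[0])
    # Start from each single model on its own, so the elite is never worse than the best one.
    population = [[1.0 if i == j else 0.0 for i in range(k)] for j in range(k)]
    while len(population) < pop_size:
        population.append(normalise([rng.random() + 1e-9 for _ in range(k)]))

    def fitness(w):
        return accuracy(w, scores, labels)

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection keeps the elite unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            mix = rng.random()
            child = [mix * x + (1 - mix) * z for x, z in zip(a, b)]  # arithmetic crossover
            i = rng.randrange(k)
            child[i] = max(1e-9, child[i] + rng.gauss(0, 0.1))  # mutate one weight
            children.append(normalise(child))
        population = parents + children
    return max(population, key=fitness)

# Hypothetical failure scores from three base models (columns), label 1 = failed firm.
scores = [[0.8, 0.6, 0.4], [0.3, 0.7, 0.6], [0.6, 0.4, 0.7], [0.7, 0.3, 0.8],
          [0.4, 0.2, 0.3], [0.6, 0.3, 0.2], [0.2, 0.6, 0.4], [0.3, 0.4, 0.6]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights = evolve(scores, labels)
print(weights, accuracy(weights, scores, labels))
```

Fitness here is measured on the same cases used for the search, which overstates performance; a fair comparison with the individual models needs a separate holdout sample.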

The Hybrid Systems for Credit Rating

  • Goo, Han-In;Jo, Hong-Kyuo;Shin, Kyung-Shik
    • Journal of the Korean Operations Research and Management Science Society, v.22 no.3, pp.163-173, 1997
  • Although numerous studies demonstrate that one technique outperforms the others for a given data set, it is hard to tell a priori which of these techniques will be the most effective for a specific problem. It has been suggested that a better approach to classification problems might be to integrate several different forecasting techniques by combining their results. The issue of interest is how to integrate different modeling techniques to increase predictive performance. This paper proposes the post-model integration method, which tries to find the best combination of the results provided by the individual techniques. To obtain the optimal or near-optimal combination of different prediction techniques, Genetic Algorithms (GAs) are applied, which are particularly suitable for multi-parameter optimization problems with an objective function subject to numerous hard and soft constraints. This study applies three individual classification techniques (discriminant analysis, logit model and neural networks) as base models for corporate failure prediction. The results of the composite predictions are compared with those of the individual models. Preliminary results suggest that the use of integrated methods improves the performance of business classification.

Development of a Conceptual Framework of Nursing from Selected Concepts of Nursing Diagnoses (간호진단 분류체계에 근거한 간호개념틀 개발)

  • 김조자
    • Journal of Korean Academy of Nursing, v.26 no.1, pp.177-193, 1996
  • To integrate nursing diagnosis into the nursing curriculum, a descriptive survey was conducted using the inductive method, with questionnaires and a literature review. Research subjects included nurse educators, textbooks of adult nursing published in Korea, and the course outline for adult nursing used in one college of nursing. The results show common agreement on 39 nursing diagnoses that should be included in the adult nursing curriculum, textbooks of adult nursing, and patient care on medical-surgical units. The two existing nursing diagnosis classification systems (NANDA and Gordon's Human Response Patterns) have different basic frameworks, and difficulties were found in integrating nursing diagnosis into the curriculum. To develop a conceptual framework for a nursing diagnosis classification system, diagnoses were classified into three categories: health promotion, high risk problem, and actual problem, on the basis of the framework used in adult nursing textbooks and Gordon's 11 Functional Health Patterns. Subconcepts for actual problems were classified as: activity and rest, nutrition and elimination, perception and coordination, stress and coping. Progress in this study supports further development of a conceptual framework of nursing based on a nursing diagnosis classification system, from which improvement in nursing education and clinical practice can be expected.

Truncated Kernel Projection Machine for Link Prediction

  • Huang, Liang;Li, Ruixuan;Chen, Hong
    • Journal of Computing Science and Engineering, v.10 no.2, pp.58-67, 2016
  • With the large amount of complex network data that is increasingly available on the Web, link prediction has become a popular data-mining research field. The focus of this paper is on a link-prediction task that can be formulated as a binary classification problem in complex networks. To solve this link-prediction problem, a sparse-classification algorithm called "Truncated Kernel Projection Machine" that is based on empirical-feature selection is proposed. The proposed algorithm is a novel way to achieve a realization of sparse empirical-feature-based learning that is different from those of the regularized kernel-projection machines. The algorithm is more appealing than those of the previous outstanding learning machines since it can be computed efficiently, and it is also implemented easily and stably during the link-prediction task. The algorithm is applied here for link-prediction tasks in different complex networks, and an investigation of several classification algorithms was performed for comparison. The experimental results show that the proposed algorithm outperformed the compared algorithms in several key indices with a smaller number of test errors and greater stability.

A STUDY ON THE PREVALENCE OF MALOCCLUSION IN 2,378 YONSEI UNIVERSITY STUDENTS (연세대학생 2,378명을 대상으로 한 부정교합빈도에 관한 연구)

  • Yoo, Young Kyu;Kim, Nam ill;Lee, Hyo Kyoung
    • The Korean Journal of Orthodontics, v.2 no.1, pp.35-40, 1971
  • Since malocclusion affects a large segment of the population, it is by definition a public health problem. The etiology and treatment of malocclusions have been studied by clinicians; however, the epidemiologic aspects of the problem have been neglected. This study was undertaken using Angle's classification to obtain and evaluate epidemiologic data on the prevalence of malocclusion in a group of 2,378 Yonsei University students, 17 to 23 years of age. All freshmen were selected, except for those receiving orthodontic treatment and those few with too many missing teeth to permit classification by Angle's method. The following results were obtained: 1) Almost $91\%$ of students had malocclusion of the teeth severe enough to require correction. 2) There was a statistically significant difference in malocclusion between males and females ($93.66\%$ malocclusion in males, $79.13\%$ malocclusion in females). 3) Crowding was most prevalent in Class I malocclusion. 4) There appeared to be a specific association between the number of lost first molars and Angle's classification. 5) In this study, more Class II, Div. 2 malocclusion appeared than in Massler and Frankel's study of Caucasians, which used similar criteria. Class III malocclusion was more prevalent than normal occlusion in the Korean students studied, whereas in Caucasians normal occlusion was more prevalent.

On the Use of Adaptive Weights for the F-Norm Support Vector Machine

  • Bang, Sung-Wan;Jhun, Myoung-Shic
    • The Korean Journal of Applied Statistics, v.25 no.5, pp.829-835, 2012
  • When the input features are generated by factors in a classification problem, it is more meaningful to identify important factors than individual features. The $F_{\infty}$-norm support vector machine (SVM) has been developed to perform automatic factor selection in classification. However, the $F_{\infty}$-norm SVM may suffer from estimation inefficiency and model selection inconsistency because it applies the same amount of shrinkage to each factor without assessing its relative importance. To overcome this limitation, we propose the adaptive $F_{\infty}$-norm ($AF_{\infty}$-norm) SVM, which penalizes the empirical hinge loss by the sum of the adaptively weighted factor-wise $L_{\infty}$-norm penalties. The $AF_{\infty}$-norm SVM computes the weights from the 2-norm SVM estimator and can be formulated as a linear programming (LP) problem similar to that of the $F_{\infty}$-norm SVM. The simulation studies show that the proposed $AF_{\infty}$-norm SVM improves upon the $F_{\infty}$-norm SVM in terms of classification accuracy and factor selection performance.
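
The penalized criterion described in this abstract can be written out as follows. The notation is assumed here, not taken from the paper: training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, $G$ factors, $\beta_{(g)}$ the coefficient sub-vector of factor $g$, a tuning parameter $\lambda \ge 0$, and factor weights $w_g > 0$.

$$
\min_{\beta_0,\,\beta}\ \sum_{i=1}^{n}\bigl[\,1 - y_i\,(\beta_0 + x_i^{\top}\beta)\,\bigr]_{+} \;+\; \lambda \sum_{g=1}^{G} w_g\,\lVert \beta_{(g)} \rVert_{\infty}
$$

Setting $w_g = 1$ for every $g$ gives a penalty that shrinks all factors equally, which matches the abstract's description of the $F_{\infty}$-norm SVM. The abstract says only that the weights come from the 2-norm SVM estimator; one plausible form, stated here purely as an assumption, is $w_g = 1 / \lVert \tilde{\beta}_{(g)} \rVert_{\infty}^{\gamma}$ with $\tilde{\beta}$ the 2-norm SVM estimate and $\gamma > 0$. Because both the hinge loss and the $L_{\infty}$-norm are piecewise linear, the problem can be rewritten as a linear program by introducing slack variables $\xi_i \ge 1 - y_i(\beta_0 + x_i^{\top}\beta)$, $\xi_i \ge 0$, and bounds $m_g \ge \lvert \beta_j \rvert$ for each feature $j$ in factor $g$, then minimizing $\sum_i \xi_i + \lambda \sum_g w_g m_g$.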

Automatic Retrieval of SNS Opinion Document Using Machine Learning Technique (기계학습을 이용한 SNS 오피니언 문서의 자동추출기법)

  • Chang, Jae-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication, v.13 no.5, pp.27-35, 2013
  • Recently, as Social Network Services (SNS) have become more popular, much research has been done on analyzing public opinion from SNS. One of the most important tasks in solving this problem is to separate opinion (subjective) documents from others (e.g. objective documents) in SNS. In this paper, we propose a new method of retrieving opinion documents from Twitter. Searching for or classifying opinion documents in Twitter is difficult because of a lack of publicly available Twitter documents for training. To tackle the problem, we first build a machine-learned model for sentiment classification using external documents similar to Twitter, and then modify the model to separate the opinion documents from Twitter. Experimental results show that the proposed method can be applied successfully to opinion classification.

A Study on the Estimation Model of Liquid Evaporation Rate for Classification of Flammable Liquid Explosion Hazardous Area (인화성액체의 폭발위험장소 설정을 위한 증발율 추정 모델 연구)

  • Jung, Yong Jae;Lee, Chang Jun
    • Journal of the Korean Society of Safety, v.33 no.4, pp.21-29, 2018
  • In many companies handling flammable liquids, explosion-proof electrical equipment has been installed according to the Korean Industrial Standards (KS C IEC 60079-10-1). Under these standards, hazardous areas for explosive gas atmospheres have to be classified by evaluating the evaporation rate of leaked flammable liquid. The evaporation rate is an important factor in determining the zone classification and the hazardous area distance. However, these standards give no systematic method or rule for estimating the evaporation rate, and the first-principles equations for the evaporation rate are very difficult, so it is hard for industrial workplaces to employ them. This can lead to inaccurate evaluations of the evaporation range. In this study, empirical models for estimating the evaporation rate of a flammable liquid have been developed to tackle this problem. A sensitivity analysis of the first-principles equations shows that the main factors for the evaporation rate are wind speed and temperature, and that the empirical models have to be nonlinear. Polynomial regression is employed to build the empirical models. Methanol, benzene, para-xylene and toluene are selected as case studies to verify the accuracy of the empirical models.
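
A polynomial regression of the kind mentioned in this abstract can be sketched as a least-squares fit of a second-degree polynomial in wind speed and temperature. The degree, the choice of terms, and the data are assumptions: the samples below are generated from made-up coefficients (`true_beta`) and are not evaporation measurements or results from the paper.

```python
def features(u, t):
    """Second-degree polynomial terms in wind speed u and temperature t."""
    return [1.0, u, t, u * u, u * t, t * t]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(samples):
    """Least-squares fit through the normal equations (X^T X) beta = X^T y."""
    rows = [features(u, t) for u, t, _ in samples]
    ys = [y for _, _, y in samples]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, u, t):
    """Evaluate the fitted polynomial at wind speed u and temperature t."""
    return sum(b * f for b, f in zip(beta, features(u, t)))

# Synthetic samples on a grid of wind speeds and temperatures (illustrative values only).
true_beta = [0.5, 1.2, 0.03, -0.05, 0.02, 0.001]
samples = [(u, t, predict(true_beta, u, t))
           for u in (0.5, 1.0, 2.0, 3.0, 5.0)
           for t in (5.0, 15.0, 25.0, 35.0)]

beta = fit(samples)
print(predict(beta, 2.5, 20.0))
```

In practice the response values would come from the first-principles equations or from measurements for a given liquid, and the polynomial degree would be chosen by checking the fit error on points not used for fitting.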