• Title/Summary/Keyword: decision tree induction

Development of Predictive Models for Rights Issues Using Financial Analysis Indices and Decision Tree Technique (경영분석지표와 의사결정나무기법을 이용한 유상증자 예측모형 개발)

  • Kim, Myeong-Kyun;Cho, Yoonho
    • Journal of Intelligence and Information Systems, v.18 no.4, pp.59-77, 2012
  • This study focuses on predicting which firms will increase capital by issuing new stocks in the near future. Many stakeholders, including banks, credit rating agencies and investors, perform a variety of analyses of firms' growth, profitability, stability, activity, productivity, etc., and regularly report the firms' financial analysis indices. In this paper, we develop predictive models for rights issues using these financial analysis indices and data mining techniques. We approach the model building from two perspectives. The first is the analysis period: we divide it into the periods before and after the IMF financial crisis and examine whether the two differ. The second is the prediction time: to predict when firms will increase capital by issuing new stocks, the prediction time is categorized as one, two, or three years later. In total, six prediction models are therefore developed and analyzed. We employ the decision tree technique to build the prediction models for rights issues. The decision tree is the most widely used prediction method, and it builds trees that label or categorize cases into a set of known classes. In contrast to neural networks, logistic regression and SVM, decision tree techniques are well suited for high-dimensional applications and have strong explanation capabilities. Well-known decision tree induction algorithms include CHAID, CART, QUEST, and C5.0. Among them, we use the C5.0 algorithm, which is the most recently developed and yields better performance than the other algorithms. We obtained data on rights issues and financial analysis from TS2000 of the Korea Listed Companies Association. Each financial analysis record consists of 89 variables, which include 9 growth indices, 30 profitability indices, 23 stability indices, 6 activity indices and 8 productivity indices. 
For model building and testing, we used 10,925 financial analysis records from a total of 658 listed firms. PASW Modeler 13 was used to build C5.0 decision trees for the six prediction models. A total of 84 of the financial analysis variables are selected as the input variables of each model, and the rights issue status (issued or not issued) is defined as the output variable. To develop the prediction models using the C5.0 node (Node Options: Output type = Rule set, Use boosting = false, Cross-validate = false, Mode = Simple, Favor = Generality), we used 60% of the data for model building and 40% for model testing. The experimental results show that the prediction accuracies for data after the IMF financial crisis (68.78% to 71.41%) are about 10 percentage points higher than those before the crisis (59.04% to 60.43%). These results indicate that since the IMF financial crisis, the reliability of financial analysis indices has increased and firms' intentions to conduct rights issues have become more evident. The results also show that stability-related indices have a major impact on rights issues in short-term prediction. Long-term prediction, on the other hand, is affected by financial analysis indices on profitability, stability, activity and productivity. All the prediction models include the industry code as one of the significant variables, which means that companies in different industries show different patterns of rights issues. We conclude that stakeholders should take into account stability-related indices for short-term prediction and a wider variety of financial analysis indices for long-term prediction. The current study has several limitations. First, we need to compare the accuracy of different data mining techniques such as neural networks, logistic regression and SVM. 
Second, we need to develop and evaluate new prediction models that include variables which research on capital structure theory has identified as relevant to rights issues.
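
The abstract above relies on decision tree induction, which chooses splits that best separate the classes. As a minimal sketch only, the Python below computes plain entropy-based information gain for one threshold split. It is a generic illustration, not C5.0's exact criterion or the PASW Modeler implementation, and the index values (`debt_ratio`), labels, and threshold are hypothetical toy data, not taken from the study.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    total = len(labels)
    weighted = (len(left) / total) * entropy(left) \
        + (len(right) / total) * entropy(right)
    return entropy(labels) - weighted

# Hypothetical toy data: a made-up stability index and rights-issue status.
debt_ratio = [0.2, 0.3, 0.4, 0.7, 0.8, 0.9]
issued = ["no", "no", "no", "yes", "yes", "yes"]

# A threshold that separates the two classes perfectly removes all entropy.
gain = information_gain(debt_ratio, issued, threshold=0.5)
print(f"information gain at 0.5: {gain:.3f}")
```

A tree learner would evaluate many candidate thresholds per variable and recurse on the best one; C4.5-family algorithms additionally normalize the gain, which this sketch omits.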

Neural network rule extraction for credit scoring

  • Bart Baesens;Rudy Setiono;Lille, Valerina-De;Stijn Viaene
    • Proceedings of the Korea Intelligent Information System Society Conference, 2001.01a, pp.128-132, 2001
  • In this paper, we evaluate and contrast four neural network rule extraction approaches for credit scoring. Experiments are carried out on three real-life credit scoring data sets. Both the continuous and the discretised versions of all data sets are analysed. The rule extraction algorithms, Neurolinear, Neurorule, Trepan and Nefclass, have different characteristics with respect to their perception of the neural network and their way of representing the generated rules or knowledge. It is shown that Neurolinear, Neurorule and Trepan are able to extract very concise rule sets or trees with a high predictive accuracy when compared to classical decision tree (rule) induction algorithms like C4.5(rules). In particular, Neurorule extracted powerful, easy-to-understand propositional if-then rules for all discretised data sets. Hence, the Neurorule algorithm may offer a viable alternative for rule generation and knowledge discovery in the domain of credit scoring.

A Study on a Prototype Learning Model (프로토타입 학습 모델에 관한 연구)

  • 송두헌
    • Journal of the Korea Computer Industry Society, v.2 no.2, pp.151-156, 2001
  • We describe a new representation for learning concepts that differs from the traditional decision tree and rule induction algorithms. Our algorithm, PROLEARN, learns one or more prototypes per class and performs instance-based classification with them. A prototype here differs from the psychological term in that a concept can have more than one prototype, and it also differs from other instance-based algorithms in that the prototype is a "fictitious ideal example". We show that PROLEARN is as good as the traditional machine learning algorithms but much more stable than them in an environment with noise or a changing training set, a property we call 'stability'.
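
Classification with prototypes, as described in the abstract above, can be sketched as nearest-prototype lookup. The Python below is an illustration under stated assumptions, not the PROLEARN algorithm itself: it assumes one prototype per class, built as the class mean (a "fictitious" point that need not be a real instance), and Euclidean distance. The function names and toy data are hypothetical.

```python
import math

def mean_prototype(points):
    """Build a prototype as the coordinate-wise mean of a class's points."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))

def classify(x, prototypes):
    """Return the label of the prototype closest to x (Euclidean distance)."""
    return min(prototypes, key=lambda item: math.dist(x, item[1]))[0]

# Hypothetical toy training set with two classes.
training = {
    "a": [(0.0, 0.0), (2.0, 0.0)],
    "b": [(10.0, 10.0), (12.0, 10.0)],
}
prototypes = [(label, mean_prototype(pts)) for label, pts in training.items()]

print(classify((1.0, 1.0), prototypes))
```

Because only the prototypes are stored, adding or removing a few noisy training instances shifts each mean only slightly, which is one intuition for the stability the abstract mentions.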

A Study on the Database Marketing using Data Mining in the Traditional Medicine (데이터마이닝을 활용한 한방분야에서의 데이터베이스 마케팅에 대한 연구)

  • Lee Sang-Young;Lee Yun-Seok
    • Journal of the Korea Society of Computer and Information, v.10 no.5 s.37, pp.271-280, 2005
  • This study elicits the factors that affect medical examinations in traditional medicine using the decision tree technique, and characterizes patients by means of clustering analysis. It also draws results from association analysis of disease types within the re-hospitalized patient group. The obtained results were analyzed for their effect on hospital profits. Thus, by applying data mining techniques to database marketing in traditional medicine, the characteristics of patient clients can be identified, allowing the factors that affect hospital profits to be derived objectively. Practical application of database marketing as presented in this study should fundamentally improve the efficiency and vitality of hospital management.

Combined Application of Data Imbalance Reduction Techniques Using Genetic Algorithm (유전자 알고리즘을 활용한 데이터 불균형 해소 기법의 조합적 활용)

  • Jang, Young-Sik;Kim, Jong-Woo;Hur, Joon
    • Journal of Intelligence and Information Systems, v.14 no.3, pp.133-154, 2008
  • The data imbalance problem, which can be encountered in data mining classification tasks, typically means that one class has far more or far fewer instances than the other classes. A number of techniques have been proposed to solve it, based on re-sampling with replacement, adjusting decision thresholds, and adjusting the costs of the different classes. In this paper, we study the feasibility of combining the previously proposed techniques, and suggest a combination method that uses a genetic algorithm to find the optimal combination ratio of the techniques. To improve the prediction accuracy of the minority class, we determine the combination ratio using the F-value of the minority class as the fitness function of the genetic algorithm. To compare its performance with that of the single techniques and of a matrix-style combination with random percentages, we performed experiments on four public datasets that have been widely used to compare methods for the data imbalance problem. The experimental results demonstrate the usefulness of the proposed method.
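
The fitness function mentioned above, the F-value of the minority class, can be sketched as follows. This is a minimal Python illustration of the standard F-measure only; the genetic algorithm and the re-sampling techniques are not shown, and the labels and predictions are hypothetical toy data. The `beta` parameter is an assumption (the abstract does not say which weighting was used), defaulting to the balanced F1.

```python
def f_measure(y_true, y_pred, positive, beta=1.0):
    """F-measure of the class `positive`, combining precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical imbalanced toy data: class 1 is the minority (2 of 8).
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0]

score = f_measure(y_true, y_pred, positive=1)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"minority F-value: {score:.2f}, accuracy: {accuracy:.2f}")
```

On this toy data overall accuracy looks better than the minority-class F-value, which is why an imbalance-aware fitness function is preferable to raw accuracy.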

Improving the Effectiveness of Customer Classification Models: A Pre-segmentation Approach (사전 세분화를 통한 고객 분류모형의 효과성 제고에 관한 연구)

  • Chang, Nam-Sik
    • Information Systems Review, v.7 no.2, pp.23-40, 2005
  • Discovering customers' behavioral patterns from large data sets and providing customers with corresponding services or products are critical components of managing a business today. However, the diversity of customer needs, coupled with limited resources, suggests that companies should put more effort into understanding and managing specific groups of customers rather than the entire customer base. The key premise of this paper is that the behavioral patterns extracted from specific groups of customers differ from those extracted from the entire customer base. This paper proposes pre-segmentation before developing customer classification models. We collected three sets of customer demographic and transactional data from a credit card company, a telecommunication company, and an insurance company in Korea, and then segmented customers by major variables. Churn prediction models were developed from each segment and from the whole data set using the decision tree induction approach, and were compared in terms of the hit ratio and the simplicity of the generated rules.

Candidate Marker Identification from Gene Expression Data with Attribute Value Discretization and Negation (속성값 이산화 및 부정값 허용을 하는 의사결정트리 기반의 유전자 발현 데이터의 마커 후보 식별)

  • Lee, Kyung-Mi;Lee, Keon-Myung
    • Journal of the Korean Institute of Intelligent Systems, v.21 no.5, pp.575-580, 2011
  • With growing expectations for personalized medicine, it is becoming important to analyze medical information from a molecular biology perspective. Gene expression data are a representative source for observing the microscopic phenomena of biological activity. In gene expression data analysis, a major concern is to identify markers that can be used to predict disease occurrence, progression or recurrence at the molecular level. Existing methods for identifying candidate markers mainly depend on statistical hypothesis tests. This paper proposes a search method based on decision tree induction to identify candidate markers that consist of multiple genes. The proposed method discretizes numeric expression levels into three categorical values and allows the genes of a candidate marker to be expressed by the negation of a categorical value as well as by the value itself. It is desirable for a marker to include only a limited number of genes, so the method is designed to find candidate markers with a restricted number of genes.
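
The discretization and negation described above can be sketched in Python as follows. This is a hedged illustration, not the authors' method: the category names (`low`, `mid`, `high`), the cut points, the gene names, and the `(gene, category, negated)` condition format are all assumptions made for the example.

```python
def discretize(level, low_cut, high_cut):
    """Map a numeric expression level to one of three categorical values."""
    if level < low_cut:
        return "low"
    if level > high_cut:
        return "high"
    return "mid"

def matches(sample, marker, low_cut, high_cut):
    """Check a sample against a marker given as (gene, category, negated).

    A negated condition holds when the gene's category is anything
    other than the one named.
    """
    for gene, category, negated in marker:
        same = discretize(sample[gene], low_cut, high_cut) == category
        if same == negated:
            return False
    return True

# Hypothetical sample and marker: geneA is high AND geneB is NOT high.
sample = {"geneA": 8.1, "geneB": 2.0}
marker = [("geneA", "high", False), ("geneB", "high", True)]

print(matches(sample, marker, low_cut=3.0, high_cut=7.0))
```

Allowing negation lets a single condition cover two of the three categories, so a marker can stay short, which fits the stated goal of restricting the number of genes per marker.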

A Study on Improving Speech Recognition Rate (H/W, S/W) of Speech Impairment by Neurological Injury (신경학적 손상에 의한 언어장애인 음성 인식률 개선(H/W, S/W)에 관한 연구)

  • Lee, Hyung-keun;Kim, Soon-hub;Yang, Ki-Woong
    • Journal of the Korea Institute of Information and Communication Engineering, v.23 no.11, pp.1397-1406, 2019
  • In everyday mobile phone calls between non-disabled people and people whose speech is impaired by neurological injury, communication accuracy is often hindered by a combination of reduced pronunciation accuracy caused by the impairment and the speaker's individual pronunciation features. To address this problem, the H/W improvement is a MEMS (micro-electro-mechanical systems) microphone device that includes an induction line, which artificially corrects difficult vocalizations according to the oral characteristics of the speech-impaired speaker and thereby improves out-of-vocabulary words. The S/W improvement is a decision tree with an invert function, together with an improved matrix-vector RNN method that takes continuous word characteristics into account. Considering the characteristics of both H/W and S/W, a similar dictionary was created, contributing to improved speech intelligibility for smooth communication.