• Title/Summary/Keyword: C4.5 의사 결정 트리

Search Result 18, Processing Time 0.021 seconds

Search Tree Generation for Efficient Management of Business Process Repository in e-commerce Delivery Exception Handling (전자상거래 배송업무의 예외처리용 프로세스 저장소의 효과적 관리를 위한 검색트리 생성)

  • Choi, Doug-Won;Shin, Jin-Gyu
    • Journal of Intelligence and Information Systems
    • /
    • v.14 no.4
    • /
    • pp.147-160
    • /
    • 2008
  • BPMS(business process management system) facilitates defining new processes or updating existing processes. However, processing of exceptional or nonroutine task requires the intervention of domain experts or introduction of the situation specific resolution process. This paper assumes sufficient amount of business process exception handling cases are stored in the process repository. Since the retrieval of the best exception handling process requires a good understanding about the exceptional situation, context awareness is an important issue. To facilitate the understanding of exceptional situation and to enable the efficient selection of the best exception handling process, we adopted the 'situation variable' and 'decision variable' construct. A case example for exception handling in the e-commerce delivery process is provided to illustrate how the proposed construct works. Application of the C5.0 algorithm guarantees the construction of an optimum search tree. It also implies that an efficient search path has been identified for the context aware selection of the best exception handling process.

  • PDF

Buying Customer Classification in Automotive Corporation with Decision Tree (의사결정트리를 통한 자동차산업의 구매패턴분류)

  • Lee, Byoung-Yup;Park, Yong-Hoon;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association
    • /
    • v.10 no.2
    • /
    • pp.372-380
    • /
    • 2010
  • Generally, data mining is the process of analyzing data from different perspectives and summarizing it into useful information that can be used to increase revenue, cuts costs, or both. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Data mining is one of the fastest growing field in the computer industry. Because of According to computer technology has been improving, Massive customer data has stored in database. Using this massive data, decision maker can extract the useful information to make a valuable plan with data mining. Data mining offers service providers great opportunities to get closer to customer. Data mining doesn't always require the latest technology, but it does require a magic eye that looks beyond the obvious to find and use the hidden knowledge to drive marketing strategies. Automotive market face an explosion of data arising from customer but a rate of increasing customer is getting lower. therefore, we need to determine which customer are profitable clients whom you wish to hold. This paper builds model of customer loyalty detection and analyzes customer buying patterns in automotive market with data mining using decision tree as a quinlan C4.5 and basic statics methods.

Classification Accuracy Improvement for Decision Tree (의사결정트리의 분류 정확도 향상)

  • Rezene, Mehari Marta;Park, Sanghyun
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2017.04a
    • /
    • pp.787-790
    • /
    • 2017
  • Data quality is the main issue in the classification problems; generally, the presence of noisy instances in the training dataset will not lead to robust classification performance. Such instances may cause the generated decision tree to suffer from over-fitting and its accuracy may decrease. Decision trees are useful, efficient, and commonly used for solving various real world classification problems in data mining. In this paper, we introduce a preprocessing technique to improve the classification accuracy rates of the C4.5 decision tree algorithm. In the proposed preprocessing method, we applied the naive Bayes classifier to remove the noisy instances from the training dataset. We applied our proposed method to a real e-commerce sales dataset to test the performance of the proposed algorithm against the existing C4.5 decision tree classifier. As the experimental results, the proposed method improved the classification accuracy by 8.5% and 14.32% using training dataset and 10-fold crossvalidation, respectively.

A Comparative Study on the Performance of Intrusion Detection using Decision Tree and Artificial Neural Network Models (의사결정트리와 인공 신경망 기법을 이용한 침입탐지 효율성 비교 연구)

  • Jo, Seongrae;Sung, Haengnam;Ahn, Byunghyuk
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.11 no.4
    • /
    • pp.33-45
    • /
    • 2015
  • Currently, Internet is used an essential tool in the business area. Despite this importance, there is a risk of network attacks attempting collection of fraudulence, private information, and cyber terrorism. Firewalls and IDS(Intrusion Detection System) are tools against those attacks. IDS is used to determine whether a network data is a network attack. IDS analyzes the network data using various techniques including expert system, data mining, and state transition analysis. This paper tries to compare the performance of two data mining models in detecting network attacks. They are decision tree (C4.5), and neural network (FANN model). I trained and tested these models with data and measured the effectiveness in terms of detection accuracy, detection rate, and false alarm rate. This paper tries to find out which model is effective in intrusion detection. In the analysis, I used KDD Cup 99 data which is a benchmark data in intrusion detection research. I used an open source Weka software for C4.5 model, and C++ code available for FANN model.

Missing Pattern Matching of Rough Set Based on Attribute Variations Minimization in Rough Set (속성 변동 최소화에 의한 러프집합 누락 패턴 부합)

  • Lee, Young-Cheon
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.10 no.6
    • /
    • pp.683-690
    • /
    • 2015
  • In Rough set, attribute missing values have several problems such as reduct and core estimation. Further, they do not give some discernable pattern for decision tree construction. Now, there are several methods such as substitutions of typical attribute values, assignment of every possible value, event covering, C4.5 and special LEMS algorithm. However, they are mainly substitutions into frequently appearing values or common attribute ones. Thus, decision rules with high information loss are derived in case that important attribute values are missing in pattern matching. In particular, there is difficult to implement cross validation of the decision rules. In this paper we suggest new method for substituting the missing attribute values into high information gain by using entropy variation among given attributes, and thereby completing the information table. The suggested method is validated by conducting the same rough set analysis on the incomplete information system using the software ROSE.

Efficient Feature Selection Based Near Real-Time Hybrid Intrusion Detection System (근 실시간 조건을 달성하기 위한 효과적 속성 선택 기법 기반의 고성능 하이브리드 침입 탐지 시스템)

  • Lee, Woosol;Oh, Sangyoon
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.5 no.12
    • /
    • pp.471-480
    • /
    • 2016
  • Recently, the damage of cyber attack toward infra-system, national defence and security system is gradually increasing. In this situation, military recognizes the importance of cyber warfare, and they establish a cyber system in preparation, regardless of the existence of threaten. Thus, the study of Intrusion Detection System(IDS) that plays an important role in network defence system is required. IDS is divided into misuse and anomaly detection methods. Recent studies attempt to combine those two methods to maximize advantagesand to minimize disadvantages both of misuse and anomaly. The combination is called Hybrid IDS. Previous studies would not be inappropriate for near real-time network environments because they have computational complexity problems. It leads to the need of the study considering the structure of IDS that have high detection rate and low computational cost. In this paper, we proposed a Hybrid IDS which combines C4.5 decision tree(misuse detection method) and Weighted K-means algorithm (anomaly detection method) hierarchically. It can detect malicious network packets effectively with low complexity by applying mutual information and genetic algorithm based efficient feature selection technique. Also we construct upgraded the the hierarchical structure of IDS reusing feature weights in anomaly detection section. It is validated that proposed Hybrid IDS ensures high detection accuracy (98.68%) and performance at experiment section.

Implementation of Fatigue Identification System using C4.5 Algorithm (C4.5 알고리즘을 이용한 피로도 식별 시스템 구현)

  • Jin, You Zhen;Lee, Deok-Jin
    • Journal of the Korea Convergence Society
    • /
    • v.10 no.8
    • /
    • pp.21-26
    • /
    • 2019
  • This paper proposes a fatigue recognition method using the C4.5 algorithm. Based on domestic and international studies on fatigue evaluation, we have completed the fatigue self - assessment scale in combination with lifestyle and cultural characteristics of Chinese people. The scales used in the text were applied to 58 sub items and were used to assess the type and extent of fatigue. These items fall into four categories that measure physical fatigue, mental fatigue, personal habits, and fatigue outcomes. The purpose of this study is to analyze the leading causes of fatigue formation and to recognize the degree of fatigue, thereby increasing the personal interest in fatigue and reducing the risk of cerebrovascular disease due to excessive fatigue. The recognition rate of the fatigue recognition system using the C4.5 algorithm was 85% on average, confirming the usefulness of this proposal.

Medical Diagnosis Problem Solving Based on the Combination of Genetic Algorithms and Local Adaptive Operations (유전자 알고리즘 및 국소 적응 오퍼레이션 기반의 의료 진단 문제 자동화 기법 연구)

  • Lee, Ki-Kwang;Han, Chang-Hee
    • Journal of Intelligence and Information Systems
    • /
    • v.14 no.2
    • /
    • pp.193-206
    • /
    • 2008
  • Medical diagnosis can be considered a classification task which classifies disease types from patient's condition data represented by a set of pre-defined attributes. This study proposes a hybrid genetic algorithm based classification method to develop classifiers for multidimensional pattern classification problems related with medical decision making. The classification problem can be solved by identifying separation boundaries which distinguish the various classes in the data pattern. The proposed method fits a finite number of regional agents to the data pattern by combining genetic algorithms and local adaptive operations. The local adaptive operations of an agent include expansion, avoidance and relocation, one of which is performed according to the agent's fitness value. The classifier system has been tested with well-known medical data sets from the UCI machine learning database, showing superior performance to other methods such as the nearest neighbor, decision tree, and neural networks.

  • PDF