• Title/Summary/Keyword: C4.5 Decision Tree

Length of stay in PACU among surgical patients using data mining technique (데이터 마이닝을 활용한 외과수술환자의 회복실 체류시간 분석)

  • Yoo, Je-Bog;Jang, Hee Jung
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.7 / pp.3400-3411 / 2013
  • Data mining is a new approach for extracting useful information through effective analysis of large data sets in numerous fields. This study applied a decision tree model, using Clementine C&RT (Classification & Regression Tree, CART), as the data mining technique. We used this technique to analyze the medical records of 1,500 people. The data were sorted by length of stay in the PACU and divided into 3 groups. The results extracted by the C5.0 decision tree method showed that the important factors related to length of stay in the PACU are type of operation, preoperative EKG abnormality, anesthetics, operative duration, and age.

Implementation of Fatigue Identification System using C4.5 Algorithm (C4.5 알고리즘을 이용한 피로도 식별 시스템 구현)

  • Jin, You Zhen;Lee, Deok-Jin
    • Journal of the Korea Convergence Society / v.10 no.8 / pp.21-26 / 2019
  • This paper proposes a fatigue recognition method using the C4.5 algorithm. Based on domestic and international studies on fatigue evaluation, we developed a fatigue self-assessment scale adapted to the lifestyle and cultural characteristics of Chinese people. The scale used in this paper consists of 58 sub-items that assess the type and extent of fatigue. These items fall into four categories: physical fatigue, mental fatigue, personal habits, and fatigue outcomes. The purpose of this study is to analyze the leading causes of fatigue and to recognize the degree of fatigue, thereby increasing personal awareness of fatigue and reducing the risk of cerebrovascular disease due to excessive fatigue. The recognition rate of the fatigue recognition system using the C4.5 algorithm was 85% on average, confirming the usefulness of this proposal.
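For readers unfamiliar with C4.5, the sketch below illustrates its split criterion, the gain ratio (information gain divided by split information), for a categorical attribute. It is a minimal illustration and not the implementation used in any of the listed papers; the function names and the toy `sleep`/`fatigue` data are hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of a categorical attribute with respect to class labels."""
    total = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: the attribute separates the classes perfectly.
sleep = ["short", "short", "long", "long"]
fatigue = ["high", "high", "low", "low"]
print(gain_ratio(sleep, fatigue))
```

A tree learner in the C4.5 style would compute this score for each candidate attribute and split on the one with the highest value, then recurse on each subset.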

A Study on the Documents's Automatic Classification Using Machine Learning (기계학습을 이용한 문서 자동분류에 관한 연구)

  • Kim, Seong-Hee;Eom, Jae-Eun
    • Journal of Information Management / v.39 no.4 / pp.47-66 / 2008
  • This study introduced machine learning algorithms to overcome the many limitations of manual classification and to provide users with a faster and more accurate classification service. The experimental data consisted of 100 literature titles for each of eight subject categories in MeSH. The algorithms used in the experiments were a neural network, C5.0, CHAID, and KNN. The combination of the neural network and C5.0 recorded a classification accuracy of 83.75%, which was 2.5% and 3.75% higher than that of the neural network alone and C5.0 alone, respectively. This was the highest accuracy among the four classification experiments. Thus, using the neural network and C5.0 together results in higher accuracy than using either technique alone.

Terminology Recognition System based on Machine Learning for Scientific Document Analysis (과학 기술 문헌 분석을 위한 기계학습 기반 범용 전문용어 인식 시스템)

  • Choi, Yun-Soo;Song, Sa-Kwang;Chun, Hong-Woo;Jeong, Chang-Hoo;Choi, Sung-Pil
    • The KIPS Transactions:PartD / v.18D no.5 / pp.329-338 / 2011
  • Terminology recognition, a prerequisite for text mining, information extraction, information retrieval, the semantic web, and question answering, has been intensively studied in a limited range of domains, especially the biomedical domain. Previous work was difficult to apply to general domains because it relied on domain-specific resources, so we propose a domain-independent terminology recognition system based on machine learning that uses a dictionary, syntactic features, and Web search results. The proposed approach achieved an F-score of 80.8, a 6.5% improvement over the related C-value approach, which is widely used and is based on local domain frequencies. In the second experiment, with various combinations of unithood features, the method combined with NGD (Normalized Google Distance) showed the best performance, an F-score of 81.8. We applied three machine learning methods, logistic regression, C4.5, and SVMs, and obtained the best score from the decision tree method, C4.5.
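The abstract above does not say how NGD was computed. As a hedged illustration, the sketch below implements the standard definition of Normalized Google Distance from search hit counts. The function name `ngd` and its parameters are hypothetical, and the index size `n` is assumed to be larger than either single-term count.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance between two terms.

    fx, fy: number of pages containing each term
    fxy:    number of pages containing both terms
    n:      total number of indexed pages (assumed > fx and > fy)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # The terms never occur, or never co-occur: treat as maximally distant.
        return float("inf")
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Hypothetical hit counts: two terms that always appear together.
print(ngd(1000, 1000, 1000, 10**6))
```

Smaller values indicate terms that tend to co-occur, which is why a measure of this kind can serve as a unithood feature for candidate multi-word terms.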

Missing Pattern Matching of Rough Set Based on Attribute Variations Minimization in Rough Set (속성 변동 최소화에 의한 러프집합 누락 패턴 부합)

  • Lee, Young-Cheon
    • The Journal of the Korea institute of electronic communication sciences / v.10 no.6 / pp.683-690 / 2015
  • In rough set analysis, missing attribute values cause several problems, for example in reduct and core estimation. Further, they do not yield a discernible pattern for decision tree construction. Several methods exist, such as substitution of typical attribute values, assignment of every possible value, event covering, C4.5, and the special LEMS algorithm. However, they mainly substitute frequently appearing or common attribute values. Thus, decision rules with high information loss are derived when important attribute values are missing in pattern matching. In particular, it is difficult to implement cross-validation of the decision rules. In this paper we suggest a new method that replaces missing attribute values with values of high information gain, using the entropy variation among the given attributes, and thereby completes the information table. The suggested method is validated by conducting the same rough set analysis on the incomplete information system using the software ROSE.

Spatial Information Data Construction and Data Mining Analysis for Topography Investigation of Land Characteristics (토지특성 고저조사를 위한 공간정보 데이터 구축과 데이터 마이닝 분석)

  • Choi, Jin Ho;Kim, Jun Hyun
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.37 no.6 / pp.507-516 / 2019
  • The investigation of land characteristics is an important task for calculating official land prices and the standard comparison table of land prices. Therefore, it should be done objectively and consistently. However, the current investigation system relies mainly on the investigator's subjective judgment, so the objectivity and consistency of the investigation are not guaranteed. In this study, we first defined the problem by analyzing the current land topography investigation method. Then, to investigate land topography, the geometry of each parcel is quantified using spatial information and supplied to a decision-tree-based method (C4.5) to produce the final result. This study aimed to extract the topographic characteristics of parcels from spatial information and apply them to C4.5, thereby suggesting a method for addressing these problems. The findings showed approximately 93.5% agreement for the topography classification estimated with rules learned by C4.5.

Efficient Feature Selection Based Near Real-Time Hybrid Intrusion Detection System (근 실시간 조건을 달성하기 위한 효과적 속성 선택 기법 기반의 고성능 하이브리드 침입 탐지 시스템)

  • Lee, Woosol;Oh, Sangyoon
    • KIPS Transactions on Computer and Communication Systems / v.5 no.12 / pp.471-480 / 2016
  • Recently, damage from cyber attacks on infrastructure, national defence, and security systems has been gradually increasing. In this situation, the military recognizes the importance of cyber warfare and is establishing cyber systems in preparation, regardless of whether a threat exists. Thus, research is required on Intrusion Detection Systems (IDS), which play an important role in network defence. IDS approaches are divided into misuse detection and anomaly detection. Recent studies attempt to combine the two methods to maximize the advantages and minimize the disadvantages of both. The combination is called Hybrid IDS. Previous approaches are unsuitable for near real-time network environments because of their computational complexity. This creates a need for research on IDS structures that have a high detection rate and low computational cost. In this paper, we propose a Hybrid IDS that hierarchically combines a C4.5 decision tree (misuse detection method) and the Weighted K-means algorithm (anomaly detection method). It can detect malicious network packets effectively with low complexity by applying an efficient feature selection technique based on mutual information and a genetic algorithm. We also improve the hierarchical structure of the IDS by reusing feature weights in the anomaly detection stage. Experiments validate that the proposed Hybrid IDS achieves high detection accuracy (98.68%) and performance.

A Rule Generation Technique Utilizing a Parallel Expansion Method (병렬확장을 활용한 규칙생성 기법)

  • Lee, Kee-Cheol;Kim, Jin-Bong
    • The Transactions of the Korea Information Processing Society / v.5 no.4 / pp.942-950 / 1998
  • Extracting knowledge from raw data, especially in the form of rules, is very important in data mining, whose aim is to help users who lack knowledge despite an abundance of data. Logic minimization tools derive optimized knowledge from a given ON set and DC set. First, the parallel expansion scheme of logic minimization is extracted and used to obtain initial knowledge, from which final rules are derived that are successfully applicable to real-world data. A prototype system based on this new approach was tested with real-world data, showing that it is as practical as conventional, long-studied decision tree methods such as the C4.5 system.

Medical Diagnosis Problem Solving Based on the Combination of Genetic Algorithms and Local Adaptive Operations (유전자 알고리즘 및 국소 적응 오퍼레이션 기반의 의료 진단 문제 자동화 기법 연구)

  • Lee, Ki-Kwang;Han, Chang-Hee
    • Journal of Intelligence and Information Systems / v.14 no.2 / pp.193-206 / 2008
  • Medical diagnosis can be considered a classification task that identifies disease types from patients' condition data, represented by a set of pre-defined attributes. This study proposes a classification method based on a hybrid genetic algorithm to develop classifiers for multidimensional pattern classification problems related to medical decision making. The classification problem can be solved by identifying separation boundaries that distinguish the various classes in the data pattern. The proposed method fits a finite number of regional agents to the data pattern by combining genetic algorithms and local adaptive operations. The local adaptive operations of an agent include expansion, avoidance, and relocation, one of which is performed according to the agent's fitness value. The classifier system has been tested with well-known medical data sets from the UCI machine learning database, showing superior performance to other methods such as nearest neighbor, decision trees, and neural networks.

Stream-based Biomedical Classification Algorithms for Analyzing Biosignals

  • Fong, Simon;Hang, Yang;Mohammed, Sabah;Fiaidhi, Jinan
    • Journal of Information Processing Systems / v.7 no.4 / pp.717-732 / 2011
  • Classification in biomedical applications is an important task that predicts or classifies an outcome based on a given set of input variables, such as diagnostic tests or the symptoms of a patient. Traditionally, classification algorithms have had to digest a stationary set of historical data to train a decision-tree model, and the learned model could then be used for testing new samples. However, a new breed of classification called stream-based classification can handle continuous data streams, which are ever evolving, unbounded, and unstructured, for instance, biosignal live feeds. These emerging algorithms can potentially be used for real-time classification over biosignal data streams such as EEG and ECG. This paper presents a pioneering effort that studies the feasibility of classification algorithms for analyzing biosignals in the form of infinite data streams. First, a performance comparison is made between traditional and stream-based classification. The results show that accuracy declines intermittently for traditional classification because the model must be re-learned as new data arrive. Second, we show by simulation that biosignal data streams can be processed with a satisfactory level of performance in terms of accuracy, memory requirement, and speed by using a collection of stream-mining algorithms called Optimized Very Fast Decision Trees. The algorithms can effectively serve as a cornerstone technology for real-time classification in future biomedical applications.