• Title/Summary/Keyword: C4.5 algorithm

A Feature Analysis of Industrial Accidents Using C4.5 Algorithm (C4.5 알고리즘을 이용한 산업 재해의 특성 분석)

  • Leem, Young-Moon;Kwag, Jun-Koo;Hwang, Young-Seob
    • Journal of the Korean Society of Safety, v.20 no.4 s.72, pp.130-137, 2005
  • The decision tree algorithm is a data mining technique that performs grouping or prediction by dividing a group of interest into several sub-groups. This technique can analyze the features of each group and can be used to detect differences among types of industrial accidents. This paper uses the C4.5 algorithm for the feature analysis. The data set consists of 24,887 features obtained through data selection from a total of 25,159 taken from a 2-year observation of industrial accidents in Korea. For the purpose of this paper, one target value and eight independent variables are detailed by type of industrial accident. There are 222 total tree nodes and 151 leaf nodes after grouping. This paper reports an acceptable level of accuracy (%) and error rate (%) as measures of the accuracy of the created trees. The objective of this paper is to analyze the efficiency of the C4.5 algorithm in classifying types of industrial accident data and thereby identify potential weak points in disaster risk grouping.

Implementation of Fatigue Identification System using C4.5 Algorithm (C4.5 알고리즘을 이용한 피로도 식별 시스템 구현)

  • Jin, You Zhen;Lee, Deok-Jin
    • Journal of the Korea Convergence Society, v.10 no.8, pp.21-26, 2019
  • This paper proposes a fatigue recognition method using the C4.5 algorithm. Based on domestic and international studies on fatigue evaluation, we completed a fatigue self-assessment scale adapted to the lifestyle and cultural characteristics of Chinese people. The scale used in this paper comprises 58 sub-items, which were used to assess the type and extent of fatigue. These items fall into four categories that measure physical fatigue, mental fatigue, personal habits, and fatigue outcomes. The purpose of this study is to analyze the leading causes of fatigue and to recognize the degree of fatigue, thereby increasing personal interest in fatigue and reducing the risk of cerebrovascular disease due to excessive fatigue. The recognition rate of the fatigue recognition system using the C4.5 algorithm was 85% on average, confirming the usefulness of this proposal.

Correlation Analysis of the Frequency and Death Rates in Arterial Intervention using C4.5

  • Jung, Yong Gyu;Jung, Sung-Jun;Cha, Byeong Heon
    • International journal of advanced smart convergence, v.6 no.3, pp.22-28, 2017
  • With the recent development of technologies to manage vast amounts of data, data mining technology has had a major impact on all industries. Data mining is the process of discovering useful correlations hidden in data, extracting actionable information for the future, and using it for decision making. In other words, it is a core process of Knowledge Discovery in Databases (KDD) that transforms input data and derives useful information. It extracts previously unknown information from a large database. In this paper, the frequency and mortality of percutaneous coronary intervention for patients with heart disease were divided by region, and the C4.5 algorithm was used to build a decision tree that analyzes the difference between frequency and mortality across regions.

The Four Color Algorithm (4-색 알고리즘)

  • Lee, Sang-Un
    • Journal of the Korea Society of Computer and Information, v.18 no.5, pp.113-120, 2013
  • This paper proposes an algorithm that proves the NP-complete 4-color theorem with a linear time complexity of $O(n)$. The proposed algorithm accurately halves the vertex set $V$ of the graph $G=(V_1,E_1)$ into the Maximum Independent Set (MIS) $\bar{C_1}$ and the Minimum Vertex Cover Set $C_1$. It then assigns the first color to $\bar{C_1}$ and the second to $\bar{C_2}$, which, along with $C_2$, is halved from the connected graph $G=(V_2,E_2)$, a reduced set of the remaining vertices. Subsequently, the third color is assigned to $\bar{C_3}$, which, along with $C_3$, is halved from the connected graph $G=(V_3,E_3)$, a further reduced set of the remaining vertices. Lastly, denoting $C_3$ as $\bar{C_4}$, the algorithm assigns the fourth color to $\bar{C_4}$. The algorithm successfully obtained the chromatic number ${\chi}(G)=4$ with 100% probability when applied to two actual maps and two planar graphs. The proposed "four color algorithm" could therefore be employed as a general algorithm to determine a four-coloring of planar graphs.
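
The color-by-independent-set idea described in this abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not say how the maximum independent set is found, so the sketch swaps in a simple greedy minimum-degree heuristic, which yields a maximal (not necessarily maximum) independent set and does not guarantee four colors in general. The names `color_by_independent_sets`, `is_proper`, and the example graph `wheel` are hypothetical.

```python
def color_by_independent_sets(adj):
    """Color a graph by repeatedly peeling off an independent set.

    adj maps each vertex to the set of its neighbours (undirected graph).
    ASSUMPTION: the independent set is built greedily (fewest remaining
    neighbours first); the paper's own selection rule is not given here.
    """
    remaining = set(adj)
    colors = {}
    color = 0
    while remaining:
        candidates = set(remaining)
        independent = set()
        while candidates:
            # Pick the candidate with the fewest neighbours still in play;
            # ties are broken by name so the result is deterministic.
            v = min(candidates, key=lambda u: (len(adj[u] & candidates), str(u)))
            independent.add(v)
            candidates -= adj[v] | {v}
        for v in independent:
            colors[v] = color
        remaining -= independent  # the rest plays the role of the vertex cover
        color += 1
    return colors


def is_proper(adj, colors):
    """True if no edge joins two vertices of the same color."""
    return all(colors[u] != colors[v] for u in adj for v in adj[u])


# Wheel graph: hub 0 joined to the 5-cycle 1-2-3-4-5 (planar, needs 4 colors).
wheel = {
    0: {1, 2, 3, 4, 5},
    1: {0, 2, 5},
    2: {0, 1, 3},
    3: {0, 2, 4},
    4: {0, 3, 5},
    5: {0, 4, 1},
}
wheel_colors = color_by_independent_sets(wheel)
```

Checking the result with `is_proper` is worthwhile for any heuristic of this kind, since a greedy peel can use more colors than the chromatic number on other graphs.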

The Adopting C4.5 classification and it's Application for Deinterlacing (디인터레이싱을 위한 C4.5 분류화 기법의 적용 및 구현)

  • Kim, Donghyung
    • Journal of the Korea Academia-Industrial cooperation Society, v.18 no.1, pp.8-14, 2017
  • Deinterlacing is a method to convert interlaced video, including two fields (even and odd), to progressive video. It can be divided into spatial and temporal methods. The deinterlacing method in the spatial domain can easily be hardware-implemented, but yields image degradation if information about the deinterlaced pixel does not exist in the same field. On the other hand, the method in the temporal domain yields a deinterlaced image with higher quality but uses more memory, and hardware implementation is more difficult. Furthermore, the deinterlacing method in the temporal domain degrades image quality when motion is not estimated properly. The proposed method is for deinterlacing in the spatial domain. It uses several deinterlacing methods according to statistical characteristics in neighboring pixel locations. In this procedure, the proposed method uses the C4.5 algorithm, a typical classification algorithm based on entropy for choosing optimal methods from among the candidates. The simulation results show that the proposed algorithm outperforms previous deinterlacing methods in terms of objective and subjective image quality.

Real-time Implementation of Variable Transmission Bit Rate Vocoder Integrating G.729A Vocoder and Reduction of the Computational Amount SOLA-B Algorithm Using the TMS320C5416 (TMS320C5416을 이용한 G.729A 보코더와 계산량 감소된 SOLA-B 알고리즘을 통합한 가변 전송율 보코더의 실시간 구현)

  • 함명규;배명진
    • Journal of the Institute of Electronics Engineers of Korea SP, v.40 no.6, pp.84-89, 2003
  • In this paper, we implement in real time on the TMS320C5416 a variable-bit-rate vocoder that applies the SOLA-B algorithm by Henja to the ITU-T G.729A vocoder, which has a transmission rate of 8 kbps. The proposed method uses the SOLA-B algorithm to shorten the duration of the speech during encoding and restores normal playback speed by extending the duration of the speech during decoding. To reduce the computational load of the SOLA-B algorithm, the cross-correlation function is computed while skipping every 3 samples. The real-time vocoder combining G.729A and the SOLA-B algorithm has a maximum complexity of 10.2 MIPS in the encoder and 2.8 MIPS in the decoder at a transmission rate of 8 kbps. The maximum complexity is 18.5 MIPS in the encoder and 13.1 MIPS in the decoder at 6 kbps, and 18.5 MIPS in the encoder and 13.1 MIPS in the decoder at 4 kbps. The memory used is about 9.7 kwords of program ROM, 4.5 kwords of table ROM, and 5.1 kwords of RAM. The output waveform is shown to be bit-exact with the result of the C simulator. In a speech quality evaluation of the real-time variable-bit-rate vocoder, the MOS score was estimated at 3.69 at 4 kbps.
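
The computation-saving step mentioned in this abstract can be sketched as follows, under one possible reading: the overlap position is found by maximizing a normalized cross-correlation, and the correlation sums use only every third sample. The abstract does not give the exact formulation, so this is an assumption rather than the authors' SOLA-B implementation; `best_overlap_lag` and its parameters are hypothetical names.

```python
import math


def best_overlap_lag(prev_tail, frame, max_lag, step=3):
    """Return the lag in [0, max_lag] that best aligns frame with prev_tail.

    The score is a normalized cross-correlation between prev_tail and
    frame[lag:lag + len(prev_tail)].
    ASSUMPTION: the sums visit only every `step`-th sample, which cuts the
    multiply-accumulate count by roughly a factor of `step`.
    Requires len(frame) >= max_lag + len(prev_tail).
    """
    best_lag, best_score = 0, -math.inf
    overlap = len(prev_tail)
    for lag in range(max_lag + 1):
        num = energy_a = energy_b = 0.0
        for i in range(0, overlap, step):
            a = prev_tail[i]
            b = frame[lag + i]
            num += a * b
            energy_a += a * a
            energy_b += b * b
        denom = math.sqrt(energy_a * energy_b)
        score = num / denom if denom > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic check: a sine with a 40-sample period and a segment cut at offset 7.
frame = [math.sin(2 * math.pi * n / 40) for n in range(200)]
prev_tail = frame[7:7 + 64]
lag_decimated = best_overlap_lag(prev_tail, frame, max_lag=20, step=3)
lag_full = best_overlap_lag(prev_tail, frame, max_lag=20, step=1)
```

On this clean synthetic signal the decimated search and the full search agree; on real speech the decimated score is only an approximation, which is the trade-off against the reduced computation.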

Optimization of Luffing-Tower Crane Location in Tall Building Construction

  • Lee, Dongmin;Lim, Hyunsu;Cho, Hunhee;Kang, Kyung-In
    • Journal of Construction Engineering and Project Management, v.5 no.4, pp.7-11, 2015
  • The luffing-tower crane (T/C) is a key facility used in the vertical and horizontal transportation of materials in tall building construction. Locating the crane in an optimal position is an essential task in the initial stages of construction planning. This paper proposes a new optimization model that locates the luffing T/C in the position that minimizes transportation time. A newly developed mathematical formula is suggested to calculate the transportation time of the luffing T/C correctly. An optimization algorithm, the Harmony Search (HS) algorithm, was used, and the results show that HS solves the optimization problem with high performance in a short period of time. In a case study, the proposed model offered a better position for the T/C than the previous heuristic approach.
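
A minimal sketch of a basic Harmony Search loop applied to a location problem is given below. The cost function is a placeholder (sum of straight-line distances to a few hypothetical supply and demand points), not the transportation-time formula developed in the paper, and the parameter values (`hmcr`, `par`, `bandwidth`, memory size) are illustrative assumptions.

```python
import math
import random


def harmony_search(cost, bounds, memory_size=10, iterations=500,
                   hmcr=0.9, par=0.3, bandwidth=1.0, seed=0):
    """Minimize cost(x) over box bounds with a basic Harmony Search."""
    rng = random.Random(seed)
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(memory_size)]
    costs = [cost(h) for h in memory]
    history = [min(costs)]  # best cost after initialization and each iteration
    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                value = rng.choice(memory)[d]  # memory consideration
                if rng.random() < par:
                    value += rng.uniform(-bandwidth, bandwidth)  # pitch adjustment
            else:
                value = rng.uniform(lo, hi)  # random selection
            new.append(min(max(value, lo), hi))  # clamp to the bounds
        new_cost = cost(new)
        worst = max(range(memory_size), key=costs.__getitem__)
        if new_cost < costs[worst]:
            memory[worst], costs[worst] = new, new_cost
        history.append(min(costs))
    best = min(range(memory_size), key=costs.__getitem__)
    return memory[best], costs[best], history


# HYPOTHETICAL stand-in for transportation time: total distance from the
# crane position to four made-up supply/demand points on a 30 m x 20 m site.
points = [(0.0, 0.0), (30.0, 0.0), (30.0, 20.0), (0.0, 20.0)]


def travel_cost(pos):
    x, y = pos
    return sum(math.hypot(x - px, y - py) for px, py in points)


site_bounds = [(0.0, 30.0), (0.0, 20.0)]
best_pos, best_cost, history = harmony_search(travel_cost, site_bounds)
```

Because a new harmony only ever replaces the worst entry in memory, the best cost recorded in `history` never increases from one iteration to the next.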

A Study on Selection of Split Variable in Constructing Classification Tree (의사결정나무에서 분리 변수 선택에 관한 연구)

  • 정성석;김순영;임한필
    • The Korean Journal of Applied Statistics, v.17 no.2, pp.347-357, 2004
  • Selecting the split variable is a very important step in constructing a classification tree. The efficiency of a classification tree algorithm can be evaluated by its variable selection bias and its variable selection power. C4.5 has strongly biased variable selection because variables with many distinct values are favored, and QUEST has low variable selection power when a continuous predictor variable does not deviate from the normal distribution. In this thesis, we propose the SRT algorithm, which overcomes these drawbacks of C4.5 and QUEST. Simulations were performed to compare SRT with C4.5 and QUEST. The results show that SRT is characterized by low variable selection bias and robust variable selection power.
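
The effect of many distinct values on entropy-based split selection can be illustrated with a small sketch. It does not reproduce the paper's simulations or the SRT algorithm. It only compares raw information gain with gain ratio, the normalization C4.5 applies, on a made-up data set in which one attribute is a unique record identifier. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def split_stats(values, labels):
    """Return (information gain, gain ratio) for splitting labels by values."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the partition itself
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


# Toy data (made up): 8 records, a two-valued attribute that is informative
# but imperfect, and a record id that is unique per row.
labels = ["y", "y", "n", "n", "y", "n", "y", "n"]
attr = ["a", "a", "b", "b", "a", "b", "a", "a"]
record_id = list(range(8))

attr_gain, attr_ratio = split_stats(attr, labels)
id_gain, id_ratio = split_stats(record_id, labels)
```

Here the unique identifier reaches the maximum possible information gain even though it cannot generalize, while gain ratio penalizes it through the large split information. Normalizing this way reduces the preference for many-valued attributes but need not remove it entirely.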

Inductive Learning using Theory-Refinement Knowledge-Based Artificial Neural Network (이론정련 지식기반인공신경망을 이용한 귀납적 학습)

  • 심동희
    • Journal of Korea Multimedia Society, v.4 no.3, pp.280-285, 2001
  • Since KBANN (knowledge-based artificial neural network), which combines an inductive learning algorithm with an analytical learning algorithm, was proposed, several methods that modify KBANN, such as TopGen, TR-KBANN, and THRE-KBANN, have been proposed. However, these methods can be applied only when a domain theory exists. This paper proposes an algorithm that represents a problem as a KBANN based only on instances, without a domain theory. The domain theory represented in the KBANN can then be refined by THRE-KBANN. In experiments on some inductive learning problem domains, this algorithm performed better than C4.5.

Data Mining Algorithm Based on Fuzzy Decision Tree for Pattern Classification (퍼지 결정트리를 이용한 패턴분류를 위한 데이터 마이닝 알고리즘)

  • Lee, Jung-Geun;Kim, Myeong-Won
    • Journal of KIISE:Software and Applications, v.26 no.11, pp.1314-1323, 1999
  • With the widespread use of computers, it has become easy to generate and collect data, and there is a need to acquire useful knowledge from data automatically. In data mining, the acquired knowledge needs to be both accurate and comprehensible. In this paper, we propose an efficient fuzzy rule generation algorithm based on a fuzzy decision tree for data mining. We combine the comprehensibility of rules generated by decision trees such as ID3 and C4.5 with the reasoning and expressive power of fuzzy sets. In particular, fuzzy rules allow us to effectively classify patterns with non-axis-parallel decision boundaries, which are difficult to handle with methods that draw decision boundaries parallel to the attribute axes. In our algorithm, we first determine an appropriate set of membership functions for each attribute of the data using histogram analysis. Given this set of membership functions, we then construct a fuzzy decision tree in a way similar to ID3 and C4.5. We also apply a genetic algorithm to tune the initial set of membership functions. We tested our algorithm on several benchmark data sets, including the IRIS data, the Wisconsin breast cancer data, and the credit screening data. The experimental results show that our method is more efficient in performance and in comprehensibility of rules than other methods, including C4.5.