• Title/Summary/Keyword: Tree data


A Feature Analysis of Industrial Accidents Using C4.5 Algorithm (C4.5 알고리즘을 이용한 산업 재해의 특성 분석)

  • Leem, Young-Moon;Kwag, Jun-Koo;Hwang, Young-Seob
    • Journal of the Korean Society of Safety / v.20 no.4 s.72 / pp.130-137 / 2005
  • The decision tree algorithm is a data mining technique that performs grouping or prediction by dividing a population of interest into several sub-groups. It can characterize each group and can be used to detect differences among types of industrial accidents. This paper uses the C4.5 algorithm for the feature analysis. The data set consists of 24,887 records selected from a total of 25,159 collected over two years of observation of industrial accidents in Korea. For the purpose of this paper, one target variable and eight independent variables are defined by type of industrial accident. After grouping, the tree has 222 nodes in total, of which 151 are leaf nodes. The paper reports an acceptable level of accuracy (%) and error rate (%) as measures of the quality of the generated trees. Its objective is to analyze how efficiently the C4.5 algorithm classifies industrial accident data and thereby to identify potential weak points in disaster risk grouping.
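
C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the entropy of the split itself. The following is a minimal sketch of that criterion for categorical attributes only. The records and the attribute names (`industry`, `shift`, `accident`) are hypothetical and are not taken from the paper's data set.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, attr, target):
    """Gain ratio of splitting `rows` on the categorical attribute `attr`."""
    n = len(rows)
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    # Expected entropy of the target after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy([r[target] for r in rows]) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy([r[attr] for r in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records (assumption, for illustration only).
records = [
    {"industry": "construction", "shift": "day", "accident": "fall"},
    {"industry": "construction", "shift": "night", "accident": "fall"},
    {"industry": "manufacturing", "shift": "day", "accident": "caught-in"},
    {"industry": "manufacturing", "shift": "night", "accident": "caught-in"},
]
best = max(["industry", "shift"], key=lambda a: gain_ratio(records, a, "accident"))
```

In this toy set `industry` separates the accident types perfectly while `shift` carries no information, so `industry` would be chosen for the root split. A full C4.5 implementation also handles continuous attributes, missing values, and pruning, none of which are shown here.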

A Study on University Big Data-based Student Employment Roadmap Recommendation (대학 빅데이터 기반 학생 취업 로드맵 추천에 관한 연구)

  • Park, Sangsung
    • Journal of Korea Society of Digital Industry and Information Management / v.17 no.3 / pp.1-7 / 2021
  • The number of new students at many domestic universities is declining. Private universities in particular, which depend heavily on tuition, face an existential crisis. Amid the shrinking school-age population, universities are striving to recruit new students by improving the quality of education and raising the student employment rate. Universities are increasingly using their accumulated big data to prepare recruitment measures, and a representative example is the analysis of factors that affect student employment. Existing studies of employment-influencing factors have applied quantitative models such as regression analysis to university big data. However, the factors affecting employment differ by major, and the analysis needs to reflect this. In this paper, the factors affecting employment are analyzed by major using data from University C and a decision tree model. Based on the analysis results, a student employment roadmap is recommended for each major. In the experiment, four decision tree models were constructed for each major, and the factors affecting employment and a roadmap were derived by major.

Extraction of the Tree Regions in Forest Areas Using LIDAR Data and Ortho-image (라이다 자료와 정사영상을 이용한 산림지역의 수목영역추출)

  • Kim, Eui Myoung
    • Journal of Korean Society for Geospatial Information Science / v.21 no.2 / pp.27-34 / 2013
  • Growing concern about global warming has increased interest in forest resources as a means of reducing greenhouse gases. Until now, forest resource data have been obtained by plotting from aerial photographs or satellite images. Imaging data have a drawback, however: measurements such as tree height are inaccurate in dense forest areas. In this context, this study presents a data processing method that isolates individual trees within forested areas using LIDAR data and ortho-images, which yields tree heights more efficiently and accurately. For the LIDAR data, a normalized digital surface model is generated and tree points are extracted from it by local maxima filtering. For the ortho-images, object-oriented image classification is applied to extract forest areas. The final tree points are obtained by combining the LIDAR and ortho-image results. In an experiment in the Yongin area, the authors analyzed the merits and demerits of using LIDAR data or ortho-images alone and obtained information on individual trees within forested areas by combining the two data sources, verifying the efficiency of the proposed method.
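
Local maxima filtering on a normalized digital surface model (surface height minus ground height) can be sketched as follows. This is a simplified illustration on a small grid. The window size, the `min_height` threshold, and the height values are assumptions and do not come from the paper.

```python
def local_maxima(ndsm, window=1, min_height=2.0):
    """Return (row, col) cells that are strictly higher than every neighbor.

    `ndsm` is a 2D list of canopy heights; `window` is the neighborhood
    radius in cells; cells below `min_height` are ignored as non-tree.
    """
    rows, cols = len(ndsm), len(ndsm[0])
    tops = []
    for r in range(rows):
        for c in range(cols):
            h = ndsm[r][c]
            if h < min_height:
                continue
            neighbors = [
                ndsm[i][j]
                for i in range(max(0, r - window), min(rows, r + window + 1))
                for j in range(max(0, c - window), min(cols, c + window + 1))
                if (i, j) != (r, c)
            ]
            if all(h > v for v in neighbors):
                tops.append((r, c))
    return tops


# Hypothetical canopy heights in metres (assumption, for illustration only).
ndsm = [
    [0.0, 1.0, 0.5, 0.0, 0.0],
    [1.0, 8.0, 2.0, 0.5, 0.0],
    [0.5, 2.0, 1.0, 3.0, 2.0],
    [0.0, 0.5, 3.0, 9.5, 4.0],
    [0.0, 0.0, 2.0, 4.0, 3.0],
]
tree_tops = local_maxima(ndsm)
```

Here the two peaks at cells (1, 1) and (3, 3) are reported as candidate tree tops. In a pipeline like the one described, such candidates would then be kept only where the ortho-image classification indicates forest.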

Podiatric Clinical Diagnosis using Decision Tree Data Mining (결정트리 데이터마이닝을 이용한 족부 임상 진단)

  • Kim, Jin-Ho;Park, In-Sik;Kim, Bong-Ok;Yang, Yoon-Seok;Won, Yong-Gwan;Kim, Jung-Ja
    • Journal of the Institute of Electronics Engineers of Korea CI / v.48 no.2 / pp.28-37 / 2011
  • With growing interest in healthy living, podiatry, which covers the diagnosis, treatment, and prevention of foot and leg disorders, has attracted wide attention, yet research in Korea remains inactive. In addition, because most previous data analysis studies took quantitative approaches, a reasonable level of reliability for clinical application could not be guaranteed. Clinical data mining applies various data mining methods to clinical data to support experts' decisions on diagnosis and treatment. Because a decision tree explains the analysis procedure well and produces results that are easy to interpret, it is simple to apply to clinical problems. This study applies a decision tree to derive diagnostic rules by disease type from diagnosis data on 2,620 feet of 1,310 patients (633 males, 677 females) collected at a shoe clinic (Department of Rehabilitation Medicine, Chungnam National University Hospital). Fifteen foot diseases were classified according to 22 foot disease factors, yielding 64 diagnostic rules. Decision trees built for five classes (infant, child, adolescent, adult, and total) were also used to analyze and compare the relationships between disease characteristics and factors. The results can serve as qualitative, useful knowledge for clinical experts and as a tool for effective and accurate diagnosis.

RSP-DS: Real Time Sequential Patterns Analysis in Data Streams (RSP-DS: 데이터 스트림에서의 실시간 순차 패턴 분석)

  • Shin Jae-Jyn;Kim Ho-Seok;Kim Kyoung-Bae;Bae Hae-Young
    • Journal of Korea Multimedia Society / v.9 no.9 / pp.1118-1130 / 2006
  • Existing pattern analysis algorithms for data stream environments have focused on improving performance and using memory effectively. However, when new data arrive, these algorithms must analyze the patterns again and regenerate the pattern tree. This requires heavy computation in settings that need real-time pattern analysis. This paper proposes a method that continuously analyzes the patterns of incoming data streams in real time. The method analyzes patterns quickly and then obtains real-time patterns by updating the previously analyzed ones. Incoming data streams are divided into sequences by a time-based window, and information about the sequences is stored in a hash table. When the number of sequences exceeds a predefined bound, patterns are analyzed from the hash table. The patterns form a pattern tree, and patterns created later update it, so real-time patterns are always maintained in the tree. During pattern analysis, a new pattern and an existing pattern in the tree may share the same suffix. In that case a pointer is created from the new pattern to the existing one, which reduces the computation spent on analyzing duplicated patterns. Old patterns in the tree are easily deleted on a FIFO basis. The advantage of the algorithm is demonstrated by comparing its performance with an existing method, MILE, under continuously changing patterns, and performance variation is examined by changing several parameters of the algorithm.
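
The first two steps described above, cutting the stream into time-window sequences and recording them in a hash table, can be sketched roughly as follows. This is a heavily simplified illustration and not the RSP-DS algorithm itself: it only counts adjacent item pairs, and the window width, support threshold, and event data are assumptions.

```python
from collections import defaultdict


def split_into_windows(events, width):
    """Group (timestamp, item) events into sequences by fixed-width time window."""
    windows = defaultdict(list)
    for ts, item in sorted(events):
        windows[ts // width].append(item)
    return [windows[k] for k in sorted(windows)]


def count_pairs(sequences):
    """Hash table: ordered adjacent item pair -> number of sequences containing it."""
    table = defaultdict(int)
    for seq in sequences:
        # set() ensures a pair is counted at most once per sequence.
        for pair in set(zip(seq, seq[1:])):
            table[pair] += 1
    return table


# Hypothetical stream of (timestamp, item) events (assumption).
events = [
    (0, "a"), (1, "b"), (2, "c"),
    (10, "a"), (11, "b"), (12, "d"),
    (20, "b"), (21, "c"),
]
sequences = split_into_windows(events, width=10)
table = count_pairs(sequences)
# Pairs that occur in at least two sequences are treated as frequent here.
frequent = sorted(p for p, n in table.items() if n >= 2)
```

The pattern tree, the suffix pointers, and the FIFO deletion of old patterns would be built on top of a table like this one and are not shown.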


ANALYSIS OF NEIGHBOR-JOINING BASED ON BOX MODEL

  • Cho, Jin-Hwan;Joe, Do-Sang;Kim, Young-Rock
    • Journal of applied mathematics & informatics / v.25 no.1_2 / pp.455-470 / 2007
  • In phylogenetic tree construction, the neighbor-joining algorithm is the best-known method for constructing a trivalent tree from pairwise distance data measured from DNA sequences. The core of the algorithm is its cherry-picking criterion, which is based on the tree structure of each quartet. We give a generalized version of the criterion based on the exact box model of quartets, known as the tight span of a metric. We also show by experiment why neighbor-joining and the quartet consistency count method give similar performance.
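
For reference, the standard neighbor-joining selection step picks the pair (i, j) that minimizes Q(i, j) = (n - 2) d(i, j) - Σ_k d(i, k) - Σ_k d(j, k). The sketch below implements only this classical criterion, not the generalized box-model criterion proposed in the paper. The distance matrix is an illustrative five-taxon example and is an assumption.

```python
from itertools import combinations


def pick_cherry(dist):
    """Return the index pair (i, j) minimizing the classical NJ Q-criterion."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]

    def q(i, j):
        return (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]

    return min(combinations(range(n), 2), key=lambda pair: q(*pair))


# Illustrative symmetric distance matrix for five taxa (assumption).
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
cherry = pick_cherry(dist)
```

Note that taxa 3 and 4 have the smallest raw distance (3), yet the criterion selects taxa 0 and 1 (Q = -50 versus -48), which illustrates that neighbor-joining does not simply join the closest pair.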

An Improvement Video Search Method for VP-Tree by using a Trigonometric Inequality

  • Lee, Samuel Sangkon;Shishibori, Masami;Han, Chia Y.
    • Journal of Information Processing Systems / v.9 no.2 / pp.315-332 / 2013
  • This paper presents an approach for improving the use of the VP-tree in video indexing and searching. A vantage-point tree, or VP-tree, is one of the metric space-based indexing methods used in multimedia database searches and data retrieval. Instead of relying on the Euclidean distance as a measure of search space, the proposed approach uses a trigonometric inequality to compress the search range, which in turn improves search performance. A test on 10,000 video files shows that the method reduces search time by 5-12% compared with the existing method that uses the AESA algorithm.

Enhanced Routing Algorithm for ZigBee using a Family Set of a Destination Node (목적지의 가족집합을 이용한 향상된 ZigBee 라우팅 알고리즘)

  • Shin, Hyun-Jae;Ahn, Sae-Young;Jo, Young-Jun;An, Sun-Shin
    • The Transactions of The Korean Institute of Electrical Engineers / v.59 no.12 / pp.2329-2336 / 2010
  • Hierarchical tree routing is an inefficient method of transmitting data in a wireless sensor network. ZigBee routing, designed to remedy this inefficiency, performs tree routing only when the destination node is not among a router's neighbor nodes. We propose the TFSR algorithm, which improves on ZigBee routing. TFSR generates a family set of the destination node, comprising its parent node and its child nodes and beyond, and uses this information for routing. According to simulation results, TFSR reduces routing costs by over 30 percent compared with hierarchical tree routing and ZigBee routing.
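
The difference between pure hierarchical tree routing and the neighbor-table shortcut described above can be sketched as follows. This is a simplified model, not the TFSR algorithm: it uses an explicit parent map rather than ZigBee address arithmetic, and the topology and neighbor table are hypothetical.

```python
def is_descendant(parent, node, ancestor):
    """True if `ancestor` lies on the path from `node` up to the root."""
    while node is not None:
        if node == ancestor:
            return True
        node = parent.get(node)
    return False


def next_hop(parent, neighbors, current, dest):
    """Choose the next hop from `current` toward `dest`."""
    if dest in neighbors.get(current, ()):
        return dest  # shortcut: destination is a direct radio neighbor
    if is_descendant(parent, dest, current):
        # Tree routing downward: forward to the child whose subtree holds dest.
        node = dest
        while parent[node] != current:
            node = parent[node]
        return node
    return parent[current]  # tree routing upward


def route(parent, neighbors, src, dest):
    """Full path from src to dest as a list of node ids."""
    path = [src]
    while path[-1] != dest:
        path.append(next_hop(parent, neighbors, path[-1], dest))
    return path


# Hypothetical tree: 0 is the coordinator; 1 and 2 are routers (assumption).
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
tree_path = route(parent, {}, 4, 5)             # pure hierarchical tree routing
shortcut_path = route(parent, {1: {5}}, 4, 5)   # router 1 can hear node 5
```

With an empty neighbor table the packet climbs to the root and descends again (four hops), while the neighbor shortcut at router 1 cuts the path to two hops. A family-set scheme would extend such shortcuts to cases where a neighbor is a relative of the destination rather than the destination itself.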

Multicast Tree to Minimize Maximum Delay in Dynamic Overlay Network

  • Lee Chae-Y.;Baek Jin-Woo
    • Proceedings of the Korean Operations and Management Science Society Conference / 2006.05a / pp.1609-1615 / 2006
  • Overlay multicast is an effective alternative to IP multicast. Traditional IP multicast is not widely deployed because of the complexity of IP multicast technology and the lack of applications, whereas overlay multicast can be deployed easily because it reduces the complexity of network routers. Overlay multicast resides on top of a densely connected IP network. In a multimedia streaming service over an overlay multicast tree, real-time data is sensitive to end-to-end delay, so developing an algorithm suited to this network environment is very important. In this paper, we are interested in minimizing the maximum end-to-end delay in an overlay multicast tree. The problem is formulated as a degree-bounded minimum delay spanning tree problem, which is known to be NP-hard. We develop a tabu search heuristic with intensification and diversification strategies. Robust experimental results show that the heuristic is comparable to the optimal solution and applicable in real time.
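
Any local search for this problem needs a way to score a candidate tree. A minimal sketch of such an evaluation follows: it computes the maximum delay from the source to any node and checks the degree bound. It is not the authors' tabu search, measuring delay from a single source is an assumption, and the node names and delays are hypothetical.

```python
from collections import defaultdict


def evaluate(edges, source, degree_bound):
    """Return (max delay from source, feasible) for a tree of (u, v, delay) edges."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    # Feasible only if no overlay node serves more neighbors than allowed.
    feasible = all(len(nbrs) <= degree_bound for nbrs in adj.values())
    # Depth-first traversal accumulating delay along the unique tree paths.
    delay = {source: 0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in delay:
                delay[v] = delay[u] + w
                stack.append(v)
    return max(delay.values()), feasible


# Hypothetical overlay trees with delays in milliseconds (assumption).
star = [("s", "a", 10), ("s", "b", 20), ("s", "c", 15)]
chain = [("s", "a", 10), ("s", "b", 20), ("a", "c", 12)]
star_score = evaluate(star, "s", degree_bound=2)
chain_score = evaluate(chain, "s", degree_bound=2)
```

The star has the lower maximum delay (20) but violates the degree bound of 2 at the source, while the second tree is feasible with a maximum delay of 22. This trade-off between delay and fan-out is what makes the problem hard.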


Multi-Interval Discretization of Continuous-Valued Attributes for Constructing Incremental Decision Tree (증분 의사결정 트리 구축을 위한 연속형 속성의 다구간 이산화)

  • Baek, Jun-Geol;Kim, Chang-Ouk;Kim, Sung-Shick
    • Journal of Korean Institute of Industrial Engineers / v.27 no.4 / pp.394-405 / 2001
  • Since most real-world application data involve continuous-valued attributes, properly addressing the discretization process is an important problem in constructing a decision tree. A continuous-valued attribute is typically discretized during decision tree generation by recursively partitioning its range into two intervals. In this paper, by removing the restriction to binary discretization, we present a hybrid multi-interval discretization algorithm that divides the range of a continuous-valued attribute into multiple intervals. An experiment using a semiconductor etching machine verified that the algorithm constructs a more efficient incremental decision tree than previously proposed discretization algorithms.
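
The conventional binary discretization that the paper generalizes can be sketched as a search for the single cut point that minimizes the weighted class entropy of the two resulting intervals. This shows only that binary baseline, not the hybrid multi-interval algorithm, and the attribute values and labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_binary_cut(values, labels):
    """Return the threshold that minimizes weighted entropy of the two sides."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("inf"), None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut between identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
        if weighted < best[0]:
            best = (weighted, cut)
    return best[1]


# Hypothetical sensor readings and quality labels (assumption).
values = [10, 20, 30, 40, 50, 60]
labels = ["low", "low", "low", "high", "high", "high"]
threshold = best_binary_cut(values, labels)
```

Applying this cut recursively to each side yields the usual two-interval-at-a-time discretization. A multi-interval method instead chooses several cut points for the attribute at once.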
