• Title/Summary/Keyword: Data Tree

Modeling of Environmental Survey by Decision Trees

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Journal of the Korean Data and Information Science Society, v.15 no.4, pp.759-771, 2004
  • The decision tree approach is most useful in classification problems, where it divides the search space into rectangular regions. Decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing, fraud detection, data reduction and variable screening, and category merging. We analyze Gyeongnam social indicator survey data using decision tree techniques to extract environmental information. The resulting decision trees can be used for environmental preservation and improvement.

Comparison among Algorithms for Decision Tree based on Sasang Constitutional Clinical Data (사상체질 임상자료 기반 의사결정나무 생성 알고리즘 비교)

  • Jin, Hee-Jeong;Lee, Su-Kyung;Lee, Si-Woo
    • Korean Journal of Oriental Medicine, v.17 no.2, pp.121-127, 2011
  • Objectives: In the clinical field, it is important to understand the factors that affect a given disease or symptom. To this end, many researchers apply data mining methods to the clinical data they have collected. One efficient data mining method is decision tree induction. Many researchers have tried to find the best split criterion for decision trees; however, various split criteria coexist. Methods: In this paper, we applied several split criteria (Information Gain, Gini Index, Chi-Square) to Sasang constitutional clinical information and compared the resulting decision trees in order to find the optimal split criterion. Results & Conclusion: By analyzing the decision trees produced with different split measures, we found that BMI and body measurement factors are important factors for Sasang constitution. The decision tree using information gain had the highest accuracy. However, which decision tree produces the highest accuracy changes depending on the given data, so researchers have to find a proper split criterion for their data by understanding the attributes of that data.
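
As an illustration of how two of the split criteria named above score a candidate attribute, here is a minimal Python sketch. It is not the authors' implementation: the function names, the categorical `bmi_band` attribute, and the toy constitution labels are all hypothetical, and the chi-square criterion is omitted for brevity.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_score(values, labels, impurity):
    """Impurity reduction from partitioning the labels by a categorical attribute.

    With impurity=entropy this is information gain; with impurity=gini it is
    the decrease in Gini index.
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    weighted = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - weighted

# Hypothetical toy data (not taken from the paper).
bmi_band = ["high", "high", "low", "low", "high", "low"]
constitution = ["TE", "TE", "SE", "SE", "TE", "SY"]

info_gain = split_score(bmi_band, constitution, entropy)
gini_gain = split_score(bmi_band, constitution, gini)
print(f"information gain: {info_gain:.4f}, Gini decrease: {gini_gain:.4f}")
```

The two criteria share the same structure and differ only in the impurity function, which is why they can rank attributes differently on different data sets.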

Feature-Based Image Retrieval using SOM-Based R*-Tree

  • Shin, Min-Hwa;Kwon, Chang-Hee;Bae, Sang-Hyun
    • Proceedings of the KAIS Fall Conference, 2003.11a, pp.223-230, 2003
  • Feature-based similarity retrieval has become an important research issue in multimedia database systems. The features of multimedia data are useful for discriminating between multimedia objects (e.g., documents, images, video, and music scores). For example, images are represented by their color histograms, texture vectors, and shape descriptors, which are usually high-dimensional data. The performance of conventional multidimensional data structures (e.g., the R-tree family, K-D-B tree, grid file, and TV-tree) tends to deteriorate as the number of dimensions of the feature vectors increases. The R*-tree is the most successful variant of the R-tree. In this paper, we propose a SOM-based R*-tree as a new indexing method for high-dimensional feature vectors. The SOM-based R*-tree combines a SOM and an R*-tree to achieve search performance that scales better to high dimensionalities. Self-Organizing Maps (SOMs) provide a mapping from high-dimensional feature vectors onto a two-dimensional space. The mapping preserves the topology of the feature vectors. The map is called a topological feature map, and it preserves the mutual relationships (similarity) in the feature space of the input data, clustering mutually similar feature vectors in neighboring nodes. Each node of the topological feature map holds a codebook vector. A best-matching-image-list (BMIL) holds the similar images that are closest to each codebook vector. In a topological feature map, there are empty nodes in which no image is classified. When we build an R*-tree, we use the codebook vectors of the topological feature map with the empty nodes eliminated, since empty nodes cause unnecessary disk accesses and degrade retrieval performance. We experimentally compare the retrieval time cost of a SOM-based R*-tree with that of a SOM and an R*-tree using color feature vectors extracted from 40,000 images. The results show that the SOM-based R*-tree outperforms both the SOM and the R*-tree, owing to the reduced number of nodes required to build the R*-tree and the lower retrieval time cost.
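
The step of assigning images to codebook vectors and discarding empty nodes can be sketched in Python as follows. This is only an illustration under stated assumptions: the codebook is taken as already trained, the names `nearest_node` and `build_bmil` and the toy vectors are hypothetical, and no R*-tree is actually built; the sketch only produces the entries that would be inserted into one.

```python
def nearest_node(vec, codebook):
    """Index of the codebook vector closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])),
    )

def build_bmil(features, codebook):
    """Map each non-empty node index to the ids of the images that best match it."""
    bmil = {}
    for image_id, vec in features.items():
        bmil.setdefault(nearest_node(vec, codebook), []).append(image_id)
    return bmil

# Hypothetical 2x2 map with already-trained codebook vectors (assumption).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# Hypothetical image feature vectors.
features = {"img_a": (0.1, 0.1), "img_b": (0.9, 0.95), "img_c": (0.2, 0.0)}

bmil = build_bmil(features, codebook)
# Only nodes that received at least one image would be indexed; empty nodes are skipped.
index_entries = [codebook[i] for i in sorted(bmil)]
print(bmil)
print(index_entries)
```

In this toy run, two of the four nodes receive no image, so only two codebook vectors would go into the index.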

A Decision Tree Approach for Identifying Defective Products in the Manufacturing Process

  • Choi, Sungsu;Battulga, Lkhagvadorj;Nasridinov, Aziz;Yoo, Kwan-Hee
    • International Journal of Contents, v.13 no.2, pp.57-65, 2017
  • Recently, due to the significance of Industry 4.0, the manufacturing industry has been developing globally. The manufacturing industry conventionally generates a large volume of data, often related to processes, lines, and products. In this paper, we analyzed the causes of defective products in the manufacturing process using the decision tree technique, which is a well-known technique in data mining. We used data collected from the domestic manufacturing industry, including Manufacturing Execution System (MES) data, Point of Production (POP) data, equipment data accumulated directly in the equipment, in-process/external air-conditioning sensor data, and static electricity data. We propose to implement a model using the C4.5 decision tree algorithm. Specifically, the proposed decision tree model is built from the components of a specific part. We propose to identify the state of the products in which the defect occurred and compare it with the generated decision tree model to determine the cause of the defect.

Improved Decision Tree Algorithms by Considering Variables Interaction (교호효과를 고려한 향상된 의사결정나무 알고리듬에 관한 연구)

  • Kwon, Keunseob;Choi, Gyunghyun
    • Journal of Korean Institute of Industrial Engineers, v.30 no.4, pp.267-276, 2004
  • Much previous research on decision trees has focused on splitting criteria and the optimization of tree size. Nowadays the quantity of data is increasing and the relationships among variables are becoming very complex. As a result, trees come to have many unnecessary nodes and leaves, and confidence in the explanations and forecasts of the decision tree falls. In this research report, we propose decision tree algorithms that consider the interaction of predictor variables. A generic algorithm, the k-1 Algorithm, which deals with interaction through combinations of all predictor variables, is presented. We then present an extended version, the k-k Algorithm, which considers interaction at every k-depth through combinations of some predictor variables. We also present an improved algorithm that introduces a control parameter to these algorithms. The algorithms are tested on real-world credit card data, census data, bank data, etc.

Development of Decision Tree Program based on Web for Analyzing Clinical Information of Sasang Constitutional Medicine (사상체질 임상정보 분석을 위한 웹 기반의 의사결정 나무 프로그램 개발)

  • Jin, Hee-Jeong;Kim, Myoung-Geun;Kim, Jong-Yeol
    • Korean Journal of Oriental Medicine, v.14 no.3, pp.81-87, 2008
  • Sasang Constitutional Medicine (SCM) is a traditional medical theory based on constitutional medicine in Korea. It is most important that a person's SCM type be determined accurately before any Sasang treatment is applied. To this end, many studies have tried to diagnose the SCM type using constitutional clinical data. The decision tree is a tree-structured data mining methodology. Recently, in the Korean traditional medicine community, there have been several efforts to find diagnostic tools using the decision tree method. We therefore developed a web-based decision tree program for analyzing constitutional clinical information. It accepts various clinical data as input and offers a filtering function for selecting the clinical data to be used. Using this program, we can find factors that influence SCM types.

An Application of Decision Tree Method for Fault Diagnosis of Induction Motors

  • Tran, Van Tung;Yang, Bo-Suk;Oh, Myung-Suck
    • Proceedings of the Korea Committee for Ocean Resources and Engineering Conference, 2006.11a, pp.54-59, 2006
  • The decision tree is one of the most effective and widely used methods for building classification models. Researchers from various disciplines such as statistics, machine learning, pattern recognition, and data mining have considered the decision tree method an effective solution to problems in their fields. In this paper, an application of the decision tree method to classifying the faults of induction motors is proposed. Features are calculated from the original experimental data to obtain useful information as attributes. These data are then assigned classes based on our experience before being used as inputs to the decision tree. A total of nine classes are defined. An implementation of the decision tree written in Matlab is applied to these data.

Performance of Spatial Join Operations using Multi-Attribute Access Methods (다중-속성 색인기법을 이용한 공간조인 연산의 성능)

  • 황병연
    • Spatial Information Research, v.7 no.2, pp.271-282, 1999
  • In this paper, we derive an efficient indexing scheme, the SJ tree, which handles multi-attribute data and spatial join operations efficiently. In addition, a number of algorithms for manipulating multi-attribute data are given, together with their computational and I/O complexity. Moreover, we show that the SJ tree is a kind of generalized B-tree. This means that the SJ tree can be easily implemented on top of the built-in B-tree found in most storage managers, since its structure is like that of a B-tree. The spatial join operation with spatial output is benchmarked using the R-tree, B-tree, K-D-B tree, and SJ tree. Results from the benchmark test indicate that the SJ tree outperforms the other indexing schemes on spatial joins with point data.

Analysis of Forest Structure Using LiDAR Data - A Case Study of Forest in Namchon-Dong, Osan - (LiDAR 데이터를 이용한 산림구조 분석 - 오산시 남촌동의 산림을 대상으로 -)

  • Lee, Dong-Kun;Ryu, Ji-Eun;Kim, Eun-Young;Jeon, Seong-Woo
    • Journal of Environmental Impact Assessment, v.17 no.5, pp.279-288, 2008
  • Vertical forest distribution is one of the important factors for understanding various ecological mechanisms such as succession, disturbance, and environmental effects. LiDAR data provide information on both the horizontal and vertical distribution of forest structure. The laser scanner survey provided a point cloud in which the x, y, and z coordinates of the points are known. The objectives of this study were 1) to analyze factors of forest structure such as individual tree isolation, tree height, canopy closure, and tree density using LiDAR data and 2) to compare the forest structure of the outer and interior forest. The study extracted individual trees using a watershed algorithm and interpolated the first returns of the LiDAR data to yield a digital surface model (DSM). The results show that the forest edge is characterized by more isolated individual trees, higher density, lower canopy closure, and lower tree height than the interior forest. LiDAR data are useful for analyzing forest structure. Further study that takes species into account should be undertaken for more accurate results.

A Lifetime-Preserving and Delay-Constrained Data Gathering Tree for Unreliable Sensor Networks

  • Li, Yanjun;Shen, Yueyun;Chi, Kaikai
    • KSII Transactions on Internet and Information Systems (TIIS), v.6 no.12, pp.3219-3236, 2012
  • A tree routing structure is often adopted for many-to-one data gathering and aggregation in sensor networks. For real-time scenarios with lossy wireless links, constructing a maximum-lifetime data gathering tree under a delay constraint is an important problem. In this work, we study the problem of lifetime-preserving and delay-constrained tree construction in unreliable sensor networks. We prove that the problem is NP-complete, and a greedy approximation algorithm is proposed. We use the expected transmission count (ETX) as the link quality indicator, as well as a measure of delay. Our algorithm starts from an arbitrary least-ETX tree and iteratively adjusts the hierarchy of the tree to reduce the load on bottleneck nodes by pruning and grafting its sub-trees. The complexity of the proposed algorithm is $O(N^4)$. Finally, extensive simulations are carried out to verify our approach. Simulation results show that our algorithm provides a longer lifetime in various situations compared to existing data gathering schemes.
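
The starting point of the algorithm, a least-ETX tree rooted at the sink, can be sketched in Python with Dijkstra's algorithm. This is an illustration, not the paper's procedure: it assumes symmetric links whose ETX is the reciprocal of a single delivery probability, the function name `least_etx_tree` and the toy topology are hypothetical, and the pruning and grafting phase is not shown.

```python
import heapq

def least_etx_tree(links, sink):
    """Build a least-ETX tree rooted at the sink.

    links: {(u, v): delivery_probability} for undirected links.
    Assumption: ETX of a link is 1 / delivery_probability.
    Returns (parent, cost), where cost[n] is the total path ETX from n to the sink.
    """
    adj = {}
    for (u, v), p in links.items():
        etx = 1.0 / p
        adj.setdefault(u, []).append((v, etx))
        adj.setdefault(v, []).append((u, etx))

    cost = {sink: 0.0}
    parent = {sink: None}
    heap = [(0.0, sink)]
    while heap:
        c, u = heapq.heappop(heap)
        if c > cost[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            if c + w < cost.get(v, float("inf")):
                cost[v] = c + w
                parent[v] = u
                heapq.heappush(heap, (c + w, v))
    return parent, cost

# Hypothetical four-node network (delivery probabilities are made up).
links = {("sink", "a"): 0.4, ("sink", "b"): 1.0, ("a", "b"): 1.0, ("a", "c"): 0.8}
parent, cost = least_etx_tree(links, "sink")
print(parent)
print(cost)
```

Here node `a` routes through `b` rather than over its lossy direct link to the sink, which is the kind of parent choice that a later load-balancing step would then revisit for bottleneck nodes.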