• Title/Summary/Keyword: Tree data

KDBcs-Tree : An Efficient Cache Conscious KDB-Tree for Multidimentional Data (KDBcs-트리 : 캐시를 고려한 효율적인 KDB-트리)

  • Yeo, Myung-Ho;Min, Young-Soo;Yoo, Jae-Soo
    • Journal of KIISE:Databases / v.34 no.4 / pp.328-342 / 2007
  • We propose a new cache-conscious index structure for efficiently processing frequently updated data. The proposed structure is based on the KDB-Tree, a representative index structure that uses space partitioning. To increase cache-line utilization, we propose a data compression technique and a pointer elimination technique. To demonstrate its advantages, we compare the proposed structure with variants of the CR-tree (e.g., the FF CR-tree and the SE CR-tree) in a variety of environments. The experimental results show that the proposed index structure achieves about 85%, 97%, and 86% performance improvements over the existing index structures in terms of insertion, update, and cache utilization, respectively.

Performance Evaluation of the FP-tree and the DHP Algorithms for Association Rule Mining (FP-tree와 DHP 연관 규칙 탐사 알고리즘의 실험적 성능 비교)

  • Lee, Hyung-Bong;Kim, Jin-Ho
    • Journal of KIISE:Databases / v.35 no.3 / pp.199-207 / 2008
  • The FP-tree (Frequent Pattern Tree) algorithm for mining association rules was proposed to improve mining performance by dramatically reducing DB scan overhead, and it is widely regarded as outperforming algorithms based on other approaches. However, the FP-tree algorithm needs more memory because it has to store all transactions of the DB that include frequent itemsets. This paper implements an FP-tree algorithm on a general-purpose UNIX system and compares it, in terms of memory usage and execution time, with the DHP (Direct Hashing and Pruning) algorithm, which uses a hash tree and a direct hash table. Surprisingly, the results show that the FP-tree algorithm performs worse than the DHP algorithm in some cases, even when the system memory is sufficient for the FP-tree. The characteristics of the test data are as follows: the size of the DB is 100K, the total number of items is $1K{\sim}7K$, the average transaction length is $5{\sim}10$, and the average size of maximal frequent itemsets is $2{\sim}12$ (typical attributes of data for large-scale convenience stores).
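
The memory cost mentioned in the abstract above comes from keeping a prefix tree of the transactions in memory. The following is a minimal sketch of the standard two-scan FP-tree construction, not the implementation evaluated in the paper; the names `FPNode`, `build_fp_tree`, and `count_nodes` are hypothetical, and the header table and mining step are omitted.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item, its count, and child links."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # First DB scan: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None)
    # Second DB scan: insert the frequent items of each transaction,
    # ordered by descending support (ties broken by item name).
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


def count_nodes(node):
    """Number of nodes in the tree, a rough proxy for its memory footprint."""
    return 1 + sum(count_nodes(c) for c in node.children.values())


# Hypothetical toy database of four transactions.
db = [["a", "b"], ["b", "c", "d"], ["a", "b", "d"], ["a", "b", "c"]]
tree, supports = build_fp_tree(db, min_support=2)
```

Transactions that share a frequent prefix share nodes, so the tree is smaller than the raw database only when such prefixes are common; with short, diverse transactions the node count can approach the total number of item occurrences.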

Applying Decision Tree Algorithms for Analyzing HS-VOSTS Questionnaire Results

  • Kang, Dae-Ki
    • Journal of Engineering Education Research / v.15 no.4 / pp.41-47 / 2012
  • Data mining and knowledge discovery techniques have been shown to be effective in automatically finding hidden underlying rules inside large databases. Meanwhile, analyzing, assessing, and applying students' survey data is very important in science and engineering education for reasons such as quality improvement, the engineering design process, and innovative education. Among such surveys, analyzing students' views on science-technology-society can be helpful to engineering education: although most research on the philosophy of science has shown that science is one of the most difficult concepts to define precisely, it is still important to keep an eye on science, pseudo-science, and scientific misconduct. In this paper, we report the experimental results of applying decision tree induction algorithms to analyze the questionnaire results of high school students' views on science-technology-society (HS-VOSTS). Empirical results for various settings of decision tree induction on HS-VOSTS results from students at one South Korean university indicate that decision tree induction algorithms can be applied successfully and effectively to automated knowledge discovery from students' survey data.

Tree-Structured Nonlinear Regression

  • Chang, Young-Jae;Kim, Hyeon-Soo
    • The Korean Journal of Applied Statistics / v.24 no.5 / pp.759-768 / 2011
  • Tree algorithms have been widely developed for regression problems. One attractive feature of a regression tree is its flexibility of fit, because it can capture the nonlinearity of data well. In particular, data with sudden structural breaks, such as oil prices and exchange rates, can be fitted well with a simple mixture of a few piecewise linear regression models. Because split points are determined by chi-squared statistics related to the residuals from fitting piecewise linear models, and the split variable is chosen by an objective criterion, we obtain a reasonable fit that agrees with the visual interpretation of the data. Piecewise linear regression by a regression tree can be used as a good fitting method and can be applied to datasets with large fluctuations.
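
The idea of fitting a structural break with two linear pieces can be illustrated with a single split. This is only a sketch under a simplifying assumption: the split point is chosen by minimizing the total sum of squared errors, which is a stand-in for the chi-squared residual statistics described in the abstract above. The function names `fit_line` and `best_split` and the `min_leaf` parameter are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def best_split(xs, ys, min_leaf=3):
    """Try every split position and keep the one with the lowest total SSE.

    Returns (split_x, left_fit, right_fit), where each fit is (a, b, sse).
    """
    pairs = sorted(zip(xs, ys))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    best = None
    for i in range(min_leaf, len(xs) - min_leaf + 1):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between identical x values
        left = fit_line(xs[:i], ys[:i])
        right = fit_line(xs[i:], ys[i:])
        total = left[2] + right[2]
        if best is None or total < best[0]:
            best = (total, (xs[i - 1] + xs[i]) / 2, left, right)
    if best is None:
        raise ValueError("not enough distinct points to split")
    return best[1], best[2], best[3]


# Hypothetical series with a break between x = 4 and x = 5.
x = list(range(10))
y = [float(v) for v in range(5)] + [20.0 + 2.0 * v for v in range(5, 10)]
split, left, right = best_split(x, y)
```

A full regression tree would apply the same search recursively to each side and stop when a further split no longer improves the fit enough.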

Adaptive Decision Tree Algorithm for Data Mining in Real-Time Machine Status Database (실시간 기계 상태 데이터베이스에서 데이터 마이닝을 위한 적응형 의사결정 트리 알고리듬)

  • Baek, Jun-Geol;Kim, Kang-Ho;Kim, Sung-Shick;Kim, Chang-Ouk
    • Journal of Korean Institute of Industrial Engineers / v.26 no.2 / pp.171-182 / 2000
  • Over the last five years, data mining has drawn much attention from researchers and practitioners because of its many application domains. This article presents an adaptive decision tree algorithm for dynamically inferring the causes of machine failure from a real-time, large-scale machine status database. Among the many data mining methods, intelligent decision tree building algorithms are of particular interest because they enable decision rules to be generated automatically from the tree, which facilitates the construction of expert systems. Experiments using a semiconductor etching machine verified that our model outperforms previously proposed decision tree models.

Construction of Street Trees Information Management Program Using GIS and Database (GIS와 데이터베이스를 이용한 가로수정보 관리프로그램 구축)

  • Kim, Hee-Nyeon;Jung, Sung-Gwan;Park, Kyung-Hun;You, Ju-Han
    • Current Research on Agriculture and Life Sciences / v.26 / pp.45-54 / 2008
  • The purpose of this research is to develop a street tree management program for more effective street tree management. The key point of the program is to relate spatial data and attribute data, which is the main concept in GIS (Geographic Information System). To do this, MapObjects, ESRI's mapping and GIS component library, was used to process spatial data, and Access, developed by Microsoft, was used to manipulate attribute data. Visual Basic was used to design and develop the user interfaces and procedures, to relate the two sorts of data, and finally to complete the application. The relational data model was adopted to design the tables and their relations, and Antenucci's GIS development model was selected to design and complete the program. The application is composed of management data and reference data. The management data include the location of each street tree, its growth condition, its surrounding environment, the characteristics of the tree, equipment, management records, and so on. The reference data include general information about trees, blight, and insects.

Ensemble Gene Selection Method Based on Multiple Tree Models

  • Mingzhu Lou
    • Journal of Information Processing Systems / v.19 no.5 / pp.652-662 / 2023
  • Identifying highly discriminating genes is a critical step in tumor recognition tasks based on microarray gene expression profile data and machine learning. Gene selection based on tree models has been the subject of several studies. However, these methods are based on a single-tree model and are often not robust to ultra-high-dimensional microarray datasets, resulting in the loss of useful information and unsatisfactory classification accuracy. Motivated by the limitations of single-tree-based gene selection, this study examined ensemble gene selection methods based on multiple-tree models to improve the classification performance of tumor identification. Specifically, we selected the three most representative tree models: ID3, random forest, and gradient boosting decision tree. Each tree model selects the top-n genes from the microarray dataset based on its intrinsic mechanism. Subsequently, three ensemble gene selection methods were investigated, namely multiple-tree model intersection, multiple-tree model union, and multiple-tree model cross-union. Experimental results on five benchmark public microarray gene expression datasets showed that the multiple-tree model union is significantly superior in classification accuracy to gene selection based on a single tree model and to other competitive gene selection methods.
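
The intersection and union ensembles named in the abstract above can be sketched with plain set operations over each model's top-n selection. This is an illustration only: the importance scores are made-up values, the names `top_n` and `ensemble_select` are hypothetical, and the cross-union variant is left out because the abstract does not define it.

```python
def top_n(scores, n):
    """Return the n genes with the highest importance score (ties by name)."""
    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    return set(ranked[:n])


def ensemble_select(score_tables, n):
    """Combine the top-n selections of several models.

    Returns (intersection, union) of the per-model gene sets.
    """
    selections = [top_n(scores, n) for scores in score_tables]
    return set.intersection(*selections), set.union(*selections)


# Hypothetical per-model importance scores for four genes.
id3_scores = {"g1": 0.9, "g2": 0.5, "g3": 0.4, "g4": 0.1}
rf_scores = {"g1": 0.7, "g2": 0.2, "g3": 0.6, "g4": 0.3}
gbdt_scores = {"g1": 0.8, "g2": 0.1, "g3": 0.2, "g4": 0.7}

common, combined = ensemble_select([id3_scores, rf_scores, gbdt_scores], n=2)
```

The intersection keeps only genes that every model ranks highly, while the union keeps any gene that at least one model ranks highly, so the union retains more information at the cost of a larger feature set.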

Geohashed Spatial Index Method for a Location-Aware WBAN Data Monitoring System Based on NoSQL

  • Li, Yan;Kim, Dongho;Shin, Byeong-Seok
    • Journal of Information Processing Systems / v.12 no.2 / pp.263-274 / 2016
  • The exceptional development of electronic device technology, the miniaturization of mobile devices, and the development of telecommunication technology have made it possible to monitor human biometric data anywhere and anytime using different types of wearable or embedded sensors. In daily life, mobile devices can collect wireless body area network (WBAN) data, and the location data collected with it is also important for disease analysis. To efficiently analyze WBAN data that includes location information and to support medical analysis services, we propose a geohash-based spatial index method for a location-aware WBAN data monitoring system on a NoSQL database system. The method uses an R-tree-based global tree to organize the real-time location data of a patient and a B-tree-based local tree to manage historical data. This type of spatial index method can support a cloud-based location-aware WBAN data monitoring system. We built a system that supports JavaScript Object Notation (JSON) and Binary JSON (BSON) document data on mobile gateway devices. The proposed spatial index method can efficiently process location-based queries for medical signal monitoring. To evaluate the index method, we simulated a small system with the proposed index on MongoDB, a document-based NoSQL database system, and evaluated its performance.
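
For readers unfamiliar with geohashing, the sketch below shows the standard public geohash encoding, which interleaves longitude and latitude bits and writes them in base 32. It illustrates the general technique only and is not the index structure proposed in the paper; the function name `geohash_encode` is hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=8):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


short_hash = geohash_encode(42.6, -5.6, precision=5)
long_hash = geohash_encode(42.6, -5.6, precision=9)
```

Because each extra character only refines the same cell, a longer hash always starts with the shorter one. Nearby points therefore tend to share prefixes, which is what lets a one-dimensional ordered index such as a B-tree answer many location queries with prefix or range scans.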

J48 and ADTree for forecast of leaving of hospitals

  • Halim, Faisal;Muttaqin, Rizal
    • Korean Journal of Artificial Intelligence / v.4 no.1 / pp.11-13 / 2016
  • Medical technology has developed rapidly to meet the desire for a healthy life, and longer average lifespans have led people to see doctors for many reasons. This study examines the rate of leaving hospitals, covering not only the department of surgery but also the department of internal medicine. Data mining techniques were used, including linear models, trees, classification rules, and association. In particular, the J48 and ADTree decision tree algorithms were applied, and their results were compared. Both algorithms were found to have similar performance. However, the two algorithms were not equivalent, so more detailed experiments are required. More experimental data should be collected in the future so that the analysis can be applied from various points of view. Advances in medical technology bring dreams, hope, and pleasure, and people who suffer from incurable diseases need them. An experimental environment close to reality should be built so that data can be investigated carefully, people of various ages can visit hospitals, and survival rates can increase.

Selection of an Optimal Algorithm among Decision Tree Techniques for Feature Analysis of Industrial Accidents in Construction Industries (건설업의 산업재해 특성분석을 위한 의사결정나무 기법의 상용 최적 알고리즘 선정)

  • Leem Young-Moon;Choi Yo-Han
    • Journal of the Korea Safety Management & Science / v.7 no.5 / pp.1-8 / 2005
  • Rapid industrial advancement, diversified types of business, and unexpected industrial accidents have caused a great deal of human and material damage to many unspecified persons. Although various previous studies have sought to prevent industrial accidents, they only provide managerial and educational policies using frequency analysis and comparative analysis based on data from past industrial accidents. The main objective of this study is to find an optimal algorithm for analyzing industrial accident data, and this paper provides a comparative analysis of four algorithms: CHAID, CART, C4.5, and QUEST. The decision tree algorithm, a typical data mining technique, is used to predict results from objective and quantified data. Enterprise Miner of SAS and AnswerTree of SPSS are used to evaluate the validity of the results of the four algorithms. The sample for this work was chosen from 19,574 records related to the construction industry over three years ($2002\sim2004$) in Korea.
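
The algorithms compared in the abstract above differ mainly in how they score a candidate split. CART is usually associated with Gini impurity and C4.5 with the entropy-based gain ratio, while CHAID and QUEST rely on statistical tests that are not shown here. The sketch below computes the first two criteria on made-up accident labels; the function names and the data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity, the split criterion usually associated with CART."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits, the basis of the ID3/C4.5 criteria."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_gain(parent, children, impurity):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = sum(len(ch) / n * impurity(ch) for ch in children)
    return impurity(parent) - weighted


def gain_ratio(parent, children):
    """C4.5-style gain ratio: information gain divided by split information."""
    n = len(parent)
    split_info = -sum(len(ch) / n * log2(len(ch) / n) for ch in children if ch)
    if split_info == 0:
        return 0.0
    return split_gain(parent, children, entropy) / split_info


# Hypothetical accident types before and after a perfectly separating split.
parent = ["fall"] * 4 + ["cut"] * 4
children = [["fall"] * 4, ["cut"] * 4]

gini_gain = split_gain(parent, children, gini)
info_gain = split_gain(parent, children, entropy)
ratio = gain_ratio(parent, children)
```

On real data the criteria can rank candidate splits differently, which is one reason the resulting trees and their accuracy vary across algorithms.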