• Title/Abstract/Keywords: Data Tree

Search results: 3,320 items (processing time: 0.031 s)

KDBcs-Tree: An Efficient Cache Conscious KDB-Tree for Multidimensional Data

  • 여명호;민영수;유재수
    • 한국정보과학회논문지:데이타베이스 / Vol. 34, No. 4 / pp.328-342 / 2007
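  • In this paper, we propose an index technique for handling updates efficiently in environments where data is updated frequently. The proposed index structure is based on the KDB-tree, one of the representative space-partitioning index techniques, and we propose a data compression technique and a pointer elimination technique to improve cache utilization. To demonstrate the merits of the proposed technique, we compared its performance experimentally with the CR-tree, one of the representative existing cache-conscious index structures. The evaluation shows that the proposed index structure improves insertion performance, update performance, and cache utilization by 85%, 97%, and 86%, respectively, over the existing index technique.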

Performance Evaluation of the FP-tree and the DHP Algorithms for Association Rule Mining

  • 이형봉;김진호
    • 한국정보과학회논문지:데이타베이스 / Vol. 35, No. 3 / pp.199-207 / 2008
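  • The FP-tree (Frequent Pattern Tree) association rule mining algorithm was proposed to improve overall performance by drastically reducing the burden of DB scans, and it is therefore known to perform far better than algorithms based on other techniques. However, the FP-tree algorithm fundamentally has to store in the tree every transaction in the DB that contains a frequent item, so it requires a correspondingly large amount of memory. In this paper, we implement the FP-tree algorithm on a Unix system, a general-purpose operating system, and compare it with the DHP (Direct Hashing and Pruning) algorithm, which uses a hash tree and a direct hash table, in terms of two performance measures: memory consumption and execution time. Contrary to what is commonly believed, the results show that even when system memory is sufficient, there are cases in which the FP-tree algorithm is inferior to the DHP algorithm. These cases arise for data with 100K transactions, 1K to 7K total items, an average transaction length of 5 to 10, and an average frequent itemset size of 2 to 12, a scale applicable to a large convenience store.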

Applying Decision Tree Algorithms for Analyzing HS-VOSTS Questionnaire Results

  • Kang, Dae-Ki
    • 공학교육연구 / Vol. 15, No. 4 / pp.41-47 / 2012
  • Data mining and knowledge discovery techniques have been shown to be effective at automatically finding hidden rules in large databases. Meanwhile, analyzing, assessing, and applying students' survey data is important in science and engineering education for reasons such as quality improvement, the engineering design process, and innovative education. Among such surveys, analysis of students' views on science-technology-society can be helpful to engineering education: although most research on the philosophy of science has shown that science is one of the most difficult concepts to define precisely, it remains important to keep an eye on science, pseudo-science, and scientific misconduct. In this paper, we report the experimental results of applying decision tree induction algorithms to analyze the questionnaire results of high school students' views on science-technology-society (HS-VOSTS). Empirical results for various settings of decision tree induction on HS-VOSTS results from students at one South Korean university indicate that decision tree induction algorithms can be successfully and effectively applied to automated knowledge discovery from students' survey data.

Tree-Structured Nonlinear Regression

  • Chang, Young-Jae;Kim, Hyeon-Soo
    • 응용통계연구 / Vol. 24, No. 5 / pp.759-768 / 2011
  • Tree algorithms have been widely developed for regression problems. One attractive feature of a regression tree is its flexibility of fit, since it can capture nonlinearity in the data. In particular, data with sudden structural breaks, such as oil prices and exchange rates, can be fitted well with a simple mixture of a few piecewise linear regression models. Because split points are determined by chi-squared statistics related to the residuals from fitting piecewise linear models, and the split variable is chosen by an objective criterion, we obtain a reasonable fit that is in line with the visual interpretation of the data. Piecewise linear regression by a regression tree can serve as a good fitting method and can be applied to datasets with large fluctuations.

Adaptive Decision Tree Algorithm for Data Mining in Real-Time Machine Status Database

  • 백준걸;김강호;김성식;김창욱
    • 대한산업공학회지 / Vol. 26, No. 2 / pp.171-182 / 2000
  • Over the last five years, data mining has drawn much attention from researchers and practitioners because of its many application domains. This article presents an adaptive decision tree algorithm for dynamically inferring the causes of machine failure from a real-time, large-scale machine status database. Among the many data mining methods, intelligent decision tree building algorithms are of particular interest because they enable the automatic generation of decision rules from the tree, which facilitates the construction of expert systems. Experiments using a semiconductor etching machine verified that our model outperforms previously proposed decision tree models.

Construction of Street Trees Information Management Program Using GIS and Database

  • 김희년;정성관;박경훈;유주한
    • Current Research on Agriculture and Life Sciences / Vol. 26 / pp.45-54 / 2008
  • The purpose of this research is to develop a street tree management program for more effective street tree management. The principal function of this program is to relate spatial data and attribute data, which is the main concept in GIS (Geographic Information System). To this end, MapObjects, ESRI's set of mapping and GIS components, was used to process spatial data, and Microsoft Access was used to manipulate attribute data. Visual Basic was used to design and develop the user interfaces and procedures, to relate the two kinds of data, and finally to complete the application. The relational data model was adopted to design the tables and their relations, and Antenucci's GIS development model was selected to design and complete the program. The application consists of management data and reference data. The management data include the location of each street tree, its growth condition, its surrounding environment, the characteristics of the tree, equipment, and management records. The reference data include general information about trees, blight, and insects.

Ensemble Gene Selection Method Based on Multiple Tree Models

  • Mingzhu Lou
    • Journal of Information Processing Systems / Vol. 19, No. 5 / pp.652-662 / 2023
  • Identifying highly discriminating genes is a critical step in tumor recognition tasks based on microarray gene expression profile data and machine learning. Gene selection based on tree models has been the subject of several studies. However, these methods are based on a single-tree model, which is often not robust to ultra-high-dimensional microarray datasets, resulting in the loss of useful information and unsatisfactory classification accuracy. Motivated by the limitations of single-tree-based gene selection, this study examined ensemble gene selection methods based on multiple-tree models to improve the classification performance of tumor identification. Specifically, we selected the three most representative tree models: ID3, random forest, and gradient boosting decision tree. Each tree model selects the top-n genes from the microarray dataset based on its intrinsic mechanism. Subsequently, three ensemble gene selection methods were investigated, namely multiple-tree model intersection, multiple-tree module union, and multiple-tree module cross-union. Experimental results on five benchmark public microarray gene expression datasets showed that the multiple-tree module union is significantly superior in classification accuracy to gene selection based on a single tree model and to other competitive gene selection methods.
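The intersection and union steps described in this abstract can be illustrated with a minimal sketch. It assumes each tree model has already produced a per-gene importance score; the score tables, gene names, and function names below are hypothetical and are not taken from the paper. The cross-union variant is omitted because the abstract does not define it.

```python
def top_n(scores, n):
    """Return the n gene names with the highest scores (ties broken by name)."""
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return ranked[:n]


def ensemble_select(score_tables, n):
    """Combine the top-n selections of several models by intersection and union."""
    selections = [set(top_n(scores, n)) for scores in score_tables]
    intersection = set.intersection(*selections)
    union = set.union(*selections)
    return intersection, union


# Hypothetical importance scores standing in for three tree models.
id3_scores = {"g1": 0.9, "g2": 0.5, "g3": 0.1, "g4": 0.4}
rf_scores = {"g1": 0.7, "g2": 0.2, "g3": 0.6, "g4": 0.1}
gbdt_scores = {"g1": 0.8, "g2": 0.3, "g3": 0.2, "g4": 0.9}

inter, union = ensemble_select([id3_scores, rf_scores, gbdt_scores], n=2)
print(sorted(inter))
print(sorted(union))
```

The intersection keeps only genes that every model ranks highly, while the union keeps any gene that at least one model ranks highly, so the union is always the larger (or equal) set.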

Geohashed Spatial Index Method for a Location-Aware WBAN Data Monitoring System Based on NoSQL

  • Li, Yan;Kim, Dongho;Shin, Byeong-Seok
    • Journal of Information Processing Systems / Vol. 12, No. 2 / pp.263-274 / 2016
  • The exceptional development of electronic device technology, the miniaturization of mobile devices, and the development of telecommunication technology have made it possible to monitor human biometric data anywhere and anytime using different types of wearable or embedded sensors. In daily life, mobile devices can collect wireless body area network (WBAN) data, and the location data collected alongside it is also important for disease analysis. To efficiently analyze WBAN data that includes location information and to support medical analysis services, we propose a geohash-based spatial index method for a location-aware WBAN data monitoring system on a NoSQL database system. The method uses an R-tree-based global tree to organize the real-time location data of a patient and a B-tree-based local tree to manage historical data. This type of spatial index method supports a cloud-based location-aware WBAN data monitoring system. To evaluate the proposed method, we built a system that supports JavaScript Object Notation (JSON) and Binary JSON (BSON) document data on mobile gateway devices. The proposed spatial index method can efficiently process location-based queries for medical signal monitoring. To evaluate our index method, we simulated a small system with the proposed index method on MongoDB, a document-based NoSQL database system, and evaluated its performance.
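As background for the geohash-based index mentioned in this abstract, the following is a minimal sketch of standard geohash encoding and of grouping records by geohash prefix. It is not the paper's index: the R-tree global tree and B-tree local tree are not modeled, and the function names and sample readings are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=8):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def group_by_cell(readings, precision=5):
    """Group (lat, lon, value) readings by their geohash cell."""
    cells = {}
    for lat, lon, value in readings:
        cells.setdefault(geohash_encode(lat, lon, precision), []).append(value)
    return cells


# Hypothetical sensor readings: two nearby points and one far away.
readings = [(57.64911, 10.40744, "hr=72"), (57.6492, 10.4075, "hr=75"), (0.0, 0.0, "hr=60")]
cells = group_by_cell(readings)
print(cells)
```

Because nearby points share a geohash prefix, a location-based query can be narrowed to the records stored under a matching key prefix, which is what makes geohashes convenient as keys in a document store.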

J48 and ADTree for forecast of leaving of hospitals

  • Halim, Faisal;Muttaqin, Rizal
    • 한국인공지능학회지 / Vol. 4, No. 1 / pp.11-13 / 2016
  • Medical technology has recently developed rapidly to meet the desire for a healthy life. As average lifespans have lengthened, people see doctors for many reasons. This study examines the rate at which patients leave hospitals, covering not only departments of surgery but also departments of internal medicine. Linear models, trees, classification rules, association, and data mining algorithms were used. In particular, the J48 and ADTree decision tree algorithms were applied, and the investigation was based on the results for both sets of data. Both algorithms were found to have similar performance. They were not equivalent, however, and detailed experiments are required. More experimental data should be collected in the future so that the approach can be applied from various points of view. The development of medical technology gives dreams, hope, and pleasure, and those who suffer from incurable diseases need advanced medical technology. An environment similar to reality should be created so that experiments can be conducted precisely, the data can be investigated carefully, people of various ages can visit hospitals, and survival rates can increase.

Selection of an Optimal Algorithm among Decision Tree Techniques for Feature Analysis of Industrial Accidents in Construction Industries

  • 임영문;최요한
    • 대한안전경영과학회지 / Vol. 7, No. 5 / pp.1-8 / 2005
  • Rapid industrial advancement, the diversification of business types, and unexpected industrial accidents have caused a great deal of human and material damage to many unspecified persons. Although various previous studies have sought to prevent industrial accidents, they provide only managerial and educational policies using frequency analysis and comparative analysis based on data from past industrial accidents. The main objective of this study is to find an optimal algorithm for analyzing industrial accident data, and this paper provides a comparative analysis of four algorithms: CHAID, CART, C4.5, and QUEST. The decision tree algorithm, a typical data mining technique, is used to predict results from objective and quantified data. Enterprise Miner of SAS and AnswerTree of SPSS are used to evaluate the validity of the results of the four algorithms. The sample for this work was chosen from 19,574 records related to the construction industry over three years (2002 to 2004) in Korea.