• Title/Summary/Keyword: decision trees


OLAP System and Performance Evaluation for Analyzing Web Log Data (웹 로그 분석을 위한 OLAP 시스템 및 성능 평가)

  • 김지현;용환승
    • Journal of Korea Multimedia Society / v.6 no.5 / pp.909-920 / 2003
  • Nowadays, IT for CRM is growing and developing rapidly. Typical techniques are statistical analysis tools, on-line multidimensional analytical processing (OLAP) tools, and data mining algorithms (such as neural networks, decision trees, and association rules). Among customer data, web log data is very important, and OLAP technology is applied to analyze it multidimensionally so that it can be used efficiently. To build an OLAP cube, multidimensional summary results must be precalculated in order to get fast responses. But as the number of dimensions and sparse cells increases, data explosion becomes serious and OLAP performance decreases. In this paper, we present why web log data sparsity occurs and what kinds of sparsity patterns arise in two and three dimensions for OLAP. Based on this research, we set up multidimensional data models and query models for benchmarking each sparsity pattern. Finally, we evaluate the performance of three OLAP systems (MS SQL 2000 Analysis Service, Oracle Express, and C-MOLAP).
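To make the sparsity problem concrete, the following minimal sketch compares the number of cells in a fully materialized cube with the number of cells that actually hold data. The dimension names and the toy log records are hypothetical and are not taken from the paper; it only illustrates why sparsity tends to grow as dimensions are added.

```python
from math import prod

def cube_sparsity(records, dimensions):
    """Return (total_cells, populated_cells, sparsity) for a dense cube."""
    # Cardinality of each dimension = number of distinct values observed.
    cardinalities = [len({r[d] for r in records}) for d in dimensions]
    total_cells = prod(cardinalities)
    # A cell is populated if at least one record falls into it.
    populated = len({tuple(r[d] for d in dimensions) for r in records})
    return total_cells, populated, 1 - populated / total_cells

# Hypothetical web log records (illustrative only).
logs = [
    {"page": "/home", "hour": 9, "referrer": "search"},
    {"page": "/home", "hour": 10, "referrer": "direct"},
    {"page": "/cart", "hour": 9, "referrer": "search"},
    {"page": "/help", "hour": 23, "referrer": "mail"},
]

two_d = cube_sparsity(logs, ["page", "hour"])
three_d = cube_sparsity(logs, ["page", "hour", "referrer"])
```

In this toy data the populated cell count stays the same when the third dimension is added, while the total cell count triples, so the sparsity ratio rises.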


Data Analysis of Facebook Insights (페이스북 인사이트 데이터 분석)

  • Cha, Young Jun;Lee, Hak Jun;Jung, Yong Gyu
    • The Journal of the Convergence on Culture Technology / v.2 no.1 / pp.93-98 / 2016
  • With the recent rapid development of information technologies, social networking services (SNS) accessed through a variety of mobile devices and smart screens are becoming popular. An SNS is a social-network-based service that takes existing offline relationships online. It is used differently from an online community, although the two are often confused. Modelling algorithms span a variety of techniques, including association, clustering, neural networks, and decision trees. Research is needed on how to use these techniques to exploit large volumes of data effectively. In this paper, we evaluate clustering performance on Facebook Insights data, focusing on the EM algorithm, which is regarded as performing well in clustering. The analysis is based on applying the EM algorithm to experimental data from the State Library of South Australia and observing how its performance changes.

A Machine Learning Approach for Knowledge Base Construction Incorporating GIS Data for Land Cover Classification of Landsat ETM+ Image (지식 기반 시스템에서 GIS 자료를 활용하기 위한 기계 학습 기법에 관한 연구 - Landsat ETM+ 영상의 토지 피복 분류를 사례로)

  • Kim, Hwa-Hwan;Ku, Cha-Yang
    • Journal of the Korean Geographical Society / v.43 no.5 / pp.761-774 / 2008
  • Integration of GIS data and human expert knowledge into digital image processing has long been acknowledged as necessary for improving remote sensing image analysis. We propose an inductive machine learning algorithm for GIS data integration and a rule-based classification method for land cover classification. The proposed method is tested on a land cover classification of a Landsat ETM+ multispectral image and GIS data layers including elevation, aspect, slope, distance to water bodies, distance to road network, and population density. Decision trees and production rules for land cover classification are generated by the C5.0 inductive machine learning algorithm from 350 stratified random point samples. The production rules are used for land cover classification in combination with unsupervised ISODATA classification. The results show that GIS data layers such as elevation, distance to water bodies, and population density can be effectively integrated for rule-based image classification. The intuitive production rules generated by inductive machine learning are easy to understand. The proposed method demonstrates how various GIS data layers can be integrated with remotely sensed imagery in a knowledge base construction framework to improve land cover classification.
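The relationship between a decision tree and production rules can be sketched as follows: every root-to-leaf path becomes one IF-THEN rule. The tree below is hand-written for illustration; its features, thresholds, and class labels are hypothetical and are not output from C5.0 or from the paper's data.

```python
def tree_to_rules(node, conditions=()):
    """Flatten a binary decision tree into (conditions, label) rules."""
    if "label" in node:  # leaf: the path so far is one complete rule
        return [(list(conditions), node["label"])]
    test = f'{node["feature"]} <= {node["threshold"]}'
    negated = f'{node["feature"]} > {node["threshold"]}'
    return (tree_to_rules(node["left"], conditions + (test,))
            + tree_to_rules(node["right"], conditions + (negated,)))

# Hypothetical tree: left branch = test true, right branch = test false.
tree = {
    "feature": "elevation", "threshold": 200,
    "left": {
        "feature": "distance_to_water", "threshold": 50,
        "left": {"label": "wetland"},
        "right": {"label": "urban"},
    },
    "right": {"label": "forest"},
}

rules = tree_to_rules(tree)
for conds, label in rules:
    print("IF " + " AND ".join(conds) + " THEN " + label)
```

Rules in this form are readable without the tree, which is the property the abstract describes as intuitive.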

Classification of Ovarian Cancer Microarray Data based on Intelligent Systems with Marker gene (선별 시스템 기반 표지 유전자를 포함한 난소암 마이크로어레이 데이터 분류)

  • Park, Su-Young;Jung, Chai-Yeoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.3 / pp.747-752 / 2011
  • Microarray classification typically has two striking attributes: (1) classifier design and error estimation are based on remarkably small samples, and (2) cross-validation error estimation is employed in the majority of papers. Microarray data for ovarian cancer consist of the expression levels of tens of thousands of genes, and there is no systematic procedure for analyzing this information instantaneously. In this paper, marker genes are selected by ranking genes according to a statistic, and popular classification rules (linear discriminant analysis, k-nearest-neighbor, and decision trees) are applied to compare classification accuracy with and without marker gene selection. Applying linear classification analysis to the microarray data set containing the marker genes selected by the ANOVA method yielded the highest classification accuracy, 97.78%, and the lowest prediction error estimate.
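Ranking genes by a statistic can be sketched as below, assuming the statistic is the one-way ANOVA F value computed per gene across the sample classes. The gene names, expression values, and class labels are hypothetical; the paper's exact procedure and thresholds are not given in the abstract.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of sample groups.

    Assumes at least two groups and nonzero within-group variance.
    """
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def rank_genes(expression, labels):
    """Sort gene names by descending F value.

    expression: {gene: [value per sample]}; labels: class of each sample.
    """
    classes = sorted(set(labels))
    scores = {}
    for gene, values in expression.items():
        groups = [[v for v, y in zip(values, labels) if y == c] for c in classes]
        scores[gene] = anova_f(groups)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical data: three tumor and three normal samples.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
expression = {
    "gene_a": [5.1, 4.9, 5.0, 1.0, 1.2, 0.8],  # separates the classes well
    "gene_b": [2.0, 3.0, 1.0, 2.5, 1.5, 2.0],  # equal class means
    "gene_c": [3.0, 2.0, 4.0, 1.0, 2.0, 3.0],  # weak separation
}
ranked = rank_genes(expression, labels)
```

The top-ranked genes would then serve as the marker genes passed to the classifier.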

ECG-based Biometric Authentication Using Random Forest (랜덤 포레스트를 이용한 심전도 기반 생체 인증)

  • Kim, JeongKyun;Lee, Kang Bok;Hong, Sang Gi
    • Journal of the Institute of Electronics and Information Engineers / v.54 no.6 / pp.100-105 / 2017
  • This work presents an ECG biometric recognition system for biometric authentication. ECG biometric approaches fall into two major categories: fiducial-based and non-fiducial-based methods. This paper proposes a new non-fiducial framework using the discrete cosine transform (DCT) and a Random Forest (RF) classifier. With the DCT, most of the signal information tends to be concentrated in a few low-frequency components. To provide input to the Random Forest, feature vectors of ECG heartbeats are constructed from the first 40 DCT coefficients. RF is based on the computation of a large number of decision trees. It is relatively fast, robust, and inherently suitable for multi-class problems. Furthermore, a threshold inside the RF classifier sets the trade-off between admission and rejection of an ID. As a result, the proposed method offers a 99.9% recognition rate when tested on the MIT-BIH NSRDB.
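The feature construction step can be sketched as follows, assuming an unnormalized DCT-II and a synthetic, hypothetical heartbeat segment; the paper's exact DCT variant, normalization, and segmentation are not stated in the abstract. The Random Forest stage is omitted.

```python
import math

def dct_ii(signal):
    """Unnormalized DCT-II of a 1-D sequence (direct O(n^2) evaluation)."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        for k in range(n)
    ]

def dct_features(heartbeat, n_coeffs=40):
    """Keep only the first n_coeffs (lowest-frequency) DCT coefficients."""
    return dct_ii(heartbeat)[:n_coeffs]

# Hypothetical heartbeat: 200 samples of a smooth bump, not real ECG data.
beat = [math.exp(-((i - 100) / 20.0) ** 2) for i in range(200)]
features = dct_features(beat)

# Share of squared coefficient magnitude captured by the first 40 terms.
all_coeffs = dct_ii(beat)
captured = sum(c * c for c in features) / sum(c * c for c in all_coeffs)
```

For a smooth signal like this one, `captured` is very close to 1, which illustrates why truncating to the low-frequency coefficients loses little information.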

An Architectural Feature Study on the Restoration of Tongbanga-House at Samcheok (삼척 대이리 통방앗간 복원에 관한 건축적 특성 연구)

  • Choi, Jang-Soon;Kim, Jin-Won
    • Journal of the Korean Institute of Rural Architecture / v.10 no.1 / pp.101-109 / 2008
  • Today it is very hard to find a Tongbanga-millhouse, a structure built beside a streamlet so that a Tongbanga (a kind of water mill) could use the water to polish cereals by pounding, that still looks as it did in the old days. Such millhouses are important from folkloric, architectural, and educational perspectives. The purpose of this study is to analyze the architectural features of the Tongbanga and the millhouse itself in order to determine how the materials and frame members are built and fabricated. The study therefore focuses on the composition principles and fabrication methods of the Tongbanga-millhouse from an architectural standpoint. The millhouse is fabricated in the following sequence: (1) deciding the location of the Tongbanga-millhouse and the Hwak (a large stone mortar); (2) installing three slanted rafters at an angle of 50°, to err on the safe side, and then fifteen more slanted rafters to form a circular cone; (3) installing twigs in circles from bottom to top; (4) covering the frame in multiple layers with flax stalks stripped of their bark; (5) thatching in three layers with the upper bark of oak trees; (6) placing rafter-like timbers on the bark thatch as weights so that it is not blown away by the wind; (7) binding the rafter-like timbers with arrowroot vines to hold them in place. The decayed Tongbanga-millhouse was restored out of all recognition using the methods above.


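Discovery of Classification Knowledge in Data Mining Using Rough Sets and a Hierarchical Classification Structure (러프집합과 계층적 분류구조를 이용한 데이터마이닝에서 분류지식발견)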

  • Lee, Chul-Heui;Seo, Seon-Hak
    • Journal of the Korean Institute of Intelligent Systems / v.12 no.3 / pp.202-209 / 2002
  • This paper deals with the simplification of classification rules for data mining and of rule bases for control systems. Data mining, which extracts useful information from large amounts of data, is an important issue. There are various classification methodologies for data mining, such as decision trees and neural networks, but the result should be explicit and understandable, and the classification rules should be short and clear. Rough set theory is an effective technique for extracting knowledge from incomplete and inconsistent data, and it provides a good solution for classification and approximation by using various attributes effectively. This paper investigates the granularity of knowledge for reasoning about uncertain concepts by using rough set approximations, and it uses a hierarchical classification structure, which classifies more effectively by applying the core at the upper level. The proposed classification methodology makes analysis of an information system easy and generates minimal classification rules.
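The rough set approximations mentioned above can be sketched in a few lines. Objects that share the same values on the chosen attributes are indiscernible and form one block; the lower approximation collects the blocks that lie entirely inside the target concept, and the upper approximation collects the blocks that overlap it. The patient table is a hypothetical toy example, and the hierarchical structure and core computation from the paper are not reproduced.

```python
from collections import defaultdict

def approximations(objects, attributes, target):
    """Return (lower, upper) rough set approximations of `target`.

    objects: {name: {attribute: value}}; target: set of object names.
    """
    # Group objects into indiscernibility classes on the given attributes.
    classes = defaultdict(set)
    for name, row in objects.items():
        classes[tuple(row[a] for a in attributes)].add(name)
    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # block certainly belongs to the concept
            lower |= block
        if block & target:    # block possibly belongs to the concept
            upper |= block
    return lower, upper

# Hypothetical information table.
patients = {
    "p1": {"headache": "yes", "temperature": "high"},
    "p2": {"headache": "yes", "temperature": "high"},
    "p3": {"headache": "no", "temperature": "normal"},
    "p4": {"headache": "no", "temperature": "high"},
}
flu = {"p1", "p4"}  # p1 and p2 look identical, but only p1 has flu

lower, upper = approximations(patients, ["headache", "temperature"], flu)
```

Here the lower approximation is {p4} and the upper approximation is {p1, p2, p4}; the gap between them is the boundary region where the data are inconsistent.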

A Study on the Landscape Planning Evaluation on Apartment Artificial Ground (아파트 단지 인공지반의 계획적 평가에 관한 연구)

  • 김유일;오정학;김인혜;윤홍범
    • Journal of the Korean Institute of Landscape Architecture / v.26 no.3 / pp.297-311 / 1998
  • Landscaping on artificial ground currently serves as a means of providing greenery in high-density, high-rise apartment sites. It has long functioned as a subordinate element in apartment planning, such as an ornamental element. Most parking space tends to be allocated to the basement level in response to parking regulations. The resulting competition between parking and green space in the limited outdoor area of apartment complexes has driven the development of planning strategies and technology for artificial ground. This study aims to evaluate landscape planning on the artificial ground of apartment complexes through several approaches, including site surveys, plan drawing analysis, and interviews with field experts. Fifteen apartment sites, including the Bundang Model, Shindaebang-dong, and Pyoungchon Hyundai Apartments, were selected for the research. The main results are summarized below. First, scattered allocation of artificial ground between apartment building units is the dominant plan layout among the surveyed sites. Although the unified allocation type has the advantage of maximizing underground parking space, it makes it difficult to maintain a proper soil base for nurturing plants. Therefore, underground parking space should be planned as a unified allocation type placed separately from the apartment units. This plan type can provide balanced planting between natural soil and artificial ground at surface level. Second, it is strongly recommended that the whole planting base, which involves architectural structure, drainage, and waterproofing, be integrated with the planting design. Because this process is a professional matter dealing with natural materials such as trees and shrubs, these tasks should be directed by the landscape architecture division and a landscape architect. In addition, the planting area on artificial ground has to be specified in the initial phase of architectural design. This step provides an opportunity to make proper decisions on structural load, drainage, and waterproofing design as an integrated part of the management.


Credit Card Bad Debt Prediction Model based on Support Vector Machine (신용카드 대손회원 예측을 위한 SVM 모형)

  • Kim, Jin Woo;Jhee, Won Chul
    • Journal of Information Technology Services / v.11 no.4 / pp.233-250 / 2012
  • In this paper, credit card delinquency means the possibility that bad debt will occur in the near future in normal accounts that currently have no debt, and the problem is to predict, on a monthly basis, the occurrence of delinquency 3 months in advance. This prediction is a typical binary classification problem, but it suffers from data imbalance: instances of the target class are very few. For effective prediction of bad debt occurrence, a Support Vector Machine (SVM) with the kernel trick is adopted, using credit card usage and payment patterns as its inputs. SVM is widely accepted in the data mining community because of its prediction accuracy and its resistance to overfitting. However, SVM is known to be limited in its ability to process large-scale data. To resolve the difficulties in applying SVM to bad debt prediction, two-stage clustering is suggested as an effective data reduction method, and ensembles of SVM models are adopted to mitigate the data imbalance intrinsic to the target problem. In experiments with real-world data from one of the major domestic credit card companies, the suggested approach shows prediction accuracy superior to that of traditional data mining approaches that use neural networks, decision trees, or logistic regression. The SVM ensemble model learned from the T2 training set shows the best prediction results among the alternatives considered, and it is noteworthy that the performance of neural networks with T2 is better than that of SVM with T1. These results indicate that the suggested approach is very effective for both SVM training and the classification problem of data imbalance.

Medical Image Classification and Retrieval Using BoF Feature Histogram with Random Forest Classifier (Random Forest 분류기와 Bag-of-Feature 특징 히스토그램을 이용한 의료영상 자동 분류 및 검색)

  • Son, Jung Eun;Ko, Byoung Chul;Nam, Jae Yeal
    • KIPS Transactions on Software and Data Engineering / v.2 no.4 / pp.273-280 / 2013
  • This paper presents a novel OCS-LBP (Oriented Center Symmetric Local Binary Patterns) feature based on the orientation of pixel gradients, and an image retrieval system based on BoF (Bag-of-Feature) and a random forest classifier. Feature vectors extracted from training data are clustered into a code book, and each feature is transformed into a new BoF feature using the code book. The BoF features are used to train the random forest, and a random forest with N classes is constructed by combining several decision trees. For testing, the same OCS-LBP feature is extracted from a query image, and its BoF feature is applied to the trained random forest classifier. In contrast to conventional retrieval systems, the query image selects the K most similar (K-nearest neighbor, K-NN) classes after the random forest is applied. Then, the top K similar images are retrieved from only those database images labeled with the K-NN classes. Compared with other retrieval algorithms, the proposed method shows both fast processing time and improved retrieval performance.
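The step that turns local features into a BoF feature using the code book can be sketched as below: each descriptor votes for its nearest code word, and the normalized vote counts form the histogram. The code book and descriptors are hypothetical 2-D toy values; in the paper the code book comes from clustering OCS-LBP features, which is not reproduced here, and Euclidean distance is an assumption.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the code word closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[j])),
    )

def bof_histogram(descriptors, codebook):
    """Normalized histogram of code word assignments for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = len(descriptors)
    # An image with no descriptors yields an all-zero histogram.
    return [c / total for c in counts] if total else [0.0] * len(codebook)

# Hypothetical code book with three code words and four local descriptors.
codebook = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
descriptors = [(1.0, 1.0), (9.0, 9.0), (0.5, 0.0), (1.0, 9.0)]

histogram = bof_histogram(descriptors, codebook)
```

The resulting fixed-length histogram is what a classifier such as a random forest would take as input, regardless of how many local descriptors each image produced.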