• Title/Summary/Keyword: DataMining

A Multivariate Decision Tree using Support Vector Machines (지지 벡터 머신을 이용한 다변수 결정 트리)

  • Kang, Sung-Gu;Lee, B.W.;Na, Y.C.;Jo, H.S.;Yoon, C.M.;Yang, Ji-Hoon
    • Proceedings of the Korean Information Science Society Conference / 2006.10b / pp.278-283 / 2006
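  • Decision trees have a large hypothesis space, which makes them flexible and robust. However, decision trees tend to overfit the training data. Several pruning algorithms have been developed to reduce this overfitting, but pruning cannot solve the problem of wasted space that arises when the data are not parallel to the attribute axes. This paper therefore proposes a method that addresses this problem with linear classifiers at multivariate nodes, and introduces support vector machines to improve the performance of the decision tree (SVMDT). The proposed algorithm consists of three parts: first, selecting the attributes to use at each node; second, a version of ID3 adapted for this purpose; and finally, a basic pruning algorithm. In comparisons with OC1, C4.5, and SVM on UCI data sets, SVMDT showed improved results.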
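
The limitation of axis-parallel splits mentioned in this abstract can be illustrated with a small sketch. This is not the SVMDT algorithm: the six points, the labels, and the hand-picked weights `(1, -1)` are assumptions made up for illustration, and in SVMDT the weights of a node would be learned by a support vector machine rather than chosen by hand.

```python
def linear_split_errors(points, labels, weights, bias):
    """Count errors of a multivariate test: predict 1 when w.x + b > 0."""
    errors = 0
    for point, label in zip(points, labels):
        score = sum(w * x for w, x in zip(weights, point)) + bias
        errors += int((score > 0) != (label == 1))
    return errors


def best_axis_parallel_errors(points, labels):
    """Fewest errors that any single-attribute threshold test can achieve."""
    best = len(points)
    for feature in range(len(points[0])):
        for threshold in sorted({p[feature] for p in points}):
            errors = sum(
                int((p[feature] > threshold) != (label == 1))
                for p, label in zip(points, labels)
            )
            # Either side of the threshold may be treated as the positive side.
            best = min(best, errors, len(points) - errors)
    return best


# Hypothetical data: class 1 lies below the diagonal x1 = x2, class 0 above it.
points = [(1, 2), (2, 3), (3, 4), (2, 1), (3, 2), (4, 3)]
labels = [0, 0, 0, 1, 1, 1]

# Hand-picked oblique test x1 - x2 > 0 (an SVM would learn such weights).
oblique_errors = linear_split_errors(points, labels, weights=(1, -1), bias=0)
axis_errors = best_axis_parallel_errors(points, labels)
print("oblique split errors:", oblique_errors)
print("best axis-parallel split errors:", axis_errors)
```

On this toy data a single oblique test separates the classes, while every single-attribute threshold misclassifies some points, so an axis-parallel tree would need several nodes to do the same job.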

TEMPORAL CLASSIFICATION METHOD FOR FORECASTING LOAD PATTERNS FROM AMR DATA

  • Lee, Heon-Gyu;Shin, Jin-Ho;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2007.10a / pp.594-597 / 2007
  • We present a novel mid- and long-term power load prediction method that uses temporal pattern mining on AMR (Automatic Meter Reading) data. Because power load patterns vary over time and differ considerably by hour, day, week, and so on, traditional data mining alone yields uninformative results. Previous research on data mining for electric load pattern analysis has focused on cluster analysis and classification methods. However, despite the usefulness of rules that include a temporal dimension and the fact that AMR data have temporal attributes, those methods were limited to static pattern extraction and did not consider temporal attributes. We therefore propose a new classification method for predicting power load patterns. The main tasks are clustering and temporal classification. Cluster analysis is used to create load pattern classes and a representative load profile for each class. The classification method then uses the representative load profiles to build a classifier that can assign different load patterns to the existing classes. The proposed classification method is calendar-based temporal mining, which discovers electric load patterns at multiple time granularities. Lastly, we show that the proposed method, applied to AMR data, discovered more interesting patterns.
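
The two-stage idea in this abstract (representative load profiles per class, then assignment of new load patterns to those classes) can be sketched as follows. This is only a minimal illustration under stated assumptions: cluster membership is taken as already known, the representative profile is assumed to be the per-interval mean, assignment is by Euclidean distance, and the class names and load values are invented. The calendar-based temporal mining of the paper is not reproduced here.

```python
import math


def representative_profile(profiles):
    """Average the load profiles of one class, interval by interval."""
    count = len(profiles)
    return [sum(values) / count for values in zip(*profiles)]


def nearest_class(profile, representatives):
    """Return the class whose representative profile is closest (Euclidean)."""
    return min(
        representatives,
        key=lambda name: math.dist(profile, representatives[name]),
    )


# Hypothetical clusters of four-interval daily load profiles.
clusters = {
    "daytime_peak": [[1, 4, 5, 2], [1, 5, 4, 2]],
    "evening_peak": [[1, 2, 3, 6], [1, 2, 2, 5]],
}
representatives = {
    name: representative_profile(members) for name, members in clusters.items()
}

new_profile = [1, 4, 4, 3]
assigned = nearest_class(new_profile, representatives)
print(representatives)
print("assigned class:", assigned)
```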

Strategy to Improve the Management Efficiency of Meta Data Mining System (메타데이터 마이닝 시스템의 관리효율성의 제고전략)

  • Yun, Yong-Un
    • Proceedings of the Korean Operations and Management Science Society Conference / 2005.05a / pp.276-279 / 2005
  • Many large organizations that have allocated resources to Data Administration (DA) have DA-context meta data mining. Meta data is also an interesting topic in the data warehouse world. This conceptual view has gradually become clearer, and recently we have been talking more confidently about back-room and front-room meta data. We describe the processes and problems that characterize the general architecture of a meta data mining system, with a view to improving management efficiency, and that require further research and development.

Analyzing Customer Management Data by Data Mining: Case Study on Churn Prediction Models for Insurance Company in Korea

  • Cho, Mee-Hye;Park, Eun-Sik
    • Journal of the Korean Data and Information Science Society / v.19 no.4 / pp.1007-1018 / 2008
  • The purpose of this case study is to demonstrate database-marketing management. First, we explore the original variables in insurance customer data, modify them if necessary, and go through a variable selection process before analysis. Then, we develop churn prediction models using logistic regression, neural network, and SVM analysis. We also compare these three data mining models in terms of misclassification rate.
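
The comparison criterion named in this abstract, the misclassification rate, is the fraction of cases a model labels incorrectly. The sketch below shows the calculation only. The labels and predictions are invented for illustration and are not results from the study, and the three prediction lists simply stand in for the outputs of already trained models.

```python
def misclassification_rate(actual, predicted):
    """Fraction of cases where the predicted label differs from the actual one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Hypothetical hold-out labels: 1 = customer churned, 0 = customer stayed.
actual = [1, 0, 0, 1, 0, 1, 0, 0]

# Hypothetical predictions standing in for three trained models.
predictions = {
    "logistic_regression": [1, 0, 0, 0, 0, 1, 0, 1],
    "neural_network": [1, 0, 1, 1, 0, 1, 0, 0],
    "svm": [0, 0, 0, 1, 0, 0, 1, 0],
}

rates = {
    name: misclassification_rate(actual, predicted)
    for name, predicted in predictions.items()
}
best_model = min(rates, key=rates.get)
print(rates)
print("lowest misclassification rate:", best_model)
```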

An Empirical Comparison Study on Attack Detection Mechanisms Using Data Mining (데이터 마이닝을 이용한 공격 탐지 메커니즘의 실험적 비교 연구)

  • Kim, Mi-Hui;Oh, Ha-Young;Chae, Ki-Joon
    • The Journal of Korean Institute of Communications and Information Sciences / v.31 no.2C / pp.208-218 / 2006
  • In this paper, we introduce methods for building attack detection models using data mining techniques that can classify the latest attack types and detect modifications of existing attacks as well as novel attacks. We also compare these attack detection models in terms of detection accuracy and detection time. The important factors in building detection models are the data, the attributes, and the detection algorithm. We therefore used NetFlow data gathered from a real network and KDD Cup 1999 data for large-scale experiments. For attribute selection, we used a heuristic method and a theoretical method based on a decision tree algorithm. We compare detection models that use a single supervised or unsupervised data mining approach with models that use a combined supervised data mining approach. The combined supervised approach required more modeling time but achieved a better detection rate. All models using data mining techniques detected attacks within 1 second, which suggests that these approaches are suitable for real-time detection. Our experimental results for anomaly detection also showed that our approaches can detect novel attacks, and that the SOM model in particular provided additional information about existing attacks that are similar to a novel attack.

Mining Frequent Pattern from Large Spatial Data (대용량 공간 데이터로 부터 빈발 패턴 마이닝)

  • Lee, Dong-Gyu;Yi, Gyeong-Min;Jung, Suk-Ho;Lee, Seong-Ho;Ryu, Keun-Ho
    • Journal of Korea Spatial Information System Society / v.12 no.1 / pp.49-56 / 2010
  • Frequent pattern mining techniques for detecting unknown patterns in spatial data have been actively studied. Existing data structures are classified into tree structures and array structures, and each shows weak performance on dense or sparse data. Since spatial data exhibit both dense and sparse patterns, it is important to mine both kinds of patterns quickly using a single algorithm. In this paper, we propose a novel data structure, the compressed patricia frequent pattern tree, and a frequent pattern mining algorithm based on it that can quickly detect both dense and sparse frequent patterns. In our experiments, the proposed algorithm was about 10 times faster than the existing FP-Growth algorithm on both dense and sparse data.

Environmental Consciousness Data Modeling by Association Rules

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Journal of the Korean Data and Information Science Society / v.16 no.3 / pp.529-538 / 2005
  • Data mining is a method for finding useful information in large amounts of data in a database. It is used to find hidden knowledge, unexpected patterns, and new rules in massive data. Data mining methods include association rules, decision trees, clustering, neural networks, and so on. Association rule mining searches for interesting relationships among items in a given large data set. Association rules are frequently used by retail stores to assist in marketing, advertising, floor placement, and inventory control. There are three primary quality measures for association rules: support, confidence, and lift. We analyze Gyeongnam social indicator survey data using the association rule technique for environmental information discovery. The resulting association rules can be used for environmental preservation and environmental improvement.
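
The three quality measures named in this abstract can be computed directly from transaction counts. For a rule A → B, support is the fraction of transactions containing both A and B, confidence is support(A ∪ B) / support(A), and lift is confidence / support(B). The sketch below uses four invented survey responses. The item names are hypothetical and are not taken from the Gyeongnam survey.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    support_a = support(transactions, antecedent)
    support_c = support(transactions, consequent)
    support_both = support(transactions, set(antecedent) | set(consequent))
    confidence = support_both / support_a if support_a else 0.0
    lift = confidence / support_c if support_c else 0.0
    return support_both, confidence, lift


# Hypothetical survey responses, one set of stated behaviours per respondent.
transactions = [
    {"recycles", "uses_public_transit"},
    {"recycles", "saves_energy"},
    {"recycles", "uses_public_transit", "saves_energy"},
    {"saves_energy"},
]

rule_support, rule_confidence, rule_lift = rule_metrics(
    transactions, {"recycles"}, {"uses_public_transit"}
)
print(rule_support, rule_confidence, rule_lift)
```

A lift above 1 indicates that the antecedent and consequent occur together more often than they would if they were independent.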

Association Rule of Gyeongnam Social Indicator Survey Data for Environmental Information

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Journal of the Korean Data and Information Science Society / v.16 no.1 / pp.59-69 / 2005
  • Data mining is a method for finding useful information in large amounts of data in a database. It is used to find hidden knowledge, unexpected patterns, and new rules in massive data. Data mining methods include decision trees, association rules, clustering, neural networks, and so on. We analyze Gyeongnam social indicator survey data from 2001 using the association rule technique for environmental information. Association rule mining searches for interesting relationships among items in a given large data set. Association rules are frequently used by retail stores to assist in marketing, advertising, floor placement, and inventory control. There are three primary quality measures for association rules: support, confidence, and lift. The resulting association rules can be used for environmental preservation and environmental improvement.

PPFP(Push and Pop Frequent Pattern Mining): A Novel Frequent Pattern Mining Method for Bigdata Frequent Pattern Mining (PPFP(Push and Pop Frequent Pattern Mining): 빅데이터 패턴 분석을 위한 새로운 빈발 패턴 마이닝 방법)

  • Lee, Jung-Hun;Min, Youn-A
    • KIPS Transactions on Software and Data Engineering / v.5 no.12 / pp.623-634 / 2016
  • Most existing frequent pattern mining methods address time efficiency and rely heavily on primary memory. However, in the era of big data, the size of real-world databases to be mined is increasing exponentially, and primary memory is no longer sufficient for mining frequent patterns from large real-world data sets. To solve this problem, disk-based frequent pattern mining methods have been studied to improve scalability, but their processing is very time consuming compared to memory-based methods. In this paper, we present PPFP, a novel disk-based approach for mining frequent itemsets from big data, which reduces the main memory size bottleneck. The PPFP algorithm is based on the FP-growth method, one of the most popular and efficient frequent pattern mining approaches. Mining with PPFP consists of two steps. (1) Constructing an IFP-tree: after constructing an FP-tree, we assign an index number to each node in the FP-tree using a novel index numbering method, and then store the indexed FP-tree (IFP-tree) on disk as an IFP-table. (2) Mining frequent patterns with PPFP: frequent patterns are mined by expanding patterns using a stack-based PUSH-POP method (the PPFP method). With this approach, which uses a very small amount of memory for the recursive and time-consuming operations in the mining process, we improved the scalability and time efficiency of frequent pattern mining. The reported test results demonstrate these improvements.
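
The general idea of replacing recursion with explicit push and pop operations can be sketched as below. This is not the PPFP algorithm: it builds no FP-tree, IFP-tree, or on-disk IFP-table, and it keeps the covering transactions in memory, which PPFP is designed to avoid. The function name, the transactions, and the minimum support count are all assumptions made for illustration.

```python
def frequent_itemsets(transactions, min_support):
    """Enumerate frequent itemsets depth-first with an explicit stack."""
    # Count single items and keep only the frequent ones, in a fixed order.
    counts = {}
    for transaction in transactions:
        for item in transaction:
            counts[item] = counts.get(item, 0) + 1
    items = sorted(item for item, count in counts.items() if count >= min_support)

    result = {}
    # Each stack entry: (current pattern, index of the next item to try,
    # transactions that contain the current pattern).
    stack = [((), 0, list(transactions))]
    while stack:
        prefix, start, covering = stack.pop()  # POP a pattern to expand
        for index in range(start, len(items)):
            item = items[index]
            subset = [t for t in covering if item in t]
            if len(subset) >= min_support:
                pattern = prefix + (item,)
                result[pattern] = len(subset)
                stack.append((pattern, index + 1, subset))  # PUSH the extension
    return result


# Hypothetical transactions and minimum support count.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
patterns = frequent_itemsets(transactions, min_support=3)
for pattern, count in sorted(patterns.items()):
    print(pattern, count)
```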

FEROM: Feature Extraction and Refinement for Opinion Mining

  • Jeong, Ha-Na;Shin, Dong-Wook;Choi, Joong-Min
    • ETRI Journal / v.33 no.5 / pp.720-730 / 2011
  • Opinion mining involves the analysis of customer opinions using product reviews and provides meaningful information including the polarity of the opinions. In opinion mining, feature extraction is important since the customers do not normally express their product opinions holistically but separately according to its individual features. However, previous research on feature-based opinion mining has not had good results due to drawbacks, such as selecting a feature considering only syntactical grammar information or treating features with similar meanings as different. To solve these problems, this paper proposes an enhanced feature extraction and refinement method called FEROM that effectively extracts correct features from review data by exploiting both grammatical properties and semantic characteristics of feature words and refines the features by recognizing and merging similar ones. A series of experiments performed on actual online review data demonstrated that FEROM is highly effective at extracting and refining features for analyzing customer review data and eventually contributes to accurate and functional opinion mining.