• Title/Summary/Keyword: Spatial Data Mining (공간데이터 마이닝)

An Emerging Pattern Mining based Classification Method for Automated Prediction of Myocardial Ischemia ECG Signals (심근허혈 심전도 신호의 자동화된 예측을 위한 출현 패턴 마이닝 기반의 분류 방법)

  • Heon Gyu Lee;Ming Hao Park;Keun Ho Ryu
    • Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.19-22 / 2008
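  • Myocardial ischemia diseases such as myocardial infarction and angina pectoris have been increasing rapidly in recent years, driven by westernized dietary patterns, smoking, and obesity. To support the diagnosis of ischemic heart disease from ECG signals, this paper uses emerging pattern mining to classify ischemia beats, the diagnostic signal of myocardial infarction and angina. In addition, with fast pattern discovery and storage efficiency in mind, the Apriori-T frequent pattern mining algorithm is extended so that it can generate emerging patterns. In experiments applying the proposed classification algorithm to 138 control (normal) and ischemia beat data items from the PhysioNet ST-T database, prediction accuracy ranged from a minimum of 75% to a maximum of 95%.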
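
An emerging pattern is an itemset whose support grows sharply from one class to another, measured by the growth rate (target support divided by background support). The sketch below illustrates only that definition with brute-force enumeration; it is not the extended Apriori-T algorithm of the paper, and the feature labels such as `ST=depressed` are hypothetical discretized values chosen for illustration.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def growth_rate(itemset, background, target):
    """Growth rate of itemset from the background class to the target class."""
    s_bg = support(itemset, background)
    s_tg = support(itemset, target)
    if s_bg == 0.0:
        # Absent from the background: infinite growth if it occurs in the target.
        return float("inf") if s_tg > 0.0 else 0.0
    return s_tg / s_bg

def emerging_patterns(background, target, min_growth, max_len=2):
    """Enumerate itemsets up to max_len whose growth rate is at least min_growth."""
    items = sorted(set().union(*target))
    found = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            rate = growth_rate(candidate, background, target)
            if rate >= min_growth:
                found[candidate] = rate
    return found

# Hypothetical discretized beat features (assumed labels, not from the paper).
normal = [frozenset(t) for t in (
    {"ST=flat", "T=upright"},
    {"ST=flat", "T=upright"},
    {"ST=depressed", "T=upright"},
    {"ST=flat", "T=inverted"},
)]
ischemic = [frozenset(t) for t in (
    {"ST=depressed", "T=inverted"},
    {"ST=depressed", "T=inverted"},
    {"ST=depressed", "T=upright"},
    {"ST=flat", "T=inverted"},
)]

patterns = emerging_patterns(normal, ischemic, min_growth=2.0)
```

A classifier built on such patterns would typically score a new beat by the emerging patterns it contains for each class; that scoring step is omitted here.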

Spatio-temporal Pattern Mining for Power Load Forecasting in GIS-AMR Load Analysis Model (GIS-AMR 부하 분석 모델에서의 전력 부하 예측을 위한 시공간 패턴 마이닝)

  • Lee, Heon Gyu;Piao, Minghao;Park, Jin Hyoung;Shin, Jin-ho;Ryu, Keun Ho
    • Proceedings of the Korea Information Processing Society Conference / 2009.04a / pp.3-6 / 2009
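  • A spatio-temporal pattern mining technique is applied to predict transformer load patterns accurately, using load data measured at 30-minute intervals by a wireless transformer load monitoring system together with transformer attributes and spatial features extracted from the GIS-AMR data warehouse.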

Implementation of Association Rules Creation System from GML Documents (GML 문서에서 연관규칙 생성 시스템 구현)

  • Kim, Eui-Chan;Hwang, Byung-Yeon
    • Journal of Korea Spatial Information System Society / v.8 no.1 s.16 / pp.27-35 / 2006
  • With growing interest in geographic information, related research and application fields are expanding. The OGC (Open GIS Consortium) developed GML (Geography Markup Language), which applies XML (eXtensible Markup Language) to the GIS field. GML is used and studied continuously in various application fields. This paper attempts to find meaningful rules in GML documents using the Apriori algorithm, a data mining technique previously studied for XML documents. There are two ways to find the rules: one extracts the content of GML documents and derives association rules from it; the other derives association rules from the tags and attributes that are used. This paper describes rule discovery through both approaches and presents a system that implements them.
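
The tag-based approach can be sketched as follows: each feature element becomes a transaction made of the tag names beneath it, and Apriori finds tag sets that co-occur often. This is a minimal illustration and not the system described in the paper; the sample document, the `ex` namespace, and the helper names (`tag_transactions`, `apriori`, `rules`) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

def local_name(tag):
    """Strip an XML namespace such as '{http://...}name' down to 'name'."""
    return tag.rsplit("}", 1)[-1]

def tag_transactions(gml_text, feature_tag):
    """One transaction per feature element: the set of tag names used below it."""
    root = ET.fromstring(gml_text.strip())
    transactions = []
    for elem in root.iter():
        if local_name(elem.tag) == feature_tag:
            tags = {local_name(c.tag) for c in elem.iter() if c is not elem}
            transactions.append(frozenset(tags))
    return transactions

def apriori(transactions, min_support):
    """Return {itemset: support count} for itemsets with count >= min_support."""
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: unions of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a, b in combinations(list(frequent), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        frequent = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count >= min_support:
                frequent[c] = count
        result.update(frequent)
        k += 1
    return result

def rules(frequent, min_confidence):
    """List (antecedent, consequent item, confidence) for single-item consequents."""
    out = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            conf = count / frequent[antecedent]
            if conf >= min_confidence:
                out.append((antecedent, item, conf))
    return out

# Hypothetical GML fragment; the 'ex' application schema is invented for illustration.
SAMPLE = """
<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml" xmlns:ex="http://example.org/ex">
  <gml:featureMember><ex:Road><ex:name>A</ex:name><ex:lanes>2</ex:lanes><gml:LineString/></ex:Road></gml:featureMember>
  <gml:featureMember><ex:Road><ex:name>B</ex:name><gml:LineString/></ex:Road></gml:featureMember>
  <gml:featureMember><ex:Building><ex:name>C</ex:name><gml:Polygon/></ex:Building></gml:featureMember>
</gml:FeatureCollection>
"""

transactions = tag_transactions(SAMPLE, "featureMember")
freq = apriori(transactions, min_support=2)
found_rules = rules(freq, min_confidence=1.0)
```

The content-based approach would follow the same pattern, with element text or attribute values in each transaction instead of tag names.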

The research of preprocessing technique of Data Compaction customized to network packet data (네트워크 패킷 데이터 마이닝을 위한 데이터 압축 전처리 기법에 관한 연구)

  • Na, Sang-Hyuck;Lee, Won-Suk
    • Korea Society of IT Services: Conference Proceedings (한국IT서비스학회:학술대회논문집) / 2009.05a / pp.341-344 / 2009
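  • A huge number of packets pass through network routers and switches. With 20 computers connected to a network, the average daily packet volume reaches about 400 GB. Analyzing such packet data requires large-scale storage to hold the collected data on disk, along with periodic backups. The raw collected data contains not only the information the user wants but also scattered unnecessary information. It is therefore more effective not to store the collected data in raw form but to process it and keep summarized information, so that the desired information and knowledge are retained and easy to identify. As the amount of packet data crossing networks worldwide grows beyond counting and Internet penetration rises, the need to analyze information about Internet users and consumers is becoming prominent. This paper proposes a data preprocessing technique suited to packet data collected from a network.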

Mining High Utility Sequential Patterns Using Sequence Utility Lists (시퀀스 유틸리티 리스트를 사용하여 높은 유틸리티 순차 패턴 탐사 기법)

  • Park, Jong Soo
    • KIPS Transactions on Software and Data Engineering / v.7 no.2 / pp.51-62 / 2018
  • High utility sequential pattern (HUSP) mining is considered an important research topic in data mining. Although several algorithms have been proposed, they suffer from a large search space for HUSPs. A tighter utility upper bound on a sequence can prune more unpromising patterns early in the search. In this paper, we propose the sequence expected utility (SEU) as a new utility upper bound for each sequence; it is the maximum expected utility of a sequence and all its descendant sequences. A sequence utility list for each pattern serves as a new data structure that maintains the essential information for mining HUSPs. We devise an algorithm, high sequence utility list-span (HSUL-Span), that identifies HUSPs by employing SEU. Experimental results on both synthetic and real datasets from different domains show that HSUL-Span generates considerably fewer candidate patterns and outperforms other algorithms in execution time.

Location Generalization Method of Moving Object using $R^*$-Tree and Grid ($R^*$-Tree와 Grid를 이용한 이동 객체의 위치 일반화 기법)

  • Ko, Hyun;Kim, Kwang-Jong;Lee, Yon-Sik
    • Journal of the Korea Society of Computer and Information / v.12 no.2 s.46 / pp.231-242 / 2007
  • Existing pattern mining methods [1,2,3,4,5,6,11,12,13] do not apply location generalization to the location history data of moving objects; they simply extract frequent patterns that carry no spatio-temporal constraints on movement within a specific space. It is therefore difficult to apply them to frequent pattern mining with spatio-temporal constraints, such as finding optimal moving paths or scheduling paths among specific points. These methods also require more memory because they keep a pattern tree in memory to reduce repeated database scans. A more effective pattern mining technique is thus needed. In this paper, we propose a new location generalization method that converts detailed-level data into meaningful spatial information, reducing both the processing time and the storage space needed to mine patterns from a massive history data set of moving objects. The proposed method enables efficient mining of spatial moving patterns by creating moving sequences in the preprocessing stage of pattern mining: the location attributes of a moving object are generalized into 2D spatial areas based on an $R^*$-Tree and an Area Grid Hash Table (AGHT).
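
The idea of generalizing raw coordinates into areas before mining can be illustrated with a plain uniform grid. The sketch below is a simplification under stated assumptions: it uses integer cell indices in place of the $R^*$-Tree and AGHT structures of the paper, and the function names, the `(t, x, y)` sample layout, and the cell size are hypothetical.

```python
def to_cell(x, y, cell_size):
    """Map a coordinate to the (column, row) index of the grid cell containing it."""
    return (int(x // cell_size), int(y // cell_size))

def moving_sequence(track, cell_size):
    """Generalize a track of (t, x, y) samples into a sequence of grid cells.

    Samples are ordered by time, and consecutive samples in the same cell
    collapse into one entry stamped with the time of entry into that cell.
    """
    sequence = []
    for t, x, y in sorted(track):
        cell = to_cell(x, y, cell_size)
        if not sequence or sequence[-1][1] != cell:
            sequence.append((t, cell))
    return sequence

# Hypothetical track: five samples collapse to three generalized areas.
track = [(0, 1.0, 1.5), (1, 2.2, 1.9), (2, 5.1, 2.0), (3, 5.9, 4.9), (4, 6.2, 10.3)]
seq = moving_sequence(track, cell_size=5.0)
```

Shorter sequences over a small alphabet of areas are what make the later pattern mining step cheaper in both time and memory.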

Analysis and Performance Evaluation of Pattern Condensing Techniques used in Representative Pattern Mining (대표 패턴 마이닝에 활용되는 패턴 압축 기법들에 대한 분석 및 성능 평가)

  • Lee, Gang-In;Yun, Un-Il
    • Journal of Internet Computing and Services / v.16 no.2 / pp.77-83 / 2015
  • Frequent pattern mining, one of the most actively studied areas of data mining, extracts useful pattern information hidden in large data sets or databases. Frequent pattern mining approaches have been widely employed in a variety of application fields because their results let us analyze important characteristics of databases more easily and automatically. However, traditional frequent pattern mining methods, which simply extract every frequent pattern whose support is not smaller than a user-given minimum support threshold, have the following problems. First, depending on the features of the database and the threshold setting, they must generate an enormous number of patterns, and that number can grow geometrically. This in turn wastes runtime and memory. Furthermore, the excessive number of resulting patterns makes the mining results hard to analyze. To solve these problems, the concept of representative pattern mining and various related works have been proposed. In contrast to traditional approaches, which find all possible frequent patterns in a database, representative pattern mining approaches selectively extract a smaller number of patterns that represent the general frequent patterns. In this paper, we describe the details and characteristics of pattern condensing techniques that consider the maximality or closure property of generated frequent patterns, and we compare and analyze these techniques. A frequent pattern satisfies maximality if all of its proper supersets have support below the user-specified minimum support threshold; it satisfies the closure property if none of its proper supersets has support equal to its own. By mining maximal or closed frequent patterns, we can compress patterns effectively and perform mining with much less time and space. Compressed patterns can also be converted back into the original frequent patterns if necessary; in particular, the closed frequent pattern notation can recover the original patterns without any information loss. That is, the complete set of original frequent patterns can be obtained from the closed ones. Although the maximal frequent pattern notation does not guarantee complete recovery, it has the advantage of extracting a smaller number of representative patterns more quickly than the closed notation. In this paper, we report the performance and characteristics of these techniques in terms of pattern generation, runtime, and memory usage, evaluated on various real-world data sets. For a fair comparison, the algorithms implementing these techniques are run on the same platform and at the same implementation level.
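
The maximality and closure definitions above can be checked directly on a toy database. The sketch below uses brute-force enumeration, which is only workable for tiny inputs and is not one of the algorithms evaluated in the paper; the function names and the four-transaction database are assumptions made for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of every itemset with support count >= min_support."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support:
                frequent[candidate] = count
    return frequent

def closed_itemsets(frequent):
    """Frequent itemsets with no proper superset of equal support."""
    return {s: c for s, c in frequent.items()
            if not any(s < other and c == frequent[other] for other in frequent)}

def maximal_itemsets(frequent):
    """Frequent itemsets with no frequent proper superset."""
    return {s: c for s, c in frequent.items()
            if not any(s < other for other in frequent)}

def support_from_closed(itemset, closed):
    """Recover a frequent itemset's support: the largest support among its closed supersets."""
    return max(c for s, c in closed.items() if itemset <= s)

# Toy database (hypothetical).
db = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("a")]
freq = frequent_itemsets(db, min_support=2)
closed = closed_itemsets(freq)
maximal = maximal_itemsets(freq)
```

Here seven frequent itemsets condense to three closed ones and a single maximal one. `support_from_closed` shows why the closed form is lossless: every original support can be read back from it, whereas the maximal form only tells us which itemsets are frequent.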

A Spatial Entropy based Decision Tree Method Considering Distribution of Spatial Data (공간 데이터의 분포를 고려한 공간 엔트로피 기반의 의사결정 트리 기법)

  • Jang, Youn-Kyung;You, Byeong-Seob;Lee, Dong-Wook;Cho, Sook-Kyung;Bae, Hae-Young
    • The KIPS Transactions:PartB / v.13B no.7 s.110 / pp.643-652 / 2006
  • Decision trees are mainly used for classification and prediction in data mining. When classifying real-world spatial data, the distribution of the data and its relationships with neighboring objects are very important. Spatial decision trees in previous work reflect spatial characteristics by rating Euclidean distance, but distance only describes how far apart objects are in the spatial dimension, so it cannot easily represent the distribution of spatial data or the relationships within it. This paper proposes a decision tree based on spatial entropy, which represents the distribution of spatial data through dispersion and dissimilarity. Dispersion describes the distribution of spatial objects within their own class, and dissimilarity describes the distribution of a class and its relationship with other classes. The ratio of dispersion to dissimilarity indicates how closely the spatial distribution is related to the data classified by non-spatial attributes. Our experiments evaluate the accuracy and building time of the decision tree against previous methods and show improvements of about 18% and 11%, respectively.

A Partition Mining Method of Sequential Patterns using Suffix Checking (서픽스 검사를 이용한 단계적 순차패턴 분할 탐사 방법)

  • 허용도;조동영;박두순
    • Journal of Korea Multimedia Society / v.5 no.5 / pp.590-598 / 2002
  • Efficient sequential pattern mining requires reducing both the cost of generating candidate patterns and the search space for the generated candidates. Although Apriori-like methods such as GSP[8] are simple, they generate many candidate patterns and repeatedly scan a large database. PrefixSpan[2], proposed as an alternative to GSP, constructs prefix-projected databases that are partitioned stepwise during mining. This reduces the search space for estimating the support of candidate patterns, but the cost of constructing the projected databases remains high. To solve these problems, we propose SuffixSpan (Suffix checked Sequential Pattern mining), a new sequential pattern mining method. It generates small candidate pattern sets at low cost using a partition property and a suffix property, and it uses 1-prefix projected databases as the search space to reduce the cost of estimating the support of candidate patterns.
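
To make the notion of a prefix-projected database concrete, here is a minimal sketch of PrefixSpan-style mining restricted to sequences of single items. It illustrates the baseline that the abstract compares against, not SuffixSpan itself, whose partition and suffix properties are not specified in enough detail here to reproduce; the function name and sample sequences are hypothetical.

```python
def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for sequences of single items."""
    results = {}

    def mine(prefix, projected):
        # Count each item at most once per projected sequence.
        counts = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1
        for item, count in sorted(counts.items()):
            if count < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Projection: keep what follows the first occurrence of item.
            next_db = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(pattern, next_db)

    mine((), [tuple(s) for s in sequences])
    return results

# Hypothetical sequence database.
db = [("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b")]
patterns = prefixspan(db, min_support=2)
```

Each recursive call builds a new projected database, which is the construction cost the abstract identifies as PrefixSpan's remaining weakness.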

Moving Pattern Mining Algorithm of Moving Object for Support of Optimal Path Service (최적 경로 서비스 지원을 위한 이동 객체의 이동 패턴 탐사 알고리즘)

  • Ko, Hyun;Kim, Kwang-Jong;Lee, Yon-Sik
    • Proceedings of the Korea Information Processing Society Conference / 2006.11a / pp.413-416 / 2006
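  • With recent advances in positioning technology and the commercialization of GPS, wireless communication devices have become widespread, and efforts to develop diverse location-based services are actively under way. Providing location-based services that are personalized and finely tailored to users' characteristics requires temporal pattern mining that extracts useful patterns, that is, meaningful knowledge, from massive location movement data of moving objects. Some existing temporal pattern mining methods do not sufficiently consider how the spatial attributes of a moving object change over time; others can mine patterns that consider spatial and temporal attributes together but place no constraint on the spatial information that an extracted pattern must contain. They are therefore hard to apply to problems such as finding the optimal moving path between specific points or predicting a scheduling path over the points a moving object must visit during a unit period. This paper proposes a new temporal pattern mining algorithm that considers the spatio-temporal attributes of a moving object's location history data and searches, among the various moving patterns, for the pattern corresponding to the object's optimal moving path. The proposed algorithm determines the optimal path by finding the route that objects traveled most frequently between specific points, and it searches for moving patterns more effectively by performing location generalization that considers containment within the region at each layer of a spatial abstraction hierarchy.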
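
The core selection step, picking the route traveled most often between two given points, can be sketched with a simple counter. This is an illustration under assumptions rather than the proposed algorithm: trajectories are taken to be already generalized into region labels, temporal constraints and the spatial abstraction hierarchy are left out, and the function name and sample data are hypothetical.

```python
from collections import Counter

def most_frequent_path(trajectories, start, end):
    """Among sub-paths running from start to end, return the most frequent one and its count."""
    counts = Counter()
    for path in trajectories:
        if start in path:
            i = path.index(start)
            if end in path[i + 1:]:
                # First occurrence of end after the first occurrence of start.
                j = path.index(end, i + 1)
                counts[tuple(path[i:j + 1])] += 1
    if not counts:
        return None, 0
    return counts.most_common(1)[0]

# Hypothetical trajectories over generalized regions.
trajectories = [
    ["A", "B", "D"],
    ["A", "C", "D"],
    ["A", "B", "D", "E"],
    ["X", "A", "B", "D"],
    ["A", "C"],
]
best_path, best_count = most_frequent_path(trajectories, "A", "D")
```

The spatial constraint, that a pattern must contain both the start and the end point, is what distinguishes this search from unconstrained frequent pattern mining.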
