• Title/Summary/Keyword: Stream Data Mining

An Efficient Method for Mining Frequent Patterns based on Weighted Support over Data Streams (데이터 스트림에서 가중치 지지도 기반 빈발 패턴 추출 방법)

  • Kim, Young-Hee;Kim, Won-Young;Kim, Ung-Mo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.10 no.8
    • /
    • pp.1998-2004
    • /
    • 2009
  • The rapid development of storage devices and networks has caused the amount of data to grow quickly. The large volume of data streams imposes unique space and time constraints on the data mining process, and the continuous nature of streaming data requires algorithms that discover knowledge in a single scan over the stream. Most support-based research focuses on frequent itemsets and ignores infrequent itemsets even when they are crucial. In this paper, we propose WSFI-Mine (Weighted Support Frequent Itemsets Mine), an efficient method that mines all frequent itemsets from a data stream in one scan. The method can discover closed frequent itemsets using a DCT (Data Stream Closed Pattern Tree). We compare the performance of our algorithm with DSM-FI and THUI-Mine under different minimum supports. The results show that WSFI-Mine not only runs significantly faster but also consumes less memory.
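
The abstract above does not define weighted support, so the following is only a minimal sketch of the general idea of one-scan weighted-support counting, not the authors' WSFI-Mine or its DCT structure. It assumes a hypothetical per-item weight table and defines the weighted support of an itemset as its relative frequency multiplied by the average weight of its items; the names `weighted_support`, `weights`, and `max_len` are illustrative.

```python
from collections import Counter
from itertools import combinations


def weighted_support(stream, weights, max_len=2):
    """Scan `stream` once and return {itemset: weighted support}.

    Assumption (not from the paper): weighted support is
    (average item weight) * (occurrence count / number of transactions).
    Items missing from `weights` default to weight 1.0.
    """
    counts = Counter()
    n = 0
    for transaction in stream:  # single pass over the stream
        n += 1
        items = sorted(set(transaction))
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    if n == 0:
        return {}
    result = {}
    for itemset, count in counts.items():
        avg_weight = sum(weights.get(i, 1.0) for i in itemset) / len(itemset)
        result[itemset] = avg_weight * count / n
    return result


# Toy data: four transactions and a hypothetical weight table.
stream = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
weights = {"a": 0.5, "b": 1.0, "c": 2.0}
ws = weighted_support(stream, weights)
frequent = {s for s, v in ws.items() if v >= 0.5}
```

With these toy weights, item `a` appears in three of four transactions but falls below the 0.5 threshold because of its low weight, while the rarer but heavier item `c` passes. Enumerating every subset up to `max_len` is only workable for short transactions; a real stream miner would use a compact summary structure instead.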

Frequent Items Mining based on Regression Model in Data Streams (스트림 데이터에서 회귀분석에 기반한 빈발항목 예측)

  • Lee, Uk-Hyun
    • The Journal of the Korea Contents Association
    • /
    • v.9 no.1
    • /
    • pp.147-158
    • /
    • 2009
  • Data in a stream environment is massive, continuous, and unbounded, yet stream processing tasks such as query processing and data analysis must run with limited disk or memory capacity. In such environments, traditional frequent pattern discovery over a transaction database cannot be applied, because it is difficult to continuously maintain information on whether each item in a continuous stream is frequent. In this paper, we propose a method that predicts frequent items in a continuous data stream using a regression model. By constructing a regression model over the stream data, the method can serve as a prediction model for items whose frequency is not yet determined. We show through a variety of experiments that the proposed method can be used efficiently in stream data environments.
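
The abstract does not say which regression model or features the paper uses. As a rough illustration only, the sketch below assumes the stream is cut into fixed-size windows, fits an ordinary least-squares line to each item's per-window counts, and extrapolates one window ahead to decide whether the item is likely to be frequent. The names `fit_line`, `predict_frequent`, `window_size`, and `min_support` are hypothetical.

```python
def fit_line(ys):
    """Ordinary least squares for y = a + b*t, with t = 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two windows to fit a line")
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def predict_frequent(window_counts, window_size, min_support):
    """Predict, per item, whether its support in the next window meets min_support.

    `window_counts` maps each item to its counts in past windows (oldest first).
    """
    predicted = {}
    for item, counts in window_counts.items():
        intercept, slope = fit_line(counts)
        next_count = intercept + slope * len(counts)  # extrapolate one window ahead
        predicted[item] = next_count / window_size >= min_support
    return predicted


# Toy data: "x" is trending up, "y" is trending down.
window_counts = {"x": [10, 20, 30, 40], "y": [40, 30, 20, 10]}
result = predict_frequent(window_counts, window_size=100, min_support=0.3)
```

Only the per-window counts are kept, so memory grows with the number of tracked items and windows rather than with the length of the stream.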

A GEOSENSOR FILTER FOR PROCESSING GEOSENSOR QUERIES ON DATA STREAMS

  • Lee, Dong-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference
    • /
    • 2008.10a
    • /
    • pp.119-121
    • /
    • 2008
  • Pattern matching is increasingly employed in research areas such as health care services, RFID-based systems, facility management, and surveillance. A geosensor filter correlates a data stream to match specific patterns in distributed environments. In this paper, we present a geosensor query language for expressing declarative geosensor queries efficiently. We propose geosensor operators for fast query processing over spatial and temporal ranges in distributed environments, and a geosensor filter that matches new query predicates against incoming stream predicates. Our filter reduces the volume of transmitted data and saves sensor power. It can be used in stream data mining systems to process diverse data such as location, time, and geosensor information in real time in distributed environments.

Development of the Performance Benchmark Tool for Data Stream Management Systems Combined with DBMS (DBMS와 결합된 데이터스트림관리시스템을 위한 성능 평가 도구 개발)

  • Kim, Gyoung-Bae
    • Journal of the Korea Society of Computer and Information
    • /
    • v.15 no.8
    • /
    • pp.1-11
    • /
    • 2010
  • Many applications of a DSMS (Data Stream Management System) must not only process real-time stream data efficiently but also provide users with high-quality services, such as data mining and data warehousing, in combination with a DBMS (Database Management System). In this paper, we benchmark the performance of a combined DSMS and DBMS system developed for such high-quality services. We use stream data from a network monitoring application and combine representative DSMSs and DBMSs in a single system for performance testing. We develop the overall performance benchmark tool in Java for our tests. For the performance tests, we combine the DSMSs STREAM and Coral8 with the DBMSs MySQL and Oracle10g, respectively.

CONTINUOUS QUERY PROCESSING IN A DATA STREAM ENVIRONMENT

  • Lee, Dong-Gyu;Lee, Bong-Jae;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference
    • /
    • 2007.10a
    • /
    • pp.3-5
    • /
    • 2007
  • In a data stream environment, it is important to process many continuous queries efficiently. To process many continuous queries, a query index technique is applied whose performance is linear irrespective of the number and width of intervals. Previous approaches cannot support dynamic insertion and deletion because they arrange intervals in advance to construct an index, and their insertion and search performance is slowed by the number and width of the inserted intervals. In a data stream environment, many intervals have to be inserted and searched in linear time. Therefore, we propose Hashed Multiple Lists to process continuous queries in linear time. The proposed technique shows fast linear search performance. It can be used in systems that employ sensor networks and as a preprocessing technique for spatiotemporal data mining.

Design of Sensor Middleware Architecture on Multi Level Spatial DBMS with Snapshot (스냅샷을 가지는 다중 레벨 공간 DBMS를 기반으로 하는 센서 미들웨어 구조 설계)

  • Oh, Eun-Seog;Kim, Ho-Seok;Kim, Jae-Hong;Bae, Hae-Young
    • Journal of Korea Spatial Information System Society
    • /
    • v.8 no.1 s.16
    • /
    • pp.1-16
    • /
    • 2006
  • Recently, human-based computing environments, which let users concentrate on their own tasks without having to notice other changes, have been actively researched and developed. In such environments, however, middleware deletes stream data once it has been processed in order to reduce the load of handling massive information from RFID sensors. Such middleware therefore runs into problems when a user requests the probabilities or statistics needed for data warehousing or data mining, or repeatedly requests important stream data that the middleware has already discarded. In this paper, we design a Sensor Middleware Architecture on Multi Level Spatial DBMS with Snapshot that manages repeatedly requested stream data, solving the problem of reusing historical stream data in current middleware. The system uses a disk database to manage historical stream data filtered by the middleware, serving user requests that rely on historical stream information, such as data mining or data warehousing. It uses a memory database to manage highly reusable data as a snapshot when stream data stored in the disk database is reused frequently. Furthermore, the system applies its memory database management policy periodically to maintain high reuse and rapid service for users. The proposed system solves the problems that current middleware has with repeated requests for stream data and with policy decision services that use historical stream data. It also offers varied and rapid data services while maintaining high reuse of the snapshot data in main memory.

Finding Pseudo Periods over Data Streams based on Multiple Hash Functions (다중 해시함수 기반 데이터 스트림에서의 아이템 의사 주기 탐사 기법)

  • Lee, Hak-Joo;Kim, Jae-Wan;Lee, Won-Suk
    • Journal of Information Technology Services
    • /
    • v.16 no.1
    • /
    • pp.73-82
    • /
    • 2017
  • Recently, in-memory data stream processing has been actively applied to various tasks such as query processing, OLAP, and data mining (frequent itemsets, association rules, and clustering). However, finding regular periodic patterns of events in an infinite data stream has received less attention. Most research on finding periods uses autocorrelation functions to detect certain changes in periodic patterns rather than the period itself, and it usually targets time-series databases rather than data streams. Literally, a period is the length of time after which some phenomenon recurs. In real applications, however, a data set evolves with tiny differences as time elapses; a period of this kind is called a pseudo-period. This paper proposes a new scheme, the FPMH (Finding Periods using Multiple Hash functions) algorithm, to find such a set of pseudo-periods over a data stream based on multiple hash functions. According to the type of pseudo-period, FPMH is divided into three variants: FPMH-E, FPMH-PC, and FPMH-PP. To maximize performance in the data stream environment and to keep the most recent periodic patterns in memory, we apply a decay mechanism to the FPMH algorithms. The FPMH algorithm minimizes memory usage and processing time while maintaining acceptable accuracy.
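
The abstract mentions a decay mechanism but gives no formula. The sketch below shows one common form, exponential decay applied lazily per item, as an assumption about what such a mechanism might look like. It is not the FPMH algorithm and leaves out the multiple hash functions and the period detection entirely. The class name `DecayedCounter` and the `decay` parameter are hypothetical.

```python
class DecayedCounter:
    """Per-item counts that fade exponentially, so recent events dominate.

    Assumption (not from the paper): every count is multiplied by `decay`
    once per elapsed tick, and ticks passed to the methods never decrease.
    """

    def __init__(self, decay=0.5):
        if not 0 < decay <= 1:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.state = {}  # item -> (decayed count, tick of last update)

    def add(self, item, tick):
        """Record one occurrence of `item` at time `tick`."""
        count, last = self.state.get(item, (0.0, tick))
        # Apply the decay owed since the last update, then add the new event.
        self.state[item] = (count * self.decay ** (tick - last) + 1.0, tick)

    def estimate(self, item, tick):
        """Return the decayed count of `item` as of time `tick`."""
        count, last = self.state.get(item, (0.0, tick))
        return count * self.decay ** (tick - last)


counter = DecayedCounter(decay=0.5)
counter.add("a", tick=0)
counter.add("a", tick=1)           # 1.0 * 0.5 + 1.0 = 1.5
value = counter.estimate("a", 3)   # 1.5 * 0.5 ** 2 = 0.375
```

Decay is applied only when an item is touched, so each update costs constant time regardless of how many items are tracked.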

A Mining Method for Exploration of Causality on Data Stream System (데이터 스트림 시스템에서 인과관계 탐사를 위한 마이닝 방법)

  • Han, Dae-Young;Kim, Dae-In;Hwang, Bu-Hyun
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2009.04a
    • /
    • pp.306-309
    • /
    • 2009
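  • In general, an event has a temporal attribute: the time at which it occurs. Given a database that accumulates events per customer, useful information can be discovered through data mining. In particular, if rules relating the causes and effects of events can be found, they can serve as predictive information for forecasting the future from past information. In this study, we propose SM-EC (data Stream Mining for Exploration of Causality), a technique for discovering temporal relation rules in a data stream system and for measuring the influence between the events that make up those rules. Experiments confirm that the influence information provided by SM-EC can be an important measure for responding to various emergency situations.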

Mining Association Rule for the Abnormal Event in Data Stream Systems (데이터 스트림 시스템에서 이상 이벤트에 대한 연관 규칙 마이닝)

  • Kim, Dae-In;Park, Joon;Hwang, Bu-Hyun
    • The KIPS Transactions:PartD
    • /
    • v.14D no.5
    • /
    • pp.483-490
    • /
    • 2007
  • Mining techniques that analyze data streams to discover potentially useful information have recently been widely studied. However, most support-based research focuses on frequent events and ignores infrequent events even when they are crucial. In this paper, we propose SM-AF, a method for discovering association rules related to an abnormal event. By considering the window in which an abnormal event is sensed, SM-AF can discover association rules for a critical event even if it occurs infrequently. SM-AF can also discover significant rare itemsets associated with abnormal events, as well as periodic event itemsets. Through analysis and experiments, we show that SM-AF is superior to previous methods of mining association rules.