• Title/Summary/Keyword: Massive event streams

An Efficient Complex Event Processing Algorithm based on Multipattern Sharing for Massive Manufacturing Event Streams

  • Wang, Jianhua;Lan, Yubin;Lu, Shilei;Cheng, Lianglun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.3 / pp.1385-1402 / 2019
  • Extracting valuable information quickly from massive manufacturing event streams typically suffers from long detection times, high memory consumption, and low detection efficiency, because such streams are large in volume, high in velocity, varied in type, and low in value. Current complex event processing methods have these problems because they do not share detection work across patterns. This paper therefore presents an efficient complex event processing method based on multipattern sharing, which enables fast detection of complex events in massive manufacturing event streams. Specifically, the scheme first uses pattern sharing to merge every common prefix, suffix, or subpattern found in the single-pattern complex event detection models into one multipattern detection model. It then uses this new model to detect complex events in massive manufacturing event streams. Pattern sharing thereby removes many redundant building, storing, searching, and computing operations. Simulation experiments at the end of the paper show that the proposed multipattern processing scheme outperforms several general-purpose current processing methods overall.
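
The prefix-sharing idea in this abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes patterns are plain sequences of event types, shares only common prefixes (not suffixes or subpatterns), and lets unrelated events occur between matched ones. The pattern names, event types, and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing so nodes can live in a set
class Node:
    children: dict = field(default_factory=dict)  # event type -> child Node
    patterns: list = field(default_factory=list)  # patterns completed at this node


def build_shared_model(patterns):
    """Merge sequence patterns into one trie so common prefixes are stored once."""
    root = Node()
    for name, sequence in patterns.items():
        node = root
        for event_type in sequence:
            node = node.children.setdefault(event_type, Node())
        node.patterns.append(name)
    return root


def count_nodes(node):
    """Count the states in the shared model, including the root."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


def detect(root, stream):
    """Return (position, pattern name) for every pattern completed in the stream.

    A shared state is tracked once, however many patterns pass through it.
    """
    active = {root}
    matches = []
    for position, event_type in enumerate(stream):
        advanced = set()
        for node in active:
            child = node.children.get(event_type)
            if child is not None:
                advanced.add(child)
        completed = sorted(name for child in advanced for name in child.patterns)
        matches.extend((position, name) for name in completed)
        active |= advanced  # earlier states stay active, so events may be skipped
    return matches


# Hypothetical patterns: P1 and P2 share the prefix A, B; all three share A.
patterns = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B", "D"],
    "P3": ["A", "E"],
}
model = build_shared_model(patterns)
print(count_nodes(model))  # 6 states instead of one chain per pattern
print(detect(model, ["A", "X", "B", "D", "C", "E"]))
```

Three separate single-pattern models would each re-check the event A (and two of them the event B); in the shared model each prefix state is built, stored, and advanced once.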

Clustering based on Dependence Tree in Massive Data Streams

  • Yun, Hong-Won
    • Journal of information and communication convergence engineering / v.6 no.2 / pp.182-186 / 2008
  • RFID systems generate huge amounts of data quickly. The data are associated with locations, timestamps, and containment relationships. Product tracking and monitoring require efficient queries and updates. We propose a clustering technique for fast query processing. Our study presents state charts of the temporal event flow, proposes dependence trees with data association, and uses them to cluster linked events. Our experimental evaluation shows the effectiveness of the proposed clustering technique based on dependence trees.
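
The abstract does not spell out how the dependence trees are built, so the following is only a rough sketch under stated assumptions: containment relationships are taken as parent links (item in case, case in pallet), each tree of such links is treated as one dependence tree, and events are clustered by the root of the tree their tag belongs to. The tag names, the event tuple layout, and the function names are all hypothetical.

```python
from collections import defaultdict


def find_root(tag, parent_of):
    """Follow containment links upward to the outermost container."""
    seen = set()
    while tag in parent_of and tag not in seen:  # 'seen' guards against cycles
        seen.add(tag)
        tag = parent_of[tag]
    return tag


def cluster_events(events, parent_of):
    """Group (tag, location, time) events by the root of their containment tree."""
    clusters = defaultdict(list)
    for event in events:
        tag = event[0]
        clusters[find_root(tag, parent_of)].append(event)
    for group in clusters.values():
        group.sort(key=lambda event: event[2])  # order each cluster by timestamp
    return dict(clusters)


# Hypothetical containment relationships: child tag -> containing tag.
parent_of = {
    "item1": "case1",
    "item2": "case1",
    "case1": "pallet1",
    "item3": "case2",
}
events = [
    ("item1", "dock", 1),
    ("item3", "shelf", 2),
    ("case1", "dock", 3),
    ("item2", "gate", 0),
]
clusters = cluster_events(events, parent_of)
print(clusters["pallet1"])  # events for item2, item1, case1 in time order
print(clusters["case2"])    # the single event for item3
```

With linked events stored together, a tracking query for one pallet only has to read that pallet's cluster rather than scan every event.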

A Real-Time Stock Market Prediction Using Knowledge Accumulation (지식 누적을 이용한 실시간 주식시장 예측)

  • Kim, Jin-Hwa;Hong, Kwang-Hun;Min, Jin-Young
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.109-130 / 2011
  • One of the major problems in data mining is the size of the data, as most data sets are now very large. Streams of data are normally accumulated into data storage or databases. Transactions on the internet, on mobile devices, and in ubiquitous environments produce streams of data continuously. Some data sets remain buried and unused inside large data stores because of their size. Others are lost as soon as they are created because, for various reasons, they are never saved. How to use such large data and how to use streaming data efficiently are challenging questions in data mining. Stream data is a data set that accumulates in storage continuously from a data source. In many cases, the size of this data set grows steadily over time. Mining information from such massive data consumes too many resources, such as storage, money, and time. These characteristics make it difficult and expensive to store all the stream data accumulated over time. On the other hand, if only recent or partial data is used to mine information or patterns, valuable information can be lost. To avoid these problems, this study suggests a method that efficiently accumulates information or patterns over time in the form of a rule set. A rule set is mined from a data set in the stream and accumulated into a master rule set store, which also serves as a model for real-time decision making. One main advantage of this method is that it needs much less storage space than the traditional method, which saves the whole data set. Another advantage is that the accumulated rule set is used as a prediction model. Because the rule set is ready for decision making at any time, user requests can be answered promptly. This makes real-time decision making possible, which is the greatest advantage of the method.
According to theories of ensemble approaches, combining many different models can produce a prediction model with better performance. The consolidated rule set covers the whole data set, whereas the traditional sampling approach covers only part of it. This study uses stock market data, which is heterogeneous because its characteristics vary over time. Stock market indexes fluctuate whenever an event influences the market. The variance of the values of each variable is therefore large compared with that of a homogeneous data set. Prediction with a heterogeneous data set is naturally much more difficult than with a homogeneous one, because situations are less predictable. This study tests two general mining approaches and compares their prediction performance with that of the method suggested here. The first approach induces a rule set from the recent data set to predict a new data set. The second induces a rule set from all the data accumulated since the beginning each time a new data set has to be predicted. Neither performs as well as the accumulated rule set method. The study also reports experiments with different prediction models. The first model is built only from the more important rule sets; the second uses all the rule sets, assigning weights to the rules based on their performance. The second shows better performance than the first. The experiments also show that the suggested method can be an efficient approach for mining information and patterns from stream data. One limitation is that the method has been applied only to stock market data. Applying it to more dynamic real-time stream data sets is desirable.
The study has a further limitation: as the number of rules grows over time, special cases such as redundant or conflicting rules must be managed efficiently.
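
The rule accumulation and weighted prediction described in this abstract can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: a rule is taken to be a single (feature, value) condition with a predicted label, each batch is assumed to arrive with per-rule counts of correct and total firings, and a rule's weight is its accumulated accuracy. The class name, rule format, and sample figures are hypothetical.

```python
from collections import defaultdict


class MasterRuleSet:
    """Keeps only rule statistics, not the raw stream data."""

    def __init__(self):
        # (condition, label) -> [times correct, times fired]
        self.stats = {}

    def accumulate(self, batch_rules):
        """Merge the rules mined from one batch; a repeated rule adds to its counts."""
        for rule, (correct, fired) in batch_rules.items():
            totals = self.stats.setdefault(rule, [0, 0])
            totals[0] += correct
            totals[1] += fired

    def predict(self, record):
        """Weighted vote: every matching rule votes with its accumulated accuracy."""
        votes = defaultdict(float)
        for (condition, label), (correct, fired) in self.stats.items():
            feature, value = condition
            if fired and record.get(feature) == value:
                votes[label] += correct / fired
        return max(votes, key=votes.get) if votes else None


# Hypothetical rules mined from two consecutive batches of stream data.
batch1 = {
    (("trend", "up"), "rise"): (8, 10),
    (("volume", "high"), "fall"): (3, 10),
}
batch2 = {
    (("trend", "up"), "rise"): (6, 10),
    (("volume", "high"), "fall"): (9, 10),
}

master = MasterRuleSet()
master.accumulate(batch1)
master.accumulate(batch2)

# Conflicting rules both match; the higher accumulated accuracy (0.7 vs 0.6) wins.
print(master.predict({"trend": "up", "volume": "high"}))  # rise
print(master.predict({"volume": "high"}))                 # fall
```

Merging repeated rules by adding their counts is one simple way to handle redundant rules, and the weighted vote is one simple way to resolve conflicting ones; the study itself leaves efficient management of such rules as an open problem.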