• Title/Summary/Keyword: Stream Query Processing

Continuous Query Processing in Data Streams Using Duality of Data and Queries (데이타와 질의의 이원성을 이용한 데이타스트림에서의 연속질의 처리)

  • Lim Hyo-Sang;Lee Jae-Gil;Lee Min-Jae;Whang Kyu-Young
    • Journal of KIISE:Databases / v.33 no.3 / pp.310-326 / 2006
  • In this paper, we deal with a method of efficiently processing continuous queries in a data stream environment. We classify previous query processing methods into two dual categories - data-initiative and query-initiative - depending on whether query processing is initiated by selecting a data element or a query. This classification stems from the fact that data and queries have been treated asymmetrically. For processing continuous queries, only data-initiative methods have traditionally been employed, and thus, the performance gain that could be obtained by query-initiative methods has been overlooked. To solve this problem, we focus on an observation that data and queries can be treated symmetrically. In this paper, we propose the duality model of data and queries and, based on this model, present a new viewpoint of transforming the continuous query processing problem to a multi-dimensional spatial join problem. We also present a continuous query processing algorithm based on spatial join, named Spatial Join CQ. Spatial Join CQ processes continuous queries by finding the pairs of overlapping regions from a set of data elements and a set of queries defined as regions in the multi-dimensional space. The algorithm achieves the effects of both of the two dual methods by using the spatial join, which is a symmetric operation. Experimental results show that the proposed algorithm outperforms earlier methods by up to 36 times for simple selection continuous queries and by up to 7 times for sliding window join continuous queries.
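
The duality view in this abstract can be illustrated with a minimal sketch: each data element becomes a point (a degenerate region) and each query becomes a region in the same multi-dimensional space, so matching reduces to finding overlapping pairs. The brute-force nested loop below is only an illustration of that overlap test, not the authors' Spatial Join CQ algorithm; the names `overlaps`, `point_region`, and `naive_spatial_join` and the two-attribute example (temperature, humidity) are assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

# A region is a list of closed (low, high) intervals, one per dimension.
Region = List[Tuple[float, float]]


def overlaps(a: Region, b: Region) -> bool:
    """True if two axis-aligned regions intersect in every dimension."""
    return all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(a, b))


def point_region(values: List[float]) -> Region:
    """Represent a data element as a degenerate region (a single point)."""
    return [(v, v) for v in values]


def naive_spatial_join(data: Dict[str, Region],
                       queries: Dict[str, Region]) -> List[Tuple[str, str]]:
    """Return every (data_id, query_id) pair whose regions overlap.

    Brute force for illustration only; a real spatial join would use
    an index or a plane sweep instead of comparing all pairs.
    """
    return sorted((d, q)
                  for d, d_region in data.items()
                  for q, q_region in queries.items()
                  if overlaps(d_region, q_region))


# Hypothetical example: dimensions are (temperature, humidity).
data = {
    "d1": point_region([25.0, 60.0]),
    "d2": point_region([35.0, 20.0]),
}
queries = {
    "q_hot": [(30.0, 100.0), (0.0, 100.0)],    # temperature >= 30
    "q_humid": [(0.0, 100.0), (50.0, 100.0)],  # humidity >= 50
}
matches = naive_spatial_join(data, queries)
print(matches)
```

Because the overlap test is symmetric, the same join can be driven from either the data side or the query side, which is the property the abstract relies on.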

In-Memory Based Incremental Processing Method for Stream Query Processing in Big Data Environments (빅데이터 환경에서 스트림 질의 처리를 위한 인메모리 기반 점진적 처리 기법)

  • Bok, Kyoungsoo;Yook, Misun;Noh, Yeonwoo;Han, Jieun;Kim, Yeonwoo;Lim, Jongtae;Yoo, Jaesoo
    • The Journal of the Korea Contents Association / v.16 no.2 / pp.163-173 / 2016
  • Recently, distributed processing of massive amounts of stream data has been widely studied. In this paper, we propose an in-memory incremental stream data processing method for big data environments. The proposed method stores input data in a temporary queue and compares it with the data in a master node. If the data is in the master node, the proposed method reuses the previous processing results located in the node chosen by the master node. If there are no previous results for the data in the node, the proposed method processes the data and stores the result in a separate node. We also propose a job scheduling technique that considers the load and performance of a node. To show the superiority of the proposed method, we compare it with the existing method in terms of query processing time. Our experimental results show that our method outperforms the existing method in terms of query processing time.

Approximate Top-k Subgraph Matching Scheme Considering Data Reuse in Large Graph Stream Environments (대용량 그래프 스트림 환경에서 데이터 재사용을 고려한 근사 Top-k 서브 그래프 매칭 기법)

  • Choi, Do-Jin;Bok, Kyoung-Soo;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association / v.20 no.8 / pp.42-53 / 2020
  • With the development of social network services, graph structures have been used to represent relationships among objects in various applications. Recently, demand for subgraph matching in real-time graph streams has increased. Therefore, an efficient approximate Top-k subgraph matching scheme with low latency in real-time graph streams is required. In this paper, we propose an approximate Top-k subgraph matching scheme that considers data reuse in graph stream environments. The proposed scheme uses the distributed stream processing platform Storm to handle a large amount of stream data. We also use an existing data reuse scheme to decrease stream processing costs. We propose a distance-based summary indexing technique to generate Top-k subgraph matching results. The proposed summary indexing technique has a very low cost, since it stores only the distances among vertices that are selected in advance. Finally, we provide k subgraph matching results to users by performing approximate Top-k matching on the summary index. To show the superiority of the proposed scheme, we conduct various performance evaluations on diverse real-world datasets.

A PCA-based Data Stream Reduction Scheme for Sensor Networks (센서 네트워크를 위한 PCA 기반의 데이터 스트림 감소 기법)

  • Fedoseev, Alexander;Choi, Young-Hwan;Hwang, Een-Jun
    • Journal of Internet Computing and Services / v.10 no.4 / pp.35-44 / 2009
  • The emerging notion of data stream has brought many new challenges to the research communities as a consequence of its conceptual difference with conventional concepts of just data. One typical example is data stream processing in sensor networks. The range of data processing considerations in a sensor network is very wide, from physical resource restrictions such as bandwidth, energy, and memory to the peculiarities of query processing including continuous and specific types of queries. In this paper, as one of the physical constraints in data stream processing, we consider the problem of limited memory and propose a new scheme for data stream reduction based on the Principal Component Analysis (PCA) technique. PCA can transform a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables. We adapt PCA for the data stream of a sensor network assuming the cooperation of a query engine (or application) with a network base station. Our method exploits the spatio-temporal correlation among multiple measurements from different sensors. Finally, we present a new framework for data processing and describe a number of experiments under this framework. We compare our scheme with the wavelet transform and observe the effect of time stamps on the compression ratio. We report on some of the results.

Iceberg Query Evaluation Technical Using a Cuboid Prefix Tree (큐보이드 전위트리를 이용한 빙산질의 처리)

  • Han, Sang-Gil;Yang, Woo-Sock;Lee, Won-Suk
    • Journal of KIISE:Databases / v.36 no.3 / pp.226-234 / 2009
  • A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Due to these characteristics, it is impossible to save all the data elements of a data stream. Therefore, it is necessary to define a new synopsis structure to store the summary information of a data stream. For this purpose, this paper proposes a cuboid prefix tree that can be effectively employed in evaluating an iceberg query over data streams. A cuboid prefix tree stores only those itemsets that consist of the grouping attributes used in a GROUP BY query. In addition, a cuboid prefix tree can compute multiple iceberg queries simultaneously by sharing their common sub-expressions. A cuboid prefix tree evaluates an iceberg query over an infinitely generated data stream while efficiently reducing memory usage and processing time, which is verified by a series of experiments.
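
For readers unfamiliar with the term, an iceberg query is a GROUP BY aggregation that reports only the groups whose aggregate reaches a threshold. The sketch below shows that semantics with an exact in-memory counter; it is not the cuboid prefix tree from the paper and makes no attempt to bound memory, which is the problem the paper addresses. The function name `iceberg_counts` and the `src`/`dst` attributes are assumptions made for this sketch.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def iceberg_counts(stream: Iterable[dict],
                   group_by: List[str],
                   threshold: int) -> Dict[Tuple, int]:
    """Count records per GROUP BY key and keep groups with count >= threshold.

    Exact counting keeps one counter per distinct group, so memory grows
    with the number of groups; a stream synopsis would approximate this.
    """
    counts: Counter = Counter()
    for record in stream:
        key = tuple(record[attr] for attr in group_by)
        counts[key] += 1
    return {key: c for key, c in counts.items() if c >= threshold}


# Hypothetical stream of records with two grouping attributes.
stream = [
    {"src": "a", "dst": "x"},
    {"src": "a", "dst": "x"},
    {"src": "a", "dst": "y"},
    {"src": "b", "dst": "x"},
    {"src": "a", "dst": "x"},
]
by_pair = iceberg_counts(stream, ["src", "dst"], threshold=2)
by_src = iceberg_counts(stream, ["src"], threshold=2)
print(by_pair, by_src)
```

The two calls group the same stream by different attribute sets, which is the kind of overlap between queries that a shared structure could exploit.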

Attribute-based Approach for Multiple Continuous Queries over Data Streams (데이터 스트림 상에서 다중 연속 질의 처리를 위한 속성기반 접근 기법)

  • Lee, Hyun-Ho;Lee, Won-Suk
    • The KIPS Transactions:PartD / v.14D no.5 / pp.459-470 / 2007
  • A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Query processing for such a data stream should also be continuous and rapid, which imposes strict time and space constraints. In most DSMSs (Data Stream Management Systems), the selection predicates of continuous queries are grouped or indexed to satisfy these constraints. This paper proposes a new scheme called an ASC (Attribute Selection Construct) that collectively evaluates the selection predicates containing the same attribute in multiple continuous queries. An ASC contains valuable information, such as attribute usage status, partially precalculated matching results, and selectivity statistics for its multiple selection predicates. The processing order of the ASCs that correspond to the attributes of a base data stream can significantly influence the overall performance of multiple query evaluation. Consequently, a method of establishing an efficient evaluation order of multiple ASCs is also proposed. Finally, the performance of the proposed method is analyzed by a series of experiments to identify its various characteristics.
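
The idea of evaluating predicates on the same attribute together can be sketched as follows. This is a simplified illustration, not the ASC structure from the paper: it keeps no usage status, precalculated results, or selectivity statistics. It assumes each query is a conjunction of `(attribute, operator, constant)` predicates, and the names `group_by_attribute` and `matching_queries` are hypothetical.

```python
import operator
from collections import defaultdict

# Comparison operators allowed in the hypothetical predicates.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq}


def group_by_attribute(queries):
    """Map each attribute to the (query_id, op, constant) predicates on it."""
    groups = defaultdict(list)
    for query_id, predicates in queries.items():
        for attr, op, const in predicates:
            groups[attr].append((query_id, op, const))
    return dict(groups)


def matching_queries(element, queries, groups, attr_order=None):
    """Return the ids of queries whose predicates all hold for `element`.

    Attributes are processed one group at a time; `attr_order` must list
    every attribute in `groups` (default: insertion order). Processing
    stops early once no candidate query is left.
    """
    alive = set(queries)
    for attr in (attr_order or list(groups)):
        for query_id, op, const in groups.get(attr, ()):
            if query_id in alive and not OPS[op](element[attr], const):
                alive.discard(query_id)
        if not alive:
            break
    return alive


queries = {
    "q1": [("temp", ">", 30), ("humidity", "<", 50)],
    "q2": [("temp", ">", 20)],
    "q3": [("humidity", ">=", 70)],
}
groups = group_by_attribute(queries)
first = matching_queries({"temp": 35, "humidity": 40}, queries, groups)
second = matching_queries({"temp": 10, "humidity": 80}, queries, groups,
                          attr_order=["humidity", "temp"])
print(sorted(first), sorted(second))
```

The result does not depend on `attr_order`, but the amount of work does: putting an attribute that eliminates many queries first shrinks the candidate set sooner, which is why the evaluation order matters.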

Dynamic Load Management Method for Spatial Data Stream Processing on MapReduce Online Frameworks (맵리듀스 온라인 프레임워크에서 공간 데이터 스트림 처리를 위한 동적 부하 관리 기법)

  • Jeong, Weonil
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.8 / pp.535-544 / 2018
  • As the spread of mobile devices equipped with various sensors and high-quality wireless network communications functions expands, the amount of spatio-temporal data generated from mobile devices in various service fields is rapidly increasing. In conventional research into processing a large amount of real-time spatio-temporal streams, it is very difficult to apply a Hadoop-based spatial big data system, designed to be a batch processing platform, to a real-time service for spatio-temporal data streams. This paper extends the MapReduce online framework to support real-time query processing for continuous-input, spatio-temporal data streams, and proposes a load management method to distribute overloads for efficient query processing. The proposed scheme shows a dynamic load balancing method for the nodes based on the inflow rate and the load factor of the input data based on the space partition. Experiments show that it is possible to support efficient query processing by distributing the spatial data stream in the corresponding area to the shared resources when load management in a specific area is required.

A Method of Frequent Structure Detection Based on Active Sliding Window (능동적 슬라이딩 윈도우 기반 빈발구조 탐색 기법)

  • Hwang, Jeong-Hee
    • Journal of Digital Contents Society / v.13 no.1 / pp.21-29 / 2012
  • In ubiquitous computing environments, large-scale data exchange through sensor networks is rising along with the sharp growth of the Internet, so the processing of continuous stream data is required. Accordingly, mining research has addressed the extraction of frequent structures and the efficient query processing of XML stream data. In this paper, we propose a mining method that extracts frequent structures of XML stream data in the most recent window, based on active window sliding using trigger rules. The proposed method is basic research toward controlling the stream data flow for data mining and continuous queries with trigger rules.

The Processing Method of Stream Data in the Small-size Operating System (소규모 운영체제에서의 스트림데이터 처리기법)

  • Kim, Jin-Deog
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2007.10a / pp.871-874 / 2007
  • Stream data need efficient data management with high reliability and real-time processing. These data are characterized by a large volume, a short report interval, and asynchronous report times. The typical queries of these systems are the current query, which searches for the latest signal value; the snapshot query, which searches for the signal value at a past time; and the historical query, which searches for the signal values from a past time to the present. This paper proposes an efficient method to manage the above signals by using a file-structured database in the QNX operating system. A query model that accommodates various queries for stream data is also proposed. The proposed methods are applied to a reactive protection system to verify their usefulness. The COM (Cabinet Operator Module) based on QNX employs a file database that adopts a delta version and a buffering method to cope with the resource limits of small storage and low computing power.

A Pattern-based Query Strategy in Wireless Sensor Network

  • Ding, Yanhong;Qiu, Tie;Jiang, He;Sun, Weifeng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.6 / pp.1546-1564 / 2012
  • Pattern-based query processing has not attracted much attention in wireless sensor networks, although its counterpart has been studied extensively for data streams. The methods used for data streams usually consume large amounts of memory and energy, which conflicts with the fact that wireless sensor networks are heavily constrained by their hardware resources. In this paper, we use a piecewise representation of the data collected by sensor nodes to save the nodes' memory and to reduce the energy consumed by queries. After obtaining the approximating line segments of the data stream and of the patterns, we record the slope of each line and perform similarity matching on the slope sequences by computing the dynamic time warping distance between them. If the distance is less than a user-defined threshold, we say that the subsequence is similar to the pattern. We run experiments on an STM32W108 processor to compare our strategy's performance with that of a naive method. The results show that our strategy's matching precision is lower than that of the naive method, but its energy consumption is much lower. The proposed strategy can be used in wireless sensor networks to process pattern-based queries.
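
The matching steps described in this abstract (segment slopes, dynamic time warping, threshold test) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how segments are chosen, so the sketch assumes fixed-length segments whose slope is taken between the segment endpoints, and it uses the textbook dynamic time warping recurrence with absolute difference as the local cost. The names `segment_slopes`, `dtw_distance`, and `is_similar` are hypothetical.

```python
def segment_slopes(values, seg_len):
    """Slope between the endpoints of each fixed-length segment.

    Assumption for this sketch: samples are evenly spaced and segments
    have a fixed length of `seg_len` sampling intervals.
    """
    slopes = []
    for start in range(0, len(values) - seg_len, seg_len):
        end = start + seg_len
        slopes.append((values[end] - values[start]) / seg_len)
    return slopes


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def is_similar(subsequence, pattern_slopes, seg_len, threshold):
    """True if the slope sequence of `subsequence` is within `threshold`."""
    return dtw_distance(segment_slopes(subsequence, seg_len),
                        pattern_slopes) < threshold


# Hypothetical readings: a rise-then-fall series and a steadily rising one.
peak = [0, 1, 2, 3, 4, 3, 2, 1, 0]
ramp = [0, 1, 2, 3, 4, 5, 6, 7, 8]
pattern = [1.0, -1.0]  # "up, then down" expressed as slopes

peak_slopes = segment_slopes(peak, 2)
print(peak_slopes, dtw_distance(peak_slopes, pattern))
print(is_similar(peak, pattern, 2, threshold=0.5),
      is_similar(ramp, pattern, 2, threshold=0.5))
```

Comparing slope sequences rather than raw samples means a node only needs to keep one number per segment, which is in line with the memory and energy savings the abstract reports, at the cost of some matching precision.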