Search | Korea Science

Transformation of Continuous Aggregation Join Queries over Data Streams

Tran, Tri Minh;Lee, Byung-Suk
- Journal of Computing Science and Engineering, v.3 no.1, pp.27-58, 2009
Aggregation join queries are an important class of queries over data streams. These queries involve both join and aggregation operations, with window-based joins followed by an aggregation on the join output. All existing research addresses join query optimization and aggregation query optimization as separate problems. We observe that, by putting them within the same scope of query optimization, more efficient query execution plans are possible through more versatile query transformations. The enabling idea is to perform aggregation before join so that the join execution time may be reduced. Some research has been done on such query transformations in relational databases, but none in data streams. Doing so in data streams brings new challenges due to the incremental and continuous arrival of tuples. These challenges are addressed in this paper. Specifically, we first present a query processing model geared to facilitate query transformations and propose a query transformation rule specialized to work with streams. The rule is simple and yet covers all possible cases of transformation. Then we present a generic query processing algorithm that works with all alternative query execution plans possible with the transformation, and develop the cost formulas of the query execution plans. Based on the processing algorithm, we validate the rule theoretically by proving the equivalence of query execution plans. Finally, through extensive experiments, we validate the cost formulas and study the performance of alternative query execution plans.
https://doi.org/10.5626/JCSE.2009.3.1.027
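
As a rough illustration of the aggregate-before-join idea mentioned in the abstract above, the following Python sketch compares two plans for a SUM grouped by the join key over two finite windows. It is not the paper's transformation rule or processing algorithm; the window contents, the function names, and the choice of SUM are assumptions made only for this example.

```python
from collections import defaultdict

def join_then_aggregate(left, right):
    """Plan A: join the two windows on the key, then SUM left values per key."""
    totals = defaultdict(int)
    for lk, lv in left:
        for rk in right:
            if lk == rk:
                totals[lk] += lv
    return dict(totals)

def aggregate_then_join(left, right):
    """Plan B: pre-aggregate each window per key, then join the partial results."""
    left_sum = defaultdict(int)
    for lk, lv in left:
        left_sum[lk] += lv
    right_count = defaultdict(int)
    for rk in right:
        right_count[rk] += 1
    # In plan A each left tuple with key k matches right_count[k] right tuples,
    # so the joined SUM is the left partial sum scaled by that count.
    return {k: s * right_count[k] for k, s in left_sum.items() if k in right_count}

# Hypothetical window contents: (key, value) on the left, bare keys on the right.
left_window = [("a", 2), ("a", 3), ("b", 5), ("c", 7)]
right_window = ["a", "a", "b", "d"]

plan_a = join_then_aggregate(left_window, right_window)
plan_b = aggregate_then_join(left_window, right_window)
```

Both plans produce the same per-key sums here, while plan B joins one pre-aggregated entry per key instead of every raw tuple.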

Continuous Query Processing in Data Streams Using Duality of Data and Queries (데이타와 질의의 이원성을 이용한 데이타스트림에서의 연속질의 처리)

Lim Hyo-Sang;Lee Jae-Gil;Lee Min-Jae;Whang Kyu-Young
- Journal of KIISE:Databases, v.33 no.3, pp.310-326, 2006
In this paper, we deal with a method of efficiently processing continuous queries in a data stream environment. We classify previous query processing methods into two dual categories - data-initiative and query-initiative - depending on whether query processing is initiated by selecting a data element or a query. This classification stems from the fact that data and queries have been treated asymmetrically. For processing continuous queries, only data-initiative methods have traditionally been employed, and thus, the performance gain that could be obtained by query-initiative methods has been overlooked. To solve this problem, we focus on an observation that data and queries can be treated symmetrically. In this paper, we propose the duality model of data and queries and, based on this model, present a new viewpoint of transforming the continuous query processing problem to a multi-dimensional spatial join problem. We also present a continuous query processing algorithm based on spatial join, named Spatial Join CQ. Spatial Join CQ processes continuous queries by finding the pairs of overlapping regions from a set of data elements and a set of queries defined as regions in the multi-dimensional space. The algorithm achieves the effects of both of the two dual methods by using the spatial join, which is a symmetric operation. Experimental results show that the proposed algorithm outperforms earlier methods by up to 36 times for simple selection continuous queries and by up to 7 times for sliding window join continuous queries.
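
The idea of treating data elements and queries symmetrically as regions can be sketched as follows. This is a naive nested-loop overlap test, not the Spatial Join CQ algorithm itself; the query ranges, data points, and function names are hypothetical, and a real implementation would presumably use a spatial join method rather than comparing every pair.

```python
def overlaps(region_a, region_b):
    """True if two axis-aligned regions overlap in every dimension."""
    return all(lo_a <= hi_b and lo_b <= hi_a
               for (lo_a, hi_a), (lo_b, hi_b) in zip(region_a, region_b))

def point_as_region(point):
    """A data element is a degenerate region: each range has zero width."""
    return [(v, v) for v in point]

def spatial_join(data_points, queries):
    """Return sorted (data_id, query_id) pairs whose regions overlap."""
    return sorted(
        (d_id, q_id)
        for d_id, point in data_points.items()
        for q_id, region in queries.items()
        if overlaps(point_as_region(point), region)
    )

# Hypothetical range queries over two attributes, one (low, high) pair each.
queries = {
    "q1": [(0, 10), (20, 30)],
    "q2": [(5, 15), (0, 25)],
}
# Hypothetical data elements as points in the same two-dimensional space.
data_points = {"d1": (7, 22), "d2": (12, 10), "d3": (3, 40)}

matches = spatial_join(data_points, queries)
```

Because both inputs are regions, the same overlap test works whether a batch of data or a batch of queries drives the processing.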

Preprocessing Method for Handling Multi-Way Join Continuous Queries over Data Streams (데이터 스트림에서 다중 조인 연속질의의 효과적인 처리를 위한 전처리 기법)

Seo, Ki-Yeon;Lee, Joo-Il;Lee, Won-Suk
- Journal of Internet Computing and Services, v.13 no.3, pp.93-105, 2012
A data stream is a series of tuples generated in a real-time, incessant, immense, and volatile manner. As new information technologies emerge, stream processing methods are needed to handle data streams efficiently. In particular, an efficient evaluation method for multi-way joins would contribute greatly to the performance of a data stream management system, because the join is one of the most resource-consuming operators in query evaluation. In this paper, in order to evaluate a multi-way join continuous query efficiently, we propose a novel method that decreases the cost of a query by eliminating unsuccessful intermediate results. To this end, we propose a matrix-based structure for monitoring data streams, estimate the number of final result tuples of the query, and identify unsuccessful tuples by matrix multiplication operations. Using this information, we process a multi-way join continuous query efficiently by filtering out the unsuccessful tuples before the actual evaluation of the query.
https://doi.org/10.7472/jksii.2012.13.3.93
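
One way to picture how matrix multiplication can count join results and expose tuples that cannot contribute is sketched below for a chain join S1.a = S2.a and S2.b = S3.b. The paper's actual matrix structure is not described in the abstract, so the count vectors, the count matrix, and all names here are assumptions for illustration only.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

# Hypothetical monitored counts; attribute a takes values 0..2, b takes 0..1.
s1_counts = [2, 1, 3]     # number of S1 tuples for each value of a
s2_counts = [[1, 0],      # number of S2 tuples for each (a, b) combination
             [0, 0],
             [0, 2]]
s3_counts = [0, 4]        # number of S3 tuples for each value of b

# For each value of a: how many (S2, S3) partner combinations exist.
partners_per_a = mat_vec(s2_counts, s3_counts)

# Total number of final result tuples of the three-way join.
result_count = dot(s1_counts, partners_per_a)

# S1 tuples carrying these values of a can never reach the final result,
# so they could be filtered out before the join is evaluated.
unsuccessful_a = [a for a, n in enumerate(partners_per_a) if n == 0]
```

With exact counts the product gives the exact result size; with approximate or sampled counts it would only be an estimate.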

Continuous Spatio-Temporal Self-Join Queries over Stream Data of Moving Objects for Symbolic Space (기호공간에서 이동객체 스트림 데이터의 연속 시공간 셀프조인 질의)

Hwang, Byung-Ju;Li, Ki-Joune
- Spatial Information Research, v.18 no.1, pp.77-87, 2010
Spatio-temporal join operators are essential to the management of spatio-temporal data such as moving objects. For example, join operators are part of the processing needed to analyze the movement of objects and to search for similar patterns among moving objects. Various studies have been done on spatio-temporal join queries in outdoor space. Recently, with the advance of indoor positioning techniques, location-based services are required in indoor space as well as outdoor space. Nevertheless, no study has addressed the processing of spatio-temporal join queries in indoor space. In this paper, we introduce continuous spatio-temporal self-join queries in indoor space and propose a method for processing these join queries over stream data of moving objects. A continuous spatio-temporal self-join query continuously updates the joined result set that satisfies the spatio-temporal predicates. We assume that the positions of moving objects are represented by symbols such as a room or a corridor. This paper proposes a data structure, called the Candidate Pairs Buffer, to filter and maintain massive stream data efficiently, and investigates the performance of the proposed method in an experimental study.

Optimizing Multi-way Join Query Over Data Streams (데이타 스트림에서의 다중 조인 질의 최적화 방법)

Park, Hong-Kyu;Lee, Won-Suk
- Journal of KIISE:Databases, v.35 no.6, pp.459-468, 2008
A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Many emerging applications need to deal with data streams, such as web click monitoring, sensor data processing, network traffic analysis, telephone records, and multimedia data. In these applications, processing is not performed on stored data; instead, newly arriving data are evaluated against pre-registered queries, and results are returned immediately or periodically. Recently, many studies have focused on data streams rather than stored data sets. In particular, much research aims to optimize continuous queries so that they execute efficiently. This paper proposes a query optimization algorithm for continuous queries that have multiple join operators (multi-way joins) over data streams. The algorithm, called Extended Greedy query optimization, is based on a greedy algorithm. It defines the cost of a join in terms of the operations required to compute the join and to process its result, and stores the join costs, together with all information needed to compute them, in a statistics catalog. To overcome the weak point of the greedy algorithm, namely its poor performance, the algorithm selects a set of operators with small costs instead of the single operator with the smallest cost. The set influences the accuracy and execution time of the algorithm and can be controlled adaptively by two user-defined values. Experimental results illustrate the performance of the EGA algorithm in various stream environments.

MMJoin: An Optimization Technique for Multiple Continuous MJoins over Data Streams (데이타 스트림 상에서 다중 연속 복수 조인 질의 처리 최적화 기법)

Byun, Chang-Woo;Lee, Hun-Zu;Park, Seog
- Journal of KIISE:Databases, v.35 no.1, pp.1-16, 2008
Costly join queries are necessary in a Data Stream Management System for sensor networks, where many short pieces of information are generated. Because a data stream represents data of unbounded size, it is reasonable for each join operator to have a sliding-window constraint to prevent disk I/O. In addition, the join operator should be able to take multiple inputs to produce overall results. The MJoin operator with sliding windows makes this possible. In this paper, we consider a data stream environment in which multiple MJoin operators are registered, and propose MMJoin, which addresses the issues of building and processing a globally shared query in light of the characteristics of the MJoin operator with sliding windows. First, we propose a solution for building the globally shared query execution plan. Second, we solve the problems of updating window sizes and routing join results. Our study can serve as a foundation for optimization techniques for multiple continuous joins in data stream environments.
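
The sliding-window constraint on a join operator can be sketched with a two-input, time-based window join. This is a simplified binary join, not MJoin or MMJoin; the class name, the window size, and the assumption that timestamps arrive in non-decreasing order are all choices made for this example.

```python
from collections import deque

class SlidingWindowJoin:
    """Symmetric equi-join of two streams, each kept in a time-based window."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.windows = {"left": deque(), "right": deque()}

    def _expire(self, now):
        # Drop tuples that have fallen out of the window (now - size, now].
        for window in self.windows.values():
            while window and window[0][0] <= now - self.window_size:
                window.popleft()

    def insert(self, side, timestamp, key):
        """Add a tuple; return (key, new_ts, matched_ts) for each join result."""
        self._expire(timestamp)
        other = "right" if side == "left" else "left"
        results = [(key, timestamp, ts)
                   for ts, k in self.windows[other] if k == key]
        self.windows[side].append((timestamp, key))
        return results

join = SlidingWindowJoin(window_size=10)
out = []
out += join.insert("left", 1, "x")
out += join.insert("right", 5, "x")   # matches the left tuple from time 1
out += join.insert("right", 12, "x")  # the left tuple from time 1 has expired
out += join.insert("left", 14, "x")   # matches right tuples from times 5 and 12
```

Memory stays bounded because each window holds only tuples from the most recent time span, which is the reason the abstract gives for imposing the constraint.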

Efficient Processing of Continuous Join Queries between a Data Stream and Multiple Relations for Real-Time Analysis of E-Commerce Data (전자상거래 데이터의 실시간 분석을 위한 데이터 스트림과 다수 릴레이션 간의 효율적인 연속 조인 처리 기법)

Kim, Haeri;Lee, Ki Yong
- The Journal of Society for e-Business Studies, v.18 no.3, pp.159-175, 2013
Recently, as e-commerce data have become available in real time, the demand for real-time analysis of e-commerce data has increased significantly. In such analysis, it is very important to efficiently process continuous join queries between an e-commerce data stream and large disk-based relations. In this paper, we propose an efficient method for processing a continuous join query between an e-commerce data stream and multiple disk-based relations. The proposed method improves the service rate significantly, while substantially reducing the amount of required memory. Through analysis and various experiments, we show the efficiency of the proposed method compared with the previous one in terms of service rate and memory usage.
https://doi.org/10.7838/jsebs.2013.18.3.159

A Review of Window Query Processing for Data Streams

Kim, Hyeon Gyu;Kim, Myoung Ho
- Journal of Computing Science and Engineering, v.7 no.4, pp.220-230, 2013
In recent years, progress in hardware technology has made it possible to monitor many events in real time. The volume of incoming data may be so large that monitoring every individual data item might be intractable. Revisiting any particular record can also be impossible in this environment. Therefore, many database schemes, such as aggregation, join, frequent pattern mining, and indexing, become more challenging in this context. This paper surveys previous efforts to resolve these issues in processing data streams. The emphasis is on specifying and processing sliding window queries, which are supported in many stream processing engines. We also review related work on stream query processing, including synopsis structures, plan sharing, operator scheduling, load shedding, and disorder control.
https://doi.org/10.5626/JCSE.2013.7.4.220
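
A sliding window query of the kind the survey emphasizes can be sketched as an incrementally maintained SUM over the most recent tuples. The count-based window, the function name, and the sample values are assumptions for illustration; stream engines typically also offer time-based windows and other aggregates.

```python
from collections import deque

def sliding_sums(values, window_size):
    """Return the SUM over the last `window_size` values after each arrival."""
    window = deque()
    running = 0
    sums = []
    for v in values:
        window.append(v)
        running += v
        if len(window) > window_size:
            # Subtract the expired value instead of re-summing the window.
            running -= window.popleft()
        sums.append(running)
    return sums

# Hypothetical stream of readings with a window of the 3 most recent tuples.
sums = sliding_sums([4, 1, 3, 5, 2], window_size=3)
```

Each arrival costs constant work, and no record outside the current window is ever revisited.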

A Design and Implementation of Virtual Grid for Reducing Frequency of Continuous Query on LBSNS (LBSNS에서 연속 질의 빈도 감소를 위한 가상그리드 기법의 설계 및 구현)

Lee, Eun-Sik;Cho, Dae-Soo
- Journal of the Korea Institute of Information and Communication Engineering, v.16 no.4, pp.752-758, 2012
An SNS (Social Networking Service) is an online service that enables users to build human networks through relationships on the web, such as following and friend relationships. Recently, owing to the advent of GPS-equipped digital devices (smartphones, tablet PCs), applications that provide services with both spatial and social relevance have been released. Such an online service is called an LBSNS. To build an LBSNS system that lets users subscribe to information about an area of interest, spatial filtering is required. For spatial filtering, users and tweets carry location information, which is divided into a static property representing a fixed area and a dynamic property representing the user's area, which changes as the user moves. When location information includes the dynamic property, the continuous queries generated by moving users cause problems on the server. In this paper, we propose a spatial filtering algorithm that uses a Virtual Grid to reduce query frequency, and conclude that query frequency with the Virtual Grid is 93% lower than without it.
https://doi.org/10.6109/jkiice.2012.16.4.752
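
One plausible reading of how a grid can reduce continuous-query frequency is that a moving user re-issues the query only on entering a new grid cell. The sketch below follows that reading; it is not the paper's Virtual Grid algorithm, and the cell size, the sample track, and the function names are hypothetical. It does not attempt to reproduce the reported 93% figure.

```python
def cell_of(x, y, cell_size):
    """Map a position to the grid cell that contains it."""
    return (int(x // cell_size), int(y // cell_size))

def count_queries(positions, cell_size):
    """Count queries issued when a query is sent only on a cell change."""
    issued = 0
    current = None
    for x, y in positions:
        cell = cell_of(x, y, cell_size)
        if cell != current:
            issued += 1
            current = cell
    return issued

# Hypothetical sequence of position updates from one moving user.
track = [(1, 1), (2, 3), (4, 4), (11, 4), (12, 6), (12, 11)]

with_grid = count_queries(track, cell_size=10)
without_grid = len(track)  # one query per position update
```

A larger cell size issues fewer queries but makes each query cover a coarser area, so the cell size is a trade-off between server load and filtering precision.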

Greedy Query Optimization Performance Analysis for Join Continuous Query over Data Streams (데이터 스트림 환경에서의 조인 연속 질의의 그리디 질의 최적화 성능 분석)

Park, Hong-Kyu;Lee, Won-Suk
- Proceedings of the Korea Information Processing Society Conference, 2006.11a, pp.361-364, 2006
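Recently, more attention has been focused on data stream processing, such as sensor data processing and the analysis of various transaction logs like web server logs and telephone records, than on bounded data sets, and interest in query processing over data streams in particular is growing. This paper studies methods for processing join continuous queries, which join two or more streams, and their performance. We define the cost of each join with a join cost model based on the input rates of the streams and the join selectivity, propose an optimization technique that uses a greedy algorithm, and examine through experiments how the optimization algorithm performs in various stream environments.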
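
A greedy join ordering driven by input rates and join selectivities can be sketched as follows. The abstract does not give the paper's cost formula, so this sketch assumes the cost of a join step is the output rate of the intermediate result (the product of the input rates and the applicable selectivities); the stream names, rates, selectivities, and function names are hypothetical.

```python
from itertools import combinations

def pair_selectivity(selectivity, a, b):
    """Selectivity of the join predicate between a and b (1.0 if none)."""
    return selectivity.get(frozenset((a, b)), 1.0)

def greedy_join_order(rates, selectivity):
    """Build a left-deep order, always adding the cheapest next stream."""
    def pair_rate(p):
        return rates[p[0]] * rates[p[1]] * pair_selectivity(selectivity, *p)

    # Start with the pair of streams whose join has the smallest output rate.
    first = min(combinations(sorted(rates), 2), key=pair_rate)
    order = list(first)
    current_rate = pair_rate(first)
    remaining = set(rates) - set(order)

    while remaining:
        def output_rate(s):
            sel = 1.0
            for joined in order:
                sel *= pair_selectivity(selectivity, joined, s)
            return current_rate * rates[s] * sel

        best = min(sorted(remaining), key=output_rate)
        current_rate = output_rate(best)
        order.append(best)
        remaining.remove(best)
    return order, current_rate

# Hypothetical input rates (tuples per unit time) and pairwise selectivities.
rates = {"A": 100, "B": 10, "C": 50}
selectivity = {
    frozenset(("A", "B")): 0.01,
    frozenset(("B", "C")): 0.1,
    frozenset(("A", "C")): 0.5,
}

order, final_rate = greedy_join_order(rates, selectivity)
```

The greedy choice is cheap to compute but only locally optimal, which is why its behavior across different stream environments has to be examined experimentally.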

