• Title/Summary/Keyword: 집계연산 (aggregate operation)


Load Shedding Method based on Grid Hash to Improve Accuracy of Spatial Sliding Window Aggregate Queries (공간 슬라이딩 윈도우 집계질의의 정확도 향상을 위한 그리드 해쉬 기반의 부하제한 기법)

  • Baek, Sung-Ha; Lee, Dong-Wook; Kim, Gyoung-Bae; Chung, Weon-Il; Bae, Hae-Young
    • Journal of Korea Spatial Information System Society / v.11 no.2 / pp.89-98 / 2009
  • Because a data stream enters the system continuously while memory is limited, data that exceeds the available memory cannot be processed. To address this problem, load shedding methods that drop part of the data to keep it within the storage space have been studied. A traditional load shedding method uses random sampling at a rate optimized according to the data deviation, so it cannot distinguish the data actually used by spatial queries. As a result, query accuracy degrades in a u-GIS environment that includes spatial queries. This paper proposes a new load shedding method that improves query accuracy in a u-GIS environment where spatial and aspatial queries run simultaneously. The method uses a new sampling scheme that preferentially drops data with a low probability of being used in a query, and by applying a spatial filtering operation to the sampling operator it improves both spatial query accuracy and query processing speed.

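As an illustration of the grid-hash idea described in the abstract above, the following is a minimal sketch (not the authors' implementation): incoming tuples are hashed to grid cells, and when the memory budget is exceeded, tuples whose cells do not overlap any registered spatial query window are shed first, with plain random sampling as a fallback. The cell size, capacity, and tuple layout are illustrative assumptions.

```python
import random

class GridHashShedder:
    def __init__(self, cell_size, capacity):
        self.cell_size = cell_size      # side length of a grid cell
        self.capacity = capacity        # max number of tuples kept in the window
        self.window = []                # (x, y, value) tuples currently retained
        self.query_cells = set()        # grid cells covered by registered query windows

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def register_query(self, x_min, y_min, x_max, y_max):
        # Mark every grid cell overlapped by the query rectangle as "useful".
        for cx in range(int(x_min // self.cell_size), int(x_max // self.cell_size) + 1):
            for cy in range(int(y_min // self.cell_size), int(y_max // self.cell_size) + 1):
                self.query_cells.add((cx, cy))

    def insert(self, x, y, value):
        if len(self.window) >= self.capacity:
            # Shed load: prefer dropping a tuple that no spatial query can use.
            victim = next((i for i, t in enumerate(self.window)
                           if self._cell(t[0], t[1]) not in self.query_cells), None)
            if victim is None:
                victim = random.randrange(len(self.window))  # fall back to random sampling
            self.window.pop(victim)
        self.window.append((x, y, value))
```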

Implementation of Query Processing System in Temporal Databases (시간지원 데이터베이스의 질의처리 시스템 구현)

  • Lee, Eon-Bae; Kim, Dong-Ho; Ryu, Keun-Ho
    • The Transactions of the Korea Information Processing Society / v.5 no.6 / pp.1418-1430 / 1998
  • Temporal databases support efficient management of historical data by means of valid time and transaction time. Valid time is the time at which a fact holds in the real world, and transaction time is the time at which the data is stored in the database. A Temporal Query Processing System (TQPS) must therefore be extended to handle temporal operations on historical information in user queries as well as the conventional relational operations. This paper describes an extended temporal query processing system, based on an earlier query processing system for TQuel (Temporal Query Language), that consists of a temporal syntax analyzer, a temporal semantic analyzer, a temporal code generator, and a temporal interpreter. Algorithms for additional functions such as transaction time management, temporal aggregates, temporal views, temporal joins, and heuristic optimization are presented, together with examples of how they are processed.

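As a rough illustration of the temporal operations named above (temporal joins and temporal aggregates over valid time), here is a hedged sketch; the tuple layout and the nested-loop evaluation are illustrative assumptions, not the TQuel-based implementation described in the paper.

```python
from collections import namedtuple

Fact = namedtuple("Fact", "key value start end")  # valid-time interval [start, end)

def temporal_join(r, s):
    """Pair tuples from r and s that share a key and have overlapping valid time,
    producing the intersection of their intervals (a simple nested-loop sketch)."""
    out = []
    for a in r:
        for b in s:
            lo, hi = max(a.start, b.start), min(a.end, b.end)
            if a.key == b.key and lo < hi:
                out.append((a.key, a.value, b.value, lo, hi))
    return out

def temporal_count(facts, t):
    """Temporal aggregate: how many facts are valid at time instant t."""
    return sum(1 for f in facts if f.start <= t < f.end)

r = [Fact("emp1", "dept A", 0, 10), Fact("emp1", "dept B", 10, 20)]
s = [Fact("emp1", "project X", 5, 15)]
print(temporal_join(r, s))      # overlapping pieces of the two histories
print(temporal_count(r, 12))    # 1
```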

A Storage Scheme of Health Data Stream for Multidimensional Analysis (건강 스트림 데이터의 다차원적 분석을 위한 저장 구조)

  • Shin, Hea-Won; Lim, Yoon-Sun; Kim, Myung
    • Proceedings of the Korea Information Processing Society Conference / 2005.05a / pp.81-84 / 2005
  • With the advent of ubiquitous healthcare technology, it has become possible to collect patients' health-related data streams through sensor networks, detect risk situations, and continuously monitor health status. However, to find meaningful data effectively within such massive data streams, a multidimensional storage structure is needed that supports real-time updates and aggregate operations and that compresses data efficiently. The OLAP cube, a conventional storage structure for multidimensional analysis, is hard to update in real time, while DSMSs, the storage structures for stream data, do not readily support multidimensional analysis. In this study we analyze the characteristics of and queries over health stream data and derive the requirements of a storage structure suitable for such streams. We then propose a storage structure that supports incremental updates, allows large volumes of data to be compressed and purged along the time dimension, and can build analysis data in real time, and we show its efficiency.

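A speculative sketch of the kind of storage structure the abstract describes: aggregate cells that are updated incrementally per reading and folded into coarser time buckets as they age. All names, bucket sizes, and the count/sum measures are illustrative assumptions.

```python
from collections import defaultdict

class TimeBucketStore:
    """Fine-grained aggregate cells for recent readings; cells older than the
    horizon are folded into coarse time buckets (compression along the time dimension)."""
    def __init__(self, fine=1, coarse=60, horizon=3600):
        self.fine, self.coarse, self.horizon = fine, coarse, horizon
        self.recent = defaultdict(lambda: [0, 0.0])   # (patient, sensor, fine bucket)   -> [count, sum]
        self.history = defaultdict(lambda: [0, 0.0])  # (patient, sensor, coarse bucket) -> [count, sum]

    def insert(self, patient, sensor, ts, value):
        key = (patient, sensor, (ts // self.fine) * self.fine)
        self.recent[key][0] += 1                      # incremental update, no cube rebuild
        self.recent[key][1] += value

    def compress(self, now):
        """Move cells older than the horizon from the fine store into coarse buckets."""
        for (p, s, b) in [k for k in self.recent if k[2] < now - self.horizon]:
            cnt, tot = self.recent.pop((p, s, b))
            coarse_key = (p, s, (b // self.coarse) * self.coarse)
            self.history[coarse_key][0] += cnt
            self.history[coarse_key][1] += tot

    def avg(self, patient, sensor):
        """Example analysis query: average value over both the fine and coarse stores."""
        cells = [v for (p, s, _), v in list(self.recent.items()) + list(self.history.items())
                 if p == patient and s == sensor]
        cnt = sum(c[0] for c in cells)
        return sum(c[1] for c in cells) / cnt if cnt else None
```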

A Study on the Selective Materialization of Spatial Data Cube (공간 데이타 큐브의 선택적 실체화에 관한 연구)

  • 이기영
    • Journal of the Korea Society of Computer and Information / v.4 no.4 / pp.69-76 / 1999
  • Recently, methods have been studied for precomputing and materializing the results of complex spatial aggregation queries that have long response times and are frequently used in spatial data warehouses. In this paper, we propose an extended selective materialization algorithm that improves on existing selective materialization algorithms by taking into account both the access frequency of spatial views and the computation time of the spatial operations required by their spatial measures when deciding which views to materialize.

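A hedged sketch of selective materialization that weighs access frequency against the computation time of spatial operations, in the spirit of the abstract above; the benefit formula, field names, and example views are illustrative assumptions rather than the paper's algorithm.

```python
def select_views(views, budget):
    """Greedy selective materialization: pick views with the largest expected saving
    (access frequency x spatial-operation computation time) per unit of storage,
    until the space budget is exhausted.  `views` is a list of dicts with keys
    name, freq, cost, size."""
    chosen, used = [], 0
    for v in sorted(views, key=lambda v: v["freq"] * v["cost"] / v["size"], reverse=True):
        if used + v["size"] <= budget:
            chosen.append(v["name"])
            used += v["size"]
    return chosen

# Hypothetical spatial views: frequently asked, expensive spatial aggregations rank first.
views = [
    {"name": "region_area_by_city",  "freq": 120, "cost": 8.0,  "size": 40},
    {"name": "road_length_by_state", "freq": 15,  "cost": 2.0,  "size": 10},
    {"name": "parcel_union_by_zone", "freq": 60,  "cost": 20.0, "size": 80},
]
print(select_views(views, budget=100))
```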

A Bitmap Index for Multi-Dimensional Data Analysis (다차원 데이터 분석을 위한 비트맵 인덱스)

  • Im, Yoon-Sun; Park, Young-Sun; Kim, Myung
    • Proceedings of the Korean Information Science Society Conference / 2002.10c / pp.298-300 / 2002
  • Multidimensional OLAP (MOLAP) systems, which store multidimensional data in arrays, have the advantage of fast data access through positional information within the array. In practice, however, multidimensional data are usually sparse, so they are compressed when stored, and an index is needed to recover the original positional information when the data are retrieved. Various multidimensional indexes have been developed for table-structured data, but to cope flexibly with insertions and deletions they waste some index space and retrieval time. Considering that OLAP data are refreshed periodically and that the aggregate data needed for analysis are in practice created anew rather than updated incrementally, this study proposes an index structure for read-only MOLAP data. The data are partitioned into chunks and stored in compressed form, and each chunk is represented as bits that preserve its positional information and are stored in the index. The proposed bitmap index achieves a high compression ratio and is particularly efficient for the main OLAP operations, including range queries.

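The position-preserving bitmap idea can be sketched as follows (an illustration, not the paper's exact structure): each chunk keeps one bit per cell in linearized order, only non-empty cells are stored, and a cell's offset in the compressed value list is the number of set bits before its position.

```python
class ChunkBitmapIndex:
    """Read-only MOLAP chunk with a bitmap index: only non-empty cells are stored,
    and a cell's offset inside the compressed value list is recovered by counting
    the set bits that precede its position."""
    def __init__(self, dims):
        self.dims = dims                      # e.g. (4, 4) for a 4x4 chunk
        self.bits = 0                         # one bit per cell, row-major order
        self.values = []                      # dense list of non-empty cell values

    def _pos(self, coord):
        pos = 0
        for d, c in zip(self.dims, coord):    # row-major linearisation
            pos = pos * d + c
        return pos

    def load(self, sparse_cells):
        """Build from {coord: value}; values are stored in row-major (rank) order."""
        for coord in sorted(sparse_cells, key=self._pos):
            self.bits |= 1 << self._pos(coord)
            self.values.append(sparse_cells[coord])

    def get(self, coord, default=0):
        pos = self._pos(coord)
        if not (self.bits >> pos) & 1:
            return default
        # Rank: the number of set bits strictly before pos gives the value offset.
        return self.values[bin(self.bits & ((1 << pos) - 1)).count("1")]

chunk = ChunkBitmapIndex((4, 4))
chunk.load({(0, 1): 7, (2, 3): 5})
print(chunk.get((2, 3)), chunk.get((1, 1)))   # 5 0
```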

A Study on the Efficiency of Join Operation On Stream Data Using Sliding Windows (스트림 데이터에서 슬라이딩 윈도우를 사용한 조인 연산의 효율에 관한 연구)

  • Yang, Young-Hyoo
    • Journal of the Korea Society of Computer and Information / v.17 no.2 / pp.149-157 / 2012
  • This paper studies the problem of computing approximate answers to continuous sliding-window joins over data streams when the available memory may be insufficient to keep the entire join state. One approximation scenario is to provide a maximum subset of the result, with the objective of losing as few result tuples as possible. An alternative scenario is to provide a random sample of the join result, for example when the output of the join is being aggregated. It is shown formally that neither approximation can be addressed effectively for a sliding-window join over arbitrary input streams. Previous work has addressed only the maximum-subset problem and has implicitly assumed a frequency-based model of stream arrival, under which the sampling problem remains open. More importantly, a broad class of applications is identified for which an age-based model of stream arrival is more appropriate, and both approximation scenarios are addressed under this new model. Finally, for the case of multiple joins executed under an overall memory constraint, an algorithm is provided that allocates memory across the joins so as to optimize a combined measure of approximation over all the scenarios considered.
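
A minimal sketch of a sliding-window equi-join with load shedding under a per-stream memory budget; evicting the oldest tuple stands in for an age-based eviction policy and is an illustrative simplification of the models analyzed in the paper, not its algorithm.

```python
from collections import deque

class WindowedJoin:
    """Sliding-window equi-join sketch with a simple shedding rule: when the
    per-stream budget is exceeded, evict the oldest buffered tuple (an
    approximation that may lose some result tuples)."""
    def __init__(self, window, budget):
        self.window, self.budget = window, budget
        self.left, self.right = deque(), deque()   # (timestamp, key, value)

    def _expire(self, buf, now):
        while buf and buf[0][0] <= now - self.window:
            buf.popleft()                          # tuple slid out of the window

    def _shed(self, buf):
        if len(buf) > self.budget:
            buf.popleft()                          # load shedding under memory pressure

    def insert(self, side, ts, key, value):
        mine, other = (self.left, self.right) if side == "L" else (self.right, self.left)
        self._expire(other, ts)
        out = [(ts, key, value, v) for (_, k, v) in other if k == key]  # probe the other buffer
        mine.append((ts, key, value))
        self._shed(mine)
        return out

j = WindowedJoin(window=10, budget=3)
for t in range(6):
    j.insert("L", t, key=t % 2, value=f"l{t}")
print(j.insert("R", 6, key=0, value="r6"))   # [(6, 0, 'r6', 'l4')]
```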

Iceberg Query Evaluation Technique Using a Cuboid Prefix Tree (큐보이드 전위트리를 이용한 빙산질의 처리)

  • Han, Sang-Gil; Yang, Woo-Sock; Lee, Won-Suk
    • Journal of KIISE:Databases / v.36 no.3 / pp.226-234 / 2009
  • A data stream is a massive, unbounded sequence of data elements generated continuously at a rapid rate. Because of these characteristics, it is impossible to store all the elements of a data stream, so a new synopsis structure is needed to keep its summary information. For this purpose, this paper proposes a cuboid prefix tree that can be employed effectively in evaluating iceberg queries over data streams. A cuboid prefix tree stores only those itemsets composed of the grouping attributes used in GROUP BY queries. In addition, it can compute multiple iceberg queries simultaneously by sharing their common sub-expressions. A cuboid prefix tree evaluates an iceberg query over an infinitely generated data stream while keeping memory usage and processing time low, which is verified by a series of experiments.
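
A small sketch of a prefix tree over GROUP BY attributes with an iceberg threshold, assuming a fixed attribute order and simple counts; it illustrates the general idea rather than the paper's cuboid prefix tree.

```python
class CuboidPrefixTree:
    """Each stream element adds a count along the path of its grouping-attribute
    values; an iceberg query reports only groups whose count reaches the threshold."""
    def __init__(self, attrs):
        self.attrs = attrs                       # ordered grouping attributes, e.g. ["region", "item"]
        self.root = {"count": 0, "children": {}}

    def insert(self, record):
        node = self.root
        node["count"] += 1
        for a in self.attrs:                     # walk/extend the path for this record
            node = node["children"].setdefault(record[a], {"count": 0, "children": {}})
            node["count"] += 1

    def iceberg(self, threshold, node=None, prefix=()):
        """Yield (group, count) for every group whose support meets the threshold."""
        node = node or self.root
        for val, child in node["children"].items():
            if child["count"] >= threshold:
                yield prefix + (val,), child["count"]
                yield from self.iceberg(threshold, child, prefix + (val,))

tree = CuboidPrefixTree(["region", "item"])
for r in [{"region": "A", "item": "x"}, {"region": "A", "item": "x"}, {"region": "B", "item": "y"}]:
    tree.insert(r)
print(list(tree.iceberg(2)))    # [(('A',), 2), (('A', 'x'), 2)]
```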

Hilbert Cube for Spatio-Temporal Data Warehouses (시공간 데이타웨어하우스를 위한 힐버트큐브)

  • 최원익; 이석호
    • Journal of KIISE:Databases / v.30 no.5 / pp.451-463 / 2003
  • Recently, there have been various research efforts to develop strategies for accelerating OLAP operations on huge amounts of spatio-temporal data. Most of this work is based on multi-tree structures consisting of a single R-tree variant for the spatial dimension and numerous B-trees for the temporal dimension. Such multi-tree frameworks, however, are hardly applicable to spatio-temporal OLAP in practice, due mainly to high management cost and low query efficiency. To overcome these limitations, we propose a new approach called the Hilbert Cube (H-Cube), which employs fractals to impose a total order on cells and takes advantage of the traditional prefix-sum approach to improve query efficiency significantly. The H-Cube partitions the embedding space into a set of cells that are clustered on disk by Hilbert ordering, and then composes a cube by arranging the grid cells in chronological order. It refines cells adaptively to handle regional data skew, which may change location over time. The H-Cube is thus an adaptive, totally ordered, prefix-summed cube for spatio-temporal data warehouses; our approach focuses on indexing dynamic point objects over static spatial dimensions. Extensive performance studies showed that the H-Cube consumed at most 20% of the space required by multi-tree frameworks and achieved higher query performance than multi-tree structures.
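
The prefix-sum aspect of the H-Cube can be illustrated with a plain 2-D prefix-sum over grid cells, which answers a range aggregate from four corner values; this sketch uses row-major cell order for brevity, whereas the paper clusters cells by Hilbert ordering.

```python
def build_prefix_sum(grid):
    """2-D prefix-sum cube: p[i][j] = sum of grid[0..i][0..j]."""
    n, m = len(grid), len(grid[0])
    p = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            p[i][j] = (grid[i][j]
                       + (p[i - 1][j] if i else 0)
                       + (p[i][j - 1] if j else 0)
                       - (p[i - 1][j - 1] if i and j else 0))
    return p

def range_sum(p, i1, j1, i2, j2):
    """Aggregate over the cell rectangle [i1..i2] x [j1..j2] from four corner values."""
    total = p[i2][j2]
    if i1: total -= p[i1 - 1][j2]
    if j1: total -= p[i2][j1 - 1]
    if i1 and j1: total += p[i1 - 1][j1 - 1]
    return total

grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
p = build_prefix_sum(grid)
print(range_sum(p, 1, 1, 2, 2))   # 5 + 6 + 8 + 9 = 28
```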

Design of Spark SQL Based Framework for Advanced Analytics (Spark SQL 기반 고도 분석 지원 프레임워크 설계)

  • Chung, Jaehwa
    • KIPS Transactions on Software and Data Engineering / v.5 no.10 / pp.477-482 / 2016
  • As advanced analytics over big data becomes indispensable for agile decision-making and tactical planning in enterprises, distributed processing platforms such as Hadoop and Spark, which distribute and process large volumes of data across multiple nodes, are receiving great attention in the field. Within the Spark platform stack, Spark SQL was recently unveiled to give Spark a distributed processing framework based on SQL. However, Spark SQL cannot effectively handle advanced analytics involving machine learning and graph processing, particularly with respect to iterative tasks and task allocation. Motivated by these issues, this paper proposes the design of an SQL-based big data processing engine and a framework to support advanced analytics in Spark environments. The proposed engine copes with complex SQL queries involving multiple parameters and join, aggregation, and sorting operations in a distributed/parallel manner, and the framework optimizes the machine learning process in terms of relational operations.
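
For context, a minimal PySpark example of the kind of relational workload mentioned above (a join followed by aggregation and sorting, executed by Spark SQL in a distributed manner); the data and column names are illustrative, and a local Spark installation is assumed.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("advanced-analytics-sketch").getOrCreate()

sales = spark.createDataFrame(
    [(1, 100.0), (1, 250.0), (2, 80.0)], ["store_id", "amount"])
regions = spark.createDataFrame(
    [(1, "north"), (2, "south")], ["store_id", "region"])

# Join, aggregation and sorting are planned and executed by Spark SQL across the cluster.
(sales.join(regions, "store_id")
      .groupBy("region")
      .agg(F.sum("amount").alias("total"))
      .orderBy(F.desc("total"))
      .show())

spark.stop()
```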

An Efficient Search Space Generation Technique for Optimal Materialized Views Selection in Data Warehouse Environment (데이타 웨어하우스 환경에서 최적 실체뷰 구성을 위한 효율적인 탐색공간 생성 기법)

  • Lee Tae-Hee; Chang Jae-young; Lee Sang-goo
    • Journal of KIISE:Databases / v.31 no.6 / pp.585-595 / 2004
  • Query processing is a critical issue in data warehouse environments, since queries on data warehouses often involve hundreds of complex operations over large volumes of data. Data warehouses therefore build a large number of materialized views to increase system performance. Which views to materialize strongly affects both the view maintenance cost and the query performance. The goal of the materialized view selection problem is to select an optimal set of views that minimizes total query response time as well as the view maintenance cost. In this paper, we present an efficient solution to the materialized view selection problem. Although the optimal selection of materialized views is an NP-hard problem, we develop a feasible solution by exploiting the characteristics of relational operators such as join, selection, and grouping.
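
A hedged sketch of greedy materialized-view selection over a cube lattice, in the spirit of the classic benefit-based heuristic; the lattice, view sizes, and benefit formula are illustrative assumptions, not the algorithm proposed in the paper.

```python
def greedy_view_selection(lattice, sizes, k):
    """Repeatedly materialize the view that most reduces the total cost of answering
    every view from its cheapest materialized ancestor.  `lattice[v]` lists the views
    from which v can be computed (including v itself); the top view 'all' is always kept."""
    materialized = {"all"}

    def cost(v):
        return min(sizes[a] for a in lattice[v] if a in materialized)

    for _ in range(k):
        best = max((v for v in sizes if v not in materialized),
                   key=lambda v: sum(max(cost(w) - sizes[v], 0)
                                     for w in sizes if v in lattice[w]))
        materialized.add(best)
    return materialized

# A tiny hypothetical cube lattice: group-bys over (part, supplier, customer).
sizes = {"all": 100, "ps": 60, "pc": 80, "sc": 30, "p": 20, "s": 10, "c": 15}
lattice = {
    "all": ["all"], "ps": ["ps", "all"], "pc": ["pc", "all"], "sc": ["sc", "all"],
    "p": ["p", "ps", "pc", "all"], "s": ["s", "ps", "sc", "all"], "c": ["c", "pc", "sc", "all"],
}
print(greedy_view_selection(lattice, sizes, k=2))   # picks 'sc' first (largest benefit), then 'ps'
```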