• Title/Summary/Keyword: Bitmap Join Index

Search Result 5, Processing Time 0.018 seconds

A Data Mining Approach for Selecting Bitmap Join Indices

  • Bellatreche, Ladjel;Missaoui, Rokia;Necir, Hamid;Drias, Habiba
    • Journal of Computing Science and Engineering
    • /
    • v.1 no.2
    • /
    • pp.177-194
    • /
    • 2007
  • Index selection is one of the most important decisions to take in the physical design of relational data warehouses. Indices reduce significantly the cost of processing complex OLAP queries, but require storage cost and induce maintenance overhead. Two main types of indices are available: mono-attribute indices (e.g., B-tree, bitmap, hash, etc.) and multi-attribute indices (join indices, bitmap join indices). To optimize star join queries characterized by joins between a large fact table and multiple dimension tables and selections on dimension tables, bitmap join indices are well adapted. They require less storage cost due to their binary representation. However, selecting these indices is a difficult task due to the exponential number of candidate attributes to be indexed. Most of approaches for index selection follow two main steps: (1) pruning the search space (i.e., reducing the number of candidate attributes) and (2) selecting indices using the pruned search space. In this paper, we first propose a data mining driven approach to prune the search space of bitmap join index selection problem. As opposed to an existing our technique that only uses frequency of attributes in queries as a pruning metric, our technique uses not only frequencies, but also other parameters such as the size of dimension tables involved in the indexing process, size of each dimension tuple, and page size on disk. We then define a greedy algorithm to select bitmap join indices that minimize processing cost and verify storage constraint. Finally, in order to evaluate the efficiency of our approach, we compare it with some existing techniques.

A Study on Selecting Bitmap Join Index to Speed up Complex Queries in Relational Data Warehouses (관계형 데이터 웨어하우스의 복잡한 질의의 처리 효율 향상을 위한 비트맵 조인 인덱스 선택에 관한 연구)

  • An, Hyoung-Geun;Koh, Jae-Jin
    • The KIPS Transactions:PartD
    • /
    • v.19D no.1
    • /
    • pp.1-14
    • /
    • 2012
  • As the size of the data warehouse is large, the selection of indices on the data warehouse affects the efficiency of the query processing of the data warehouse. Indices induce the lower query processing cost, but they occupy the large storage areas and induce the index maintenance cost which are accompanied by database updates. The bitmap join indices are well applied when we optimize the star join queries which join a fact table and many dimension tables and the selection on dimension tables in data warehouses. Though the bitmap join indices with the binary representations induce the lower storage cost, the task to select the indexing attributes among the huge candidate attributes which are generated is difficult. The processes of index selection are to reduce the number of candidate attributes to be indexed and then select the indexing attributes. In this paper on bitmap join index selection problem we reduce the number of candidate attributes by the data mining techniques. Compared to the existing techniques which reduce the number of candidate attributes by the frequencies of attributes we consider the frequencies of attributes and the size of dimension tables and the size of the tuples of the dimension tables and the page size of disk. We use the mining of the frequent itemsets as mining techniques and reduce the great number of candidate attributes. We make the bitmap join indices which have the least costs and the least storage area adapted to storage constraints by using the cost functions applied to the bitmap join indices of the candidate attributes. We compare the existing techniques and ours and analyze them in order to evaluate the efficiencies of ours.

Improving Join Performance for SPARQL Query Processing in the Clouds (클라우드에서 SPARQL 질의 처리를 위한 조인 성능 향상)

  • Choi, Gyu-Jin;Son, Yun-Hee;Lee, Kyu-Chul
    • Journal of KIISE
    • /
    • v.43 no.6
    • /
    • pp.700-709
    • /
    • 2016
  • Recently, with the rapid growth of LOD (Linked Open Data) existing methods based on a single machine have limitation in performance. Existing solutions use distributed framework such as Mapreduce in order to improve the performance. However, the MapReduce framework for processing SPARQL queries involves multiple MapReduce jobs and additional costs incurred. In addition, the problem of unnecessary data processing arises. In this study, we proposed a method to reduce the number of MapReduce jobs during SPARQL query processing and join indexes based on Bitmap for minimizing the costs of processing unnecessary data.

Efficient Processin of Queries with Joints and Aggregate Functions in ROLAP Data Warehousing Environment (관계형 OLAP 데이터 웨어하우징 환경에서 조인과 집계함수를 포함하는 질의의 효율적인 처리)

  • Kim, Jin-Ho;Kim, Yun-Ho;Kim, Sang-Wook
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.39 no.5
    • /
    • pp.1-10
    • /
    • 2002
  • Efficient processing of expensive queries that include joins and/or aggregate functions is crucial in data warehousing environment since there reside enormous volume of data. In this paper, we propose a new method for processing of queries that have both of joins and aggregate functions. The proposed method first performs grouping of the dimension table and then processes join by using the bitmap join index. This makes only the fact table accessed for processing aggregate functions, and thus resolves the serious performance degradation of the existing method. For showing the superiority of the proposed method, we suggest the cost models for the proposed and existing ones, and perform extensive simulations based on the TPC-H benchmark.

A New Method for Processing Queries in Data Warehouse Environment (데이터 웨어하우징 환경에서 질의 처리를 위한 새로운 기법)

  • 김윤호;김진호;감상욱
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2001.04b
    • /
    • pp.121-123
    • /
    • 2001
  • 대용량의 데이터가 저장되는 데이터 웨어하우징 환경에서는 조인이나 집계 함수와 같은 고비용의 연산의 효율적인 처리는 매우 중요하다. 본 논문에서는 집계 함수(aggregate function)와 조인이 모두 포함된 질의를 처리하는 새로운 기법을 제안한다. 제안하는 기법은 먼저 차원 테이블(dimension table)을 미리 그룹핑한 후, 비트맵 조인 인덱스(bitmap join index)를 이용하여 조인을 처리하는 방식을 사용한다. 이 결과, 사실 테이블만을 접근하여 집계 함수를 처리함으로써 기존 기법이 가지는 성능 저하의 문제점을 해결할 수 있다. 기존 기법과 제안하는 기법에 대한 비용 모델(cost model)을 정립하고, 이를 기반으로 시뮬레이션을 수행함으로써 제안된 기법의 우수성을 규명한다.

  • PDF