[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3745/KTSDE.2014.3.11.479

Efficient Computation of Data Cubes Using MapReduce

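Lee, Ki Yong (Division of Computer Science, Sookmyung Women's University)
Park, Sojeong (Division of Computer Science, Sookmyung Women's University)
Park, Eunju (Division of Computer Science, Sookmyung Women's University)
Park, Jinkyung (Division of Computer Science, Sookmyung Women's University)
Choi, Yeunjung (Division of Computer Science, Sookmyung Women's University)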

Publication Information

KIPS Transactions on Software and Data Engineering, v.3, no.11, 2014, pp. 479-486

Abstract

MapReduce is a programming model for processing large amounts of data in parallel. The data cube, an operator that computes group-bys for all possible combinations of a given set of dimension attributes, is widely used to analyze large amounts of data. When the number of dimension attributes is $n$, the data cube computes $2^n$ group-bys. In this paper, we propose an efficient method for computing data cubes using MapReduce. The proposed method partitions the $2^n$ group-bys into ${}_nC_{\lceil n/2 \rceil}$ batches and computes those batches in stages using $\lceil n/2 \rceil$ MapReduce jobs. Compared to existing methods, the proposed method significantly reduces the amount of intermediate data generated by mappers, which in turn reduces the cost of sorting and transferring that data. Consequently, the total processing time for computing a data cube is reduced. Through experiments, we show that the proposed method is more efficient than existing methods.
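To make the quantities in the abstract concrete, the following is a minimal single-machine sketch in Python, not the paper's MapReduce algorithm. It enumerates all $2^n$ group-bys naively (one full pass over the data per group-by) and evaluates the batch and job counts stated in the abstract. The function names, the sample rows, and the choice of SUM as the aggregate are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations
from math import ceil, comb


def all_group_bys(dimensions):
    """Return every subset of the dimension attributes (2^n group-bys)."""
    return [
        subset
        for size in range(len(dimensions) + 1)
        for subset in combinations(dimensions, size)
    ]


def naive_cube(rows, dimensions, measure):
    """Compute SUM(measure) for each group-by, one full pass per group-by."""
    cube = {}
    for group_by in all_group_bys(dimensions):
        totals = defaultdict(int)
        for row in rows:
            key = tuple(row[d] for d in group_by)
            totals[key] += row[measure]
        cube[group_by] = dict(totals)
    return cube


def batch_and_job_counts(n):
    """Counts stated in the abstract: C(n, ceil(n/2)) batches, ceil(n/2) jobs."""
    half = ceil(n / 2)
    return comb(n, half), half


# Hypothetical sample data with n = 3 dimension attributes.
rows = [
    {"region": "east", "product": "pen", "year": 2013, "sales": 10},
    {"region": "east", "product": "ink", "year": 2014, "sales": 5},
    {"region": "west", "product": "pen", "year": 2014, "sales": 7},
]
dimensions = ("region", "product", "year")
cube = naive_cube(rows, dimensions, "sales")

print(len(cube))                 # 8 group-bys for n = 3
print(cube[()])                  # {(): 22}
print(cube[("region",)])         # {('east',): 15, ('west',): 7}
print(batch_and_job_counts(3))   # (3, 2)
```

The naive version rescans the input once per group-by, so its cost grows with $2^n$. That repeated work over the same data is the kind of overhead a batched, staged computation aims to reduce.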

Keywords

Data Cube; MapReduce; Big Data; Query Processing; OLAP

