Search | Korea Science

Iceberg Cube Parallel Computation using MapReduce (맵리듀스를 이용한 빙산 큐브 병렬 계산)

Lee, Su-An;Kim, Jin-Ho;Moon, Yang-Sae;Loh, Woong-Kee
- Proceedings of the Korean Information Science Society Conference / 2010.06a / pp.25-26 / 2010
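Data cubes have been studied for the efficient analysis of large volumes of data, and the iceberg cube, which computes only part of the cube, emerged to address the high cost of data cube computation. The iceberg cube offers advantages such as reduced storage space and focused analysis, but it still requires considerable computation and storage. As a practical solution to this problem, this paper proposes MR-Naive and MR-BUC, distributed parallel iceberg cube algorithms built on the MapReduce framework, a distributed parallel computing technology that processes large problems by distributing them. Experiments confirm that iceberg cube computation on the MapReduce framework is distributed and parallelized efficiently.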
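The iceberg condition described in this abstract can be illustrated with a minimal single-machine sketch. It assumes COUNT as the aggregate and a hypothetical `min_support` threshold; the function name, the sample rows, and the threshold are illustrative only, and this is not the MR-Naive or MR-BUC algorithm from the paper.

```python
from collections import Counter
from itertools import combinations

def iceberg_cube(rows, dims, min_support):
    """Return {(group-by dims, values): count} for cells with count >= min_support."""
    result = {}
    for k in range(len(dims) + 1):
        # Visit every subset of dimensions, i.e. every group-by of the full cube.
        for subset in combinations(range(len(dims)), k):
            counts = Counter(tuple(row[i] for i in subset) for row in rows)
            names = tuple(dims[i] for i in subset)
            for values, count in counts.items():
                if count >= min_support:  # keep only cells above the threshold
                    result[(names, values)] = count
    return result

# Hypothetical sample data: (city, product)
rows = [
    ("seoul", "tv"),
    ("seoul", "tv"),
    ("seoul", "radio"),
    ("busan", "tv"),
]
cells = iceberg_cube(rows, ("city", "product"), min_support=2)
```

This version still scans every group-by in full and only filters the output, which is the cost that pruning-based and distributed approaches aim to reduce.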

Range-based Cube Partitioning for Reducing I/O Cost in Cube Computation (큐브 계산에서 I/O 비용을 줄이는 구간 기반 큐브 분할)

Park, Woong-Je;Chung, Yon-Dohn;Kim, Jin-Nyoung;Lee, Yoon-Joon;Kim, Myoung-Ho
- Journal of KIISE:Databases / v.28 no.4 / pp.596-605 / 2001
In this paper, we propose a method, called the range-based cube partitioning (RCP) method, for reducing the I/O cost of cube computation in OLAP. The method improves the I/O performance of the cube partitioning process by overlapping some computation between partitioning stages. To overlap the computation, the method partitions the cube based on ranges of attribute values rather than on individual attribute values. Through analysis and experiments, we compare the performance of the proposed method with that of the previous cube partitioning method.

An Efficient Incremental Maintenance Method for Data Cubes in Data Warehouses (데이타 웨어하우스에서 데이타 큐브를 위한 효율적인 점진적 관리 기법)

Lee, Ki-Yong;Park, Chang-Sup;Kim, Myoung-Ho
- Journal of KIISE:Databases / v.33 no.2 / pp.175-187 / 2006
The data cube is an aggregation operator that computes group-bys for all possible combinations of dimension attributes. When the number of dimension attributes is n, a data cube computes $2^n$ group-bys. Each group-by in a data cube is called a cuboid. Data cubes are often precomputed and stored as materialized views in data warehouses. These data cubes need to be updated when the source relations change. The incremental maintenance of a data cube computes and propagates only its changes. To compute the change of a data cube of $2^n$ cuboids, previous works compute a delta cube that has the same number of cuboids as the original data cube. Each cuboid in a delta cube is called a delta cuboid. Thus, as the number of dimension attributes increases, the cost of computing a delta cube increases significantly. In this paper, we propose an incremental cube maintenance method that can maintain a data cube by using only ${}_nC_{\lceil n/2 \rceil}$ delta cuboids. As a result, the cost of computing a delta cube is substantially reduced. Through various experiments, we show the performance advantages of our method over previous methods.
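The two counts quoted in this abstract can be checked with a short sketch. It only enumerates the $2^n$ cuboids and evaluates the binomial coefficient ${}_nC_{\lceil n/2 \rceil}$; it does not implement the maintenance method itself. The function names and the dimension labels `A` to `D` are assumptions made for illustration.

```python
from itertools import combinations
from math import ceil, comb

def cuboids(dims):
    """List every cuboid (subset of dimension attributes) of a data cube."""
    return [c for k in range(len(dims) + 1) for c in combinations(dims, k)]

def delta_cuboid_count(n):
    """Number of delta cuboids stated in the abstract: C(n, ceil(n/2))."""
    return comb(n, ceil(n / 2))

dims = ("A", "B", "C", "D")            # hypothetical dimension attributes
all_cuboids = cuboids(dims)            # 2^4 cuboids, from () up to (A, B, C, D)
n_delta = delta_cuboid_count(len(dims))
```

For n = 4 this gives 16 cuboids against 6 delta cuboids; for n = 10 the counts are 1024 and 252, so the gap widens as dimensions are added.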

An Iterative Algorithm for the Bottom Up Computation of the Data Cube using MapReduce (맵리듀스를 이용한 데이터 큐브의 상향식 계산을 위한 반복적 알고리즘)

Lee, Suan;Jo, Sunhwa;Kim, Jinho
- Journal of Information Technology and Architecture / v.9 no.4 / pp.455-464 / 2012
Due to the recent data explosion, methods that can meet the requirements of large-scale data analysis have been studied. This paper proposes the MRIterativeBUC algorithm, which enables efficient computation of large data cubes through distributed parallel processing with the MapReduce framework. MRIterativeBUC is designed to run the BUC method efficiently as an iterative operation on MapReduce, and it overcomes the storage size and processing capacity limitations caused by large data cube computation. It combines the idea of the iceberg cube, which computes only the parts of the cube that interest analysts, with distributed parallel cube computation by partitioning and sorting. It thereby reduces the amount of emitted data, which in turn reduces network overload, the processing load on each node, and ultimately the cube computation cost. The bottom-up cube computation and iterative algorithm using MapReduce proposed in this paper can be extended in various ways and applied to many applications.
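The bottom-up computation with iceberg pruning that this abstract builds on can be sketched on a single machine as follows. This is a simplified recursive BUC with COUNT as the aggregate, not the MRIterativeBUC algorithm; the function signature, the `"*"` placeholder for aggregated dimensions, and the sample rows are assumptions.

```python
from collections import defaultdict

def buc(rows, n_dims, min_support, start=0, cell=None, out=None):
    """Simplified bottom-up cube: record COUNT per cell, pruning small partitions."""
    if cell is None:
        cell = ["*"] * n_dims
    if out is None:
        out = {}
    # Record the aggregate for the current cell (the root cell is always recorded).
    out[tuple(cell)] = len(rows)
    for d in range(start, n_dims):
        # Partition the current rows on dimension d.
        partitions = defaultdict(list)
        for row in rows:
            partitions[row[d]].append(row)
        for value, part in partitions.items():
            # A partition below the threshold cannot contain any qualifying
            # finer cell, so the whole subtree is skipped.
            if len(part) >= min_support:
                cell[d] = value
                buc(part, n_dims, min_support, d + 1, cell, out)
        cell[d] = "*"
    return out

# Hypothetical sample data: (city, product)
rows = [
    ("seoul", "tv"),
    ("seoul", "tv"),
    ("seoul", "radio"),
    ("busan", "tv"),
]
cube = buc(rows, n_dims=2, min_support=2)
```

Because pruning happens before recursion, the `busan` partition is never expanded, which is where the savings over a filter-at-the-end approach come from.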

Efficient Computation of Data Cubes in MapReduce (맵리듀스에서 데이터 큐브의 효율적인 계산 기법)

Lee, Ki Yong;Park, Sojeong;Park, Eunju;Park, Jinkyung;Choi, Yeunjung
- Proceedings of the Korea Information Processing Society Conference / 2014.04a / pp.715-718 / 2014
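MapReduce is a programming model used for the parallel processing of large volumes of data. The data cube is an operator widely used for the multidimensional analysis of large data; it computes group-bys for all possible combinations of the given dimension attributes. When there are n dimension attributes, the data cube computes a total of $2^n$ group-bys. This paper proposes a method for efficiently computing data cubes in a MapReduce environment. The proposed method partitions the $2^n$ group-bys and computes them in stages through $\lceil n/2 \rceil$ MapReduce jobs. It greatly reduces the total computation cost by minimizing the size of the intermediate results emitted by the map function in each MapReduce job. Experiments show that the proposed method computes data cubes faster than existing methods.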
https://doi.org/10.3745/PKIPS.y2014m04a.715
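To see why the size of the map output matters, the following sketch simulates a naive single-job cube in plain Python, with no MapReduce library involved. It is a baseline for illustration, not the staged method proposed in the paper: every record is emitted once per group-by, so the mapper produces $2^n$ pairs per record. All function names and the sample records are assumptions.

```python
from collections import defaultdict
from itertools import combinations

def map_record(record, n_dims):
    """Naive mapper: emit one (key, 1) pair per group-by, i.e. 2^n pairs per record."""
    for k in range(n_dims + 1):
        for subset in combinations(range(n_dims), k):
            # Dimensions outside the subset are aggregated away and shown as "*".
            key = tuple(record[i] if i in subset else "*" for i in range(n_dims))
            yield key, 1

def run_naive_cube(records, n_dims):
    """Simulate map, shuffle, and reduce in memory; also count emitted pairs."""
    shuffled = defaultdict(list)
    emitted = 0
    for record in records:
        for key, value in map_record(record, n_dims):
            shuffled[key].append(value)  # shuffle: group values by key
            emitted += 1
    cube = {key: sum(values) for key, values in shuffled.items()}  # reduce: COUNT
    return cube, emitted

# Hypothetical sample data: (city, product)
records = [
    ("seoul", "tv"),
    ("seoul", "tv"),
    ("seoul", "radio"),
    ("busan", "tv"),
]
cube, emitted = run_naive_cube(records, n_dims=2)
```

Here 4 input records produce 16 intermediate pairs for only 8 output cells; with more dimensions the intermediate data grows as $2^n$ per record, which is the sorting and transfer cost the abstract refers to.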

Efficient Computation of Data Cubes Using MapReduce (맵리듀스를 사용한 데이터 큐브의 효율적인 계산 기법)

Lee, Ki Yong;Park, Sojeong;Park, Eunju;Park, Jinkyung;Choi, Yeunjung
- KIPS Transactions on Software and Data Engineering / v.3 no.11 / pp.479-486 / 2014
MapReduce is a programming model used for processing large amounts of data in parallel. To analyze a large amount of data, the data cube is widely used; it is an operator that computes group-bys for all possible combinations of the given dimension attributes. When the number of dimension attributes is n, the data cube computes $2^n$ group-bys. In this paper, we propose an efficient method for computing data cubes using MapReduce. The proposed method partitions the $2^n$ group-bys into ${}_nC_{\lceil n/2 \rceil}$ batches and computes those batches in stages using $\lceil n/2 \rceil$ MapReduce jobs. Compared to existing methods, the proposed method significantly reduces the amount of intermediate data generated by mappers, so the cost of sorting and transferring that intermediate data is reduced significantly. Consequently, the total processing time for computing a data cube is reduced. Through experiments, we show the efficiency of the proposed method over existing methods.
https://doi.org/10.3745/KTSDE.2014.3.11.479

Efficient Computation of Stream Cubes Using AVL Trees (AVL 트리를 사용한 효율적인 스트림 큐브 계산)

Kim, Ji-Hyun;Kim, Myung
- The KIPS Transactions:PartD / v.14D no.6 / pp.597-604 / 2007
Stream data is a continuous flow of information that mostly arrives in the form of an infinite, rapid stream. Recently, researchers have shown a great deal of interest in analyzing such data to obtain value-added information. Here, we propose an efficient cube computation algorithm for the multidimensional analysis of stream data. Because stream data arrives unsorted and aggregation results can only be obtained after the last data item has been read, cube computation requires a tremendous amount of memory. To resolve this difficulty, we compute only user-selected aggregation tables and use a combination of an array and AVL trees as temporary storage for the aggregation tables. The proposed cube computation algorithm works even when main memory is not large enough to store all the aggregation tables during the computation. Through theoretical analysis and performance evaluation, we show that the proposed algorithm is fast enough in practice.
https://doi.org/10.3745/KIPSTD.2007.14-D.6.597

Performance Evaluation of Front-End OLAP Cube Generation Algorithms on Relational DBMS (관계 DBMS 상에서 전위 방식의 OLAP 큐브 생성 알고리즘의 성능 평가)

Jo, Sun-Hwa;Kim, Jin-Ho;Moon, Yang-Sae
- Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.163-165 / 2005
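In ROLAP systems, a multidimensional OLAP cube is stored in a relational database using multiple aggregation tables, and implementation is simple because the functionality of the relational DBMS is used as-is. Because these aggregation tables are computed by sorting the large source data (i.e., the fact table) and then calculating aggregate values over it, generating the cube takes a long time. Several methods have been proposed to generate such multidimensional cubes efficiently. Since cube generation time is spent mainly on sorting the fact table, these methods have mainly proposed techniques that reduce the number of sorts. However, the performance of these cube generation algorithms has not been evaluated on an actual DBMS. In this study, we compare and evaluate the performance of existing cube generation algorithms on a relational DBMS.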

Computation and Comparison of Distances between Latin-Hypercube Designs (라틴-하이퍼큐브 실험계획 간의 거리 계산과 비교)

박정수;황현식
- The Korean Journal of Applied Statistics / v.13 no.2 / pp.477-488 / 2000
A distance measure between two Latin-hypercube designs is defined and its expected value is computed. The expected value is computed using mathematical statistics, numerical analysis (multidimensional numerical integration), the Monte Carlo method, and the theory of asymptotic normal distributions. To compare two Latin-hypercube designs with the same structure but different randomness, the difference of the expected values of the response function and the information mass of the experimental designs are considered. These methods may be useful for comparing two general experimental designs.

Sort-Based Distributed Parallel Data Cube Computation Algorithm using MapReduce (맵리듀스를 이용한 정렬 기반의 데이터 큐브 분산 병렬 계산 알고리즘)

Lee, Suan;Kim, Jinho
- Journal of the Institute of Electronics and Information Engineers / v.49 no.9 / pp.196-204 / 2012
Recently, many applications perform OLAP (On-Line Analytical Processing) over very large volumes of data. The multidimensional data cube is regarded as a core tool in OLAP analysis. This paper focuses on how to efficiently compute data cubes in parallel using a popular parallel processing tool, MapReduce. We investigate efficient ways to implement the PipeSort algorithm, a well-known data cube computation method, on the MapReduce framework. PipeSort computes several (descendant) cuboids that share the same sort order at the same time, as a pipeline, by scanning one (ancestor) cuboid once. This paper proposes four ways of implementing the PipeSort pipeline on the MapReduce framework, running across 20 servers. Our experiments show that the PipeMap-NoReduce algorithm outperforms the other algorithms for high-dimensional data. In contrast, Post-Pipe performs best for low-dimensional data.
https://doi.org/10.5573/ieek.2012.49.9.196
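The pipelining idea described in this abstract, computing several cuboids with a shared sort order from one scan, can be sketched on a single machine. The sketch assumes COUNT as the aggregate and rows already sorted on all dimensions; it is not one of the four MapReduce variants evaluated in the paper, and the function name and sample rows are hypothetical.

```python
def pipeline_scan(sorted_rows, n_dims):
    """Compute COUNT for every prefix cuboid ((), A, AB, ...) in one pass.

    Rows must be sorted on all dimensions so that each prefix group is contiguous.
    """
    results = [dict() for _ in range(n_dims + 1)]  # results[k]: cuboid on first k dims
    current = [None] * (n_dims + 1)                # currently open group key per level
    counts = [0] * (n_dims + 1)                    # running count per level
    for row in sorted_rows:
        for k in range(n_dims + 1):
            key = tuple(row[:k])
            if key != current[k]:
                # The group at level k has ended: write it out and open a new one.
                if current[k] is not None:
                    results[k][current[k]] = counts[k]
                current[k] = key
                counts[k] = 0
            counts[k] += 1
    # Flush the groups still open after the last row.
    for k in range(n_dims + 1):
        if current[k] is not None:
            results[k][current[k]] = counts[k]
    return results

# Hypothetical sample data: (city, product), sorted on both dimensions
rows = sorted([
    ("seoul", "tv"),
    ("seoul", "tv"),
    ("seoul", "radio"),
    ("busan", "tv"),
])
total, by_city, by_city_product = pipeline_scan(rows, n_dims=2)
```

Only cuboids whose dimensions form a prefix of the sort order can share a scan this way; other cuboids, such as the one on product alone, need a different sort order and therefore a separate pipeline.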

