Cloud P2P OLAP: Query Processing Method and Index structure for Peer-to-Peer OLAP on Cloud Computing

Cloud P2P OLAP: 클라우드 컴퓨팅 환경에서의 Peer-to-Peer OLAP 질의처리기법 및 인덱스 구조

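  • Kil-Hong Joo (주길홍), Department of Computer Education, Gyeongin National University of Education
  • Hun-Dong Kim (김훈동), Technology Research Institute, Willbe Solution Co., Ltd. ((주)윌비솔루션)
  • Won-Suk Lee (이원석), Department of Computer Science, Yonsei University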
  • Received : 2011.03.03
  • Accepted : 2011.06.01
  • Published : 2011.08.31

Abstract

Recent studies on distributed OLAP focus mainly on DHT-based P2P OLAP and Grid OLAP. Both approaches have weaknesses in a cloud computing environment. P2P OLAP handles multidimensional range queries poorly because of the nature of structured P2P, while Grid OLAP does not consider adjacency or time series and concentrates instead on its own subset lookup algorithm. To overcome these limits, this paper proposes an efficient centrally managed P2P approach for a cloud computing environment. Combining a multi-level hybrid P2P method with an index load distribution scheme improves the performance of multidimensional range queries. The proposed scheme allows one user's OLAP query results to be reused in other users' volatile cube searches. To this end, the paper examines the combination of an aggregation cube hierarchy tree, a quad-tree, and an interval tree as an efficient index structure. As a result, the proposed Cloud P2P OLAP scheme can manage the adjacency and time-series factors of an OLAP query. The performance of the proposed scheme is analyzed through a series of experiments to identify its various characteristics.

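Recent research on distributed OLAP has actively pursued DHT-based P2P OLAP and Grid OLAP in order to apply OLAP to distributed environments. For a cloud computing environment, however, P2P OLAP has problems with multidimensional range queries because of the characteristics of structured P2P, and Grid OLAP, which takes no account of adjacency or time series, concentrates on subset lookup algorithms for the query itself. To provide an environment suited to cloud computing, this paper therefore proposes a centrally managed P2P scheme, focusing on two observations: a user's query results can, owing to their time-series characteristics, be reused by many users, and volatile query cubes on the server are efficient when analytical queries are run directly against them in the user's local memory. It further proposes Cloud P2P OLAP, which combines a multi-level hybrid P2P scheme for fast query results and multidimensional range queries with a cloud system for index load distribution and performance improvement. The index structure combines a cube topological-relationship tree and a two-dimensional adjacency quadtree with a time-series interval tree, and it showed considerably higher efficiency than conventional OLAP for both lookups and updates.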
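
The abstract gives no implementation details, so the following is only a minimal sketch of how a two-dimensional quadtree over cached query regions might be paired with a time-validity check so that one user's result cube can be found and reused by another user's query. All names (`CachedCube`, `QuadNode`, `find_covering`, `MAX_DEPTH`) are hypothetical. The time check is a plain linear filter standing in for the interval tree, and the aggregation cube hierarchy tree is omitted entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
MAX_DEPTH = 4  # assumed depth limit for this sketch


def contains(outer: Rect, inner: Rect) -> bool:
    """True if `inner` lies entirely within `outer` (borders included)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


@dataclass
class CachedCube:
    peer: str               # hypothetical id of the peer holding the result
    region: Rect            # range covered on two dimensions
    valid: Tuple[int, int]  # inclusive [start, end] validity period


@dataclass
class QuadNode:
    bounds: Rect
    depth: int = 0
    entries: List[CachedCube] = field(default_factory=list)
    children: Optional[List["QuadNode"]] = None

    def _split(self) -> None:
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        quads = [(x0, y0, mx, my), (mx, y0, x1, my),
                 (x0, my, mx, y1), (mx, my, x1, y1)]
        self.children = [QuadNode(q, self.depth + 1) for q in quads]

    def insert(self, cube: CachedCube) -> None:
        """Store the cube at the deepest node whose bounds fully contain it."""
        if self.depth < MAX_DEPTH:
            if self.children is None:
                self._split()
            for child in self.children:
                if contains(child.bounds, cube.region):
                    child.insert(cube)
                    return
        self.entries.append(cube)

    def find_covering(self, query: Rect, t: int) -> List[CachedCube]:
        """Cached cubes that cover `query` and are valid at time `t`."""
        hits = [c for c in self.entries
                if contains(c.region, query) and c.valid[0] <= t <= c.valid[1]]
        # A covering cube can only live on a path whose bounds contain the query.
        for child in self.children or []:
            if contains(child.bounds, query):
                hits += child.find_covering(query, t)
        return hits


root = QuadNode((0, 0, 100, 100))
root.insert(CachedCube("p1", (0, 0, 40, 40), (1, 10)))
root.insert(CachedCube("p2", (10, 10, 20, 20), (5, 20)))
root.insert(CachedCube("p3", (0, 0, 100, 100), (1, 3)))
print([c.peer for c in root.find_covering((12, 12, 18, 18), 6)])
```

In this sketch a lookup only descends into quadrants that fully contain the query range, so cached results for unrelated regions are never inspected. A real interval tree in place of the linear time filter would matter once many cached cubes accumulate at the same node.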
