A Distributed Vertex Rearrangement Algorithm for Compressing and Mining Big Graphs

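  • Namyong Park (Department of Computer Science and Engineering, Seoul National University) ;
  • Chiwan Park (Department of Computer Science and Engineering, Seoul National University) ;
  • U Kang (Department of Computer Science and Engineering, Seoul National University)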
  • Received : 2016.05.09
  • Accepted : 2016.07.12
  • Published : 2016.10.15

Abstract

How can we effectively compress big graphs composed of billions of edges? Rearranging vertices so that the non-zeros of the adjacency matrix are concentrated lets us compress big graphs more efficiently, and it also speeds up several graph mining algorithms such as PageRank. SlashBurn is a state-of-the-art vertex rearrangement method that works well on real-world graphs by exploiting their power-law characteristic. However, the original SlashBurn algorithm is designed to run on a single machine: it slows down noticeably on large-scale graphs and cannot be used at all when a graph is too large to fit on one machine. In this paper, we propose a distributed SlashBurn algorithm to overcome these limitations. By performing the large-scale vertex rearrangement process in a distributed fashion, distributed SlashBurn processes big graphs much faster than the original algorithm and scales to larger inputs. In our experiments on real-world big graphs, distributed SlashBurn ran more than 45 times faster than its single-machine counterpart and processed graphs 16 times larger than the original method could handle.
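To make the idea of vertex rearrangement concrete, the following is a minimal single-machine sketch in Python of the hub-removal scheme described in the original SlashBurn paper: repeatedly remove the `k` highest-degree vertices (hubs) from the current giant connected component and give them the lowest ids, while the small components that break off (spokes) receive the highest ids. It is not the distributed algorithm proposed in this paper. The function names, the tie-breaking by vertex id, the ordering of vertices inside a spoke, the toy graph, and the block size `b` are all assumptions made for illustration. The helper `nonempty_blocks` counts the `b` x `b` blocks of the adjacency matrix that contain at least one non-zero, a simple proxy for how concentrated the non-zeros are.

```python
from collections import defaultdict


def connected_components(nodes, adj):
    """Connected components of the subgraph induced by `nodes` (each sorted)."""
    nodes, seen, comps = set(nodes), set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(sorted(comp))
    return comps


def slashburn_order(edges, k=1):
    """Simplified SlashBurn: hubs get the lowest ids, spokes the highest."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    alive = set(adj)      # vertices of the current giant component
    front, back = [], []  # `back` is kept reversed: back[0] gets the last id
    while len(alive) > k:
        # Slash: remove the k highest-degree vertices (ties broken by id).
        deg = {u: sum(1 for v in adj[u] if v in alive) for u in alive}
        hubs = sorted(alive, key=lambda u: (-deg[u], u))[:k]
        front.extend(hubs)
        alive -= set(hubs)
        # Burn: keep the largest component, push the others to the back.
        comps = sorted(connected_components(alive, adj), key=len, reverse=True)
        for spoke in comps[1:]:
            back.extend(spoke)
        alive = set(comps[0])
    return front + sorted(alive) + back[::-1]


def nonempty_blocks(edges, order, b):
    """Number of b x b adjacency-matrix blocks holding at least one non-zero."""
    pos = {v: i for i, v in enumerate(order)}
    blocks = set()
    for u, v in edges:
        blocks.add((pos[u] // b, pos[v] // b))
        blocks.add((pos[v] // b, pos[u] // b))
    return len(blocks)


# Toy graph (hypothetical): hub 0, three 4-cliques with interleaved ids, leaf 13.
cliques = [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
edges = [(u, v) for c in cliques for i, u in enumerate(c) for v in c[i + 1:]]
edges += [(0, v) for v in (1, 4, 2, 5, 3, 6, 13)]

order = slashburn_order(edges, k=1)
before = nonempty_blocks(edges, list(range(14)), b=2)
after = nonempty_blocks(edges, order, b=2)
print(before, after)  # 37 20
```

On this toy graph the reordering reduces the number of non-empty 2 x 2 blocks from 37 to 20. On very small or already well-ordered graphs the count may not improve, because block boundaries dominate. The sketch recomputes degrees and components from scratch in every iteration, which is the kind of repeated whole-graph work that becomes expensive on large graphs.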

Acknowledgement

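Grant : Overall management of the core technology development project for advanced big data processing, and development of performance acceleration technology using high-performance computing

Supported by : Institute for Information & communications Technology Promotion (IITP)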
