Parallel Computing Environment for R on Supercomputer Systems

Parallel Processing in R on Supercomputers for Big Data Analysis (Korean title: 빅데이터 분석을 위한 슈퍼컴퓨터 환경에서 R의 병렬처리)

  • Lee, Sang Yeol (Division of Industrial Management Engineering, Korea University)
  • Won, Joong Ho (Department of Statistics, Seoul National University)
  • Received : 2014.09.12
  • Accepted : 2014.10.13
  • Published : 2014.11.30

Abstract

We study parallel processing techniques for the R programming language in a high-performance computing environment. The study used a massively parallel computing system with 25,408 CPU cores. We evaluated the performance of a distributed-memory system using MPI and of a shared-memory system using OpenMP. Our findings are summarized as follows. First, for some particular algorithms, parallel processing is about 150 times faster than serial processing in R. Second, the distributed-memory system gets faster as the number of nodes increases, whereas the performance gain of the shared-memory system is limited by the number of CPUs available in a single system.
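
The second finding can be illustrated with a simple speedup model. The sketch below assumes Amdahl's law, which the abstract does not itself invoke, and uses a hypothetical parallel fraction and hypothetical worker counts rather than measurements from the study. It shows why a shared-memory run stops improving once the cores of one machine are exhausted, while a distributed-memory run can keep adding workers until the serial part of the program dominates.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup over a serial run under Amdahl's law.

    parallel_fraction: share of the serial runtime that parallelizes perfectly.
    workers: number of cores (shared memory) or processes (distributed memory).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical values for illustration only; none come from the paper.
PARALLEL_FRACTION = 0.999   # assumed: 99.9% of the work parallelizes
CORES_PER_NODE = 16         # assumed cap for a single shared-memory node

for workers in (CORES_PER_NODE, 256, 4096):
    speedup = amdahl_speedup(PARALLEL_FRACTION, workers)
    print(f"{workers:5d} workers -> speedup {speedup:.1f}x")
```

Under this model the speedup can never exceed 1 / (1 - parallel_fraction), so a speedup of about 150 times would require well over 99% of the runtime to be parallelizable. Real measurements also include communication and scheduling overhead, which the model ignores.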
