Parallelism point selection in nested parallelism situations with focus on the bandwidth selection problem

Cho, Gayoung; Noh, Hohsuk

doi:10.5351/KJAS.2018.31.3.383

The Korean Journal of Applied Statistics (응용통계연구)

Volume 31, Issue 3 / Pages 383-396 / 2018 / 1225-066X (pISSN) / 2383-5818 (eISSN)

The Korean Statistical Society (한국통계학회)

평활량 선택문제 측면에서 본 중첩병렬화 상황에서 병렬처리 포인트선택

Cho, Gayoung (Department of Statistics, Sookmyung Women's University) ;
Noh, Hohsuk (Department of Statistics, The Research Institute of Natural Sciences, Sookmyung Women's University)

조가영 (숙명여자대학교 통계학과) ;
노호석 (숙명여자대학교 통계학과, 자연과학연구소)

Received : 2018.03.29
Accepted : 2018.05.14
Published : 2018.06.30

https://doi.org/10.5351/KJAS.2018.31.3.383

Abstract

Various R packages for parallel processing are used to speed up the processing and analysis of big data. Parallel processing applies when the work can be decomposed into tasks that do not depend on one another. In some cases, each of those tasks can itself be decomposed into independent subtasks. In such nested parallelism situations, we have to choose whether to parallelize the tasks produced at the first level or the subtasks produced at the second level. This choice has a significant impact on computation speed; consequently, it is important to understand the nature of the work and decide where to apply parallel processing. In this paper, we illustrate how to select a parallelism point for bandwidth selection in nonparametric regression, as a guide to applying parallel computing effectively to one's own problems.

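With the arrival of the big data era, various R-based parallel processing packages are being used as one way to process and analyze data quickly. Parallel processing is used when the work to be done can be decomposed into tasks that do not depend on one another, and in some cases each of the tasks decomposed for parallel processing can in turn be decomposed into subtasks that do not depend on one another. In such nested parallelism situations, one generally chooses whether to parallelize the tasks decomposed at the first step or the subtasks produced at the second step. Because that choice often has a considerable effect on computation speed, it is important to decide carefully where to parallelize according to the circumstances of the work at hand. In this paper, as one attempt to aid understanding of this parallelism point selection problem and to offer ideas to those who want to apply parallel computing effectively to their own problems, we present the process of selecting a parallelism point for efficient computation in a concrete statistical problem: bandwidth selection in nonparametric function estimation.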
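
The nested structure described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than R and is not the authors' code. Several things here are assumptions made for illustration only: the Nadaraya-Watson estimator with a Gaussian kernel, k-fold cross-validation as the selection criterion, and the hypothetical names `nw_predict`, `fold_error`, `cv_score`, and `select_bandwidth`. The outer level has one task per candidate bandwidth; the inner level has one subtask per cross-validation fold. The `point` argument chooses which level is handed to the worker pool.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def nw_predict(x0, xs, ys, h):
    """Nadaraya-Watson estimate at x0 (Gaussian kernel, bandwidth h)."""
    weights = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:  # every weight underflowed; fall back to the mean
        return sum(ys) / len(ys)
    return sum(w * y for w, y in zip(weights, ys)) / total


def fold_error(h, fold, xs, ys, k):
    """Inner subtask: squared error on one held-out fold for bandwidth h."""
    train_x = [x for i, x in enumerate(xs) if i % k != fold]
    train_y = [y for i, y in enumerate(ys) if i % k != fold]
    return sum(
        (ys[i] - nw_predict(xs[i], train_x, train_y, h)) ** 2
        for i in range(len(xs))
        if i % k == fold
    )


def cv_score(h, xs, ys, k, pool=None):
    """Outer task: k-fold CV score for one bandwidth.

    If a pool is given, the folds (inner level) are run on it;
    otherwise they are evaluated sequentially.
    """
    mapper = pool.map if pool is not None else map
    errors = mapper(lambda f: fold_error(h, f, xs, ys, k), range(k))
    return sum(errors) / len(xs)


def select_bandwidth(hs, xs, ys, k=5, point="outer", workers=4):
    """Return (best bandwidth, CV scores), parallelizing at one level only."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        if point == "outer":
            # Parallelism point 1: one worker task per candidate bandwidth.
            scores = list(pool.map(lambda h: cv_score(h, xs, ys, k), hs))
        else:
            # Parallelism point 2: bandwidths in sequence, folds in parallel.
            scores = [cv_score(h, xs, ys, k, pool=pool) for h in hs]
    best = min(range(len(hs)), key=scores.__getitem__)
    return hs[best], scores
```

Both settings of `point` compute the same scores; only the scheduling differs. The sketch uses a thread pool so that it runs anywhere without extra setup, but pure-Python CPU-bound work does not speed up under threads. Observing the timing difference between the two parallelism points would require a process pool or a similar mechanism, and the result would depend on the number of bandwidths, folds, and workers.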
