Introduction to numba library in Python for efficient statistical computing

Cho, Younsang;Yu, Donghyeon;Son, Won;Park, Seoncheol;

doi:10.5351/KJAS.2020.36.6.665

응용통계연구 (The Korean Journal of Applied Statistics)

Volume 33, Issue 6 / Pages 665-682 / 2020 / 1225-066X (pISSN) / 2383-5818 (eISSN)

한국통계학회 (The Korean Statistical Society)

Cho, Younsang (Department of Statistics, Inha University) ;
Yu, Donghyeon (Department of Statistics, Inha University) ;
Son, Won (Department of Information Statistics, Dankook University) ;
Park, Seoncheol (Pacific Climate Impacts Consortium, University of Victoria)

Received: 2020.09.22
Reviewed: 2020.10.16
Published: 2020.12.31

https://doi.org/10.5351/KJAS.2020.36.6.665

Abstract

This paper introduces the numba library for Python, which speeds up computations written in pure Python by applying just-in-time (JIT) compilation. To enable JIT compilation, the user only needs to add a numba decorator to the target Python function. As applications to statistical computing problems that require loops, we provide numba implementations of the permutation test and of the EM algorithm for parameter estimation in Gaussian mixture distributions. For each application, we demonstrate the efficiency of numba numerically by comparing the total computation time of the pure Python implementation with that of the numba implementation.
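As a rough illustration of the decorator-based workflow the abstract describes, the sketch below applies numba's `@njit` decorator to a loop-based two-sample permutation test. It is not the authors' code: the function name `perm_test_mean_diff`, the absolute mean-difference statistic, and the add-one p-value are assumptions chosen for illustration, and the `try`/`except` fallback exists only so the sketch still runs (uncompiled) where numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # compiles the decorated function on its first call
except ImportError:
    # Assumption for portability: fall back to plain Python if numba is absent.
    def njit(func):
        return func


@njit
def perm_test_mean_diff(x, y, n_perm):
    """Two-sided permutation test for a difference in means (illustrative)."""
    n_x = x.shape[0]
    pooled = np.concatenate((x, y))
    observed = abs(x.mean() - y.mean())
    count = 0
    # Explicit loop: slow in pure Python, compiled to machine code under numba.
    for _ in range(n_perm):
        perm = np.random.permutation(pooled)
        stat = abs(perm[:n_x].mean() - perm[n_x:].mean())
        if stat >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (count + 1) / (n_perm + 1)


# Hypothetical data: two samples whose means differ by 0.5.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 50)
y = rng.normal(0.5, 1.0, 50)
p_value = perm_test_mean_diff(x, y, 2000)
print(p_value)
```

The first call includes compilation time, so a timing comparison against a pure Python version would normally time a second call. Note also that permutations drawn inside a compiled function use numba's own random state, which the NumPy generator `rng` above does not seed.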
