Improving the Performance of Document Similarity by using GPU Parallelism

Park, Il-Nam; Bae, Byung-Gurl; Im, Eun-Jin; Kang, Seung-Shik

doi:10.3745/KIPSTB.2012.19B.4.243

The KIPS Transactions: Part B (정보처리학회논문지B), Volume 19B, Issue 4, pp. 243-248, 2012, pISSN 1598-284X

Korea Information Processing Society (한국정보처리학회)

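Il-Nam Park (Department of Computer Engineering, Kookmin University);
Byung-Gurl Bae (Department of Computer Engineering, Kookmin University);
Eun-Jin Im (School of Computer Engineering, Kookmin University);
Seung-Shik Kang (School of Computer Engineering, Kookmin University)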

Received : 2012.02.27
Accepted : 2012.04.07
Published : 2012.08.31

Abstract

In information retrieval tasks such as vector-model retrieval and document clustering, the speed of document similarity calculation increasingly dominates overall system performance as the number of input documents grows. This paper proposes computing document similarity on the GPU instead of the CPU, implementing the calculation as a parallel procedure in the CUDA framework. The proposed method computes similarities up to about 15 times faster than a typical CPU-based implementation. It is also 5.2 and 3.4 times faster than implementations built on the existing CUDA libraries CUBLAS and Thrust, respectively.
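
The abstract does not state which similarity measure or data layout the authors use. As a rough illustration only, the sketch below assumes cosine similarity over bag-of-words term-frequency vectors, the usual choice in the vector space model. The function names (`term_vector`, `cosine_similarity`, `pairwise_similarity`) and the sample documents are hypothetical, and the code runs sequentially on the CPU; it is not the paper's CUDA implementation.

```python
import math
from collections import Counter
from itertools import combinations


def term_vector(text):
    """Bag-of-words term-frequency vector for one document (raw counts)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def pairwise_similarity(docs):
    """Similarity for every unordered document pair (i, j) with i < j.

    No pair depends on any other pair, which is the property a GPU
    version could exploit by giving each pair its own thread or block.
    """
    vectors = [term_vector(d) for d in docs]
    return {
        (i, j): cosine_similarity(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }


# Hypothetical sample documents, for illustration only.
docs = [
    "gpu parallel computing with cuda",
    "cuda gpu parallel programming",
    "document clustering in information retrieval",
]
sims = pairwise_similarity(docs)
```

The number of pairs grows quadratically with the number of documents, which is consistent with the abstract's point that similarity calculation becomes the bottleneck as the collection grows.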
