• Title/Summary/Keyword: kd-tree traversal

Performance Analysis and Enhancing Techniques of Kd-Tree Traversal Methods on GPU (GPU용 Kd-트리 탐색 방법의 성능 분석 및 향상 기법)

  • Chang, Byung-Joon; Ihm, In-Sung
    • Journal of KIISE: Computing Practices and Letters / v.16 no.2 / pp.177-185 / 2010
  • Ray-object intersection is an important element of ray tracing and takes up a substantial share of the computing time. In general, a spatial data structure such as the kd-tree is frequently used for static scenes to accelerate the intersection computation. Recently, a few variants of kd-tree traversal have been proposed for the GPU, whose computing architecture is relatively restricted compared to the CPU. In this article, we propose two further implementation techniques that improve on those earlier ones. First, we present a cached stack method that aims to reduce the costly global memory access time incurred when the stack is allocated in global memory. Second, we present a rope-with-short-stack method that eases the substantial memory requirement often imposed by the previous rope method. To show the effectiveness of our techniques, we compare their performance with that of the previous GPU traversal methods. The experimental results provide prospective GPU ray tracer developers with valuable information for choosing a suitable kd-tree traversal method.

Analysis of GPU-based Parallel Shifted Sort Algorithm by comparing with General GPU-based Tree Traversal (일반적인 GPU 트리 탐색과의 비교실험을 통한 GPU 기반 병렬 Shifted Sort 알고리즘 분석)

  • Kim, Heesu; Park, Taejung
    • Journal of Digital Contents Society / v.18 no.6 / pp.1151-1156 / 2017
  • Traversing tree data structures on the GPU commonly yields lower performance than expected. In this paper, we analyze the reason for this lower-than-expected performance and show that warp divergence is caused by the branch instructions ("if ... else") that appear commonly in tree traversal CUDA code. We also compare the parallel shifted sort algorithm, which can reduce the number of warp divergences, with a kd-tree CUDA implementation and show that the shifted sort algorithm can run faster than the kd-tree CUDA implementation thanks to fewer warp divergences. In our analysis, the shifted sort algorithm ran about 16 times faster than the kd-tree CUDA implementation for $2^{23}$ query points and $2^{23}$ data points in $\mathbb{R}^3$. The performance gap tends to grow in proportion to the number of query points and data points.

An efficient acceleration algorithm of GPU ray tracing using CUDA (CUDA를 이용한 효과적인 GPU 광선추적 가속 알고리즘)

  • Ji, Joong-Hyun; Yun, Dong-Ho; Ko, Kwang-Hee
    • HCI Society of Korea: Conference Proceedings (한국HCI학회:학술대회논문집) / 2009.02a / pp.469-474 / 2009
  • This paper proposes a real-time ray tracing system that uses an optimized kd-tree traversal environment and ray/triangle intersection algorithm. Previous kd-tree traversal algorithms search for the upper nodes in a bottom-up manner. With that approach, after failing to find an intersected primitive in a leaf node, we must either revisit the already visited parent node or use redundant memory. Ray tracing for relatively complex scenes thus becomes more difficult. The new algorithm uses stacks implemented in the GPU's local memory under the CUDA framework, which eliminates the problems of the previous algorithms. After traversing a node, we perform the latest CPU-based ray/triangle intersection algorithm, the Plücker coordinate test, which is further accelerated by running massively in parallel with CUDA. The Plücker test can drastically reduce the computational cost because it does not use barycentric coordinates, only a simple test based on the relations between a ray and the triangle edges. The entire system consists of a single-ray kernel and is implemented simply, without introducing complicated synchronization or ray packets. Our experiments show that the new algorithm is roughly twice as fast as the previous ones.
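
The edge-based test mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' CUDA kernel: the function names (`plucker`, `side`, `ray_hits_triangle`) are hypothetical, and the final check that the hit lies in front of the ray origin is an assumption added here so that the sketch tests a ray rather than an infinite line. The core idea is that a line crosses a triangle when its permuted inner product with all three (consistently oriented) edge lines has the same sign.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def plucker(p, d):
    """Plücker coordinates (direction, moment) of the line through p along d."""
    return d, cross(p, d)

def side(l1, l2):
    """Permuted inner product; its sign says on which side l1 passes l2."""
    (u1, v1), (u2, v2) = l1, l2
    return dot(u1, v2) + dot(u2, v1)

def ray_hits_triangle(origin, direction, a, b, c):
    ray = plucker(origin, direction)
    # One side value per triangle edge, edges oriented a->b, b->c, c->a.
    s0 = side(ray, plucker(a, sub(b, a)))
    s1 = side(ray, plucker(b, sub(c, b)))
    s2 = side(ray, plucker(c, sub(a, c)))
    all_nonneg = s0 >= 0 and s1 >= 0 and s2 >= 0
    all_nonpos = s0 <= 0 and s1 <= 0 and s2 <= 0
    if not (all_nonneg or all_nonpos):
        return False          # the line passes outside at least one edge
    if s0 == 0 and s1 == 0 and s2 == 0:
        return False          # line lies in the triangle's plane: treated as a miss
    # Assumption added for this sketch: reject hits behind the ray origin.
    # s0 + s1 + s2 equals dot(normal, direction), so it is nonzero here.
    normal = cross(sub(b, a), sub(c, a))
    t = dot(normal, sub(a, origin)) / (s0 + s1 + s2)
    return t >= 0
```

For example, with the triangle `(0,0,0)`, `(1,0,0)`, `(0,1,0)`, a ray starting at `(0.25, 0.25, -1)` and pointing along `+z` is reported as a hit, while the same ray started at `(2, 2, -1)` is reported as a miss.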