Fast Ray Reordering and Approximate Sibson Interpolation for Foveated Rendering on GPU

  • Kwon, Oh-Seok (Dept. of IT Convergence and Application Engineering, Graduate School, Pukyong National University) ;
  • Park, Keon-kuk (Dept. of IT Convergence and Application Engineering, Graduate School, Pukyong National University) ;
  • Yoon, Joseph (Dept. of IT Convergence and Application Engineering, Graduate School, Pukyong National University) ;
  • Kim, Young-Bong (Dept. of IT Convergence and Application Engineering, Graduate School, Pukyong National University)
  • Received : 2019.01.24
  • Accepted : 2019.02.12
  • Published : 2019.02.28

Abstract

Virtual reality applications on Head-Mounted Displays require high frame rates and low-latency rendering techniques. Ray tracing offers many benefits, such as high-quality image generation, but it has not been widely utilized because its performance is lower than that of rasterization. It can, however, produce good results when combined with gaze-tracking technology and the operating principles of the human visual system. In this paper, we propose a method that optimizes the foveated sampling map and maintains visual quality through fast nearest-neighbor interpolation on a Voronoi structure. The proposed method further reduces the computational cost beyond the improvement already achieved by previous foveated sampling. It also smooths the Voronoi boundaries using an approximate Sibson interpolation, which has not previously been possible in real time. As a result, the proposed method renders high-quality images in real time with little perceptible visual difference.

1. INTRODUCTION

The development of Head-Mounted Display (HMD) devices with a wide field of view (FOV) and high resolution has increased opportunities for Virtual Reality (VR) experiences. VR applications require high frame rates and low latency to avoid the dizziness that the VR experience can induce [1]. Rasterization-based rendering technology was developed to address this problem, but in recent years foveated rendering, which combines gaze tracking with the workings of the human visual system, has emerged. Foveated rendering techniques operating at such adaptive resolutions have recently attracted much research because they reduce computational cost and power consumption. They also significantly reduce the rendering cost of physically-based simulations that require high performance, such as that of Yoon et al. [2].

Until now, rendering for HMDs has mainly relied on rasterization-based methods because of their performance. Ray tracing, on the other hand, has the advantage of generating very realistic, high-quality images [3], but it has not been widely adopted because its computational cost is high compared to rasterization. However, Koskela et al. [4] showed that about 94% of samples can be omitted. Furthermore, perception-driven rendering methods [5] that exploit the inherent properties of the human visual system can reduce computational cost by reducing the complexity of projected image pixels while maintaining a high-quality visual experience. On this basis, adaptive sampling combined with gaze tracking, in other words foveated rendering, can be used to make expensive rendering techniques such as ray tracing practical.

Foveated rendering is closely related to the human visual system. The viewing angle of a healthy adult extends 150° horizontally and 135° vertically per eye, and within an 18° region lies the fovea, which is densely packed with photoreceptors [6]. Cone cells are located at high density in the foveal region, and their density decreases rapidly away from the center. The peripheral region, by contrast, offers low resolution due to its low cone density, but the rod cells that perceive brightness are dense there [7]. The rod cells in the peripheral area are stimulated by light of various intensities and are responsible for perceiving the size, shape, and brightness of the visual image; because of their sensitivity to brightness, they react strongly to small flickers. Based on these anatomical facts, if the flickering problem is solved, a roughly rendered peripheral region can still be perceived as natural [8]. An additional caution is that if the peripheral area is excessively smoothed, a tunnel-vision effect occurs [9]. Most previous approaches have been presented to solve this problem. Unfortunately, while they improved performance using adaptive sampling, they did not consider the execution structure of the GPU, so despite the performance improvements from adaptive sampling, there is still potential for further optimization. The GPU generally handles threads in units called warps (groups of threads), and computation cost can be reduced further when warps are taken into account [10].

Therefore, we propose fast ray reordering and approximate Sibson interpolation to improve the performance of foveated rendering on the GPU. These make the computational advantage of rendering at a gaze-driven adaptive resolution more effective. Adaptive sampling reorders the scattered ray-tracing threads so that they are optimized in warp units. In addition, based on the fact that cells in nature take polygonal shapes, we fill the empty parts of the sparse image with a Voronoi structure and propose a very fast nearest-neighbor interpolation of its boundaries to reduce artifacts in the peripheral area. Our method also reduces the tunnel-vision effect caused by blurring of the peripheral area through random foveated sampling and reprojection. The proposed method makes it possible to use high-cost rendering methods in real time.

This paper is organized as follows. Section 2 discusses related work, and Section 3 describes the proposed method in detail. The experimental results and conclusions are presented in Sections 4 and 5, respectively.

 

2. RELATED WORK

In this section, we provide an overview of the previous studies related to our system.

Fujita et al. [11] use a pre-computed sampling pattern to reduce the computational cost of ray tracing and reconstruct images from sparse samples through kNN filtering over neighboring cells. The pre-computed sampling pattern appears to preserve the performance of kNN filtering on the Voronoi structure. Consequently, whenever additional pixels are sampled by perturbing the sampling pattern, the Voronoi structure must be rebuilt before k-NN interpolation can be performed again, which is expensive [12].

Weier et al. [1] address the silhouette artifacts caused by sparse sampling by improving reconstruction quality through resampling. In their system, geometric silhouette artifacts arising from sparse sampling are resolved by adaptive sampling. However, when a scene contains many silhouettes, the computational cost increases. In addition, they did not maximize the savings from sparse sampling because they did not consider optimizing the warp, which ends up consisting of threads with different computational costs.

Patney et al. [9] approached this problem by focusing on maintaining visual quality in human peripheral vision. On a perceptual basis, they confirmed that simple Gaussian filtering induces the tunnel-vision effect. To solve this, they applied adaptive perceptual filtering, which pre-filters shading properties and then undersamples the remaining samples. However, filtering shading attributes that are difficult to predict, such as indirect illumination, is expensive. Paradoxically, if the peripheral area contains many objects whose visual quality must be maintained, the potential sampling savings largely disappear.

 

3. OVERVIEW

We propose a rendering pipeline as shown in Fig. 1. The pipeline takes the gaze position and the pupil width as input and consists of six parts: Geometry, Sampling, Optimization, Shading, Reconstruction, and Post-Processing.

 


Fig. 1. Overview of our rendering pipeline. The geometry step generates a G-Buffer (position, normal, and depth) at full resolution. In the sampling step, the sampling map is generated from a gaze-derived sampling probability, which is then used for sparse ray tracing and history resampling. In the ray optimization step, the sampling map is compressed to improve GPU performance. The shading step generates a sparse image by executing the threads classified as ray tracing or history resampling. The reconstruction step efficiently fills the empty parts of the image by nearest-neighbor interpolation on the Voronoi structure. Finally, the post-processing step smooths the artifacts that arise from stochastic sampling and Voronoi boundaries.

 

Our goal is to mimic the biological mechanisms of the human visual system, shading only visually significant pixels to reduce visual error and rendering time. To achieve this, we optimize the non-uniformly distributed shading threads produced by adaptive sampling and perform real-time interpolation on the Voronoi structure. The core of our approach is to reorder the regions sampled through the probability function for efficient processing on the GPU, and to interpolate them to remove the artifacts caused by Voronoi boundaries. Each step is described in detail in the following sections.

 

3.1 Sampling Step

In the sampling step, a sampling map and a reprojection coordinate map are generated for use in the shading stage. First, a sampling map is created based on the gaze position from eye tracking. The probability function of foveated sampling for ray generation is given in Equation (1). This formula is based on the fact that the resolution at which humans perceive objects is determined by the density of cone cells [13]. The sampling is based on the gaze position, the pupil width, and a user-defined coefficient.

The probability for sampling is given by

 

\(\mathrm{P}\left(d_{i}\right)=\frac{1}{w\left(\theta, \sigma_{p}\right) d_{i}^{2}+1}\)       (1)

 

where di is the normalized distance of each pixel from the gaze position, and the user adjusts the probability through the weight function w(θ, σp).

 

\(\mathrm{w}\left(\theta, \sigma_{p}\right)=\frac{\frac{1}{\sigma_{p}}-1}{\theta^{2}}\)       (2)

 

In the weight function w(θ, σp), θ and σp denote the pupil width and the user probability-adjustment coefficient, respectively, where σp takes a value between 0 and 1.
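To make the sampling step concrete, the following is a minimal NumPy sketch of Equations (1) and (2). The function and parameter names (sampling_map, gaze, theta, sigma_p) are our own illustration; the paper specifies only the formulas, and thresholding a random value against P(di) is how we read the stochastic sampling.

```python
import numpy as np

def sampling_map(width, height, gaze, theta, sigma_p, rng=None):
    """Return a boolean map: True = trace a ray, False = resample history."""
    rng = rng if rng is not None else np.random.default_rng()
    ys, xs = np.mgrid[0:height, 0:width]
    # Normalized distance d_i of every pixel from the gaze position.
    d = np.hypot(xs - gaze[0], ys - gaze[1]) / np.hypot(width, height)
    w = (1.0 / sigma_p - 1.0) / theta ** 2        # Eq. (2)
    p = 1.0 / (w * d ** 2 + 1.0)                  # Eq. (1)
    return rng.random((height, width)) < p        # stochastic selection

# Example with the gaze fixed at the center and the paper's parameters.
mask = sampling_map(1024, 1024, gaze=(512, 512), theta=0.2, sigma_p=0.8)
print(f"{mask.mean():.1%} of pixels selected for ray tracing")
```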

As shown in Fig. 2, the probability produced by σp and θ follows the same shape as the density distribution of cone cells in the human visual system.

 


Fig. 2. An example of the sampling probability P(di), based on the human visual system, with probability coefficient σp = 0.8 and pupil width θ = 0.2.

 

In addition, since our reconstruction procedure is structured by the Voronoi diagram, artifacts may occur at the boundaries of objects. Therefore, edge detection on the depth buffer forces ray tracing at object boundaries. The sampling buffer (Ts) is a uint3 texture consisting of the pixel coordinates and a flag indicating whether to perform ray tracing. More specifically, Ts.x and Ts.y are the coordinates to which the pixel will be written, and Ts.z indicates whether ray tracing will be performed (Ts.z = 0: resampling, Ts.z = 1: ray tracing).

We also reuse information from previous frames, because foveated sampling alone leaves gaps; we use the reverse reprojection method for this purpose [14]. We compute the pixel coordinates projected onto the previous frame using the MVP (model, view, projection) matrix of the previous frame (t-1) applied to the current world-space position (t). A cache miss is detected through comparison with the depth buffer of frame t-1. We also account for the possibility that the history cache holds no information due to foveated sampling. The reprojection coordinate buffer (Tr) is stored as float3, where Tr.x and Tr.y are the reprojection coordinates and Tr.z marks a cache miss or hit (Tr.z = 0: cache miss, Tr.z = 1: cache hit).
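As an illustration of the reverse reprojection test [14], the sketch below maps a current-frame world-space position through the previous frame's MVP matrix and compares depths to classify a cache hit or miss. The buffer layout, the depth tolerance eps, and the column-vector convention are our assumptions, not details given in the paper.

```python
import numpy as np

def reproject(world_pos, prev_mvp, prev_depth, eps=1e-3):
    """Map a world-space point to previous-frame pixel coordinates (Tr).

    Returns (u, v, hit): hit = 1.0 on a cache hit, 0.0 on a cache miss."""
    h, w = prev_depth.shape
    clip = prev_mvp @ np.append(world_pos, 1.0)   # into previous clip space
    ndc = clip[:3] / clip[3]                      # perspective divide
    u = int((ndc[0] * 0.5 + 0.5) * w)             # NDC -> pixel coordinates
    v = int((ndc[1] * 0.5 + 0.5) * h)
    if not (0 <= u < w and 0 <= v < h):
        return 0.0, 0.0, 0.0                      # off-screen: cache miss
    # The depth test against frame t-1 rejects disoccluded pixels.
    hit = 1.0 if abs(prev_depth[v, u] - ndc[2]) < eps else 0.0
    return float(u), float(v), hit
```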

 

3.2 GPU Optimization

The main advantage of our approach over using an unordered sampling map is that the sampling map is reordered to work effectively in warp units. The foveated sampling map can be described as a sparse matrix. We define two types of threads: ray tracing and resampling. The resampling threads simply fetch the reprojected value from the history buffer, so they have a significantly shorter working time than ray tracing (assuming that all ray-tracing threads take the same working time). The ideal situation is therefore to group threads with the same working time.

Our strategy is as follows:

 

Strategy 1. For the first pixel (u: 0, v: v) of each row in the sparse matrix S, do:

  • If S.z = 1, push it to the left side of the reordered buffer R and increase R(u: 0, v: v).z by 1.
  • If S.z = 0, push it to the right side of the reordered buffer R.

 

Strategy 2. For the first pixel (u: 0, v: v + block.height) of each row, at block.height intervals in the sparse matrix, do:

  • Find the min and max of R.z among the first pixels of every row in the block.
  • Swap the last pixel with R.z = 1 between the min and max rows.
  • Increase or decrease the R.z of the min and max rows by 1.
  • Repeat until the difference between the min and max R.z is 1.

 

In the sampling map, ray tracing is marked as a red rectangle (Ts.z = 1) and history resampling as a white rectangle (Ts.z = 0). As shown in Fig. 3, the sparse matrix of Fig. 3(a) is reordered like a Compressed Sparse Row (CSR) layout using Strategy 1, as shown in Fig. 3(b). Each row is executed independently, so the compression runs in as many parallel GPU threads as the image height. Strategy 2 improves on this by compressing further with the CUDA block height in mind (work is usually dispatched in CUDA block units). The performance of the proposed method is evaluated in Section 4.
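A minimal sketch of Strategy 1 is shown below: each image row is compacted independently, pushing ray-tracing threads to the left and resampling threads to the right, mirroring the CSR-like reordering of Fig. 3(b). On the GPU one thread handles one row; the sequential NumPy loop is only for clarity, and the function name is our own.

```python
import numpy as np

def reorder_rows(sample_map):
    """sample_map: (H, W) array of Ts.z flags (1 = ray trace, 0 = resample).

    Returns the reordered buffer R of (x, y, z) entries and the per-row
    ray counts, i.e. the R(u: 0, v: v).z counters of Strategy 1."""
    h, w = sample_map.shape
    R = np.zeros((h, w, 3), dtype=np.uint32)
    counts = np.zeros(h, dtype=np.uint32)
    for v in range(h):                       # one GPU thread per row
        left, right = 0, w - 1
        for u in range(w):
            if sample_map[v, u] == 1:        # ray-tracing thread: push left
                R[v, left] = (u, v, 1)
                left += 1
            else:                            # resampling thread: push right
                R[v, right] = (u, v, 0)
                right -= 1
        counts[v] = left                     # number of rays in this row
    return R, counts
```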

 


Fig. 3. (b) and (c) are the results of compressing (a), which is composed of ray-tracing and resampling threads. The first pixel of each row is visited n times to reorder the ray-tracing threads (red) in sequence; compression is then performed once more in warp units (32 threads).

 

3.3 Sparse Shading

Our shading step works in two stages: ray tracing and history resampling. Because the optimization step scatters the coordinates of the shading buffer, we use Ts.x and Ts.y from the reordered sampling map instead of FragCoord, and the stage is selected by Ts.z. The two stages are summarized in Fig. 4.

 


Fig. 4. The ray-tracing threads (red) of the reordered sampling map (a) perform ray tracing; the history-resampling threads (white) reference the reprojection coordinate map and then store the value of the history buffer in the shading buffer (b).

 

All procedures are based on the reordered sampling map. The two stages in Fig. 4 proceed as follows:

  • Fetch the reordered sampling buffer at (u: u, v: v) and store it in R (uint3).
  • Fetch the reprojection coordinate buffer at (u: R.x, v: R.y) and store it in W (float3).
  • Fetch the history buffer at (u: W.x, v: W.y) and store it in k1 (float4).
  • If R.z = 1, generate a ray at (R.x, R.y), trace it to obtain k2 (float4), and save (k1 + k2) in the shading buffer at (u: R.x, v: R.y).
  • If R.z = 0, save k1 in the shading buffer at (u: R.x, v: R.y).

 

Through the above process, the noise caused by stochastic sampling can be reduced by repeatedly ray tracing the temporally same pixel. Moreover, if the user keeps the gaze still, the resulting image gradually improves under random sampling and therefore converges to full ray tracing.
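The following sketch summarizes the two-way dispatch over the reordered buffer described above. trace_ray is a placeholder for the actual ray tracer, and the buffer layouts follow the Ts/Tr conventions defined earlier; everything else is our illustrative assumption.

```python
import numpy as np

def shade(R, reproj, history, trace_ray):
    """R: (H, W, 3) reordered sampling buffer; reproj: (H, W, 3) Tr buffer;
    history: (H, W, 4) previous shading buffer. Returns the shading buffer."""
    h, w, _ = R.shape
    out = np.zeros((h, w, 4), dtype=np.float32)
    for v in range(h):
        for u in range(w):
            x, y, z = R[v, u]
            wx, wy, hit = reproj[y, x]
            # k1: history value, or zero on a cache miss (Tr.z = 0).
            k1 = history[int(wy), int(wx)] if hit > 0 else np.zeros(4)
            if z == 1:                       # ray-tracing thread
                k2 = trace_ray(x, y)         # new float4 sample
                out[y, x] = k1 + k2          # temporal accumulation
            else:                            # history-resampling thread
                out[y, x] = k1
    return out
```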

 

3.4 Reconstruction

In the previous steps, the result of ray tracing and history caching is still sparse. Of course, if the history buffer accumulates over time it produces a dense image, but each sparse frame still has empty parts that must be filled for rendering. We first regard each sparsely filled pixel as a photoreceptor cell. In nature, cells take polygonal shapes that can be summarized by the Voronoi structure [15]. We therefore fill the empty area with a Voronoi structure, using the shaded pixels as sites. There are many ways to generate Voronoi diagrams, but because our shaded pixels converge temporally toward full resolution, we need an algorithm whose cost is independent of the number of sites. We therefore generate a dense Voronoi-structured image in real time using the Jump Flooding Algorithm (JFA) [16]. Regardless of the number of sites, it always generates the Voronoi diagram at the same frame rate, and it is far more effective for nearest-neighbor interpolation than constructing a kd-tree, because the closest site to every raster pixel can be found in O(1).
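For reference, a compact grid-based version of JFA [16] is sketched below: over O(log n) passes, each pixel adopts a neighbor's recorded site whenever that site is closer, so the cost is independent of the number of sites. The GPU implementation runs one thread per pixel with ping-pong buffers; this sequential NumPy version is only meant to show the propagation logic.

```python
import numpy as np

def jump_flood(sites, h, w):
    """sites: list of (y, x) shaded pixels. Returns an (H, W, 2) map of
    the nearest site's coordinates for every raster pixel."""
    INF = np.iinfo(np.int32).max
    near = np.full((h, w, 2), INF, dtype=np.int64)
    for (y, x) in sites:
        near[y, x] = (y, x)
    ys, xs = np.mgrid[0:h, 0:w]
    step = max(h, w) // 2
    while step >= 1:
        for dy in (-step, 0, step):          # the 9 jump neighbors
            for dx in (-step, 0, step):
                qy = np.clip(ys + dy, 0, h - 1)
                qx = np.clip(xs + dx, 0, w - 1)
                cand = near[qy, qx]          # the neighbor's recorded site
                valid = cand[..., 0] != INF
                d_new = (cand[..., 0] - ys) ** 2 + (cand[..., 1] - xs) ** 2
                d_old = (near[..., 0] - ys) ** 2 + (near[..., 1] - xs) ** 2
                better = valid & (d_new < d_old)
                near[better] = cand[better]  # adopt the closer site
        step //= 2
    return near
```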

Dense images produced with JFA exhibit artifacts at the Voronoi boundaries. Moreover, since the pixels filled during reconstruction come from the closest site, they are not accurate ray-traced values. We therefore perform nearest-neighbor interpolation to smooth these artifacts. The ideal interpolation for Voronoi structures is natural-neighbor interpolation such as Sibson's method, but even Discrete Sibson Interpolation, a very efficient version of Sibson's interpolation, takes about 1000 ms [11]. Instead, our proposed nearest-neighbor interpolation works in real time, as shown in Fig. 5.

 


Fig. 5. For each raster position p, the distance to its nearest site s1 determines the search window (blue box), and the value of p is determined from all pixels q within radius r.

 

This approach proceeds as follows:

  • Input the JFA's ping-pong buffers (T1: color buffer, T2: coordinate buffer).
  • For every output raster position p of T1:

→ Find the closest site Sn to the raster position p in T2.

→ Calculate r = d(p, Sn).

→ Calculate the search window S with bbox(r).

→ For every raster position q in the search window: if d(q, p) < r, add c(q) to the accumulated color c(p) and increment n(p) by 1.

  • For every output raster position p, set f(p) = c(p) / n(p).

 

Here d is the distance function, c is the accumulated color value stored as float4, n is the count, and f is the final interpolated value. Our method is faster than Discrete Sibson Interpolation because the size of the search window is bounded. As the number of sites increases, the search window shrinks and the result finally converges to the reference image, improving both computational cost and visual quality. However, because it is not a perfect natural-neighbor interpolation, the intermediate results do not converge to those of natural-neighbor interpolation; the visual quality of the results is verified in the experiments of Section 4.
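The sketch below implements the window-based interpolation listed above: for each pixel p, the distance r to its nearest site bounds a search window, and all pixels q with d(q, p) < r contribute their JFA-filled colors to the average. Always counting the pixel itself (so the count never reaches zero at site pixels) is our addition.

```python
import numpy as np

def smooth_voronoi(color, near):
    """color: (H, W, 4) JFA-filled buffer; near: (H, W, 2) nearest-site map
    from jump_flood. Returns the boundary-smoothed image."""
    h, w, _ = color.shape
    out = np.zeros_like(color)
    for py in range(h):
        for px in range(w):
            sy, sx = near[py, px]
            r = np.hypot(float(py - sy), float(px - sx))
            ri = max(int(np.ceil(r)), 1)     # half-size of the search window
            acc, n = np.zeros(4), 0
            for qy in range(max(py - ri, 0), min(py + ri + 1, h)):
                for qx in range(max(px - ri, 0), min(px + ri + 1, w)):
                    # Accumulate q if it lies within radius r of p; the
                    # pixel itself always counts so n never reaches zero.
                    if (qy, qx) == (py, px) or np.hypot(qy - py, qx - px) < r:
                        acc += color[qy, qx]
                        n += 1
            out[py, px] = acc / n            # f(p) = c(p) / n(p)
    return out
```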

 

3.5 Post-processing

First, tone mapping is performed to compensate for the luminance of the shading buffer. Although we reduced artifacts during the shading and reconstruction processing, stochastic sampling and reprojection errors can still cause convergence mismatches, which appear as temporary flicker. To reduce this noise, a variety of previously presented filters can be applied. Since the G-Buffer has already been generated in the geometry step, we apply the basic À-Trous filtering method as a post-processing filter to take advantage of it [17].
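As an illustration of the post-processing filter, here is a minimal single-pass À-Trous sketch in the spirit of [17]: a 5-tap B3-spline kernel dilated by step, with an edge-stopping weight driven by the depth G-Buffer. The full filter also uses normal and color weights over several iterations; this reduced depth-only version and the sigma_d parameter are our simplifications.

```python
import numpy as np

def atrous_pass(img, depth, step, sigma_d=0.1):
    """One edge-avoiding A-Trous pass: img (H, W, C), depth (H, W)."""
    h, w, c = img.shape
    kernel = np.array([1, 4, 6, 4, 1], dtype=np.float64) / 16.0
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            acc, wsum = np.zeros(c), 0.0
            for i in range(-2, 3):           # 5x5 taps dilated by `step`
                for j in range(-2, 3):
                    qy = int(np.clip(y + i * step, 0, h - 1))
                    qx = int(np.clip(x + j * step, 0, w - 1))
                    # Edge-stopping: damp contributions across depth edges.
                    wd = np.exp(-abs(depth[qy, qx] - depth[y, x]) / sigma_d)
                    wk = kernel[i + 2] * kernel[j + 2] * wd
                    acc += wk * img[qy, qx]
                    wsum += wk
            out[y, x] = acc / wsum
    return out
```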

 

4. EXPERIMENTAL RESULT

First, we used several types of objects to compare the performance of the proposed pipeline. Three configurations were measured: full ray tracing, a version that does not reorder the foveated sampling map, and our method. Our test system consists of an Intel Core i5-6600 CPU, 8 GB RAM, and an NVIDIA GeForce GTX 970. Each measurement was repeated 300 times for accuracy. The results for each scene are shown in Fig. 6, and their comparison is given in Table 1.

 


Fig. 6. The result of each scene in the benchmark.

 

Table 1. Comparing the time of each step of the pipeline performed for each renderer and showing the speed-up of our approach


 

The most significant experimental result is the reduction in computational cost. Our method improves performance even with the same number of rays as other foveated renderers. We also confirmed that computational performance improves in the same ratio as the number of samples decreases compared with full path tracing.

To compare visual quality, we calculated the Peak Signal-to-Noise Ratio (PSNR) and the Structural SIMilarity (SSIM) for each configuration at a resolution of 1024×1024. The visual quality results for each condition are shown in Fig. 7. For the accuracy of the evaluation, the gaze position was fixed at the center, and the parameters were σp = 0.8 and θ = {0.04, 0.1, 0.2}. The evaluation of visual quality is discussed below.

 


Fig. 7. Visual quality comparison of results reconstructed from the shading buffer (a) by Discrete Sibson's method (b) and our method (c), for each row.

 

The visual quality assessment maintained a PSNR of about 26 with only about 5% of the rays. In addition, SSIM, a measure of structural similarity, stayed at about 0.8, confirming the stability of the result.

 


Fig. 8. Computation time as a function of the number of sampled rays.

 

Fig. 8 is a graph showing the efficiency of reordering the sampled rays as the pupil width changes. Plain foveated sampling runs about 1.5 times faster when the number of rays is reduced to 9% of full resolution. Our reordering method, however, achieves a speed-up of about 1.5 times at 22% of the rays and about 3 times at 9%. Our method therefore allows more ray tracing within the same time budget.

 


Fig. 9. Performance comparison of interpolation methods on the Voronoi structure.

 

At a resolution of 1024×1024, even 1% of the rays already amounts to about 10,000 sites, and reprojection increases the number of sites further. As shown in Fig. 9, our neighborhood interpolation on the Voronoi structure is processed much faster than Park's method [12] and delivers stable performance regardless of the number of sites.

As a result, our method runs faster than traditional foveated rendering methods. In addition, since random sampling is performed every frame and the history cache is reused, the probability of converging to the ground truth increases. This convergence may not be perfect, however, because reprojection error must be taken into account.

 

5. CONCLUSION

In this paper, we proposed a new approach to improve the performance of foveated rendering on the GPU through gaze-based foveated probability sampling, fast ray reordering, and approximate Sibson interpolation. Our method significantly reduces rendering cost without significant loss of perceived visual quality, based on the fact that the resolution of the human retina divides into central and peripheral areas. We performed rendering considering the pupil position and diameter, and performed sparse sampling through a probability function that accounts for the density of human photoreceptor cells. We improved computational performance by reordering the foveated sampling map to work more efficiently on the GPU, and we developed a method for real-time interpolation of the Voronoi structure generated from a large number of sites on the GPU. This effectively reduces silhouette faults and the noise caused by extreme sampling in real time. Moreover, considering the change in the position and number of rays generated each frame, the visual quality improves gradually at low cost when the gaze does not move. In conclusion, by generating an adaptive-resolution image according to the pupil width, which adjusts with the eye's focus and the brightness of the scene, we can obtain realistic images at high speed on an HMD device.

References

  1. M. Weier, T. Roth, E. Kruijff, A. Hinkenjann, A.P. Perard-Gayot, P. Slusallek, et al., "Foveated Real-Time Ray Tracing for Head-Mounted Displays," Computer Graphics Forum, Vol. 35, No. 7, pp. 289-298, 2016. https://doi.org/10.1111/cgf.13026
  2. J. Yoon, K. Park, O. Kwon, and Y. Kim, "A Development of Semi-automatic Trawl-net Surfaces Reconstruction System Using Motion Equations and User Interactions," Journal of Korea Multimedia Society, Vol. 20, No. 8, pp. 1447-1455, 2017. https://doi.org/10.9717/KMMS.2017.20.8.1447
  3. W. Hunt, Virtual Reality: The Next Great Graphics Revolution, Keynote Talk, High-Performance Graphics, Oculus Research, 2015.
  4. M. Koskela, T. Viitanen, P. Jaaskelainen, and J. Takala, "Foveated Path Tracing," Proceeding of International Symposium on Visual Computing, Springer, Cham, pp. 723-732, 2016.
  5. M. Weier, M. Stengel, T. Roth, P. Didyk, E. Eisemann, M. Eisemann, et al., "Perceptiondriven Accelerated Rendering," Computer Graphics Forum, Vol. 36, Issue 2, pp. 611-643, 2017. https://doi.org/10.1111/cgf.13150
  6. T.C. Ruch and J.F. Fulton, Medical Physiology and Biophysics, Academic Medicine, 1960.
  7. D. Purves, G.J. Augustine, D. Fitzpatrick, W.C. Hall, A. Lamantia, J.O. Mcnamara, et al., Neuroscience, Sinauer Associates, Sunderland, 2004.
  8. M.F. Deering, “A Photon Accurate Model of the Human Eye,” Association for Computing Machinery Transactions on Graphics, Vol. 24, No. 3, pp. 649-658, 2005.
  9. A. Patney, M. Salvi, J. Kim, A. Kaplanyan, C. Wyman, N. Benty, et al., "Towards Foveated Rendering for Gaze-Tracked Virtual Reality," Association for Computing Machinery Transactions on Graphics, Vol. 35, No. 6, pp. 179.1-179.12, 2016.
  10. M. Bauer, H. Cook, and B. Khailany, "Cuda DMA: Optimizing GPU Memory Bandwidth via Warp Specialization," Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, Association for Computing Machinery, No. 12, pp. 96-106, 2011.
  11. M. Fujita and T. Harada, "Foveated Real-time Ray Tracing for Virtual Reality Headset," Technical Report, Light Transport Entertainment Research, 2014.
  12. S. Park, L. Linsen, O. Kreylos, J. Owens, and B. Hamann, “Discrete Sibson Interpolation,” IEEE Transactions On Visualization and Computer Graphics, Vol. 12, No. 2, pp. 243-253, 2006. https://doi.org/10.1109/TVCG.2006.27
  13. B. Guenter, M. Finch, S. Drucker, D. Tan, and J. Snyder, "Foveated 3D Graphics," Association for Computing Machinery Transactions on Graphics, Vol. 31, No. 6, pp. 164.1-164.10, 2012.
  14. D. Nehab, P.V. Sander, J. Lawrence, N. Tatarchuk, and J.R. Isidoro, "Accelerating Real-Time Shading with Reverse Reprojection Caching," Proceedings of the 22nd Association for Computing Machinery SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware, pp. 25-35, 2007.
  15. M. Bock, A.K. Tyagi, J. Kreft, and W. Alt, “Generalized Voronoi Tessellation as a Model of Two-dimensional Cell Tissue Dynamics,” Bulletin of Mathematical Biology, Vol. 72, No. 7, pp. 1696-1731, 2010. https://doi.org/10.1007/s11538-009-9498-3
  16. G. Rong and T. Tan, "Jump Flooding in GPU with Applications to Voronoi Diagram and Distance Transform," Proceedings of the 2006 symposium on Interactive 3D Graphics and Games Association for Computing Machinery, pp. 109-116, 2006.
  17. H. Dammertz, D. Sewtz, J. Hanika, and H.P.A. Lensch, "Edge-avoiding À-Trous Wavelet Transform for Fast Global Illumination Filtering," Proceedings of the Conference on High Performance Graphics, Eurographics Association, pp. 67-75, 2010.