• Title/Summary/Keyword: Parallel Rendering

Search Result 69, Processing Time 0.023 seconds

Adaptive Load Balancing Scheme using a Combination of Hierarchical Data Structures and 3D Clustering for Parallel Volume Rendering on GPU Clusters (계층 자료구조의 결합과 3차원 클러스터링을 이용하여 적응적으로 부하 균형된 GPU-클러스터 기반 병렬 볼륨 렌더링)

  • Lee Won-Jong;Park Woo-Chan;Han Tack-Don
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.1_2
    • /
    • pp.1-14
    • /
    • 2006
  • Sort-last parallel rendering using a cluster of GPUs has been widely used as an efficient method for visualizing large- scale volume datasets. The performance of this method is constrained by load balancing when data parallelism is included. In previous works static partitioning could lead to self-balance when only task level parallelism is included. In this paper, we present a load balancing scheme that adapts to the characteristic of volume dataset when data parallelism is also employed. We effectively combine the hierarchical data structures (octree and BSP tree) in order to skip empty regions and distribute workload to corresponding rendering nodes. Moreover, we also exploit a 3D clustering method to determine visibility order and save the AGP bandwidths on each rendering node. Experimental results show that our scheme can achieve significant performance gains compared with traditional static load distribution schemes.

Enhanced Image Mapping Method for Computer-Generated Integral Imaging System (집적 영상 시스템을 위한 향상된 이미지 매핑 방법)

  • Lee Bin-Na-Ra;Cho Yong-Joo;Park Kyoung-Shin;Min Sung-Wook
    • The KIPS Transactions:PartB
    • /
    • v.13B no.3 s.106
    • /
    • pp.295-300
    • /
    • 2006
  • The integral imaging system is an auto-stereoscopic display that allows users to see 3D images without wearing special glasses. In the integral imaging system, the 3D object information is taken from several view points and stored as elemental images. Then, users can see a 3D reconstructed image by the elemental images displayed through a lens array. The elemental images can be created by computer graphics, which is referred to the computer-generated integral imaging. The process of creating the elemental images is called image mapping. There are some image mapping methods proposed in the past, such as PRR(Point Retracing Rendering), MVR(Multi-Viewpoint Rendering) and PGR(Parallel Group Rendering). However, they have problems with heavy rendering computations or performance barrier as the number of elemental lenses in the lens array increases. Thus, it is difficult to use them in real-time graphics applications, such as virtual reality or real-time, interactive games. In this paper, we propose a new image mapping method named VVR(Viewpoint Vector Rendering) that improves real-time rendering performance. This paper describes the concept of VVR first and the performance comparison of image mapping process with previous methods. Then, it discusses possible directions for the future improvements.

A Software Method for Improving the Performance of Real-time Rendering of 3D Games (3D 게임의 실시간 렌더링 속도 향상을 위한 소프트웨어적 기법)

  • Whang, Suk-Min;Sung, Mee-Young;You, Yong-Hee;Kim, Nam-Joong
    • Journal of Korea Game Society
    • /
    • v.6 no.4
    • /
    • pp.55-61
    • /
    • 2006
  • Graphics rendering pipeline (application, geometry, and rasterizer) is the core of real-time graphics which is the most important functionality for computer games. Usually this rendering process is completed by both the CPU and the GPU, and a bottleneck can be located either in the CPU or the GPU. This paper focuses on reducing the bottleneck between the CPU and the GPU. We are proposing a method for improving the performance of parallel processing for real-time graphics rendering by separating the CPU operations (usually performed using a thread) into two parts: pure CPU operations and operations related to the GPU, and let them operate in parallel. This allows for maximizing the parallelism in processing the communication between the CPU and the GPU. Some experiments lead us to confirm that our method proposed in this paper can allow for faster graphics rendering. In addition to our method of using a dedicated thread for GPU related operations, we are also proposing an algorithm for balancing the graphics pipeline using the idle time due to the bottleneck. We have implemented the two methods proposed in this paper in our networked 3D game engine and verified that our methods are effective in real systems.

  • PDF

A Ray-Tracing Algorithm Based On Processor Farm Model (프로세서 farm 모델을 이용한 광추적 알고리듬)

  • Lee, Hyo Jong
    • Journal of the Korea Computer Graphics Society
    • /
    • v.2 no.1
    • /
    • pp.24-30
    • /
    • 1996
  • The ray tracing method, which is one of many photorealistic rendering techniques, requires heavy computational processing to synthesize images. Parallel processing can be used to reduce the computational processing time. A parallel algorithm for the ray tracing has been implemented and executed for various images on transputer systems. In order to develop a scalable parallel algorithm, a processor farming technique has been exploited. Since each image is divided and distributed to each farming processor, the scalability of the parallel system and load balancing are achieved naturally in the proposed algorithm. Efficiency of the parallel algorithm is obtained up to 95% for nine processors. However, the best size of a distributed task is much higher in simple images due to less computational requirement for every pixel. Efficiency degradation is observed for large granularity tasks because of load unbalancing caused by the large task. Overall, transputer systems behave as good scalable parallel processing system with respect to the cost-performance ratio.

  • PDF

Implementation of Parallel Volume Rendering Using the Sequential Shear-Warp Algorithm (순차 Shear-Warp 알고리즘을 이용한 병렬볼륨렌더링의 구현)

  • Kim, Eung-Kon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.6
    • /
    • pp.1620-1632
    • /
    • 1998
  • This paper presents a fast parallel algorithm for volume rendering and its implementation using C language and MPI MasPar Programming Language) on the 4,096 processor MasPar MP-2 machine. This parallel algorithm is a parallelization hased on the Lacroute' s sequential shear - warp algorithm currently acknowledged to be the fastest sequential volume rendering algorithm. This algorithm reduces communication overheads by using the sheared space partition scheme and the load balancing technique using load estimates from the previous iteration, and the number of voxels to be processed by using the run-length encoded volume data structure.Actual performance is 3 to 4 frames/second on the human hrain scan dataset of $128\times128\times128$ voxels. Because of the scalability of this algorithm, performance of ]2-16 frames/sc.'cond is expected on the 16,384 processor MasPar MP-2 machine. It is expected that implementation on more current SIMD or MIMD architectures would provide 3O~60 frames/second on large volumes.

  • PDF

Non-Photorealistic Rendering Using CUDA-Based Image Segmentation (CUDA 기반 영상 분할을 사용한 비사실적 렌더링)

  • Yoon, Hyun-Cheol;Park, Jong-Seung
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.4 no.11
    • /
    • pp.529-536
    • /
    • 2015
  • When rendering both three-dimensional objects and photo images together, the non-photorealistic rendering results are in visual discord since the two contents have their own independent color distributions. This paper proposes a non-photorealistic rendering technique which renders both three-dimensional objects and photo images such as cartoons and sketches. The proposed technique computes the color distribution property of the photo images and reduces the number of colors of both photo images and 3D objects. NPR is performed based on the reduced colormaps and edge features. To enhance the natural scene presentation, the image region segmentation process is preferred when extracting and applying colormaps. However, the image segmentation technique needs a lot of computational operations. It takes a long time for non-photorealistic rendering for large size frames. To speed up the time-consuming segmentation procedure, we use GPGPU for the parallel computing using the GPU. As a result, we significantly improve the execution speed of the algorithm.

Efficient Parallel Visualization of Large-scale Finite Element Analysis Data in Distributed Parallel Computing Environment (분산 병렬 계산환경에 적합한 초대형 유한요소 해석 결과의 효율적 병렬 가시화)

  • Kim, Chang-Sik;Song, You-Me;Kim, Ki-Ook;Cho, Jin-Yeon
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.32 no.10
    • /
    • pp.38-45
    • /
    • 2004
  • In this paper, a parallel visualization algorithm is proposed for efficient visualization of the massive data generated from large-scale parallel finite element analysis through investigating the characteristics of parallel rendering methods. The proposed parallel visualization algorithm is designed to be highly compatible with the characteristics of domain-wise computation in parallel finite element analysis by using the sort-last-sparse approach. In the proposed algorithm, the binary tree communication pattern is utilized to reduce the network communication time in image composition routine. Several benchmarking tests are carried out by using the developed in-house software, and the performance of the proposed algorithm is investigated.

Haptic Display of A Puncture Task with 4-legged 6 DOF Parallel Haptic Device (6자유도 병렬형 햅틱장치를 이용한 구멍뚫기 작업의 햅틱 디스플레이)

  • 김형욱;서일홍
    • Journal of the Institute of Electronics Engineers of Korea SC
    • /
    • v.41 no.6
    • /
    • pp.1-10
    • /
    • 2004
  • A haptic rendering system is proposed for a puncture task of a virtual vertebra model. To build a mesh model from medical images, Delaunay triangulation is applied and physical models are based on elasticity theory. Also, a redundant actuated 6 DOF parallel type haptic device is designed to display large force and to resolve the singularity problem of parallel type mechanisms. Haptic feeling of puncture task and the performance of the proposed haptic device are tested by two puncture task experiments.

Memory Efficient Parallel Ray Casting Algorithm for Unstructured Grid Volume Rendering on Multi-core CPUs (비정렬 격자 볼륨 렌더링을 위한 다중코어 CPU기반 메모리 효율적 광선 투사 병렬 알고리즘)

  • Kim, Duksu
    • Journal of KIISE
    • /
    • v.43 no.3
    • /
    • pp.304-313
    • /
    • 2016
  • We present a novel memory-efficient parallel ray casting algorithm for unstructured grid volume rendering on multi-core CPUs. Our method is based on the Bunyk ray casting algorithm. To solve the high memory overhead problem of the Bunyk algorithm, we allocate a fixed size local buffer for each thread and the local buffers contain information of recently visited faces. The stored information is used by other rays or replaced by other face's information. To improve the utilization of local buffers, we propose an image-plane based ray grouping algorithm that makes ray groups have high coherency. The ray groups are then distributed to computing threads and each thread processes the given groups independently. We also propose a novel hash function that uses the index of faces as keys for calculating the buffer index each face will use to store the information. To see the benefits of our method, we applied it to three unstructured grid datasets with different sizes and measured the performance. We found that our method requires just 6% of the memory space compared with the Bunyk algorithm for storing face information. Also it shows compatible performance with the Bunyk algorithm even though it uses less memory. In addition, our method achieves up to 22% higher performance for a large-scale unstructured grid dataset with less memory than Bunyk algorithm. These results show the robustness and efficiency of our method and it demonstrates that our method is suitable to volume rendering for a large-scale unstructured grid dataset.

A Parallel Processing Technique for Large Spatial Data (대용량 공간 데이터를 위한 병렬 처리 기법)

  • Park, Seunghyun;Oh, Byoung-Woo
    • Spatial Information Research
    • /
    • v.23 no.2
    • /
    • pp.1-9
    • /
    • 2015
  • Graphical processing unit (GPU) contains many arithmetic logic units (ALUs). Because many ALUs can be exploited to process parallel processing, GPU provides efficient data processing. The spatial data require many geographic coordinates to represent the shape of them in a map. The coordinates are usually stored as geodetic longitude and latitude. To display a map in 2-dimensional Cartesian coordinate system, the geodetic longitude and latitude should be converted to the Universal Transverse Mercator (UTM) coordinate system. The conversion to the other coordinate system and the rendering process to represent the converted coordinates to screen use complex floating-point computations. In this paper, we propose a parallel processing technique that processes the conversion and the rendering using the GPU to improve the performance. Large spatial data is stored in the disk on files. To process the large amount of spatial data efficiently, we propose a technique that merges the spatial data files to a large file and access the file with the method of memory mapped file. We implement the proposed technique and perform the experiment with the 747,302,971 points of the TIGER/Line spatial data. The result of the experiment is that the conversion time for the coordinate systems with the GPU is 30.16 times faster than the CPU only method and the rendering time is 80.40 times faster than the CPU.