Search | Korea Science

Compression-Based Volume Rendering on Distributed Memory Parallel Computers (분산 메모리 구조를 갖는 병렬 컴퓨터 상에서의 압축 기반 볼륨 렌더링)

Koo, Gee-Bum;Park, Sang-Hun;Song, Dong-Sub;Ihm, In-Sung
- Journal of KIISE: Computing Practices and Letters, v.6 no.5, pp.457-467, 2000
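This paper proposes a parallel ray-casting method for effectively visualizing very large volume data on distributed-memory parallel computers. Built on data compression, the method aims to improve parallel rendering performance by quickly decompressing compressed data held in each processor's local memory rather than reading data from another processor's memory. It improves performance by exploiting the advantages of both object-order and image-order traversal algorithms. Specifically, it exploits coherence in object space and in image space by traversing a block-wise min-max octree and by applying a run-time quadtree that dynamically maintains the opacity value of each pixel. The proposed compression-based parallel volume rendering method is implemented to minimize interprocessor communication during rendering, a property that is very useful in more practical distributed environments, such as clusters of PCs and workstations, where the cost of data communication between processors is considerably high. Experiments were carried out on a Cray T3E parallel computer using the Visible Man dataset.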
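As a rough illustration of how a min-max octree lets a renderer skip empty regions, the following Python sketch builds such a tree over a small cubic volume and returns only the leaf blocks that may contain a voxel at or above a threshold. It is not the authors' implementation: the names `Node`, `build` and `visible_leaves`, the leaf size, and the power-of-two cube restriction are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Volume = List[List[List[float]]]  # indexed as volume[z][y][x]


@dataclass
class Node:
    lo: float                       # minimum voxel value in this block
    hi: float                       # maximum voxel value in this block
    origin: Tuple[int, int, int]    # (x, y, z) of the block corner
    size: int                       # edge length of the block
    children: Optional[list] = None


def build(volume: Volume, x=0, y=0, z=0, size=None, leaf=2) -> Node:
    """Build a min-max octree (assumes a cube whose edge is a power of two)."""
    if size is None:
        size = len(volume)
    if size <= leaf:
        vals = [volume[k][j][i]
                for k in range(z, z + size)
                for j in range(y, y + size)
                for i in range(x, x + size)]
        return Node(min(vals), max(vals), (x, y, z), size)
    h = size // 2
    kids = [build(volume, x + dx, y + dy, z + dz, h, leaf)
            for dz in (0, h) for dy in (0, h) for dx in (0, h)]
    return Node(min(k.lo for k in kids), max(k.hi for k in kids),
                (x, y, z), size, kids)


def visible_leaves(node: Node, threshold: float) -> List[Node]:
    """Return leaf blocks whose maximum reaches the threshold; prune the rest."""
    if node.hi < threshold:
        return []                   # the whole subtree is below the threshold
    if node.children is None:
        return [node]
    out: List[Node] = []
    for child in node.children:
        out.extend(visible_leaves(child, threshold))
    return out
```

In a 4x4x4 volume with a single non-zero voxel, only one of the eight 2x2x2 leaf blocks survives the traversal, so the other seven never need to be decompressed or sampled.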

Design of a Parallel Rendering Processor Architecture with Effective Memory System (효과적인 메모리 구조를 갖는 병렬 렌더링 프로세서 설계)

Park Woo-Chan;Yoon Duk-Ki;Kim Kyoung-Su
- The KIPS Transactions: Part A, v.13A no.4 s.101, pp.305-316, 2006
Current rendering processors are organized mainly to process a single triangle as fast as possible, and parallel 3D rendering processors, which process multiple triangles in parallel with multiple rasterizers, have recently begun to appear. For high triangle-processing performance, it is desirable for each rasterizer to have its own local pixel cache. However, a consistency problem may occur when more than one rasterizer accesses data at the same address simultaneously. In this paper, we propose a parallel rendering processor architecture that resolves this consistency problem effectively. Moreover, the proposed architecture significantly reduces the latency caused by pixel cache misses. To meet these two goals, effective memory organizations, including a new pixel cache architecture, are presented. The experimental results show that the proposed architecture achieves almost linear speedup in the best case, even with sixteen rasterizers.
https://doi.org/10.3745/KIPSTA.2006.13A.4.305

Compression-Based Ray-Casting of Huge Volume Data on Distributed Memory Environments (분산 메모리 환경에서의 방대한 볼륨데이터의 압축기반 광선추적법)

Song, Dong-Sub;Park, Sang-Hun;Ihm, In-Sung
- Proceedings of the Korean Information Science Society Conference, 2000.04b, pp.634-636, 2000
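Because of the heavy interprocessor communication they generate, existing parallel volume rendering methods have relied on parallel computers with very fast communication, and implementing them in distributed environments with slow communication has seemed infeasible. At the same time, the volume data to be visualized keeps growing larger. This paper therefore proposes a parallel/distributed volume rendering method that is not constrained by communication speed and that handles very large volume data. Instead of costly remote memory accesses, the method relies on compression and quickly reconstructs the required data from local memory, which yields good speedup. This means that each processor holds the entire volume dataset. As a result, interprocessor communication during rendering is minimized, and the approach is well suited to distributed-memory parallel computers and PC/workstation clusters, where high communication costs make efficient parallel/distributed processing difficult.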

Implementation of Parallel Volume Rendering Using the Sequential Shear-Warp Algorithm (순차 Shear-Warp 알고리즘을 이용한 병렬볼륨렌더링의 구현)

Kim, Eung-Kon
- The Transactions of the Korea Information Processing Society, v.5 no.6, pp.1620-1632, 1998
This paper presents a fast parallel algorithm for volume rendering and its implementation in C and MPL (MasPar Programming Language) on the 4,096-processor MasPar MP-2 machine. The algorithm parallelizes Lacroute's sequential shear-warp algorithm, currently acknowledged to be the fastest sequential volume rendering algorithm. It reduces communication overhead through a sheared-space partitioning scheme and a load-balancing technique that uses load estimates from the previous iteration, and it reduces the number of voxels to be processed through a run-length encoded volume data structure. Actual performance is 3 to 4 frames/second on a human brain scan dataset of $128\times128\times128$ voxels. Because the algorithm is scalable, performance of 12 to 16 frames/second is expected on the 16,384-processor MasPar MP-2 machine. Implementation on more current SIMD or MIMD architectures is expected to provide 30 to 60 frames/second on large volumes.
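To make the run-length encoding idea concrete, here is a minimal Python sketch of how a single voxel scanline might be encoded into transparent and opaque runs so that a renderer visits only the opaque voxels. The function names, the `(is_opaque, length)` run layout and the threshold parameter are assumptions for illustration; the paper's actual data structure is not described in this abstract.

```python
from typing import Iterator, List, Sequence, Tuple

Run = Tuple[bool, int]  # (is_opaque, run_length)


def rle_encode(scanline: Sequence[float], threshold: float = 0.0) -> List[Run]:
    """Encode a scanline as alternating transparent/opaque runs."""
    runs: List[Run] = []
    for value in scanline:
        opaque = value > threshold
        if runs and runs[-1][0] == opaque:
            runs[-1] = (opaque, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((opaque, 1))               # start a new run
    return runs


def opaque_voxels(scanline: Sequence[float],
                  runs: List[Run]) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for voxels in opaque runs, skipping transparent runs."""
    index = 0
    for opaque, length in runs:
        if opaque:
            for j in range(index, index + length):
                yield j, scanline[j]
        index += length                            # a transparent run costs one step
```

For the scanline `[0, 0, 0, 5, 7, 0, 0, 2]`, the encoder produces four runs and the iterator touches only three of the eight voxels.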

A Pixel Cache Architecture with Selective Loading Scheme based on Z-test (깊이 검사 결과에 의한 선택적 적재 방법을 가지는 픽셀 캐쉬 구조)

Lee, Kil-Whan;Park, Woo-Chan;Kim, Il-San;Han, Tack-Don
- Journal of KIISE: Computer Systems and Theory, v.30 no.10, pp.579-585, 2003
Most recent 3D graphics rendering processors include a pixel cache that stores depth and color data to reduce memory latency and bandwidth requirements. In this paper, we propose an effective pixel cache that improves the performance of a rendering processor. The proposed cache system stores depth data selectively, based on the result of the Z-test, and stores color data in an auxiliary buffer. Simulation results show that the proposed cache system at 16 Kbytes performs better than a conventional 32-Kbyte cache.

A Processor Architecture with Effective Memory System for Sort-Last Parallel Rendering (Sort-Last 병렬 렌더링을 위한 효과적인 메모리 프로세서 구조)

Yoon Duk-Ki;Kim Kyoung-So;Lee Kyung-Ho;Park Wo-Chan
- Proceedings of the Korea Information Processing Society Conference, 2006.05a, pp.1363-1366, 2006
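This paper proposes a parallel rendering processor that lets each graphics accelerator use a pixel cache while increasing performance and resolving the consistency problem. The proposed architecture also reduces the latency caused by pixel cache misses. To achieve these two results, an effective memory organization is incorporated into a new pixel cache architecture. Experimental results show that the proposed architecture achieves nearly linear speedup with sixteen or more rasterizers.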

An effective visibility culling method for 3D rendering processor (3 차원 렌더링 프로세서를 위한 효과적인 가시성 선별 방법)

Choi, Moon-Hee;Park, Woo-Chan;Kim, Shin-Dug
- Proceedings of the Korea Information Processing Society Conference, 2005.05a, pp.1713-1716, 2005
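As the complexity of 3D graphics scenes keeps increasing, visibility culling has become one of the key research topics in 3D rendering processor design. This paper proposes a new rasterization pipeline that performs visibility culling using the information already held in a conventional pixel cache. Rather than adding large dedicated hardware such as a hierarchical z-buffer (HZB) to manage visibility information, the proposed architecture performs visibility culling during scan conversion by consulting the data stored in the pixel cache. Primitives that miss in the cache undergo hidden-surface removal in the z-test stage of the pixel rasterization pipeline, and a prefetching scheme reduces the penalty of pixel cache misses. In experiments, the proposed architecture improves performance by about 32% over a conventional pixel pipeline architecture and by about 7% over the HZB architecture.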

Performance of Parallel Ray Tracing Algorithm (병렬 광선 추적 알고리듬의 성능)

Lee, Hyo-Jong;Im, Beom-Hyeon
- Proceedings of the Korea Information Processing Society Conference, 2001.10a, pp.255-258, 2001
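Ray tracing is a rendering technique that produces photograph-like, high-resolution images. It requires a large amount of computation time to synthesize an image, and parallel processing can be used to reduce that time. This paper implements parallel ray tracing with MPI (Message Passing Interface) on an IBM supercomputer and measures how speedup changes with the number of nodes and how performance changes with the size of the messages passed between nodes. Various images were generated on an IBM SP system with the parallel ray tracer. Because an image can be partitioned and distributed among nodes, the problem is amenable to parallelization and the load can be balanced. In the experiments, ideal speedup with increasing processor count was obtained using 15 processors. For scenes made up of simple objects, performance was better when the unit of work distributed to each node was larger than for scenes composed of complex objects. When the distributed work units were small and the number of communications correspondingly increased, rendering efficiency dropped.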
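For readers unfamiliar with the terms, speedup is conventionally the serial run time divided by the parallel run time, and efficiency is speedup divided by the number of processors; "ideal" speedup means a speedup equal to the processor count. The short Python sketch below computes both. The timings in the usage note are hypothetical and are not taken from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup = serial run time / parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """Efficiency = speedup / processor count; 1.0 corresponds to ideal speedup."""
    return speedup(t_serial, t_parallel) / processors
```

With hypothetical timings of 120 seconds on one processor and 8 seconds on 15 processors, `speedup(120.0, 8.0)` is 15.0 and `efficiency(120.0, 8.0, 15)` is 1.0, which is the ideal case.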

Design of Special Function Unit for Vectorized SIMD Programmable Unified Shader (벡터화된 SIMD 프로그램어블 통합 셰이더를 위한 특수 함수 유닛 설계)

Jung, Jin-Ha;Kim, Kyeong-Seob;Yun, Jeong-Hee;Seo, Jang-Won;Choi, Sang-Bang
- Journal of the Institute of Electronics Engineers of Korea SD, v.47 no.5, pp.56-70, 2010
Supporting realistic 3D graphics requires rendering techniques that generate realistic 2D images and high-performance graphics processors that handle massive data efficiently. Graphics hardware has evolved rapidly in recent years, enabling high-quality rendering effects that previously could not be processed in real time. Improved shading techniques make realistic images possible, but the process still takes considerable time. To process nearly lifelike images, multiple operational units are being integrated into graphics processors for efficient floating-point operation on massive data. In this paper, we design and implement a special function unit to support high-quality 3D computer graphics on a programmable unified shader processor. The designed unit is evaluated through functional-level simulation, and its hardware resource usage and execution speed are measured by implementing it directly on a Virtex-4 FPGA (xc4vlx200).

A architecture for parallel rendering processor with by effective memory organization (효과적인 메모리 구조를 갖는 병렬 렌더링 프로세서 구조)

Kim, Kyung-Su;Yoon, Duk-Ki;Kim, Il-San;Park, Woo-Chan
- Journal of Korea Game Society, v.5 no.3, pp.39-47, 2005
Current rendering processors are organized mainly to process a single triangle as fast as possible, and parallel 3D rendering processors, which process multiple triangles in parallel with multiple rasterizers, have recently begun to appear. For high triangle-processing performance, it is desirable for each rasterizer to have its own local pixel cache. However, a consistency problem may occur when more than one rasterizer accesses data at the same address simultaneously. In this paper, we propose a parallel rendering processor architecture that resolves this consistency problem effectively. Moreover, the proposed architecture significantly reduces the latency caused by pixel cache misses. The experimental results show that the proposed architecture achieves almost linear speedup in the best case, even with sixteen rasterizers.

