• Title/Summary/Keyword: 3D Graphics Accelerator

Search Result 29, Processing Time 0.027 seconds

An Implementation of 3D Graphic Accelerator for Phong Shading (퐁 음영법을 위한 3차원 그래픽 가속기의 구현)

  • Lee, Hyung;Park, Youn-Ok;Park, Jong-Won
    • Journal of Korea Multimedia Society
    • /
    • v.3 no.5
    • /
    • pp.526-534
    • /
    • 2000
  • There have been many researches on the 3D graphic accelerator for high speed by needs of CAD/CAM,3D modeling, virtual reality or medical image. In this paper, an SIMD processor architecture for 3D graphic accelerator is proposed in order to improve the processing time of the 3D graphics, and a parallel Phong shading algorithm is presented to estimate performance of the proposed architecture. The proposed SIMD processor architecture for 3D graphic accelerator consists of PCI local bus interface, 16 Processing Elements (PE's), and Park's multi-access memory system (NAMS) that has 17 memory modules. A serial algorithm for Phong shading is modified for the architecture and the main key is to divide a polygon into $4\times{4}$ squares. And, for processing a square, 4 PE's are regarded as a PE Grou logically. Since MAMS can support block access type with interval 1, it is possible that 4 PE Groups process a square at a time. In consequence, 16 pixels are processed simultaneously. The proposed SIMD processor architecture is simulated by CADENCE Verilog-XL that is a package for the hardware simulation. With the same simulated results as that of the serial algorithm, the speed enhancement by the parallel algorithm to the serial one is 5.68.

  • PDF

A Design on Rasterizer for the verification in a 3D Graphic Processor (3D 그래픽 프로세서 검증을 위한 래스터라이저 설계)

  • Lee, Mi-Kyoung;Jang, Young Jo
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2009.10a
    • /
    • pp.639-642
    • /
    • 2009
  • When the graphics accelerator for high-quality multimedia content design, hardware verification environment, easy and accurate performance evaluation in an embedded device is required. To work around this is not verified through the simulation waveform analysis to determine the actual calculated graphic images has designed a software rasterizer. Rasterizer is designed for Windows-based environment using the C language implementation of rasterization has a function at each step. Vertex data is entered and the results were verified.

  • PDF

The Early Write Back Scheme For Write-Back Cache (라이트 백 캐쉬를 위한 빠른 라이트 백 기법)

  • Chung, Young-Jin;Lee, Kil-Whan;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.11
    • /
    • pp.101-109
    • /
    • 2009
  • Generally, depth cache and pixel cache of 3D graphics are designed by using write-back scheme for efficient use of memory bandwidth. Also, there are write after read operations of same address or only write operations are occurred frequently in 3D graphics cache. If a cache miss is detected, an access to the external memory for write back operation and another access to the memory for handling the cache miss are operated simultaneously. So on frequent cache miss situations, as the memory access bandwidth limited, the access time of the external memory will be increased due to memory bottleneck problem. As a result, the total performance of the processor or the IP will be decreased, also the problem will increase peak power consumption. So in this paper, we proposed a novel early write back cache architecture so as to solve the problems issued above. The proposed architecture controls the point when to access the external memory as to copy the valid data block. And this architecture can improve the cache performance with same hit ratio and same capacity cache. As a result, the proposed architecture can solve the memory bottleneck problem by preventing intensive memory accesses. We have evaluated the new proposed architecture on 3D graphics z cache and pixel cache on a SoC environment where ARM11, 3D graphic accelerator and various IPs are embedded. The simulation results indicated that there were maximum 75% of performance increase when using various simulation vectors.

Design of the Triangle Setup Stage Reusing the Values of Shared Edge in 3D Graphics Accelerator (공통 변 정보를 재 사용하는 3차원 그래픽 가속기의 삼각형 셋업 부의 설계)

  • Choi, Moon-Hee;Park, Woo-Chan;Kim, Shin-Dug
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2000.10b
    • /
    • pp.1637-1640
    • /
    • 2000
  • 최근 3 차원 그래픽스 분야에서 실감 영상 지원 요구에 따라 객체를 이루는 데이터의 수가 기하급수적으로 증가하게 되었다. 이에 고성능의 3 차원 그래픽 가속기에 대한 도입뿐만 아니라 가속기에서 처리될 데이터의 표현 및 여러 처리 방법들에 대한 연구도 요구되어지고 있다. 본 논문에서는 삼각형 스트림 기법을 이용하여 3 차원 그래픽 데이터를 효과적으로 표현할 수 있고, 이 기법의 특징을 이용하여 전체 시스템의 계산량을 줄일 수 있는 구조를 제안하였다. 즉 제안하는 구조는 3차원 그래픽 가속기의 뒷 단인 래스터라이저의 삼각형 셋업 부에 공통 변 버퍼를 두어 인접한 삼각형 들 간에 공유되는 변들의 정보를 재 사용하도륵 하였다. 이 구조는 공통 변 버퍼를 사용하지 않는 기존의 구조와 비교했을 경우 최대 31.8%의 수행 성능 향상을 보여준다.

  • PDF

Design of the Pipelined Scan Conversion Unit based on Tile Traversal Method for High Performance 3D Graphics Accelerator (고성능 3차원 그래픽 가속기를 위한 타일 트래버설 방식의 파이프라인된 스캔 컨버젼 유닛 설계)

  • 전원호;최문희;박우찬;한탁돈;김신덕
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2001.10c
    • /
    • pp.16-18
    • /
    • 2001
  • 3차원 영상을 처리하는데 있어 래스터라이제이션은 프레임 버퍼에 저장될 픽셀을 구하는 과정이다. 여러 개의 픽셀로 구성되는 폴리곤을 렌더링하기 위해서 스캔라인 방식 또는 반 평면 함수를 이용한 타일 트래버설 방식 등이 사용되고 있다. 본 논문에서 기반으로 하고 있는 타일 트래버설 방식은 스캔라인 방식에 비해 메모리 효율 및 텍스쳐 캐쉬의 지역성에서 이점을 가지고 있으나 복잡한 탐색 과정 때문에 파이프라인 구조로 구현하기는 어렵다. 본 논문에서 제안하는 구조는 분기 예측 기법을 적용하여 트래버설 과정에서의 분기로 인해 발생되는 파이프라인 지연을 기존의 트래버설 구조에 비해 약 30% 정도 줄임으로써 고성능 3차원 그래픽 가속기에 적합한 스캔 컨버젼 유닛을 제안하였다

  • PDF

Hardware Design of Special-Purpose Arithmetic Unit for 3-Dimensional Graphics Processor (3차원 그래픽프로세서용 특수 목적 연산장치의 하드웨어 설계)

  • Choi, Byeong-Yoon
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2011.05a
    • /
    • pp.140-142
    • /
    • 2011
  • In this paper, special purpose arithmetic unit for mobile graphics accelerator is designed. The designed processor supports six operations, such as $1/{\chi}$, $\frac{1}{{\sqrt{x}}$, $log_2x$, $2^x$, $sin(x)$, $cos(x)$. The processor adopts 2nd-order polynomial minimax approximation scheme based on IEEE floating point data format to satisfy accuracy conditions and has 5-stage pipeline structure to meet high operational rates. The SFAU processor consists of 23,000 gates and its estimated operating frequency is about 400 Mhz at operating condition of 65nm CMOS technology. Because the processor can execute all operations with 5-stage pipeline scheme, it has about 400 MOPS(million operations per second) execution rate. Thus, it can be applicable to the 3D mobile graphics processors.

  • PDF

Texture Cache with Automatical Index Splitting Based on Texture Size (텍스처의 크기에 따라 인덱스를 자동 분할하는 텍스처 캐시)

  • Kim, Jin-Woo;Park, Young-Jin;Kim, Young-Sik;Han, Tack-Don
    • Journal of Korea Game Society
    • /
    • v.8 no.2
    • /
    • pp.57-68
    • /
    • 2008
  • Texture Mapping is a technique for adding realism to an image in 3D graphics Chip. Bilinear filtering mode of this technique needs accesses of 4 texels to process one pixel. In this paper we analyzed the access pattern of texture, and proposed the high performance texture cache which can access 4 texels simultaneously. We evaluated using simulation results of 3D game(Quake 3, Unreal Tournament 2004). Simulation results show that proposed texture cache has high performance on the case where physical size is less then or equal 8KBytes.

  • PDF

A New Network Bandwidth Reduction Method of Distributed Rendering System for Scalable Display (확장형 디스플레이를 위한 분산 렌더링 시스템의 네트워크 대역폭 감소 기법)

  • Park, Woo-Chan;Lee, Won-Jong;Kim, Hyung-Rae;Kim, Jung-Woo;Han, Tack-Don;Yang, Sung-Bong
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.29 no.10
    • /
    • pp.582-588
    • /
    • 2002
  • Scalable displays generate large and high resolution images and provide an immersive environment. Recently, scalable displays are built on the networked clusters of PCs, each of which has a fast graphics accelerator, memory, CPU, and storage. However, the distributed rendering on clusters is a network bound work because of limited network bandwidth. In this paper, we present a new algorithm for reducing the network bandwidth and implement it with a conventional distributed rendering system. This paper describes the algorithm called geometry tracking that avoids the redundant geometry transmission by indexing geometry data. The experimental results show that our algorithm reduces the network bandwidth up to 42%.

A Real-time Single-Pass Visibility Culling Method Based on a 3D Graphics Accelerator Architecture (실시간 단일 패스 가시성 선별 기법 기반의 3차원 그래픽스 가속기 구조)

  • Choo, Catherine;Choi, Moon-Hee;Kim, Shin-Dug
    • The KIPS Transactions:PartA
    • /
    • v.15A no.1
    • /
    • pp.1-8
    • /
    • 2008
  • An occlusion culling method, one of visibility culling methods, excludes invisible objects or triangles which are covered by other objects. As it reduces computation quantity, occlusion culling is an effective method to handle complex scenes in real-time. But an existing common occlusion culling method, such as hardware occlusion query method, sends objects' data twice to GPU and this causes processing overheads once for occlusion culling test and the other is for rendering. And another existing hardware occlusion culling method, VCBP, can test objects' visibility quickly, but it neither test bounding volume nor return test result to application stage. In this paper, we propose a single pass occlusion culling method which uses temporal and spatial coherency, with effective occlusion culling hardware architecture. In our approach, the hardware performs occlusion culling test rapidly with cache on the rasterization stage where triangles are transformed into fragments. At the same time, hardware sends each primitive's visibility information to application stage. As a result, the application stage reduces data transmission quantity by excluding covered objects using the visibility information on previous frame and hierarchical spatial tree. Our proposed method improved maximum 44%, minimum 14% compared with S&W method based on hardware occlusion query. And the performance is increased 25% and 17% respectively, compared to maximum and minimum performance of CHC method which is based on occlusion culling method.