• 제목/요약/키워드: parallel processor

검색결과 485건 처리시간 0.033초

효과적인 메모리 구조를 갖는 병렬 렌더링 프로세서 구조 (A architecture for parallel rendering processor with by effective memory organization)

  • 김경수;윤덕기;김일산;박우찬
    • 한국게임학회 논문지
    • /
    • 제5권3호
    • /
    • pp.39-47
    • /
    • 2005
  • 현재의 거의 대부분의 3차원 그래픽 프로세서는 한 개의 삼각형을 빠르게 처리하는 구조로 되어 있으며, 향후 여러 개의 삼각형을 병렬적으로 처리할 수 있는 프로세서가 등장할 것으로 예상된다. 고성능으로 삼각형을 처리하기 위해서는 각각의 레스터라이저마다 각각의 고유한 픽셀 캐시를 가져야 한다. 그런데, 병렬로 처리되는 경우 각각의 프로세서와 프레임 메모리 간에 일관성 문제가 발생할 수 있다. 본 논문에서는 각각의 그래픽 가속기에 픽셀 캐시를 사용가능 하게 하면서 성능을 증가시키고 일관성 문제를 효과적으로 해결하는 병렬 렌더링 프로세서를 제안한다. 또한 제안하는 구조에서는 픽셀 캐시 미스에 의한 지연시간을 크게 감소시켰다. 실험 결과는 본 구조가 16개 이상의 레스터라이저에서 선형적으로 속도 향상을 가져옴을 보여준다.

  • PDF

효과적인 메모리 구조를 갖는 병렬 렌더링 프로세서 설계 (Design of a Parallel Rendering Processor Architecture with Effective Memory System)

  • 박우찬;윤덕기;김경수
    • 정보처리학회논문지A
    • /
    • 제13A권4호
    • /
    • pp.305-316
    • /
    • 2006
  • 현재의 거의 대부분의 3차원 그래픽 프로세서는 한 개의 삼각형을 빠르게 처리하는 구조로 되어 있으며, 향후 여러 개의 삼각형을 병렬적으로 처리할 수 있는 프로세서가 등장할 것으로 예상된다. 고성능으로 삼각형을 처리하기 위해서는 각 래스터라이저마다 고유한 픽셀 캐시를 가져야 한다. 그런데, 병렬로 처리되는 경우 각각의 프로세서와 프레임 메모리 간에 일관성 문제가 발생할 수 있다. 본 논문에서는 각각의 그래픽 가속기에 픽셀 캐시를 사용가능 하게 하면서 성능을 증가시키고 일관성 문제를 해결하는 병렬 렌더링 프로세서를 제안한다. 제안하는 구조에서는 픽셀 캐시 미스에 의한 지연(latency)을 감소시켰다. 이러한 2가지 성과를 위하여 현재의 새로운 픽셀 캐시 구조에 효과적인 메모리 구조를 포함시켰다. 실험 결과는 제안하는 구조가 16개 이상의 래스터라이저에서 거의 선형적으로 속도 향상을 가져옴을 보여준다.

판금형 해석을 위한 정적/외연적 유한요소 프로그램의 병령화에 관한 연구 (On The Parallel Inplementation of a Static/Explicit FEM Program for Sheet Metal Forming)

  • 진석기;정동원
    • 한국정밀공학회:학술대회논문집
    • /
    • 한국정밀공학회 1995년도 추계학술대회 논문집
    • /
    • pp.625-628
    • /
    • 1995
  • A static/implicit finite element code for sheet forming (ITAS3D) is parallelized on IBM SP 6000 multi-processor computer. Computing-load-balanced domain decomposition method and the direct solution method at each subdomain (and interface) equation are developed. The system of equations for each subdomain are constructed by condensation and calculated on each processor. Approximated operation counts are calculated to set up the nonlinear equation system for balancing the compute load on each subdomain. Th esquare cup tests with several numbers of elements are used in demonstrating the performance of this parallel implementation. This procedure are proved to be efficient for moderate number of processors, especially for large number of elements.

  • PDF

연합 처리기를 이용한 직교선형 스타이너 트리의 병렬 알고리즘 (A Parallel Algorithm For Rectilinear Steiner Tree Using Associative Processor)

  • Taegeun Park
    • 전자공학회논문지B
    • /
    • 제32B권8호
    • /
    • pp.1057-1063
    • /
    • 1995
  • This paper describes an approach for constucting a Rectilinear Steiner Tree (RST) derivable from a Minimum Spanning Tree (MST), using Associative Processor (AP). We propose a fast parallel algorithm using AP's basic algorithms which can be realized by the processing capability of rudimentary logic and the selective matching capability of Content- Addressable Memory (CAM). The main idea behind the proposed algorithm is to maximize the overlaps between the consecutive edges in MST, thus minimizing the cost of a RST. An efficient parallel linear algorithm with O(n) complexity to construct a RST is proposed using an algorithm to find a MST, where n is the number of nodes. A node insertion method is introduced to allow the Z-type layout. The routing process which only depends on the neighbor edges and the no-rerouting strategy both help to speed up finding a RST.

  • PDF

유도탄 성능분석을 위한 실시간 병렬처리 시뮬레이터 연구 (Study on Real-time Parallel Processing Simulator for Performance Analysis of Missiles)

  • 김병문;정순기
    • 제어로봇시스템학회논문지
    • /
    • 제11권1호
    • /
    • pp.84-91
    • /
    • 2005
  • In this paper, we describe the real-time parallel processing simulator developed for the use of performance analysis of rolling missiles. The real-time parallel processing simulator developed here consists of seeker emulator generating infrared image signal on aircraft, real-time computer, host computer, system unit, and actual equipments such as auto-pilot processor and seeker processor. Software is developed from mathematic models, 6 degree-of-freedom module, aerodynamic module which are resided in real-time computer, and graphic user interface program resided in host computer. The real-time computer consists of six TIC-40 processors connected in parallel. The seeker emulator is designed by using analog circuits coupled with mechanical equipments. The system unit provides interface function to match impedance between the components and processes very small electrical signals. Also real launch unit of missiles is interfaced to simulator through system unit. In order to apply the real-time parallel processing simulator to performance analysis equipment of rolling missiles it is essential to perform the performance verification test of simulator.

Parallel Deblocking Filter Based on Modified Order of Accessing the Coding Tree Units for HEVC on Multicore Processor

  • Lei, Haiwei;Liu, Wenyi;Wang, Anhong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제11권3호
    • /
    • pp.1684-1699
    • /
    • 2017
  • The deblocking filter (DF) reduces blocking artifacts in encoded video sequences, and thereby significantly improves the subjective and objective quality of videos. Statistics show that the DF accounts for 5-18% of the total decoding time in high-efficiency video coding. Therefore, speeding up the DF will improve codec performance, especially for the decoder. In view of the rapid development of multicore technology, we propose a parallel DF scheme based on a modified order of accessing the coding tree units (CTUs) by analyzing the data dependencies between adjacent CTUs. This enables the DF to run in parallel, providing accelerated performance and more flexibility in the degree of parallelism, as well as finer parallel granularity. We additionally solve the problems of variable privatization and thread synchronization in the parallelization of the DF. Finally, the DF module is parallelized based on the HM16.1 reference software using OpenMP technology. The acceleration performance is experimentally tested under various numbers of cores, and the results show that the proposed scheme is very effective at speeding up the DF.

3-Way 32 bit VLIW Multimedia Signal Processor

  • Park, Jaebok;Jaehee You
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2001년도 하계종합학술대회 논문집(2)
    • /
    • pp.97-100
    • /
    • 2001
  • A 3-way VLIW multimedia signal processor capable of efficient repeated operations as well as both load/store and type transformations for various data types is presented. It is composed of a 32-bit execution unit that can execute two instructions in parallel, an independent load/store unit and a control unit. The processor is implemented with 0.6${\mu}{\textrm}{m}$ gate array and the results are discussed.

  • PDF

병렬 출력을 갖는 LFSR 구조를 적용한 HIGHT 프로세서 설계 (Design of an HIGHT Processor Employing LFSR Architecture Allowing Parallel Outputs)

  • 이제훈;김상춘
    • 융합보안논문지
    • /
    • 제15권2호
    • /
    • pp.81-89
    • /
    • 2015
  • HIGHT (HIght security and light weighHT) 암호는 기밀성을 요구하는 네트워크 환경에서 사용할 수 있도록 국내에서 개발된 저전력 경량화 64비트 블록 암호 알고리즘이다. 본 논문은 키스케쥴러에 사용되는 LFSR 및 역 LFSR의 4개의 병렬 출력을 허용할 수 있는 구조를 제안하였다. 또한, 각 라운드 연산에 필요한 4개의 서브키를 동일 클럭 사이클에 생성할 수 있도록 구성하였다. 따라서, 전체 HIGHT 암호 프로세서가 단일 시스템 클럭에 의해 제어할 수 있다. VHDL을 이용하여 회로를 합성한 후, 검증한 결과 제안된 키 스케쥴러의 회로 크기는 기존 키 스케쥴러에 비해 9% 감소되었다.

부분행렬을 사용한 행렬.벡터 연산용 1차원 시스톨릭 어레이 프로세서 설계에 관한 연구 (A Study On Improving the Performance of One Dimensional Systolic Array Processor for Matrix.Vector Operation using Sub-Matrix)

  • 김용성
    • 정보학연구
    • /
    • 제10권3호
    • /
    • pp.33-45
    • /
    • 2007
  • Systolic Array Processor is used for designing the special purpose processor in Digital Signal Processing, Computer Graphics, Neural Network Applications etc., since it has the characteristic of parallelism, pipeline processing and architecture of regularity. But, in case of using general design method, it has intial waiting period as large as No. of PE-1. And if the connected system needs parallel and simultaneous outputs, processor has some problems of the performance, since it generates only one output at each clock in output state. So in this paper, one dimensional Systolic Array Processor that is designed according to the dependance of data and operations using the partitioned sub-matrix is proposed for the purpose of improving the performance. 1-D Systolic Array using 4 partitioned sub-matrix has efficient method in case of considering those two problems.

  • PDF

A Fast SIFT Implementation Based on Integer Gaussian and Reconfigurable Processor

  • Su, Le Tran;Lee, Jong Soo
    • 한국정보전자통신기술학회논문지
    • /
    • 제2권3호
    • /
    • pp.39-52
    • /
    • 2009
  • Scale Invariant Feature Transform (SIFT) is an effective algorithm in object recognition, panorama stitching, and image matching, however, due to its complexity, real time processing is difficult to achieve with software approaches. This paper proposes using a reconfigurable hardware processor with integer half kernel. The integer half kernel Gaussian reduces the Gaussian pyramid complexity in about half [] and the reconfigurable processor carries out a parallel implementation of a full search Fast SIFT algorithm. We use a low memory, fine grain single instruction stream multiple data stream (SIMD) pixel processor that is currently being developed. This implementation fully exposes the available parallelism of the SIFT algorithm process and exploits the processing and I/O capabilities of the processor which results in a system that can perform real time image and video compression. We apply this novel implementation to images and measure the effectiveness. Experimental simulation results indicate that the proposed implementation is capable of real time applications.

  • PDF