• Title/Abstract/Keywords: parallel architecture

891 results (processing time: 0.023 s)

An Architecture of the Fast Parallel Multiplier over Finite Fields using AOP

  • 김용태
    • 한국전자통신학회논문지 / Vol. 7, No. 1 / pp.69-79 / 2012
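  • In this paper, for odd m and n = mk, we propose a new parallel multiplier over the finite subfield GF($2^m$) with a type-k Gaussian period, which uses a multiplier over the extension field GF($2^n$) as an auxiliary unit. For type IV, the space and time complexity of this multiplier equals that of the multiplier of Reyhani-Masoleh and Hasan, the most efficient multiplier known to date.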

An Efficient Multidimensional Index Structure for Parallel Environments

  • Bok Koung-Soo;Song Seok-Il;Yoo Jae-Soo
    • International Journal of Contents / Vol. 1, No. 1 / pp.50-58 / 2005
  • Multidimensional data such as image and spatial data generally require a large amount of storage space, and a single workstation can store and manage only a limited amount of such data. Managing the data in a parallel computing environment, an area of active research, can greatly improve performance. In this paper, we propose a parallel multidimensional index structure that exploits the parallelism of the parallel computing environment. The proposed index structure uses an nP(processor)-nxmD(disk) architecture, a hybrid of the nP-nD and 1P-nD types. Its node structure increases fan-out and reduces the height of the index. We also devise a range search algorithm that maximizes I/O parallelism and apply it to k-nearest neighbor queries. Various experiments show that the proposed method outperforms other parallel index structures.

Novel Parallel Approach for SIFT Algorithm Implementation

  • Le, Tran Su;Lee, Jong-Soo
    • Journal of information and communication convergence engineering / Vol. 11, No. 4 / pp.298-306 / 2013
  • The scale invariant feature transform (SIFT) is an effective algorithm used in object recognition, panorama stitching, and image matching. However, due to its complexity, real-time processing is difficult to achieve with current software approaches. The increasing availability of parallel computers makes parallelizing these tasks an attractive approach. This paper proposes a novel parallel approach for SIFT algorithm implementation using a block filtering technique in a Gaussian convolution process on the SIMD Pixel Processor. This implementation fully exposes the available parallelism of the SIFT algorithm process and exploits the processing and input/output capabilities of the processor, which results in a system that can perform real-time image and video compression. We apply this implementation to images and measure the effectiveness of the approach. Experimental simulation results indicate that the proposed method is capable of real-time operation and that the parallel approach delivers outstanding processing performance.

Design and Verification of High-Performance Parallel Processor Hardware for JPEG Encoder

  • 김용민;김종면
    • 대한임베디드공학회논문지 / Vol. 6, No. 2 / pp.100-107 / 2011
  • As the use of mobile multimedia devices has increased in recent years, the need for high-performance multimedia processors has grown. In this regard, we propose a SIMD (Single Instruction Multiple Data) based parallel processor that supports high-performance multimedia applications with low energy consumption. The proposed parallel processor consists of 16 processing elements (PEs) and operates with a 3-stage pipeline. Experimental results for the JPEG encoding algorithm indicate that the proposed parallel processor outperforms conventional parallel processors in terms of performance and energy efficiency. In addition, the proposed parallel processor architecture was developed and verified with Verilog HDL and an FPGA prototype system.

3D graphics processor architecture based on multistreaming

  • 박용진;이동호
    • 전자공학회논문지C / Vol. 34C, No. 9 / pp.10-21 / 1997
  • In this paper, we propose multiple-instruction-issue multistreaming as a processor architecture for 3D graphics processors. Multistreaming can eliminate interference among concurrently executing instructions in the pipelined processor, allowing enough parallelism for parallel processing. Through a cycle-level simulation study, we show that the proposed architecture outperforms a conventional RISC processor, the MIPS R3000, by a factor of three with reasonable resource overhead. A multiple-instruction-issue multistreaming processor will be a good instruction processor architecture when a large number of threads is guaranteed.

A hardware implementation of neural network with modified HANNIBAL architecture

  • 이범엽;정덕진
    • 대한전기학회논문지 / Vol. 45, No. 3 / pp.444-450 / 1996
  • This paper describes a digital hardware architecture for an artificial neural network with learning capability. It is a modified version of the hardware architecture known as HANNIBAL (Hardware Architecture for Neural Networks Implementing Back propagation Algorithm Learning). To implement efficient neural network hardware, we analyzed various types of multiplier, the major functional block of a neuro-processor cell. Based on this result, we designed efficient digital neural network hardware using a serial/parallel multiplier and tested its operation. We also analyzed the hardware efficiency with logic-level simulation.

Architecture of a PDM VLSI Fuzzy Logic Controller with an Explicit Rule Base

  • Ungering, Ansgar P.;Goser, K.
    • 한국지능시스템학회:학술대회논문집 / 한국퍼지및지능시스템학회, 1993, Fifth International Fuzzy Systems Association World Congress 93 / pp.1386-1389 / 1993
  • We describe the architecture of a fuzzy logic controller that uses a pulse-width-modulation (PDM) technique and a pipeline structure. Features of this controller are a new architecture for the inference unit, reduced chip area, and fewer I/O pins. Additionally, we present two different rule bases: one hardwired, with reduced chip area, and the other programmable, for prototyping. An architecture of a parallel minimum gate is also shown.

Numerical investigation of water-entry characteristics of high-speed parallel projectiles

  • Lu, Lin;Wang, Chen;Li, Qiang;Sahoo, Prasanta K.
    • International Journal of Naval Architecture and Ocean Engineering / Vol. 13, No. 1 / pp.450-465 / 2021
  • This study numerically investigates the water-entry characteristics of high-speed parallel projectiles. The shear stress transport k-𝜔 turbulence model and the Zwart-Gerber-Belamri cavitation model, based on the Reynolds-Averaged Navier-Stokes method, were used. A grid independence check and a grid convergence index analysis were carried out and verified. The influences of parallel water entry on flow field characteristics, trajectory stability and drag reduction performance are presented and analyzed in detail for different values of initial water-entry speed (𝜈0 = 280 m/s, 340 m/s, 400 m/s) and clearance between the parallel projectiles (Lp = 0.5D, 1.0D, 2.0D, 3.0D). Under parallel water entry, intense interference between the projectiles makes the cavity distribution non-uniform and exposes part of the projectile to water, which destroys the cavity structure and reduces trajectory stability. In addition, the parallel projectile experiences a more severe lateral force that pushes the two projectiles apart. As the clearance between the parallel projectiles decreases, drag reduction performance is impaired and velocity attenuation accelerates.

Design of a scalable general-purpose parallel associative processor using content-addressable memory

  • 박태근
    • 대한전자공학회논문지SD / Vol. 43, No. 2 / pp.51-59 / 2006
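  • General-purpose computers exhibit the "Von Neumann bottleneck" between the central processing unit and memory. To relieve this problem, this paper proposes a scalable general-purpose Associative Processor (AP) architecture based on content-addressable memory (CAM), which performs well in search-oriented applications. We propose an instruction set that carries out associative computing efficiently, and the architecture is designed to be scalable so that it can be applied flexibly to diverse and large-scale applications. Twelve instructions are defined; the instruction set is organized so that programs execute efficiently, and processing time is reduced by implementing sequences of consecutive instructions as a single instruction. The proposed processor operates in a bit-serial, word-parallel manner as a 32-bit general-purpose parallel processor with a massively parallel SIMD structure. For comprehensive verification, we verified not only individual instructions but also basic parallel algorithms such as maximum/minimum search, greater-than-or-equal/less-than-or-equal search, and parallel addition. These algorithms have complexity O(k), constant with respect to the number of data items processed, and require as many iterations as there are bits in the data.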
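
The bit-serial, word-parallel maximum search mentioned in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the function name `cam_max_search`, its parameters, and the list-based simulation are hypothetical and are not taken from the paper, whose instruction set and hardware details are not given here. Each loop iteration stands for one step that a CAM would apply to all stored words at once, so the number of steps depends on the bit width k and not on the number of words.

```python
def cam_max_search(words, bit_width=32):
    """Simulate a bit-serial, word-parallel maximum search.

    Hypothetical model: `words` plays the role of the CAM contents and
    `active` the per-word tag bits. Returns (max_value, winner_indices).
    """
    if not words:
        raise ValueError("words must not be empty")

    # Initially every stored word is a candidate for the maximum.
    active = [True] * len(words)

    # Scan from the most significant bit down to the least significant bit.
    for bit in range(bit_width - 1, -1, -1):
        # In hardware this comparison would hit all words simultaneously;
        # here the list comprehension stands in for that parallel match.
        hits = [a and bool((w >> bit) & 1) for a, w in zip(active, words)]
        # If any candidate has a 1 in this position, candidates with a 0
        # cannot be the maximum and are dropped. Otherwise keep them all.
        if any(hits):
            active = hits

    winners = [i for i, a in enumerate(active) if a]
    return words[winners[0]], winners


if __name__ == "__main__":
    value, indices = cam_max_search([5, 12, 7, 12, 3], bit_width=4)
    print(value, indices)
```

A minimum search follows the same pattern with the match condition inverted (keep the candidates whose current bit is 0 whenever at least one such candidate exists).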