• Title/Summary/Keyword: parallel architecture

Search Result 891, Processing Time 0.022 seconds

A Public Key knapsack Crytosystem Algorithm for Security in Computer Communication (컴퓨터 통신의 안전을 위한 공개키 배낭 암호계 앨고리듬)

  • 이영노;신인철
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.16 no.9
    • /
    • pp.893-900
    • /
    • 1991
  • And this system is compared with past knapsack system by implementation of low density attack in Brickell and Lagarias, Odlyzko’s method. Also the VLSI architecture for parallel implementation of this linearly shift knapsack system is presented

  • PDF

Color Media Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 칼라미디어 명령어 구현)

  • Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.7
    • /
    • pp.305-317
    • /
    • 2008
  • As a mobile computing environment is rapidly changing, increasing user demand for multimedia-over-wireless capabilities on embedded processors places constraints on performance, power, and sire. In this regard, this paper proposes color media instructions (CMI) for single instruction, multiple data (SIMD) parallel processors to meet the computational requirements and cost goals. While existing multimedia extensions store and process 48-bit pixels in a 32-bit register, CMI, which considers that color components are perceptually less significant, supports parallel operations on two-packed compressed 16-bit YCbCr (6 bit Y and 5 bits Cb, Cr) data in a 32-bit datapath processor. This provides greater concurrency and efficiency for YCbCr data processing. Moreover, the ability to reduce data format size reduces system cost. The reduction in data bandwidth also simplifies system design. Experimental results on a representative SIMD parallel processor architecture show that CMI achieves an average speedup of 6.3x over the baseline SIMD parallel processor performance. This is in contrast to MMX (a representative Intel's multimedia extensions), which achieves an average speedup of only 3.7x over the same baseline SIMD architecture. CMI also outperforms MMX in both area efficiency (a 52% increase versus a 13% increase) and energy efficiency (a 50% increase versus an 11% increase). CMI improves the performance and efficiency with a mere 3% increase in the system area and a 5% increase in the system power, while MMX requires a 14% increase in the system area and a 16% increase in the system power.

Modular Multiplier based on Cellular Automata Over $GF(2^m)$ (셀룰라 오토마타를 이용한 $GF(2^m)$ 상의 곱셈기)

  • 이형목;김현성;전준철;유기영
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.31 no.1_2
    • /
    • pp.112-117
    • /
    • 2004
  • In this paper, we propose a suitable multiplication architecture for cellular automata in a finite field $GF(2^m)$. Proposed least significant bit first multiplier is based on irreducible all one Polynomial, and has a latency of (m+1) and a critical path of $ 1-D_{AND}+1-D{XOR}$.Specially it is efficient for implementing VLSI architecture and has potential for use as a basic architecture for division, exponentiation and inverses since it is a parallel structure with regularity and modularity. Moreover our architecture can be used as a basic architecture for well-known public-key information service in $GF(2^m)$ such as Diffie-Hellman key exchange protocol, Digital Signature Algorithm and ElGamal cryptosystem.

Design of a New Bit-serial Multiplier/Divier Architecture (새로운 Bit-serial 방식의 곱셈기 및 나눗셈기 아키텍쳐 설계)

  • 옹수환;선우명훈
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.36C no.3
    • /
    • pp.17-25
    • /
    • 1999
  • This paper proposes a new bit-serial multiplier/divider architecture to reduce the hardware complexity significantly and to maintain the same number of cycles compared with existing architectures. Since the proposed bit-serial multiplier/divider architecture does not extend the number of bits in registers and an adde $r_tractor to calculate a partial product or a partial remainder, the hardware overhead can be greatly reduced. In addition, the proposed architecture can perform an additio $n_traction and a shift operation in parallel and the number of cycles for $\textit{N}$-bit multiplication and division for the proposed circuits is $\textit{N}$ and $\textit{N}$ + 2, repectively. Thus, the number of cycles for multiplication and division is the same compared with existing architectures. The SliM Image Processor employs the proposed multiplier/divider architecture and proves the performance of the proposed architecture.cture.

  • PDF

Low-Power Block Filtering Architecture for Digital IF Down Sampler and Up Sampler (디지털 IF 다운 샘플러와 업 샘플러의 저전력 블록 필터링 아키텍처)

  • 장영범;김낙명
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.25 no.5A
    • /
    • pp.743-750
    • /
    • 2000
  • In this paper, low-power block filtering architecture for digital If down sampler and up sampler is proposed. Software radio technology requires low power and cost effective digital If down and up sampler. Digital If down sampler and up sampler are accompanied with decimation filter and interpolation filter, respectively. In the proposed down sampler architecture, it is shown that the parallel and low-speed processing architecture can be produced by cancellation of inherent up sampler of block filter and down sampler. Proposed up sampler also utilizes cancellation of up sampler and inherent down sampler of block filtering structure. The proposed architecture is compared with the conventional polyphase architecture.

  • PDF

Parallel SystemC Cosimulation using Virtual Synchronization (가상 동기화 기법을 이용한 SystemC 통합시뮬레이션의 병렬 수행)

  • Yi, Young-Min;Kwon, Seong-Nam;Ha, Soon-Hoi
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.12
    • /
    • pp.867-879
    • /
    • 2006
  • This paper concerns fast and time accurate HW/SW cosimulation for MPSoC(Multi-Processor System-on-chip) architecture where multiple software and/or hardware components exist. It is becoming more and more common to use MPSoC architecture to design complex embedded systems. In cosimulation of such architecture, as the number of the component simulators participating in the cosimulation increases, the time synchronization overhead among simulators increases, thereby resulting in low overall cosimulation performance. Although SystemC cosimulation frameworks show high cosimulation performance, it is in inverse proportion to the number of simulators. In this paper, we extend the novel technique, called virtual synchronization, which boosts cosimulation speed by reducing time synchronization overhead: (1) SystemC simulation is supported seamlessly in the virtual synchronization framework without requiring the modification on SystemC kernel (2) Parallel execution of component simulators with virtual synchronization is supported. We compared the performance and accuracy of the proposed parallel SystemC cosimulation framework with MaxSim, a well-known commercial SystemC cosimulation framework, and the proposed one showed 11 times faster performance for H.263 decoder example, while the accuracy was maintained below 5%.

A Design of 4×4 Block Parallel Interpolation Motion Compensation Architecture for 4K UHD H.264/AVC Decoder (4K UHD급 H.264/AVC 복호화기를 위한 4×4 블록 병렬 보간 움직임보상기 아키텍처 설계)

  • Lee, Kyung-Ho;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.5
    • /
    • pp.102-111
    • /
    • 2013
  • In this paper, we proposed a $4{\times}4$ block parallel architecture of interpolation for high-performance H.264/AVC Motion Compensation in 4K UHD($3840{\times}2160$) video real time processing. To improve throughput, we design $4{\times}4$ block parallel interpolation. For supplying the $9{\times}9$ reference data for interpolation, we design 2D cache buffer which consists of the $9{\times}9$ memory arrays. We minimize redundant storage of the reference pixel by applying the Search Area Stripe Reuse scheme(SASR), and implement high-speed plane interpolator with 3-stage pipeline(Horizontal Vertical 1/2 interpolation, Diagonal 1/2 interpolation, 1/4 interpolation). The proposed architecture was simulated in 0.13um standard cell library. The maximum operation frequency is 150MHz. The gate count is 161Kgates. The proposed H.264/AVC Motion Compensation can support 4K UHD at 72 frames per second by running at 150MHz.

Design of a Graphic Processor for Multimedia Data Processing (멀티미디어 데이타 처리를 위한 그래픽 프로세서 설계)

  • 고익상;한우종;선우명동
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.36C no.10
    • /
    • pp.56-65
    • /
    • 1999
  • This paper presents an architecture and its instruction set for a graphic coprocessor(GCP) which can be used for a multimedia server. The proposed instruction set employs parallel architecture concepts, such as SIMD and Superscalar. GCP consists of a scheduler and four functional units. The scheduler solves an instruction bottleneck problem causing by sharing with four general processors(GPs). GCP can execute up to 4 instructions in parallel. It consists of about 56,000 gates and operates at 30 MHz clock frequency due to speed limitation of SOG technology. GCP meets the real-time DCT algorithm requirement of the CIF image format and can process up to 63 frames/sec for the DCT Algorithm and 21 frames/sec for the Full Block matching Algorithm of the CIF image format.

  • PDF

A Study on the 3 Dimension Graphics Accelerator for Phong Shading Algorithm (Phong Shading 알고리즘을 적용한 3차원 영상을 위한 고속 그래픽스 가속기 연구)

  • Park, Youn-Ok;Park, Jong-Won
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.10 no.5
    • /
    • pp.97-103
    • /
    • 2010
  • There are many algorithms for 2D to 3D graphic conversion technology which have the high complexity and large scale of iterative computation. So in this paper propose parallel algorithm and high speed graphics accelerator architecture using Park's MAMS(Multiple Access Memory System) for Phong Shading, one of many 3D algorithms. The Proposed SIMD processor architecture is simulated by HDL and simulated and got 30 times faster result. It means any kinds of 3D algorithm can make parallel algorithm and accelerated by SIMD processor with Park's MAMS for real time processing.