• Title/Summary/Keyword: 병렬처리 회로

Search Result 512, Processing Time 0.027 seconds

Fault Tolerant Static Shuffle-Exchange Network (결함 포용 정적 Shuffle-Exchange 네트워크)

  • Choi Hong In
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.3_4
    • /
    • pp.160-167
    • /
    • 2003
  • A static shuffle-exchange network is not only useful for several parallel applications but also use less hardware than the popular multi-stage network or hypercube. Even though it has a lot of advantages, it has never been used in any implemented parallel machine. One of the reasons is there has not been any techniques to make the network fault-tolerant. In this paper multiple fault-tolerant static shuffle-exchange networks are presented. In order to recover from k faulty processing elements, a network needs at least 2 k additional processing elements and at most 4 k additional shuffle ports for each processing elements. By decomposing the k fault-tolerant static shuffle-exchange network into m identical modules, this paper shows that the reliability of the network can be increased.

Fault-Tolerant Design of Array Systems Using Multichip Modules (다중칩을 이용한 어레이시스템의 결함허용 설계)

  • Kim, Sung-Soo
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.12
    • /
    • pp.3662-3674
    • /
    • 1999
  • This paper addresses some design issues for establishing the optimal number of spare units in array systems manufactured using fault-tolerant multichip modules(MCM's) for massively parallel computing(MPC). We propose a new quantitative approach to an optimal cost-effective MCM system design under yield and reliability constraints. In the proposed approach, we analyze the effect of residual redundancy on operational reliability of fault-tolerant MCM's. In particular, the issues of imperfect support circuitry, chip assembly yield and array topology are investigated. Extensive parametric results for the analysis are provided to show that our scheme can be applied to design ways using MCM's for MPC applications more efficiently, subject to yield and reliability constraints.

  • PDF

Optical CBC Block Encryption Method using Free Space Parallel Processing of XOR Operations (XOR 연산의 자유 공간 병렬 처리를 이용한 광학적 CBC 블록 암호화 기법)

  • Gil, Sang Keun
    • Korean Journal of Optics and Photonics
    • /
    • v.24 no.5
    • /
    • pp.262-270
    • /
    • 2013
  • In this paper, we propose a modified optical CBC(Cipher Block Chaining) encryption method using optical XOR logic operations. The proposed method is optically implemented by using dual encoding and a free-space interconnected optical logic gate technique in order to process XOR operations in parallel. Also, we suggest a CBC encryption/decryption optical module which can be fabricated with simple optical architecture. The proposed method makes it possible to encrypt and decrypt vast two-dimensional data very quickly due to the fast optical parallel processing property, and provides more security strength than the conventional electronic CBC algorithm because of the longer security key with the two-dimensional array. Computer simulations show that the proposed method is very effective in CBC encryption processing and can be applied to even ECB(Electronic Code Book) mode and CFB(Cipher Feedback Block) mode.

Automatic Optimization Methods for Image Processing Programs Using OpenCL (OpenCL을 이용한 이미지 처리 프로그램의 자동 최적화 방법)

  • Shin, Jaeho;Jo, Gangwon;Lee, Ilkoo;Lee, Jaejin
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.3
    • /
    • pp.188-193
    • /
    • 2017
  • In this paper, we propose automatic OpenCL optimization techniques that offer the best performance for image processing programs on any hardware system. Developers should seek a proper way of parallelization and an appropriate work-group size for the architecture of target compute devices to achieve the best performance. However, testing potential devices to find them is both time-consuming and costly. Our techniques automatically set up hardware-optimized parallelization and find a suitable work-group size for the target device. Furthermore, using OpenCL does not always provide better performance in image processing. Hence, we also propose a way to automatically search for a threshold image size to allow image processing programs to decide whether or not to use OpenCL. Our findings demonstrate that out techniques improve the image processing performance significantly.

Parallel Range Query Processing with R-tree on Multi-GPUs (다중 GPU를 이용한 R-tree의 병렬 범위 질의 처리 기법)

  • Ryu, Hongsu;Kim, Mincheol;Choi, Wonik
    • Journal of KIISE
    • /
    • v.42 no.4
    • /
    • pp.522-529
    • /
    • 2015
  • Ever since the R-tree was proposed to index multi-dimensional data, many efforts have been made to improve its query performances. One common trend to improve query performance is to parallelize query processing with the use of multi-core architectures. To this end, a GPU-base R-tree has been recently proposed. However, even though a GPU-based R-tree can exhibit an improvement in query performance, it is limited in its ability to handle large volumes of data because GPUs have limited physical memory. To address this problem, we propose MGR-tree (Multi-GPU R-tree), which can manage large volumes of data by dividing nodes into multiple GPUs. Our experiments show that MGR-tree is up to 9.1 times faster than a sequential search on a GPU and up to 1.6 times faster than a conventional GPU-based R-tree.

An Advanced Parallel Join Algorithm for Managing Data Skew on Hypercube Systems (하이퍼큐브 시스템에서 데이타 비대칭성을 고려한 향상된 병렬 결합 알고리즘)

  • 원영선;홍만표
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.3_4
    • /
    • pp.117-129
    • /
    • 2003
  • In this paper, we propose advanced parallel join algorithm to efficiently process join operation on hypercube systems. This algorithm uses a broadcasting method in processing relation R which is compatible with hypercube structure. Hence, we can present optimized parallel join algorithm for that hypercube structure. The proposed algorithm has a complete solution of two essential problems - load balancing problem and data skew problem - in parallelization of join operation. In order to solve these problems, we made good use of the characteristics of clustering effect in the algorithm. As a result of this, performance is improved on the whole system than existing algorithms. Moreover. new algorithm has an advantage that can implement non-equijoin operation easily which is difficult to be implemented in hash based algorithm. Finally, according to the cost model analysis. this algorithm showed better performance than existing parallel join algorithms.

Design of Contention Free Parallel MAP Decode Module (메모리 경합이 없는 병렬 MAP 복호 모듈 설계)

  • Chung, Jae-Hun;Rim, Chong-Suck
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.1
    • /
    • pp.39-49
    • /
    • 2011
  • Turbo code needs long decoding time because of iterative decoding. To communicate with high speed, we have to shorten decoding time and it is possible with parallel process. But memory contention can cause from parallel process, and it reduces performance of decoder. To avoid memory contention, QPP interleaver is proposed in 2006. In this paper, we propose MDF method which is fit to QPP interleaver, and has relatively short decoding time and reduced logic. And introduce the design of MAP decode module using MDF method. Designed decoder is targetted to FPGA of Xilinx, and its throughput is 80Mbps maximum.

Design of Morphological Filter for Image Processing (영상처리용 Morphological Filter의 하드웨어 설계)

  • 문성용;김종교
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.17 no.10
    • /
    • pp.1109-1116
    • /
    • 1992
  • Mathematical morphology, theoretical foundation for morphological filter, is very efficient for the analysis of the geometrical characteristics of signals and systems and is used as a predominant tool for smoothing the data with noise. In this study, H/W design of morphological filter is implemented to process the gray scale dilation and the erosion in the same circuit and to choose the maximum and minimum value by a selector, resulting in their education of the complexity of the circuit and an architecture for parallel processing. The structure of morphological filter consists of the structuring-element block, the image data block, the control block, the ADD block, the MIN/MAX block, etc, and is designed on an one-chip for real time operation.

  • PDF

A Direct Digital Frequency Synthesizer Using A Low Power Pipelined Parallel Accumulator (저전력 파이프라인 병렬 누적기를 사용한 직접 디지털 주파수 합성기)

  • 양병도;김이섭
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.5
    • /
    • pp.361-368
    • /
    • 2003
  • A new high-speed direct digital frequency synthesizer using a low power pipelined parallel accumulator is proposed. The proposed pipelined parallel accumulator uses both pipelining and paralleling techniques to increase speed and to reduce power consumption. The 2-pipelined 2-parallel accumulator only consumes 66% and 69% power of the 4-pipelined accumulator and the 4-parallel accumulator respectively with the same throughput. The proposed accumulator can achieve higher throughput with smaller area and less power consumption in lower clock frequency. All circuit simulations and implementations are based on a 0.35um CMOS process with VCC = 3.3V.

An FPGA Implementation of Parallel Hardware Architecture for the Real-time Window-based Image Processing (실시간 윈도우 기반 영상 처리를 위한 병렬 하드웨어 구조의 FPGA 구현)

  • Jin S.H.;Cho J.U.;Kwon K.H.;Jeon J.W.
    • The KIPS Transactions:PartB
    • /
    • v.13B no.3 s.106
    • /
    • pp.223-230
    • /
    • 2006
  • A window-based image processing is an elementary part of image processing area. Because window-based image processing is computationally intensive and data intensive, it is hard to perform ail of the operations of a window-based image processing in real-time by using a software program on general-purpose computers. This paper proposes a parallel hardware architecture that can perform a window-based image processing in real-time using FPGA(Field Programmable Gate Array). A dynamic threshold circuit and a local histogram equalization circuit of the proposed architecture are designed using VHDL(VHSIC Hardware Description Language) and implemented with an FPGA. The performances of both implementations are measured.