• Title/Summary/Keyword: parallel architecture

Search Result 888, Processing Time 0.027 seconds

Study of Parallel Network Processor using Global Cache (글로벌 캐시를 이용한 네트워크 병렬 프로세서 구조 연구)

  • Park, Jae-Won;Chung, Won-Young;Kim, Hyun-Pil;Lee, Jung-Hee;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.1B
    • /
    • pp.80-85
    • /
    • 2011
  • The mount of network traffic from the Internet is increasing because of the use of Broadband Convergence Networks(BcN). Network traffic is also increasing because of the development of application, especially multimedia traffic from IPTV, VOD, and online games. This multimedia traffic not only has a huge payload but also should be considered a threat in real time. For this reason, this study examines the ways that routers distribute the bandwidth in accordance to traffic properties. To classify the property of the traffic, it is essential to analyze the application layer. However, the general network processor architecture serially processes the L2-4 and L7 layer. We propose a novel parallel network processor architecture with a global cache that processes L2-4 and L7 in parallel. To verify the proposed architecture, we simulated both of the architecture with SystemC. EEMBC and SNORT was used to measure L2-4 and L7 processing time. When multimedia traffic was entered into the network processor in the same flow, the proposed architecture showed about 85% higher performance than general architecture.

Parallel Connected Component Labeling Based on the Selective Four Directional Label Search Using CUDA

  • Soh, Young-Sung;Hong, Jung-Woo
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.16 no.3
    • /
    • pp.83-89
    • /
    • 2015
  • Connected component labeling (CCL) is a mandatory step in image segmentation where objects are extracted and uniquely labeled. CCL is a computationally expensive operation and thus is often done in parallel processing framework to reduce execution time. Various parallel CCL methods have been proposed in the literature. Among them are NSZ label equivalence (NSZ-LE) method, modified 8 directional label selection (M8DLS) method, HYBRID1 method, and HYBRID2 method. Soh et al. showed that HYBRID2 outperforms the others and is the best so far. In this paper we propose a new hybrid parallel CCL algorithm termed as HYBRID3 that combines selective four directional label search (S4DLS) with label backtracking (LB). We show that the average percentage speedup of the proposed over M8DLS is around 60% more than that of HYBRID2 over M8DLS for various kinds of images.

Efficient Parallel Block-layered Nonbinary Quasi-cyclic Low-density Parity-check Decoding on a GPU

  • Thi, Huyen Pham;Lee, Hanho
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.6 no.3
    • /
    • pp.210-219
    • /
    • 2017
  • This paper proposes a modified min-max algorithm (MMMA) for nonbinary quasi-cyclic low-density parity-check (NB-QC-LDPC) codes and an efficient parallel block-layered decoder architecture corresponding to the algorithm on a graphics processing unit (GPU) platform. The algorithm removes multiplications over the Galois field (GF) in the merger step to reduce decoding latency without any performance loss. The decoding implementation on a GPU for NB-QC-LDPC codes achieves improvements in both flexibility and scalability. To perform the decoding on the GPU, data and memory structures suitable for parallel computing are designed. The implementation results for NB-QC-LDPC codes over GF(32) and GF(64) demonstrate that the parallel block-layered decoding on a GPU accelerates the decoding process to provide a faster decoding runtime, and obtains a higher coding gain under a low $10^{-10}$ bit error rate and low $10^{-7}$ frame error rate, compared to existing methods.

A design of synchronous nonlinear and parallel for pipeline stage on IP-based H.264 decoder implementation (IP기반 H.264 디코더 설계를 위한 동기식 비선형 및 병렬화 파이프라인 설계)

  • Ko, Byung-Soo;Kong, Jin-Hyeung
    • Proceedings of the IEEK Conference
    • /
    • 2008.06a
    • /
    • pp.409-410
    • /
    • 2008
  • This paper presents nonlinear and parallel design for synchronous pipelining in IP-based H.264 decoder implementation. Since H.264 decoder includes the dataflow of feedback loop, the data dependency requires one NOP stage per pipelining latency to drop the throughput into 1/2. Further, it is found that, in execution time, the stage scheduled for MC is more occupied than that for CAVLD/ITQ/DF. The less efficient stage would be improved by nonlinear scheduling, while the fully-utilized stage could be accelerated by parallel scheduling of IP. The optimization yields 3 nonlinear {CAVLD&ITQ}|3 parallel (MC/IP&Rec.)| 3 nonlinear {DF} pipelined architecture for IP-based H.264 decoder. In experiments, the nonlinear and parallel pipelined H.264 decoder, including existing IPs, could deal with full HD video at 41.86MHz, in real time processing.

  • PDF

Inverse Kinematic and Dynamic Analyses of 6-DOF PUS Type parallel Manipulators

  • Kim, Jong-Phil;Jeha Ryu
    • Journal of Mechanical Science and Technology
    • /
    • v.16 no.1
    • /
    • pp.13-23
    • /
    • 2002
  • This paper presents inverse kinematic and dynamic analyses of HexaSlide type six degree-of-freedom parallel manipulators. The HexaSlide type parallel manipulators (HSM) can be characterized as an architecture with constant link lengths that are attached to moving sliders on the ground and to a mobile platform. In the inverse kinematic analyses, the slider and link motion (position, velocity, and acceleration) is computed given the desired mobile platform motion. Based on the inverse kinematic analysis, in order to compute the required actuator forces given the desired platform motion, inverse dynamic equations of motion of a parallel manipulator is derived by the Newton-Euler approach. In this derivation, the joint friction as well as all link inertia are included. Relative importance of the link inertia and joint frictions on the computed torque is investigated by computer simulations. It is expected that the inverse kinematic and dynamic equations can be used in the computed torque control and model-based adaptive control strategies.

Low-Complexity Triple-Error-Correcting Parallel BCH Decoder

  • Yeon, Jaewoong;Yang, Seung-Jun;Kim, Cheolho;Lee, Hanho
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.13 no.5
    • /
    • pp.465-472
    • /
    • 2013
  • This paper presents a low-complexity triple-error-correcting parallel Bose-Chaudhuri-Hocquenghem (BCH) decoder architecture and its efficient design techniques. A novel modified step-by-step (m-SBS) decoding algorithm, which significantly reduces computational complexity, is proposed for the parallel BCH decoder. In addition, a determinant calculator and a error locator are proposed to reduce hardware complexity. Specifically, a sharing syndrome factor calculator and a self-error detection scheme are proposed. The multi-channel multi-parallel BCH decoder using the proposed m-SBS algorithm and design techniques have considerably less hardware complexity and latency than those using a conventional algorithms. For a 16-channel 4-parallel (1020, 990) BCH decoder over GF($2^{12}$), the proposed design can lead to a reduction in complexity of at least 23 % compared to conventional architecttures.

병렬분산 환경에서의 DEVS형식론의 시뮬레이션

  • Seong, Yeong-Rak;Jung, Sung-Hun;Kon, Tag-Gon;Park, Kyu-Ho-
    • Proceedings of the Korea Society for Simulation Conference
    • /
    • 1992.10a
    • /
    • pp.5-5
    • /
    • 1992
  • The DEVS(discrete event system specification) formalism describes a discrete event system in a hierarchical, modular form. DEVSIM++ is C++ based general purpose DEVS abstract simulator which can simulate systems to be modeled by the DEVS formalism in a sequential environment. We implement P-DEVSIM++ which is a parallel version of DEVSIM++. In P-DEVSIM++, the external and internal event of models can be processed in parallel. To process in parallel, we introduce a hierarchical distributed simulation technique and some optimistic distributed simulation techniques. But in our algorithm, the rollback of a model is localized itself in contrast to the Time Warp approach. To evaluate its performance, we simulate a single bus multiprocessor architecture system with an external common memory. Simulation result shows that significant speedup is made possible with our algorithm in a parallel environment.

  • PDF

An Implementation of the DEVS Formalism on a Parallel Distributed Environment (병렬 분산 환경에서의 DEVS 형식론의 구현)

  • 성영락
    • Journal of the Korea Society for Simulation
    • /
    • v.1 no.1
    • /
    • pp.64-76
    • /
    • 1992
  • The DEVS(discrete event system specificaition) formalism specifies a discrete event system in a hierarchical, modular form. DEVSIM++ is a C++based general purpose DEVS abstract simulator which can simulate systems modeled by the DEVS formalism in a sequential environment. This paper describes P-DEVSIM++which is a parallel version of DEVSIM++ . In P-DEVSIM++, the external and internal event of DEVS models can by processed in parallel. For such processing, we propose a parallel, distributed optimistic simulation algorithm based on the Time Warp approach. However, the proposed algorithm localizes the rollback of a model within itself, not possible in the standard Time Warp approach. An advantage of such localization is that the simulation time may be reduced. To evaluate its performance, we simulate a single bus multiprocessor architecture system with an external common memory. Simulation result shows that significant speedup is made possible with our algorithm in a parallel environment.

  • PDF

High-Performance Variable-Length Reed-Solomon Decoder Architecture for Gigabit WPAN Applications (기가비트 WPAN용 고성능 가변길이 리드-솔로몬 복호기 구조)

  • Choi, Chang-Seok;Lee, Han-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.49 no.1
    • /
    • pp.25-34
    • /
    • 2012
  • This paper presents a universal architecture for variable-length eight-parallel Reed-Solomon (RS) decoder for high-rate WPAN systems. The proposed architecture can support not only RS(255,239) code but various shortened RS codes. Moreover, variable-length architecture provides variable low latency for various shortened RS codes and the eight-parallel design also provides high data processing rate. Using 90-$nm$ CMOS standard cell technology, the proposed RS decoder has been synthesized and measured for performance. The proposed RS decoder can provide a maximum 19-$Gbps$ data rate at clock frequency 300 $MHz$.