• Title/Summary/Keyword: pipelining

Design and Analysis of MPEG-2 MP@HL Decoder in Multi-Processor Environments

  • Yoo, Seung-Hwan;Lee, Hyun-Seung;Lee, Sang-Jo;Park, Rae-Hong;Kim, Do-Hyung
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2009.01a
    • /
    • pp.211-216
    • /
    • 2009
  • As demand for high-definition television (HDTV) grows, real-time decoding of high-definition (HD) video becomes an important issue. The data volume of HD video is so large that real-time processing is difficult to achieve, especially in software. To implement a fast Moving Picture Experts Group-2 (MPEG-2) decoder for HDTV, we compose five scenarios that use parallel processing techniques such as data decomposition, task decomposition, and pipelining. Assuming multi-digital-signal-processor (multi-DSP) environments, we analyze each scenario in three aspects: decoding speed, L1 memory size, and bandwidth. By comparing the scenarios, we identify the most suitable ones for different situations. We simulate the scenarios in a dual-core, dual-central-processing-unit (dual-CPU) environment using OpenMP and analyze the simulation results.
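
The trade-off between the data-decomposition and pipelining scenarios can be sketched with an idealized timing model. This is a minimal Python sketch, not the paper's analysis: it assumes independent work units (for example, slices), no communication or memory cost, and hypothetical stage names and times (`stages`), none of which come from the abstract. Task decomposition is not modeled.

```python
import math

def pipeline_makespan(stage_times, n_items):
    """Idealized pipelining: one processor per stage, unlimited buffering.

    The first item traverses every stage; after that, one item completes
    per interval of the slowest (bottleneck) stage.
    """
    return sum(stage_times) + (n_items - 1) * max(stage_times)

def data_decomposition_makespan(stage_times, n_items, n_procs):
    """Idealized data decomposition: items split evenly across processors,
    each processor running all stages on its own share of the items."""
    return math.ceil(n_items / n_procs) * sum(stage_times)

# Hypothetical per-unit stage times in ms (e.g. VLD, IDCT, motion compensation).
stages = [4.0, 3.0, 5.0]
print(pipeline_makespan(stages, 30))
print(data_decomposition_makespan(stages, 30, n_procs=3))
```

With these made-up numbers, data decomposition finishes first because the pipeline's throughput is capped by its slowest stage; balancing the stage times narrows the gap.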

Compact Hardware Multiple Input Multiple Output Channel Emulator for Wireless Local Area Network 802.11ac

  • Khai, Lam Duc;Tien, Tran Van
    • Journal of information and communication convergence engineering
    • /
    • v.18 no.1
    • /
    • pp.1-7
    • /
    • 2020
  • This paper proposes a fast, low-cost hardware multiple input multiple output (MIMO) channel emulator. The channel emulator is an important component of hardware-based simulation systems. The novelty of this work is the use of resource sharing and pipelining to reduce hardware resource utilization while maintaining a high sample rate. In the proposed emulator, the samples are generated sequentially and interpolated so that the sample rate equals the baseband rate. The proposed 4 × 4 MIMO emulator requires few hardware resources, so it can be implemented on a single field-programmable gate array (FPGA) chip. An implementation on a Xilinx Virtex-7 VX980T was found to occupy 10.47% of the available configurable slice registers and 12.58% of the FPGA's slice lookup tables. The maximum frequency of the proposed emulator is 758.064 MHz, so up to 560 different paths can be processed simultaneously to generate 560 × 758 million × 2 × 32-bit complex-valued fading samples per second.
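
The sample-rate figure at the end of the abstract is a straightforward product. The sketch below reproduces it in Python, assuming (as the abstract's expression suggests) one complex sample per path per clock cycle and 32 bits for each of the real and imaginary parts.

```python
# Figures quoted in the abstract.
paths = 560
f_max_hz = 758.064e6        # maximum clock frequency
bits_per_sample = 2 * 32    # assumed: 32-bit real part + 32-bit imaginary part

samples_per_s = paths * f_max_hz          # assumed: one sample per path per clock
bits_per_s = samples_per_s * bits_per_sample
print(f"{samples_per_s:.4e} samples/s, {bits_per_s / 1e12:.2f} Tbit/s")
```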

Design of ADD and ABS Circuits for the IC Implementation of a DTW PE for Speech Recognition (음성인식용 DTW PE의 IC화를 위한 ADD 및 ABS 회로의 설계)

  • 정광재;문홍진;최규훈;김종교
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.15 no.8
    • /
    • pp.648-658
    • /
    • 1990
  • There are many methods for speeding up computation in speech recognition. Multiprocessing with a systolic array is one way to achieve this aim. The arithmetic operations in the array are carried out using pipelining, and the work is distributed across processing elements (PEs), which increases computational efficiency. The DTW PE cell is separated into three large blocks: "MIN" computes the accumulated minimum distance, "ADD" performs the additions on these minimum distances, and "ABS" computes the absolute values that make up the total sum of local distances. We have completed the circuit design and verification of the "ADD" and "ABS" blocks, and performed the full layout and DRC (design rule check) using a 3 μm CMOS N-well rule base.
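
The recurrence that a DTW processing element evaluates can be sketched in software. This minimal Python version uses the absolute difference as the local distance and visits cells one anti-diagonal at a time, because cells on the same anti-diagonal depend only on earlier anti-diagonals; that independence is what a systolic array exploits by computing them in parallel. Tying the comments to the paper's MIN, ADD, and ABS blocks is an assumption for illustration, not the authors' circuit.

```python
def dtw_distance(x, y):
    """DTW distance with |x - y| local cost, evaluated in wavefront order.

    D[i][j] = |x[i-1] - y[j-1]| + min(D[i-1][j], D[i][j-1], D[i-1][j-1])
    """
    inf = float("inf")
    n, m = len(x), len(y)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    # Anti-diagonal s = i + j: every cell on it could be computed in parallel.
    for s in range(2, n + m + 1):
        for i in range(max(1, s - m), min(n, s - 1) + 1):
            j = s - i
            local = abs(x[i - 1] - y[j - 1])                       # "ABS"-like step
            best = min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])  # "MIN"-like step
            D[i][j] = local + best                                 # "ADD"-like step
    return D[n][m]

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```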

Preprocessing Methods for Effective Modulo Scheduling on High Performance DSPs (고성능 디지털 신호 처리 프로세서상에서 효율적인 모듈로 스케쥴링을 위한 전처리 기법)

  • Cho, Doo-San;Paek, Yun-Heung
    • Journal of KIISE:Software and Applications
    • /
    • v.34 no.5
    • /
    • pp.487-501
    • /
    • 2007
  • To achieve high resource utilization on multi-issue DSPs, production compilers commonly include variants of the iterative modulo scheduling algorithm. However, excessive cyclic data dependences, which exist in communication and media processing loops, unduly restrict modulo scheduling freedom. As a result, the replicated functional units in multi-issue DSPs are often under-utilized. To address this resource under-utilization problem, this paper describes a novel compiler preprocessing strategy for effective modulo scheduling. The proposed preprocessing strategy capitalizes on two new transformations, referred to as cloning and dismantling. Our preprocessing strategy has been validated by an implementation in the StarCore SC140 DSP compiler.
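
The bound that cyclic dependences place on modulo scheduling can be illustrated with the standard minimum initiation interval, MII = max(ResMII, RecMII). The Python sketch below assumes a hypothetical loop and machine (unit names, operation counts, latencies) and does not implement the paper's cloning or dismantling transformations; it only shows why a long recurrence leaves replicated units idle.

```python
import math
from collections import Counter

def res_mii(op_units, unit_counts):
    """Resource bound: each unit type must fit its operations into II cycles."""
    uses = Counter(op_units)
    return max(math.ceil(uses[u] / unit_counts[u]) for u in uses)

def rec_mii(cycles):
    """Recurrence bound: for every dependence cycle, II >= latency / distance.

    Each cycle is a list of (latency, iteration_distance) edges.
    """
    return max(
        math.ceil(sum(lat for lat, _ in c) / sum(dist for _, dist in c))
        for c in cycles
    )

# Hypothetical loop body: 8 ALU operations on a 4-ALU machine, plus one
# recurrence (e.g. an accumulation) of total latency 6 carried over 1 iteration.
ops = ["alu"] * 8
units = {"alu": 4}
recurrences = [[(2, 0), (4, 1)]]

ii = max(res_mii(ops, units), rec_mii(recurrences))
utilization = len(ops) / (units["alu"] * ii)
print(ii, f"{utilization:.0%}")
```

Here the recurrence forces II = 6 even though the four ALUs alone would allow II = 2, so only a third of the ALU issue slots are filled.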

FPGA Implementation of an FDTrS/DF Signal Detector for High-density DVD System (고밀도 DVD 시스템을 위한 FDTrS/DF 신호 검출기의 FPGA 구현)

  • 정조훈
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.25 no.10B
    • /
    • pp.1732-1743
    • /
    • 2000
  • In this paper, a fixed-delay trellis search with decision feedback (FDTrS/DF) for high-density DVD systems (4.7-15 GB) is proposed and implemented on an FPGA. The proposed FDTrS/DF is derived by transforming the binary tree search structure into a trellis search structure, which implies that FDTrS/DF performs better than signal detection techniques based on the tree search structure, such as FDTS/DF and SSD/DF. An advantage of FDTrS/DF is a significant reduction in hardware complexity, owing to the unique structure of FDTrS: it is composed of only one trellis stage and requires none of the traceback procedure usually implemented in the Viterbi detector. This paper also modifies FDTS/DF and SSD/DF, originally proposed for high-density magnetic recording systems, for the DVD system and compares them with the proposed FDTrS/DF. To increase speed in the FPGA implementation, the pipelining technique and an absolute branch metric (instead of a squared branch metric) are applied. The proposed FDTrS/DF is shown to provide the best performance among various signal detection techniques, such as PRML, DFE, FDTS/DF, and SSD/DF, even with small hardware complexity.

A Direct Digital Frequency Synthesizer Using A Low Power Pipelined Parallel Accumulator (저전력 파이프라인 병렬 누적기를 사용한 직접 디지털 주파수 합성기)

  • 양병도;김이섭
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.5
    • /
    • pp.361-368
    • /
    • 2003
  • A new high-speed direct digital frequency synthesizer using a low-power pipelined parallel accumulator is proposed. The proposed pipelined parallel accumulator uses both pipelining and parallel techniques to increase speed and reduce power consumption. At the same throughput, the 2-pipelined 2-parallel accumulator consumes only 66% and 69% of the power of the 4-pipelined accumulator and the 4-parallel accumulator, respectively. The proposed accumulator can achieve higher throughput with smaller area and lower power consumption at a lower clock frequency. All circuit simulations and implementations are based on a 0.35 μm CMOS process with VCC = 3.3 V.
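
The parallel half of a 2-pipelined 2-parallel accumulator can be checked with a bit-accurate software model. In this minimal Python sketch (the function names, the frequency control word `fcw`, and the 4-bit width are illustrative assumptions), the 2-parallel version advances its register by 2 × FCW per clock and uses a second adder to form the odd sample, so it produces the same phase sequence as a serial accumulator in half as many clocks. The pipelining of each adder's carry chain is not modeled.

```python
def serial_accumulator(fcw, n_bits, n_samples, phase0=0):
    """Reference: one phase sample per clock, phase += FCW modulo 2**n_bits."""
    mask = (1 << n_bits) - 1
    out, acc = [], phase0
    for _ in range(n_samples):
        out.append(acc)
        acc = (acc + fcw) & mask
    return out

def parallel2_accumulator(fcw, n_bits, n_clocks, phase0=0):
    """Two phase samples per clock: the register advances by 2*FCW and a
    second adder produces the odd sample acc + FCW."""
    mask = (1 << n_bits) - 1
    out, acc = [], phase0
    for _ in range(n_clocks):
        out.append(acc)                   # even sample
        out.append((acc + fcw) & mask)    # odd sample
        acc = (acc + 2 * fcw) & mask
    return out

print(parallel2_accumulator(3, 4, 4))  # [0, 3, 6, 9, 12, 15, 2, 5]
```

Because each clock now delivers two samples, the same output rate is reached at half the clock frequency, which is the kind of trade the abstract credits for the lower power.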

Hardware Implementation of Genetic Algorithm and Its Analysis (유전알고리즘의 하드웨어 구현 및 실험과 분석)

  • Dong, Sung-Soo;Lee, Chong-Ho
    • Journal of the Institute of Electronics Engineers of Korea IE (전자공학회논문지 IE)
    • /
    • v.46 no.2
    • /
    • pp.7-10
    • /
    • 2009
  • This paper presents the implementation of libraries of hardware modules for a genetic algorithm using VHDL. Evolvable hardware refers to hardware that can change its architecture and behavior dynamically and autonomously by interacting with its environment, so it is especially suited to applications for which no hardware specification can be given in advance. Evolvable hardware is based on the idea of combining reconfigurable hardware devices with evolutionary computation, such as a genetic algorithm. Because of parallelism, the absence of function-call overhead, and pipelining, a hardware genetic algorithm gives a speedup over a software genetic algorithm. This paper proposes a hardware genetic algorithm for an evolvable embedded system chip, including simulation results and analysis for several fitness functions. The results show that the design works well for the three examples.
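
For reference, the loop that such hardware modules implement (selection, crossover, mutation) can be sketched in software. This minimal Python version is a generic simple genetic algorithm that assumes tournament selection, one-point crossover, bit-flip mutation, and the OneMax fitness function (the count of 1 bits); these choices and all parameter values are illustrative assumptions, not the authors' VHDL design.

```python
import random

def simple_ga(fitness, n_bits, pop_size=20, generations=50,
              p_cross=0.9, p_mut=0.02, seed=0):
    """Generational GA on bit strings; returns the best final individual."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament():
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < p_cross:
                cut = rng.randrange(1, n_bits)          # one-point crossover
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                for k in range(n_bits):                 # bit-flip mutation
                    if rng.random() < p_mut:
                        child[k] ^= 1
                nxt.append(child)
        pop = nxt[:pop_size]
    return max(pop, key=fitness)

best = simple_ga(sum, n_bits=16)   # OneMax: fitness = number of 1 bits
print(best, sum(best))
```

In a hardware version, the fitness evaluations and per-bit mutation tests that this loop performs one after another are natural candidates for parallel and pipelined execution.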

High Performance Routing Engine for an Advanced Input-Queued Switch Fabric (고속 입력 큐 스위치를 위한 고성능 라우팅엔진)

  • Jeong, Gab-Joong;Lee, Bhum-Cheol
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2002.05a
    • /
    • pp.264-267
    • /
    • 2002
  • This paper presents the design of a pipelined virtual output queue routing engine for an advanced input-queued ATM switch with a serial crossbar structure. The proposed routing engine is designed for wire-speed routing with pipelined buffer management. It tolerates the transmission latency of request and grant data between the routing engine and the central arbiter by using a new request control method based on a high-speed shifter. The routing engine has been implemented in a field-programmable gate array (FPGA) chip with a 77 MHz operating frequency, a 16×16 switch size, and a speed of 2.5 Gbps per port.
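
The virtual output queue (VOQ) idea behind the routing engine can be sketched as a data structure: each input keeps one FIFO per output, so a cell waiting for a busy output does not block cells bound for other outputs (head-of-line blocking). This minimal Python sketch uses hypothetical class and method names and does not model the paper's pipelined buffer management or shifter-based request control.

```python
from collections import deque

class VOQInput:
    """One input port holding a separate FIFO per switch output."""

    def __init__(self, n_outputs):
        self.queues = [deque() for _ in range(n_outputs)]

    def enqueue(self, cell, dest):
        self.queues[dest].append(cell)

    def requests(self):
        """Outputs this input would request from the central arbiter."""
        return [out for out, q in enumerate(self.queues) if q]

    def dequeue_granted(self, out):
        """Send the head cell for an output the arbiter granted."""
        return self.queues[out].popleft()

port = VOQInput(n_outputs=16)   # 16x16 switch size, as in the abstract
port.enqueue("A", 0)
port.enqueue("B", 0)
port.enqueue("C", 1)
print(port.requests())          # [0, 1]
print(port.dequeue_granted(1))  # C
```

If output 0 is granted to another input in this cycle, this input can still send cell C to output 1; with a single FIFO, C would wait behind A and B.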

Hardware Implementation of Genetic Algorithm for Evolvable Hardware (진화하드웨어 구현을 위한 유전알고리즘 설계)

  • Dong, Sung-Soo;Lee, Chong-Ho
    • Journal of the Institute of Electronics Engineers of Korea IE (전자공학회논문지 IE)
    • /
    • v.45 no.4
    • /
    • pp.27-32
    • /
    • 2008
  • This paper presents the implementation of a simple genetic algorithm using a hardware description language for an evolvable hardware embedded system. Evolvable hardware refers to hardware that can change its architecture and behavior dynamically and autonomously by interacting with its environment, so it is especially suited to applications for which no hardware specification can be given in advance. Evolvable hardware is based on the idea of combining reconfigurable hardware devices with evolutionary computation, such as a genetic algorithm. Because of parallelism, the absence of function-call overhead, and pipelining, a hardware genetic algorithm gives a speedup over a software genetic algorithm. This paper proposes a hardware genetic algorithm for an evolvable embedded system chip, including simulation results for several fitness functions.

Hardware Design with Efficient Pipelining for High-throughput AES (높은 처리량을 가지는 AES를 위한 효율적인 파이프라인을 적용한 하드웨어 설계)

  • Antwi, Alexander O.A;Ryoo, Kwangki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2017.10a
    • /
    • pp.578-580
    • /
    • 2017
  • IoT technology poses many security threats, so various algorithms are employed to secure transactions between IoT devices. Among symmetric-key algorithms, the Advanced Encryption Standard (AES) has gained great popularity owing to its robustness to date. This paper presents a hardware implementation of the AES algorithm. We present a four-stage pipelined architecture for encryption and key generation. This method allows a total plaintext size of 512 bits to be encrypted in 46 cycles. The proposed hardware design achieves a maximum frequency of 1.18 GHz, yielding a throughput of 13 Gbps, on the 65 nm process, and 800 MHz, yielding a throughput of 8.9 Gbps, on the 180 nm process.
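
The throughput figures follow from the block size, the cycle count, and the clock frequency, that is, (512 bits / 46 cycles) × f. A minimal check in Python, using only the numbers quoted in the abstract:

```python
def throughput_gbps(bits, cycles, f_hz):
    """Throughput when `bits` of plaintext are encrypted every `cycles` clocks."""
    return bits / cycles * f_hz / 1e9

for process, f_hz in [("65 nm", 1.18e9), ("180 nm", 800e6)]:
    print(process, round(throughput_gbps(512, 46, f_hz), 2), "Gbps")
```

This gives about 13.1 Gbps and 8.9 Gbps, consistent with the reported values.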
