• Title/Summary/Keyword: verilog HDL

Search Result 416, Processing Time 0.026 seconds

Hardware Design of High Performance CAVLC Encoder (H.264/AVC를 위한 고성능 CAVLC 부호화기 하드웨어 설계)

  • Lee, Yang-Bok;Ryoo, Kwang-Ki
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.49 no.3
    • /
    • pp.21-29
    • /
    • 2012
  • This paper presents optimized searching technique to improve the performance of H.264/AVC. By using the proposed forward and backward searching algorithm, redundant cycles of latency for data reordering can be removed. Furthermore, in order to reduce the total number of execution cycles of CAVLC encoder, early termination mode and two stage pipelined architecture are proposed. The experimental result shows that the proposed architecture needs only 36.0 cycles on average for each $16{\times}16$ macroblock encoding. The proposed architecture improves the performance by 57.8% than that of previous designs. The proposed CAVLC encoder was implemented using Verilog HDL and synthesized with Magnachip $0.18{\mu}m$ standard cell library. The synthesis result shows that the gate count is about 17K with 125Mhz clock frequency.

A New Asynchronous Pipeline Architecture for CISC type Embedded Micro-Controller, A8051 (CISC 임베디드 컨트롤러를 위한 새로운 비동기 파이프라인 아키텍쳐, A8051)

  • 이제훈;조경록
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.4
    • /
    • pp.85-94
    • /
    • 2003
  • The asynchronous design methods proved to have the higher performance in power consumption and execution speed than synchronous ones because it just needs to activate the required module without feeding clock in the system. Despite the advantage of CISC machine providing the variable addressing modes and instructions, its execution scheme is hardly suited for a synchronous Pipeline architecture and incurs a lot of overhead. This paper proposes a novel asynchronous pipeline architecture, A80sl, whose instruction set is fully compatible with that of Intel 80C51, an embedded micro controller. We classify the instructions into the group keeping the same execution scheme for the asynchronous pipeline and optimize it eliminating the bubble stage that comes from the overhead of the multi-cycle execution. The new methodologies for branch and various instruction lengths are suggested to minimize the number of states required for instructions execution and to increase its parallelism. The proposed A80C51 architecture is synthesized with 0.35${\mu}{\textrm}{m}$ CMOS standard cell library. The simulation results show higher speed than that of Intel 80C51 with 36 MHz and other asynchronous counterparts by 24 times.

A Hardware Design of Feature Detector for Realtime Processing of SIFT(Scale Invariant Feature Transform) Algorithm in Embedded Systems (임베디드 환경에서 SIFT 알고리즘의 실시간 처리를 위한 특징점 검출기의 하드웨어 구현)

  • Park, Chan-Il;Lee, Su-Hyun;Jeong, Yong-Jin
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.3
    • /
    • pp.86-95
    • /
    • 2009
  • SIFT is an algorithm to extract vectors at pixels around keypoints, in which the pixel colors are very different from neighbors, such as vertices and edges of an object. The SIFT algorithm is being actively researched for various image processing applications including 3D image reconstructions and intelligent vision system for robots. In this paper, we implement a hardware to sift feature detection algorithm for real time processing in embedded systems. We estimate that the hardware implementation give a performance 25ms of $1,280{\times}960$ image and 5ms of $640{\times}480$ image at 100MHz. And the implemented hardware consumes 45,792 LUTs(85%) with Synplify 8.li synthesis tool.

New Parallel MDC FFT Processor for Low Computation Complexity (연산복잡도 감소를 위한 새로운 8-병렬 MDC FFT 프로세서)

  • Kim, Moon Gi;Sunwoo, Myung Hoon
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.3
    • /
    • pp.75-81
    • /
    • 2015
  • This paper proposed the new eight-parallel MDC FFT processor using the eight-parallel MDC architecture and the efficient scheduling scheme. The proposed FFT processor supports the 256-point FFT based on the modified radix-$2^6$ FFT algorithm. The proposed scheduling scheme can reduce the number of complex multipliers from eight to six without increasing delay buffers and computation cycles. Moreover, the proposed FFT processor can be used in OFDM systems required high throughput and low hardware complexity. The proposed FFT processor has been designed and implemented with a 90nm CMOS technology. The experimental result shows that the area of the proposed FFT processor is $0.27mm^2$. Furthermore, the proposed eight-parallel MDC FFT processor can achieve the throughput rate up to 2.7 GSample/s at 388MHz.

The Hardware Design and Implementation of a New Ultra Lightweight Block Cipher (새로운 초경량 블록 암호의 하드웨어 설계 및 구현)

  • Gookyi Dennis, A.N.;Park, Seungyong;Ryoo, Kwangki
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.10
    • /
    • pp.103-108
    • /
    • 2016
  • With the growing trend of pervasive computing, (the idea that technology is moving beyond personal computers to everyday devices) there is a growing demand for lightweight ciphers to safeguard data in a network that is always available. For all block cipher applications, the AES is the preferred choice. However, devices used in pervasive computing have extremely constraint environment and as such the AES will not be suitable. In this paper we design and implement a new lightweight compact block cipher that takes advantage of both S-P network and the Feistel structure. The cipher uses the S-box of PRESENT algorithm and a key dependent one stage omega permutation network is used as the cipher's P-box. The cipher is implemented on iNEXT-V6 board equipped with virtex-6 FPGA. The design synthesized to 196 slices at 337 MHz maximum clock frequency.

Efficient Scheduling Schemes for Low-Area Mixed-radix MDC FFT Processor (저면적 Mixed-radix MDC FFT 프로세서를 위한 효율적인 스케줄링 기법)

  • Jang, Jeong Keun;Sunwoo, Myung Hoon
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.54 no.7
    • /
    • pp.29-35
    • /
    • 2017
  • This paper presents a high-throughput area-efficient mixed-radix fast Fourier transform (FFT) processor using the efficient scheduling schemes. The proposed FFT processor can support 64, 128, 256, and 512-point FFTs for orthogonal frequency division multiplexing (OFDM) systems, and can achieve a high throughput using mixed-radix algorithm and eight-parallel multipath delay commutator (MDC) architecture. This paper proposes new scheduling schemes to reduce the size of read-only memories (ROMs) and complex constant multipliers without increasing delay elements and computation cycles; thus, reducing the hardware complexity further. The proposed mixed-radix MDC FFT processor is designed and implemented using the Samsung 65nm complementary metal-oxide semiconductor (CMOS) technology. The experimental result shows that the area of the proposed FFT processor is 0.36 mm2. Furthermore, the proposed processor can achieve high throughput rates of up to 2.64 GSample/s at 330 MHz.

A Design of AES-based CCMP Core for IEEE 802.11i Wireless LAN Security (IEEE 802.11i 무선 랜 보안을 위한 AES 기반 CCMP Core 설계)

  • Hwang Seok-Ki;Lee Jin-Woo;Kim Chay-Hyeun;Song You-Soo;Shin Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.9 no.4
    • /
    • pp.798-803
    • /
    • 2005
  • This paper describes a design of AES(Advanced Encryption Standard)-based CCMP core for IEEE 802.1li wireless LAN security. To maximize its performance, two AES cores ate used, one is for counter mode for data confidentiality and the other is for CBC(Cipher Block Chaining) mode for authentication and data integrity. The S-box that requires the largest hardware in AES core is implemented using composite field arithmetic, and the gate count is reduced by about $20\%$ compared with conventional LUT(Lookup Table)-based design. The CCMP core designed in Verilog-HDL has 13,360 gates, and the estimated throughput is about 168 Mbps at 54-MHz clock frequency. The functionality of the CCMP core is verified by Excalibur SoC implementation.

Performance analysis and hardware design of LDPC Decoder for WiMAX using INMS algorithm (INMS 복호 알고리듬을 적용한 WiMAX용 LDPC 복호기의 성능분석 및 하드웨어 설계)

  • Seo, Jin-Ho;Shin, Kyung-Wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2012.10a
    • /
    • pp.229-232
    • /
    • 2012
  • This paper describes performance evaluation using fixed-point Matlab modeling and simulation, and hardware design of LDPC decoder which is based on Improved Normalized Min-Sum(INMS) decoding algorithm. The designed LDPC decoder supports 19 block lengths(576~2304) and 6 code rates(1/2, 2/3A, 2/3B, 3/4A, 3/4B, 5/6) of IEEE 802.16e mobile WiMAX standard. Considering hardware complexity, it is designed using a block-serial(partially parallel) architecture which is based on layered decoding scheme. A DFU based on sign-magnitude arithmetic is adopted to minimize hardware area. Hardware design is optimized by using INMS decoding algorithm whose performance is better than min-sum algorithm.

  • PDF

Hardware Design and Implementation of Joint Viterbi Detection and Decoding Algorithm for Bluetooth Low Energy Systems (블루투스 저전력 시스템을 위한 저복잡도 결합 비터비 검출 및 복호 알고리즘의 하드웨어 설계 및 구현)

  • Park, Chul-hyun;Jung, Yongchul;Jung, Yunho
    • Journal of IKEEE
    • /
    • v.24 no.3
    • /
    • pp.838-844
    • /
    • 2020
  • In this paper, we propose an efficient Viterbi processor using Joint Viterbi detection and decoding (JVDD) algorithm for a for bluetooth low energy (BLE) system. Since the convolutional coded Gaussian minimum-shift keying (GMSK) signal is specified in the BLE 5.0 standard, two Viterbi processors are needed for detection and decoding. However, the proposed JVDD scheme uses only one Viterbi processor by modifying the branch metric with inter-symbol interference information from GMSK modulation; therefore, the hardware complexity can be significantly reduced without performance degradation. Low-latency and low-complexity hardware architecture for the proposed JVDD algorithm was proposed, which makes Viterbi decoding completed within one clock cycle. Viterbi Processor RTL synthesis results on a GF55nm process show that the gate count is 12K and the memory unit and the initial latency is reduced by 33% compared to the modified state exchange (MSE).

Code Rate 1/2, 2304-b LDPC Decoder for IEEE 802.16e WiMAX (IEEE 802.16e WiMAX용 부호율 1/2, 2304-비트 LDPC 복호기)

  • Kim, Hae-Ju;Shin, Kyung-Wook
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.4A
    • /
    • pp.414-422
    • /
    • 2011
  • This paper describes a design of low-density parity-check(LDPC) decoder supporting block length 2,304-bit and code rate 1/2 of IEEE 802.16e mobile WiMAX standard. The designed LDPC decoder employs the min-sum algorithm and partially parallel layered-decoding architecture which processes a sub-matrix of $96{\times}96$ in parallel. By exploiting the properties of the min-sum algorithm, a new memory reduction technique is proposed, which reduces check node memory by 46% compared to conventional method. Functional verification results show that it has average bit-error-rate(BER) of $4.34{\times}10^{-5}$ for AWGN channel with Fb/No=2.1dB. Our LDPC decoder synthesized with a $0.18{\mu}m$ CMOS cell library has 174,181 gates and 52,992 bits memory, and the estimated throughput is about 417 Mbps at 100-MHz@l.8-V.