• Title/Summary/Keyword: memory-efficient design

Search Result 308, Processing Time 0.022 seconds

Efficient Utilization of Burst Data Transfers of DMA (직접 메모리 접근 장치에서 버스트 데이터 전송 기능의 효과적인 활용)

  • Lee, Jongwon;Cho, Doosan;Paek, Yunheung
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.8 no.5
    • /
    • pp.255-264
    • /
    • 2013
  • Resolving of memory access latency is one of the most important problems in modern embedded system design. Recently, tons of studies are presented to reduce and hide the access latency. Burst/page data transfer modes are representative hardware techniques for achieving such purpose. The burst data transfer capability offers an average access time reduction of more than 65 percent for an eight-word sequential transfer. However, solution of utilizing such burst data transfer to improve memory performance has not been accomplished at commercial level. Therefore, this paper presents a new technique that provides the maximum utilization of burst transfer for memory accesses with local variables in code by reorganizing variables placement.

A Design of Memory-efficient 2k/8k FFT/IFFT Processor using R4SDF/R4SDC Hybrid Structure (R4SDF/R4SDC Hybrid 구조를 이용한 메모리 효율적인 2k/8k FFT/IFFT 프로세서 설계)

  • 신경욱
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.8 no.2
    • /
    • pp.430-439
    • /
    • 2004
  • This paper describes a design of 8192/2048-point FFT/IFFT processor (CFFT8k2k), which performs multi-carrier modulation/demodulation in OFDM-based DVB-T receiver. Since a large size FFT requires a large buffer memory, two design techniques are considered to achieve memory-efficient implementation of 8192-point FFT/IFFT. A hybrid structure, which is composed of radix-4 single-path delay feedback (R4SDF) and radix-4 single-path delay commutator (R4SDC), reduces its memory by 20% compared to R4SDC structure. In addition, a memory reduction of about 24% is achieved by a novel two-step convergent block floating-point scaling. As a result, it requires only 57% of memory used in conventional design, reducing chip area and power consumption. The CFFT8k2k core is designed in Verilog-HDL, and has about 102,000 Bates, RAM of 292k bits, and ROM of 39k bits. Using gate-level netlist with SDF which is synthesized using a $0.25-{\um}m$ CMOS library, timing simulation show that it can safely operate with 50-MHz clock at 2.5-V supply, resulting that a 8192-point FFT/IFFT can be computed every 164-${\mu}\textrm{s}$. The functionality of the core is fully verified by FPGA implementation, and the average SQNR of 60-㏈ is achieved.

Memory Reduction Method of DIT-based IFFT Bit-Reversal (DIT 기반 IFFT의 Bit-Reversal 메모리 감소 기법)

  • Kim, Jun-Ho;Piao, Zheyan;Cho, Kyung-Ju;Chung, Jin-Gyun
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.5
    • /
    • pp.66-73
    • /
    • 2015
  • IFFT is one of the key components in OFDM-based communication systems. In this paper, we propose a new memory efficient IFFT design method for OFDM-based communication systems, based on a mapping of three IFFT input signals which consist of modulated data, pilot and null signals. The proposed method focuses on reducing the memory size in the bit-reversal block which requires the largest number of memory cells in IFFT architectures. To reduce the memory size, we propose a selection mapping method based on decimation-in-time (DIT) algorithm. It is shown that the proposed method achieves a memory reduction of about 50% compared to conventional methods.

DESIGN OF A HIGH-THROUGHPUT VITERBI DECODER (고속 전송을 위한 비터비 디코더 설계)

  • Kim, Tae-Jin;Lee, Chan-Ho
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.2A
    • /
    • pp.20-25
    • /
    • 2005
  • A high performance Viterbi decoder is designed using modified register exchange scheme and block decoding method. The elimination of the trace-back operation reduces the operation cycles to determine the merging state and the amount of memory. The Viterbi decoder has low latency, efficient memory organization, and low hardware complexity compared with other Viterbi decoding methods in block decoding architectures. The elimination of trace-back also reduces the power consumption for finding the merging state and the access to the memory. The proposed decoder can be designed with emphasis on either efficient memory or low latency. Also, it has a scalable structure so that the complexity of the hardware and the throughput are adjusted by changing a few design parameters before synthesis.

Design Methodology of LDPC Codes based on Partial Parallel Algorithm (부분병렬 알고리즘 기반의 LDPC 부호 구현 방안)

  • Jung, Ji-Won
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.4 no.4
    • /
    • pp.278-285
    • /
    • 2011
  • This paper makes an analysis of the encoding structure and the decoding algorithm proposed by the DVB-S2 specification. The methods of implementing the LDPC decoder are fully serial decoder, the partially parallel decoder and the fully parallel decoder. The partial parallel scheme is the efficient selection to achieve appropriate trade-offs between hardware complexity and decoding speed. Therefore, this paper proposed an efficient memory structure for check node update block, bit node update block, and LLR memory.

MultiRing An Efficient Hardware Accelerator for Design Rule Checking (멀티링 설계규칙검사를 위한 효과적인 하드웨어 가속기)

  • 노길수;경종민
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.24 no.6
    • /
    • pp.1040-1048
    • /
    • 1987
  • We propose a hardware architecture called Multiring which is applicable for various geometrical operations on rectilinear objects such as design rule checking in VLSI layout and many image processing operations including noise suppression and coutour extraction. It has both a fast execution speed and extremely high flexibility. The whole architecture is mainly divided into four parts` I/O between host and Multiring, ring memory, linear processor array and instruction decoder. Data transmission between host and Multiring is bit serial thereby reducing the bandwidth requirement for teh channel and the number of external pins, while each row data in the bit map stored in ring memory is processed in the corresponding processor in full parallelism. Each processor is simultaneously configured by the instruction decoder/controller to perform one of the 16 basic instructions such as Boolean (AND, OR, NOT, and Copy), geometrical(Expand and Shrink), and I/O operations each ring cycle, which gives Multiring maximal flexibility in terms of design rule change or the instruction set enhancement. Correct functional behavior of Multiring was confirmed by successfully running a software simulator having one-to-one structural correspondence to the Multiring hardware.

  • PDF

Inductive Switching Noise Suppression Technique for Mixed-Signal ICs Using Standard CMOS Digital Technology

  • Im, Hyungjin;Kim, Ki Hyuk
    • Journal of information and communication convergence engineering
    • /
    • v.14 no.4
    • /
    • pp.268-271
    • /
    • 2016
  • An efficient inductive switching noise suppression technique for mixed-signal integrated circuits (ICs) using standard CMOS digital technology is proposed. The proposed design technique uses a parallel RC circuit, which provides a damping path for the switching noise. The proposed design technique is used for designing a mixed-signal circuit composed of a ring oscillator, a digital output buffer, and an analog noise sensor node for $0.13-{\mu}m$ CMOS digital IC technology. Simulation results show a 47% reduction in the on-chip inductive switching noise coupling from the noisy digital to the analog blocks in the same substrate without an additional propagation delay. The increased power consumption due to the damping resistor is only 67% of that of the conventional source damping technique. This design can be widely used for any kind of analog and high frequency digital mixed-signal circuits in CMOS technology

A Memory-efficient Partially Parallel LDPC Decoder for CMMB Standard (메모리 사용을 최적화한 부분 병렬화 구조의 CMMB 표준 지원 LDPC 복호기 설계)

  • Park, Joo-Yul;Lee, So-Jin;Chung, Ki-Seok;Cho, Seong-Min;Ha, Jin-Seok;Song, Yong-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.1
    • /
    • pp.22-30
    • /
    • 2011
  • In this paper, we propose a memory efficient multi-rate Low Density Parity Check (LDPC) decoder for China Mobile Multimedia Broadcasting (CMMB). We find the best trade-off between the performance and the circuit area by designing a partially parallel decoder which is capable of passing multiple messages in parallel. By designing an efficient address generation unit (AGU) with an index matrix, we could reduce both the amount of memory requirement and the complexity of computation. The proposed regular LDPC decoder was designed in Verilog HDL and was synthesized by Synopsys' Design Compiler using Chartered $0.18{\mu}m$ CMOS cell library. The synthesized design has the gate size of 455K (in NAND2). For the two code rates supported by CMMB, the rate-1/2 decoder has a throughput of 14.32 Mbps, and the rate-3/4 decoder has a throughput of 26.97 Mbps. Compared with a conventional LDPC for CMMB, our proposed design requires only 0.39% of the memory.

Design of Downlink Beamforming Transmitter in OFDMA/ TDD system (OFDMA/TDD 시스템의 하향링크 빔형성 송신기 설계)

  • Park Hyeong-Sook;Park Youn-Ok;Kim Cheol-Sung
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.31 no.5A
    • /
    • pp.493-500
    • /
    • 2006
  • This paper presents the efficient structure and parameter optimization of downlink beamforming transmitter in OFDMA/TDD system. To design downlink beamforming transmitter for multiple transmit antennas, an efficient beamforming structure for multiple users and the choice of word-length of each block are critical in the aspect of its performance and hardware complexity. We propose an efficient beamforming scheme, which stores the weights of subcarriers into memory without user identification at the receiver of base station and calculates the weights for corresponding user in a subcarrier unit of IFFT input at high speed. Also, we obtain the word-length of main data path and other design parameters by fixed-point simulation analysis. The proposed architecture could reduce the memory size proportional to the maximum number of users per frame, and the processing time of an OFDM symbol at the receiver of base station without the need of additional processing time for calculating the weights at the transmitter.