• Title/Summary/Keyword: radix-4 algorithm

Search Result 89, Processing Time 0.019 seconds

A Low-area and Low-power 512-point Pipelined FFT Design Using Radix-24-23 for OFDM Applications

  • Yu, Jian;Cho, Kyung-Ju
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.11 no.5
    • /
    • pp.475-480
    • /
    • 2018
  • In OFDM-based systems, FFT is a critical component since it occupies large area and consumes more power. In this paper, we present a low hardware-cost and low power 512-point pipelined FFT design method for OFDM applications. To reduce the number of twiddle factors and to choose simple design architecture, the radix-$2^4-2^3$ algorithm are exploited. For twiddle factor multiplication, we propose a new canonical signed digit (CSD) complex multiplier design method to minimize the hardware-cost. In hardware implementation with Intel FPGA, the proposed FFT design achieves more than about 28% reduction in gate count and 18% reduction in power consumption compared to the previous approaches.

An Efficient Array Algorithm for VLSI Implementation of Vector-radix 2-D Fast Discrete Cosine Transform (Vector-radix 2차원 고속 DCT의 VLSI 구현을 위한 효율적인 어레이 알고리듬)

  • 신경욱;전흥우;강용섬
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.18 no.12
    • /
    • pp.1970-1982
    • /
    • 1993
  • This paper describes an efficient array algorithm for parallel computation of vector-radix two-dimensional (2-D) fast discrete cosine transform (VR-FCT), and its VLSI implementation. By mapping the 2-D VR-FCT onto a 2-D array of processing elements (PEs), the butterfly structure of the VR-FCT can be efficiently importanted with high concurrency and local communication geometry. The proposed array algorithm features architectural modularity, regularity and locality, so that it is very suitable for VLSI realization. Also, no transposition memory is required, which is invitable in the conventional row-column decomposition approach. It has the time complexity of O(N+Nnzp-log2N) for (N*N) 2-D DCT, where Nnzd is the number of non-zero digits in canonic-signed digit(CSD) code, By adopting the CSD arithmetic in circuit desine, the number of addition is reduced by about 30%, as compared to the 2`s complement arithmetic. The computational accuracy analysis for finite wordlength processing is presented. From simulation result, it is estimated that (8*8) 2-D DCT (with Nnzp=4) can be computed in about 0.88 sec at 50 MHz clock frequency, resulting in the throughput rate of about 72 Mega pixels per second.

  • PDF

FFT/IFFT IP Generator for OFDM Modems (OFDM 모뎀용 FFT/IFFT IP 자동 생성기)

  • Lee Jin-Woo;Shin Kyung-Wook;Kim Jong-Whan;Baek Young-Seok;Eo Ik-Soo
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.31 no.3A
    • /
    • pp.368-376
    • /
    • 2006
  • This paper describes a Fcore_GenSim(Parameterized FFT Core Generation & Simulation Program), which can be used as an essential If(Intellectual Property) in various OFDM modem designs. The Fcore_Gensim is composed of two parts, a parameterized core generator(PFFT_CoreGen) that generates Verilog-HDL models of FFT cores, and a fixed-point FFT simulator(FXP_FFTSim) which can be used to estimate the SQNR performance of the generated cores. The parameters that can be specified for core generation are FFT length in the range of 64 ~2048-point and word-lengths of input/output/internal/twiddle data in the range of 8-b "24-b with 2-b step. Total 43,659 FFT cores can be generated by Fcore_Gensim. In addition, CBFP(Convergent Block Floating Point) scaling can be optionally specified. To achieve an optimized hardware and SQNR performance of the generated core, a hybrid structure of R2SDF and R2SDC stages and a hybrid algorithm of radix-2, radix-2/4, radix-2/4/8 are adopted according to FFT length and CBFP scaling.

Asynchronous 16bit Multiplier with micropipelined structure (마이크로파이프라인 구조의 16bit 비동기 곱셈기)

  • 장미숙;이유진;김학윤;이우석;최호용
    • Proceedings of the IEEK Conference
    • /
    • 2000.06b
    • /
    • pp.145-148
    • /
    • 2000
  • A 16bit asynchronous multiplier has been designed using micropipelind structure with 2 phase and data bundling. And 4-radix modified Booth algorithm, CPlatch(Cature-Pass latch) and modified 4-2 counters have adopted in this design. It is implemented in 0.65$\mu\textrm{m}$ double-poly/double-metal CMOS technology by using 12,074 transistors with core size of 1.4${\times}$1.8$\textrm{mm}^2$. And our design results in a computation rate 55MHz a supply voltage of 3.3V.

  • PDF

A Study On the Design of a Floating Point Unit for MPEG-2 AAC Decoder (MPEG-2 AAC 복호기를 위한 부동소수점유닛 설계에 관한 연구)

  • 구대성;김필중;김종빈
    • Journal of the Institute of Electronics Engineers of Korea TE
    • /
    • v.39 no.4
    • /
    • pp.355-355
    • /
    • 2002
  • In this paper, we designed a FPU(floating point unit) that it is very important and requires of high density when digital audio is designed. Almost audio system must support the multi-channel and required for high quality. A floating point arithmetic function in MPEG-2 AAC that implemented by hardware is able to realtime decoding when DSP realization. The reason is that MPEG-2 AAC is compatible to the Audio field of MPEG-4 and afterwards. We designed a FPU by hardware to increase the speed of a floating point unit with much calculation part in the MPEG-2 AAC Decoder. A FPU is composed of a multiplier and an adder. A multiplier used the Radix-4 Booth algorithm and an adder adopted 1's complement method for speed up. A form of a floating point unit has 8bit of exponent part and 24bit of mantissa. It's compatible with the IEEE single precision format and adopted a pipeline architecture to increase the speed of a processor. All of sub blocks are based on ISO/IEC 13818-7 standard. The algorithm is tested by C language and the design does by use of VHDL(VHSIC Hardware Description Language). The maximum operation speed is 23.2MHz and the stable operation speed is 19MHz.

A Generator of 64~8,192-point FFT/IFFT Cores with Single-memory Architecture for OFDM-based Communication Systems (OFDM 기반 통신 시스템용 단일 메모리 구조의 64~8,192점 FFI/IFFFT 코어 생성기)

  • Yeem, Chang-Wan;Jeon, Heung-Woo;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.1
    • /
    • pp.205-212
    • /
    • 2010
  • This paper describes a core generator (FCore_Gen) which generates Verilog-HDL models of 640 different FFT/IFFT cores with selected parameter value for OFDM-based communication systems. The generated FFT/IFFT cores are based on in-place single memory architecture and use a hybrid structure of radix-4 and radix-2 DIF algorithm to accommodate various FFT lengths. To achieve both memory reduction and the improved SQNR, a conditional scaling technique is adopted, which conditionally scales the intermediate results of each computational stage. The cores synthesized with a $0.35-{\mu}m$ CMOS standard cell library can operate with 75-MHz@3.3-V, and a 8,192-point FFT can be computed m $762.7-{\mu}s$, thus the cores satisfy the specifications of wireless LAN, DMB, and DVB systems.

VLSI Design of a 2048 Point FFT/IFFT by Sequential Data Processing for Digital Audio Broadcasting System (순차적 데이터 처리방식을 이용한 디지틀 오디오 방송용 2048 Point FFT/IFFT의 VLSI 설계)

  • Choe, Jun-Rim
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.5
    • /
    • pp.65-73
    • /
    • 2002
  • In this paper, we propose and verify an implementation method for a single-chip 2048 complex point FFT/IFFT in terms of sequential data processing. For the sequential processing of 2048 complex data, buffers to store the input data are necessary. Therefore, DRAM-like pipelined commutator architecture is used as a buffer. The proposed structure brings about the 60% chip size reduction compared with conventional approach by using this design method. The 16-point FFT is a basic building block of the entire FFT chip, and the 2048-point FFT consists of the cascaded blocks with five stages of radix-4 and one stage of radix-2. Since each stage requires rounding of the resulting bits while maintaining the proper S/N ratio, the convergent block floating point (CBFP) algorithm is used for the effective internal bit rounding and their method contributed to a single chip design of digital audio broadcasting system.

Design of a Floating-Point Divider for IEEE 754-1985 Single-Precision Operations (IEEE 754-1985 단정도 부동 소수점 연산용 나눗셈기 설계)

  • Park, Ann-Soo;Chung, Tea-Sang
    • Proceedings of the KIEE Conference
    • /
    • 2001.11c
    • /
    • pp.165-168
    • /
    • 2001
  • This paper presents a design of a divide unit supporting IEEE-754 floating point standard single-precision with 32-bit word length. Its functions have been verified with ALTERA MAX PLUS II tool. For a high-speed division operation, the radix-4 non-restoring algorithm has been applied and CLA(carry-look -ahead) adders has been used in order to improve the area efficiency and the speed of performance for the fraction division part. The prevention of the speed decrement of operations due to clocking has been achieved by taking advantage of combinational logic. A quotient select block which is very complicated and significant in the high-radix part was designed by using P-D plot in order to select the fast and accurate quotient. Also, we designed all division steps with Gate-level which visualize the operations and delay time.

  • PDF

Design of serial pipeline SRFFT for OFDM system (OFDM시스템에 적합한 Serial Pipeline 방식의 SRFFT 설계)

  • 정진일;임재형;조용범
    • Proceedings of the IEEK Conference
    • /
    • 2002.06b
    • /
    • pp.153-156
    • /
    • 2002
  • FFT/IFFT block is very important module to determine the performance of OFDM system. This block has been implemented using several FFT algorithms such as radix-2, radix-4 etc. However SRFFT algorithm has not been implemented because of the complexity for implementation. This paper proposes a serial-pipeline SRFfT for OFDM system. The serial-pipeline SRFFT is optimized to use a serial input and serial output. We have implemented the SRFFT block using anam 0.25 Um five-metal process. The simulation show that the SRFFT block can operate about 200MHz. This architecture could be adapted to IEEE 802.lla wireless LAN standard.

  • PDF

Trends of Low-Precision Processing for AI Processor (NPU 반도체를 위한 저정밀도 데이터 타입 개발 동향)

  • Kim, H.J.;Han, J.H.;Kwon, Y.S.
    • Electronics and Telecommunications Trends
    • /
    • v.37 no.1
    • /
    • pp.53-62
    • /
    • 2022
  • With increasing size of transformer-based neural networks, a light-weight algorithm and efficient AI accelerator has been developed to train these huge networks in practical design time. In this article, we present a survey of state-of-the-art research on the low-precision computational algorithms especially for floating-point formats and their hardware accelerator. We describe the trends by focusing on the work of two leading research groups-IBM and Seoul National University-which have deep knowledge in both AI algorithm and hardware architecture. For the low-precision algorithm, we summarize two efficient floating-point formats (hybrid FP8 and radix-4 FP4) with accuracy-preserving algorithms for training on the main research stream. Moreover, we describe the AI processor architecture supporting the low-bit mixed precision computing unit including the integer engine.