Acceleration of FFT on a SIMD Processor

  • Lee, Juyeong (Dept. of Electronics and Communications Engineering, Kwangwoon Univ.) ;
  • Hong, Yong-Guen (Dept. of Electronics and Communications Engineering, Kwangwoon Univ.) ;
  • Lee, Hyunseok (Dept. of Electronics and Communications Engineering, Kwangwoon Univ.)
  • Received : 2014.07.31
  • Accepted : 2015.02.04
  • Published : 2015.02.25

Abstract

This paper discusses the implementation of Bruun's FFT on a SIMD processor. The FFT is an algorithm used in digital signal processing, and processing it efficiently is important for improving signal processing performance. Bruun's FFT algorithm is a fast Fourier transform algorithm based on recursive factorization. Compared with the popular Cooley-Tukey algorithm, it has a computational advantage because most of its operations are real-number multiplications rather than complex ones. However, when implemented on a SIMD processor, it exhibits more complicated data alignment patterns and requires more memory for storing coefficient data. According to our experimental results, when processing an FFT with 1024 complex input data on a SIMD processor, Bruun's algorithm shows approximately 1.2 times higher throughput but uses approximately 4 times more memory (20 Kbyte) than the Cooley-Tukey algorithm. Therefore, when constraints on silicon area are loose, Bruun's algorithm is suitable for FFT processing on a SIMD processor.

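This paper concerns a method for efficiently processing FFT operations on a processor with a SIMD architecture. The FFT is a general-purpose algorithm widely used in digital signal processing, and processing it efficiently is very important for improving performance. The Bruun algorithm is an FFT algorithm implemented through recursive factorization. Compared with the widely used Cooley-Tukey algorithm, it has the advantage of performing most of its operations with real multiplications rather than complex multiplications. However, when implemented on a SIMD processor, it has the drawbacks that the alignment patterns of the vector data are complicated and that more memory is needed to store the coefficients required for the computation. According to the experimental results, for a length-1024 FFT on a SIMD processor, the Bruun algorithm shows about 1.2 times higher throughput than the Cooley-Tukey algorithm but requires about 4 times more data memory. Therefore, unless the constraints on data memory are tight, the Bruun algorithm is suitable for FFT operations on a SIMD processor.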
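The recursive factorization mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' SIMD implementation. It only shows the standard identities behind Bruun's method for a power-of-two length, namely z^(2M) - 1 = (z^M - 1)(z^M + 1) and z^(4M) + a z^(2M) + 1 = (z^(2M) + sqrt(2 - a) z^M + 1)(z^(2M) - sqrt(2 - a) z^M + 1). All function names (`poly_mul`, `trinomial`, `split_trinomial`, `bruun_factors`) are hypothetical helpers introduced here for illustration.

```python
import math

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def trinomial(m, a):
    """Coefficients of z^(2m) + a*z^m + 1, lowest degree first (m >= 1)."""
    c = [0.0] * (2 * m + 1)
    c[0] = 1.0
    c[m] = a
    c[2 * m] = 1.0
    return c

def split_trinomial(m, a):
    """Factor z^(2m) + a*z^m + 1 (m a power of two) into real quadratics."""
    if m == 1:
        return [trinomial(1, a)]
    r = math.sqrt(2.0 - a)  # real because |a| < 2 at every level
    return split_trinomial(m // 2, r) + split_trinomial(m // 2, -r)

def bruun_factors(n):
    """Real-coefficient factors of z^n - 1 for n a power of two."""
    if n == 1:
        return [[-1.0, 1.0]]          # z - 1
    factors = bruun_factors(n // 2)   # factors of z^(n/2) - 1
    if n == 2:
        factors.append([1.0, 1.0])    # z + 1
    else:
        # z^(n/2) + 1 is the trinomial with a = 0 and m = n/4
        factors += split_trinomial(n // 4, 0.0)
    return factors

if __name__ == "__main__":
    n = 16
    product = [1.0]
    for f in bruun_factors(n):
        product = poly_mul(product, f)
    # Compare against z^n - 1; only floating-point rounding error should remain.
    target = [-1.0] + [0.0] * (n - 1) + [1.0]
    print(max(abs(x - y) for x, y in zip(product, target)))
```

Every factor produced this way has only real coefficients. In Bruun's algorithm the input polynomial is reduced modulo these factors stage by stage, which is why most multiplications involve real coefficients rather than the complex twiddle factors of the Cooley-Tukey algorithm. The distinct coefficient values at each level also hint at why a larger coefficient table may be needed.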
