http://dx.doi.org/10.3745/KIPSTA.2005.12A.3.215

High-Performance FFT Using Data Reorganization  

Park Neungsoo (Konkuk University, School of Computer Engineering)
Choi Yungho (Konkuk University, Department of Electrical Engineering)
Abstract
The efficient use of cache memories is a key factor in achieving high performance when computing large signal transforms. Non-unit-stride access in the computation of large DFTs causes cache conflict misses, which results in poor cache performance and, in turn, severe degradation of overall performance. In this paper, we propose a dynamic data layout approach that takes the memory hierarchy into account. In our approach, data reorganization is performed between computation stages to reduce the number of cache misses. We also develop an efficient search algorithm that, considering both the DFT size and the data access stride, finds the factorization tree with the minimum execution time among the possible factorization trees. We apply our approach to the fast Fourier transform (FFT). Experiments were performed on Pentium 4, Athlon™ 64, Alpha 21264, and UltraSPARC III processors. The results show that our FFT achieves a speedup of up to 3.37 times over previous FFT packages.
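As a rough illustration of reorganizing data between computation stages (not the authors' implementation), the sketch below uses the classic single-level "four-step" Cooley-Tukey split N = N1·N2 and inserts explicit transposes so that every sub-DFT reads contiguous (unit-stride) data instead of the stride-N1 data it would otherwise touch. The function names, the naive O(n²) `dft` used for the sub-transforms, and the fixed one-level split are assumptions made for illustration; the approach described above instead searches over multi-level factorization trees.

```python
import cmath
from typing import List


def dft(x: List[complex]) -> List[complex]:
    """Naive O(n^2) DFT (illustrative stand-in for an optimized sub-FFT)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def transpose(a: List[complex], rows: int, cols: int) -> List[complex]:
    """Data reorganization step: a is rows x cols (row-major); returns cols x rows."""
    out = [0j] * (rows * cols)
    for r in range(rows):
        for c in range(cols):
            out[c * rows + r] = a[r * cols + c]
    return out


def four_step_fft(x: List[complex], n1: int, n2: int) -> List[complex]:
    """Hypothetical helper: DFT of length n1*n2 with a transpose before each stage."""
    n = n1 * n2
    assert len(x) == n
    # View x as an n2 x n1 matrix (x[c + n1*r] at row r, column c). After the
    # transpose, row c holds x[c], x[c + n1], x[c + 2*n1], ... contiguously.
    b = transpose(x, n2, n1)
    # Stage 1: n1 DFTs of length n2, each over unit-stride data.
    y: List[complex] = []
    for r in range(n1):
        y.extend(dft(b[r * n2:(r + 1) * n2]))
    # Twiddle factors w_N^(r * k2).
    for r in range(n1):
        for k2 in range(n2):
            y[r * n2 + k2] *= cmath.exp(-2j * cmath.pi * r * k2 / n)
    # Reorganize again so each length-n1 DFT also reads contiguous data.
    z = transpose(y, n1, n2)
    # Stage 2: n2 DFTs of length n1; w[k2 * n1 + k1] = X[k2 + n2 * k1].
    w: List[complex] = []
    for k2 in range(n2):
        w.extend(dft(z[k2 * n1:(k2 + 1) * n1]))
    # Final reorganization into natural order: result[k1 * n2 + k2] = X[k2 + n2 * k1].
    return transpose(w, n2, n1)
```

For example, `four_step_fft(x, 4, 8)` should agree with `dft(x)` for any 32-point input, up to rounding. The explicit transposes cost extra memory traffic, but they replace the strided inner-loop accesses that cause cache conflict misses with unit-stride ones; deciding when such a reorganization pays off is what a search over factorization trees is meant to settle.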
Keywords
Dynamic Data Layout; Cache; Cache Miss; FFT; Memory Hierarchy