Search | Korea Science

High-Performance FFT Using Data Reorganization (데이터 재구성 기법을 이용한 고성능 FFT)

Park Neungsoo;Choi Yungho
- The KIPS Transactions: Part A, v.12A no.3 s.93, pp.215-222, 2005
Efficient use of cache memory is a key factor in achieving high performance when computing large signal transforms. Non-unit-stride access in the computation of large DFTs causes cache conflict misses, which degrade cache performance and, in turn, severely degrade overall performance. In this paper, we propose a dynamic data layout approach that takes the memory hierarchy into account. In our approach, data reorganization is performed between computation stages to reduce the number of cache misses. We also develop an efficient search algorithm that, given the DFT size and the data access stride, finds the factorization tree with the minimum execution time among the possible factorization trees. We apply the approach to the fast Fourier transform (FFT). Experiments were performed on the Pentium 4, Athlon™ 64, Alpha 21264, and UltraSPARC III. The results show that our FFT achieves a speedup of up to 3.37 times over previous FFT packages.
https://doi.org/10.3745/KIPSTA.2005.12A.3.215
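
The idea of reorganizing data between computation stages can be illustrated with a minimal sketch. It is not the authors' implementation: it is the textbook two-factor Cooley-Tukey decomposition (N = n1 * n2) with an explicit transpose before each stage, so that every small DFT reads a contiguous row instead of a strided column. The names `dft`, `transpose`, and `fft_with_reorganization` are hypothetical, the naive `dft` stands in for an optimized small-FFT kernel, and the paper's factorization-tree search is not modeled.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT; a stand-in for an optimized small-FFT kernel."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(rows):
    """Data reorganization step: turn columns into contiguous rows."""
    return [list(col) for col in zip(*rows)]


def fft_with_reorganization(x, n1, n2):
    """DFT of x (length n1*n2) as two stages of small DFTs with transposes."""
    n = n1 * n2
    assert len(x) == n
    # View x as an n1-by-n2 row-major matrix: a[i1][i2] = x[n2*i1 + i2].
    a = [x[n2 * i1 : n2 * (i1 + 1)] for i1 in range(n1)]
    # Stage 1 needs the columns of a (stride n2 in x); transpose first so
    # that each length-n1 DFT works on a contiguous row.
    b = [dft(row) for row in transpose(a)]  # b[i2][k1]
    # Twiddle factors between the two stages.
    for i2 in range(n2):
        for k1 in range(n1):
            b[i2][k1] *= cmath.exp(-2j * cmath.pi * i2 * k1 / n)
    # Reorganize again so that stage 2 also reads contiguous rows.
    d = [dft(row) for row in transpose(b)]  # d[k1][k2]
    # Output ordering: X[k1 + n1*k2] = d[k1][k2].
    return [value for row in transpose(d) for value in row]
```

In pure Python the transposes do not bring a cache benefit; the sketch only shows the index arithmetic that makes each stage unit-stride.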

Energy-Efficient Signal Processing Using FPGAs (FPGA 상에서 에너지 효율이 높은 병렬 신호처리 기법)

Jang Ju-wook;Hwang Yunil;Scrofano Ronald;Prasanna Viktor K.
- The KIPS Transactions: Part A, v.12A no.4 s.94, pp.305-312, 2005
In this paper, we present algorithm-level techniques for energy-efficient design using FPGAs. We then use these techniques to create energy-efficient designs for two signal processing kernels: the fast Fourier transform (FFT) and matrix multiplication. We evaluate the performance of FPGAs on these tasks in terms of both latency and energy efficiency. Using a Xilinx Virtex-II as the target FPGA, we compare our designs with those from the Xilinx library as well as with conventional algorithms run on the PowerPC core embedded in the Virtex-II Pro and on the Texas Instruments TMS320C6415. Our evaluations are done both through high-level estimation based on energy and latency equations and through low-level simulation. For the FFT, our designs dissipated on average 50% less energy than the design from the Xilinx library and 56% less than the DSP. Our designs showed a 10-fold improvement in the EAT factor over the embedded processor. These results provide concrete evidence that FPGAs can outperform DSPs and embedded processors in signal processing. Further, they show that FPGAs can achieve this performance while still dissipating less energy than the other two types of devices.
https://doi.org/10.3745/KIPSTA.2005.12A.4.305

Design and Implementation Systolic Array FFT Processor Based on Shared Memory (공유 메모리 기반 시스토릭 어레이 FFT 프로세서 설계 및 구현)

Jeong, Dongmin;Roh, Yunseok;Son, Hanna;Jung, Yongchul;Jung, Yunho
- Journal of IKEEE, v.24 no.3, pp.797-802, 2020
In this paper, we present the design and implementation results of an FFT processor that supports 4096-point operation with less memory by merging the several memories used in the radix-4 systolic array FFT processor into a single shared memory. Sharing the memory reduces area and also simplifies the data flow, since all data I/O goes through one memory. The presented FFT processor was implemented and verified on an FPGA device. The implementation used 51,855 CLB LUTs, 29,712 CLB registers, 8 block RAM tiles, and 450 DSPs, and confirmed that the memory area could be reduced by 65% compared with the existing radix-4 systolic array structure.
https://doi.org/10.7471/ikeee.2020.24.3.797

A Study on OFDM FFT Design for Peformance of Wireless Multimedia Network (무선 멀티미디어 통신망의 성능 향상을 위한 OFDM FFT 설계에 관한 연구)

Kang Jung-yong;Lee Seon-keun
- The Journal of Korean Institute of Communications and Information Sciences, v.30 no.1A, pp.70-75, 2005
Efficient hardware design of the FFT algorithm is important in a wide variety of DSP applications. One example is OFDM (Orthogonal Frequency Division Multiplexing) based WLAN (Wireless Local Area Network) systems, which place high requirements on the throughput and power consumption of the FFT. The output RAM is composed of two banks of $64 \times W$. The banks are swapped immediately following the falling edge or the start signal strobe. This bank swapping allows the 64-point FFT to continue processing samples and filling the alternate bank without affecting the output data flow.
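
The bank swapping described in this abstract is commonly called ping-pong (double) buffering. The following is a minimal behavioral sketch under stated assumptions, not the paper's RTL: the class name `PingPongBuffer` and its methods are hypothetical, the word width W is not modeled, and `swap()` stands in for the strobe-triggered bank swap.

```python
class PingPongBuffer:
    """Two equally sized banks: one is filled while the other is read."""

    def __init__(self, depth=64):
        self.banks = [[0] * depth, [0] * depth]
        self.write_bank = 0  # index of the bank currently being filled

    def write(self, addr, value):
        """Producer side (FFT output) writes into the filling bank."""
        self.banks[self.write_bank][addr] = value

    def read(self, addr):
        """Consumer side reads from the other, already completed bank."""
        return self.banks[1 - self.write_bank][addr]

    def swap(self):
        """Exchange bank roles; stands in for the strobe-triggered swap."""
        self.write_bank ^= 1
```

Because reads and writes always target different banks, the producer can start the next 64-point block while the consumer drains the previous one.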

Modified CSD Group Multiplier Design for Predetermined Coefficient Groups (그룹 곱셈 계수를 위한 Modified CSD 그룹 곱셈기 디자인)

Kim, Yong-Eun;Xu, Yi-Nan;Chung, Jin-Gyun
- Journal of the Institute of Electronics Engineers of Korea SD, v.44 no.9, pp.48-53, 2007
Some digital signal processing applications, such as the FFT, require multiplications by one or more groups of a few predetermined coefficients. In this paper, we propose an efficient multiplier design method for predetermined coefficient groups based on the modified CSD algorithm. For the multiplier in the sine-cosine generator of a direct digital frequency synthesizer (DDFS) and for the multiplier in a 128-point radix-$2^4$ FFT, it is shown that the area, power, and delay time can be reduced by up to 34%.

Design of Bit-Pattern Specialized Adder for Constant Multiplication (고정계수 곱셈을 위한 비트패턴 전용덧셈기 설계)

Cho, Kyung-Ju;Kim, Yong-Eun
- Journal of the Korea Institute of Information and Communication Engineering, v.12 no.11, pp.2039-2044, 2008
Efficient hardware implementation of multiple constant multiplication is a problem frequently encountered in digital signal processing applications such as FIR filters and linear transforms (e.g., DCT and FFT). Solutions based on the common subexpression elimination (CSE) algorithm are known to yield significant improvements in area and power consumption. In this paper, we present an efficient specialized adder design method for two common subexpressions ($10\bar{1}$ and $101$) in canonic signed digit (CSD) coefficients. Synopsys simulations of a radix-$2^4$ FFT example show that the proposed method reduces area, propagation delay, and power consumption by about 21%, 11%, and 12%, respectively, compared with conventional methods.
https://doi.org/10.6109/jkiice.2008.12.11.2039
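
For readers unfamiliar with CSD coefficients, the sketch below converts a non-negative integer to canonic signed digit form (digits in {-1, 0, 1}, no two adjacent nonzero digits) and locates a digit pattern such as $101$ or $10\bar{1}$. It uses the standard CSD (non-adjacent form) recoding and is not the paper's adder design; the function names `to_csd` and `find_pattern` are hypothetical.

```python
def to_csd(n):
    """Return the CSD digits of a non-negative integer, most significant first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []  # collected least significant digit first
    while n:
        if n & 1:
            # n % 4 == 1 -> digit +1; n % 4 == 3 -> digit -1 (carry upward)
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits[::-1] or [0]


def find_pattern(digits, pattern):
    """Start indices (MSB-first) at which `pattern` occurs in `digits`."""
    m = len(pattern)
    return [
        i for i in range(len(digits) - m + 1)
        if tuple(digits[i:i + m]) == tuple(pattern)
    ]
```

For example, `to_csd(5)` gives `[1, 0, 1]` (the subexpression $101$) and `to_csd(3)` gives `[1, 0, -1]` (the subexpression $10\bar{1}$, that is, 4 - 1).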

An Efficient Method for the Mass Unbalance Analysis of a Rotor System Using FFT and Lissajous Diagram

Su, Hua;Chong, Kil-To
- Institute of Control, Robotics and Systems: Conference Proceedings, 2004.08a, pp.1612-1617, 2004
Unbalance analysis is essential for rotor systems, but problems remain in computational efficiency and accuracy. This paper proposes a new method for estimating the mass unbalance of a rotating shaft from its vibration signals, detecting both the unbalance mass and its phase position. An estimator based on FFT signal processing is designed to detect the unbalance mass. An improved Lissajous diagram with statistical analysis is also introduced, which makes it possible to compute efficiently the phase position of a mass unbalance located at a given position on the shaft. The proposed method is demonstrated and validated through several test examples.

Modeling of the Time-frequency Auditory Perception Characteristics Using Continuous Wavelet Transform (연속 웨이브렛 변환을 이용한 청각계의 시간-주파수 인지 특성 모델링)

이상권;박기성;서진성
- The Journal of the Acoustical Society of Korea, v.20 no.8, pp.81-87, 2001
The human auditory system is well described as a "constant Q" system. The STFT (short-time Fourier transform) is not suitable as an auditory perception model because it has constant bandwidth. In this paper, the CWT (continuous wavelet transform) is employed as the auditory filter model. In the CWT, the frequency resolution can be adjusted to suit auditory sensation models. The proposed CWT is applied to modeling the JNVF (just noticeable variation in frequency). In addition, other signal processing methods such as the STFT, VFR-FFT, and VFR-STFT are discussed. Among these methods, the JNVF model based on the CWT agrees with the JNVF of the auditory model, although it requires considerable computation time.

Spectral Analysis and Performance Evaluation of DTMF Receivers with the QFT Algorithm (QFT알고리즘을 이용한 DTMF 수신기의 신호해석 및 성능평가)

Yoon, Dal-Hwan
- Journal of the Institute of Electronics Engineers of Korea TC, v.38 no.9, pp.21-28, 2001
Economical detection of dual-tone multi-frequency (DTMF) signals is an important factor in developing cost-effective telecommunication equipment. Each channel has its own DTMF receiver, which reports the detected signal to the processors. To detect DTMF signals, the receiver uses algorithms such as the DFT, FFT, and Goertzel methods. This paper analyzes the power spectra of the DTMF receiver using the QFT algorithm. Experimental results show that the QFT improves the performance of the DTMF receiver and reduces memory waste and the real-time processing load.
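
As background for the detection methods named in this abstract, here is a minimal sketch of DTMF detection with the Goertzel algorithm, one of the conventional methods the abstract lists. It is not the QFT algorithm the paper evaluates. The helper names (`goertzel_power`, `detect_key`, `make_tone`), the 8 kHz sample rate, and the 205-sample block length are assumptions chosen for illustration; the tone frequencies and keypad layout are the standard DTMF assignments.

```python
import math

ROWS = (697, 770, 852, 941)        # low-group frequencies in Hz
COLS = (1209, 1336, 1477, 1633)    # high-group frequencies in Hz
KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, sample_rate, freq):
    """Squared magnitude of the spectrum of `samples` at `freq` (Hz)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        # Second-order recurrence; only two state variables are stored.
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_key(samples, sample_rate=8000):
    """Pick the strongest row and column tone; no silence or twist checks."""
    row = max(range(4), key=lambda i: goertzel_power(samples, sample_rate, ROWS[i]))
    col = max(range(4), key=lambda i: goertzel_power(samples, sample_rate, COLS[i]))
    return KEYS[row][col]


def make_tone(key, n=205, sample_rate=8000):
    """Synthesize n samples of the two-tone signal for `key` (test helper)."""
    for r, line in enumerate(KEYS):
        c = line.find(key)
        if c >= 0:
            return [
                math.sin(2 * math.pi * ROWS[r] * t / sample_rate)
                + math.sin(2 * math.pi * COLS[c] * t / sample_rate)
                for t in range(n)
            ]
    raise ValueError(f"unknown DTMF key: {key!r}")
```

A call such as `detect_key(make_tone("5"))` evaluates only eight frequencies per block, which is why Goertzel-style detectors need far less memory than a full FFT.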

Novel Radix-2⁶ DF IFFT Processor with Low Computational Complexity (연산복잡도가 적은 radix-2⁶ FFT 프로세서)

Cho, Kyung-Ju
- The Journal of Korea Institute of Information, Electronics, and Communication Technology, v.13 no.1, pp.35-41, 2020
Fast Fourier transform (FFT) processors are widely used in applications such as communications, image processing, and biomedical signal processing. In particular, high-performance, low-power FFT processing is indispensable in OFDM-based communication systems. This paper presents a novel radix-2⁶ FFT algorithm with low computational complexity and high hardware efficiency. The twiddle factor is decomposed by applying a 7-dimensional index mapping, and the radix-2⁶ FFT algorithm is then derived. The proposed algorithm has a simple twiddle factor sequence and a small number of complex multiplications, which reduces the memory needed to store the twiddle factors. When the twiddle factor has few coefficients, complex constant multipliers can be used efficiently in place of complex multipliers. Complex constant multipliers can be designed more efficiently using the canonic signed digit (CSD) and common subexpression elimination (CSE) algorithms. An efficient complex constant multiplier design method that applies CSD and CSE is proposed for the twiddle factor multiplication in the proposed radix-2⁶ algorithm. To compare the previous and proposed methods, a 256-point single-path delay feedback (SDF) FFT is designed and synthesized for an FPGA. The proposed algorithm uses about 10% less hardware than the previous algorithm.
https://doi.org/10.17661/jkiiect.2020.13.1.35
