# A Simple Discrete Cosine Transform Systolic Array Based on DFT for Video Codec

(DFT에 의한 비데오 코덱용 DCT의 단순한 시스톨릭 어레이)

Jong Oh Park (朴鍾五)\*, Kwang Jae Lee (李光宰)\*, Guen Ho Yang (梁根鎬)\*, Ju Yong Park (朴周用)\*\*, and Moon Ho Lee (李門浩)\*\*

\* Associate member, Dept. of Elec. Eng., Chonbuk Nat'l Univ.

\*\* Regular member, Dept. of Inform. and Telecom. Eng., Chonbuk Nat'l Univ.

Received: June 23, 1989

## Summary

This paper introduces a new systolic array that computes the discrete cosine transform by way of the discrete Fourier transform. The proposed array is based on a modified discrete Fourier transform, a slightly altered form of the DFT. It is obtained by applying the Goertzel algorithm to Kung's approach and is proved mathematically. The array consists of N cells and one multiplier, performs an N-point DCT in N clock cycles, and can process input data continuously. The signal-to-noise ratio of one basic cell was derived, and the cell was designed at the circuit level. The array coefficients are fixed, and ROMs are used in place of multipliers to reduce the error caused by coefficient quantization and to limit cost.

#### Abstract

In this paper, a new approach to a systolic array realizing the discrete cosine transform (DCT) based on the discrete Fourier transform (DFT) of an input sequence is presented. The proposed array is based on a simple modified DFT (MDFT) version of the Goertzel algorithm combined with Kung's approach, and its correctness is proved mathematically. The array requires N cells and one multiplier, takes N clock cycles to produce a complete N-point DCT, and is able to process a continuous stream of data sequences. We have analyzed the output signal-to-noise ratio (SNR) and designed the circuit-level layout of a one-PE chip. The array coefficients are static, so stored-product ROMs can be used in place of multipliers to limit cost as well as to eliminate errors due to coefficient quantization.

## I. Introduction

The DCT was introduced in 1974 [1]. For well-correlated signals, its performance is very close to that of the statistically optimal Karhunen-Loeve transform (KLT). The DCT is real, orthogonal and separable (extension to multiple dimensions is straightforward). Similar to the DFT, the DCT has a variety of fast algorithms that reduce its complexity in implementation.
Its applications extend from speech coding to HDTV coding. Several standards groups such as the CCITT and ISO have recommended the DCT as the main compression tool in applications such as videoconference, videophone and still-frame communication services. This has led to DCT VLSI chip design and development by universities, research labs and industry. Some of these ASICs (Application Specific Integrated Circuits) can implement DCTs of multiple block sizes and operate in real time at up to 80 MHz.

There are two approaches to implementing a DCT chip. In the first, a fast DCT algorithm is developed directly [2,3] and then realized on a chip [6]. As the DCT matrix is real and orthogonal, fast algorithms developed for the DCT are equally applicable to its inverse, so with some minor changes the same DCT chip can also perform the inverse transform. An efficient recursive algorithm developed by B.G. Lee [3] was utilized by SGS-Thomson Microelectronics [7]. In the second approach, the DCT is computed via another discrete transform such as the DFT, the Hadamard transform or the discrete Hartley transform. A fast DCT algorithm for systolic arrays via the 4N-point Winograd Fourier transform has been developed [8].

The DFT has been adopted in a wide range of digital signal processing applications with the advent of state-of-the-art VLSI technology. Because of its regularity, the systolic architecture has been envisaged as an elegant realization of the DFT. The first DFT systolic array was proposed by Kung [9]. In this paper, we propose a systolic array for the DCT based on the modified DFT (MDFT). The MDFT can be easily systolized and implemented in a modular processor array with local communication. Note that the coefficients of the proposed array (twiddle factors) are static, and the modified DFT outputs Z(k) stay in the nodes and are eventually pumped out.
The DFT systolic array was obtained by applying the Goertzel algorithm via Horner's rule [11,12]; here it is extended to a systolic array for the DCT based on Kung's model [4,5,9,10]. These algorithmic optimizations are necessary in order to obtain, with currently available technology, a linear systolic array that can not only be extended to the 2-dimensional DCT but is also compatible with other orthogonal transforms.

## II. Proposed DCT Algorithm

The N-point DFT of a sequence x(0), x(1), ..., x(N-1) is defined as

$$X(k)=\sum_{n=0}^{N-1} x(n)\,W_{N}^{nk}, \quad 0\leq k\leq N-1 \tag{1}$$

where $W_{N}=\exp\{-j2\pi/N\}$ and $j=\sqrt{-1}$.

The DCT is defined in [1] as follows:

$$Y(k) = C(k) \sum_{n=0}^{N-1} x(n) \cos[(2n+1)k\pi/2N] \tag{2}$$

where

$$C(k) = \begin{cases} 1/\sqrt{2} & \text{for } k=0 \\ 1 & \text{for } k \neq 0 \end{cases}$$

Then, with a minor modification of the DFT, which is similar to the chirp-z transform [13], we define the MDFT as shown below:

$$Z(k) \triangleq \sum_{n=0}^{N-1} x(n)\,U_N^{nk}, \quad 0 \le k \le N-1 \tag{3}$$

where $U_N = \exp\{j\pi/N\}$. Table 1 compares the properties of the DFT and the MDFT.

Table 1. DFT and MDFT properties pertaining to the twiddle factor.

| Transform | Twiddle factor | Number of elements | Elements on unit circle | Features |
|-----------|----------------|--------------------|-------------------------|----------|
| DFT | $W_N = e^{-j2\pi/N}$ | $2^{n} = N$; $n = 0, 1, \ldots, N-1$ | Ex. N = 4 | Signal processing |
| MDFT | $U_N = e^{j\pi/N}$ | $2^{n} = N$; $n = 0, 1, \ldots, N-1$ | Ex. N = 4 | $U_N^{N} = -1$, $U_N^{2N} = 1$, $U_N^{-2} = W_N$ |

The DCT in (2) can be rewritten as

$$Y(k) = C(k)\,\mathrm{Re}\left[\exp(jk\pi/2N)\,Z(k)\right] \tag{4}$$

because $\exp(jk\pi/2N)\,U_N^{nk} = \exp[j(2n+1)k\pi/2N]$, whose real part is the cosine kernel of (2) when x(n) is real. This implies that the DCT of an N-point sequence can be implemented by an N-point MDFT.
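The relation (4) between the DCT and the MDFT can be checked numerically. The following is a minimal sketch, assuming Python with only the standard library; the function names `dct_direct`, `mdft` and `dct_via_mdft` and the test sequence are illustrative assumptions, not part of the paper.

```python
import cmath
import math


def dct_direct(x):
    """DCT as defined in (2): Y(k) = C(k) * sum_n x(n) cos((2n+1) k pi / 2N)."""
    N = len(x)
    out = []
    for k in range(N):
        c = 1 / math.sqrt(2) if k == 0 else 1.0
        out.append(c * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                           for n in range(N)))
    return out


def mdft(x):
    """MDFT as defined in (3): Z(k) = sum_n x(n) U_N^(nk), with U_N = exp(j pi / N)."""
    N = len(x)
    U = cmath.exp(1j * math.pi / N)
    return [sum(x[n] * U ** (n * k) for n in range(N)) for k in range(N)]


def dct_via_mdft(x):
    """DCT through (4): Y(k) = C(k) * Re[exp(j k pi / 2N) * Z(k)]."""
    N = len(x)
    Z = mdft(x)
    out = []
    for k in range(N):
        c = 1 / math.sqrt(2) if k == 0 else 1.0
        out.append(c * (cmath.exp(1j * k * math.pi / (2 * N)) * Z[k]).real)
    return out


# Arbitrary real test sequence (an assumption made for illustration only).
x = [1.0, 2.0, 3.0, 4.0]
max_diff = max(abs(a - b) for a, b in zip(dct_direct(x), dct_via_mdft(x)))
print(max_diff < 1e-9)  # the two computations agree up to rounding error
```

The sketch evaluates the sums directly in O(N²) operations; it is meant only to confirm the identity, not to model the hardware.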
The other operations, namely multiplication by $C(k)\exp(jk\pi/2N)$ and taking the real part, are also needed. Fig. 1 shows a block diagram for DCT computation.

Fig. 1. Block diagram for DCT computation.

The MDFT shown in (3) can be rewritten using the Goertzel algorithm via Horner's rule [11]:

$$Z(k) = (\cdots((x(N-1)\,U_N^{k} + x(N-2))\,U_N^{k} + \cdots + x(2))\,U_N^{k} + x(1))\,U_N^{k} + x(0) \tag{5}$$

In order to process a continuous flow of data, the polynomial in (5) should be evaluated with the data samples in their natural order, x(0) first and x(N-1) last. Then we obtain

$$A(k) = (\cdots((x(0)\,U_N^{k} + x(1))\,U_N^{k} + \cdots + x(N-2))\,U_N^{k} + x(N-1) \tag{6}$$

Obviously, (6) can be expressed in the closed form shown below:

$$A(k) = \sum_{n=0}^{N-1} x(n)\,U_N^{(N-1-n)k} \tag{7}$$

Since $U_N^{N}=-1$ and $U_N^{2N}=1$ (the periodic property), (7) can be rewritten as follows:

$$A(k) = (-1)^{k}\,U_N^{-k} \sum_{n=0}^{N-1} x(n)\,U_N^{-nk} \tag{8}$$

Assuming that x(n) is real and x(n) = x(-n), the following equation can be obtained from the symmetric property of the MDFT in (8):

$$\begin{aligned} A(k) &= (-1)^{k}\,U_N^{-k} \sum_{n=0}^{N-1} x(n)\,U_N^{-nk} \\ &= (-1)^{k}\,U_N^{-k}\,Z(-k) \\ &= \begin{cases} +\,U_N^{-k}\,Z(-k) & \text{where } k \text{ is even} \\ -\,U_N^{-k}\,Z(-k) & \text{where } k \text{ is odd} \end{cases} \end{aligned} \tag{9}$$

By interchanging the variables Z and A in (9) and rearranging the terms, the MDFT of the input sequence x(n) can be formulated as

$$Z(k) = (-1)^{k}\,U_N^{-k}\,A(-k) \tag{10}$$

In order to map (10) onto a systolic array, (10) is rewritten in a linear first-order recursive form:

$$y(n, k) = U_N^{-k}\,\{x(n) + y(n-1, k)\} \tag{11}$$

where y(n,k) denotes the n-th recursive step for the k-th MDFT sample, with $0 \le k \le N-1$, $0 \le n \le N-1$ and y(-1,k) = 0. Therefore, the MDFT samples Z(k) are obtained after N iterations:

$$Z(k) = y(N-1, k), \quad 0 \le k \le N-1 \tag{12}$$

Strictly, unrolling (11) gives $y(N-1,k) = U_N^{-k}A(-k) = (-1)^{k}Z(k)$, so (12) holds up to the sign factor $(-1)^k$ of (10); this alternating sign is applied at the output stage, as described in Section III.

## III. Examples

Fig. 2 shows a one-dimensional systolic array realizing (11) of the N-point MDFT for the DCT, in a similar fashion to the DFT array of [12] but with one multiplier added. Fig. 2 depicts the basic function of the PE, the systolic array for N = 4, and the basic structure of the PE. The array uses a stage of storage elements to store the intermediate results and multiplexers to pump out the final results. As in (4) and Fig. 1, the MDFT samples are multiplied by $C(k)\exp(jk\pi/2N)$ in the processor at the left node, and the output sign then alternates with the index k. This algorithm is not only versatile for orthogonal transforms but also makes data handling easier, since only the twiddle factors need to be changed.

An example of the computation of an N-point MDFT with the proposed algorithm is given in Table 2. As Table 2 shows, the data samples are shifted to the neighboring cell at each clock. A new data sample is available at each cell for the computation of the MDFT samples Z(k), which leave the array in sequence, one per clock, and the final results are then pumped out of the array through the stage of multiplexers with storage elements. Therefore, an N-point DCT is obtained after 2N-1 clocks, and a new data sequence is then processed every N clock cycles; the latches in the array are reset after the N input samples have been received.

Fig. 2. A systolic array realizing the proposed 4-point modified DFT for the DCT. (b) Systolic array for N = 4.

## IV. Performance Analysis

As shown in Fig. 2(c), the one-PE chip, which is composed of two adders (with look-ahead carry generators), data flip-flops and a multiplexer with storage, was designed using a standard cell library and a 3 $\mu$m single-metal N-well CMOS process. It contains about 4600 transistors.

Table 2. An example of the processing of a 4-point DCT.

| CLK | MULTI | PE 1 | PE 2 | PE 3 | PE 4 |
|-----|-------|------|------|------|------|
| 1 | | $Z_1(0) = U_4^{-0}x_1(0)$ | | | |
| 2 | | $Z_1(0) = U_4^{-0}\{Z_1(0) + x_1(1)\}$ | $Z_1(1) = U_4^{-1}x_1(0)$ | | |
| 3 | | $Z_1(0) = U_4^{-0}\{Z_1(0) + x_1(2)\}$ | $Z_1(1) = U_4^{-1}\{Z_1(1) + x_1(1)\}$ | $Z_1(2) = U_4^{-2}x_1(0)$ | |
| 4 | | $Z_1(0) = U_4^{-0}\{Z_1(0) + x_1(3)\}$ | $Z_1(1) = U_4^{-1}\{Z_1(1) + x_1(2)\}$ | $Z_1(2) = U_4^{-2}\{Z_1(2) + x_1(1)\}$ | $Z_1(3) = U_4^{-3}x_1(0)$ |
| 5 | | $Z_2(0) = U_4^{-0}x_2(0)$ | $Z_1(1) = U_4^{-1}\{Z_1(1) + x_1(3)\}$ | $Z_1(2) = U_4^{-2}\{Z_1(2) + x_1(2)\}$ | $Z_1(3) = U_4^{-3}\{Z_1(3) + x_1(1)\}$ |
| 6 | | $Z_2(0) = U_4^{-0}\{Z_2(0) + x_2(1)\}$ | $Z_2(1) = U_4^{-1}x_2(0)$ | $Z_1(2) = U_4^{-2}\{Z_1(2) + x_1(3)\}$ | $Z_1(3) = U_4^{-3}\{Z_1(3) + x_1(2)\}$ |
| 7 | | $Z_2(0) = U_4^{-0}\{Z_2(0) + x_2(2)\}$ | $Z_2(1) = U_4^{-1}\{Z_2(1) + x_2(1)\}$ | $Z_2(2) = U_4^{-2}x_2(0)$ | $Z_1(3) = U_4^{-3}\{Z_1(3) + x_1(3)\}$ |
| 8 | $Y_1(0)$ | $Z_2(0) = U_4^{-0}\{Z_2(0) + x_2(3)\}$ | $Z_2(1) = U_4^{-1}\{Z_2(1) + x_2(2)\}$ | $Z_2(2) = U_4^{-2}\{Z_2(2) + x_2(1)\}$ | $Z_2(3) = U_4^{-3}x_2(0)$ |
| 9 | $Y_1(1)$ | $Z_3(0) = U_4^{-0}x_3(0)$ | $Z_2(1) = U_4^{-1}\{Z_2(1) + x_2(3)\}$ | $Z_2(2) = U_4^{-2}\{Z_2(2) + x_2(2)\}$ | $Z_2(3) = U_4^{-3}\{Z_2(3) + x_2(1)\}$ |
| 10 | $Y_1(2)$ | $Z_3(0) = U_4^{-0}\{Z_3(0) + x_3(1)\}$ | $Z_3(1) = U_4^{-1}x_3(0)$ | $Z_2(2) = U_4^{-2}\{Z_2(2) + x_2(3)\}$ | $Z_2(3) = U_4^{-3}\{Z_2(3) + x_2(2)\}$ |
| 11 | $Y_1(3)$ | $Z_3(0) = U_4^{-0}\{Z_3(0) + x_3(2)\}$ | $Z_3(1) = U_4^{-1}\{Z_3(1) + x_3(1)\}$ | $Z_3(2) = U_4^{-2}x_3(0)$ | $Z_2(3) = U_4^{-3}\{Z_2(3) + x_2(3)\}$ |
| 12 | $Y_2(0)$ | $Z_3(0) = U_4^{-0}\{Z_3(0) + x_3(3)\}$ | $Z_3(1) = U_4^{-1}\{Z_3(1) + x_3(2)\}$ | $Z_3(2) = U_4^{-2}\{Z_3(2) + x_3(1)\}$ | $Z_3(3) = U_4^{-3}x_3(0)$ |
| 13 | $Y_2(1)$ | $Z_4(0) = U_4^{-0}x_4(0)$ | $Z_3(1) = U_4^{-1}\{Z_3(1) + x_3(3)\}$ | $Z_3(2) = U_4^{-2}\{Z_3(2) + x_3(2)\}$ | $Z_3(3) = U_4^{-3}\{Z_3(3) + x_3(1)\}$ |
| 14 | $Y_2(2)$ | $Z_4(0) = U_4^{-0}\{Z_4(0) + x_4(1)\}$ | $Z_4(1) = U_4^{-1}x_4(0)$ | $Z_3(2) = U_4^{-2}\{Z_3(2) + x_3(3)\}$ | $Z_3(3) = U_4^{-3}\{Z_3(3) + x_3(2)\}$ |
| 15 | $Y_2(3)$ | $Z_4(0) = U_4^{-0}\{Z_4(0) + x_4(2)\}$ | $Z_4(1) = U_4^{-1}\{Z_4(1) + x_4(1)\}$ | $Z_4(2) = U_4^{-2}x_4(0)$ | $Z_3(3) = U_4^{-3}\{Z_3(3) + x_3(3)\}$ |

where CLK is the number of clock cycles and MULTI is the multiplier. $x_i(j)$ denotes the j-th input sample of the i-th input sequence, $Z_i(j)$ denotes the j-th MDFT sample of the i-th MDFT sequence, and $Y_i(j)$ denotes the j-th DCT sample of the i-th DCT sequence.
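The schedule of Table 2 can be reproduced with a short clock-by-clock simulation of recursion (11). The following is a minimal sketch, assuming Python with only the standard library and a single input sequence; the function names `simulate_array`, `dct_from_array` and `dct_reference` are illustrative assumptions, the sign factor $(-1)^k$ is taken from (10), and the output scaling from (4). It models the arithmetic only, not the latches, multiplexers or ROMs of the actual PE.

```python
import cmath
import math


def simulate_array(x):
    """Run recursion (11) on N cells for one input sequence, as in Table 2.

    Cell k (PE k+1) holds the fixed twiddle U_N^(-k). Samples move one cell
    per clock, so cell k sees x(n) at clock k + n + 1.
    Returns each cell's final accumulator and the clock at which it is ready.
    """
    N = len(x)
    U = cmath.exp(1j * math.pi / N)
    twiddle = [U ** (-k) for k in range(N)]
    acc = [0j] * N                       # y(-1, k) = 0
    done_at = [0] * N
    for clk in range(1, 2 * N):          # clocks 1 .. 2N-1
        for k in range(N):
            n = clk - 1 - k              # sample index reaching cell k now
            if 0 <= n < N:
                acc[k] = twiddle[k] * (x[n] + acc[k])   # equation (11)
                if n == N - 1:
                    done_at[k] = clk
    return acc, done_at


def dct_from_array(x):
    """Apply the sign factor of (10) and the output multiplier of (4)."""
    N = len(x)
    acc, _ = simulate_array(x)
    out = []
    for k in range(N):
        z = (-1) ** k * acc[k]           # alternating sign, from (10)
        c = 1 / math.sqrt(2) if k == 0 else 1.0
        out.append(c * (cmath.exp(1j * k * math.pi / (2 * N)) * z).real)
    return out


def dct_reference(x):
    """Direct evaluation of the DCT definition (2), used only for comparison."""
    N = len(x)
    return [(1 / math.sqrt(2) if k == 0 else 1.0)
            * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                  for n in range(N))
            for k in range(N)]


x = [1.0, 2.0, 3.0, 4.0]                 # arbitrary test sequence (assumption)
_, done_at = simulate_array(x)
print(done_at)                           # [4, 5, 6, 7] for N = 4
print(all(abs(a - b) < 1e-9
          for a, b in zip(dct_from_array(x), dct_reference(x))))
```

For N = 4 the last cell finishes at clock 2N-1 = 7, which agrees with the table: PE 4 completes $Z_1(3)$ at clock 7 and the multiplier delivers $Y_1(0)$ at clock 8.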
Fig. 3. Layout of the processing element (PE) chip. (a) Top view of one-PE chip.

The chip measures about 8.0 × 6.6 mm<sup>2</sup> including 92 I/O pads. Fig. 3(a) and (b) show the top view and the layout of the one-PE chip. In this parallel arithmetic scheme, the Z(k) samples leave the array in sequence, one per clock cycle, after an initial delay of 2N-1 clock cycles, and a new set of outputs is computed every N clock cycles. The throughput rate per sample depends on the delays of the flip-flops ($T_{FF}$), adders ($T_A$) and ROMs ($T_{ROM}$) in one basic cell. The total delay $T_D$ is given by

$$T_{D} = T_{FF} + 2T_{A} + T_{ROM} \tag{13}$$

This chip has an internal accuracy of 12 bits. Table 3 shows the delay times for the DCT.

**Table 3.** DCT clock cycle using VSC100 components.

| Quantity | Value |
|----------|-------|
| $T_{FF}$ | 4.2 ns |
| $T_A$ | 15.9 ns |
| $T_{ROM}$ | 60 ns |
| Total $T_D$ | 96 ns |
| Throughput rate | 10.4 MHz |

With these values, (13) gives $T_D = 4.2 + 2(15.9) + 60 = 96.0$ ns, which corresponds to a throughput rate of about $1/(96\ \text{ns}) \approx 10.4$ MHz.

The signal-to-noise ratio (SNR) is defined as in [11]:

$$SNR = \frac{\sigma_y^2}{\sigma_f^2} \tag{14}$$

where $\sigma_y^2$ is the output signal variance and $\sigma_f^2$ is the output error signal variance. Two sets of unit impulse responses are required for the analysis. First, the impulse responses from the noise sources $e_1$ to $e_4$ to the outputs Re(y(n,k)) and Im(y(n,k)) [the $h_e(n)$'s]. Second, the impulse responses from the input signals Re(x(n)) and Im(x(n)) to the same outputs [the $g_x(n)$'s], using Mason's rule [12]. Finally, we get

$$SNR = \frac{2^{2B+1}}{(N+1)^2} \tag{15}$$

The SNR of the proposed array is therefore proportional to $2^{2B}$, where B is the internal accuracy in bits, and to $1/(N+1)^2$, where N is the number of input samples; it is similar to that of the DFT systolic array [12].

## V. Conclusion

A simple systolic array for the computation of the DCT is proposed. This array is based on a simple modified DFT (MDFT) version of the Goertzel algorithm combined with Kung's approach.
The MDFT array can be easily systolized and implemented in a modular processor array with local communication. The array requires N cells and one multiplier, takes N clock cycles to produce a complete N-point DCT, and is able to process a continuous stream of data sequences. Each cell of the array contains one twiddle factor, and the modified DFT outputs Z(k) stay in the nodes and are eventually pumped out. The SNR of a fixed-point computation on the proposed systolic array is similar to that of the DFT systolic array, and the PE of the proposed linear systolic array is compatible with other orthogonal transforms by changing the array coefficients.

## References

- [1] N. Ahmed, T. Natarajan, and K.R. Rao, "Discrete cosine transform," *IEEE Trans. on Computers*, vol. C-23, pp. 90-93, Jan. 1974.
- [2] W.C. Chen, Smith and S.C. Fralick, "A fast computational algorithm for the discrete cosine transform," *IEEE Trans. Comm.*, vol. COM-25, pp. 1004-1008, Sept. 1977.
- [3] B.G. Lee, "A new algorithm to compute the discrete cosine transform," *IEEE Trans. on ASSP*, vol. ASSP-32, pp. 1243-1245, Dec. 1984.
- [4] Moon Ho Lee, Jong Oh Park, and Yasuhiko Yasuda, "A new systolic array for DCT," submitted to *IEEE Trans. on ASSP*.
- [5] Moon Ho Lee and Yasuhiko Yasuda, "A new 2-D systolic array algorithm for DCT/DST," to appear in *Electronics Letters*.
- [6] A. Leger, "Implementation of fast DCT for full color videotex services and terminals," GLOBECOM 84, Global Telecomm. Conf., pp. 333-337, Atlanta, GA, Nov. 1984.
- [7] J. Francis et al., "A single chip video rate 16\*16 DCT," ICASSP 86, Intl. Conf. on Acoust., Speech, Signal Process., pp. 15.8.1-15.8.4, Tokyo, Japan, April 1986.
- [8] J.S. Ward and B.J. Stanier, "Fast discrete cosine transform algorithm for systolic arrays," *Electron. Lett.*, vol. 19, pp. 58-60, 1983.
- [9] H.T. Kung, "Special-purpose device for signal and image processing: An opportunity in VLSI," SPIE, Real Time Signal Processing III, vol. 241, pp. 74-84, 1980.
- [10] S.Y. Kung, *VLSI Array Processors*. Prentice Hall, 1988.
- [11] A.V. Oppenheim and R.W. Schafer, *Digital Signal Processing*. Englewood Cliffs, NJ: Prentice-Hall, 1975.
- [12] J.A. Beraldin, Tyseer Aboulnasr, and Willem Steenaart, "Efficient one-dimensional systolic array realization of the discrete Fourier transform," *IEEE Trans. on CAS*, vol. 36, no. 1, pp. 95-100, Jan. 1989.
- [13] Leo I. Bluestein, "A linear filtering approach to the computation of discrete Fourier transform," *IEEE Trans. on Audio and Electroacoustics*, vol. AU-18, no. 4, pp. 451-455, Dec. 1970.

#### About the Authors

Moon Ho Lee (李門浩), regular member. See vol. 26, no. 10. Currently an associate professor in the Dept. of Information and Telecommunication Engineering, Chonbuk National University.

Ju Yong Park (朴周用), regular member. Graduated from the Dept. of Electronic Engineering, Chonbuk National University, in Feb. 1982, and received the M.S. degree in electronic engineering from the graduate school of the same university in Feb. 1986. From Apr. 1986 to Sept. 1988 he was a researcher in the Dept. of Biomedical Engineering, Chonbuk National University Hospital. Since Mar. 1988 he has been a doctoral student at the graduate school of Chonbuk National University and a teaching assistant in the Dept. of Information and Telecommunication Engineering. His main interests include data communication, image signal processing and neural networks.

Jong Oh Park (朴鍾五), associate member. Graduated from the Dept. of Electronic Engineering, Chonbuk National University, in Feb. 1988. Since Mar. 1988 he has been a master's student in electronic engineering at the graduate school of the same university. His main interests include image signal processing and VLSI for image signal processing.

Kwang Jae Lee (李光宰), regular member. See vol. 26, no. 10. Currently a master's student in electronic engineering at the graduate school of Chonbuk National University.

Guen Ho Yang (梁根鎬), associate member. See vol. 26, no. 10. Currently a master's student in electronic engineering at the graduate school of Chonbuk National University.