# A Design of a Multistandard Digital Video Encoder Using a Pipelined Architecture

Seung-Ho Oh, Han-Jun Choi, Sung-Woo Kwon, and Moon-Key Lee

#### **Abstract**

This paper describes the design of a multistandard video encoder. The proposed encoder accepts conventional NTSC/PAL video signals and also processes the PALplus video signal, which is now popular in Europe. The encoder consists of five major functional blocks: a letter-box converter, a color space converter, digital filters, a color modulator, and a timing generator. To support multistandard video signals, a programmable systolic architecture is adopted for the various digital filters. Interpolation digital filters are also used to enhance the signal-to-noise ratio of the encoded video signals. The input to the encoder can be either a YCbCr signal or an RGB signal. The outputs are luminance (Y), chrominance (C), and composite video baseband (Y+C) signals. The architecture of the encoder is defined using Matlab and modelled in Verilog-HDL. The overall operation is verified using various video signals, such as color bar patterns and ramp signals. The encoder contains 42K gates and is implemented in a 0.6 um CMOS process.

## I. Introduction

In response to growing consumer demand for versatile, high quality visual services, research and development efforts have been actively made in the areas of high definition television (HDTV), interactive TV, EDTV, multimedia, and so forth. Most of the new services are likely to offer wide screen image sequences with a 16:9 (= 1.78:1) display aspect ratio.
Because 16:9 wide screen image sequences are not compatible with conventional television receivers, the PALplus and NTSC-compatible systems[1, 2] with wide display aspect ratio have adopted three different methods for transmitting 16:9 images (the side panel method, the letter-box method, and the window method) to assure compatibility with conventional television receivers. We have implemented the encoder with a letter-box converter which decimates at a 4:3 ratio between a stored line and an incoming input line.

The proposed encoder takes component video signals (e.g., RGB or YCbCr) and encodes them into an NTSC/PAL/PALplus composite video signal. The digital encoder has notable advantages, such as the capability of selecting among three different modes and independence from process variation, because it is almost entirely digital. For compatibility with conventional TV receivers in both NTSC/PAL mode and PALplus mode across many different applications, the vertical and horizontal sync timings of the encoder have been made fully programmable for the various frequencies. Table 1 shows some typical frequency values. While the letter-box converter used in PALplus mode operates at an external clock of 18 MHz with a fixed 1152 pixels per line, the rest of the encoder can operate at the many different pixel rates established for various 4:3 applications. By adopting the systolic pipelined architecture in all the subblocks of the encoder, high throughput is achieved.

The following section describes the operation of the encoder in its two modes, with a description of the functionality of each module. The overall architecture and the implementation of each module are described in Section III.

## II. Encoder Configuration and Functionality

#### 1. NTSC/PAL Standard Mode

When an input signal is recognized as an NTSC/PAL standard mode signal, the letter-box portion is disabled. The input can be in either gamma corrected RGB or YCbCr format[2, 3].
Manuscript received July 4, 1997; accepted October 10, 1997. The authors are with Electronics VLSI & CAD Lab., Yonsei University Engineering College, Korea.

Table 1. The supported frequencies.

| Pixel rate (MHz) | Total pixels per scan line | Typical application |
|------------------|----------------------------|---------------------|
| NTSC | | |
| 13.50 | 858 | CCIR-601 |
| 12.27 | 780 | square pixel |
| 14.32 | 910 | 4 × FSC |
| PAL | | |
| 13.50 | 864 | CCIR-601 |
| 14.75 | 944 | square pixel |
| 17.72 | 1135 | 4 × FSC |
| 16:9 image | | |
| 18.00 | 1144 | NTSC-compatible |
| 18.00 | 1152 | PALplus |

Fig. 1. Decimation algorithm between lines.

The encoding process begins by receiving a gamma corrected RGB or YCbCr signal. While reset is low, in both NTSC and PAL modes, all the necessary values, such as the mode selection and the timing, level, and control registers, are programmed. When reset goes high, the encoder operates synchronously with a clock whose frequency is half that of the 20-30 MHz external clock. If the input is in RGB format, it is fed directly to the color converter (Matrix), where the RGB input components from the color video camera are transformed into luminance and two color difference signals according to the CCIR-601 specification; otherwise the matrix is bypassed. After the two color difference signals pass through the systolic pipelined low pass filter, the filtered signals are modulated by the quadrature modulator, which is programmed according to the pixel clock in use. In this process, the signals are converted into the chroma signal (chrominance and color burst). At the same time, the luminance signal receives the sync signals from the sync generator, which is also programmed according to the pixel clock, and becomes the luma signal (luminance and sync signal).
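The color conversion step can be illustrated with a short sketch. The Python function below is a minimal floating-point model, not the authors' hardware: it assumes the standard CCIR-601 luma weights (0.299, 0.587, 0.114) and the nominal 8-bit ranges of 16 to 235 for Y and 16 to 240 for Cb and Cr, with gamma corrected inputs normalized to the range 0 to 1. The function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert gamma corrected R'G'B' in [0, 1] to 8-bit (Y, Cb, Cr).

    Assumes CCIR-601 luma weights and nominal ranges (Y: 16..235,
    Cb/Cr: 16..240). Floating-point illustration only, not the
    commutator hardware described in the paper.
    """
    kr, kg, kb = 0.299, 0.587, 0.114
    y = kr * r + kg * g + kb * b        # luminance, 0..1
    cb = 0.5 * (b - y) / (1.0 - kb)     # scaled B'-Y', -0.5..0.5
    cr = 0.5 * (r - y) / (1.0 - kr)     # scaled R'-Y', -0.5..0.5
    return (round(16 + 219 * y),
            round(128 + 224 * cb),
            round(128 + 224 * cr))
```

For a 75% amplitude yellow bar, `rgb_to_ycbcr(0.75, 0.75, 0.0)` gives `(162, 44, 142)`, which agrees with the yellow column of the color bar levels in Table 2.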
Finally, through the interpolator, each signal becomes one of the S-video (Y/C) outputs, and these S-video signals are also added together to give the composite output. Verilog-HDL simulation shows that a valid output is produced at every clock after an initial latency of 17 pixel clocks.

#### 2. Wide Screen PALplus Mode

If the input is a 16:9 wide screen TV signal in PALplus mode, it undergoes a 4:3 decimation process. The letter-box converter, which performs the decimation using a stored previous line and the incoming input line, is responsible for this process. The decimation algorithm for an interlaced image is shown in Figure 1. In the case of a progressive image, only the decimation algorithm for the odd field is applied. Compared with the side-panel method[1], the letter-box method can display the whole wide image without loss. Because the 16:9 wide image is displayed on a 4:3 screen, there are inactive video regions at the top and bottom of the screen. These regions are filled with the helper signal and can in some cases be used for captions. The processed signal and the black bars then go to the Matrix module, just as in NTSC/PAL mode.

## III. Encoder System Architecture

#### 1. Top\_Module

The encoder contains six primary submodules: the letter-box converter, the color converter (Matrix), the low pass filter (LPF), the quadrature modulator (Quad\_mod and the subcarrier generator, Subgen), the sync generator (Vgen and Hgen), and the interpolator filter (INT), as shown in Figure 2. The encoder is driven by a 20-30 MHz external clock (xclk), which operates each module together with an internal clock (clk) generated by dividing xclk by two. Input signals (RGB or YCbCr) are fed into the encoder at the positive edge of clk, and output signals (Y/C, composite) are produced at the same edge.

Fig. 2. Configuration of Top-module.

#### 2. Letter-box Converter

The letter-box converter takes two lines, the current line and the previous line stored in the bit-serial buffer, and combines them according to the 4:3 decimation equations, as shown in Figure 3. To keep the architecture as simple as possible, the letter-box converter uses only adders and shifters. To assure stable operation at high pixel rates, the converter is pipelined with a depth of three stages. With proper combinations of the select signals S1-S4, all the decimation equations are realized. The decimated signal is either an active or a helper signal, as determined by the helper enable signal. When the signal is classified as an active video signal, it enters the NTSC/PAL standard mode encoder described in Section II.1. The helper signal, which consists only of a luminance signal, goes directly to the insert block in sync with the sync generator. As shown in Figure 4, the operation of the letter-box converter, which converts a 16:9 image to 4:3 format, is verified using Matlab.

Fig. 3. Structure of letter-box decimator.

Fig. 4. Verification of letter-box conversion.

#### 3. Matrix

Fig. 5. Matrix and the operation timing.

To reduce area while maintaining high throughput, the matrix which converts RGB to YUV is designed as a systolic pipelined commutator-like structure[7]. More specifically, instead of three channels, each with three processing elements for the separate RGB inputs, there is only one systolic pipelined channel, which processes all the RGB inputs by commutating three different coefficients. As shown in Figure 5, Matrix forms three products of the RGB inputs and three different coefficients, selected by different combinations of the external clock (xclk) and the internal clock (clk). The three products advance to the next stage in the order V, U, and Y.
(V and Y are produced at the negative edge of xclk, and U at the positive edge of xclk.) In the next stage, the pre-added values (V, U) and the adding value (Y) come out at the positive edge of clk.

#### 4. Systolic Pipelined Low Pass Filter

The low pass filter is placed in the middle of the chroma channels to band-limit the signals to the permitted bandwidth of the video channel. It is indispensable in the encoder because it keeps the color difference signals within 0.5 MHz so that they can be interleaved with the luminance signal. Since the coefficients extracted by applying the window method in the SPW tool are symmetrical, we adopt the symmetrical FIR systolic pipelined structure[2], which reduces 5 taps to 3 taps and thus cuts down hardware complexity. In Figure 6, with a systolic array of latches in the middle, the processing elements are divided into odd and even coefficient groups to form an efficient pipelined structure with modest latency. In addition, this structure does not require any signal broadcasting, which can be a serious problem in some cases[4]. Compared with a general direct form FIR filter, this architecture has advantages in both size and throughput. These advantages are further improved because the number of coefficients is minimized in the SPW extraction process, which takes a sampling rate of 13.5 MHz and a pass band of 0.5 MHz as its basic specification inputs. This minimization is possible because the filter does not require a sharp characteristic curve, as the human visual system is not very sensitive to color differences[3]. The low pass filter produces an output at every clock after an initial latency of two clocks.

Fig. 6. Systolic pipelined architecture of LPF.

Fig. 7. Ratio counter & ROM access of Subgen.

#### 5. Quadrature Modulator

The quadrature modulator consists of the subcarrier generator (Subgen) and Quad\_mod.
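As a rough software model of these two blocks, the sketch below uses a rational phase accumulator in place of Subgen and a floating-point multiply-add in place of Quad\_mod. It is only an illustration under stated assumptions: the function names are hypothetical, the ratio 35:132 is the exact FSC/FS ratio for NTSC at a 13.5 MHz pixel rate (3.579545 MHz is 315/88 MHz), the 11-bit phase resolution is chosen to match a 2048-entry sine table, `math.sin` and `math.cos` stand in for the sine and cosine ROMs, and the pairing of U with sine and V with cosine is the conventional one rather than something confirmed by the figures.

```python
import math

def ratio_counter(p, q, n_pixels, addr_bits=11):
    """Yield one phase address per pixel from a p:q ratio counter.

    The accumulator advances by p modulo q each pixel, so the phase
    advances by exactly p/q of a subcarrier cycle per pixel.
    """
    acc = 0
    for _ in range(n_pixels):
        yield (acc << addr_bits) // q   # phase scaled to 0..2**addr_bits - 1
        acc = (acc + p) % q

def quad_mod(u, v, addr, addr_bits=11):
    """Modulate color difference samples onto the subcarrier phase."""
    angle = 2.0 * math.pi * addr / (1 << addr_bits)
    return u * math.sin(angle) + v * math.cos(angle)

# Assumed example: NTSC at 13.5 MHz, FSC / FS = 35 / 132
addresses = list(ratio_counter(35, 132, 133))
chroma = [quad_mod(0.3, -0.2, a) for a in addresses]
```

Because 35 and 132 are coprime, the address sequence repeats exactly every 132 pixels, so the generated frequency stays locked to the p:q ratio and does not drift with rounding in the address.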
Subgen produces the subcarrier frequency, which is constant (NTSC: 3.579545 MHz, PAL: 4.43 MHz) regardless of the pixel rate used in NTSC or PAL mode, and outputs the sine and cosine phases of the produced frequency. To produce a constant frequency in spite of the various pixel rates, a p:q ratio counter is used, where p:q expresses the relationship between the subcarrier frequency (FSC) and the pixel rate frequency (FS)[1]. To accommodate the various pixel rates, the input values of the p:q ratio counter, p1, p2, and p3, are programmable. As shown in Figure 7, the 11-bit value produced by the p:q ratio counter is used as the address of ROMs which contain 2048 sine and cosine values corresponding to 0°-360°. In this case, a ROM of size $2\times2048\times8$ is needed. However, the ROM size can be reduced to 1/4, namely 512×7, by exploiting the symmetry of the sine and cosine functions. The lower 9 bits of the 11-bit value then address the ROM, and the upper 2 bits determine which quadrant the address falls in and the signs of the sine and cosine values. To invert the cosine phase at every line in PAL mode, a different compensation value is added to the 11-bit output of the ratio counter on every line.

Quad\_mod produces the chroma signal by taking U\_lpf and V\_lpf from the LPF and the sine and cosine values of the subcarrier from Subgen. As shown in Figure 8, when Hgen's burst enable signal is high, 1 and 0 are fed into the multipliers instead of U\_lpf and V\_lpf to produce the color burst. The color burst is used as a reference to reconstruct the correct color at the receiving end. Consequently, the chroma signal output from the quadrature modulator consists of chrominance (C) and color burst.

Fig. 8. Configuration of Quad\_mod.

Fig. 9. Block diagram of Vgen.

#### 6. Vertical & Horizontal Sync Generator

The sync generator consists of two modules, the vertical sync generator (Vgen) and the horizontal sync generator (Hgen). As the module that produces the vertical sync signal, Vgen has a 10-bit counter which increments at the start of every line.
The hardware implementation is shown in Figure 9. The outputs consist of various enable signals. These enable signals are produced in such an order that the proper sync timing is maintained for NTSC and PAL modes. The sync signal can be for either NTSC or PAL mode, depending on the Pal\_op switch. The output enable signals of Vgen enter Hgen.

Fig. 10. Block diagram of Hgen.

Hgen is the module which produces the signals for the various line types:

- Vsync: vertical sync line
- VE: half of vertical sync and half of equalization line
- EE: equalization line
- EV: half of equalization and half of vertical sync
- EB: half of equalization and half of black burst line
- UVV: active video line
- UBB: black burst line
- UVE: half of active video and half of equalization line
- UBV: half of black burst and half of active video line

For the line types UVV through UBV, the enable signals consist of FP\_en, SY\_en, BR\_en, BU\_en, CBP\_en, and VA\_en. Hgen's timing and level registers must be initially programmed with appropriate setup values, because different pixel clocks have slightly different sync signals. The required register file is 8×24 bits. As shown in Figure 10, Hgen is driven by an 11-bit counter which increments with the pixel clock. Each output of Hgen is produced exclusively under the enable signals from Vgen. This output enters the Insert module, which mixes the luminance (Y) and the sync level values. If a Vsync, VE, EE, EV, or EB signal is active, the corresponding level value is output at Insert; if a UVV, UBB, UVE, or UBV signal is active, the level values of FP, SY, BR, CBP, and VA and the luminance (Y) go to Insert. Therefore the luma signal coming out of Insert consists of the luminance Y and the sync level values.

#### 7. Systolic Pipelined Interpolator

An interpolation filter basically consists of an up-sampler and a low pass filter[2, 6].
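The two stages can be sketched as follows. This is a minimal Python illustration with hypothetical function names, and the three-tap kernel in the usage note is a placeholder chosen so that the result is easy to check by hand; it is not the filter used in the encoder.

```python
def upsample_by_two(samples):
    """Insert one zero after every input sample (doubles the rate)."""
    out = []
    for s in samples:
        out.append(s)
        out.append(0)
    return out

def fir_filter(signal, coeffs):
    """Causal direct-form FIR filter; history before the start is zero."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * signal[n - k]
        out.append(acc)
    return out

def interpolate_by_two(samples, coeffs):
    """Up-sample by two, then low pass filter the zero-stuffed stream."""
    return fir_filter(upsample_by_two(samples), coeffs)
```

With the placeholder kernel `[0.5, 1.0, 0.5]`, `interpolate_by_two([2, 4, 6], [0.5, 1.0, 0.5])` returns `[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]`: the original samples reappear delayed by one output sample, and each inserted zero between two inputs is replaced by the average of its neighbours (the first output reflects the empty filter history).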
The low pass filter used in the interpolator has been designed to very relaxed specifications, a pass band of 6.5 MHz and a cut-off band of 9.8 MHz, because the main purpose of the interpolator is not the prevention of aliasing but the increase of the sampling rate. The pixel rate of the interpolator is twice the input clock rate of the encoder, and for the output rate to become double the input rate, the up-sampler inserts one "0" between every two inputs. The low pass filter then takes the zeros and the inputs for further interpolation. Like the chroma low pass filter described earlier, the low pass filter used in the interpolator is a linear FIR type, implemented in an efficient systolic pipelined symmetrical architecture to halve the number of adders and multipliers. The data from the up-sampler is multiplied by each coefficient, again extracted with the SPW tool, and the data and results are fed into the pipelined architecture shown in Figure 11. The 16 coefficients produced with the SPW tool require only 8 taps because the coefficients are symmetrical. In this way, the total gate count of the interpolator is reduced while consistent throughput is still achieved. The overall interpolation system receives input at 13.5 MHz and produces output at 27.0 MHz.

Fig. 11. Systolic pipelined architecture INT.

## IV. Simulated Results

We have designed all the modules described in Section III using Verilog-HDL and tested the model with a color bar test signal as input. Figures 12 and 13 show the composite signals for NTSC and PAL modes. In the figures, we can see that sub-signals such as the horizontal sync, color burst, and active video are all present in the composite signal. The NTSC and PAL composite signals have basically the same waveforms, but their timings and levels are slightly different.
Table 2 lists the expected levels of the EIA color bars for (M) NTSC (75% amplitude, 100% saturation) and the EBU color bars for (B, D, G, H, I) PAL (75% amplitude, 100% saturation). The spectral analysis of the same model is shown in Figures 14 and 15. We take the same composite signals extracted from the Verilog-HDL simulation and run the data through a Matlab program written for spectral analysis. Figure 14 shows that the subcarrier has a frequency of 3.58 MHz in NTSC mode. Similarly, in Figure 15, the PAL mode subcarrier appears at 4.43 MHz.

Fig. 12. A composite signal of EIA color bars for (M) NTSC (75% Amp., 100% Sat.).

Fig. 13. A composite signal of EBU color bars for (B, D, G, H, I) PAL.

Fig. 14. A spectral analysis of the composite signal shown in Figure 12.

Fig. 15. A spectral analysis of the composite signal shown in Figure 13.

We have concentrated on reducing the gate counts of sub-modules such as the color converter, low pass filter, interpolator, and subcarrier generator. The total pipeline depth of the encoder is 29 stages.

Table 2. (a) EIA Color Bars for (M) NTSC (75% Amplitude, 100% Saturation). Chroma IRE levels are peak to peak. RGB values are gamma corrected values.

| | Nominal range | White | Yellow | Cyan | Green | Magenta | Red | Blue | Black |
|----|----|----|----|----|----|----|----|----|----|
| R' | 0 to 255 | 191 | 191 | 0 | 0 | 191 | 191 | 0 | 0 |
| G' | 0 to 255 | 191 | 191 | 191 | 191 | 0 | 0 | 0 | 0 |
| B' | 0 to 255 | 191 | 0 | 191 | 0 | 191 | 0 | 191 | 0 |
| Y | 16 to 235 | 180 | 162 | 131 | 112 | 84 | 65 | 35 | 16 |
| Cb | 16 to 240 | 128 | 44 | 156 | 72 | 184 | 100 | 212 | 128 |
| Cr | 16 to 240 | 128 | 142 | 44 | 58 | 198 | 212 | 114 | 128 |
| Luminance (IRE) | | 77 | 69 | 56 | 48 | 36 | 28 | 15 | 7.5 |
| Chrominance (IRE) | | 0 | 62 | 88 | 82 | 82 | 88 | 62 | 0 |
| Chrominance (phase) | | - | 167° | 283° | 241° | 61° | 103° | 347° | - |
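The luminance IRE rows of Table 2 can be reproduced from the normalized luminance with a few lines of arithmetic. The sketch below is an illustration, not part of the encoder: it assumes CCIR-601 luma weights, a 7.5 IRE setup for (M) NTSC (matching the black level in Table 2(a)), and no setup for PAL (matching the black level in Table 2(b)). The function name `luma_ire` is hypothetical.

```python
def luma_ire(r, g, b, setup=0.0):
    """Luminance level in IRE for gamma corrected R'G'B' in [0, 1].

    `setup` is the black level pedestal in IRE: 7.5 is assumed for
    (M) NTSC and 0 for PAL.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return setup + (100.0 - setup) * y

# 75% amplitude color bars: yellow, cyan, green, magenta, red, blue
BARS = [(0.75, 0.75, 0), (0, 0.75, 0.75), (0, 0.75, 0),
        (0.75, 0, 0.75), (0.75, 0, 0), (0, 0, 0.75)]

ntsc = [round(luma_ire(*bar, setup=7.5)) for bar in BARS]
pal = [round(luma_ire(*bar)) for bar in BARS]
```

Here `ntsc` evaluates to `[69, 56, 48, 36, 28, 15]` and `pal` to `[66, 53, 44, 31, 22, 9]`, in agreement with the yellow through blue columns of the luminance rows in the two tables.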
Table 2. (b) EBU Color Bars for (B, D, G, H, I) PAL (75% Amplitude, 100% Saturation).

| | Nominal range | White | Yellow | Cyan | Green | Magenta | Red | Blue | Black |
|----|----|----|----|----|----|----|----|----|----|
| R' | 0 to 255 | 255 | 191 | 0 | 0 | 191 | 191 | 0 | 0 |
| G' | 0 to 255 | 255 | 191 | 191 | 191 | 0 | 0 | 0 | 0 |
| B' | 0 to 255 | 255 | 0 | 191 | 0 | 191 | 0 | 191 | 0 |
| Y | 16 to 235 | 235 | 162 | 131 | 112 | 84 | 65 | 35 | 16 |
| Cb | 16 to 240 | 128 | 44 | 156 | 72 | 184 | 100 | 212 | 128 |
| Cr | 16 to 240 | 128 | 142 | 44 | 58 | 198 | 212 | 114 | 128 |
| Luminance (IRE) | | 100 | 66 | 53 | 44 | 31 | 22 | 9 | 0 |
| Chrominance (IRE) | | 0 | 62 | 88 | 82 | 82 | 88 | 62 | 0 |
| Chrominance phase (burst = 135°) | | - | 167° | 283° | 241° | 61° | 103° | 347° | - |
| Chrominance phase (burst = 225°) | | - | 193° | 77° | 120° | 300° | 257° | 13° | - |

Layout results synthesized with the 0.6 um CMOS standard cell library show that the total gate count is estimated at about 42K gates, and confirm stable operation at the target frequency of 30 MHz.

## V. Conclusion

In this paper, we proposed the design of a systolic pipelined digital video encoder for NTSC/PAL/PALplus on a 4:3 screen. We concentrated on four points in designing the encoder. First, for compatibility with wide images, a letter-box converter architecture which performs 4:3 decimation filtering was described. Second, to support both NTSC/PAL and wide screen PALplus modes, we made the sync generator and subcarrier generator modules programmable. Third, we reduced the hardware complexity of the Matrix, quadrature modulator, low pass filter, and interpolator filter modules. Finally, we designed the encoder in a systolic pipelined structure for consistent throughput. We modeled the encoder in Verilog-HDL and verified the HDL models at speeds of 10-15 MPPS.
The design will soon be re-synthesized and fabricated using a 0.35 um process technology supported by Hyundai Electronic. In conclusion, with good performance, the proposed encoder architecture is suitable for VOD set-top boxes, HDTV, etc.

## Acknowledgement

The authors wish to thank Dr. Ki-Soo Hwang and all their colleagues. This research is supported by Hyundai Electronic Industry, Seoul, Korea.

#### References

- [1] P. Liuha, P. Pohjala, P. Vanni, and J. Nieminen, "Implementation of PALplus Decoder with Programmable Video Signal Processor," IEEE Trans. Circuits and Syst. Video Technol., Vol. 5, pp. 429-435, October 1995.
- [2] Keith Jack, Video Demystified: NTSC/PAL Digital Encoding, High Text Interactive, 1995.
- [3] John Watkinson, The D-2 Digital Video Recorder, Focal Press, 1990.
- [4] Zhi-jian (Alex) Mou, "A Study of VLSI Symmetric FIR Filter Structure," Journal of VLSI Signal Processing, Vol. 4, pp. 371-377, 1992.
- [5] Phillip E. Mattison, Practical Digital Video with Programming Examples in C, John Wiley & Sons, Inc., 1994.
- [6] Raytheon Semiconductor, TMC22190 Digital Video Encoder/Layering Engine User's Manual.
- [7] S. Inoue, S. Kageyama, H. Uwabata, and Y. Yasumoto, "Encoding and decoding in the 6-MHz NTSC-compatible widescreen television system," IEEE Trans. Circuits Syst. Video Technol., Vol. 1, pp. 49-58, Mar. 1991.
- [8] S. Y. Kung, VLSI Array Processors, Prentice Hall, 1988.
- [9] S. W. Kwon, H. J. Choi, S. H. Oh, and M. K. Lee, "A Fully Programmable Systolic Pipelined Digital Video Encoder for NTSC/PAL/PALplus Compatibility on a 4:3 Screen," Proc. IEEE ICCE, pp. 236-237, June 1997.
- [10] H. J. Choi, S. W. Kwon, S. H. Oh, M. K. Lee, J. S. Kim, and K. S. Hwang, "A Fully Programmable Systolic Pipelined Digital Video Encoder," ITC-CSCC97, Vol. 1, pp. 161-164, July 1997.
- [11] Seung Hyun Nam, Jong Seob Baek, and Moon Key Lee, "Flexible VLSI architecture of full search motion estimation for video applications," IEEE Trans. Consumer Electronics, Vol. 4, No. 2, pp. 176-184, May 1994.
- [12] N. J. Fliege, Multirate Digital Signal Processing, John Wiley & Sons Ltd., 1994.

Seung-Ho Oh was born in Seoul, Korea, in 1956. He received the B.S. and M.S. degrees in Electronics Engineering from Yonsei University, Seoul, Korea, in 1991 and 1996, respectively. From 1991 to 1994, he was a researcher in the Semiconductor Design Division at the Samsung Electronics Corporation, Kiheung, Korea. Since 1996, he has been pursuing the Ph.D. degree in Electronics Engineering at Yonsei University. He works on the VLSI design of image processing.

Sung-Woo Kwon was born in Seoul, Korea, in 1971. He received the B.S. degree in Electronics Engineering from Yonsei University, Seoul, Korea, in 1994 and is currently working toward his M.S. degree. His research interests include the VLSI design of image processing.

Han-Jun Choi was born in Seoul, Korea, in 1969. He received the B.S. degree in Electronics Engineering from Yonsei University, Seoul, Korea, in 1993 and is currently working toward his M.S. degree. His research interests include the VLSI design of image processing.

Moon-Key Lee was born in Seoul, Korea, in 1941. He received the B.S., M.S., and Doctor of Engineering degrees in Electrical Engineering from Yonsei University, Seoul, Korea, in 1965, 1967, and 1973, respectively. He also received the Ph.D. degree in Electronic Engineering from the University of Oklahoma, Okla., in 1980. He is currently a professor in the Electronic Engineering Department at Yonsei University, Seoul, Korea. From 1980 to 1982, he was Director of the Semiconductor Design Division at the Korea Institute of Electronic Technology (ETRI), Kumi, Korea. He is a founder of the Research Institute of ASIC Design (RIAD), which was established in 1989 and is located at Yonsei University, Seoul, Korea. His current research interests include high performance microprocessor and digital signal processor VLSI design.