# High Speed Pulse-based Flip-Flop with Pseudo MUX-type Scan for Standard Cell Library

Min-su Kim\*,\*\*, Sang-Shin Han\*\*, KyoungKuk Chae\*\*, Chunghee Kim\*\*, Gunok Jung\*\*, Kwang Il Kim\*\*, JinYoung Park\*\*, Youngmin Shin\*\*, Sung-Bae Park\*\*, Young-Hyun Jun\*\*, and Bai-Sun Kong\*

E-mail: min-su.kim@samsung.com

Abstract: This paper presents a high-speed pulse-based flip-flop with pseudo MUX-type scan that is compatible with the conventional master-slave flip-flop with MUX-type scan. The proposed flip-flop was implemented as a standard cell library in Samsung 130nm HS technology. The data-to-output delay and power-delay-product of the proposed flip-flop are reduced by up to 59% and 49%, respectively. Using this flip-flop, an ARM11 softcore achieved a maximum operating speed of 1GHz.

Index Terms: Flip-flop, pulse, scan, standard cell, ARM11

### I. INTRODUCTION

Recently, in high-performance chip design, pulse-based flip-flops [1-2] have been widely used instead of master-slave flip-flops [3] to decrease the data-to-output (D-Q) delay. In the pulse-based flip-flops of [1-2], however, the scan control became complex and incompatible with that of the conventional master-slave flip-flop in order to further enhance the D-Q speed. In this paper, a pulse-based flip-flop with pseudo MUX-type scan that is compatible with the conventional flip-flop is presented, reducing both the D-Q delay and the power-delay-product of the flip-flop.

### II. CIRCUIT DESIGN

The conventional master-slave flip-flop with MUX-type scan is shown in Fig. 1. The master and slave latches consist of G1, G2, and G6, and G3, G7, and G8, respectively. G1 is a tri-state complex gate that implements the MUX-type scan. The flip-flop stores the state of the normal input (D) when the scan enable (SE) is low and stores the state of the scan input (SI) when SE is high.

Fig. 1. The conventional master-slave flip-flop with MUX-type scan.

Fig. 2.
The proposed pulse-based flip-flop with MUX-type scan for small layout area.

Manuscript received Apr. 12, 2006; revised May 27, 2006.

<sup>\*</sup> Sungkyunkwan University, 300 Chunchun-dong, Jangan-gu, Suwon, Kyunggi-do, 440-746, Korea

<sup>\*\*</sup> Samsung Electronics, San #24, Nongseo-ri, Giheung-eup, Yongin, Kyunggi-do, 449-711, Korea

### 1. Pulse-based Flip-flop with Muxed Scan for Small Area

Fig. 2 shows the proposed pulse-based flip-flop, whose scan uses the same MUX type as Fig. 1. The flip-flop is composed of scan logic (G1-G2), a latch (G3-G6), and a pulse generator (G7-G9, N1-N3, and P1). When the clock (CK) is low, the drain node of N1 is precharged high by P1. When CK is pulled up, the pulse signals Pb and P make falling and rising transitions, respectively. The rising transition of P turns on N1, which pulls down the drain node of N1. After these three gate delays, G7 and G8 make rising and falling transitions, respectively, and pulses are formed at nodes Pb and P. For the remainder of the high phase of CK, the drain node of N1 would float, since both N1 and P1 are turned off; G9 and N2-N3 preserve the level of this node during that period. The input data passed through G1 is latched by the asserted pulse signal at the rising edge of CK.

Many designs [1-2] remove the scan logic from the D-Q path to reduce the D-Q delay, which makes the scan control complicated and incompatible with the conventional scan logic. As shown in Fig. 2, however, the scan control of the proposed flip-flop is compatible with that of Fig. 1, and the cell area is reduced because of the simple structure. Although inserting the MUX logic increases the D-Q delay somewhat compared with [1-2], the D-Q delay is still shorter than that of Fig. 1 by the latch delay. The proposed flip-flop can therefore be used extensively to reduce the chip area and the cycle time, except on the most critical paths.

### 2. Pulse-based Flip-flop with Pseudo-muxed Scan for High Speed Operation

The proposed pulse-based flip-flop with pseudo MUX-type scan for high speed operation is shown in Fig. 3. When the scan enable (SE) is low, N2 is turned off and the upper PMOS of G12 is on. This situation is similar to that of Fig. 2. Consequently, the pulse generator (G10-G13 and N1-N2) generates a pulse at the rising edge of the clock and the latch (G1-G4) stores the input data (D). When SE is high, the drain node of N2 is pulled down and G10 no longer generates the pulse, irrespective of CK. Accordingly, G1 is turned off and G3 is on. Meanwhile, the scan clock (SC and SCb) is asserted by G14 and G15, and the flip-flop operates as a master-slave flip-flop for the scan input (SI). That is, the master and slave latches consist of G6-G9 and G2-G5, respectively. Conversely, when SE is low, SC is driven low by G14-G15 and the transmission gate (G5) is turned off, so the master latch (G6-G9) has no influence on the slave latch (G1-G4). The proposed flip-flop therefore stores the value of D or SI depending on the state of SE.

This behavior is similar to that of a MUX, but SE must be switched only while CK is low. A transition of SE while CK is high causes the scan clock to be synchronized with SE instead of CK. This problem could be solved by adding a latch that stores the SE signal to the proposed flip-flop, but we exclude such a latch because it would increase the cell area too much. Instead, we either use a single latch at the primary input of SE for the whole chip or prevent the primary input of SE from changing while CK is high. Since these methods add no overhead to the design and the constraint need not be considered in the synthesis or place-and-route flows, designers can control the scan of the proposed flip-flop in the same way as that of the MUX-type scan flip-flop.

**Fig.
3.** The proposed pulse-based flip-flop with pseudo MUX-type scan for high speed operation.

### III. SIMULATION RESULTS

SPICE simulation was performed using a 130nm Samsung HS CMOS process to compare the proposed flip-flops with the conventional master-slave flip-flop. The worst process corner was used, and the operating temperature and supply voltage were 125 ℃ and 1.05 V, respectively. The power was simulated with an input-data switching activity of 25%.

The simulation results of the conventional master-slave flip-flop and the proposed flip-flops are shown in Table 1, where "Conventional" is the conventional master-slave flip-flop of Fig. 1, "Proposed 1" is the proposed flip-flop of Fig. 2, and "Proposed 2" is the proposed flip-flop of Fig. 3. In addition to these flip-flops, versions with reset and set modes and with driving strengths of D1, D2, and D4 were simulated. Table 1 (a) and (b) show the simulation results for the flip-flops without and with reset mode, respectively, for the driving strength of D2. Table 1 (c) shows the average values over the various modes and driving strengths.

**Table 1.** Simulation results of the flip-flops (a) without reset mode and (b) with reset mode for the driving strength of D2, and (c) average values of the flip-flops with various modes and driving strengths.

| F/F | D-Q | Setup | Hold | Power | Area |
|--------------|-------|--------|-------|--------|----------------------|
| Conventional | 459ps | 156ps | -61ps | 18.9μW | 45.00μm<sup>2</sup> |
| Proposed 1 | 244ps | -23ps | 166ps | 25.8μW | 40.50μm<sup>2</sup> |
| Proposed 2 | 190ps | -103ps | 244ps | 24.2μW | 56.25μm<sup>2</sup> |

(a)

| F/F with reset | D-Q | Setup | Hold | Power | Area |
|----------------|-------|--------|-------|--------|----------------------|
| Conventional | 461ps | 155ps | -96ps | 19.7μW | 54.00μm<sup>2</sup> |
| Proposed 1 | 244ps | -60ps | 189ps | 26.4μW | 47.25μm<sup>2</sup> |
| Proposed 2 | 192ps | -100ps | 242ps | 25.8μW | 65.25μm<sup>2</sup> |

(b)

| F/F | D-Q | Setup | Hold | Power | Area |
|--------------|-------|--------|-------|--------|----------------------|
| Conventional | 478ps | 166ps | -65ps | 20.9μW | 51.00μm<sup>2</sup> |
| Proposed 1 | 254ps | -53ps | 181ps | 27.5μW | 47.50μm<sup>2</sup> |
| Proposed 2 | 197ps | -123ps | 263ps | 25.8μW | 64.50μm<sup>2</sup> |

(c)

While the clock-to-output delays of all flip-flops were similar, the setup times of the proposed flip-flops were reduced by 220ps to 290ps, so their D-Q delays decreased by 47% to 59%. Although the hold times of the proposed flip-flops increased by 250ps to 330ps because of the pulse width and the reduction of the setup time, the aperture time, which is the sum of the setup and hold times, increased by at most about 40ps. The proposed flip-flops consumed 23% to 31% more power. The cell area of Proposed 1 decreased by 7%, and the cell area of Proposed 2 increased by 26%.

**Table 2.** Comparison of data-to-output delay, power, and power-delay-product.

| F/F | D-Q | Power | PDP | Normalized PDP |
|--------------|-------|--------|--------|----------------|
| Conventional | 478ps | 20.9μW | 9.99fJ | 1.00 |
| Proposed 1 | 254ps | 27.5μW | 6.99fJ | 0.70 |
| Proposed 2 | 197ps | 25.8μW | 5.08fJ | 0.51 |

**Table 3.** Synthesis example

| MainTLB | Cycle Time | Gate Count |
|--------------|------------|------------|
| Conventional | 2.1ns | 39,217 |
| Proposed | 1.9ns | 39,529 |

The power-delay-products (PDP), which are the products of the powers and D-Q delays of the conventional and proposed flip-flops, are shown in Table 2, normalized to that of the conventional flip-flop.
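The derived figures discussed in this section can be re-checked from the averages in Table 1 (c). The following is a minimal sketch rather than part of the original work: the dictionary layout and the function names `derived_metrics` and `percent_change` are hypothetical, and the clock-to-output delay is estimated under the assumption that the D-Q delay equals the setup time plus the clock-to-output delay.

```python
# Average values copied from Table 1 (c).
# Units: times in ps, power in uW, area in um^2.
TABLE_1C = {
    "Conventional": {"dq": 478, "setup": 166, "hold": -65, "power": 20.9, "area": 51.00},
    "Proposed 1": {"dq": 254, "setup": -53, "hold": 181, "power": 27.5, "area": 47.50},
    "Proposed 2": {"dq": 197, "setup": -123, "hold": 263, "power": 25.8, "area": 64.50},
}


def derived_metrics(row):
    """Return quantities derived from one row of Table 1 (c)."""
    return {
        # Assumption: D-Q delay = setup time + clock-to-output delay.
        "clk_to_q_ps": row["dq"] - row["setup"],
        # Aperture time is the sum of the setup and hold times.
        "aperture_ps": row["setup"] + row["hold"],
        # PDP = power * D-Q delay; 1 ps * 1 uW = 1e-18 J = 1e-3 fJ.
        "pdp_fj": row["dq"] * row["power"] * 1e-3,
    }


def percent_change(new, ref):
    """Signed change of `new` relative to `ref`, in percent."""
    return 100.0 * (new - ref) / ref


if __name__ == "__main__":
    ref_row = TABLE_1C["Conventional"]
    ref = derived_metrics(ref_row)
    for name, row in TABLE_1C.items():
        m = derived_metrics(row)
        print(
            f"{name}: "
            f"clk-to-Q ~ {m['clk_to_q_ps']} ps, "
            f"aperture = {m['aperture_ps']} ps, "
            f"PDP = {m['pdp_fj']:.2f} fJ "
            f"(normalized {m['pdp_fj'] / ref['pdp_fj']:.2f}), "
            f"D-Q change = {percent_change(row['dq'], ref_row['dq']):+.0f}%, "
            f"power change = {percent_change(row['power'], ref_row['power']):+.0f}%, "
            f"area change = {percent_change(row['area'], ref_row['area']):+.0f}%"
        )
```

Under the stated assumption, the estimated clock-to-output delays of the three flip-flops differ by only a few picoseconds, which is consistent with the observation that the D-Q improvement comes from the reduced setup time.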
The PDPs of the proposed flip-flops decreased by 30% to 49% owing to the reduction of the D-Q delay. These results show that the proposed flip-flops are suitable for high-speed and low-power design.

We built a standard cell library containing the proposed flip-flops and applied it to ARM11. The synthesis result for the MainTLB block of the ARM11 softcore is summarized in Table 3. The design using the proposed flip-flops improved the cycle time by about 10% compared with the design using the conventional flip-flops. The gate counts of the two designs are similar because the proposed flip-flop with MUX-type scan was used extensively and the one with pseudo MUX-type scan was used only on the critical path. The ARM11 softcore using the proposed flip-flops operated above 1GHz at a 1.35V operating voltage and the FF process corner.

### IV. CONCLUSIONS

In this paper, we proposed pulse-based flip-flops with MUX-type scan and pseudo MUX-type scan for a standard cell library and compared them with the conventional master-slave flip-flop with MUX-type scan. The proposed flip-flops were applied to an ARM11 softcore, and stable and enhanced results were obtained. The D-Q delay and power-delay-product of the proposed flip-flops were reduced by 47% to 59% and 30% to 49%, respectively. The cell area decreased by 7% for Proposed 1 and increased by 26% for Proposed 2. In the synthesis example, the total gate count was similar but the cycle time decreased by 10%.

#### REFERENCES

- [1] V. Zyuban, "Optimization of scannable latches for low energy," *IEEE Trans. VLSI Systems*, vol. 11, no. 5, pp. 778-788, Oct. 2003.
- [2] F. Ricci et al., "A 1.5 GHz 90 nm embedded microprocessor core," *International Symposium on VLSI Circuits*, pp. 12-15, 16-18 June 2005.
- [3] J.P. Hurst and N. Kanopoulos, "Flip-flop sharing in standard scan path to enhance delay fault testing of sequential circuits," *Proceedings of the Fourth Asian Test Symposium*, pp. 346-352, 23-24 Nov. 1995.

**Min-su Kim** was born in Daegu, Korea, in 1973. He received the B.S. degree from Kyungpook National University, Taegu, in 1996, and the M.S. degree from Pohang University of Science and Technology (POSTECH), Pohang, Korea, in 1998, both in electronic and electrical engineering. In 1998, he joined Samsung Electronics Company, Korea, where he is involved in the design of ARM cores and standard cell libraries using 130nm, 90nm, and 65nm technologies. He is also currently working toward the Ph.D. degree in the Mixed-Signal Laboratory, Samsung Institute of Technology, which is a joint scientific research course with Sungkyunkwan University, Korea. His research interests include high-speed and low-power core designs and circuits.

**Sang-Shin Han** was born in Daejeon, Korea, in 1972. He received the B.S. degree in electronic engineering from Ulsan University in 1997. From 1997 to 2002, he was with a DRAM design group at Hynix Semiconductor, where he designed a family of SDRAMs. In 2003, he joined Samsung Electronics, Korea, and has been involved in the development of ARM CPUs.

**KyoungKuk Chae** received the B.S. degree in electrical engineering from Ajou University, Korea, in 1998. From 1997 to 2002, he worked as a layout engineer at Hynix Semiconductor Inc. He joined the Processor Architecture Lab. of Samsung Electronics Co., Ltd. in 2003. He is now a Senior Engineer at Samsung Electronics and, since 2006, has been interested in P&R and layout for ARM cores.

**Chunghee Kim** received the B.S. and M.S. degrees in electronics engineering from Ajou University in 2000 and 2002, respectively, and is currently an engineer at the SoC R&D Center of Samsung Electronics, working on library and custom circuit design.

**Gunok Jung** received the B.S. degree in electrical engineering from Hanyang University, Korea, in 1991, and the M.S. and Ph.D.
degrees from the University of Minnesota, U.S.A., in 1998 and 2002, respectively. In 2002, he joined the Processor Architecture Lab., SOC R&D Center, Samsung Electronics, Korea, where he has been designing ARM-related CPU cores.

**Kwang Il Kim** was born in Kyungki-Do, Korea, on January 7, 1966. He received the B.S. degree in electrical engineering from Sungkyunkwan University, Seoul, Korea, in 1991. He joined Samsung Electronics Company, Kyungki-do, Korea, in 1991, where he has worked on the circuit design of high-speed SRAMs, 0.35μm and 0.25μm Alpha microprocessors, and a 0.13μm ARM CPU. He has recently been working on the development of 90nm and 65nm low-power circuit technology.

**JinYoung Park** received the B.S. degree in computer engineering from Kyungpook National University in 1998. He currently works as a senior engineer at Samsung Electronics Inc. He is implementing a next-generation high-performance ARM processor and is interested in SOC design and optimization.

**Youngmin Shin** received the B.S. degree in electrical engineering from Yonsei University, Seoul, Korea, in 1988. He joined Samsung Electronics Company, Kyungki-do, Korea, in 1988, where he has worked on the circuit design of microprocessors, including 0.35μm and 0.25μm Alpha microprocessors at Digital, USA, and 0.13μm ARM10/11 RISC processors. He has recently been working on the development of 90nm and 65nm high-speed ARM CPUs and is in charge of the advanced processor implementation team.

**Sung-Bae Park** was born in Seoul, Korea, on August 12, 1958. He received the B.S. and M.S. degrees in electronics engineering from Korea University, Seoul, Korea, in 1981 and 1989, respectively.
He joined ETRI (Electronics and Telecommunication Research Institute), a government-funded national research center in Daeduk, Korea, in 1982, where he worked on the design and development of a 5μm NMOS 8048 8-bit single-chip microcomputer, a 3μm CMOS 80C48 8-bit single-chip microcomputer, an HC68000 16-bit microprocessor, a 1.5μm CMOS M640 64-bit RISC microprocessor, and a 1μm CMOS SiART'90 32-bit RISC microprocessor. He also worked on microprocessor architecture development for the ICS (Intelligent Computer System) based on MP (Multi Processor) and MPP (Massively Parallel Processor). In 1991, he joined Samsung Electronics Company, Kyungki-do, Korea, where he worked on the design of a 0.5μm 100MHz CMOS PA-RISC microprocessor. He was responsible for SBDC (Samsung Boston Design Center) in Boston, MA, USA, from 1996 to 1997 for joint development with DEC (Compaq and now HP) of the 667MHz 21164PC and 600MHz 21264. After returning to Korea in 1998, he focused on Alpha microprocessor performance improvements through trade-offs among 0.35μm, 0.25μm, and 0.18μm bulk, SOI, Cu, and flip-chip technologies as well as microarchitectural tuning. He is now a Vice President of Samsung Electronics and, since 2001, has been leading the high-performance ARM CPU design team.

**Young-Hyun Jun** received the M.S. and Ph.D. degrees in electrical engineering from Korea Advanced Institute of Science and Technology, Seoul, in 1986 and 1989, respectively. In 1999, he joined Samsung Electronics Co., Ltd., where he has been involved in the development of high-speed DRAMs. He is currently a senior vice president responsible for the development of high-speed and next-generation DRAMs.

**Bai-Sun Kong** received the B.S. degree in electronics engineering from Yonsei University, Seoul, Korea, in 1990, and the M.S. and Ph.D. degrees in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea, in 1992 and 1996, respectively.
From 1996 to 1999, he was with LG Semicon (currently Hynix Semiconductor), Seoul, Korea, as a senior design engineer, where he worked on the design of high-density and high-bandwidth DRAMs. In 2000, he joined the faculty of Hankuk Aviation University, Goyang, Korea, as an assistant professor at the School of Electronics Telecommunication and Computer Engineering. In 2005, he moved to Sungkyunkwan University, Suwon, Korea, where he is currently an associate professor at the School of Information and Communication Engineering. His research interests include high-performance microprocessor/memory architecture and circuit designs, high-speed I/O interface design, and IC designs for low-power and/or high-speed applications.