# A set of self-timed latches for high-speed VLSI

Bai-Sun Kong and Young-Hyun Jun

LG Semicon Corp., 16, Woomyeon-dong, Seocho-gu, Seoul, 137-140, Korea. E-mail: bskong@lgsemicon.co.kr

Abstract - In this paper, a set of novel self-timed latches is introduced and analyzed. These latches have no back-to-back connection as in the conventional self-timed latch, and both inverting and noninverting outputs are evaluated simultaneously, leading to higher operating frequencies. The power consumption of these latches is also comparable to or less than that of conventional circuits. A novel type of cross-coupled inverter used in the proposed circuits implements static operation without signal fighting with the main driver during signal transition. The proposed latches are tested using a 0.6µm triple-poly triple-metal n-well CMOS technology. The result indicates that the proposed Active-Low Self-Timed Latch (ALSTL) improves speed by 14-34% over the conventional NAND SR latch, while in the Active-High Self-Timed Latch (AHSTL) the improvements are 15-35%, with less power, as compared with the corresponding NOR SR latch. These novel latches have been successfully implemented in a high-speed synchronous DRAM (SDRAM).

#### I. Introduction

Latches and Flip-Flops (FF's), which are usually used as storage elements to hold data, are fundamental elements in VLSI systems [1]. Most latches and FF's are operated synchronously by a clock signal which tells them when data inputs are considered to be valid. In this case, the transition of the clock signal must occur at the same moment at all synchronizing points of a system where storage elements are placed. In a real situation, however, clock signals are routed along several different wiring paths with different loads and thus may reach each point with different delays. This variation in timing delay during clock distribution results in clock skew and can cause serious problems such as false output latching.
In a multiphase clocking scheme, a nonoverlapping period is introduced as a margin to prevent problems caused by the skew between clock phases. Even in a single-phase clocking scheme, although there is no skew between phases, similar problems can occur due to clock delay between remotely located storage elements [2]. This delay must be accommodated by lengthening the clock period to provide sufficient timing margin. In a moderate frequency range, the portion taken by this margin is negligibly small as compared to the clock period. But as the clock frequency increases, the margin can no longer be ignored because it occupies a considerable portion of a given clock period and makes it difficult to increase clock speed. Moreover, distributing the clock signal uniformly throughout a system with tight skew control increases design cost.

On the other hand, a self-timed latch needs no explicit timing control. Control information is encoded into the input itself and becomes available as soon as an input change occurs. Therefore, no clock skew-related problems occur and clock distribution cost is avoided. The simplest form of self-timed latch is the conventional Set-Reset (SR) latch shown in Fig. 1, which is popularly used in synchronous systems to improve performance. For example, the PHIMOS [3] technique combines the SR latch with Differential Cascode Voltage Switch (DCVS) logic [4] to implement a race-free single-phase clocked pipeline structure. This latch is also used as the slave stage of a high-performance differential edge-triggered FF to reduce latency and clock load [5].

However, the conventional SR latch has several disadvantages due to the back-to-back connection between logic gates. A critical path established through this connection makes the latch inherently slow with high power consumption. In addition, serial output evaluation makes the inverting and noninverting output transitions unsymmetrical to each other.
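
The serial output evaluation mentioned above can be illustrated with a small unit-delay sketch of a cross-coupled NAND SR latch. This is an idealized model, not the circuit of Fig. 1: it assumes every gate has the same one-step delay and ignores the output inverters, and the names `nand`, `simulate_nand_sr`, and `settle_step` are hypothetical helpers introduced only for this illustration.

```python
def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 0 if (a and b) else 1


def simulate_nand_sr(sb, rb, q, qb, steps=4):
    """Unit-delay model of a cross-coupled NAND latch (assumed idealization).

    Both gates update in the same step from the previous step's outputs,
    so each step corresponds to one gate delay.
    """
    trace = [(q, qb)]
    for _ in range(steps):
        q, qb = nand(sb, qb), nand(rb, q)
        trace.append((q, qb))
    return trace


def settle_step(trace, idx):
    """First step from which output `idx` keeps its final value."""
    final = trace[-1][idx]
    for step in range(len(trace)):
        if all(state[idx] == final for state in trace[step:]):
            return step


# Start in the reset state (Q=0, Qb=1) and apply an active-low set pulse.
trace = simulate_nand_sr(sb=0, rb=1, q=0, qb=1)
print("Q settles after", settle_step(trace, 0), "gate delay(s)")
print("Qb settles after", settle_step(trace, 1), "gate delay(s)")
```

In this model Q changes one gate delay after the input, while Qb has to wait for Q to propagate through the second gate, which is the asymmetry between the two outputs that the text attributes to the back-to-back connection.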
To overcome these drawbacks, we introduce a set of novel self-timed latches in this paper. Section II explains the circuit structure and operating principle of the proposed circuits. In Section III, simulation results are presented with a discussion of the advantages of the novel circuits over the conventional ones. Finally, we draw our conclusion in Section IV.

#### II. Circuit Architecture

The schematic diagrams for the proposed self-timed latches are depicted in Fig. 2. The circuit shown in Fig. 2-(a), which is called the Active-Low Self-Timed Latch (ALSTL), consists of two input inverters, two main drivers, and two cross-coupled inverters. The input inverters, INV1 and INV2, are used to generate Sb and Rb for use as inputs to the main drivers. Transistors MP1, MP2, MN1, and MN2 form the main drivers, in which the p-type transistors are connected to the external inputs, S and R, while the n-type transistors are connected to Sb and Rb. The source terminals of MN1 and MN2 are connected to Sb and Rb. The cross-coupled inverters, constructed of MP3, MP4, MN3, and MN4, receive the external inputs and the outputs of the input inverters through their source terminals. The major function of these inverters is to compensate for leakage current at the output nodes; thus, the transistor width of these inverters can be smaller than that of the other transistors in the circuit. These inverters preserve static logic states when no inputs are applied, and cause no signal fighting during logic transition, as explained later in this section.

The circuit structure described above, with predefined input combinations, operates safely as a self-timed latch. The ALSTL operates with both S and R inputs normally at 'one', unless the state of the circuit has to be changed. Assume that the initial states of Q and Qb are low and high, respectively, with both S and R high. Then all the transistors in the main drivers and the transistors MN3 and MP4 are off, and transistors MP3 and MN4 are on.
Thus, the output nodes are connected to the respective supply lines, preserving the output logic states. When a 'zero' is applied to the input node S, transistors MP1 and MN2 turn on, and this makes Q high and Qb low regardless of their previous states. Once S and Sb change their values, transistors MP3 and MN4 no longer drive the old values because their source nodes change their values. Instead, they drive the new logic values, and thus help speed up the logic transition by providing another driving path to the output nodes. On the subsequent return of the input, both MP1 and MN2 turn off and the new output values are sustained by the cross-coupled inverters. Similarly, applying 'zero' to the R input makes Q low and Qb high, and these values do not change after the input returns to 'one'. When both inputs go low, all the transistors in the main drivers become active, and Vdd is driven to all the source terminals of these transistors, making both outputs high. This corresponds to the 'set' state of the latch.

The circuit structure of the Active-High Self-Timed Latch (AHSTL) shown in Fig. 2-(b) is similar to that of the ALSTL except for some connections. Namely, the external inputs, S and R, are connected to the n-type transistors of the main drivers, while the p-type transistors receive the Sb and Rb signals as inputs. The operating principle of the circuit is identical to that of the ALSTL except for the input signal polarity. That is, the inputs of the AHSTL normally stay at 'zero', and any high pulse on S or R makes the outputs change their states. A simultaneous high value on both inputs produces the 'set' state of the latch, as in the case of the ALSTL.

The circuits shown in Fig. 3 improve speed over the circuits in Fig. 2 by changing some connections in the main drivers. Namely, the source terminals of MN1 and MN2 are connected to ground instead of being connected to internal signals. This modification eliminates the stacked transistor connection in the main driver and leads to higher speed.
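
The input/output behaviour described above for the Type I ALSTL of Fig. 2-(a) can be summarized in a short behavioural sketch. It models only the logic-level outcome stated in the text (set on a low S, reset on a low R, hold when both are high, both outputs high when both are low) and says nothing about transistor-level timing; the function names `alstl_type1` and `run_sequence` are hypothetical and introduced only for illustration.

```python
def alstl_type1(s, r, q, qb):
    """Next (Q, Qb) of the active-low Type I latch, as described in the text.

    Inputs idle at 1; a 0 on S or R is the active level.
    """
    if s == 0 and r == 0:
        # Both inputs low: the text states that both outputs are driven high.
        return 1, 1
    if s == 0:
        return 1, 0   # set: Q high, Qb low
    if r == 0:
        return 0, 1   # reset: Q low, Qb high
    return q, qb      # both inputs high: state held by the cross-coupled inverters


def run_sequence(inputs, q=0, qb=1):
    """Apply a list of (S, R) pairs and collect the resulting (Q, Qb) states."""
    states = []
    for s, r in inputs:
        q, qb = alstl_type1(s, r, q, qb)
        states.append((q, qb))
    return states


# idle, set pulse, release, reset pulse, release, both inputs low
states = run_sequence([(1, 1), (0, 1), (1, 1), (1, 0), (1, 1), (0, 0)])
print(states)
```

According to the text, the AHSTL follows the same pattern with the input polarity inverted, that is, inputs idling at 'zero' and active-high pulses.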
In the circuits of Fig. 3, however, simultaneous active pulses on both the S and R inputs must not be allowed. If this happens, all the transistors in the main drivers turn on and a substantial short-circuit current flows, resulting in increased power consumption. Therefore, when these latches are used, applying active signals to both inputs simultaneously, even due to noise, must be prohibited.

The circuits shown in Fig. 4 implement the latch with a reduced number of transistors. In these circuits, the input inverters are eliminated, and the connection and type of some devices in the main drivers are changed. In the circuit shown in Fig. 4-(a), MN1 and MN2 receive the inputs through their sources, with their gate terminals connected to the outputs. The sources of MN3 and MN4 are connected to ground instead of to the inputs. The circuit shown in Fig. 4-(b) uses n-channel devices as the pull-up transistors in the main drivers. The transistors MP3 and MP4 in the cross-coupled inverters are connected to the power supply. The operation of these circuits is similar to that of the respective circuits in Fig. 2. Simultaneous low or high pulses must not be allowed, for the same reason as in the circuits in Fig. 3: the behavior of the circuit is unsafe under this input combination.

The proposed self-timed latches described above have the advantages provided by the elimination of the clock signal, such as immunity to clock skew problems and no global clock distribution cost, as discussed in Section I. Moreover, they have additional merits over conventional self-timed latches. First of all, the speed of the proposed circuits is superior to that of conventional circuits due to the novel structure, which has no speed-limiting back-to-back connection. In conventional self-timed latches, serial logic evaluation through the logic gates connected back-to-back makes the speed inherently slow. Second, these latches have no crossover current in the main driver during signal transition.
While one transistor in a main driver is being turned on, the other transistor in the same main driver is always in the cut-off state, as can be verified from the operating principle of the circuits described above. Thus, the crossover current during transition is totally eliminated, and all the current provided is used to charge or discharge the output node. This leads to an improvement in operating frequency and a reduction in power consumption.

Another factor contributing to high speed and low power comes from the behavior of the proposed cross-coupled inverter. A conventional cross-coupled inverter used as a means to obtain static operation causes signal fighting against the old value during logic transition, leading to degradation in speed and power. This can be somewhat relieved by reducing the W/L ratio of the devices, but this can increase the parasitic component and the circuit still tends to be slow. On the other hand, the cross-coupled inverter in the proposed self-timed latch causes no signal fighting during signal transition and helps speed up the signal transition.

### III. Simulation Comparison and Discussion

To compare the performance of the proposed latches with the conventional ones, these latches are designed using a 0.6µm triple-poly triple-metal n-well CMOS technology, and a high-speed 16M-bit synchronous DRAM (SDRAM) with these latches has been fabricated as a test vehicle. The simulation is performed at a power supply of 3.3V, and the waveforms of the circuits are illustrated in Fig. 5. For a precise comparison, the inverters at the output of the conventional latches in Fig. 1 and the main drivers of the proposed circuits are made to have the same size. Furthermore, each input is generated by two stages of input buffer, with the first stage being the same size for all cases. The size of the second stage is optimized according to the amount of input capacitance of each circuit. The power consumed by these input buffers is included in the comparison.
Table I summarizes the comparison results of the conventional and the proposed latches. It lists the number of devices, the total gate width of the transistors, the worst-case propagation delay, and the average power consumption. Propagation delay is measured with a 200fF load per output node, while power consumption is measured with no load capacitance when the input toggles at a frequency of 100MHz. In terms of device count, the novel circuits require the same number of devices or fewer. Furthermore, the total gate width of the proposed circuits is smaller in almost all cases. The gate width of ALSTL-I is comparable to that of the conventional circuit because of its larger input inverters. As far as performance is concerned, the ALSTL latches improve speed by 14-34%, and their power consumption is comparable to or less than that of the conventional NAND latch. Meanwhile, the speed improvement of the AHSTL latches is 15-35% as compared to the corresponding NOR latch. Power consumption in this case is also smaller.

To investigate the delay variation with output load, Fig. 6 plots the propagation delay of the novel and conventional circuits as the load capacitance changes from 100fF to 800fF. According to the simulation result, type III latches are the best when the load is small, but their performance degrades as the load increases. Type I and type II latches have lower latency than their conventional counterparts over the whole range of comparison. Among them, type II latches show the best overall performance. Thus, type II is recommended for use in high-speed circuit design because of its superior speed and power figures. In this case, the prohibited input combinations must be completely prevented from happening. Type III latches are useful only when area and power are critical and the load is small.

### IV. Conclusion

In this paper, a set of novel self-timed latches is introduced and analyzed.
These latches have no back-to-back connection of the kind that exists in conventional latches, and both inverting and noninverting outputs are evaluated simultaneously, leading to higher operating frequencies. A new type of cross-coupled inverter used in the proposed latches implements static operation without signal fighting with the main driver during signal transition. The power consumption of these latches is also comparable to or less than that of conventional circuits. The performance of the novel latches is verified by SPICE simulation and through a successful implementation of a high-speed SDRAM incorporating these latches.

## References

- [1] C. Mead and L. Conway, *Introduction to VLSI Systems*. Reading, MA: Addison-Wesley, 1980.
- [2] M. Afghahi and C. Svensson, "A unified single-phase clocking scheme for VLSI systems," *IEEE J. Solid-State Circuits*, vol. 25, no. 1, pp. 225-232, Feb. 1990.
- [3] D. Renshaw and C. H. Lau, "Race-free clocking of CMOS pipelines using a single global clock," *IEEE J. Solid-State Circuits*, vol. 25, no. 3, pp. 766-769, June 1990.
- [4] L. G. Heller, W. R. Griffin, J. W. Davis, and N. G. Thoma, "Cascode voltage switch logic: A differential CMOS logic family," in ISSCC Dig. Tech. Papers, pp. 16-17, Feb. 1984.
- [5] J. Montanaro et al., "A 160MHz 32b 0.5W CMOS RISC microprocessor," in ISSCC Dig. Tech. Papers, pp. 214-215, Feb. 1996.

Fig. 1 Set-Reset Latch as Self-Timed Latch (STL): (a) NAND-STL, (b) NOR-STL.

Fig. 2 Novel Self-Timed Latch (type I): (a) Active-Low Self-Timed Latch (ALSTL-I), (b) Active-High Self-Timed Latch (AHSTL-I).

Fig. 3 Novel Self-Timed Latch (type II): (a) Active-Low Self-Timed Latch (ALSTL-II), (b) Active-High Self-Timed Latch (AHSTL-II).

Fig. 4 Novel Self-Timed Latch (type III): (a) Active-Low Self-Timed Latch (ALSTL-III), (b) Active-High Self-Timed Latch (AHSTL-III).

Fig. 5 Simulation waveforms of STL circuits: the waveforms on the left represent NAND-STL and ALSTL-II, while the waveforms on the right represent NOR-STL and AHSTL-II.

(a)

| | Device Count | Total Gate Width (μm) | Delay (ns) | Power (μW) |
|-----------|--------------|-----------------------|------------|------------|
| NAND-STL | 12 | 68 | 0.93 | 213 |
| ALSTL-I | 12 | 68 | 0.80 | 210 |
| ALSTL-II | 12 | 50 | 0.61 | 166 |
| ALSTL-III | 8 | 40 | 0.65 | 187 |

(b)

| | Device Count | Total Gate Width (μm) | Delay (ns) | Power (μW) |
|-----------|--------------|-----------------------|------------|------------|
| NOR-STL | 12 | 76 | 0.94 | 230 |
| AHSTL-I | 12 | 48 | 0.80 | 220 |
| AHSTL-II | 12 | 52 | 0.60 | 173 |
| AHSTL-III | 8 | 40 | 0.65 | 148 |

Table I Summary of comparison for STL circuits: (a) NAND-STL vs. ALSTL, (b) NOR-STL vs. AHSTL.

Fig. 6 Simulation comparison of STL circuits with varying load capacitance: (a) NAND-STL vs. ALSTL, (b) NOR-STL vs. AHSTL.
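
As a cross-check on the speed figures quoted in the text, the relative delay improvement of each proposed latch over its conventional counterpart can be recomputed from the delay column of Table I. The sketch below uses only the delay values listed in the table; the helper name `improvement_pct` and the dictionary layout are assumptions made for this illustration.

```python
# Worst-case propagation delays in ns, copied from Table I.
delays_ns = {
    "NAND-STL": 0.93, "ALSTL-I": 0.80, "ALSTL-II": 0.61, "ALSTL-III": 0.65,
    "NOR-STL": 0.94, "AHSTL-I": 0.80, "AHSTL-II": 0.60, "AHSTL-III": 0.65,
}

# Each proposed latch is compared with the conventional latch of the same polarity.
reference = {"ALSTL": "NAND-STL", "AHSTL": "NOR-STL"}


def improvement_pct(ref_delay, new_delay):
    """Delay reduction relative to the reference, in percent."""
    return 100.0 * (ref_delay - new_delay) / ref_delay


improvements = {}
for name, delay in delays_ns.items():
    family = name.split("-")[0]          # "ALSTL" or "AHSTL" for proposed latches
    if family in reference:
        ref_delay = delays_ns[reference[family]]
        improvements[name] = improvement_pct(ref_delay, delay)

for name, pct in improvements.items():
    print(f"{name}: {pct:.1f}% faster than {reference[name.split('-')[0]]}")
```

The printed percentages can be compared directly with the improvement ranges stated in the abstract and in Section III; the same helper could be applied to the power column if a power comparison is wanted.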