# Scan Cell Grouping Algorithm for Low Power Design

## Insoo Kim<sup>†</sup> and Hyoung Bok Min\*

**Abstract** – The increasing size of very large scale integration (VLSI) circuits, high transistor density, and the popularity of low-power circuit and system design are making the minimization of power dissipation an important issue in VLSI design. Test power dissipation is exceedingly high in scan-based environments, wherein scan chain transitions during the shift of test data propagate into the circuit and cause significant levels of unnecessary switching. Modifying the scan chain or the scan cells reduces this power dissipation. The ETC algorithm of previous work has weak points, so we propose a new algorithm, named RE\_ETC. The proposed modifications in the scan chain consist of Exclusive-OR gate insertion and scan cell reordering, leading to significant power reductions with no area or performance penalty. Experimental results confirm considerable reductions in scan chain transitions. We show that the modified scan cells improve test efficiency and reduce power dissipation.

**Keywords:** Cell grouping, ETC, Power dissipation, RE\_ETC, Scan cells, Switching activity

## 1. Introduction

The scan technique is the most widely used design-for-testability (DFT) technique [1, 5, 8]. It replaces sequential non-scan cells with scan cells of a chosen scan style. This transformation enables the sequential scan cells to be connected as a shift register in test mode. In a full scan methodology, all the sequential cells in the netlist are replaced by scan cells [1, 5, 8]. Scan cells have two modes of operation: normal and test. In normal mode, a scan cell behaves exactly like the sequential non-scan cell it replaces. In test mode, the scan cells are linked in the form of a shift register [1, 5, 8].
System registers, when activated during test mode, can be in states that are not reached in normal mode. As a result, state transitions that are not possible during normal mode are often possible during test mode. Thus, during testing, sequences that cause much higher power dissipation than those applied during normal mode are often applied to a circuit [1, 5, 6, 8, 18, 19].

During test mode, filling the scan chain with test data requires shifting the bits one by one into each chain, which increases switching activity in the flip-flops [1, 3, 5]. Excessive switching activity occurs during test mode because all the flip-flops of the circuit are active; during the shift of test data, each flip-flop receives its input from the preceding one. Functional operation of a circuit, on the other hand, typically necessitates reading and updating only a few flip-flops every cycle, resulting in much lower switching activity in the circuit under test [2].

Even though the source of the power dissipation problem is the scan chain transitions, the major component of the dissipated power is the switching activity in the circuit under test [2]. Testing the circuit with a set of predetermined test vectors necessitates the delivery of the required test vector bits to the scan cells [1, 2].

Scan chain modifications can be utilized to reduce the number of scan chain transitions. The transition frequency between test data bits can be analyzed to reorder the scan cells and to identify the appropriate locations on the scan path for Exclusive-OR logic gate insertion.

We present a methodology to reduce scan chain transitions through appropriate modifications of the scan chain, reducing power dissipation during the shift of test data. Our methodology modifies the scan chain by inserting logic gates between scan cells and by grouping the scan cells to reduce the number of transitions [3].

The proposed scan chain modifications, which consist of Exclusive-OR logic gate insertion on the scan path and scan cell reordering (scan cell grouping), impose no performance penalty: the functional operation of the circuit remains intact timing-wise. The technique we propose can be easily implemented and incorporated into the conventional test flow used in industry.

Received 19 April, 2007; Accepted 21 October, 2007

† Corresponding Author: School of Information and Communication Engineering, Sungkyunkwan University, Korea. (iskim@ece.skku.ac.kr)

\* School of Information and Communication Engineering, Sungkyunkwan University, Korea. (min@ece.skku.ac.kr)

## 2. Previous Work

A block diagram of a fully integrated scan is shown in Fig. 1. C is the combinational part of the circuit, while R is the test register into which the tests are scanned serially [1, 6].

Fig. 1. Full Scan Structures

Scan design is considered to be the best design-for-testability discipline [2]. Over the years, it has gained widespread acceptance in system design environments and is now commonly used to test digital circuitry in integrated circuits or System-on-Chip cores. However, scan-based architectures are expensive in power consumption, as each test pattern requires a large number of shift operations with high circuit activity [2].

Fig. 1 presents the classical scan chain architecture. This technique simplifies the pattern generation problem by dividing complex sequential designs into fully combinational blocks. Internal scan modifies the existing sequential elements in the design to support a serial shift capability in addition to their normal functions. This serial shift capability enhances internal node controllability and observability with a minimum of additional I/O pins [4].
When scan cells are linked to form a scan chain, all the scan cells are controllable and observable. Since data is shifted into the scan chain serially, it takes N clock cycles to shift an entire pattern into the scan chain, where N is the length of the scan chain. Note that as a new vector is scanned in, the input to the combinational circuit C changes.

In recent years, a considerable amount of effort has been expended on reducing test power for embedded cores in an SOC; the solutions proposed in the literature have focused on reducing the switching activity in the circuit. The transitions that originate from the scan chain can be prevented from propagating into the circuit through the use of externally controlled gates [10]; however, such techniques degrade performance, as they insert gate delays on critical paths.

A combinational ATPG (Automatic Test Pattern Generation) methodology that generates test vectors with minimized power dissipation is proposed in [9]. Transition controllability and observability cost functions are defined and used to guide the backtrace and objective selection procedures of ATPG. However, incorporating power optimizations into ATPG algorithms typically prolongs test application time.

The solutions proposed in [2] are scan chain reordering and inverter insertion. However, this methodology has weak points. It works for small sets of test vectors, but not for large or correlated ones. Large or correlated vectors increase the number of transitions during test application, which in turn increases power dissipation.

We present a methodology to reduce scan chain transitions through appropriate modifications of the scan chain, thereby reducing power dissipation during the shift of test data.
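To make the cost of serial shifting concrete, the following minimal Python sketch counts how many scan cells toggle while one test vector is shifted into a chain of length N. It is an illustration only: the function name `shift_in_transitions`, the list-based chain model, and the example vectors are assumptions introduced here and are not taken from the original work or from any tool.

```python
from typing import List


def shift_in_transitions(vector: List[int], initial: List[int]) -> int:
    """Count scan-cell toggles while `vector` is shifted serially into the chain.

    Index 0 is the cell next to scan_in. Assumed model: one bit enters per
    clock and every cell copies the value of its predecessor.
    """
    chain = list(initial)
    toggles = 0
    # The bit destined for the last cell has to enter the chain first.
    for bit in reversed(vector):
        shifted = [bit] + chain[:-1]
        toggles += sum(old != new for old, new in zip(chain, shifted))
        chain = shifted
    return toggles


if __name__ == "__main__":
    print(shift_in_transitions([0, 0, 1, 0, 0], [0] * 5))  # 5
    print(shift_in_transitions([1, 0, 1, 0], [0] * 4))     # 6
```

In this toy model, a single 1 travelling through an all-zero five-cell chain already causes 5 toggles, and an alternating pattern in a four-cell chain causes 6, which is why adjacent bits with conflicting values are the quantity that scan chain modifications try to reduce.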
Our methodology modifies the scan chain by inserting logic gates between scan cells and by grouping the scan cells to reduce the number of transitions. The proposed scan chain modifications, which consist of Exclusive-OR logic gate insertion on the scan path and scan cell reordering (scan cell grouping), impose no performance penalty, allowing the functional operation of the circuit to remain intact timing-wise.

The percentage of cases wherein a conflict occurs between the bit positions being considered constitutes the transition frequency, denoted by $S_{Tr.}$. The frequency of cases wherein identical binary values exist in both bit positions is denoted by $S_{NoTr.}$. These metrics are computed for every pair of bit positions using the formulas below, where $S_{Tr.}(i,j)$ denotes the transition frequency between the $i^{th}$ and the $j^{th}$ bits, $t_k[i]$ represents the $i^{th}$ bit of the $k^{th}$ test stimulus, and $T$ is the set of test stimuli [2].

$$S_{Tr.}(i,j) = \frac{1}{|T|} \sum_{k \in T} t_k[i] \oplus t_k[j] \tag{1}$$

$$S_{NoTr.}(i,j) = \frac{1}{|T|} \sum_{k \in T} (t_k[i] \oplus t_k[j])' \tag{2}$$

Controllability measures are used in estimating the transition frequencies in test responses. The probability that a transition occurs between two output bits equals the probability that these bits are controlled to conflicting binary values [2].
$$\begin{aligned}
r_{Tr.}(i,j) &= \frac{1}{|T|} \sum_{k \in T} \{cont(t_k[i],0) \times cont(t_k[j],1) + cont(t_k[i],1) \times cont(t_k[j],0)\} \\
&= \frac{1}{|T|} \sum_{k \in T} \{cont(t_k[i],0) + cont(t_k[j],0) - 2 \times cont(t_k[i],0) \times cont(t_k[j],0)\} \\
&= \frac{1}{|T|} \sum_{k \in T} \{cont(t_k[i],1) + cont(t_k[j],1) - 2 \times cont(t_k[i],1) \times cont(t_k[j],1)\}
\end{aligned} \tag{3}$$

In the expression above, $r_{Tr.}(i,j)$ denotes the probabilistic transition frequency between the $i^{th}$ and the $j^{th}$ output bits, while $cont(t_k[i],v)$ constitutes the controllability probability of the $i^{th}$ output bit to $v$ when the $k^{th}$ test stimulus is applied. It should be noted that the sum of the 0-controllability and the 1-controllability of a line always equals 1, as does the sum of $r_{Tr.}(i,j)$ and $r_{NoTr.}(i,j)$, the probabilistic no-transition frequency between $i$ and $j$. Each test stimulus should be considered individually in computing the controllability values of the outputs, as it may imply distinct controllability probabilities. The probabilistic transition frequency helps quantify the correlation between the output bits [2].

The expected transition cost can then be defined based on the transition frequencies and the position in the scan chain. Given two scan cells, denoted by $i$ and $j$, and the candidate scan chain position for these two cells to be placed, denoted by $l$, the expected transition cost due to the transitions between these two bits can be expressed as follows.

$$ETC(i,j) = \min\bigl(S_{Tr.}(i,j) \times (N - l) + r_{Tr.}(i,j) \times l,\; S_{NoTr.}(i,j) \times (N - l) + r_{NoTr.}(i,j) \times l\bigr) \tag{4}$$

In this expression, $ETC(i,j)$ denotes the expected transition cost of placing the $i^{th}$ and the $j^{th}$ cells in adjacent positions, while $N$ represents the scan chain length [2]. The type of correlation between two scan cells determines whether an inverter should be inserted in between the cells.
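The stimulus frequencies of Eqs. (1) and (2) and the cost of Eq. (4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the 0-based cell indices, and the example stimuli are hypothetical, and the response frequency $r_{Tr.}(i,j)$ of Eq. (3) is simply passed in as a number rather than derived from a controllability analysis. The returned flag reports whether the second term of the minimum is the smaller one, which we read as the case of inverse correlation.

```python
from typing import Sequence, Tuple

Stimuli = Sequence[Sequence[int]]


def s_tr(T: Stimuli, i: int, j: int) -> float:
    """Eq. (1): fraction of stimuli in which bits i and j differ."""
    return sum(t[i] ^ t[j] for t in T) / len(T)


def s_notr(T: Stimuli, i: int, j: int) -> float:
    """Eq. (2): fraction of stimuli in which bits i and j are equal."""
    return sum(1 - (t[i] ^ t[j]) for t in T) / len(T)


def etc(T: Stimuli, i: int, j: int, r_tr: float, N: int, l: int) -> Tuple[float, bool]:
    """Eq. (4): expected transition cost and whether the second term is smaller.

    r_tr stands for r_Tr(i, j) of Eq. (3); it is an input here (assumption),
    and r_NoTr(i, j) is taken as 1 - r_tr.
    """
    direct = s_tr(T, i, j) * (N - l) + r_tr * l
    inverse = s_notr(T, i, j) * (N - l) + (1 - r_tr) * l
    return min(direct, inverse), inverse < direct


if __name__ == "__main__":
    # Hypothetical stimuli for a five-cell chain (not taken from the paper).
    T = [[0, 1, 1, 0, 1], [1, 0, 1, 1, 0], [0, 1, 0, 0, 1], [1, 0, 0, 1, 0]]
    print(s_tr(T, 0, 1), s_notr(T, 0, 1))   # 1.0 0.0
    print(etc(T, 0, 1, r_tr=0.5, N=5, l=1))  # (0.5, True)
    print(etc(T, 0, 3, r_tr=0.5, N=5, l=1))  # (0.5, False)
```

In this toy data set, cells 0 and 1 always carry opposite values, so the cost is lowest when the pair is treated as inversely correlated, whereas cells 0 and 3 always agree and need no inversion.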
As the specification of don't-care bits in the test stimuli may alter the controllability measures, the transition frequencies in both the stimuli and the responses are recomputed.

Fig. 2. Transition Cost Graph

Fig. 3. Outcome of the Proposed Scheme in Previous Paper

As the minimum weighted edge in Fig. 2 is the one that connects nodes 2 and 5, the corresponding scan cells are placed in the first two adjacent positions. The outcome of the algorithm, i.e. the modified scan chain along with the transformed stimuli and responses, is given in Fig. 3. In this example, appropriate reordering of the scan cells through the graph-based algorithm exploits the direct correlation between the third and fourth cells and between the second and fifth cells. The high inverse correlation between the first and third cells and between the first and fifth cells, in turn, results not only in their adjacent placement but also in inverter insertion between them. The resulting frequencies for the test data given in Fig. 3 summarize the benefit of exploiting scan cell correlation. The transition cost graph yields the modified scan chain (scan\_in → 4 → 3 → inverter → 1 → inverter → 5 → 2 → scan\_out).

## 3. Proposed Methodology

### 3.1 Grouping of Scan Cells

Previous work has weak points. The methodology works for small sets of test vectors, but not for large or correlated ones. Large or correlated vectors increase the number of transitions during test application, which increases power dissipation. In Fig. 4, the input vector {00100} must be loaded into scan cells that hold {00000}. Compared with the case without the inverter, unnecessary transitions are produced twice while the 1 is shifted to its position. Therefore, we propose a technique that handles such cases at a finer granularity by grouping test vectors.

Fig. 4. Example of Unnecessary Invert

Fig. 5. Proposed Circuit

Specifically, we use Exclusive-OR gate logic instead of inverter gate logic. Both can perform the same inverting function. With a supplementary control circuit, however, the Exclusive-OR gate can also pass a signal through unchanged, so that the signal is not inverted when inversion is not wanted. This capability can prevent transitions from reaching the circuit under test. Fig. 5 shows our proposed circuit. With [Exclusive-OR=1], the gate has the same function as the inverter; with [Exclusive-OR=0], it passes the signal.

Our technique groups similar test vectors that suffer unnecessary transitions from the inverter and handles each group separately, so that transitions are reduced at a finer granularity. Fewer transitions mean lower power dissipation, which makes the technique well suited to low-power testing.

### 3.2 Grouping Algorithm for Low Power

The grouping technique in this paper is built on a recursive application of the ETC algorithm. We confirmed that the ETC equation introduced in the previous work leaves unnecessary transitions. Therefore, we propose a new algorithm, RE\_ETC, which is outlined as follows.

$$\begin{aligned}
&RE\_ETC(group\_number)\ \{ \\
&\quad for\,(i = 0;\ i \le TEST\_VECTOR\_NUMBER;\ i{+}{+})\ \{ \\
&\qquad if\,(ETC() \ne TEST\_VECTOR(i))\ \{ \\
&\qquad\quad RE\_ETC(group\_number + 1); \\
&\qquad \} \\
&\qquad GROUP(group\_number) \leftarrow ETC(); \\
&\quad \} \\
&\}
\end{aligned} \tag{5}$$

The RE\_ETC algorithm collects the patterns that remain unsuitable after the ETC algorithm has been executed, and the grouping technique consists of executing the algorithm again on the collected patterns. To begin with, we determine the optimized cell positions and the locations of the additional inverters according to the ETC algorithm. Then we extract the vectors that have no transitions at the locations of the additional inverters. The body of the 'if' statement runs the algorithm again on the extracted vectors. In other words, RE\_ETC is a recursive function.
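The recursive grouping idea can be illustrated with a deliberately simplified Python sketch. It is not the authors' RE\_ETC implementation: the cell order is assumed to be fixed already, the ETC step is replaced by a per-position majority vote, the cost is a static count of conflicting adjacent bits rather than simulated events, and all names (`xor_gate`, `cost`, `majority_controls`, `regroup`, `total_cost`) are hypothetical. Vectors that the current Exclusive-OR control setting makes worse are split off and processed again as the next group.

```python
from typing import List, Tuple

Vector = List[int]
Group = Tuple[List[int], List[Vector]]  # (XOR control bits, member vectors)


def xor_gate(bit: int, control: int) -> int:
    """control = 1 inverts the bit; control = 0 passes it unchanged."""
    return bit ^ control


def cost(v: Vector, controls: List[int]) -> int:
    """Adjacent-cell conflicts that remain after the XOR gates (static proxy)."""
    return sum(xor_gate(v[p] ^ v[p + 1], controls[p]) for p in range(len(controls)))


def majority_controls(vectors: List[Vector]) -> List[int]:
    """Set control p to 1 when most vectors differ between cells p and p + 1."""
    n = len(vectors[0])
    return [int(2 * sum(v[p] ^ v[p + 1] for v in vectors) > len(vectors))
            for p in range(n - 1)]


def regroup(vectors: List[Vector], max_groups: int) -> List[Group]:
    """Recursively split off the vectors that the current controls make worse."""
    controls = majority_controls(vectors)
    passthrough = [0] * len(controls)
    kept = [v for v in vectors if cost(v, controls) <= cost(v, passthrough)]
    rest = [v for v in vectors if cost(v, controls) > cost(v, passthrough)]
    if max_groups <= 1 or not kept or not rest:
        return [(controls, vectors)]
    return [(majority_controls(kept), kept)] + regroup(rest, max_groups - 1)


def total_cost(groups: List[Group]) -> int:
    """Sum of the remaining conflicts over all groups."""
    return sum(cost(v, controls) for controls, members in groups for v in members)


if __name__ == "__main__":
    # Hypothetical vectors: three alternating patterns and two all-zero ones.
    vectors = [[1, 0, 1, 0]] * 3 + [[0, 0, 0, 0]] * 2
    print(total_cost(regroup(vectors, 1)))  # 6: one control setting for all
    print(total_cost(regroup(vectors, 4)))  # 0: the all-zero vectors get their own group
```

With a single group, the inverting controls chosen for the alternating vectors hurt the all-zero vectors, much like the unnecessary inversions of Fig. 4; once those vectors form their own group with pass-through controls, the proxy cost drops to zero. The price is one stored control setting per group.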
Although the recursive function has the drawback of a long running time, it makes grouping easier than in previous work. Moreover, it is efficient when there are many vectors.

## 4. Experimental Results

Table 1 shows our experimental results. All of the examples belong to the ISCAS89 Sequential Benchmark Circuits [7, 11]. The proposed test power reduction scheme has been applied to several fully scanned ISCAS89 circuits. The results were obtained on a SUN-Microsystems<sup>TM</sup> Ultra-Sparc II 360MHz with 1024MB of memory. The numbers of patterns, events, and added gates, as well as the CPU times, were generated by Design\_Analyzer and TetraMAX of SYNOPSYS<sup>TM</sup> [8, 12-17]. The acquired results were simulated with Verilog-XL of CADENCE<sup>TM</sup>.

Table 1 indicates that the reduction is small when only one group is used, as in the previous method. As we gradually increase the number of groups, however, the number of events decreases. There is also a prominent drop for the half group. Here, the half group means the grouping that divides the test vectors by half. When we select more groups than the half group, the number of events becomes smaller still, but the amount changes little, so the main effect is to increase the memory needed to store the control vectors of the groups. From this, we conclude that the half group is the best number of groups. In addition, the results show that, for circuits of high complexity, CPU time decreases as the number of groups increases, because unnecessary transitions in the circuit under test are reduced.

Table 1. Test Events Number and CPU Times

| Circuit | Metric | Previous<br>method | Divide by<br>2 groups | Divide by<br>3 groups | Divide by<br>4 groups | Half groups |
|---|---|---|---|---|---|---|
| S208<br>(10571)<br>Patterns #: 30 | Event # | 10490<br>(0.8%) | 10304<br>(2.5%) | 10269<br>(2.8%) | 9755<br>(7.1%) | 8562<br>(19%) |
| | CPU time | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| | Additional gates # | 0 | 3 | 4 | 4 | 4 |
| S510<br>(43693)<br>Patterns #: 67 | Event # | 43536<br>(0.4%) | 43025<br>(1.5%) | 41328<br>(5.4%) | 38712<br>(11.4%) | 34086<br>(22%) |
| | CPU time | 9.3 | 0.3 | 0.3 | 0.3 | 0.3 |
| | Additional gates # | 0 | 2 | 2 | 3 | 3 |
| S1488<br>(122588)<br>Patterns #: 105 | Event # | 119645<br>(2.4%) | 114252<br>(6.8%) | 109471<br>(10.7%) | 102361<br>(16.5%) | 92618<br>(24.4%) |
| | CPU time | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 |
| | Additional gates # | 0 | 3 | 4 | 5 | 4 |
| S1423<br>(1832473)<br>Patterns #: 69 | Event # | 1806818<br>(1.4%) | 1753676<br>(4.3%) | 1605246<br>(12.4%) | 1530114<br>(16.5%) | 1304720<br>(28.8%) |
| | CPU time | 1.5 | 1.5 | 1.3 | 1.3 | 1.1 |
| | Additional gates # | 0 | 44 | 52 | 39 | 49 |
| S5378<br>(24759989)<br>Patterns #: 213 | Event # | 24487629<br>(1.1%) | 23967669<br>(3.2%) | 23150589<br>(6.5%) | 20798490<br>(15.9%) | 18315284<br>(26%) |
| | CPU time | 16.9 | 16.9 | 16.9 | 16.9 | 12.9 |
| | Additional gates # | 0 | 93 | 121 | 85 | 96 |
| S9234<br>(13768138)<br>Patterns #: 136 | Event # | 13396398<br>(2.7%) | 12763063<br>(7.3%) | 12198570<br>(11.4%) | 11287221<br>(18%) | 10036972<br>(27.1%) |
| | CPU time | 7.8 | 7.3 | 7.0 | 7.0 | 6.7 |
| | Additional gates # | 0 | 145 | 196 | 168 | 174 |

## 5. Conclusion

In this paper, we adopt Exclusive-OR gate logic, which can either invert a signal or pass it through unchanged, in place of inverter gate logic.
Our results demonstrate the effectiveness of the scan chain reordering technique. With the adoption of Exclusive-OR gate logic and the reordering algorithm, our work demonstrates a technique for reducing power dissipation.

## Acknowledgements

This work was supported by the IC Design Education Center (IDEC).

## References

- [1] Laung-Terng Wang, Cheng-Wen Wu and Xiaoqing Wen, "VLSI Test Principles and Architectures, Design For Testability", *Elsevier Inc.*, 2006.
- [2] Ozgur Sinanoglu, Ismet Bayraktaroglu and Alex Orailoglu, "Scan Power Reduction Through Test Data Transition Frequency Analysis", *International Test Conference*, pp. 844-850, 2002.
- [3] Ozgur Sinanoglu, Ismet Bayraktaroglu and Alex Orailoglu, "Test Power Reduction Through Minimization of Scan Chain Transitions", *Proceedings of the 20<sup>th</sup> IEEE VLSI Test Symposium (VTS'02)*, 2002.
- [4] Abhijit Ghosh, Srinivas Devadas, Kurt Keutzer and Jacob White, "Estimation of Average Switching Activity in Combinational and Sequential Circuits", *29<sup>th</sup> ACM/IEEE Design Automation Conference*, pp. 253-259, 1992.
- [5] M. L. Bushnell and V. D. Agrawal, "Essentials of Electronic Testing", *Kluwer Academic Publishers*, 2000.
- [6] Vinay Dabholkar, Sreejit Chakravarty, Irith Pomeranz and Sudhakar Reddy, "Techniques for Minimizing Power Dissipation in Scan and Combinational Circuits During Test Application", *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 17, no. 12, pp. 1325-1333, December 1998.
- [7] F. Brglez, D. Bryan and K. Kozminski, "Combinational Profiles of Sequential Benchmark Circuits", *IEEE Int. Symp. on Circuits and Systems*, pp. 1929-1934, 1989.
- [8] Pran Kurup and Taher Abbasi, "Logic Synthesis Using Synopsys", 2<sup>nd</sup> ed., *Kluwer Academic Publishers, Massachusetts*, 1997.
- [9] S. Wang and S. K. Gupta, "ATPG for heat dissipation minimization during test application", *International Test Conference*, pp. 250-258, 1998.
- [10] H. J. Wunderlich and S. Gerstendorfer, "Minimized power consumption for scan based BIST", *International Test Conference*, pp. 85-94, 1999.
- [11] F. Brglez, D. Bryan and K. Kozminski, "Combinational Profiles of Sequential Benchmark Circuits", *IEEE ISCAS*, vol. 14, n. 2, pp. 1929-1934, May 1989.
- [12] http://www.synopsys.com/products/solutions/galaxy/test/test.html, 2006
- [13] TetraMAX ATPG User Guide, Version 2000-11, Synopsys Inc., 2000
- [14] TetraMAX Release Note, Version 2000-11, Synopsys Inc., 2000
- [15] http://www.synopsys.com/products/solutions/galaxy/power/power.html, 2006
- [16] http://www.synopsys.com/products/logic/design compiler.html, 2006
- [17] http://www.synopsys.com/products/simulation/simulation.html, 2006
- [18] Sangwook Cho and Sungju Park, "A new synthesis technique of sequential circuits for low power and testing", *Current Applied Physics*, Volume 4, Issue 1, pp. 83-86, February 2004.
- [19] Hyungwoo Lee, Hakgun Shin and Juho Kim, "Unified low power optimization algorithm by gate freezing, gate sizing and buffer insertion", *Current Applied Physics*, Volume 5, Issue 4, pp. 378-380, May 2005.

#### Insoo Kim

He received his B.S., M.S., and Ph.D. degrees in Electrical and Computer Engineering from Sungkyunkwan University in 2000, 2002, and 2008, respectively. He is presently with the VLSI Design & Test Lab and is a Lecturer at Sungkyunkwan University. His research interests include embedded systems, low power systems, computer architecture, design of computing systems, SOC design, and VLSI testing.

#### Hyoung Bok Min

He received his B.S. degree in Electronic Engineering from Seoul National University in 1980, his M.S. degree in Electronic Engineering from KAIST in 1982, and his Ph.D. degree in Electronic Engineering from The University of Texas at Austin in 1990. He has worked at the LG Communication R&D Center (1982-1985), the Neuro Institute of Columbia University (1986-1987), and Intelligent Signal Processing, Inc. (1987-1988). He joined the School of Information and Communication Engineering, Sungkyunkwan University in 1991, where he is presently a Full Professor. His research interests include fault tolerant computing systems, design of computing systems, SOC design, and VLSI testing.