# A New Scan Partition Scheme for Low-Power Embedded Systems

Hong-Sik Kim, Cheong-Ghil Kim, and Sungho Kang

A new scan partition architecture to reduce both the average and peak power dissipation during scan testing is proposed for low-power embedded systems. In scan-based testing, due to the extremely high switching activity during the scan shift operation, the power consumption increases considerably. In addition, the reduced correlation between consecutive test patterns may increase the power consumed during the capture cycle. In the proposed architecture, only a subset of scan cells is loaded with test stimulus and captured with test responses by freezing the remaining scan cells according to the spectrum of unspecified bits in the test cubes. To optimize the proposed process, a novel graph-based heuristic to partition the scan chain into several segments and a technique to increase the number of don't cares in the given test set have been developed. Experimental results on large ISCAS89 benchmark circuits show that the proposed technique, compared to the traditional full scan scheme, can reduce both the average switching activities and the average peak switching activities by 92.37% and 41.21%, respectively.

Keywords: Scan testing, power dissipation, design for testability, scan partitioning, test cube.

## I. Introduction

As the integration density of very large-scale integration (VLSI) circuits increases, the complexity and the cost of testing the manufactured chips are also increasing. In highly integrated VLSI systems, it is quite difficult and time consuming to generate tests for fully sequential circuits. Therefore, the scan-based design style has become an attractive solution to achieve the desired fault coverage and to speed up time-to-market despite its area and performance overhead [1], [2]. In scan testing, however, the shift operations for loading and observing test data can lead to excessive transitions in the scan chain and unnecessary switching activity on circuit signals [3]. Moreover, the reduced correlation between consecutive test patterns can increase the power consumed during the capture cycle.

Increased power dissipation may reduce the reliability of circuits because the heat dissipation and current density can exceed the limit of the design specification through high levels of switching activities in the circuit. Therefore, many techniques to reduce the test power in scan-based test environments have been proposed [4]-[12].

In [4]-[6], low-power scan test techniques to reduce the transitions in scan-in vectors are studied. However, with these techniques, the transitions in scan-out vectors cannot be controlled and the reduction of peak power consumption during the capture cycle is not guaranteed.

In [7], unspecified bit positions in the original test cube set are filled with so-called preferred fill values, which are determined by analyzing signal probabilities in the circuit under test, to reduce the capture power consumption for transition fault testing. In [8], the preferred filling method and the adjacent filling method are used together to reduce both the shift power and capture power dissipation for stuck-at fault testing and transition fault testing. In addition, to reduce the test data size, they relax the fully specified test set instead of using the test cube set.

Manuscript received Mar. 27, 2007; revised Dec. 13, 2007.

Hong-Sik Kim (phone: +82 10 3542 0630, email: hongsik@yonsei.ac.kr) and Sungho Kang (phone: +82 2 2123 2775, email: shkang@yonsei.ac.kr) are with the Department of Electrical and Electronic Engineering, Yonsei University, Seoul, Rep. of Korea.

Cheong-Ghil Kim (email: tetons@yonsei.ac.kr) is with the Department of Computer Science, Namseoul University, Cheonan, Rep. of Korea.

In [9] and [10], the transitions in the scan chain are prevented from propagating to circuit lines by the addition of gates that are controlled externally between the outputs of scan cells and the inputs of the circuit under test. Although this technique can completely disable the switching activities inside the circuit under test during the scan shift operation, it may introduce undesirable timing impact on critical paths and increase the power consumption in the capture cycle.

In [11]-[13], a scan chain is partitioned into multiple segments, and only one segment is activated at a time. In [11], the original scan chain partition scheme is proposed. Theoretically, the reduction of test power dissipation is linearly proportional to the number of segments without increasing the test application time. In [12], a selective activation scheme is applied to both the shift operation and the capture operation to reduce the peak power consumption by using different clock phases for segments in the capture cycle. In this scheme, several clock cycles are required for exclusively activating segments during the capture cycle so the test application time can be increased. In [13], the test response data captured in some scan segments is used for the generation of the subsequent test stimulus by exploiting *don't-care bits* to reduce both test time and test data volume.

In [14], a scan chain disabling technique is proposed such that the test set is generated and ordered so that some of the scan chains can be frozen for portions of the test set. The difference between our proposed scheme and that of [14] is that only the power reduction during the scan shift operation is considered in [14], while the proposed scheme reduces power dissipation during both the scan shift and capture operations by partitioning scan chains according to an analysis of the test cube set and by reusing the test response data as the consecutive test stimuli.

In this paper, a new scan architecture is proposed to reduce power dissipation during both the shift and capture cycles without impacting the original fault coverage. In the proposed architecture, a scan chain is partitioned into several segments, and scan chain rippling is restricted to one of the segments during the scan shift operation. According to the profile of don't-care bit positions in the test cube set and the test response set, some segments are controlled to reuse the test response data as the subsequent test stimulus without updating it with new test data so that the average power consumption can be optimized. In addition, some segments are controlled not to capture the test response in order to reduce the peak power consumption during the capture operation. The original test cube set is preserved in the final test stimulus data applied by the proposed scheme so that the fault coverage does not change. To this end, this paper proposes a novel graph-based heuristic for scan chain partitioning, which can efficiently divide the scan chain into multiple segments so that the power dissipation during scan testing can be minimized. The heuristic aims to increase the average weight sum inside each sub-graph and to reduce the average weight sum among the sub-graphs. In addition, since the efficiency of the proposed scheme depends on the spectrum of don't-care bits in the test cube, it is necessary to increase the number of don't-care bits in the original test cube set. To do this, a heuristic based on a support set has been applied. Experimental results on large ISCAS89 benchmark circuits show that the proposed technique with four segments can reduce the average switching activities inside the circuit under test by an average of 92.37% and the peak switching activities by 41.21% compared to the switching activities in the traditional full scan scheme.

The remainder of this paper is organized as follows. Section II describes the proposed architecture. The scan chain partitioning heuristic is explained in Section III. Section IV describes the method to increase the number of don't-care bits. Experimental results are provided in Section V. Finally, Section VI concludes this paper.

## II. Proposed Scan Architecture

Figure 1 shows the proposed scan segmentation architecture and the detailed scheme for the segment and control unit. The proposed scheme is similar to the Illinois Scan Architecture (ILS) [15], [16]. The ILS architecture has two operation modes: broadcast mode and serial mode. In serial mode, it operates as a traditional scan chain. In broadcast mode, the scan chain is split into several length-balanced segments, and the same data is broadcast to all the segments. That is, all the segments are activated and fed with the same data during the shift cycle in broadcast mode. In the proposed scheme, however, the scan chain is split into a given number of length-balanced segments, and only one segment is activated or observed at a time during each test clock. This technique can reduce the number of scan cells that switch simultaneously during both scan shift and capture periods. In the shift cycle, the test data is scanned into one of the scan partitions through the scan-in pin, and the content of this partition is scanned out through the scan-out pin at the same time. The contents of the remaining partitions are frozen until the activated partition completes the entire shift operation. Prior to the activation of any particular segment, the segment configuration data is shifted into the scan configure register (SCR) through the scan-in pin. Then, according to the contents of the SCR and the state of the test state machine (TSM), which is a simple finite state machine (FSM), the segment control unit controls the activation of scan segments and the MUX. In order to freeze the scan segments selectively, the scan cells are modified as shown in Fig. 1. The encoding of the MUX enable signals is chosen to avoid adding delay on the functional path from *Data in* to the flip-flop. The proposed scheme requires an additional signal to disable the operation of the scan cells. The added routing, however, is common to all the scan cells inside the same segment so that the routing congestion is not unacceptably high.

Fig. 1. Proposed scan architecture.

To scan new test data into the *i*-th scan segment or to scan out the test response from it, *ScanEn* should be activated and Segment Disable<sub>i</sub> should be deactivated. To capture the response into the scan cells of the *i*-th scan segment, both *ScanEn* and Segment Disable<sub>i</sub> should be deactivated. If Segment Disable<sub>i</sub> is activated, the scan cells in the *i*-th scan segment are frozen, that is, they preserve their current contents. To configure the SCR, an additional 2N clock cycles are required (N for the shift disable setting and N for the capture disable setting), where N is the number of scan segments. However, since generally more than one scan segment is disabled during the test data shift period, this overhead is hidden by the time saved through segment disabling, so the final test application time does not increase. In fact, the test application time usually decreases because the time saved by segment disabling is much larger than the time required for segment configuration.
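The control rules above can be illustrated with a small behavioural sketch. It is not the authors' hardware description: the function names, the equal segment length, and the simple cycle count (2N configuration cycles plus one shift cycle per cell of every enabled segment) are assumptions made for illustration only.

```python
def next_cell_state(current, scan_in, data_in, scan_en, segment_disable):
    """Next value of one scan cell in segment i for a single test clock."""
    if segment_disable:      # Segment Disable_i active: the cell is frozen
        return current
    if scan_en:              # ScanEn active: shift in from the previous cell
        return scan_in
    return data_in           # both inactive: capture the circuit response


def load_cycles(segment_length, shift_disable):
    """Assumed cost model for loading one pattern.

    2N cycles configure the SCR; each segment whose shift-disable bit is 0
    then needs `segment_length` shift cycles. Equal-length segments are an
    assumption of this sketch.
    """
    n = len(shift_disable)
    enabled = shift_disable.count(0)
    return 2 * n + enabled * segment_length


# Three segments of 100 cells, second segment skipped during shift.
print(load_cycles(100, [0, 1, 0]))
```

Under this assumed model, the example needs 206 cycles instead of the 300 shift cycles of an unsegmented 300-cell chain, which is the effect the paragraph describes: the configuration overhead is outweighed once at least one segment is skipped.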

Figure 2(a) illustrates an example implementation of a TSM for three segments. It consists of combinational logic (an offset decoder and a decoder) and a mod-N counter. In this example, a 2-bit counter is required since there are three segments. During the shift period, the *i*-th segment is skipped and is not activated if the *i*-th shift disable bit of the SCR is set to 1. The counter increases its current content by the offset, which is pre-calculated by the offset decoder. Figure 2(b) shows the simulation waveform for the example. During the test mode, the three signals Segment Disable<sub>i</sub> (i = 1, 2, 3) are generated for both the shift and capture operations. First, the scan chain is in segment configuration mode, and the segment configuration data is shifted into the SCR. Then, the scan chain enters scan shift mode, and the scan segments are activated in the sequence of segments 1, 2, and 3 until the current test pattern is completely loaded into the scan chain. For the current test pattern, the second segment does not need to be loaded with test stimulus data (that is, the second shift disable register in the SCR is set); therefore, it is skipped and is not activated. Instead, the third segment is loaded with test data from the tester. The second segment reuses its current content as the next test stimulus. After the scan chain is fully loaded, *ScanEn* is deactivated and the scan chain enters capture mode. In this example, the first segment does not need to capture the test response data (that is, the first capture disable register in the SCR is set), so it is disabled while the other segments are controlled to capture the test response.

Fig. 2. Control of the proposed scheme.
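The segment-skipping behaviour of the TSM can be modelled in software as follows. This is only a behavioural sketch: the real TSM is a hardware counter with an offset decoder, and the function names and the list encoding of the SCR bits are hypothetical.

```python
def next_offset(shift_disable, current):
    """Offset from segment index `current` (0-based, -1 before the first
    segment) to the next segment whose shift-disable bit is 0.

    Returns None when no enabled segment remains.
    """
    n = len(shift_disable)
    for offset in range(1, n - current):
        if shift_disable[current + offset] == 0:
            return offset
    return None


def activation_sequence(shift_disable):
    """1-based segment numbers in the order they are activated for shifting."""
    sequence = []
    current = -1
    while True:
        offset = next_offset(shift_disable, current)
        if offset is None:
            return sequence
        current += offset            # the counter advances by the offset
        sequence.append(current + 1)


def capturing_segments(capture_disable):
    """1-based numbers of the segments that capture the test response."""
    return [i + 1 for i, bit in enumerate(capture_disable) if bit == 0]


# The three-segment example: segment 2 is skipped during shift,
# and segment 1 is disabled during capture.
print(activation_sequence([0, 1, 0]))
print(capturing_segments([1, 0, 0]))
```

For the example in the text, the sketch activates segments 1 and 3 for shifting and lets segments 2 and 3 capture.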

The proposed method differs from the previous scan chain segmentation schemes in that some of the scan segments reuse their test response as the next test stimulus data without being shifted with new test data. Moreover, when the test response data in some segments is of no use for observation, those segments are controlled not to capture the test response. Therefore, the maximum number of flip-flops that can toggle simultaneously is limited so that both the average and the peak power dissipation during scan testing can be considerably reduced.

The proposed scan scheme makes use of the information about the spectrum of *don't cares* in the given test cube set when partitioning the scan chain into multiple scan segments, reordering the test vectors, and controlling (disabling or activating) each scan segment. It fills the don't-care bit positions in the consecutive test stimulus data with the test response data so that the original test cube set is preserved in the final test stimulus data. Therefore, the proposed scheme does not degrade the original fault coverage.

## III. Scan Partition Heuristic

The method used to distribute scan cells to scan segments determines the efficiency of the proposed scheme. To identify the scan cells that belong to the same segment, the associated set of test cubes is analyzed by a graph-based heuristic. The nodes in the graph represent scan cells, and the weight of an edge denotes the number of test cubes in which the two scan cells are simultaneously specified. The edge weight thus determines the connection strength between the two nodes.

```
construct a graph G;
for (i = 1; i < N + 1; i++) {
    select the pair with the maximum edge weight sum from G;
    add the pair to group G_i;
    remove the pair from G;
    for (j = 1; j < M + 1; j++) {    // generating G_i
        select a node from G so that the weight sum of the edges
            between the node and G_i is maximum;
        add the node to G_i;
        optimize_1 G_i;
    }
    optimize_2 G_i;
}
```

Fig. 3. Scan partition algorithm.

It is desirable to include highly connected nodes in the same scan segment to keep the average weight sum inside the segment high. Correspondingly, the average weight sum between nodes in different scan segments should be kept low. However, since it is NP-hard to divide a graph into N sub-graphs with a maximum weight sum inside each scan segment and a minimum weight sum crossing scan segments [17], a new heuristic for scan chain partitioning is used. Figure 3 describes the heuristic, where N is the number of scan segments, and M is the balanced number of scan cells in each segment. First, a graph G is constructed from the given test cube set. A pair of nodes is selected from G such that its edge weight is maximal, and it is included in a sub-graph $G_i$. The selected pair is removed from G. Next, the node whose edge weight sum with $G_i$ is maximal is selected and included in $G_i$. After that, $G_i$ is optimized. In the proposed heuristic, two optimization steps are applied to minimize the weight sum between sub-graphs and to maximize the weight sum inside each sub-graph. During the generation of $G_i$, the first optimization process (optimize_1) is applied to reduce the inter-sub-graph weight sum: a selected node is replaced with an unselected node if the replacement lowers the average edge weight sum between the current sub-graph and the remaining sub-graphs. After the sub-graph has been generated, the second optimization process (optimize_2) is applied to maximize the edge weight sum inside the sub-graph: an already selected node is replaced with an unselected node if the edge weight sum of the unselected node is higher than that of the selected node. This optimization is repeated until no more improvement is possible.
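A minimal sketch of the graph construction and the greedy growth step is given below. It deliberately omits both optimization passes (optimize_1 and optimize_2), breaks ties by lowest cell index, and assumes that each segment holds at least two cells; these choices, the function names, and the small test cube set are illustrative assumptions, not part of the original work.

```python
from itertools import combinations


def build_graph(cubes):
    """Edge weight (a, b), a < b: number of test cubes specifying both cells."""
    n = len(cubes[0])
    return {
        (a, b): sum(1 for c in cubes if c[a] != "X" and c[b] != "X")
        for a, b in combinations(range(n), 2)
    }


def weight(graph, a, b):
    """Symmetric lookup of an edge weight."""
    return graph[(a, b)] if a < b else graph[(b, a)]


def partition(cubes, num_segments):
    """Greedy seed-and-grow partition without the optimization passes."""
    n = len(cubes[0])
    graph = build_graph(cubes)
    remaining = set(range(n))
    size = n // num_segments          # balanced segment size M (assumed >= 2)
    groups = []
    for _ in range(num_segments - 1):
        # Seed the group with the remaining pair of maximum edge weight.
        seed = max(combinations(sorted(remaining), 2), key=lambda p: graph[p])
        group = set(seed)
        remaining -= group
        # Grow with the node that is most strongly connected to the group.
        while len(group) < size:
            best = max(
                sorted(remaining),
                key=lambda v: sum(weight(graph, v, g) for g in group),
            )
            group.add(best)
            remaining.remove(best)
        groups.append(sorted(group))
    groups.append(sorted(remaining))  # the last segment takes the rest
    return groups


# Hypothetical test cubes over six scan cells ('X' marks a don't-care bit).
cubes = ["11X0XX", "01X1XX", "XX1X01", "XX0X10", "10X1XX"]
print(partition(cubes, 2))
```

In this hypothetical cube set, cells 0, 1, and 3 are always specified together, as are cells 2, 4, and 5, so the sketch places them in two separate segments. A segment whose cells are all don't-care in a given cube can then be frozen for that cube.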

Figure 4 shows an example of the scan chain partition process. A graph G based on the given test cube set is constructed. In this example, the number of segments is assumed to be 2. Nodes 0 and 3 yield the highest weight sum of 3 in G, so they are added to sub-graph $G_1$. Since node 1 has the strongest connection, that is, the maximum weight sum, with sub-graph $G_1$, it is included in $G_1$. As a result, highly correlated scan cells are placed in the same segment: nodes 0, 1, and 3 belong to one segment, and the rest belong to the other segment.

Since some scan cells may not be included in the same segment because of layout constraints, the physical layout constraints should be considered in the scan partitioning process. Usually, the layout is constructed so that the routing overhead is minimized. Therefore, nearby scan cells or scan cells in the same design block have to be placed in the same scan chain segment to avoid layout violations in the implementation of an efficient scan chain partition. Moreover, segmentation is performed according to the test cubes to group scan cells into a segment. Scan ordering is not yet considered at this stage. At the physical design stage (such as placement or routing), the scan cells in the same scan segment are stitched together so that the proposed method will not significantly alter the traditional design flow.

Fig. 4. An example of scan segmentation.

## IV. Increasing the Number of Don't Cares

The proposed scheme applies scan segmentation to reduce the average power consumption during scan testing. In addition, the spectrum of don't-care bits in the test cube and test response are exploited to prevent capture operations of segments which contain test response data useless for fault detection. Therefore, if a partially specified test cube set is given, the proposed scheme can be applied with full performance. Sometimes, however, most bits in the test pattern set may be specified, or only a fully specified test set may be available. In such cases, only the benefit of scan segmentation can be achieved. Therefore, some bit positions in the test patterns are relaxed into don't cares by a heuristic based on the concept of support sets [18], [19].

A test vector is *fully specified* if all inputs are specified to 0 or 1. That is, no input assumes a value of X. A test vector that contains don't cares is said to be *partially specified*. A support set (SS) for a primary output Z is any set of signals including primary inputs (PIs) that satisfy all of the following conditions.

- a) All signals in the set assume a logic value of 0 or 1.
- b) The primary output Z is a member of the set.
- c) The logic value on any signal except PIs in the SS is uniquely determined by values of other signals in the SS.

The *support signals* for a gate are the smallest subset of signals that are required to uniquely determine the current logic value of the gate. When multiple signals determine the logic value of the gate, the following criteria are used to calculate the smallest subset. If one of the possible support signals of the gate has already been included in the support set of the circuit, then this signal is selected as the support signal of the gate. Otherwise, a support signal at the lowest level is chosen.

For example, consider an AND gate, a, that has two inputs, b and c. When the logic values are a=0, b=1, and c=0, the support signal is c because the controlling value on c alone determines the value of a. Now assume that the logic values are a=0, b=0, and c=0, and the level of b is higher than the level of c. If b is already in the support set of the circuit, then b is the support signal for a; otherwise, c is the support signal of a, because the level of c is lower than that of b.

```
Procedure compute_support_set(L)
{
    support set S = L;
    while (unsupported gates in L exist)
    {
        g = unsupported gate in L with maximal level;
        ss = minimal support signals for g;
        add support signals ss to S;
        for (all the unsupported gates i in ss)
        {
            add i to L;
        }
        mark g supported;
    }
    return support set S;
}
```

Fig. 5. Procedure to compute support set.

In the case of multiple primary outputs (POs), condition b) is modified to require that each of the POs is included in the support set. A support set is minimal if no signal in the set can be deleted without violating the conditions.

Figure 5 describes the algorithm to increase the number of don't-care bits in the given test set by taking a list of gates L with known logic values and by computing a minimum support set for the given test set. The circuit is levelized and the input vector is simulated to determine the logic value of each circuit line. When the logic value of a gate line is specified, the gate is included in the gate list L, which will be considered for test relaxation. Since the fault coverage of the original test set should not be degraded by relaxing specified inputs, fault simulation is performed to generate a list of newly detected faults. Then, for every fault in the newly detected fault list, all the gates located on lines whose values are required to excite and propagate the fault are excluded from the gate list L for test relaxation. With this technique to increase the number of don't-care bits in the test data set, even when only fully specified test vectors are available, the dependency of the proposed low-power scan architecture on the initial test data set can be reduced, and the efficiency of the proposed scheme can be increased without loss of fault coverage.
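The support-set computation can be sketched for a toy netlist as follows. The circuit encoding (a dictionary from gate name to gate type and inputs), the restriction to AND, OR, and NOT gates, and all names are assumptions made for illustration. The fault-simulation step that protects detecting lines is not modelled.

```python
def support_signals(kind, inputs, values, levels, support):
    """Smallest set of inputs that fixes the output of an AND/OR/NOT gate."""
    if kind == "NOT":
        return list(inputs)
    controlling = 0 if kind == "AND" else 1   # value that decides the output alone
    candidates = [s for s in inputs if values[s] == controlling]
    if not candidates:
        return list(inputs)                   # no controlling input: all are needed
    already = [s for s in candidates if s in support]
    if already:
        return [already[0]]                   # prefer a signal already in the set
    return [min(candidates, key=lambda s: levels[s])]   # else the lowest level


def compute_support_set(circuit, values, levels, start):
    """Support set for the signals in `start`; primary inputs need no support."""
    support = set(start)
    pending = [s for s in start if s in circuit]
    done = set()
    while pending:
        g = max(pending, key=lambda s: levels[s])   # unsupported gate, max level
        pending.remove(g)
        done.add(g)
        kind, inputs = circuit[g]
        for s in support_signals(kind, inputs, values, levels, support):
            support.add(s)
            if s in circuit and s not in done and s not in pending:
                pending.append(s)
    return support


# Toy circuit: a = AND(b, c), b = OR(x1, x2); x1, x2, and c are primary inputs.
circuit = {"a": ("AND", ["b", "c"]), "b": ("OR", ["x1", "x2"])}
values = {"x1": 0, "x2": 0, "c": 0, "b": 0, "a": 0}
levels = {"x1": 0, "x2": 0, "c": 0, "b": 1, "a": 2}
print(sorted(compute_support_set(circuit, values, levels, ["a"])))
```

Here only c is needed to justify a=0, so in this toy case the primary inputs x1 and x2 fall outside the support set and could be relaxed to don't cares.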

## V. Experimental Results

Experiments were conducted on the large ISCAS89 benchmark circuits [21]. Only the flip-flops in the benchmark circuits were considered as candidates for scan partitioning, while the primary inputs and primary outputs were assumed to be controllable and observable by the external tester and were not considered during the partitioning of the scan chain.

In CMOS technology the major power dissipation results from the switching of a CMOS gate from one stable state to another; therefore, the switching activity of charging and discharging the load capacitance in each component is usually the dominant factor in dynamic power dissipation. Thus, the power consumption is usually evaluated as

$$P_d = 0.5 \times V_{DD}^2 \times f_p \times \sum E[T_i] \times C_i \tag{1}$$

where $V_{DD}$ is the power supply voltage, $f_p$ is the clock frequency, $E[T_i]$ is the expected number of transitions per cycle in gate i, and $C_i$ is the load capacitance of gate i. The total power consumption can be simply evaluated by using the weighted switching activity (WSA) [3], [6], [20]. The WSA of each gate is given by the number of signal transitions at the gate weighted by one plus the number of its fanouts, as

$$\text{WSA} = \text{number of transitions} \times (1 + \text{number of fanouts}). \tag{2}$$
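As an illustration of (2), the sketch below evaluates the WSA from simulated gate values. Summing the per-gate WSA over all gates in a clock cycle, and taking the mean and the maximum over cycles as the average and peak WSA, are assumptions about how the metric is aggregated. The gate names and values are hypothetical.

```python
def cycle_wsa(before, after, fanouts):
    """WSA of one clock cycle: each toggling gate adds 1 + its fanout count."""
    return sum(1 + fanouts[g] for g in before if before[g] != after[g])


def average_and_peak_wsa(states, fanouts):
    """Average and peak per-cycle WSA over a sequence of circuit states."""
    per_cycle = [cycle_wsa(a, b, fanouts) for a, b in zip(states, states[1:])]
    return sum(per_cycle) / len(per_cycle), max(per_cycle)


# Hypothetical three-gate circuit observed over three consecutive states.
fanouts = {"g1": 2, "g2": 1, "g3": 0}
states = [
    {"g1": 0, "g2": 1, "g3": 0},
    {"g1": 1, "g2": 1, "g3": 1},   # g1 and g3 toggle: (1+2) + (1+0) = 4
    {"g1": 1, "g2": 0, "g3": 1},   # g2 toggles: (1+1) = 2
]
print(average_and_peak_wsa(states, fanouts))
```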

Table 1 shows the experimental results of the proposed scheme on large ISCAS 89 benchmark circuits in terms of the average WSA and peak WSA. The average WSA and peak WSA can explain the average power consumption and peak power consumption, respectively. The benchmark circuits considered are full scan versions of ISCAS 89 benchmark circuits. The set of test cubes is generated to achieve 100% fault coverage for all the detectable faults in each circuit by an ATPG based on the SOCRATES algorithm [22], [23]. According to the spectrum of don't-care bit positions in the given test cube set, the scan chain is split into multiple segments (2, 3, or 4 segments in our work) and the actual test stimulus data is calculated by the proposed scan partition heuristic. The method to increase the number of don't cares in the test cube set can improve the efficiency of the scan chain partition heuristic during the scan partition process. For the test stimulus data, the final fault coverage, the average WSA, and peak WSA are calculated. The final fault coverage with the test stimulus data for all the benchmark circuits was the same as the original fault coverage with the test cube sets.

Table 1. Experimental results of the proposed scheme on average and peak power consumption.

| Name    | Full scan, average WSA | Full scan, peak WSA | 2 partitions, average WSA | 2 partitions, peak WSA | 3 partitions, average WSA | 3 partitions, peak WSA | 4 partitions, average WSA | 4 partitions, peak WSA |
|---------|--------|--------|-------|--------|-------|--------|-----|--------|
| s5378   | 1,053  | 1,728  | 193   | 928    | 107   | 744    | 76  | 660    |
| s9234   | 2,020  | 3,294  | 543   | 2,117  | 392   | 2,065  | 183 | 1,464  |
| s13207  | 2,136  | 4,350  | 249   | 1,253  | 154   | 916    | 99  | 798    |
| s15850  | 3,962  | 5,301  | 252   | 2,590  | 105   | 2,410  | 105 | 2,415  |
| s35932  | 8,195  | 13,425 | 1,422 | 13,313 | 1,041 | 13,110 | 897 | 13,110 |
| s38417  | 11,378 | 12,369 | 1,045 | 4,629  | 887   | 3,248  | 678 | 2,784  |
| s38584  | 8,934  | 13,464 | 1,067 | 11,195 | 852   | 10,642 | 850 | 10,475 |
| Average | 5,411  | 7,704  | 683   | 5,146  | 505   | 4,734  | 413 | 4,529  |

Table 2. Effect of different partitioning algorithms (reduction rate).

| Name    | 2 partitions, average | 2 partitions, peak | 3 partitions, average | 3 partitions, peak | 4 partitions, average | 4 partitions, peak |
|---------|------|------|------|------|------|------|
| s5378   | 0.78 | 1.01 | 0.75 | 1.03 | 0.69 | 1.04 |
| s9234   | 0.78 | 0.88 | 0.70 | 1.15 | 0.61 | 0.98 |
| s13207  | 0.86 | 0.94 | 0.78 | 0.97 | 0.76 | 1.02 |
| s15850  | 0.65 | 0.88 | 0.61 | 0.93 | 0.60 | 0.97 |
| s35932  | 0.91 | 1.06 | 0.90 | 0.97 | 0.91 | 0.97 |
| s38417  | 0.87 | 0.89 | 0.80 | 0.91 | 0.77 | 0.99 |
| s38584  | 0.85 | 0.97 | 0.80 | 0.95 | 0.83 | 0.96 |
| Average | 0.81 | 0.95 | 0.76 | 0.99 | 0.74 | 0.99 |

The second and third columns of Table 1 show the average WSA and the peak WSA for each circuit under the full scan environment. The results for 2, 3, and 4 partitions are shown in the remaining columns. In the case of 2 partitions, the average WSA was reduced to about 12.62% (683/5,411) of that of the full scan environment, and the peak WSA was reduced to 66.80% (5,146/7,704). In the case of 3 partitions, the average WSA was reduced to 9.33% (505/5,411), and the peak WSA was reduced to 61.45% (4,734/7,704). In the case of 4 partitions, the average WSA was reduced to 7.63% (413/5,411), and the peak WSA was reduced to 58.79% (4,529/7,704), which corresponds to reductions of 92.37% and 41.21%, respectively. The peak power consumption is usually determined by the capture operation, when more than one segment can be activated. Since only one partition is activated in the shift operation of the proposed scheme, the reduction of average power consumption is much higher than the reduction of peak power consumption, as shown in Table 1. As the number of partitions increases, the area overhead and power reduction of the proposed scheme increase. However, the reduction of the power consumption slows down when the number of partitions is larger than 5, so the results of up to 4 segments are presented in Table 1.
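The percentages quoted for Table 1 follow directly from the "Average" row. The short check below recomputes, for each partition count, the remaining WSA as a percentage of the full scan value and the corresponding reduction; the variable names are illustrative.

```python
# "Average" row of Table 1: full scan and proposed scheme (average WSA, peak WSA).
full_avg, full_peak = 5411, 7704
proposed = {2: (683, 5146), 3: (505, 4734), 4: (413, 4529)}

for parts, (avg, peak) in proposed.items():
    avg_pct = round(100 * avg / full_avg, 2)      # remaining average WSA in %
    peak_pct = round(100 * peak / full_peak, 2)   # remaining peak WSA in %
    print(parts, avg_pct, round(100 - avg_pct, 2), peak_pct, round(100 - peak_pct, 2))
```

With four partitions the remaining average WSA is 7.63% of the full scan value and the remaining peak WSA is 58.79%, that is, reductions of 92.37% and 41.21%.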

Table 2 shows the power dissipation reduction rate of the proposed scan partition scheme compared to another scan partition algorithm. We chose the other partition algorithm by altering some parameters of the proposed algorithm. The purpose of this experiment is to show that the reduction of power consumption is sensitive to the partitioning algorithm.

Table 3. Power reduction of the proposed scheme compared to that of the partition-first scheme.

| Name    | 2 partitions, average | 2 partitions, peak | 3 partitions, average | 3 partitions, peak | 4 partitions, average | 4 partitions, peak |
|---------|------|------|------|------|------|------|
| s5378   | 0.28 | 0.92 | 0.22 | 0.93 | 0.19 | 0.95 |
| s9234   | 0.30 | 0.72 | 0.21 | 0.79 | 0.19 | 0.95 |
| s13207  | 0.24 | 0.71 | 0.16 | 0.64 | 0.15 | 0.67 |
| s15850  | 0.11 | 0.86 | 0.08 | 0.96 | 0.09 | 1.01 |
| s35932  | 0.99 | 1.06 | 0.73 | 0.97 | 0.70 | 0.97 |
| s38417  | 0.14 | 0.69 | 0.12 | 0.65 | 0.12 | 0.73 |
| s38584  | 0.27 | 0.94 | 0.21 | 0.95 | 0.25 | 0.95 |
| Average | 0.33 | 0.84 | 0.25 | 0.84 | 0.24 | 0.89 |

The reduction rate is calculated by dividing the average (or peak) WSA of the proposed scheme by the average (or peak) WSA of the other partition algorithm. In most cases, with a few exceptions, the proposed partition algorithm outperforms the other algorithm. Since the proposed algorithm focuses on power reduction during the scan shift operation, the reduction of average power consumption is much greater than the reduction of peak power consumption. The proposed algorithm consumes an average power of 77% and a peak power of 98% compared to the other partition algorithm as shown in Table 2.

Table 4. Comparison to previous scan partition schemes.

| Name   | Tests |      |      | Average WSA |      |      | Peak WSA |      |
|--------|-------|------|------|-------------|------|------|----------|------|
|        | Ours  | [13] | [14] | Ours        | [13] | [14] | Ours     | [14] |
| s5378  | 221   | 252  | 333  | 0.10        | 0.34 | 0.29 | 0.43     | 1.06 |
| s9234  | 361   | 367  | 480  | 0.19        | 0.24 | 0.27 | 0.63     | 0.85 |
| s13207 | 488   | 459  | 586  | 0.07        | 0.19 | 0.29 | 0.21     | 0.68 |
| s15850 | 459   | 442  | 500  | 0.03        | 0.27 | 0.17 | 0.45     | 0.73 |
| s38417 | 973   | 882  | 1243 | 0.08        | 0.25 | 0.14 | 0.29     | 0.68 |
| s38584 | 664   | 653  | 854  | 0.10        | 0.23 | 0.28 | 0.79     | 0.86 |
| Ave.   | 528   | 509  | 666  | 0.10        | 0.25 | 0.24 | 0.47     | 0.81 |

Table 5. Comparison to another low-power test scheme.

| Name   | Tests |     | Average WSA |      | Peak WSA |      |
|--------|-------|-----|-------------|------|----------|------|
|        | Ours  | [8] | Ours        | [8]  | Ours     | [8]  |
| s5378  | 118   | 100 | 0.18        | 0.74 | 0.47     | 0.65 |
| s9234  | 125   | 111 | 0.27        | 0.74 | 0.69     | 0.83 |
| s13207 | 238   | 235 | 0.19        | 0.73 | 0.34     | 0.58 |
| s15850 | 101   | 97  | 0.15        | 0.64 | 0.47     | 0.68 |
| s38417 | 100   | 87  | 0.21        | 0.71 | 0.42     | 0.76 |
| s38584 | 121   | 114 | 0.23        | 0.71 | 0.81     | 0.51 |
| Ave.   | 134   | 124 | 0.23        | 0.71 | 0.53     | 0.67 |

To reduce the time-to-market, it is sometimes preferable to partition the scan chain first and use the partition information as constraints in the ATPG. Table 3 compares the proposed scheme to the partition-first scheme in terms of power reduction rate. The partition-first scheme preserves fault coverage but loses a significant part of the power reduction. The proposed scheme outperforms the partition-first scheme in both average and peak power dissipation during scan testing. On average, the proposed algorithm consumes 27.33% of the average power and 85.67% of the peak power of the partition-first scheme.
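The two summary percentages for Table 3 appear to follow directly from its "Average" row. The sketch below reproduces that arithmetic under the assumption that the six entries of the row alternate between average-WSA and peak-WSA ratios; the pairing and the names `avg_row` and `mean_percent` are illustrative and not taken from the paper.

```python
# "Average" row of Table 3 as reported: ratios of the proposed scheme's WSA
# to the partition-first scheme's WSA.
# Assumption: the columns alternate (average WSA, peak WSA) for three setups.
avg_row = [0.33, 0.84, 0.25, 0.84, 0.24, 0.89]

average_wsa_ratios = avg_row[0::2]  # entries 1, 3, 5 of the row
peak_wsa_ratios = avg_row[1::2]     # entries 2, 4, 6 of the row


def mean_percent(ratios):
    """Return the mean of the ratios as a percentage, rounded to 2 decimals."""
    return round(100 * sum(ratios) / len(ratios), 2)


average_summary = mean_percent(average_wsa_ratios)
peak_summary = mean_percent(peak_wsa_ratios)
print(average_summary, peak_summary)
```

Under this reading, the row averages match the 27.33% and 85.67% quoted in the text.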

Table 4 compares the proposed scheme with previous scan partition schemes [13], [14] in terms of average and peak power consumption. The WSA of each scheme was normalized to the WSA of the traditional full scan environment. The number of scan partitions is 3 for all the schemes. The sizes of the test cube sets for each scheme are provided in the second column of Table 4. The third column contains the average WSA results of each scheme. For most benchmark circuits, the proposed scheme consumes the least average power. The average WSA of the proposed scheme is about 15 percentage points lower than that of the previous scan chain partition schemes (0.10 versus 0.25 and 0.24 of the full-scan WSA). In addition, as shown in Table 4, the variance in power consumption across the benchmark circuits is much smaller than that of the other schemes. The peak WSA data for each scheme is provided in the fourth column. In [13], the data for peak power consumption is not available since the method does not consider peak power reduction. For all the benchmark circuits, the proposed scheme consumes the least peak power. On average, the peak WSA of the proposed scheme is approximately 34 percentage points lower than that of [14] (0.47 versus 0.81).
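Because the WSA values in Table 4 are normalized to full scan, a difference between two schemes can be read either as a gap in percentage points of the full-scan WSA or as a reduction relative to the other scheme. The following minimal sketch computes both from the "Ave." row of Table 4; the helper names are illustrative.

```python
# "Ave." row of Table 4: WSA normalized to the traditional full-scan WSA.
average_wsa = {"ours": 0.10, "[13]": 0.25, "[14]": 0.24}
peak_wsa = {"ours": 0.47, "[14]": 0.81}


def gap_in_points(ours, other):
    """Difference in normalized WSA, in percentage points of full-scan WSA."""
    return round(100 * (other - ours))


def relative_reduction(ours, other):
    """Reduction relative to the other scheme, in percent."""
    return round(100 * (1 - ours / other))


for name in ("[13]", "[14]"):
    print("average WSA vs", name,
          gap_in_points(average_wsa["ours"], average_wsa[name]), "points,",
          relative_reduction(average_wsa["ours"], average_wsa[name]), "% relative")

print("peak WSA vs [14]",
      gap_in_points(peak_wsa["ours"], peak_wsa["[14]"]), "points,",
      relative_reduction(peak_wsa["ours"], peak_wsa["[14]"]), "% relative")
```

With the Table 4 averages, the gaps are 15, 14, and 34 points, which correspond to relative reductions of roughly 60%, 58%, and 42%.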

Table 5 compares the proposed scheme to a software-based low-power scan test scheme [8]. In [8], tests are relaxed to contain don't-care bits. Half of the PPIs and PIs with don't-care bits are filled by the preferred filling method, and the remaining half are filled by the adjacent filling method. Since the scheme in [8] does not use test cubes, for the experiments we generated a fully specified test set without don't-care bits and relaxed it by the method explained in Section IV. The Tests column gives the sizes of the relaxed test sets for the proposed scheme and [8]. The remaining columns show the average and peak power consumption of the two schemes. As shown in Table 5, the proposed scheme lowers the average and peak WSA during scan testing by 48 and 14 percentage points, respectively (0.23 versus 0.71 and 0.53 versus 0.67). Although the low-power scheme in [8] requires no hardware modification or overhead, the proposed scheme achieves a much greater reduction in both the average and peak power consumption. Therefore, the proposed scheme is better suited than [8] to highly power-critical designs.
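As a rough illustration of the adjacent filling mentioned above, the sketch below replaces each don't-care bit with the value of the nearest specified bit before it in the shift order, which keeps consecutive scan-in bits equal and so limits shift transitions. It is a simplified stand-in, not the procedure of [8]: the string encoding (`0`, `1`, `X`), the handling of leading don't-cares, and the function names are assumptions made for the example.

```python
def adjacent_fill(cube: str) -> str:
    """Fill each 'X' with the nearest specified bit to its left.

    Leading 'X's (no specified bit to the left) copy the first specified
    bit to their right; an all-'X' cube is filled with '0'.
    Both fallbacks are assumptions of this sketch.
    """
    first_specified = next((b for b in cube if b in "01"), "0")
    filled = []
    last = first_specified
    for bit in cube:
        if bit in "01":
            last = bit          # remember the most recent specified value
        elif bit != "X":
            raise ValueError(f"unexpected symbol {bit!r}")
        filled.append(last)     # specified bits keep their value, 'X' copies it
    return "".join(filled)


def transitions(pattern: str) -> int:
    """Count adjacent bit pairs that differ (a proxy for shift activity)."""
    return sum(a != b for a, b in zip(pattern, pattern[1:]))


cube = "X1XX0XX1X"
filled = adjacent_fill(cube)
print(filled, transitions(filled))
```

For this cube the filled pattern contains only the two transitions forced by the specified bits, which is the lowest count any fill of this cube can reach.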

## VI. Conclusion

A new scan chain architecture using scan chain partitioning and disabling was proposed to reduce both the average and peak power consumption. The scan chain is split into several length-balanced segments, and only one segment is enabled in each test clock during both the shift and capture cycles. In addition, the segments that do not need to be activated are skipped or disabled during the shift and capture cycles. Therefore, with the proposed scheme, the average and peak power consumption during test application can be significantly reduced. Since the efficiency of the proposed scheme greatly depends on how scan cells are distributed into scan segments, a new graph-based heuristic for scan partitioning has been developed to increase the average weight sum inside each sub-graph and to reduce the average weight sum among sub-graphs. Also, since the proposed scheme is sensitive to the initial test data set, a method based on the support set has been proposed to increase the number of don't-care bits in a test data set. The experimental results show that both peak and average power consumption are reduced with the proposed scan architecture without degrading the performance of the designs and with minimal impact on area. Hence, this method represents a potential solution to power-related issues associated with scan-based testing. This enables shifting of test data at high frequencies without the risk of overheating the chip under test and eliminates the risk of noise-induced test failures.
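The partitioning objective summarized above can be made concrete with a small sketch that scores a candidate partition by the edge weight kept inside segments and the edge weight cut between segments. It only evaluates a given partition and is not the paper's heuristic; it also reports totals rather than per-sub-graph averages. The graph, its weights, and the names `partition_weights`, `edges`, and `segment_of` are hypothetical.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def partition_weights(edges: Dict[Edge, float],
                      segment_of: Dict[str, int]) -> Tuple[float, float]:
    """Return (intra, inter): weight inside segments and weight across them."""
    intra = 0.0
    inter = 0.0
    for (u, v), w in edges.items():
        if segment_of[u] == segment_of[v]:
            intra += w   # both scan cells fall in the same segment
        else:
            inter += w   # the edge is cut by the partition
    return intra, inter


# Hypothetical 4-cell graph: a larger weight means two scan cells benefit
# more from sharing a segment.
edges = {("a", "b"): 5, ("c", "d"): 4, ("b", "c"): 1, ("a", "d"): 1}
good = {"a": 0, "b": 0, "c": 1, "d": 1}   # keeps the heavy edges together
poor = {"a": 0, "b": 1, "c": 0, "d": 1}   # cuts every edge

print(partition_weights(edges, good))
print(partition_weights(edges, poor))
```

Both candidate partitions are length-balanced (two cells per segment), so the score alone distinguishes them: a partitioning heuristic in this spirit would prefer the first, which has the higher intra-segment weight and the lower inter-segment weight.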

# References

- [1] M. Abramovici, A.D. Friedman, and M.A. Breuer, Digital Systems Testing and Testable Design, John Wiley & Sons, 1993.
- [2] N.K. Jha and S. Gupta, Testing of Digital Systems, Cambridge University Press, 2003.
- [3] N.Z. Basturkmen, S.M. Reddy, and I. Pomeranz, "A Low Power Pseudo-Random BIST Technique," Proc. IEEE Int'l Conf. Computer Design, 2002,
- [4] A. Chandra and K. Chakrabarty, "A Unified Approach to Reduce SOC Test Data Volume, Scan Power, and Testing Time," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 20, 2001, pp. 355-368.
- [5] P.M. Rosinger, P.T. Gonciari, B.M. Al-Hashimi, and N. Nicolici, "Simultaneous Reduction in Volume of Test Data and Power Dissipation for Systems-on-a-Chip," Electronics Letters, vol. 37, no. 24, 2001, pp. 1434-1436.
- [6] S. Wang and S.K. Gupta, "DS-LFSR: A New BIST TPG for Low Heat Dissipation," Proc. Int'l Test Conf., 1997, pp. 848-857.
- [7] S. Remersaro, X. Lin, Z. Zhang, S.M. Reddy, I. Pomeranz, and J. Rajski, "Preferred Fill: A Scalable Method to Reduce Capture Power for Scan Based Designs," Proc. Int'l Test Conf., 2006, pp. 1-10.
- [8] S. Remersaro, X. Lin, S.M. Reddy, I. Pomeranz, and J. Rajski, "Low Shift and Capture Power Scan Tests," Int'l Conf. VLSI Design, 2007, pp. 793-798.
- [9] S. Gerstendorfer and H.J. Wunderlich, "Minimized Power Consumption for Scan-Based BIST," Proc. Int'l Test Conf., 1999, pp. 77-84.
- [10] R. Sankaralingam and N.A. Touba, "Inserting Test Points to Control Peak Power During Scan Testing," Proc. of IEEE Int'l Symp. Defect and Fault Tolerance in VLSI Systems, 2002, pp. 138-146.
- [11] L. Whetsel, "Adapting Scan Architecture for Low Power Operation," Proc. Int'l Test Conf., 2000, pp. 863-872.
- [12] P. Rosinger, B.M. Al-Hashimi, and N. Nicolici, "Scan Architecture With Mutually Exclusive Scan Segment Activation for Shift and Capture Power Reduction," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, no. 7, 2004, pp. 1142-1153.
- [13] O. Sinanoglu and A. Orailoglu, "A Novel Scan Architecture for Power-Efficient, Rapid Test," Proc. IEEE/ACM Int'l Conf. Computer Aided Design, 2002, pp. 299-303.
- [14] R. Sankaralingam, B. Pouya, and N.A. Touba, "Reducing Power Dissipation During Test Using Scan Chain Disable," Proc. VLSI Test Symp., 2001, pp. 319-324.
- [15] I. Hamzaoglu and J.H. Patel, "Reducing Test Application Time for Full Scan Embedded Cores," Digest Papers, 29th Int'l Symp. Fault Tolerant Computing, 1999, pp. 260-267.
- [16] A.R. Pandey and J.H. Patel, "An Incremental Algorithm for Test Generation in Illinois Scan Architecture Based Designs," Proc. the Design, Automation and Test in Europe Conf., 2002, pp. 369-375.
- [17] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, New York, 1979.
- [18] A. Raghunathan and S.T. Chakradhar, "Acceleration Techniques for Logic BIST," Proc. Int'l Test Conf., 1995, pp. 338-346.
- [19] X. Chen and M.S. Hsiao, "Characteristic Faults and Spectral Information for Logic BIST," Proc. IEEE/ACM Int'l Conf. Computer Design, 2002, pp. 294-298.
- [20] S. Wang and S.K. Gupta, "DS-LFSR: A BIST TPG for Low Switching Activity," IEEE Trans. Computer Aided Design of Integrated Circuits and Systems, vol. 21, no. 7, 2002, pp. 842-851.
- [21] F. Brglez, D. Bryan, and K. Kozminski, "Combinational Profile of Sequential Benchmark Circuits," Int'l Symp. Circuits and Systems, 1989, pp. 1929-1934.
- [22] M. Schulz, A. Trischler, and T. Sarfert, "SOCRATES: A Highly Efficient Automatic Test Pattern Generation System," Proc. Int'l Test Conf., 1987, pp. 1016-1026.
- [23] M. Schulz and E. Auth, "Improved Deterministic Test Pattern Generation with Applications to Redundancy Identification," IEEE Trans. Computer Aided Design of Integrated Circuits and Systems, vol. 8, no. 7, 1989, pp. 811-816.



Hong-Sik Kim received the BS, MS, and PhD degrees in electrical and electronic engineering from Yonsei University in 1997, 1999, and 2004, respectively. He was a post-doctoral fellow at Virginia Tech, VA, in 2005, and a senior engineer at System LSI Group in Samsung Electronics Co. in 2006. He is currently a research professor at Yonsei University, Seoul, Korea. His current research interests include design for testability, built-in self-test, and test compression algorithms.

Cheong-Ghil Kim received the MS and PhD degrees from the Department of Computer Science, Yonsei University, in 2003 and 2006, respectively. He was a post-doctoral researcher at the same department from September 2006 to February 2008. Currently, he is a professor at the Department of Computer Science, Namseoul University. His research areas are computer architecture and multimedia embedded systems.

Sungho Kang received the BS degree from Seoul National University, Seoul, Korea, and the MS and PhD degrees in electrical and computer engineering from the University of Texas at Austin. He was a post-doctoral fellow at the University of Texas at Austin, a research scientist at the Schlumberger Laboratory for Computer Science, Schlumberger Inc., and a senior staff engineer at Semiconductor Systems Design Technology, Motorola Inc. Since 1994, he has been an associate professor at the Department of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea. His current research interests include VLSI design, VLSI CAD, and VLSI testing and design for testability.