# On Diagnosable and Tunable 3D Clock Network Design for Lifetime Reliability Enhancement

Li Jiang<sup>†</sup>, Pu Pang<sup>†</sup>, Naifeng Jing<sup>†</sup>, Sung Kyu Lim<sup>§</sup>, Xiaoyao Liang<sup>†</sup> and Qiang Xu<sup>‡</sup>

<sup>†</sup>Department of CS&E, Shanghai Jiao Tong University, Shanghai, China

<sup>§</sup>School of Electrical and Computer Engineering, Georgia Institute of Technology, GA

<sup>‡</sup>Department of CS&E, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong

### Abstract

In three-dimensional (3D) integrated circuits (ICs), many clock-TSVs are deployed to deliver clock signals to different tiers with minimum skew. However, these clock-TSVs are prone to aging effects, such as thermal-mechanical stress and electromigration, which cause hard-to-predict clock skews at runtime. These skews affect a wide range of flip-flops and may violate the safety margins of critical paths in the circuit. In addition to circuit aging, clock-TSV-induced skews therefore pose another threat to circuit lifetime reliability. To tackle this problem, we propose to place a tunable buffer at each clock-TSV in the clock network and introduce an efficient algorithm to place aging sensors in the circuit at the design stage. Then, at runtime, we conduct online diagnosis and apply effective clock tuning algorithms based on the alarms triggered by the aging sensors. Experimental results on a post-layout 3D circuit show that the proposed solution significantly improves the lifetime reliability of 3D ICs.

#### 1. Introduction

3D-stacked integrated circuits (3D-SICs) have emerged as one of the most promising technologies to address the "More than Moore" trend [1]. They stack multiple silicon dies in the vertical direction and use through-silicon vias (TSVs) as inter-die interconnects. The use of TSVs in 3D-stacked ICs can significantly reduce global interconnects (e.g., memory buses [2]), thereby dramatically improving circuit performance. At the same time, however, clock signals need to be delivered to the flip-flops across layers using TSVs [3], and designing a reliable 3D clock tree is a challenging task.

TSV fabrication involves disruptive manufacturing processes, leading to various manufacturing defects, such as open hole and short defects. Assuming that the clock-TSVs are testable, Lung et al. [4] proposed a fault-tolerant structure that can disable a faulty TSV and reroute the clock signal through the redundant 2-D clock network designated for pre-bond test [5]. Unfortunately, TSVs may also contain latent defects, such as TSV interfacial cracks [6, 7] induced by thermal-mechanical stress (CTE) and micro-voids left inside the TSV structure by unsuccessful TSV filling [6,8,9] (see Figure 1(a)). These defects are difficult to screen during burn-in test, while their impact can be significant during field operation. That is, the cracks/voids inside TSVs can expand due to continuing stress force [10] and electromigration (EM) effects [7]. Previous studies [7, 8] show that the interface between the end of line and the TSV is quite prone to EM degradation.

Worse still, the stress aggravates the migration process, leading to a significant increase in TSV resistance [9]. Therefore, at run time, these latent defects can affect timing and harm the reliability of the 3D circuit long before they manifest themselves as detectable faults.

The latent defects inside signal-TSVs have been investigated in [11]. To mitigate the aging of signal-TSVs, the authors propose to periodically test the paths (especially critical paths) containing signal-TSVs and check for any timing violation. If a path violates the timing requirement, the signal-TSV associated with the path can be repaired in the field. The technique presented in [11], however, cannot be directly applied to the aging of clock-TSVs. This is because, when unexpected skews appear in the clock network, they spread across the network and cause a large number of possible timing violations. Figure 1(b) shows an example 3D clock network spread over a two-layer 3D IC with two clock-TSVs ($TSV_1$ and $TSV_2$) and six subtrees of sinks. If a skew forms in $TSV_1$, it may propagate to its subordinate clock network (i.e., subtrees S3 and S4). The higher a clock-TSV sits in the clock hierarchy, the larger the impact its aging can have on the circuit. At the sink level, the clock signal arrives at the flip-flops with skews, which in turn change the safety margins of various circuit paths (e.g., P1 and P2) and cause timing variations.

In addition, transistors in the circuit suffer from various reliability issues such as NBTI [12], which also degrade the path delay. Moreover, the TSV-induced stress may squeeze or stretch adjacent transistors and interconnects, which leads to mobility changes [13] in the nearby transistors [14]. All the above factors affect the reliability of 3D ICs at runtime, and it is essential to develop a holistic solution to tackle them.

INTERNATIONAL TEST CONFERENCE Paper 17.1 978-1-4799-4722-5/14/\$31.00 ©2015 IEEE

Motivated by the above, we propose a diagnosable and tunable clock network design in this paper. The main contributions include:

- To the best of our knowledge, this is the first work to address the aging effects of clock-TSVs and to present a corresponding countermeasure.
- We propose an online diagnosis technique that can differentiate the source of timing degradation, leveraging an existing circuit-aging diagnosis solution [15].
- We propose a novel online clock tuning strategy that enhances circuit lifetime reliability in the presence of clock-TSV aging and circuit aging.

The remainder of this paper is organized as follows. Section II introduces the related work and motivates this paper. An overview of the proposed techniques is given in Section III. Section IV and Section V detail the proposed solution. In Section VI, we present our experimental results. Finally, we conclude this paper in Section VII.

# 2. Related Work and Motivation

### 2.1. TSV reliability threats

The electrical, thermal and mechanical coupling effects involving TSVs can cause mobility variation in the transistors around TSVs. Yang et al. [16] considered the mobility variation of the clock buffers due to TSV-induced stress and proposed a stress-aware clock buffer insertion technique to minimize the unexpected skew. Another strategy proposed for 3D clock networks eliminates the clock skew [17] by carefully inserting dummy TSVs to reduce the temperature and stress gradients. These solutions target the reliability of the transistors in the clock network rather than the clock-TSVs themselves. In contrast, we limit the scope of this paper to clock-TSV aging only.

Jiang et al. [11] study the effect of TSV aging on the circuit and provide a corresponding solution. Nevertheless, we cannot directly use this method, mainly for the following reasons: (i) the degraded delay of a signal-TSV can be easily detected by online path delay testing, while testing the extra skew caused by clock-TSV aging is difficult; (ii) a clock-TSV has higher switching activity and current density than signal-TSVs, resulting in a stronger EM effect that aggravates the aging [18]; (iii) the skews induced by clock-TSV aging can reach a larger portion of the circuit. Detecting clock-TSV aging therefore becomes the most challenging task.

#### 2.2. Post-silicon clock tuning techniques

Post-silicon clock tuning techniques have been studied extensively to mitigate the clock skew variation caused by process variation [19–22]. Moreover, die-to-die variation in 3D ICs is taken into account in [23,24] when dealing with clock skew variation. In these works, pairs of flip-flops are chosen and their clock arrival times are compared to detect the clock skew. The clock buffers predesigned in the clock network are then tuned to guarantee a skew-balanced clock network. However, a skew-balanced network may not be the best choice for lifetime reliability, because some skews can ease the circuit aging. We use the following example to explain the reason.

Figure 2 The structure of the example 3D clock network.

Figure 2 shows the structure of the example clock network shown in Figure 1. Consider the flip-flops (e.g., FF1) driven by an aging clock-TSV (e.g., $TSV_1$) with increasing clock skew. Two scenarios may arise: (i) circuit-wide paths such as P2, which have FF1 as the driving end while their receiving ends' clock signals do not go through $TSV_1$, will have a reduced timing slack and may fail to meet the setup time constraint; (ii) paths such as P3, which have FF1 as the receiving end while their driving ends' clock signals do not go through $TSV_1$, will have a larger timing slack, which may offset the circuit aging effect. It is therefore more desirable to tune the clock skew in favor of lifetime reliability than to achieve a skew-balanced clock network. Specifically, we can allow the clock skews induced by clock-TSVs to exist, provided that they alleviate the circuit delay degradation for paths like P3 in Figure 2.
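The two scenarios above can be made explicit with the standard setup-slack relation. This is only a sketch: the symbols $T$ (clock period), $D_{path}$ (path delay), $t_{setup}$ (setup time), $t_{drv}$ and $t_{rcv}$ (clock arrival times at the driving and receiving flip-flops) and $\Delta$ (extra clock delay added by the aging $TSV_1$) are notation introduced here for illustration and are not taken from the original text.

$$
\begin{aligned}
\mathit{slack} &= T + t_{rcv} - t_{drv} - D_{path} - t_{setup} \\
P_2\ (\text{FF1 drives}):\quad t_{drv} \to t_{drv} + \Delta \;&\Rightarrow\; \mathit{slack} \to \mathit{slack} - \Delta \\
P_3\ (\text{FF1 receives}):\quad t_{rcv} \to t_{rcv} + \Delta \;&\Rightarrow\; \mathit{slack} \to \mathit{slack} + \Delta
\end{aligned}
$$

Under these assumptions, the same skew $\Delta$ consumes margin on P2 while donating margin to P3, which is why a skew-balanced network is not necessarily the best target for lifetime reliability.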

One may argue that the existing post-silicon clock tuning techniques for combating circuit aging can also be used to solve clock-TSV aging. An aging sensor [25] is shown in Figure 3. The basic idea is to check for any transition in a guard-band interval before the clock edge. Any transition occurring in this interval indicates a timing error in the near future. The aging sensor is modified from a standard flip-flop by adding the following three components: 1) a delay element that defines the guard-band interval; 2) logic circuitry to check for a transition during the guard-band interval; and 3) an additional latch or flip-flop that works with the transition checker and holds its output.

The aging sensor insertion presented in [26] relies on static timing analysis (STA) to estimate the delay of the paths in the circuit. It protects those long paths whose delay ($x$) satisfies $x \times 120\% >$ clock period by equipping their receiving-end flip-flops with aging sensors, on the assumption that NBTI-induced aging normally causes less than 20% extra delay. Meanwhile, tuning elements are inserted on the clock wires prior to the flip-flop (e.g., $FF_x$). Whenever the aging sensor in $FF_x$ is triggered, the tuning element generates extra delay so that the clock signal arrives at $FF_x$ later, providing the path with extra safety margin. The path actually borrows this extra safety margin from the successive path, for which $FF_x$ is the driving end. This method, however, may not be suitable for combating clock-TSV aging, because a single clock-TSV can spread clock skews to a large number of flip-flops through the clock network, which can jeopardize a great number of paths. In addition, when clock-TSV aging becomes dominant (unlike circuit aging, the delay induced by TSV aging has no such limit), some noncritical paths may also become critical. This results in a large cost for both aging sensors and tuning elements. Therefore, a cost-effective method is essential for online clock skew detection and tuning.

Figure 3 Aging sensor design [25].
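The 20% selection rule of [26] described above can be sketched in a few lines. This is a minimal illustration, not the implementation of [26]: the function name `needs_aging_sensor`, the path names and the delay values are hypothetical.

```python
def needs_aging_sensor(path_delay, clock_period, margin=0.20):
    """Return True if a path is long enough to be monitored.

    Implements the rule x * 120% > clock period, where `margin` is the
    assumed upper bound on NBTI-induced extra delay (20%).
    """
    return path_delay * (1.0 + margin) > clock_period


# Hypothetical STA delays (in ns) for three paths and a 1 ns clock period.
sta_delays = {"P1": 0.90, "P2": 0.80, "P3": 0.85}
clock_period = 1.0

# Paths whose receiving-end flip-flop would receive an aging sensor.
protected = sorted(
    name for name, delay in sta_delays.items()
    if needs_aging_sensor(delay, clock_period)
)
```

With these example values, P1 and P3 are selected and P2 is not. Once clock-TSV skew is unbounded, no fixed `margin` separates safe from unsafe paths, which is the cost problem the paragraph above points out.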

### 2.3. Motivation

Motivated by the above, in this work we reuse the existing design and mechanisms as much as possible to serve our purpose.

First, we propose to detect clock-TSV aging by detecting the resulting skews in the paths. As shown in Figure 4, the circuit aging countermeasure in [26] equips flip-flops with aging sensors and CVD to monitor the degraded path delay and tune the clock arrival time. We can reuse this mechanism and add specific aging sensors to flip-flops not only for critical paths, but also for clock-TSVs. These aging sensors are placed at the receiving ends of the paths to predict whether they may fail in the near future due to clock-TSV aging. As the clock skew may spread across many paths, it is essential to develop an efficient aging sensor placement algorithm. Moreover, we insert a tunable buffer behind each clock-TSV in the clock network instead of using CVD, as it is more effective in mitigating clock-TSV aging.

Second, as transistors and wires in the circuit also age with time, it is essential to identify the reason for the degraded path delay whenever a failure is predicted by the aging sensors. For example, it could be caused by the clock-TSV-induced skew, by the circuit-aging-induced delay, or by both. As a result, the derived clock network must be diagnosable. On the one hand, we can use online testing as in [11] for diagnosis. On the other hand, we propose a speculation-based technique for diagnosis.

Figure 4 The framework.

Third, when it is confirmed that the predicted failures are caused by a clock-TSV, we can tune the skew of the defective clock-TSV back to normal. Otherwise, if the predicted failure is caused by circuit aging, we can still mitigate the delay degradation by leveraging the tunable buffer residing at the clock-TSV. Note that, if a clock-TSV is "over-aged" due to severe degradation, we have to use a spare one for replacement [11].

To build such a diagnosable and tunable clock network, we propose an efficient algorithm to insert aging sensors for cost reduction. Furthermore, three effective algorithms are proposed for online diagnosis and guiding the clock tuning procedure.

### 3. Overview of the Proposed Techniques

As shown in Figure 5, the proposed clock-TSV reliability enhancement technique is conducted in two stages. In the pre-silicon stage, we insert aging sensors and tunable buffers during clock tree synthesis. Meanwhile, we generate the corresponding path-delay test patterns for diagnosis. Section IV gives the details of the aging sensor placement.

We place a tunable clock buffer following each clock-TSV, and each buffer can be tuned either forward or backward [26, 27]. This is because, as shown in Figure 6, the aging of a clock-TSV (e.g., $TSV_2$) can harm the service life of one path (e.g., P3) while benefiting another path (e.g., P1). Because of this, we have two ways to eliminate the timing degradation: (i) tune backward the buffer through which the driving-end flip-flop's clock passes; (ii) tune forward the buffer through which the receiving-end flip-flop's clock passes.

In the post-silicon stage, we query the sensors and get a list of the triggered alarms. For a triggered alarm, we diagnose the cause of the sensor triggering and decide which clock-TSVs to tune. We rely on two mechanisms, i.e., speculation and online testing. The former guesses the possible root cause relying purely on the dynamic sensor information and the static structure information, and thus its accuracy is not guaranteed. That is why we also need a deterministic approach, i.e., the latter, to confirm the root cause by online testing. The two diagnosis mechanisms lead to two different clock tuning processes. The basic idea of speculation-based diagnosis is trial-and-error, and we only configure the tunable buffer by one step for each diagnosis. The latter, in contrast, gives us a confirmed diagnosis result where possible, so that we can tune the buffer continuously until the skew is limited. We can also verify the outcome by online testing.

Figure 5 Overview of the proposed technique.

Compared to speculation, however, online testing needs to halt the system before feeding the test patterns. We cannot afford to do that in systems where availability is important. Therefore, we advocate a compound solution to strike a balance between reliability and availability.

Note that the speculation, the online test and the online tuning are all conducted online by the processor, and the corresponding internal data (e.g., programs, test data and the alarm list) is stored in nonvolatile memory, as shown in Figure 4, which is similar to [11].



Figure 6 An example critical path whose two end flip-flops' clock signals pass through two different clock-TSVs.

Figure 7 Problem formulation for aging sensor placement: (a) The circuit structure containing all the critical paths; (b) Transformation into a bipartite graph.

#### 4. Aging Sensor Placement

The aging sensors are responsible for failure prediction at runtime. For a cost-effective placement, as few sensors as possible should be inserted while covering as many clock-TSVs as possible. At the same time, they should respond quickly to any impending failure.

To achieve the first goal, we formulate and solve the problem as follows. We first extract the key pieces related to our problem from the circuit, that is, the clock-TSVs, the flip-flops and the circuit paths. Figure 7(a) shows such an example. The driving-end flip-flops of circuit paths (denoted as DrvFFs) are on the left, while the receiving-end flip-flops (denoted as RcvFFs) are on the right. The clock-TSVs that provide the clock signals for the DrvFFs/RcvFFs (denoted as DrvTSVs/RcvTSVs) are on the left/right. An edge between a flip-flop and a TSV indicates that the flip-flop's clock signal goes through the TSV, while the edges between flip-flops are the circuit paths. Note that many flip-flops serve as both DrvFFs and RcvFFs at the same time. Therefore, we duplicate them if necessary.

Next, we remove those circuit paths that are immune to the clock-TSV-induced skews, i.e., those whose DrvFF and RcvFF clock signals go through the same clock-TSV (e.g., path $FF_1$-$FF_7$) or that have no DrvTSV (e.g., path $FF_5$-$FF_{10}$).

Finally, we collapse the graph by removing the DrvFFs and the RcvTSVs while keeping the connections from the DrvTSVs to their reachable RcvFFs. The resulting graph is bipartite, as shown in Figure 7(b), and clearly shows which flip-flops are affected by each clock-TSV. The problem of aging sensor placement can then be formulated as a minimum set-cover problem: given a set of clock-TSVs and a set of flip-flops, where each flip-flop is associated with a subset of the clock-TSVs, the objective is to select the minimum number of flip-flops that cover all the clock-TSVs. A simple greedy approximation can solve this problem efficiently.
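The greedy approximation mentioned above can be sketched as follows. This is a minimal illustration of the textbook greedy set-cover heuristic, assuming the bipartite graph is given as a mapping from each candidate RcvFF to the set of DrvTSVs it can observe; the function name `greedy_sensor_placement` and the example graph are hypothetical and are not taken from Figure 7.

```python
def greedy_sensor_placement(ff_to_tsvs):
    """Select flip-flops until every clock-TSV is covered (greedy set cover).

    ff_to_tsvs maps each candidate RcvFF to the set of DrvTSVs whose skew
    would be visible at that flip-flop.
    """
    uncovered = set().union(*ff_to_tsvs.values())
    chosen = []
    while uncovered:
        # Pick the flip-flop that covers the most still-uncovered TSVs.
        # Iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(ff_to_tsvs),
                   key=lambda ff: len(ff_to_tsvs[ff] & uncovered))
        chosen.append(best)
        uncovered -= ff_to_tsvs[best]
    return chosen


# Hypothetical bipartite graph: RcvFF -> DrvTSVs that affect it.
example_graph = {
    "FF6": {"TSV1", "TSV2"},
    "FF7": {"TSV1", "TSV2", "TSV3"},
    "FF8": {"TSV1"},
    "FF9": {"TSV4"},
}
sensors = greedy_sensor_placement(example_graph)
```

In this example two sensors (on FF7 and FF9) cover all four clock-TSVs. The greedy choice does not guarantee the true minimum, only a logarithmic approximation of it, which is the usual trade-off for set cover.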

For the second goal, that is, to predict the timing error as early as possible, we resort to a minimum weighted set-cover problem. Generally, a critical path with a longer delay fails earlier and should therefore have a higher priority for sensor placement. Meanwhile, the more clock-TSVs a skew travels through, the larger the aggregated skew when it arrives at the path's DrvFF (and similarly for the RcvFF). To capture this, we assign a weight to each edge of the bipartite graph in Figure 7(b) as follows:

$$w = \frac{1}{D_{path} + m \cdot D_{DrvTSV} - n \cdot D_{RcvTSV}} \tag{1}$$

where $D_{path}$ is the STA delay of the critical path, and $D_{TSV}$ is a user-defined value indicating the TSV-aging-induced clock skew.
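Equation (1) can be evaluated directly, as in the sketch below. The text does not define $m$ and $n$ explicitly; the code assumes they are the numbers of clock-TSVs on the clock paths to the DrvFF and the RcvFF, respectively, which is an interpretation based on the preceding paragraph. The function name and the numeric values are hypothetical.

```python
def edge_weight(d_path, m, n, d_drv_tsv, d_rcv_tsv):
    """Weight of one bipartite-graph edge following Equation (1).

    d_path     : STA delay of the critical path
    m, n       : ASSUMED to be the number of clock-TSVs on the clock path
                 to the DrvFF and to the RcvFF, respectively
    d_drv_tsv,
    d_rcv_tsv  : user-defined skew contributed by one aged clock-TSV
    """
    effective_delay = d_path + m * d_drv_tsv - n * d_rcv_tsv
    if effective_delay <= 0:
        raise ValueError("effective delay must be positive")
    return 1.0 / effective_delay


# Two hypothetical paths with the same STA delay (in ns); the first one
# sees one more clock-TSV on its driving side.
w_more_drv_tsvs = edge_weight(0.9, m=2, n=1, d_drv_tsv=0.02, d_rcv_tsv=0.02)
w_balanced = edge_weight(0.9, m=1, n=1, d_drv_tsv=0.02, d_rcv_tsv=0.02)
```

Because the problem is a minimum weighted set cover, the path with the larger effective delay gets the smaller weight and is therefore preferred for sensor placement.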

Moreover, there is a special kind of path in the circuit, which has only a RcvTSV but no DrvTSV. According to Figure 6, the aging of the TSV would be beneficial to the service life of such a path. If we wish to further extend this path's service life by means of clock tuning with the tunable buffer, we have to insert an aging sensor on its RcvFF and tune the RcvTSV forward.

# 5. Online Diagnosis and Tuning

At run time, the aging sensors monitor the paths and trigger an alarm when a transition approaches the timing guard band. Thus, we still have a time window for diagnosis and tuning before the circuit actually fails. Our diagnosis and tuning begin immediately after any sensor triggers an alarm. We first speculate on the root cause based on the sensor information and try tuning the buffer. If the speculation is correct and the tuning takes effect, the alarms disappear. Otherwise, the alarms persist and more sensors will be triggered. Within this time window, we can tolerate wrong speculations. However, if the speculation does not work well for some complicated circuit structures, we have to resort to online testing. In this section, we describe three approaches.

#### 5.1. Clock Tuning based on Speculation

Clock tuning based on speculation is a trial-and-error solution. Thus, the speculation and the tuning are tightly integrated. To speculate on the root cause of a predicted timing violation, we must use all the available information, including the triggered sensors and the structural relationships in the circuit. Given this information, we need to choose a "suspicious" clock-TSV for tuning. We use two weights for the clock-TSVs to rank them for selection. We first describe what they indicate, then give their formal definitions, and finally describe how to use them in online diagnosis and tuning.

- Suspicious Weight, indicating the probability that the clock-TSV is the source of the skew given the alarm list.
- Beneficial Weight, indicating the benefit of tuning the clock-TSV, in terms of the safety margin of the affected paths.

When the aging sensors trigger alarms (e.g., the sensors in $FF_7$ and $FF_8$ in Figure 8(a)), we query the list of alarms. For each sensor (i.e., RcvFF), we maintain a static table of its related DrvTSVs, obtained by inspecting the circuit structure (e.g., $FF_7$ is related to $TSV_1$, $TSV_2$ and $TSV_3$ in Figure 8). According to the query, the suspicious weight of a clock-TSV is defined as the number of its related RcvFFs whose sensors are triggered (e.g., the suspicious weights of $TSV_1$, $TSV_2$ and $TSV_3$ are 2, 1 and 1 in Figure 8(b)).

Figure 9 An example of beneficial weight.
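The suspicious weight defined above amounts to counting, per clock-TSV, the triggered sensors whose static table lists that TSV. The sketch below illustrates this; the function name is hypothetical, the table entry for $FF_7$ follows the text, and the entry for $FF_8$ is an assumption chosen to be consistent with the weights quoted for Figure 8(b).

```python
from collections import Counter


def suspicious_weights(related_drv_tsvs, alarms):
    """Count, for each clock-TSV, how many triggered RcvFFs relate to it.

    related_drv_tsvs : static table, RcvFF -> set of related DrvTSVs
    alarms           : list of RcvFFs whose aging sensor is triggered
    """
    weights = Counter()
    for ff in alarms:
        # Each triggered sensor adds one vote to every related DrvTSV.
        weights.update(related_drv_tsvs.get(ff, ()))
    return dict(weights)


static_table = {
    "FF7": {"TSV1", "TSV2", "TSV3"},  # stated in the text
    "FF8": {"TSV1"},                  # assumed, not stated in the text
}
weights = suspicious_weights(static_table, ["FF7", "FF8"])
```

With these inputs, $TSV_1$ receives weight 2 while $TSV_2$ and $TSV_3$ receive weight 1, matching the example values in the text.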

Before presenting the definition of the beneficial weight, let us investigate the effect of tuning TSVs in the circuit using the example shown in Figure 9 (a). Tuning $TSV_2$ backward affects multiple paths. Some paths' safety margins are extended (e.g., paths $FF_3$-$FF_6$ and $FF_4$-$FF_7$), while others' are shortened (e.g., path $FF_1$-$FF_8$). A clock-TSV may be a DrvTSV for some paths and/or a RcvTSV for others. Therefore, we define two beneficial weights for a clock-TSV according to its role. Suppose a clock-TSV is a DrvTSV for *D* paths and a RcvTSV for *R* paths. We define its weights as follows:

$$w_{DrvTSV} = D - R \tag{2}$$

$$w_{RcvTSV} = R - D \tag{3}$$

For example, $TSV_2$ in Figure 9 (b) is a DrvTSV for paths $FF_3$-$FF_6$ and $FF_4$-$FF_7$ ($D = 2$) and a RcvTSV for path $FF_1$-$FF_8$ ($R = 1$). Its beneficial weight is therefore 1 when it is treated as a DrvTSV, and -1 when it is treated as a RcvTSV from the point of view of path $FF_1$-$FF_8$.
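Equations (2) and (3) can be checked against this example with a short sketch. Each path is represented here as a hypothetical `(DrvTSV, RcvTSV)` pair; TSVs that the text does not identify are left as `None`, and the assignment of $TSV_1$ as the DrvTSV of path $FF_1$-$FF_8$ is inferred from the diagnosis example later in this section.

```python
def beneficial_weights(paths, tsv):
    """Return (w_DrvTSV, w_RcvTSV) for one clock-TSV, per Equations (2)-(3).

    paths : list of (drv_tsv, rcv_tsv) pairs, one pair per circuit path
    """
    d = sum(1 for drv, _ in paths if drv == tsv)  # paths it drives
    r = sum(1 for _, rcv in paths if rcv == tsv)  # paths it receives
    return d - r, r - d


# Paths involving TSV2 in the Figure 9 example; None marks a TSV that the
# text does not specify.
figure9_paths = [
    ("TSV2", None),    # FF3-FF6: TSV2 is the DrvTSV
    ("TSV2", None),    # FF4-FF7: TSV2 is the DrvTSV
    ("TSV1", "TSV2"),  # FF1-FF8: TSV2 is the RcvTSV (DrvTSV inferred)
]
w_drv, w_rcv = beneficial_weights(figure9_paths, "TSV2")
```

The result is `w_drv = 1` and `w_rcv = -1`: tuning $TSV_2$ backward helps two paths and hurts one, whereas tuning it forward does the opposite.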

Given these two types of weight, we now show how to decide which clock-TSV to tune based on speculation. We start by tuning the DrvTSVs given the alarm list. If there are multiple suspicious DrvTSVs, we compute their suspicious weights and choose the one with the maximum weight, as it is the most suspicious one. A more complicated scenario arises when multiple clock-TSVs share the same maximum suspicious weight. In this case, we choose, among these clock-TSVs, the one with the maximum beneficial weight.

If no DrvTSV is found for a triggered sensor, we know that the predicted timing error is induced by circuit aging, and a circuit aging countermeasure can be invoked. Fortunately, we can instead tune the RcvTSV of this aging path, choosing the one with the maximum beneficial weight, because it is highly likely to give the aging path more safety margin without hurting its successive paths.

Since the speculation provides no confirmed root cause, we tune the buffer (forward or backward) by only one step. If the speculation is successful, no alarms occur within a subsequent period of time. Otherwise, the above procedure is repeated.
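The selection rule for DrvTSVs described above (maximum suspicious weight, ties broken by maximum beneficial weight) can be sketched as follows. The function name and the example weights are hypothetical, and the fallback to name order for a remaining tie is an assumption, since the text does not say how such ties are resolved.

```python
def pick_tsv_to_tune(suspicious, beneficial):
    """Choose the clock-TSV to tune by one step, or None if none is suspect.

    suspicious : clock-TSV -> suspicious weight (from the alarm list)
    beneficial : clock-TSV -> beneficial weight w_DrvTSV (Equation (2))
    """
    if not suspicious:
        return None  # no DrvTSV found: treat the alarm as circuit aging
    top = max(suspicious.values())
    tied = sorted(t for t, w in suspicious.items() if w == top)
    # Among the most suspicious TSVs, prefer the most beneficial one.
    # ASSUMPTION: remaining ties are resolved by name order.
    return max(tied, key=lambda t: beneficial.get(t, 0))


# Hypothetical weights: TSV1 and TSV2 are equally suspicious.
choice = pick_tsv_to_tune(
    {"TSV1": 2, "TSV2": 2, "TSV3": 1},
    {"TSV1": 0, "TSV2": 1, "TSV3": 3},
)
```

Here `choice` is `"TSV2"`: $TSV_3$ has the largest beneficial weight but is less suspicious, so it is not considered. The selected buffer would then be moved by a single step and the alarms observed again.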

#### 5.2. Clock Tuning based on Online Test

Speculation-based clock tuning may produce false diagnoses, which can aggravate the circuit aging. Therefore, in this section, we introduce a more accurate method relying on online testing. The basic flow of this method is shown in Figure 5: (i) online testing and diagnosis; (ii) extensive search based tuning. At the end, we discuss the solution when the root cause is circuit aging.

To conduct online testing and diagnosis, we pick the victim paths, apply path-delay test patterns through the scan chains, and observe the results. The diagnosis works as follows:



Figure 10 Overview of the extensive search and an example of the tuning process.

Whenever an aging sensor is triggered (e.g., $FF_6$ in Figure 9(b)), we find all the suspicious paths (e.g., paths $FF_2$-$FF_6$ and $FF_3$-$FF_6$) and obtain the suspicious clock-TSVs (e.g., $TSV_1$ and $TSV_2$). We then conduct path-delay testing on these suspicious paths and see which one triggers the alarm again (e.g., $FF_2$-$FF_6$). In this way, we can narrow down the suspicious list (e.g., $TSV_2$ is removed from the suspicious list).

To exclude circuit aging during diagnosis, we conduct extra delay testing on some additional paths whose DrvTSV is also the suspicious clock-TSV (e.g., paths $FF_1$-$FF_7$ and $FF_1$-$FF_8$ in Figure 9). At this moment, however, most of these extra paths can pass the delay testing. Otherwise, we would have already observed more triggered sensors. To confirm that the suspicious clock-TSV is the root cause, we intentionally tune the buffer of the suspicious clock-TSV forward, leaving less safety margin for the paths under test. If these extra paths fail the tests, we have high confidence that the clock-TSV is the root cause. Otherwise, we conclude that the triggered alarm is caused by circuit aging.

Next, we describe the tuning mechanism in detail. We tune the buffer iteratively until the test confirms that no alarms will be triggered. In each iteration, we conduct delay testing to validate two conditions: (i) the triggered alarms are not triggered again; (ii) no other critical paths trigger any alarms or fail the testing. The former condition ensures that the backward clock tuning continues until the victim path has enough timing slack. This is the major difference compared to "One-step Tuning". The latter condition guarantees enough timing slack for those paths whose RcvTSV is the root-cause TSV, because they lend their extra timing slack to the victim paths. Subject to these two conditions, we search extensively for the clock-TSV to be tuned.

Figure 11 Two level safety margin.

Figure 10(a) illustrates the extensive search based procedure using the example circuit structure shown in Figure 10(b). Suppose the online testing and diagnosis confirm that the victim path $FF_1$-$FF_8$ triggers the alarm and that the root-cause clock-TSV is $TSV_1$, which is our first-choice TSV. We tune the buffer of this clock-TSV backward iteratively until path $FF_1$-$FF_8$ no longer triggers the alarm (step 1). In step 2, if the tuning of the first-choice TSV succeeds, we use online testing to verify whether new victim paths emerge. Specifically, we verify those paths whose RcvTSV is the one we just tuned (path $FF_5$-$FF_7$). If no such paths exist, we report success. Otherwise, we treat each such path as a new victim path and conduct the extensive search based tuning recursively (step 3). However, the tuning may fail when the buffer's range is exceeded before the tuning succeeds. In that case, we restore the previous tuning and try to tune the buffer of another target clock-TSV (e.g., $TSV_2$) with the same process (step 4). Note that the failed tuning for the newly emerged paths can also be restored by backtracking bottom-up through the search tree. If there is no other target clock-TSV, the algorithm reports failure and exits. To avoid potential deadlock and an excessively deep recursion, we restrict the adjustment direction of each tunable buffer. That is, if a clock-TSV's buffer has been tuned backward, it can only be tuned backward in the remaining tuning process.

If the root cause is circuit aging, we can tune the tunable buffer of the RcvTSV with the maximum beneficial weight (as mentioned in Section V-A).
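The four steps of the extensive search can be sketched as a recursive backtracking routine on a toy model. Everything in this model is an assumption made for illustration: slack is measured in integer buffer steps, a negative slack stands for "the sensor alarms or the delay test fails", the alternative target for a victim path is taken to be its RcvTSV tuned forward, and all paths are re-checked after each tuning rather than only those whose RcvTSV was just tuned. The function names are hypothetical.

```python
def slack(path, offsets):
    """Slack of a path in buffer steps; negative means the path alarms."""
    drv_tsv, rcv_tsv, base = path
    # A later receiving clock adds slack; a later driving clock removes it.
    return base + offsets.get(rcv_tsv, 0) - offsets.get(drv_tsv, 0)


def tune(victim, paths, offsets, direction, max_steps, depth=0, max_depth=8):
    """Try to clear the alarm on paths[victim]; return True on success."""
    if slack(paths[victim], offsets) >= 0:
        return True
    if depth > max_depth:
        return False
    drv_tsv, rcv_tsv, _ = paths[victim]
    # First choice: DrvTSV backward (-1). Alternative: RcvTSV forward (+1).
    for tsv, step in ((drv_tsv, -1), (rcv_tsv, +1)):
        if tsv is None or direction.get(tsv, step) != step:
            continue  # a buffer may only keep moving in its first direction
        saved_offsets, saved_direction = dict(offsets), dict(direction)
        direction[tsv] = step
        ok = True
        while slack(paths[victim], offsets) < 0:      # step 1: iterate
            if abs(offsets.get(tsv, 0) + step) > max_steps:
                ok = False                            # buffer range exceeded
                break
            offsets[tsv] = offsets.get(tsv, 0) + step
        if ok:
            # Steps 2-3: any path that now alarms becomes a new victim.
            for i in range(len(paths)):
                if not tune(i, paths, offsets, direction,
                            max_steps, depth + 1, max_depth):
                    ok = False
                    break
            ok = ok and all(slack(p, offsets) >= 0 for p in paths)
        if ok:
            return True
        # Step 4: restore the previous tuning and try the next target.
        offsets.clear()
        offsets.update(saved_offsets)
        direction.clear()
        direction.update(saved_direction)
    return False


# Hypothetical circuit: (DrvTSV, RcvTSV, initial slack in steps).
toy_paths = [("TSV1", "TSV2", -2), ("TSV3", "TSV1", 1)]
toy_offsets = {}
succeeded = tune(0, toy_paths, toy_offsets, {}, max_steps=3)
```

In this toy run, $TSV_1$ is tuned backward by two steps to rescue the first path, which turns the second path into a new victim; that victim is then rescued by tuning $TSV_3$ backward by one step. With a tunable range of only one step, the same call fails and leaves the offsets unchanged, which corresponds to the restore-and-report-failure branch of the text.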

#### 5.3. Compound Solution

In this section, we propose a compound solution that balances accuracy and test cost by integrating the above two methods.

The basic idea is to divide the safety margin of the monitored paths in half, as shown in Figure 11. In the front half (the secondary safety margin), we allow speculation-based clock tuning to try multiple times without jeopardizing the circuit reliability. If the clock-TSV and the circuit keep aging and/or the speculation keeps trying and failing, the delay of some paths breaks into the primary safety margin. We then employ online testing to pursue an accurate and successful diagnosis and tuning.

Figure 12 Two level aging sensor design.

To support the compound solution, the key is to design an aging sensor that can trigger two-level alarms. As depicted in Figure 12, we add another delay element with delay $D_1$ into the aging sensor, which is linked in series with the original delay element ($D_2$). Thus, the secondary sensor output has a delay of $D_1 + D_2$, i.e., a guard band of $D_1 + D_2$ (see Figure 11).
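The behavior of the two-level sensor can be modeled at a high level as below. This is a behavioral sketch only, not a description of the circuit in Figure 12: it assumes the primary guard band is the original $D_2$ window directly before the clock edge and the secondary guard band is the wider $D_1 + D_2$ window, and the function name and timing values are hypothetical.

```python
def alarm_level(transition_time, clock_edge, d1, d2):
    """Classify the last data transition relative to the capturing clock edge.

    Returns "primary" if the transition falls within D2 of the clock edge,
    "secondary" if it falls within D1 + D2 but outside D2, "none" if it is
    earlier than both guard bands, and "violation" if it is after the edge.
    """
    lead = clock_edge - transition_time  # how early the data settles
    if lead < 0:
        return "violation"
    if lead <= d2:
        return "primary"      # hand over to online testing
    if lead <= d1 + d2:
        return "secondary"    # speculation-based tuning may still retry
    return "none"


# Hypothetical numbers (in ns): a 1 ns clock and two equal 0.05 ns halves.
levels = [alarm_level(t, 1.0, d1=0.05, d2=0.05) for t in (0.80, 0.92, 0.97)]
```

For these three transition times the model reports no alarm, a secondary alarm and a primary alarm, respectively, mirroring the hand-off from speculation to online testing described above.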

### 6. Experimental Results

# 6.1. Experimental Setup

To evaluate the effectiveness and efficiency of the proposed solution, we perform simulation studies and report results on MTTF and test times.

We use the circuit without any aging sensors or tunable buffers as the baseline solution. The circuits with the three proposed methods are denoted as "Speculation", "Online testing", and "Compound", respectively. We also compare our aging sensor placement and compound solution to the one in [26], wherein only critical paths are monitored and only the RcvFFs of these critical paths are combined with tunable CVD.

We use the fast Fourier transform (FFT) circuit from the IWLS 2005 OpenCore benchmarks in our experiments. The FFT circuit has 229,000 gates and 20,000 flops. We employ a performance-driven placement engine to partition the circuit into two layers with the Nangate open cell library. Then, we apply static timing analysis to the resulting 3D FFT circuit and extract the timing of the critical paths. The 3D clock network of the FFT circuit is then generated using a low-power clock synthesis technique [3]. Given the critical path information and the 3D clock network, we insert aging sensors using the proposed method. To mimic the circuit aging effect, we apply an NBTI-induced performance degradation model [28] to all the gates, assuming their duty cycles to be 50%. We treat signal-TSV aging as circuit aging since they both degrade the path delay. If the signal-TSVs are over-aged, we can resort to previous solutions [11].

Figure 13 The histogram of delay increment percentage of critical paths in the 10th year, comparing the cases without and with clock-TSV aging.

The aging rate (A) of each clock-TSV can be obtained as in [11]. Specifically, we first extract the rate of resistance increase of the TSV with the EM reliability model for TSVs [9], and use the TSV resistance to derive the latency using the model in [29]. Compared to signal-TSVs, the switching activity of a clock-TSV (i.e., the clock signal) is more regular, and we obtain A from a uniform distribution that accounts for different factors, such as temperature and thermal-mechanical stress, for each clock-TSV. Meanwhile, clock-TSVs have a higher current density than signal-TSVs because their currents are always switching, and we therefore vary the average A of the clock-TSVs from 0.8 to 2.8, which covers a larger range than that for signal-TSVs [11]. Given the average A, we generate 200 sample circuits, in which the A of each clock-TSV is randomly assigned. We simulate the circuit lifetime for 10 years and report the average MTTF value over all these samples.

#### 6.2. Results and Analysis

First, we investigate the impact of clock-TSV aging in terms of path delay degradation. Figure 13 shows the delay increment percentage of 64 critical paths. Compared to the simulation that considers only circuit aging, clock-TSV aging dramatically widens the delay distribution of these paths. We also observe that clock-TSV aging can either exacerbate or alleviate the circuit aging. The result coincides with our expectation and motivation.

Figure 14(a) presents the MTTF with the three proposed techniques, wherein the MTTF results are normalized to that of the baseline solution. The MTTF decreases for all three methods as A increases. A possible explanation is that the paths suffer from more severe delay degradation as clock-TSV aging is aggravated (i.e., A is larger). The safety margins are consumed sooner and the degradation exceeds the tunable range more quickly, which causes the circuit to fail earlier. "Online Test" outperforms "Speculation", and the result of "Compound" lies between the two. The MTTF curve of the compound solution is close to "Speculation" when A is small, while it approaches "Online Test" as A grows. The underlying reason is that the proportion of online testing in the compound solution increases as A increases, because the victim paths enter the primary safety margin more quickly when A is larger. Figure 14(b) presents the test cost of "Compound", normalized to that of "Online Test". Compared to "Online Test", "Compound" dramatically reduces the test cost.

Figure 14 Experimental results on MTTF and test cost with varying A. The tunable range is 5% of the clock period. In (a), the MTTF results of the three methods are normalized to "No Tuning". (b) shows the test cost of "Compound Solution" normalized to "Online Test".

Figure 15 Simulation results comparing the aging sensor placement and the "Compound" solution to [26]. The MTTF results are normalized to the one in [26]. The tunable range is 5% of the clock period.

Paper 17.1 INTERNATIONAL TEST CONFERENCE 978-1-4799-4722-5/14/\$31.00 ©2015 IEEE

Figure 15 compares the MTTF results of our compound solution and aging sensor placement with those of [26]. The 3D FFT circuit contains 12 clock-TSVs and 64 critical paths to be monitored by [26], as shown in Figure 15(a). Both methods run very fast, within microseconds. The proposed aging sensor placement inserts fewer aging sensors than the previous aging sensor placement technique [26]. The tuning elements we use are similar to those in [26], but we add one tunable buffer for every clock-TSV instead of for the FFs in the critical paths. Generally speaking, the number of clock-TSVs is small. Hence, the overhead of the proposed techniques is not expected to exceed that of [26]. Interestingly, as can be seen in Figure 15(b), the compound solution still outperforms the circuit-aging solution [26] with fewer aging sensors, especially as A increases. To understand the underlying reason, we analyze the circuit structure and find that most of the critical paths are partitioned across two layers by the performance-driven 3D placer. This results in scenarios in which multiple critical paths share a single DrvTSV. As the primary target of the proposed algorithm is full coverage of the clock-TSVs rather than monitoring all the critical paths, we pick only one of them to monitor, leading to fewer aging sensors. However, when the aging sensor triggers an alarm due to either circuit aging in its monitored path or aging of its DrvTSV, the buffer following the DrvTSV is tuned backward. This not only mitigates the degraded delay in the path monitored by the aging sensor, but also alleviates the degraded delay of the other paths without aging sensors.
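To make the coverage argument concrete, the sketch below selects paths to monitor so that every clock-TSV is observed by at least one sensor. It is a generic greedy set-cover heuristic used as a stand-in, not the placement algorithm proposed in this paper; the function name, the path and TSV identifiers and the tie-breaking rule are hypothetical.

```python
def place_sensors(path_tsvs):
    """Greedy set cover: choose monitored paths until all clock-TSVs are covered.

    path_tsvs maps a path name to the set of clock-TSV ids on its launch
    or capture clock branches (hypothetical input format).
    """
    uncovered = set().union(*path_tsvs.values())
    chosen = []
    while uncovered:
        # Pick the path that observes the most still-uncovered clock-TSVs;
        # sorting the names makes tie-breaking deterministic.
        best = max(sorted(path_tsvs), key=lambda p: len(path_tsvs[p] & uncovered))
        chosen.append(best)
        uncovered -= path_tsvs[best]
    return chosen


if __name__ == "__main__":
    # Three critical paths share the same DrvTSV "t1"; one sensor suffices for it.
    paths = {
        "p1": {"t1"},
        "p2": {"t1"},
        "p3": {"t1", "t2"},
        "p4": {"t3"},
    }
    print(place_sensors(paths))  # ['p3', 'p4']
```

In this toy input, four critical paths are reduced to two monitored paths because the paths sharing a DrvTSV need only one sensor among them.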

In Figure 15(b), we also observe that the compound solution leads to a higher MTTF when the aging of clock-TSVs dominates the path delay degradation (i.e., larger A). There are three possible reasons: (i) some non-critical paths that may cause timing errors due to clock-TSV aging are also monitored; (ii) the proposed tuning methods can tune the clock arrival time at either the DrvFF or the RcvFF of a degraded path, which leads to a larger solution space for tuning; (iii) some victim paths have only RcvTSVs but no DrvTSVs, in which case the aging of the RcvTSV offsets the circuit aging.
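Reason (ii) can be illustrated with a simplified setup-slack calculation. The sketch assumes slack = period + capture clock arrival - launch clock arrival - path delay, ignores setup time, hold constraints and other paths sharing the same buffer, and treats the tunable range as a per-side bound. The function names and the picosecond values are hypothetical; only the 5% tunable range follows the experimental setting.

```python
def setup_slack(period, path_delay, clk_drv, clk_rcv):
    """Simplified setup slack of one path (setup time ignored, assumption)."""
    return period + clk_rcv - clk_drv - path_delay


def tune_path(period, path_delay, clk_drv, clk_rcv, tunable_range,
              allow_drv=True, allow_rcv=True):
    """Return (drv_shift, rcv_shift) restoring non-negative slack, or None.

    A negative drv_shift launches earlier; a positive rcv_shift captures
    later. Each side may move by at most tunable_range.
    """
    deficit = -setup_slack(period, path_delay, clk_drv, clk_rcv)
    if deficit <= 0:
        return (0, 0)  # path already meets timing
    drv_shift = min(deficit, tunable_range) if allow_drv else 0
    remaining = deficit - drv_shift
    rcv_shift = min(remaining, tunable_range) if allow_rcv else 0
    if remaining > rcv_shift:
        return None  # degradation exceeds the available tuning
    return (-drv_shift, rcv_shift)


if __name__ == "__main__":
    # 1000 ps clock, 50 ps (5%) tunable range, path degraded to 1080 ps.
    print(tune_path(1000, 1080, 0, 0, 50))                   # (-50, 30)
    print(tune_path(1000, 1080, 0, 0, 50, allow_drv=False))  # None
```

With tuning allowed on both the launch and capture sides, the 80 ps deficit is recoverable; with only one side it is not, which is the larger solution space referred to in (ii).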

# 7. Limitation and Discussion

It must be admitted that the aging sensors, tunable buffers and the associated control logic will also suffer from aging, which degrades their capability to perform effective diagnosis and tuning. An aging sensor is stressed only when its associated critical path is stressed. Similarly, a tunable buffer is adjusted only when the aging sensors trigger an alarm. The switching activity of this additional logic is much lower than that of the logic and TSVs in the clock network. Therefore, the aging effect on this additional logic is nearly negligible. The proposed technique can be easily applied to circuit-level partitioned 3D ICs. For block-level partitioned 3D ICs, our solution can still work and benefit the reliability. In such 3D designs, TSVs are placed in the white space outside of the blocks [14], and it is easy to attach tunable buffers to clock-TSVs. However, diagnosis may require some design effort. For example, we need to reuse or design the DfT structures in the blocks for online testing, and insert aging sensors into the blocks.

# 8. Conclusion

In this work, we investigate the impact of clock-TSV aging in 3D ICs and the countermeasures to mitigate it.

To be specific, we propose to insert aging sensors in the circuit and put a tunable buffer after each clock-TSV. The runtime information of the triggered alarms in these aging sensors is used for efficient and effective online diagnosis. Based on it, we further propose three online diagnosis and clock tuning methods to identify the root cause of a failure, each associated with an efficient tuning algorithm. Experimental results show that the proposed solutions significantly enhance lifetime reliability.

# 9. Acknowledgement

This work was supported in part by the Shanghai Science and Technology Committee under Grant No. 15YF1406000, in part by the Hong Kong S.A.R. General Research Fund (GRF) under Grant No. 418112 and Grant No. N CUHK444/12, and in part by the National Natural Science Foundation of China under Grant No. 61432017.

### **10. References**

- [1] Semiconductor Industry Association (SIA). The International Technology Roadmap for Semiconductors (ITRS): 2011 Edition. http://www.itrs.net/, 2011.
- [2] Y. Xie, et al. Design space exploration for 3d architectures. ACM Journal on Emerging Technologies in Computing Systems (JETC), 2(2):65–103, 2006.
- [3] X. Zhao, J. Minz, and S. K. Lim. Low-power and reliable clock network design for through-silicon via (tsv) based 3d ics. Transactions on Components, Packaging and Manufacturing Technology, 1(2):247–259, 2011.
- [4] C.-L. Lung, et al. Through-silicon via fault-tolerant clock networks for 3-d ics. Transactions on Computer-Aided Design of Integrated Circuits and Systems, 32(7), 2013.
- [5] X. Zhao, et al. Low-Power Clock Tree Design for Pre-Bond Testing of 3-D Stacked ICs. Transactions on Computer-Aided Design of Integrated Circuits and Systems, 30(5):732–745, 2011.
- [6] S.-K. Ryu, et al. Impact of near-surface thermal stresses on interfacial reliability of through-silicon vias for 3-d interconnects. Transactions on Device and Materials Reliability, 11(1):35–43, 2011.
- [7] K.H. Lu, et al. Thermal stress induced delamination of through silicon vias in 3-d interconnects. In Electronic Components and Technology Conference (ECTC), pages 40–45, 2010.
- [8] Y.C. Tan, et al. Electromigration performance of through silicon via (tsv)-a modeling approach. Microelectronics Reliability, 50(9):1336–1340, 2010.
- [9] T. Frank, et al. Resistance increase due to electromigration induced depletion under tsv. In International Reliability Physics Symposium (IRPS), pages 3F–4, 2011.


- [10] K. Athikulwongse, et al. Stress-driven 3d-ic placement with tsv keepout zone and regularity study. In International Conference on Computer-Aided Design, pages 669–674. IEEE Press, 2010.
- [11] L. Jiang, et al. On effective and efficient in-field tsv repair for stacked 3d ics. In Design Automation Conference, page 74, 2013.
- [12] W. Wang, et al. An efficient method to identify critical gates under circuit aging. In IEEE/ACM International Conference on Computer-Aided Design, pages 735–740, Nov 2007.
- [13] G. Van der Plas, et al. Design issues and considerations for low-cost 3d tsv ic technology. IEEE Journal of Solid-State Circuits, 46(1):293–307, 2011.
- [14] X. Zhao and S. K. Lim. Tsv array utilization in low-power 3d clock network design. In International Symposium on Low Power Electronics and Design, pages 21–26, 2012.
- [15] Z. Lak and N. Nicolici. In-system and on-the-fly clock tuning mechanism to combat lifetime performance degradation. In International Conference on Computer-Aided Design, pages 434–441, 2010.
- [16] J.-S. Yang, et al. Robust clock tree synthesis with timing yield optimization for 3d-ics. In Asia and South Pacific Design Automation Conference, pages 621–626, 2011.
- [17] MPD Sai, et al. Reliable 3-d clock-tree synthesis considering nonlinear capacitive tsv model with electrical-thermal-mechanical coupling. Transactions on Computer-Aided Design of Integrated Circuits and Systems, 32(11):1734–1747, 2013.
- [18] Y. C. Tan, et al. On using on-chip clock tuning elements to address delay degradation due to circuit aging. Microelectronics Reliability, 50(9-11):1336–1340, 2010.
- [19] P. Mahoney, et al. Clock distribution on a dual-core, multithreaded Itanium®-family processor. In International Solid-State Circuits Conference, pages 292–599, 2005.
- [20] J.-L. Tsai, L. Zhang, and C.C. Chen. Statistical timing analysis driven post-silicon-tunable clock-tree synthesis. In International Conference on Computer-Aided Design, pages 575–581, 2005.

- [21] K. Nagaraj and S. Kundu. An automatic post silicon clock tuning system for improving system performance based on tester measurements. In International Test Conference, pages 1–8, 2008.
- [22] Y.-C. Kao, H.-M. Chou, K.-T. Tsai, and S.-C. Chang. Synthesis of an efficient controlling structure for post-silicon clock skew minimization. In International Conference on Computer-Aided Design, pages 746–749, 2010.
- [23] T.-Y. Kim and T. Kim. Post silicon management of on-package variation induced 3d clock skew. Journal of Semiconductor Technology and Science, 12(2):139–149, 2012.
- [24] K. Chae, et al. Tier adaptive body biasing: A post-silicon tuning method to minimize clock skew variations in 3-d ics. Transactions on Components, Packaging and Manufacturing Technology, PP(99):1–1, 2013.
- [25] M. Agarwal, et al. Circuit failure prediction and its application to transistor aging. In VLSI Test Symposium, pages 277–286. IEEE, 2007.
- [26] Z. Lak and N. Nicolici. On using on-chip clock tuning elements to address delay degradation due to circuit aging. Transactions on Computer-Aided Design of Integrated Circuits and Systems, 31(12):1845–1856, 2012.
- [27] Y. Elboim, A. Kolodny, and R. Ginosar. A clock-tuning circuit for system-on-chip. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 11(4):616–626, 2003.
- [28] Y. Wang, H. Luo, K. He, R. Luo, H. Yang, and Y. Xie. Temperature-aware nbti modeling and the impact of standby leakage reduction techniques on circuit performance degradation. Transactions on Dependable and Secure Computing, 8(5):756–769, 2011.
- [29] F. Ye and K. Chakrabarty. Tsv open defects in 3d integrated circuits: Characterization, test, and optimal spare allocation. In Design Automation Conference, pages 1024–1030. ACM, 2012.