# Variation-Tolerant and Low-Power Clock Network Design for 3D ICs

Xin Zhao, Saibal Mukhopadhyay, and Sung Kyu Lim Georgia Institute of Technology 777 Atlantic Dr. NW, Atlanta, GA 30332, U.S.A {xinzhao, saibal, limsk}@ece.gatech.edu

# Abstract

This paper studies the random characteristics of through-silicon-via (TSV)-based 3D clock networks, taking into account both die-to-die and within-die process variations in clock buffers, interconnects, and TSVs. We investigate several design parameters that may cause clock skew variation, including the TSV RC parasitics, the TSV count, the number of stacked dies, and the range of variations. The key insights are as follows: 1) under random uncorrelated TSV variation with no TSV defects, our experimental results show that TSV variation is a new source of skew variability but is only a secondary contributor to clock skew degradation compared with other types of random effects; 2) although there are concerns that a 3D clock network using many TSVs may suffer from high skew variation because of the uncertainty in TSV electrical parasitics, our study demonstrates that a 3D clock network with multiple TSVs can reduce the random effects by using fewer buffers and shorter interconnects. The multi-TSV strategy can therefore achieve both lower power dissipation and smaller skew variation in 3D clock networks.

# **I. Introduction**

Three-dimensional integrated circuits (3D ICs) have shown promising potential for low cost, further miniaturization, small area, low power, high bandwidth, and heterogeneous stacking. Many studies have focused on through-silicon via (TSV) manufacturing development, electrical and thermo-mechanical characteristic modeling, and TSV-aware design automation [2], [7]. Process variation, however, has always been a critical aspect of semiconductor fabrication. As technology keeps scaling down, variability in devices and interconnects becomes inevitable and serious [6]. In particular, a 3D design suffers from global die-to-die (D2D) variation in addition to local within-die (WID) randomness, which exacerbates the already serious variability issues. Therefore, a good understanding of the variability in 3D ICs plays a significant role in achieving high yield and low cost.

A 3D clock network is a global component consisting of TSVs, clock buffers, and wires. The primary goal of 3D clock design is to minimize the clock skew and constrain the maximum slew to guarantee timing integrity. In addition, a 3D clock network is power hungry, which calls for close attention to low-power techniques. Moreover, this network spans the entire 3D stack and is highly sensitive to variations: it is susceptible to both WID and D2D variations, and the TSVs may bring in additional randomness, which has not been well characterized yet. As a result, a clock network with zero skew in the pre-silicon phase may degrade seriously due to variations that are difficult to predict.

Fig. 1. Two samples of two-die stack clock networks using a single TSV for die-to-die communication (a) or using 10 TSVs (b).

The history of 3D clock synthesis is short. Previous studies mainly fall into the following categories: clock skew minimization in the presence of thermal gradients [8], enabling pre-bond testability [5], [15], and design for a low-power clock network [14]. These 3D clock networks share a distinctive property: a complete 2D clock tree is located in one die (usually the die where the clock source is placed), and the other dies have many separate subtrees<sup>1</sup>. Those subtrees are connected to the complete 2D tree using TSVs (see Figure 1). This topology benefits from a significant wirelength reduction and lower power dissipation when multiple TSVs are used. Correspondingly, several optimization techniques have been developed for multiple-TSV utilization to obtain a low-power clock network while maintaining good control of clock skew and slew.

However, the variation impact on 3D clock networks has not been fully addressed yet. Using various numbers of TSVs leads to different wirelength, buffer counts, and power consumption. An optimal number of TSVs for low-power network inherently decreases the uncertainties from interconnects and buffers. However, using more TSVs introduces increasing randomness in TSVs, which may in turn bring in higher skew uncertainty than using fewer TSVs. Note that the process variation in TSVs is unavoidable, e.g., the oxide thickness variations, the substrate doping concentration fluctuations, and misalignment from bonding. The TSV variation inherently alters its electrical parasitics, which are then translated into a delay mismatch and clock skew degradation. Thus, the TSV variation mechanism and the design space exploration for small skew variation need to be well studied.

<sup>1</sup>Note that the clock source and the complete tree are not constrained to the top die. The bottom die is allowed to have a complete tree with a source node instead.

This paper investigates the TSV random effect and analyzes the impact of WID and D2D process variation on clock performance. We aim to provide guidance for TSV planning in 3D clock networks so as to obtain both lower power consumption and smaller skew variation. The contributions are as follows:

- We study the TSV variation arising from the manufacturing process. Considering the process variations in TSVs, buffers, and interconnects, we discuss their influence on clock skew randomness. We also analyze the impact of the TSV count, the TSV parasitic capacitance, the number of stacked dies, and the variation range on the skew distribution.
- The multiple-TSV policy shows a promising ability to decrease the clock skew variation caused by random process variations. We show several trends of design metrics, including wirelength, clock power consumption, and clock skew distribution, with respect to the TSV count and the TSV parasitics. These curves indicate a range of TSV counts that can achieve both low power consumption and skew variation tolerance in 3D clock network designs.
- Under random uncorrelated TSV variation with no TSV defects, we find that TSV variation is a new source of skew variability but has only a secondary effect on clock skew degradation compared with other types of process variations. Although there are concerns that a 3D clock network using more TSVs would suffer from higher skew variation due to the increasing TSV uncertainties, our analysis shows that a 3D clock network using multiple TSVs is able to reduce the random effects by using fewer buffers and shorter interconnects.

# **II. Related Work**

Several works have focused on analyzing the impact of variation on 2D clock networks. Sauter et al. presented an analysis of the impact of parameter variations on clock networks [11]. They compared four clock topologies in the presence of WID and D2D process variations: an H-tree, a clock network with interleaved rings, a trunk tree, and a clock grid. Narasimhan et al. analyzed the impact of process variation on a five-stage 2D H-tree [10]. They focused on several technology nodes and considered both random and systematic variation. They observed that as technology scaled from 180 nm to 45 nm, the mean clock skew decreased, while its standard deviation increased steadily.

As for 3D clock network design and optimization, Minz et al. proposed the first 3D clock synthesis method and studied the clock skew minimization taking into account the thermal gradient impact [8]. The clock topology consists of a complete clock tree in one die, and many subtrees in other dies. Their results showed a significant wirelength reduction when using many TSVs. Kim and Kim [4] developed an embedding algorithm to reduce wirelength.

Zhao et al. explored the effect of several design parameters on 3D clock performance, including buffer insertion, the TSV count, and the clock source die location [16]. From SPICE simulation results, they discussed the impact of these factors on clock power and clock slew, and found that using many TSVs in a 3D clock network helps to achieve a robust design in terms of clock skew, slew, and power. They also developed a TSV planning algorithm to find the optimal multi-TSV policy for low-power clock designs [14]. They discussed the trends of "TSV count versus clock power consumption" for various TSV parasitic values, and their algorithm is able to find a close-to-optimal design point compared with a straightforward exhaustive search over the TSV count. In addition, Xu et al. [13] proposed a statistical clock skew model for regular 3D H-trees that takes into account the WID and D2D variations in buffers. However, this work did not consider the variations in interconnects and TSVs, and did not focus on TSV planning and clock tree optimization.

Fig. 2. Illustration of TSVs in top and side view, bonding structure, and electrical modeling. We show the Cu-filled via-middle TSVs with Cu-to-Cu bonding as an example, and list sample materials for the TSVs.

Although many works have shown that a 3D clock network using multiple TSVs offers the appealing merits of low power and reliable timing integrity, the TSV randomness introduces uncertainty into the 3D clock skew distribution that has not been well addressed yet. In the following sections, we study the TSV variation and systematically analyze the impact of several design parameters on variation-aware clock performance.

# **III. TSV Random Characteristics**

### A. TSV RC Electrical Modeling

Figure 2 shows a sample structure of Cu-filled via-middle TSVs, where Cu is surrounded by dielectric liner (e.g.,  $SiO_2$ ) and barrier metal (e.g., TiN). After wafer thinning, TSV nails are exposed; and the die or wafer is aligned, and then bonded Cu-to-Cu to the back metal of the bottom layer.

Many works have focused on developing TSV RC electrical models [3], whose simulation results match the measurement data well [2]. The TSV resistance ($R_{\rm TSV}$) consists of the copper resistance ($R_{\rm Cu}$) and the contact resistance ($R_{\rm cont}$).

$$R_{\rm TSV} = R_{\rm Cu} + R_{\rm cont} \tag{1}$$

The dc value of $R_{\rm Cu}$ follows the standard expression for a cylindrical conductor:

$$R_{\rm Cu} = \frac{\rho l_{\rm TSV}}{\pi r_{\rm TSV}^2},\tag{2}$$

where  $\rho$ ,  $r_{\rm TSV}$ ,  $l_{\rm TSV}$  are the resistivity of copper, the radius of TSVs, and the thickness of TSVs, respectively. The  $R_{\rm cont}$ presents the conduction between the exposed TSV nail and the back metal, which is closely related to the quality of alignment and bonding. Thus, the  $R_{\rm TSV}$  depends on the die thickness, TSV diameter, and the contact quality.

The TSV C-V characteristic is similar to that of a planar MOS capacitor: the accumulation capacitance equals the oxide capacitance ($C_{\text{ox}}$), which is given by

$$C_{\rm ox} = \frac{2\pi\epsilon_{\rm ox} l_{\rm TSV}}{\ln\left(\frac{r_{\rm TSV} + t_{\rm ox}}{r_{\rm TSV}}\right)} \tag{3}$$

where $\epsilon_{\rm ox}$ and $t_{\rm ox}$ are the permittivity and thickness of the liner, respectively. Because of the MOS effect, the isolation may be surrounded by a depletion region, depending on the biasing voltage, the interface charge density, and the substrate properties [12]. The depletion capacitance ($C_{\rm dep}$) is given by

$$C_{\rm dep} = \frac{2\pi\epsilon_{\rm si}l_{\rm TSV}}{\ln(\frac{r_{\rm TSV} + t_{\rm ox} + w_{\rm dep}}{r_{\rm TSV} + t_{\rm ox}})} \tag{4}$$

where $w_{\rm dep}$ is the thickness of the depletion region. The depletion capacitance acts in series with the oxide capacitance, and the TSV capacitance ($C_{\rm TSV}$) is expressed as

$$C_{\rm TSV} = \left(\frac{1}{C_{\rm ox}} + \frac{1}{C_{\rm dep}}\right)^{-1}. \tag{5}$$

$C_{\text{TSV}}$ is nonlinear and depends on the TSV thickness, the TSV diameter, the oxide thickness, the biasing of the TSV with respect to the substrate, and the substrate doping concentration. The measured $C_{\text{TSV}}$ values [2] may vary from tens of femtofarads to about a hundred femtofarads.
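Equations (3) to (5) can be evaluated directly. The following sketch does so for the geometry of Section IV (5 $\mu$m diameter, 30 $\mu$m thickness, 120 nm oxide). The relative permittivities of SiO$_2$ and silicon are assumed textbook values, and the depletion width `w_dep` is a hypothetical number picked only to show the series combination; in practice it depends on bias and doping as noted above.

```python
import math

EPS0 = 8.854e-12       # vacuum permittivity, F/m
EPS_OX = 3.9 * EPS0    # assumed SiO2 liner permittivity (not from the paper)
EPS_SI = 11.7 * EPS0   # assumed silicon permittivity (not from the paper)

def c_ox(l, r, t_ox):
    """Oxide (accumulation) capacitance of the cylindrical liner, Eq. (3)."""
    return 2 * math.pi * EPS_OX * l / math.log((r + t_ox) / r)

def c_dep(l, r, t_ox, w_dep):
    """Depletion capacitance around the liner, Eq. (4)."""
    return 2 * math.pi * EPS_SI * l / math.log((r + t_ox + w_dep) / (r + t_ox))

def c_tsv(l, r, t_ox, w_dep):
    """Series combination of oxide and depletion capacitance, Eq. (5)."""
    return 1.0 / (1.0 / c_ox(l, r, t_ox) + 1.0 / c_dep(l, r, t_ox, w_dep))

L_TSV, R_TSV_RADIUS, T_OX = 30e-6, 2.5e-6, 120e-9
W_DEP = 0.5e-6  # hypothetical depletion width, for illustration only

cox = c_ox(L_TSV, R_TSV_RADIUS, T_OX)
cdep = c_dep(L_TSV, R_TSV_RADIUS, T_OX, W_DEP)
ctsv = c_tsv(L_TSV, R_TSV_RADIUS, T_OX, W_DEP)
print(f"C_ox  = {cox * 1e15:.1f} fF")
print(f"C_dep = {cdep * 1e15:.1f} fF")
print(f"C_TSV = {ctsv * 1e15:.1f} fF")
```

Because the two capacitances are in series, the resulting $C_{\rm TSV}$ is always smaller than either term, which is why a wider depletion region lowers the effective TSV capacitance.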

### B. Variation in TSVs

TSV manufacturing consists of three main process modules: 1) TSV formation, including TSV patterning, isolation, barrier deposition, and metallization; 2) wafer thinning and backside processing; and 3) die or wafer alignment and bonding. The TSV liner electrically isolates the TSVs from the substrate. The barrier layer prevents migration of TSV metal into the silicon and improves the adhesion between the TSV metal and the liner. Each of these manufacturing steps may introduce variations in TSVs and change the corresponding electrical characteristics.

The following items, among others, may contribute to the TSV variation. The contact resistance is related to the quality of the alignment and bonding processes. Misalignment can cause global die-to-die variations in contact resistance. The bonding quality depends on the temperature ramping rate and the bonding down force. The surface roughness is important for maintaining intimate contact and good bonds. For Cu-to-Cu bonding, the contact resistance is that between the TSV and the back metal; in the case of microbumps, besides the contact resistance between the TSV and the microbump, the bump resistance also contributes to $R_{\rm TSV}$. An example $R_{\rm cont}$ between a TSV and a microbump is on the order of hundreds of milliohms (e.g., 0.7 $\Omega$) [9]. In addition, $R_{\rm TSV}$ variation across different locations of the die has also been observed [2].

Fig. 3. TSV parasitic RC impact on the TSV delay and the src-to-sink path delay.

The thickness variation of a thinned wafer consists of the thickness variation of the carrier wafer, the thickness variation of the temporary glue layer, and the accuracy of the grinding tool [1]. Note that $R_{\rm TSV}$ and $C_{\rm TSV}$ both depend linearly on $l_{\rm TSV}$. A fluctuation in the TSV thickness can be both a global and a local effect, and it directly changes the TSV parasitics. TSV etching determines the TSV diameter. The corresponding uncertainty may come from the sidewall tapering angle and the within-wafer center-to-edge depth. TSV parasitics can also be influenced by fluctuations in the oxide thickness and the substrate doping concentration.

TSV variability analysis and modeling require further exploration that is closely tied to practical manufacturing processes. In this work, we assume random uncorrelated TSV variation and no TSV defects. The TSV resistance and capacitance are modeled as random variables with Gaussian distributions.

### C. TSV Variation Impact on Timing

A motivating example in Figure 3 shows the impact of the TSV parasitic RC on timing. The test structure consists of a buffer and two 500 $\mu$m clock wires, one before and one after a TSV. The interconnect parasitics can be found in Table I. We sweep $R_{\rm TSV}$ from 0.1 $\Omega$ to 10 $\Omega$<sup>2</sup> and $C_{\rm TSV}$ from 20 fF to 120 fF. We plot the delay through the TSV (see Figure 3.a) and the delay from source to sink (see Figure 3.b) with respect to $R_{\rm TSV}$ and $C_{\rm TSV}$.

<sup>2</sup>The large $R_{\rm TSV}$ values (5 $\Omega$ or 10 $\Omega$) represent different bonding techniques.



Fig. 4. Analysis flow of variation-aware 3D clock performance.

TABLE I. Nominal values and 1- $\sigma$  variation for each random variable.

| Parameters | Values |
|------------|--------|
| Wire       | $r = 0.1\ \Omega/\mu\text{m}$; $c = 0.2\ \text{fF}/\mu\text{m}$ |
| TSV        | $R_{\rm TSV} = 50\ \text{m}\Omega$; $C_{\rm TSV} = \{15, 50, 100\}\ \text{fF}$ |
| Buffer     | NMOS $V_T = 0.469$ V; PMOS $V_T = -0.418$ V |
| WID var.   | $\sigma = \{5\,\%, 10\,\%, 15\,\%\}$ |
| D2D var.   | $\sigma = \{5\,\%, 10\,\%, 15\,\%\}$ |

Our observations are as follows. First, $R_{\text{TSV}}$ has a negligible impact on both the TSV delay and the src-to-sink path delay. When $C_{\text{TSV}}$ is held constant and $R_{\rm TSV}$ increases from 0.05 $\Omega$ to 10 $\Omega$, the TSV delay and the src-to-sink delay both increase by only 1.5 ps. Second, $C_{\text{TSV}}$ influences the src-to-sink delay as a buffer load but contributes negligibly to the TSV delay variation. When $R_{\text{TSV}}$ is held constant and $C_{\text{TSV}}$ varies from 20 fF to 120 fF, the TSV delay varies by less than 0.2 ps, but the src-to-sink delay goes up by 12.5 ps, because the TSV capacitance loads the buffer and increases the buffer delay and thus the overall path delay. On the other hand, for a 10 % 1-$\sigma$ swing of $C_{\text{TSV}}$ around a 50 fF nominal value, a 5 fF to 15 fF (1-$\sigma$ to 3-$\sigma$) $C_{\rm TSV}$ variation would result in a 0.63 ps to 1.88 ps change in the src-to-sink delay. Note that an overall clock path delay is usually on the order of hundreds of picoseconds. This less-than-2 ps delay variation caused by TSV RC fluctuation is a secondary contributor compared with the random effects of interconnects and devices. This observation is confirmed by the following variation analysis on 3D clock networks.
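The 0.63 ps and 1.88 ps figures above follow from the reported sweep if the src-to-sink delay is taken to grow linearly with $C_{\rm TSV}$ over the 20 fF to 120 fF range. That linearity is an assumption of this sketch, not a statement from the SPICE data; the function name is illustrative.

```python
# Reported in the text: sweeping C_TSV from 20 fF to 120 fF
# raises the src-to-sink delay by 12.5 ps.
DELAY_RISE_PS = 12.5
C_SWEEP_FF = 120.0 - 20.0

# Assumption: the delay is linear in C_TSV over the swept range.
SENSITIVITY_PS_PER_FF = DELAY_RISE_PS / C_SWEEP_FF

def delay_shift_ps(delta_c_ff):
    """Src-to-sink delay change for a given C_TSV deviation (linear model)."""
    return SENSITIVITY_PS_PER_FF * delta_c_ff

nominal_c_ff = 50.0
sigma_ff = 0.10 * nominal_c_ff  # 10 % 1-sigma swing, i.e. 5 fF

for k in (1, 3):
    dc = k * sigma_ff
    print(f"{k}-sigma ({dc:.0f} fF): {delay_shift_ps(dc):.3f} ps")
```

The linear model gives 0.625 ps at 1-$\sigma$ and 1.875 ps at 3-$\sigma$, which round to the values quoted in the text.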

# **IV. Analysis Flow and Modeling**

The analysis flow is composed of the following steps (see Figure 4). We first construct a buffered 3D clock tree using the 3D clock tree synthesis algorithm [14]. The generated clock tree has exact zero skew in the Elmore Delay model and minimized clock skew in SPICE simulation. The clock routing result is determined by the parasitics of TSVs, buffers and wires; the allowed maximum TSV number; and the maximum loading capacitance given in buffer insertion. We then perform 1000 Monte Carlo runs in SPICE simulation on the 3D clock tree. The random variables have both WID and D2D profiles. We obtain the simulation results of clock power and clock skew variation.
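To make the Elmore delay model mentioned above concrete, the sketch below applies it to the small structure of Section III-C (driver, 500 $\mu$m wire, TSV, 500 $\mu$m wire, sink), using the wire parasitics of Table I. The driver resistance and sink capacitance are hypothetical values, each wire and the TSV is lumped into a single $\pi$ segment, and the result is not meant to reproduce the SPICE numbers.

```python
def elmore_delay(r_drv, r_tsv, c_tsv, wire_len_um=500.0,
                 r_per_um=0.1, c_per_um=0.2e-15, c_sink=10e-15):
    """Elmore delay (seconds) of driver -> wire -> TSV -> wire -> sink.

    Each wire and the TSV is modeled as one pi segment.
    r_drv and c_sink are hypothetical values, not taken from the paper.
    """
    r_w = r_per_um * wire_len_um        # resistance of one wire segment
    c_w = c_per_um * wire_len_um        # capacitance of one wire segment
    c_total = 2 * c_w + c_tsv + c_sink  # all capacitance charged via the driver
    delay = r_drv * c_total
    delay += r_w * (c_w / 2 + c_tsv + c_w + c_sink)  # first wire
    delay += r_tsv * (c_tsv / 2 + c_w + c_sink)      # TSV
    delay += r_w * (c_w / 2 + c_sink)                # second wire
    return delay

R_DRV = 100.0  # assumed buffer output resistance in ohms

# Sweep R_TSV at fixed C_TSV, then C_TSV at fixed R_TSV.
d_r = elmore_delay(R_DRV, 10.0, 50e-15) - elmore_delay(R_DRV, 0.05, 50e-15)
d_c = elmore_delay(R_DRV, 0.05, 120e-15) - elmore_delay(R_DRV, 0.05, 20e-15)
print(f"delay change from R_TSV sweep: {d_r * 1e12:.2f} ps")
print(f"delay change from C_TSV sweep: {d_c * 1e12:.2f} ps")
```

Under these assumed values the capacitance sweep shifts the delay roughly ten times more than the resistance sweep, because $C_{\rm TSV}$ is charged through the driver and the upstream wire while $R_{\rm TSV}$ only sees the small downstream load. This matches the qualitative ordering reported in Section III-C.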

TABLE II. TSV count, buffer count, wirelength ( $\mu$ m), power (mW) and nominal skew (ps) of the clock trees using 1 TSV or 40 TSVs.

| Design      | #TSVs | #BUFs | WL     | Power | Skew |
|-------------|-------|-------|--------|-------|------|
| CLK1        | 1     | 221   | 170740 | 68.83 | 8.67 |
| CLK2        | 40    | 187   | 134342 | 58.85 | 9.41 |
| Reduction % | -     | 15.4  | 21.3   | 14.5  | -    |
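The reduction row of Table II can be re-derived from the two design rows. The short check below does this; the dictionary keys are illustrative names only.

```python
# Values copied from Table II.
clk1 = {"bufs": 221, "wl_um": 170740, "power_mw": 68.83}  # 1 TSV
clk2 = {"bufs": 187, "wl_um": 134342, "power_mw": 58.85}  # 40 TSVs

def reduction_pct(base, new):
    """Relative reduction of `new` with respect to `base`, in percent."""
    return 100.0 * (base - new) / base

reductions = {k: round(reduction_pct(clk1[k], clk2[k]), 1) for k in clk1}
print(reductions)
```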

Our analysis includes the variations in threshold voltage, wire width, and TSV RC parasitics caused by the TSV diameter and thickness uncertainties. Each random variable consists of the nominal value, the D2D variation, and the WID variation. Let the random variable $x_i$ denote a parameter in die-*i*. Both variations follow normal distributions, $N(0, \sigma_{\rm D2D}^2)$ and $N(0, \sigma_{\rm WID}^2)$, and $x_i$ can be expressed as

$$G_i \sim N(0, \sigma_{\rm D2D}^2),\tag{6}$$

$$x_i \sim G_i + N(0, \sigma_{\text{WID}}^2) + x_i^0, \tag{7}$$

where $\sigma_{\text{D2D}}$, $\sigma_{\text{WID}}$, and $x_i^0$ are the D2D swing, the WID swing, and the nominal value of $x_i$, respectively. Random variables in the same die share the same D2D variation but have separate local variations, whereas random variables located in different dies have different global and local values. The 1-$\sigma$ WID and D2D variations are chosen from 5 %, 10 %, or 15 % of the nominal value. The TSVs have a 5 $\mu$m diameter, 30 $\mu$m thickness, and 120 nm oxide thickness. Unless specified otherwise, we use 50 fF $C_{\text{TSV}}$ and 50 m$\Omega$ $R_{\text{TSV}}$, and consider variations in TSVs, buffers, and wires with 10 % WID and D2D swing. Table I summarizes the settings.
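A minimal sampling sketch of Eqs. (6) and (7) is given below: one global offset $G_i$ is drawn per die and shared by every instance in that die, and an independent local term is added per instance. The function and argument names are illustrative, and the actual flow draws these values inside SPICE Monte Carlo runs rather than in a script like this.

```python
import random
import statistics

def sample_parameter(nominal, d2d_frac, wid_frac, n_dies, n_per_die, rng):
    """Draw x_i = x_i^0 + G_i + N(0, sigma_WID^2) for every instance, Eqs. (6)-(7)."""
    dies = []
    for _ in range(n_dies):
        # One die-to-die offset, shared by all instances in this die.
        g = rng.gauss(0.0, d2d_frac * nominal)
        # Independent within-die term for each instance.
        die = [nominal + g + rng.gauss(0.0, wid_frac * nominal)
               for _ in range(n_per_die)]
        dies.append(die)
    return dies

rng = random.Random(0)
# Example: C_TSV with 50 fF nominal and 10 % WID and D2D swing, two dies.
dies = sample_parameter(50.0, 0.10, 0.10, n_dies=2, n_per_die=1000, rng=rng)
for i, die in enumerate(dies, start=1):
    print(f"die-{i}: mean = {statistics.mean(die):.2f} fF, "
          f"std = {statistics.stdev(die):.2f} fF")
```

Within a single die the spread reflects only the WID term, while the per-die means differ by the D2D offsets, which is the behavior described in the paragraph above.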

We focus on a two-die stack clock network for a benchmark circuit from ISPD'09, which is generated from an industry design with 121 clock sinks. For the skew distribution, we report the mean ($\mu$), the standard deviation ($\sigma$), and the 95 % quantile ($Q_{0.95}$)<sup>3</sup> to describe the skew degradation.
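The reported statistics can be computed from the per-run skews as sketched below. The skew of one run is taken here as the difference between the latest and earliest sink arrival times, which is the usual definition and is assumed rather than quoted from the paper; the quantile follows the "at least 95 % of observations are less than or equal to $Q_{0.95}$" reading.

```python
import math
import statistics

def clock_skew(arrival_times_ps):
    """Skew of one run: latest minus earliest sink arrival (assumed definition)."""
    return max(arrival_times_ps) - min(arrival_times_ps)

def quantile(values, q):
    """Smallest observation v such that at least a fraction q of values are <= v."""
    ordered = sorted(values)
    # The small tolerance guards against floating-point error in q * n.
    rank = max(1, math.ceil(q * len(ordered) - 1e-9))
    return ordered[rank - 1]

def skew_summary(skews_ps):
    """Mean, sample standard deviation, and 95 % quantile of the skew samples."""
    return {
        "mean": statistics.mean(skews_ps),
        "std": statistics.stdev(skews_ps),
        "q95": quantile(skews_ps, 0.95),
    }

# Toy data only: skews of 1..100 ps standing in for Monte Carlo results.
summary = skew_summary([float(s) for s in range(1, 101)])
print(summary)
```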

# **V. Analysis on Clock Skew Distribution**

Our analysis focuses on discussing the impact of several design metrics on the clock performance in terms of clock power and clock skew distribution caused by random parametric uncertainties in 3D clock networks. We construct many clock trees for two-die and four-die stacks under various amounts of TSVs, TSV capacitance values, and variation range. We analyze the impact of different variations on clock timing, and will explore guidance for low-power and variation-robust clock network design.

### A. Sample Clock Networks

The topologies of two-die stack clock networks using 1 TSV and 40 TSVs are shown in Figure 5.a and Figure 5.b, respectively. Clock routing results are shown in Table II. In the single-TSV clock tree, both dies have a complete tree, whereas the clock tree using 40 TSVs has 40 small subtrees in die-2 that are connected to the complete tree in die-1 through TSVs. Using 40 TSVs results in more than 20 % wirelength reduction and 14.5 % power reduction. Under nominal conditions, both clock trees have a small skew of less than 10 ps.

Fig. 5. Samples for two-die stack clock networks, using 1 TSV (a) and using 40 TSVs (b). TSVs and the clock source are shown as enlarged black dots and triangles, respectively.

<sup>3</sup>The 95 % quantile indicates that 95 % of the observations are less than or equal to $Q_{0.95}$. We report $Q_{0.95}$ rather than $\mu + 2\sigma$ due to the non-Gaussian profile of the clock skew distribution.

### B. Impact of Different Random Effects

We compare the impact of variations in wires, devices, and TSVs separately, as well as their combined randomness, for the two clock networks constructed in Section V-A. Figure 6 shows two groups of skew distributions, each corresponding to one clock network. Each clock design includes four distributions: TSV variation only, buffer variation only, wire variation only, and all three variations together. We use 0.25 ps and 0.5 ps bin sizes for the histograms of the TSV-variation-only case in Figure 6.a and Figure 6.b, respectively. The other histograms use a 5 ps bin size.

First, both figures demonstrate that transistor variation is the dominant contributor to the clock skew distribution, whereas the TSV fluctuation has only a secondary effect on clock skew degradation. The TSV variation alone causes a $Q_{0.95}$ of 10.3 ps and 11.4 ps, which is around 2 ps more skew than the nominal values in both cases. In contrast, the parametric uncertainty in buffers and wires leads to a $Q_{0.95}$ of around 80 ps to 180 ps. The major reason is the global connectivity of 3D clock networks, where both WID and D2D variations influence the single 3D clock network simultaneously.

Second, Figure 6 shows that the clock network using multiple TSVs (Figure 6.b) dramatically decreases the skew variation. The clock network using 40 TSVs has a 40 ps lower sample mean skew and more than 90 ps lower $Q_{0.95}$ (i.e., a 51.3 % reduction) compared with the single-TSV network. The major reason is the large reduction in wirelength and buffer count in the 40-TSV network, which in turn decreases the randomness. Meanwhile, although more TSVs are employed in the multi-TSV clock design, we observe a negligible impact of the additional TSVs on the skew fluctuations.

Fig. 6. Histogram of clock skew distribution for the clock networks using 1 TSV (a) and 40 TSVs (b). We compare the impact of variation in TSVs, wires, and buffers, separately, as well as their combined random effects.

### C. Impact of D2D Variation

Figure 7 compares the skew distributions obtained with WID variation only and with both WID and D2D variations. We show two groups of comparisons, for the clock network using a single TSV in Figure 7.a and the one using 40 TSVs in Figure 7.b. In the single-TSV case, the D2D variation causes a large skew degradation: an additional 100 ps of $Q_{0.95}$ degradation is observed. Meanwhile, the 40-TSV clock design is not affected much by the D2D variation, with only 10 ps more $Q_{0.95}$ and 4 ps more skew on average. This is due to the shorter wirelength and fewer buffers in the bottom die in the 40-TSV design.

### D. Impact of Variation Range

Focusing on the clock network with 40 TSVs, we change the 1-$\sigma$ swing of the TSV variation from 5 % to 10 % to 15 %. At the same time, we vary the swing of both buffers and wires over the same three candidate values. For each combination of swings, we obtain a skew distribution. Figure 8 shows the nine resulting skew histograms, which share the same X and Y scales and the same bin size of 4 ps.

Fig. 8. Impact of variation range in TSVs, buffers, and wires, with the 1-$\sigma$ swing varying from 5 % to 10 % to 15 %. Each histogram ranges from 0 to 300 ps on the x-axis and from 0 to 300 counts on the y-axis. Buffers and wires are assigned the same deviation.

Figure 8 demonstrates that the skew variation is dramatically aggravated when buffers and wires have larger deviations (following the X direction). Meanwhile, enlarging the TSV swing from 5 % to 15 % has only a minor impact on skew degradation.

### E. Impact of TSV Count on Power and Skew Variation

Figure 9 shows the impact of the TSV count on the clock performance, such as clock power, buffer count, wirelength, and skew distribution. The box-and-whisker diagram shows the skew distribution in terms of the lower ( $Q_{0.25}$ ), median ( $Q_{0.50}$ ), and upper quartile ( $Q_{0.75}$ ), 5 % ( $Q_{0.05}$ ) and 95 % quantile ( $Q_{0.95}$ ), and the minimum observations.

With more TSVs, we observe a narrower skew distribution and smaller $\mu$ and $Q_{0.95}$ than with fewer TSVs. This is mainly because of the dramatic reduction in wirelength and buffer count, which significantly decreases the number of sources of uncertainty and thus yields a small skew variation.
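One simplified way to see why fewer buffers and shorter wires reduce the spread: if a path passes through $n$ stages whose delay deviations are independent with standard deviation $\sigma$, the path delay deviation has standard deviation $\sqrt{n}\,\sigma$. The toy Monte Carlo below checks this. The stage counts and the 2 ps per-stage deviation are hypothetical, and the model ignores D2D correlation and the sharing of tree segments between sinks, so it only illustrates the trend.

```python
import math
import random
import statistics

def path_delay_std(n_stages, sigma_stage_ps, n_runs, rng):
    """Monte Carlo estimate of the std of a sum of independent stage deviations."""
    totals = [
        sum(rng.gauss(0.0, sigma_stage_ps) for _ in range(n_stages))
        for _ in range(n_runs)
    ]
    return statistics.pstdev(totals)

rng = random.Random(1)
SIGMA_STAGE_PS = 2.0  # hypothetical per-stage delay deviation
for n in (4, 16):     # hypothetical stage counts
    est = path_delay_std(n, SIGMA_STAGE_PS, 5000, rng)
    print(f"n = {n:2d}: simulated {est:.2f} ps, "
          f"sqrt(n)*sigma = {SIGMA_STAGE_PS * math.sqrt(n):.2f} ps")
```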

### F. Impact of TSV Parasitic Capacitance

Existing work [14] has studied the impact of $C_{\text{TSV}}$ on clock power, wirelength, and buffer count. Under different TSV oxide thicknesses, the TSV parasitic capacitance can vary from tens of femtofarads to about a hundred femtofarads while the resistance stays the same. We add one more dimension to the trend, namely the skew variation ($Q_{0.95}$) trend.

Figure 10 shows the trends of power and $Q_{0.95}$ skew with respect to the TSV count for three TSV parasitic capacitance values (15 fF, 50 fF, and 100 fF). The power trends are sensitive to $C_{\text{TSV}}$. In the case of 50 fF and 100 fF TSVs, the clock power cannot be reduced further once more than 40 TSVs are used; instead, it rises again.

Different from the power trends, the trends of  $Q_{0.95}$  clock skew are not affected by the  $C_{\text{TSV}}$  very much. Using more TSVs shows smaller skew variation. The  $Q_{0.95}$  can be reduced by more than 50 % compared with the one-TSV clock network (baseline).

### G. Impact of Stack Number

Figure 11 compares the clock performance of single-TSV and multiple-TSV designs for four-die stack clock networks. The single-TSV network uses three TSVs in total for the three die-to-die connections, whereas the multiple-TSV design uses 91 TSVs. The clock power, buffer count, and wirelength are normalized to the results of the single-TSV design. Similar to the two-die stack, using 91 TSVs achieves 20 %, 35 %, and 19 % reduction in power dissipation, wirelength, and buffer count, respectively. Moreover, the skew variation decreases significantly, by 75.3 ps in $Q_{0.95}$ and by more than 40 ps on average. This shows the advantage of using multiple TSVs to achieve both low power and small skew variation in highly stacked 3D clock networks.

Fig. 7. Comparison of skew distributions under WID variation only, and both WID and D2D variation, when a clock network uses 1 TSV (a) or uses 40 TSVs (b).

### H. Analysis on Large-Size Circuits

To check that the trends generalize, we extend the analysis to four other large benchmark circuits for two-die stack clock network designs. These circuits come from the GSRC IBM suite and have hundreds to thousands of clock sinks. For each circuit, we construct two clock trees: one uses a single TSV, and the other uses multiple TSVs for low power. The multi-TSV clock results are normalized to those of the single-TSV design (see Figure 12). The multi-TSV clock trees achieve 10 % to 15 % power reduction, more than 30 % wirelength reduction, and 48 % to 60 % $Q_{0.95}$ reduction compared with the single-TSV designs.

# **VI. Conclusions**

We studied various sources of WID and D2D randomness, including the variations in threshold voltage, TSVs, and wire width, both separately and in combination. We applied SPICE Monte Carlo simulation to many clock network designs under various TSV parasitics, TSV counts, and variation ranges. We found that using multiple TSVs in a clock network helps to reduce the skew variation and achieve low power dissipation. Compared with buffer and wire variations, the TSV fluctuations do not show a significant impact on skew degradation. Instead, using more TSVs reduces the randomness because of the shorter wirelength and fewer buffers.

Fig. 9. Impact of the TSV count on clock power, buffer count, wirelength, and skew distribution. Power, buffer count and wirelength are normalized to the design with single TSV. The box-and-whisker diagram for skew distributions depicts the lower ( $Q_{0.25}$ ), median ( $Q_{0.50}$ ), and upper quartile ( $Q_{0.75}$ ), 5% and 95% quantile ( $Q_{0.05}$ ,  $Q_{0.95}$ ), and the minimum observations.

Fig. 10. Impact of TSV parasitic capacitance on power and $Q_{0.95}$ skew. $C_{\text{TSV}}$ varies from 15 fF to 100 fF.

Fig. 11. Comparisons between single-TSV (with 3 TSVs) and multi-TSV (with 91 TSVs) designs in terms of clock power, buffer counts, wirelength, and skew distribution for the four-die stack.

Fig. 12. Power, wirelength, buffer count, and $Q_{0.95}$ for two-die stack r1, r2, r3, and r5 using multiple TSVs. The results are normalized to the single-TSV designs, respectively.

# Acknowledgment

This material is based upon work supported by the National Science Foundation under Grant No. CCF-0546382 and CCF-0917000, the SRC Interconnect Focus Center (IFC), and Intel Corporation.

# References

- [1] The International Technology Roadmap for Semiconductors, 2009.
- [2] Geert Van der Plas et al. Design Issues and Considerations for Low-Cost 3-D TSV IC Technology. *IEEE Journal of Solid-State Circuits*, 46(1):293 –307, jan. 2011.

- [3] G. Katti, M. Stucchi, K. De Meyer, and W. Dehaene. Electrical Modeling and Characterization of Through Silicon Via for Three-Dimensional ICs. *IEEE Trans on Electron Devices*, 57(1):256–262, jan. 2010.
- [4] Tak-Yung Kim and Taewhan Kim. Clock Tree Embedding for 3D ICs. In Proc. Asia and South Pacific Design Automation Conf., pages 486–491, 2010.
- [5] Tak-Yung Kim and Taewhan Kim. Clock Tree Synthesis with Pre-Bond Testability for 3D Stacked IC Designs. In *Proc. ACM Design Automation Conf.*, pages 723–728, 2010.
- [6] Kelin Kuhn et al. Managing Process Variation in Intel's 45nm CMOS Technology. *Intel Technology Journal*, 12(2):92–110, 2008.
- [7] Sung Kyu Lim. TSV-Aware 3D Physical Design Tool Needs for Faster Mainstream Acceptance of 3D ICs. 2010.
- [8] Jacob Minz, Xin Zhao, and Sung Kyu Lim. Buffered Clock Tree Synthesis for 3D ICs Under Thermal Variations. In *Proc. Asia and South Pacific Design Automation Conf.*, pages 504–509, 2008.
- [9] Nobuaki Miyakawa et al. Multilayer Stacking Technology Using Wafer-to-Wafer Stacked Method. ACM Journal on Emerging Technologies in Computing Systems, 4(4):20:1–20:15, oct. 2008.
- [10] Ashok Narasimhan and Ramalingam Sridhar. Impact of Variability on Clock Skew in H-tree Clock Networks. In International Symposium on Quality Electronic Design, pages 458–466, 2007.
- [11] S. Sauter, D. Schmitt-Landsiedel, R. Thewes, and W. Weber. Effect of Parameter Variations at Chip and Wafer Level on Clock Skews. *IEEE Trans on Semiconductor Manufacturing*, 13(4):395 –400, nov 2000.
- [12] Chuan Xu, Hong Li, R. Suaya, and K. Banerjee. Compact AC Modeling and Performance Analysis of Through-Silicon Vias in 3-D ICs. *IEEE Trans on Electron Devices*, 57(12):3405–3417, dec. 2010.
- [13] Hu Xu, Vasilis F. Pavlidis, and Giovanni De Micheli. Process-induced Skew Variation for Scaled 2-D and 3-D ICs. In Proc of International workshop on System level Interconnect Prediction, pages 17–24, 2010.
- [14] X. Zhao, J. Minz, and S. K. Lim. Low-Power and Reliable Clock Network Design for Through-Silicon via (TSV) Based 3D ICs. *IEEE Trans on Components, Packaging and Manufacturing Technology*, PP(99):0, 2010.
- [15] Xin Zhao, D. L. Lewis, H. H. S. Lee, and Sung Kyu Lim. Pre-bond Testable Low-Power Clock Tree Design for 3D Stacked ICs. In *Proc. IEEE Int. Conf. on Computer-Aided Design*, pages 184–190, 2009.
- [16] Xin Zhao and Sung Kyu Lim. Power and Slew-aware Clock Network Design for Through-Silicon-Via (TSV) Based 3D ICs. In Proc. Asia and South Pacific Design Automation Conf., pages 175–180, 2010.