# Power and Slew-aware Clock Network Design for Through-Silicon-Via (TSV) based 3D ICs

Xin Zhao and Sung Kyu Lim School of Electrical and Computer Engineering Georgia Institute of Technology, Atlanta, GA 30332, U.S.A. {xinzhao, limsk}@ece.gatech.edu

Abstract—In this paper, three design techniques are presented to reduce the clock power consumption and slew of the 3D clock distribution network: (1) controlling the bound on the number of through-silicon-vias (TSVs) used between adjacent dies, (2) controlling the maximum load capacitance of the clock buffers, and (3) adjusting the clock source location in the 3D stack. We discuss how these design factors affect the overall wirelength, clock power, slew, skew, and routing congestion in practical 3D clock network design. SPICE simulation indicates that: (1) a 3D clock tree with multiple TSVs achieves up to 31% power savings, 52% wirelength savings, and better slew control compared with the single-TSV case; (2) placing the clock source on the middle die of the 3D stack yields an additional 7.7% power savings, 9.2% wirelength savings, and 33% TSV savings compared with placing the clock source on the topmost die. This work aims to help designers construct reliable low-power and low-slew 3D clock networks by making the right decisions on TSV usage, clock buffer insertion, and clock source placement.

## I. INTRODUCTION

In three-dimensional integrated circuits (3D ICs), the clock distribution network spreads over the entire stack to distribute the clock signal to all the sequential elements. Clock skew, defined as the maximum difference in the clock signal arrival time from the clock source to all the sinks, is required to be less than 3%-4% of the clock period in an aggressive clock network design according to the ITRS projection. Thus, clock skew control, which was well studied in 2D ICs [1], is still a primary objective in 3D clock network design. Meanwhile, the clock signal is distributed not only along the X and Y directions, but also along the Z direction using through-silicon-vias (TSVs). The clock distribution network drives large capacitive loads and switches at a high frequency. As a result, an increasingly large proportion of the total system power is dissipated in the clock distribution network. In some applications, the clock network itself is responsible for 25% [2] and even up to 50% [3] of the total chip power consumption. Thus, low power and low slew remain important design goals in 3D clock network design.

3D integration with TSVs has been intensively studied in both chip-to-chip and chip-to-wafer communications [4]. The fabrication and characterization of TSVs are being explored in many companies and institutes [5]. TSV reliability issues are also studied [6]. TSVs provide the vertical interconnection to deliver the clock signal to all dies in the 3D stack. TSV usage is an important factor that characterizes the electrical properties of the clock network. In general, the total wirelength of the 3D clock network decreases significantly if more TSVs are used [7], [8]. However, too many TSVs often cause routing congestion and yield reduction. Therefore, it is important that the power vs. congestion tradeoff is considered carefully during 3D clock network design.

This material is based upon the work supported by the National Science Foundation under CAREER Grant No. CCF-0546382, the Center for Circuit and System Solutions (C2S2), and the Interconnect Focus Center (IFC).

Minz et al. [7] studied the thermal issue of the 3D clock network and developed a 3D clock routing algorithm. However, this work does not consider power consumption or slew rate and does not provide any SPICE simulation results. Zhao et al. [8] developed a clock design method for pre-bond testing of 3D ICs. They also discussed the impact of TSVs on the pre-bond testable clock tree. In [9], Pavlidis et al. presented measurement data on a fabricated 3D clock distribution network. In [10], Arunachalam and Burleson used a separate layer for the clock distribution network to reduce power. Their simulations show 15%-20% power reduction over the same 2D chip clock network. However, these works use a simple H-tree and do not perform any design-level optimization.

In this paper, we analyze the impact of TSV usage upper bound, clock source location, and maximum loading capacitance of clock buffers on the wirelength, clock power, slew, skew, and TSV count of the 3D clock network. The contributions of this paper are as follows:

- We analyze the impact of TSV usage on the wirelength and clock power metrics. Compared with the single-TSV clock network, the multi-TSV clock network achieves up to 31% power savings and 52% wirelength savings.
- We analyze the impact of clock source location on the wirelength and clock power metrics. By placing the clock source on the middle die, we obtain additional 7.7% power savings, 9.2% wirelength savings, and 33% TSV count savings compared with the clock source on the topmost die.
- We discuss the impact of TSV usage and the maximum load capacitance of clock buffers on clock slew control. Multi-TSV usage helps to narrow the slew distribution and reduce the maximum and average slew compared with the single-TSV case. In addition, an upper bound on the clock buffer load capacitance remains an efficient way to control the maximum slew in 3D clock network design.

We validate our claims with SPICE simulation results on a set of large benchmark designs. Our power consumption, skew, and slew metrics are based on SPICE simulation.

Fig. 1. Sample clock tree and its electrical model. (a) Sample 3-die clock network, where the clock source is on die-3. Sink a on die-1 uses a 2-stack TSV, and sink b on die-2 uses a 1-stack TSV to connect to the clock source. (b) Electrical models for clock wire segment, TSV, and buffer/drivers.

## II. MODELING AND SYNTHESIS OF 3D CLOCK TREE

### A. Electrical Model of 3D Clock Network

In this paper, the 3D clock network is modeled as a distributed RC network. Figure 1 illustrates a 3-die clock interconnect, where the clock source is located on die-3, sink *a* on die-1 connects to the source using a 2-stack TSV, and sink *b* on die-2 connects to the source using a 1-stack TSV. The sink nodes *a* and *b* are modeled as capacitive loads. Wire segments and TSVs are modeled by the $\pi$ model. Each buffer or driver is constructed with two inverters.

When a vertical connection between non-adjacent dies is required, e.g., die-1 and die-3, we use a "stacked TSV", where the TSV that connects die-1 and die-2 and the TSV that connects die-2 and die-3 are vertically aligned as shown in Figure 1(a). This tall TSV is called a "2-stack TSV". A 6-die 3D stack utilizes 1-stack, 2-stack, ..., up to 5-stack TSVs.

### B. 3D Clock Tree Synthesis

Given a set of clock sinks distributed on multiple dies and the TSV upper bound, the goal of 3D clock tree synthesis is to construct a single 3D tree that connects all the sinks on different dies under the TSV budget while minimizing the clock skew and power consumption. Clock slew, defined as the transition time from 10% to 90% of the clock signal at each sink, is an additional design constraint. In this paper, we constrain the clock slew to a given value, which is usually some percentage of the clock period. The TSV upper bound is defined as the maximum number of TSVs allowed between each pair of adjacent dies in the 3D stack. The TSV bound is usually decided before clock synthesis and is based upon the process technology. Different from the TSV bound, #TSVs is the total number of TSVs generated by 3D clock tree synthesis. For an *n*-die 3D stack, #TSVs is usually less than or equal to $(n - 1) \times$ TSV bound.

Fig. 2. 3D abstract trees with 3D-MMM algorithm under various TSV upper bounds. (a) 2D version, where thick lines denote TSV connections, (b) 3D version, (c) binary abstract trees, where the dots denote TSVs.

The basis of our 3D clock tree synthesis algorithm is [7], which we extend so that (1) we can handle more than 2 dies, (2) we can decide on which die of the 3D stack to place the clock source, and (3) we perform buffer insertion to minimize clock slew. Our algorithm consists of two main steps: (1) 3D abstract tree topology generation, and (2) embedding and slew-aware buffering. First, we generate a 3D abstract tree based on the 3D Method of Means and Medians (3D-MMM) algorithm. The 3D-MMM algorithm determines which pairs of nodes to connect together, using TSVs where necessary, while building a binary tree in a top-down fashion. For an *n*-die clock tree, the 3D abstract tree is an *n*-colored binary tree, where the colors identify the die index of the clock source, sinks, and internal nodes. The goal of TSV usage is to evenly distribute the TSVs across the die area and satisfy the given TSV bound. A TSV is used if we decide to connect a pair of nodes on different dies. Figure 2 shows an illustration of a 2-die stack with various TSV upper bounds. A larger TSV bound tends to move TSVs closer to the sink nodes and cause more vertical clock connections than horizontal connections.
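The top-down construction can be illustrated with a small sketch. The text does not spell out the exact partitioning rules of 3D-MMM, so the rules used here are illustrative assumptions rather than the authors' implementation: a region whose TSV bound has dropped to 1 is split by die (a "Z-cut" that costs one stacked TSV), any other region is split at the geometric median along its longer side, and the bound is divided between the two halves. The function name `abstract_tree` and the tuple-based tree encoding are hypothetical.

```python
from typing import List, Tuple, Union

Sink = Tuple[float, float, int]               # (x, y, die index)
Tree = Union[Sink, Tuple["Tree", "Tree"]]     # leaf sink or (left, right)


def abstract_tree(sinks: List[Sink], tsv_bound: int) -> Tuple[Tree, int]:
    """Build a binary abstract tree top-down and count the TSVs it uses.

    Assumed, simplified rules (not taken from the paper):
      * bound <= 1 and several dies present: Z-cut, separate the lowest die
        from the rest and pay for one stacked TSV between them;
      * otherwise: median cut along the longer side, bound split in two.
    """
    if len(sinks) == 1:
        return sinks[0], 0

    dies = sorted({s[2] for s in sinks})
    if len(dies) > 1 and tsv_bound <= 1:
        low = [s for s in sinks if s[2] == dies[0]]
        rest = [s for s in sinks if s[2] != dies[0]]
        t_low, n_low = abstract_tree(low, 1)
        t_rest, n_rest = abstract_tree(rest, 1)
        # A k-stack TSV is counted as k individual TSVs.
        return (t_low, t_rest), n_low + n_rest + (dies[1] - dies[0])

    xs = [s[0] for s in sinks]
    ys = [s[1] for s in sinks]
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    ordered = sorted(sinks, key=lambda s: s[axis])
    mid = len(ordered) // 2
    b_left = max(1, tsv_bound // 2)
    b_right = max(1, tsv_bound - b_left)
    t_left, n_left = abstract_tree(ordered[:mid], b_left)
    t_right, n_right = abstract_tree(ordered[mid:], b_right)
    return (t_left, t_right), n_left + n_right


if __name__ == "__main__":
    # 8 x 8 grid of sinks alternating between two dies.
    grid = [(float(i), float(j), (i + j) % 2 + 1)
            for i in range(8) for j in range(8)]
    for bound in (1, 2, 4, 8):
        _, used = abstract_tree(grid, bound)
        print(f"TSV bound {bound}: {used} TSVs used")
```

Under these assumed rules, a larger bound lets the geometric cuts proceed further before a Z-cut occurs, so the TSVs end up deeper in the tree and closer to the sinks, which matches the qualitative behavior described for Figure 2.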

Once a 3D abstract tree is obtained, we use the deferred-merge embedding (DME) algorithm [11] to geometrically embed (= route) the abstract tree under the "zero-skew" goal based on the Elmore delay model. Different from the existing 2D designs [12], [13], [14], which focus on slew-aware buffer insertion after clock routing, our slew-aware buffering is performed during the bottom-up embedding procedure. The goal of slew-aware buffering is to place buffers while merging subsets, so that the load capacitances of the buffers are within the given bound (CMAX). The impact of CMAX on 3D clock slew is discussed in Section III-C.
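To make the zero-skew goal concrete, the sketch below computes the classical zero-skew merging point of two subtrees joined by a wire of given length under the Elmore delay model, with the wire treated as a $\pi$ model. This is the textbook merging formula, not necessarily the exact routine used in our flow; the function name and the unit convention (ohm, fF, um, so that delays come out in fs) are assumptions made for the example, and the default wire parasitics are the values quoted in Section III-A.

```python
def zero_skew_merge(t1, c1, t2, c2, length, r=0.1, c=0.2):
    """Split a wire of `length` um so both subtrees see equal Elmore delay.

    t1, t2 : delays already accumulated inside subtree 1 and 2 (fs)
    c1, c2 : downstream capacitance of subtree 1 and 2 (fF)
    r, c   : unit-length wire resistance (ohm/um) and capacitance (fF/um)

    Returns (l1, l2, merged_delay_fs, merged_cap_ff), where l1 is the wire
    length from the merging point to subtree 1.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    # Solve t1 + r*l1*(c*l1/2 + c1) == t2 + r*l2*(c*l2/2 + c2), l1 + l2 = length.
    x = (t2 - t1 + r * length * (c2 + c * length / 2.0)) / (
        r * length * (c * length + c1 + c2))
    if not 0.0 <= x <= 1.0:
        # A full DME flow would elongate (snake) one wire here; omitted.
        raise ValueError("merging point lies outside the segment")
    l1 = x * length
    l2 = length - l1
    merged_delay = t1 + r * l1 * (c * l1 / 2.0 + c1)
    merged_cap = c1 + c2 + c * length
    return l1, l2, merged_delay, merged_cap


if __name__ == "__main__":
    # Two sinks of 30 fF and 80 fF joined by 1000 um of wire.
    l1, l2, delay, cap = zero_skew_merge(0.0, 30.0, 0.0, 80.0, 1000.0)
    print(f"l1 = {l1:.1f} um, l2 = {l2:.1f} um, "
          f"delay = {delay / 1000.0:.2f} ps, cap = {cap:.0f} fF")
```

The merging point moves away from the lighter subtree so that the extra wire delay compensates for its smaller load. The returned merged capacitance is the quantity a slew-aware buffering step would compare against CMAX when deciding whether to insert a buffer.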

Note that this 3D-MMM algorithm works in such a way that there is always one die that contains a *single tree* that connects all sinks on the die, whereas the sinks on other dies are connected with *multiple trees* (= forest). In this case, the clock source is located on the die that contains the single tree. Figure 3 shows sample clock trees on die-1 and die-3 of a 6-die 3D stack. The triangle denotes the clock source on die-3. Each die contains up to 20 TSVs. Note that die-3 has a single global tree that connects all the sinks, and die-1 contains multiple local trees that are connected to the clock source using TSVs.



Fig. 3. Sample clock trees on die-1 and die-3 of a 6-die 3D clock network, where the clock source is located on die-3. Black dots denote TSVs. TSV bound is set to 20. Die-1 contains many local trees, whereas die-3 contains a single global tree.

### C. Theoretical Upper Bound on TSV Usage

In this section, we discuss the theoretical upper bound on TSV usage in terms of the location of the clock source. Given M sinks evenly distributed on N dies, each die contains M/N sinks. Assume that the clock source is located on die-s. In general, a group of sinks on die-i is connected to the clock source die (= die-s) by sharing one $|i - s|$-stack TSV. In the worst case, however, each sink uses its own $|i - s|$-stack TSV to connect to the clock source on die-s. The maximum TSV usage in this case can be expressed as:

$$\frac{M}{N} \times \left(\sum_{i=1}^{s-1} (s-i) + \sum_{i=s+1}^{N} (i-s)\right)$$

Note that we count a k-stack TSV as k TSVs, i.e., we count the individual TSVs in a stacked TSV separately. For the 6-die clock network, a clock source located on the topmost die (= die-6) leads to a worst-case TSV count of $(M/6) \times \sum_{i=1}^{5} (6 - i) = 2.5M$; when the clock source is located on the middle die (= die-3), the worst-case TSV count is $(M/6) \times (\sum_{i=1}^{2} (3-i) + \sum_{i=4}^{6} (i-3)) = 1.5M$, which leads to 40% TSV savings. The practical 6-die 3D clock network is discussed further in Section III-D.
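The worst-case expression is easy to evaluate for any stack height and source die. The short sketch below does so per sink, i.e., it returns the factor that multiplies M; the function name is an assumption made for the example.

```python
def worst_case_tsv_factor(n_dies: int, source_die: int) -> float:
    """Worst-case number of TSVs per sink (the multiplier of M).

    Each sink on die-i is assumed to use its own |i - s|-stack TSV, and
    sinks are evenly spread, so each die holds M / n_dies of them.
    """
    if not 1 <= source_die <= n_dies:
        raise ValueError("source_die must be between 1 and n_dies")
    total_height = sum(abs(i - source_die) for i in range(1, n_dies + 1))
    return total_height / n_dies


if __name__ == "__main__":
    top = worst_case_tsv_factor(6, 6)      # source on the topmost die
    middle = worst_case_tsv_factor(6, 3)   # source on the middle die
    print(f"top: {top} M, middle: {middle} M, "
          f"savings: {100 * (top - middle) / top:.0f}%")
```

Evaluating the function for every possible source die shows that the factor is smallest for the dies nearest the middle of the stack and largest for the outermost dies.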

## III. SIMULATION AND DISCUSSIONS

### A. Simulation Setting

In our simulation, we first construct a 3D clock network by using the 3D clock network synthesis method shown in Section II-B under the given TSV bound and CMAX. We then extract the netlist of the entire 3D clock network for SPICE simulation. Clock power mainly comes from switching capacitance of interconnect, sink nodes, TSVs and clock buffers. Using SPICE simulation, we obtain the power consumption of the entire clock network and the timing information such as propagation delay, clock skew, and slew.

Fig. 4. Impact of TSV bound on the total wirelength and power of the 6-die clock network with 3101 sinks. The TSV usage is represented as the percentage of the total number of sinks. The baseline is when the TSV bound is 1. Infinity means that the TSV bound is relaxed.

Fig. 5. Spatial distribution of propagation delay (in ns) and clock skew (in ps) of the clock source die. The TSV usage is 90% of the sink count.

The technology parameters are based on the 45nm PTM [15]: the unit-length wire resistance is $0.1\Omega$/um, and the unit-length wire capacitance is 0.2fF/um. We use 10um $\times$ 10um via-last TSVs with a thinned die height of 20um. The TSV parasitics are $R_{TSV} = 0.035\Omega$ and $C_{TSV} = 15.48$fF. The clock frequency is set to 1GHz with a supply voltage of 1.2V. Clock skew is constrained to 4% of the clock period. The clock slew constraint is set to 10% of the clock period. The maximum load capacitance of each clock buffer, denoted CMAX, is set to 300 fF for slew control unless otherwise specified.
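For quick back-of-the-envelope checks, the settings above can be collected in a few helper functions. This is only a sketch: the names are hypothetical, a k-stack TSV is assumed to behave like k identical TSVs in series, and the power helper is the first-order $C V^2 f$ switching estimate for a net that toggles every cycle, not a substitute for the SPICE results reported in this section.

```python
CLOCK_FREQ_HZ = 1e9        # 1 GHz clock
VDD_V = 1.2                # supply voltage
R_WIRE_OHM_PER_UM = 0.1    # unit-length wire resistance
C_WIRE_FF_PER_UM = 0.2     # unit-length wire capacitance
R_TSV_OHM = 0.035          # single TSV resistance
C_TSV_FF = 15.48           # single TSV capacitance


def timing_budgets_ps(freq_hz=CLOCK_FREQ_HZ, skew_frac=0.04, slew_frac=0.10):
    """Return (skew limit, slew limit) in ps for the given clock frequency."""
    period_ps = 1e12 / freq_hz
    return skew_frac * period_ps, slew_frac * period_ps


def wire_rc(length_um):
    """Lumped (R in ohm, C in fF) of a wire segment."""
    return R_WIRE_OHM_PER_UM * length_um, C_WIRE_FF_PER_UM * length_um


def stacked_tsv_rc(k):
    """Lumped (R, C) of a k-stack TSV, assuming k single TSVs in series."""
    return k * R_TSV_OHM, k * C_TSV_FF


def switching_power_mw(cap_ff, vdd_v=VDD_V, freq_hz=CLOCK_FREQ_HZ):
    """First-order switching power C * V^2 * f of a clock net, in mW."""
    return cap_ff * 1e-15 * vdd_v ** 2 * freq_hz * 1e3


if __name__ == "__main__":
    skew_ps, slew_ps = timing_budgets_ps()
    print(f"skew limit {skew_ps:.0f} ps, slew limit {slew_ps:.0f} ps")
    print("2-stack TSV (R, C):", stacked_tsv_rc(2))
    _, c_wire = wire_rc(1000.0)
    print(f"1 mm of clock wire: {switching_power_mw(c_wire):.3f} mW")
```

At 1GHz, the 4% and 10% constraints correspond to a 40ps skew limit and a 100ps slew limit.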

Our discussions focus on a 6-die 3D stack, and the clock source is located on the topmost die unless specified otherwise. The IBM benchmark  $r_5$  [16] is used, which is the biggest one available in the suite.  $r_5$  has 3101 sink nodes with input capacitance varying from 30 fF to 80 fF. Since  $r_5$  is originally designed for 2D ICs, we randomly distribute the sinks into 6 dies. We scale the footprint area by  $\sqrt{6}$  to reflect the area reduction in 3D design.

### B. Impact of TSV Bound on Wirelength and Power

Figure 4 shows the wirelength and power consumption trend of the 6-die clock network as a function of the TSV bound. The TSV usage is represented as the percentage of the total number of sinks. A larger TSV upper bound can lead to more TSV usage. Note that the actual number of TSVs used in the clock network may not be the same as the bound because our clock tree algorithm decides the optimal TSV count for maximum wirelength reduction. For example, when the TSV bound is set to infinity, our clock tree uses 4782 TSVs, which is around 150% of the number of sinks (= 3101). Our baseline 3D clock tree contains only one TSV between adjacent dies, which is equivalent to a TSV bound of 1. This baseline is the most straightforward way to build a 3D clock tree that uses the minimum possible number of TSVs.

TABLE I. COMPARISONS OF WIRELENGTH (UM), POWER (MW), TSV COUNT, BUFFER COUNT AND SKEW (PS) AMONG SINGLE-TSV, BOUNDED MULTI-TSV AND RELAXED MULTI-TSV CASES

| Stack | ckt | #Sinks | Single TSV #TSVs | Single TSV #Bufs | Single TSV WL | Single TSV Power | Single TSV Skew | Bounded multi-TSV #TSVs | Bounded multi-TSV #Bufs | Bounded multi-TSV WL | Bounded multi-TSV Power | Bounded multi-TSV Skew | Relaxed multi-TSV #TSVs | Relaxed multi-TSV #Bufs | Relaxed multi-TSV WL | Relaxed multi-TSV Power | Relaxed multi-TSV Skew |
|-------|-----|--------|------|------|---------|------|------|------|------|---------|------|------|------|------|---------|-----|------|
| 4-die | $r_1$ | 267 | 3 | 317 | 272306 | 142 | 10.3 | 139 | 292 | 187905 | 114 | 11.8 | 265 | 251 | 158466 | 102 | 13.6 |
| 4-die | $r_2$ | 598 | 3 | 699 | 585090 | 306 | 26.9 | 309 | 587 | 376360 | 232 | 17.4 | 579 | 526 | 313234 | 208 | 13.8 |
| 4-die | $r_3$ | 862 | 3 | 945 | 735299 | 398 | 15.4 | 432 | 777 | 491815 | 310 | 14.3 | 819 | 705 | 412752 | 282 | 18.4 |
| 4-die | $r_4$ | 1903 | 3 | 1954 | 1531670 | 831 | 16.6 | 1003 | 1678 | 1006030 | 655 | 19.9 | 1893 | 1471 | 854225 | 594 | 17.8 |
| 4-die | $r_5$ | 3101 | 3 | 2937 | 2312420 | 1272 | 19.8 | 1631 | 2605 | 1497400 | 1001 | 18.2 | 3097 | 2283 | 1259140 | 909 | 23.1 |
| 4-die | Avg Ratio | | | 1.0 | 1.0 | 1.0 | | | 0.87 | 0.66 | 0.78 | | | 0.76 | 0.56 | 0.71 | |
| 6-die | $r_1$ | 267 | 5 | 320 | 260905 | 138 | 15.0 | 222 | 280 | 163894 | 106 | 15.0 | 399 | 240 | 133357 | 93 | 15.9 |
| 6-die | $r_2$ | 598 | 5 | 703 | 566520 | 299 | 15.2 | 483 | 596 | 338817 | 222 | 14.6 | 908 | 513 | 271020 | 195 | 16.7 |
| 6-die | $r_3$ | 862 | 5 | 889 | 718387 | 387 | 17.2 | 701 | 783 | 440406 | 297 | 19.3 | 1301 | 681 | 351552 | 262 | 19.4 |
| 6-die | $r_4$ | 1903 | 5 | 1873 | 1501320 | 814 | 24.0 | 1594 | 1666 | 881044 | 621 | 18.6 | 2980 | 1471 | 721593 | 558 | 24.4 |
| 6-die | $r_5$ | 3101 | 5 | 2933 | 2267170 | 1250 | 20.8 | 2588 | 2638 | 1341100 | 968 | 21.0 | 4782 | 2326 | 1082050 | 867 | 21.3 |
| 6-die | Avg Ratio | | | 1.0 | 1.0 | 1.0 | | | 0.88 | 0.60 | 0.76 | | | 0.76 | 0.49 | 0.68 | |

Fig. 6. Slew distribution of 6-die 3D clock tree among all sinks. Slew constraint is set to 10% of the clock period, CMAX is 300 fF. (a) single-TSV clock tree, (b) multiple-TSV clock tree.

Fig. 7. Slew variations and power comparisons between single-TSV and multi-TSV clock trees. CMAX varies from 175fF to 300fF.

We observe that the total wirelength and power consumption decrease when using more TSVs. The power savings mostly come from wirelength reduction, because the clock wire capacitance significantly affects the overall power consumed by the clock tree. In general, 3D interconnects based on TSVs have shorter wirelength compared with their 2D counterparts. In the single-TSV 3D clock tree, this opportunity is severely limited. Overall, our multiple-TSV cases outperform the single-TSV case by up to 52% in wirelength and 31% in power consumption. This trend allows us to choose the right TSV bound for a given power budget. If a power saving of 20% is required, any TSV bound equal to or larger than the bound that results in 70% TSV usage can be chosen, based on point A in Figure 4. One interesting observation is that, as the TSV bound increases, the number of local trees in the non-source dies increases while their size decreases. This means that the multiple-TSV case encourages more local clock distribution in 3D designs while reducing the overall wirelength and power.

Table I presents more detailed results on the impact of the TSV bound. The IBM benchmarks $r_1$-$r_5$ are used in 4-die and 6-die 3D stacks. For each 3D design, we compare three TSV-usage cases: 1) Single TSV, where the TSV bound is 1; 2) Bounded multi-TSV, where the TSV bound is set to 20% of #sinks; 3) Relaxed multi-TSV, where the TSV bound is set to infinity. The average ratio shows the relationships among the three TSV-usage cases in terms of buffer count, wirelength, and power.
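As a cross-check, the average ratios can be recomputed from the per-circuit entries. The sketch below does this for the wirelength and power columns of the 6-die rows of Table I; it assumes that "Avg Ratio" is the arithmetic mean of the per-circuit ratios to the single-TSV case, and the dictionary layout and function name are choices made for the example.

```python
# 6-die rows of Table I: case -> (wirelength in um, power in mW).
SIX_DIE = {
    "r1": {"single": (260905, 138), "bounded": (163894, 106), "relaxed": (133357, 93)},
    "r2": {"single": (566520, 299), "bounded": (338817, 222), "relaxed": (271020, 195)},
    "r3": {"single": (718387, 387), "bounded": (440406, 297), "relaxed": (351552, 262)},
    "r4": {"single": (1501320, 814), "bounded": (881044, 621), "relaxed": (721593, 558)},
    "r5": {"single": (2267170, 1250), "bounded": (1341100, 968), "relaxed": (1082050, 867)},
}
WL, POWER = 0, 1  # index of each metric inside the tuples above


def average_ratio(table, case, metric):
    """Mean over circuits of (case value / single-TSV value) for one metric."""
    ratios = [row[case][metric] / row["single"][metric] for row in table.values()]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    for case in ("bounded", "relaxed"):
        wl = average_ratio(SIX_DIE, case, WL)
        pw = average_ratio(SIX_DIE, case, POWER)
        print(f"{case:8s} WL ratio {wl:.2f}  power ratio {pw:.2f}")
```

Under that assumption, the recomputed values agree with the 6-die "Avg Ratio" row to two decimal places.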

First, the total wirelength and clock power decrease as the TSV bound increases. Bounded multi-TSV cases produce 34% and 40% wirelength savings in the 4-die and 6-die stacks, respectively. In addition, the power savings are 22% and 24% for 4-die and 6-die. Increasing the TSV bound further reduces wirelength and power. As a result, the infinite TSV bound case leads to the maximum savings: 44% and 51% wirelength savings, and 29% and 32% power reductions, compared with the single-TSV case in the 4-die and 6-die stacks, respectively. Note that these savings come at the cost of higher TSV usage, which may affect the overall yield. Thus, the TSV bound must be decided carefully based on the given TSV process technology.

Fig. 8. (a) Clock source is located on the topmost die, where one 2-stack TSV and one 1-stack TSV are used, (b) clock source is located on the middle die, where two 1-stack TSVs are used.

Fig. 9. Impact of clock source location (topmost vs middle) and TSV bound on wirelength and power. Baseline is the clock tree with one 5-stack TSV, where the clock source is on the topmost die. The TSV count used in this case ranges from 80% to 120% of the total sink count (= 3101).

Second, multi-TSV cases reduce the number of long clock paths, which in turn requires fewer buffers for slew control. Compared with the single-TSV buffer usage, the bounded multi-TSV uses 13% and 12% fewer buffers in 4-die and 6-die, respectively; the relaxed multi-TSV case uses 24% fewer buffers in both 4-die and 6-die. The reduced #Bufs contributes to the clock power savings. Meanwhile, reduced buffers do not harm the slew control. Using multiple TSVs, we achieve better slew distribution than the single-TSV case, which is shown in section III-C.

Third, clock skew for each 3D clock network is constrained within 30ps. In most cases, clock skew is less than 20ps. For the 6-die 3D stack of $r_5$, Figure 5 shows the spatial distribution of propagation delay and clock skew of the die containing the clock source. TSV usage is 90% of #sinks. We observe that the clock skew among the 6 dies varies within [17.8ps, 21ps]. The skew of the entire clock network is 21ps (around 2% of the clock period). Note that our 3D clock tree synthesis algorithm builds a zero-skew tree under the Elmore delay model, which in practice deviates from the SPICE simulation results.

Fig. 10. Distribution of TSV heights used for the clock source on top-die vs middle-die. We report the total number of each TSV height used in the clock tree.

### C. Impact of TSV Bound and CMAX on Slew Distribution

The TSV upper bound also affects the clock slew distribution. Figure 6 shows the slew distribution of the 6-die 3D clock tree for $r_5$ among all sinks. The clock slew constraint is set to 100ps, which is 10% of the clock period. Figure 6(a) shows the slew distribution of the single-TSV clock tree, whereas Figure 6(b) shows the slew distribution of the multi-TSV clock tree. In the single-TSV clock tree, slew varies within [11.4ps, 86.2ps] with an average slew of 53.9ps. The slew distribution of the multi-TSV case is in the range of [10.9ps, 79.6ps] with an average slew of 42.6ps. Compared with the single-TSV case, the multi-TSV usage reduces the maximum slew and average slew by 6.6ps and 11.3ps, respectively, and shows a narrower slew distribution. The main reason for the improved slew distribution of the multiple-TSV 3D tree is the shorter wirelength, which in turn reduces the average source-to-sink delay. Lower delay along the clock path means less slew degradation seen at the sink nodes. Thus, we observe that multiple TSVs are effective in improving the slew distribution.

Figure 7 shows the impact of CMAX, the maximum clock buffer load capacitance, on slew variations (min, average, max) and power consumption in the single-TSV and multi-TSV clock trees. First, CMAX remains an efficient way to control the maximum slew in 3D clock network design. Both the single-TSV and multi-TSV cases show similar trends as CMAX varies from 300 fF to 175 fF. A smaller CMAX reduces the maximum slew, but increases the clock power. That is because each buffer stage is allowed to drive less capacitance with a smaller CMAX, which in turn requires more buffers and thus consumes more power. Second, given a certain CMAX, the multi-TSV clock tree always has smaller maximum slew, smaller average slew, and a narrower slew range compared with the single-TSV case. Third, we note that the multi-TSV cases always consume lower power than the single-TSV cases. Therefore, we conclude that the multi-TSV case has an advantage in achieving both low power and better slew.
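The relation between CMAX and buffer count can be illustrated on a deliberately simplified case: one unbranched wire from a sink back toward the source, walked in fixed steps, with a buffer inserted whenever adding the next step would push the driven capacitance above CMAX. This is a hypothetical toy model and not our bottom-up buffering procedure; in particular, the buffer input capacitance of 10 fF and the 10um step are assumed values chosen only for the example.

```python
def count_buffers(path_len_um, sink_cap_ff, cmax_ff,
                  step_um=10, c_wire_ff_per_um=0.2, buf_in_cap_ff=10.0):
    """Count buffers needed on a single wire path under a load bound.

    Walks from the sink toward the source. Before adding each wire step,
    checks whether the accumulated load would exceed cmax_ff; if so, a
    buffer is placed and the load seen upstream resets to the (assumed)
    buffer input capacitance.
    """
    load_ff = sink_cap_ff
    buffers = 0
    seg_cap_ff = c_wire_ff_per_um * step_um
    for _ in range(int(path_len_um // step_um)):
        if load_ff + seg_cap_ff > cmax_ff:
            buffers += 1
            load_ff = buf_in_cap_ff
        load_ff += seg_cap_ff
    return buffers


if __name__ == "__main__":
    # 5 mm path driving a 50 fF sink, at the two ends of the CMAX range.
    for cmax in (300, 175):
        print(f"CMAX = {cmax} fF -> {count_buffers(5000, 50.0, cmax)} buffers")
```

Even in this toy setting, tightening CMAX from 300 fF to 175 fF doubles the number of buffers on the path, which is the mechanism behind the power increase noted above.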

### D. Impact of Clock Source Location on Power and Wirelength

As discussed in the previous section, more TSVs mean lower wirelength and clock power. However, more TSVs also mean more routing congestion. It is thus important to exploit the tradeoffs among power, TSV count, and congestion while designing 3D clock trees. In particular, we observe that the choice of which die contains the clock source affects those metrics in a significant way. Figure 8 shows an illustration. When the clock source is located on the topmost die as in Figure 8(a), the clock tree utilizes one 2-stack TSV and one 1-stack TSV. However, if the clock source is located on the middle die as in Figure 8(b), the clock tree utilizes two 1-stack TSVs. In addition, the overall wirelength is shorter if the source is located on the middle die. The routing congestion is also less severe with shorter wirelength and shorter TSV heights. It is shown in [17] that stacked TSV fabrication requires additional steps to align the TSVs, and that the complexity increases as more and more TSVs are to be aligned and stacked vertically. Therefore, the tree with the clock source located on the middle die (= Figure 8(b)) potentially has higher yield. Our clock tree synthesis algorithm takes the source location as an input, so generating the corresponding 3D tree is straightforward.

Figure 9 validates our observations, where we show the wirelength and power reduction trend with the source located on the topmost vs middle die. Note that discussions on topmost die and middle die also apply to the bottom-most die and middle die. We observe the following:

- The overall wirelength and power reduction trend is similar in both the topmost-die and middle-die clock source cases. However, we observe that the middle-die case shows more wirelength and power savings. When TSV usage is 90% of #sinks, the middle-die case (point D and point C in Figure 9) shows additional 7.7% power savings and 9.2% wirelength savings as compared with the corresponding topmost-die case. The overall power and wirelength saving compared with the baseline (= the single-TSV case) is 33% and 53%, respectively.
- Given a power budget, we observe that the middle-die case requires fewer TSVs to obtain nearly the same amount of power savings. If a 20% power reduction is desired, points A and B in Figure 9 have the same power consumption, but point A uses 33% fewer TSVs than point B. The middle-die case thus reduces the TSV usage while achieving the same power savings as the topmost-die case.

Figure 10 shows the distribution of stacked TSV heights used for the clock source on the topmost die vs. the middle die. We report the total number of each TSV height used in the clock trees. The overall trend is that the average TSV height is lower in the middle-die case. For example, the middle-die case uses more of the shorter TSVs (1-stack, 2-stack) than the topmost-die case. Note that the middle-die case does not require 4-stack and 5-stack TSVs. Overall, the total TSV height is 25% shorter in the middle-die case. This has a positive impact on TSV yield because taller TSVs (4-stack and 5-stack) require higher-precision alignment and fabrication processes.

## IV. CONCLUSIONS

In this paper, we explored the design optimization techniques for reliable low-power, low-slew 3D clock network design. SPICE simulations on the 3D clock networks provided the clock power and timing information. We studied the impact of the TSV usage, the clock source location and maximum clock buffer load capacitance on various metrics of 3D clock distribution network, including wirelength, clock power, slew, skew, and total TSV count. We observed that using more TSVs helps reduce the wirelength and power, and shows better control on the clock slew variations. We also discussed that using smaller maximum loading capacitance on clock buffers efficiently lowers clock slew. In addition, we showed that the clock source location also affects the 3D clock network in a significant way: placing the clock source on the middle die, compared with the topmost die, reduces clock slew and TSV usage under the same power budgets.

## REFERENCES

- [1] P. J. Restle et al., "A clock distribution network for microprocessors," in *Solid-State Circuits, IEEE Journal of*, vol. 36, no. 5, 2001, pp. 792–799.
- [2] E. Friedman, "Clock distribution networks in synchronous digital integrated circuits," vol. 89, no. 5, May 2001, pp. 665–692.
- [3] Q. K. Zhu, "High-speed Clock Network Design," 2003.
- [4] J. U. Knickerbocker, P. S. Andry, et al., "Three-dimensional Silicon Integration," in *IBM Journal of Research and Development*, vol. 52, no. 6, 2008, pp. 553–569.
- [5] J. Vardaman, "3-D Through-Silicon Vias Become a Reality," 2007, http://www.semiconductor.net/article/CA6445435.html.
- [6] S. L. Wright, P. S. Andry, E. Sprogis, B. Dang, and R. J. Polastre, "Reliability testing of through-silicon vias for high-current 3D applications," in *Electronic Components and Technology Conference*, 2008. *ECTC 2008. 58th*, 2008, pp. 879–883.
- [7] J. Minz, X. Zhao, and S. K. Lim, "Buffered clock tree synthesis for 3D ICs under thermal variations," in *Proc. Asia and South Pacific Design Automation Conf.*, 2008, pp. 504–509.
- [8] X. Zhao, D. L. Lewis, H.-H. S. Lee, and S. K. Lim, "Pre-bond Testable Low-Power Clock Tree Design for 3D Stacked ICs," in *Proc. IEEE Int. Conf. on Computer-Aided Design*, 2009, pp. 184–190.
- [9] V. F. Pavlidis, I. Savidis, and E. G. Friedman, "Clock distribution networks for 3-D Integrated Circuits," in *Custom Integrated Circuits Conference*, 2008. CICC 2008. IEEE, 2008, pp. 651–654.
- [10] V. Arunachalam and W. Burleson, "Low-power clock distribution in a multilayer core 3d microprocessor," in *Proceedings of the 18th ACM Great Lakes symposium on VLSI*, 2008, pp. 429–434.
- [11] K. Boese and A. Kahng, "Zero-skew clock routing trees with minimum wirelength," in ASIC Conference and Exhibit, 1992., Proceedings of Fifth Annual IEEE International, 1992, pp. 17–21.
- [12] G. E. Tellez and M. Sarrafzadeh, "Minimal Buffer Insertion in Clock Trees with Skew and Slew Rate Constraints," *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, vol. 16, no. 4, pp. 333–342, April 1997.
- [13] C. Albrecht, A. B. Kahng, B. Liu, I. I. Mandoiu, and A. Z. Zelikovsky, "On the Skew-Bounded Minimum-Buffer Routing Tree Problem," *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, vol. 22, no. 7, pp. 937–945, July 2003.
- [14] S. Hu, C. Alpert, J. Hu, S. Karandikar, Z. Li, W. Shi, and C. Sze, "Fast Algorithms for Slew-Constrained Minimum Cost Buffering," *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, vol. 26, no. 11, pp. 2009–2022, Nov. 2007.
- [15] Predictive Technology Model, http://www.eas.asu.edu/ ptm/.
- [16] G. Benchmark, http://vlsicad.ucsd.edu/GSRC/bookshelf/Slots/BST.
- [17] P. Ramm, M. J. Wolf, A. Klumpp, R. Wieland, B. Wunderle, B. Michel, and H. Reichl, "Through silicon via technology-processes and reliability for wafer-level 3D system integration," in *Electronic Components and Technology Conference*, 2008. ECTC 2008. 58th, 2008, pp. 841–846.