# Three-Tier 3D ICs for More Power Reduction: Strategies in CAD, Design, and Bonding Selection

Taigon Song<sup>1</sup>, Shreepad Panth<sup>2</sup>, Yoo-Jin Chae<sup>3</sup>, and Sung Kyu Lim<sup>1</sup>

<sup>1</sup>School of ECE, Georgia Institute of Technology, Atlanta, GA; <sup>2</sup>Altera Corp., San Jose, CA; <sup>3</sup>School of EE, KAIST, Daejeon, South Korea

taigon.song@gatech.edu, limsk@ece.gatech.edu

*Abstract*—Low power is one of the key driving forces in modern VLSI systems. Several recent studies show that 3D ICs offer significant power savings over 2D ICs, primarily due to wirelength and buffer savings. However, these existing studies are mainly limited to 2-tier designs. In this paper, we extend the target to 3-tier 3D ICs. Our study first shows that the one additional tier available in 3-tier 3D ICs does offer more power saving than 2-tier 3D ICs, but it requires more careful floorplanning, through-silicon via (TSV) management, and block folding considerations. Second, we find that the three tiers can be bonded in different ways: (1) face-to-back only and (2) face-to-face and face-to-back combined. Our study shows that these choices pose additional challenges in design optimization for more power saving. Lastly, we develop effective CAD solutions that are seamlessly integrated into commercial 2D IC tools to handle 3-tier 3D IC power optimization under various bonding style options. With our low-power design methods combined, our 3-tier 3D ICs provide -14.8% more power reduction over 2-tier 3D ICs and -36.0% over 2D ICs under the same performance.

*Index Terms*—3D IC, TSV, F2F, low power

# I. INTRODUCTION

As we enter the mobile era, power reduction has become the top priority of the integrated circuit (IC) industry. It matters not only for mobile devices, which require long battery life and energy efficiency, but also for data centers, which aim to increase their GHz/Watt performance. Power consumption directly affects packaging and cooling cost, and it has a significant impact on manufacturing yield and reliability. From the device perspective, the development of ultra-thin body silicon-on-insulator (UTB SOI, or fully-depleted SOI) and FinFET devices also follows this power reduction trend [1].

Due to the increasing design, power, and cost challenges that the industry has faced beyond the 32-22nm nodes, many have started searching for alternative solutions. In this effort, three-dimensional integrated circuits (3D ICs) using through-silicon vias (TSVs) have gained a great deal of attention as a viable solution for low-power IC designs. In [2], the authors showed that -15% power reduction and +15% performance gain can be achieved by an optimized 3D floorplan in a two-tier microprocessor. In [3], the authors achieved -21.2% power reduction when a 3D floorplan and design techniques were applied. In [4], the authors reported that -21.5% power reduction can be achieved by reducing the bus power in GPUs. In [5], the authors demonstrated 50% leakage and 25% dynamic power reduction in 3D DRAM.

In this paper, we try to answer the following question: "If logic ICs are designed in many tiers, how much more power reduction can 3D ICs achieve?" Since previous 3D IC studies focused on reporting power reduction in two tiers [2], [3], [4], [6], we answer this question by designing three-tier 3D ICs and studying the impact. In detail, we use an OpenSPARC T2 core (a commercial multi-threaded microprocessor that has been released to the public) [7] and a PDK [8], both of which are available to the academic community, to expose the unique design challenges and benefits of three-tier 3D ICs that two-tier 3D ICs do not have. We develop CAD tools for three-tier 3D IC design styles, build GDSII-level 3D IC layouts, and perform optimization and analysis using sign-off CAD tools. Our contributions include the following:

This work has been done while the author Shreepad Panth was at Georgia Tech.

- 1) To the best of the authors' knowledge, we report the largest power reduction achieved by 3D ICs so far. Our three-tier results show -36% power reduction compared with the 2D counterpart [3], which is larger than the reductions reported in the previous studies.
- 2) Three-tier 3D IC design in mixed bonding styles (e.g., face-to-face and face-to-back combined) helps reduce more power. To reveal these benefits, we develop CAD tools and implement mixed bonding styles in three-tier designs.
- 3) The block-folding technique helps reduce significant power in three-tier design. However, careful design management is required, and the choice of bonding style in mixed bonding impacts the design quality of three-tier block-folding.

# II. SIMULATION SETTINGS

## *A. Benchmark*

For our three-tier (3-tier) study, we use the OpenSPARC T2 Core (T2 Core) [7] as our benchmark. T2 Core consists of 12 functional unit blocks including two integer execution units (EXU), a floating point and graphics unit (FGU), an instruction fetch unit (IFU), a load/store unit (LSU), and a trap logic unit (TLU). We synthesize and design our benchmark using the Synopsys 28nm PDK [8]. The PDK provides up to nine metal layers, and we use dual-$V_{th}$ (RVT: regular $V_{th}$ and HVT: high $V_{th}$) standard cells in our design. We include a power distribution network (PDN) in our designs. Table I describes the details of our PDN. We place a fixed PDN at the initial design stage, before placement and routing, and target a density from 25% (M9) down to 10% (M3). We do not place a fixed PDN on M1 and M2: on M1, standard cells already contain VDD/VSS lines, and a fixed PDN on M2 would act as a placement blockage.

## *B. 3D Bonding Technology*

When stacking 3D ICs in 2 tiers, two bonding styles are possible: face-to-back (F2B) and face-to-face (F2F) (see Figure 1). In F2B bonding, TSVs are used for vertical interconnects. However, since TSVs penetrate the silicon substrate and occupy area, using too many TSVs leads to area overhead, which designers should avoid.

This work is supported by Intel Corporation through Semiconductor Research Corporation (ICSS Task 2293).

TABLE I PDN SPECIFICATIONS USED IN OUR 2D AND 3D DESIGNS. # TRACKS SHOWS THE MAXIMUM NUMBER OF SIGNAL WIRES THAT CAN FIT IN BETWEEN TWO ADJACENT P/G WIRES.

| Layer                  | M3     | M4 - M6      | M7      | M8      | M9      |
|------------------------|--------|--------------|---------|---------|---------|
| Layer class            | Local  | Intermediate | Global  | Global  | Global  |
| Metal width/pitch (nm) | 56/152 | 112/228      | 224/456 | 224/456 | 224/456 |
| PDN density (%)        | 10.5   | 14.9         | 18.0    | 21.4    | 24.9    |
| PDN width (nm)         | 208    | 340          | 2048    | 2048    | 2048    |
| PDN pitch (µm)         | 1.976  | 2.28         | 11.4    | 9.576   | 8.208   |
| # tracks               | 11     |              | 20      | 16      | 13      |
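The density and track rows of Table I appear to follow directly from the PDN width and pitch. The Python sketch below is a consistency check, not the authors' tooling. It assumes that the PDN density is the PDN width divided by the PDN pitch, and that the number of signal tracks is the free space between two adjacent P/G wires divided by the signal metal pitch, rounded down. It also assumes that the 224/456nm width/pitch and the 2048nm PDN width apply to all three global layers. The names `PDN_LAYERS`, `pdn_density_percent`, and `signal_tracks` are illustrative.

```python
# Per-layer values transcribed from Table I (all lengths in nm).
# Assumption: the global-layer entries (signal pitch 456, PDN width 2048)
# are shared by M7, M8, and M9.
PDN_LAYERS = {
    "M3":    {"signal_pitch": 152, "pdn_width": 208,  "pdn_pitch": 1976},
    "M4-M6": {"signal_pitch": 228, "pdn_width": 340,  "pdn_pitch": 2280},
    "M7":    {"signal_pitch": 456, "pdn_width": 2048, "pdn_pitch": 11400},
    "M8":    {"signal_pitch": 456, "pdn_width": 2048, "pdn_pitch": 9576},
    "M9":    {"signal_pitch": 456, "pdn_width": 2048, "pdn_pitch": 8208},
}


def pdn_density_percent(layer):
    """Share of the layer occupied by P/G wires (assumed: width / pitch)."""
    return 100.0 * layer["pdn_width"] / layer["pdn_pitch"]


def signal_tracks(layer):
    """Signal wires that fit between two adjacent P/G wires.

    Assumed formula: floor((PDN pitch - PDN width) / signal pitch).
    Table I leaves the M4-M6 entry empty, so that value is unverified.
    """
    return (layer["pdn_pitch"] - layer["pdn_width"]) // layer["signal_pitch"]


if __name__ == "__main__":
    for name, layer in PDN_LAYERS.items():
        print(name, f"{pdn_density_percent(layer):.2f}%", signal_tracks(layer))
```

Under these assumptions the helpers agree with the listed densities to within 0.1 percentage point and with the listed track counts for M3, M7, M8, and M9.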



Fig. 1. Basic 2-tier die bonding styles: (a) Face-to-back (F2B), and (b) Face-to-face (F2F).

On the other hand, F2F bonding uses F2F vias for vertical interconnects. Unlike TSVs, F2F vias do not occupy any silicon area, because they connect the top metal layers of the two dies without passing through the substrate. Table II summarizes the bonding-style-related settings used in our paper. We assume that TSVs are much bigger than F2F vias, since manufacturing reliable sub-micron TSVs is challenging. Resistances and capacitances of the TSVs are calculated based on [9].

In this paper, we study the impact of two different bonding styles in 3-tier 3D ICs: face-to-back only (F2B-only) and face-to-face and face-to-back combined (F2F+F2B). Figure 2 (a) and (b) show F2B-only and F2F+F2B, respectively. In both bonding styles, Die 0 is the bottom die, which connects to the package/PCB, and Die 2 is the top die, to which the heat sink attaches. F2B-only, shown in (a), uses only TSVs for 3D interconnects. F2F+F2B, shown in (b), uses F2F vias for 3D interconnects between Die 0 and Die 1, and one TSV layer (in Die 1) between Die 1 and Die 2. The advantage of F2F+F2B is that connections between Die 0 and Die 1 suffer less 3D interconnect penalty (F2F vias have smaller R and C than TSVs). In addition, since F2F vias do not occupy any silicon area and are smaller than TSVs, denser and better-placed 3D connections can be made.

# III. CAD TOOL FOR 3-TIER 3D ICS

This section first discusses existing CAD approaches for F2B and F2F 3D ICs. It also discusses why these approaches are not directly applicable to mixed bonding. Next, it describes how a 3-tier F2B+F2F mixed bonding 3D IC circuit can be constructed.

## *A. Need for New Tools*

The authors of [10] have provided a framework for handling an arbitrary number of TSVs in a many-tier F2B-only 3D IC. However, they primarily compared wirelength, and previous power studies have considered only two-tier 3D ICs [2], [3], [4], [6].

In the placement framework proposed in [10], the gates are first partitioned into as many tiers as required. Next, TSVs are inserted into the netlist as large cells. The placement is an iterative force-directed process with two main forces. The net force **Fnet** tries to bring all the cells of a given net together, and the move force **Fmove** tries to remove overlap between cells and TSVs of a given tier. The authors have also demonstrated that it is more beneficial to treat a 3D net as one subnet per tier (including the TSV) instead of as a single 3D net, as this leads to more accurate wirelength estimation. This is shown in Figure 3 (a).

Fig. 2. 3-tier die bonding styles: (a) Face-to-back only (F2B-only) and (b) Face-to-face and face-to-back combined (F2F+F2B).

When it comes to F2F integration, the placement engine remains more or less the same, with a few differences [6]. First, the F2F vias are not inserted into the netlist, and second, the nets are not split into subnets per tier. This is because the F2F vias are so small that they will be inserted by tricking a 2D router. Once the placement is complete, the entire 3D stack is fed into a commercial router to extract 3D via locations. However, this is limited to two tiers, with at most 7 metal layers per tier, as commercial 2D tools cannot handle more than a total of 15 metal layers.

Clearly, these approaches cannot directly be applied to a circuit with mixed bonding. TSV-based engines require TSVs to be inserted during placement, while F2F engines do not. In addition, the TSV-based engine employs net splitting, while the F2F engine does not. Finally, the F2F planner can handle at most two tiers due to commercial tool limitations. We now present techniques to handle F2F+F2B mixed bonded 3D ICs in the following subsection.

## *B. CAD Tool for F2B+F2F Bonding*

The modifications made to the placement engine to handle this style of mixed bonding are shown in Figure 3 (b). We perform two major modifications. First, TSVs are inserted into the netlist only in those tiers that are F2B. Next, we perform net splitting, but do not split the nets at the F2F interface. Therefore, a 3D net spanning three tiers will have only two subnets, instead of three as in the all F2B case. We then perform placement to give us the (x,y) locations of all the gates in the netlist, as well as the TSV locations for the F2B tier.
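The net-splitting rule described above can be illustrated with a small Python sketch. This is a simplified model, not the placement engine itself: it assumes tiers are numbered from the bottom die, that `interfaces[i]` names the bonding style between tier `i` and tier `i + 1`, and that a net occupies every tier between its lowest and highest pin. The function name `split_net` is hypothetical.

```python
def split_net(pin_tiers, interfaces):
    """Group the tiers spanned by a 3D net into subnets.

    pin_tiers:  tier indices that hold at least one pin of the net.
    interfaces: interfaces[i] is "F2B" or "F2F" for the bond between
                tier i and tier i + 1.
    A new subnet starts at every F2B (TSV) interface; F2F interfaces
    do not split the net.
    """
    lowest, highest = min(pin_tiers), max(pin_tiers)
    subnets = [[lowest]]
    for tier in range(lowest + 1, highest + 1):
        if interfaces[tier - 1] == "F2B":
            subnets.append([tier])    # TSV inserted, net is split here
        else:
            subnets[-1].append(tier)  # F2F via, tiers stay in one subnet
    return subnets


if __name__ == "__main__":
    three_tier_net = {0, 1, 2}
    print(split_net(three_tier_net, ["F2B", "F2B"]))  # F2B-only stack
    print(split_net(three_tier_net, ["F2F", "F2B"]))  # F2F+F2B stack
```

In this model a net spanning all three tiers yields three subnets in the F2B-only stack and two subnets in the F2F+F2B stack, matching the counts stated in the text.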

Now, we need to insert F2F vias using a commercial router in the F2F interface. However, as mentioned previously, commercial tools can only handle two tiers. So, we extract the netlist of the two tiers that are part of the F2F interface, as shown in Figure 3 (c). In addition to extracting the connectivity and location of gates, we also need to create additional I/O pins at the locations where the TSVs would have existed. This ensures that the router will construct an accurate topology including the TSV, as shown in Figure 3 (d). Once the F2F via locations are extracted, we create separate Verilog/DEF files for each tier, then place, route, and optimize them separately.

Fig. 3. Net handling and routing in 3-tier mixed bonding. (a) A 6-pin net with 2 TSVs is split into one subnet per tier in F2B-only case, (b) F2F bonding does not cause net splitting, (c) Subnet 5 from (b), where the TSV is defined as an I/O pin, (d) A sample routing topology for (c).

## *C. 3-Tier 3D IC Design Flow*

To design an optimized 3-tier 3D IC, we first synthesize the netlist with initial design constraints. Then, we perform 3-tier floorplanning using the mixed-bonding tools described in the previous sections. We design and lay out each die separately based on the floorplanning results. Once the 3D CAD tools generate the TSV/F2F locations, cells and memory macros are placed using Cadence SoC Encounter. We then extract the parasitics of each die and perform static timing analysis using Synopsys PrimeTime to obtain new timing constraints for each die. With the new timing constraints, we perform timing and power optimizations using Cadence SoC Encounter. We iterate these optimization steps several times (from obtaining timing constraints with Synopsys PrimeTime to optimizing each die with Cadence SoC Encounter). Through these steps, we obtain a timing-closed and power-optimized design for 3-tier 3D ICs.

# IV. BENEFITS OF 3-TIER 3D IC

This section studies the challenges and benefits of 3-tier 3D ICs. Due to the broad scope, this section limits the study to F2B-only bonding style in block-level (non-folded) designs.

## *A. New Design Challenges*

When floorplanning a 3D IC, many design constraints must be considered, such as the connections between blocks and from top-level pins to external connections. In addition to these constraints, area balance rules out many partitioning options in a 3-tier 3D IC. For T2 Core, Table III shows the area share of each block. The two biggest modules (LSU and IFU) occupy 32.1% and 22.3% of the total T2 Core area. This means that if, for example, a designer places LSU and IFU on the same die, this die will be significantly larger than the other two, since these two blocks consume more than half (54.4%) of the total area. Considering area balance, LSU should not share a die with any other large block (such as IFU, FGU, TLU, EXU, or MMU), and the die that includes IFU should also be partitioned carefully. Because of this area balance issue, 3-tier partitioning is very challenging, and partitioning becomes even more challenging in many-tier designs.

TABLE III AREA PERCENTAGE OF THE FUNCTIONAL UNIT BLOCKS IN T2 CORE.

| Block | Area (%) | Block   | Area (%) |
|-------|----------|---------|----------|
| LSU   | 32.1     | MMU     | 5.3      |
| IFU   | 22.3     | IFU IBU | 3.2      |
| FGU   | 11.5     | PKU     | 1.4      |
| TLU   | 8.4      | GKT     | 1.3      |
| EXU0  | 6.3      | PMU     | 1.3      |
| EXU1  | 6.3      | DEC     | 0.6      |

Fig. 4. TSV layers aligned to provide through path for Die 0–Die 2 connecting nets (Through-3D-Paths) in F2B-only (blue dots: regular TSVs, yellow dots: Through-3D-Path TSVs).
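The area-balance argument can be checked numerically with the Table III percentages. The Python sketch below sums the block areas per die for a candidate partition. Both partitions shown are hypothetical illustrations, not the floorplans used in this paper, and the check ignores connectivity, pin locations, and TSV area. The names `die_area_shares` and `imbalance` are illustrative.

```python
# Block area shares of T2 Core in percent, transcribed from Table III.
BLOCK_AREA_PERCENT = {
    "LSU": 32.1, "IFU": 22.3, "FGU": 11.5, "TLU": 8.4,
    "EXU0": 6.3, "EXU1": 6.3, "MMU": 5.3, "IFU IBU": 3.2,
    "PKU": 1.4, "GKT": 1.3, "PMU": 1.3, "DEC": 0.6,
}


def die_area_shares(partition):
    """Return the area share (%) of each die for a list of block lists."""
    return [round(sum(BLOCK_AREA_PERCENT[b] for b in die), 1)
            for die in partition]


def imbalance(partition):
    """Gap in percentage points between the largest and smallest die."""
    shares = die_area_shares(partition)
    return round(max(shares) - min(shares), 1)


# Hypothetical partition that puts the two largest blocks on one die.
LSU_WITH_IFU = [
    ["LSU", "IFU"],
    ["FGU", "TLU", "EXU0", "EXU1"],
    ["MMU", "IFU IBU", "PKU", "GKT", "PMU", "DEC"],
]

# Hypothetical partition that keeps LSU away from the other large blocks.
LSU_SEPARATED = [
    ["LSU", "DEC"],
    ["IFU", "FGU"],
    ["TLU", "EXU0", "EXU1", "MMU", "IFU IBU", "PKU", "GKT", "PMU"],
]

if __name__ == "__main__":
    for name, part in [("LSU with IFU", LSU_WITH_IFU),
                       ("LSU separated", LSU_SEPARATED)]:
        print(name, die_area_shares(part), imbalance(part))
```

An ideal 3-tier split would give each die about 33.3% of the area, so the first partition forces the footprint to be set by a die that holds 54.4% of the logic, while the second stays close to the ideal.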

In T2 Core, several blocks such as the LSU connect to other blocks on all three dies. If a die partition places a block (e.g., LSU) in Die 0 and a connecting block in Die 2, Die 1 must support the paths that connect the blocks in Die 0 and Die 2. We call these "Through-3D-Paths." Since every block interacts with other blocks in T2 Core, these Through-3D-Paths account for as many as half of the total TSV count. Many Through-3D-Paths enter Die 1 through a TSV from Die 0 and leave Die 1 through another TSV toward Die 2. In this regard, Die 1 handles twice as many 3D connections as the other two tiers. Therefore, providing sufficient white space and an actual "through-path" for Through-3D-Paths is very important in 3-tier design. As shown in Figure 4, we align the white space of the top and bottom 3D connections so that these Through-3D-Paths do not need to detour. Note that the white space design in both Die 0 and Die 1 is necessary, since the M9 landing pads in Die 1 sit at the exact locations of the Die 0 TSVs. If the white space for Through-3D-Paths is not well designed, additional routing congestion occurs on top of the existing Die 1 routing congestion.
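The observation that Die 1 carries twice as many 3D connections can be made concrete with a toy count. The Python sketch below assumes one vertical connection per net per interface crossed, and uses a made-up list of nets rather than T2 Core data. The function name `connections_per_die` is hypothetical.

```python
def connections_per_die(nets, num_tiers=3):
    """Count 3D connections per interface and per die.

    nets: list of sets, each holding the tier indices with pins of a net.
    Assumption: a net uses exactly one vertical connection at every
    interface between its lowest and highest tier.
    """
    # crossings[i] counts nets crossing between Die i and Die i + 1.
    crossings = [0] * (num_tiers - 1)
    for tiers in nets:
        for i in range(min(tiers), max(tiers)):
            crossings[i] += 1

    per_die = []
    for die in range(num_tiers):
        below = crossings[die - 1] if die > 0 else 0
        above = crossings[die] if die < num_tiers - 1 else 0
        per_die.append(below + above)
    return crossings, per_die


# Toy example: two Die 0 to Die 2 nets (Through-3D-Paths),
# one Die 0 to Die 1 net, and one Die 1 to Die 2 net.
TOY_NETS = [{0, 2}, {0, 2}, {0, 1}, {1, 2}]

if __name__ == "__main__":
    crossings, per_die = connections_per_die(TOY_NETS)
    print("per interface:", crossings)
    print("per die:", per_die)
```

In this toy case each interface carries three connections, so the middle die touches six while the outer dies touch three each, which is why white space in Die 1 is the scarce resource.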

## *B. 2D vs. 2-tier 3D vs. 3-tier 3D*

We now compare our 2D and 3D block-level designs in the TSV-only bonding style. First, all our designs run at a target clock period of 1.5ns (≈667MHz). Note that our designs run much slower than UltraSPARC T2, the commercial product based on OpenSPARC T2, which runs at 1.4GHz [11]. This is because some custom memory blocks in T2 Core, such as content-addressable memories, are synthesized with standard cells, since a general memory compiler cannot handle this kind of memory. Unfortunately, these synthesized memories run slower than the memory macros generated by a memory compiler. Second, our baseline 2D and 2-tier 3D designs follow the floorplans and designs in [3]. However, since the designs in [3] did not have a PDN, we include a PDN in our 2D and 2-tier 3D designs and make minor modifications to meet timing.

TABLE IV 2D VS. 2-TIER 3D VS. 3-TIER 3D (NON-FOLDING, F2B-ONLY). ALL PERCENTAGE VALUES ARE WITH RESPECT TO 2D RESULTS.

|                  | 2D [3] | 2-tier 3D [3]  | 3-tier 3D (non-folding) |
|------------------|--------|----------------|-------------------------|
| Clock period     | 1.5ns  | 1.5ns          | 1.5ns                   |
| Footprint (mm²)  | 3.08   | 1.44 (-53.2%)  | 1.00 (-67.5%)           |
| Si. Area (mm²)   | 3.08   | 2.88 (-6.5%)   | 3.00 (-2.6%)            |
| Wirelength (m)   | 22.4   | 18.0 (-19.6%)  | 14.3 (-36.2%)           |
| # Cells (×10³)   | 523.4  | 420.8 (-19.6%) | 403.9 (-22.8%)          |
| # Buffers (×10³) | 221.7  | 130.8 (-41.0%) | 130.7 (-41.0%)          |
| HVT cells (×10³) | 370.6  | 408.3          | 377.4                   |
| # TSV            |        | 6,562          | 4,118                   |
| Total power (mW) | 348.3  | 271.7 (-22.0%) | 248.1 (-28.8%)          |
| Cell power (mW)  | 71.6   | 62.9 (-12.2%)  | 62.6 (-12.6%)           |
| Net power (mW)   | 175.7  | 137.9 (-21.5%) | 117.3 (-33.2%)          |
| Leak. power (mW) | 101.1  | 70.9 (-29.9%)  | 68.2 (-32.5%)           |

Table IV compares various metrics between the 2D, 2-tier 3D, and 3-tier 3D designs, and Figure 7 (a) and (b) show the GDSII layouts of our 2D design and our 3-tier non-folded 3D design in F2B-only bonding, respectively. The 2-tier 3D design applies all design techniques proposed in [3]. First, with the 3-tier 3D design, we reduce the total wirelength by -36.2% and the cell count by -22.8% compared with 2D. Compared with 2-tier 3D, this is -16.6 percentage points more wirelength reduction and -3.2 percentage points more cell count reduction. The significant wirelength reduction comes from the smaller footprint and better top-level floorplanning.

Second, and most importantly, 3-tier 3D (non-folding) reduces the total power by -28.8%, whereas 2-tier 3D (block-folding) reduces it by -22.0% (note that our 2-tier 3D design reduces -0.8% more power than reported in [3]). Even though block-folding is not yet applied in our 3-tier 3D design, the better 3-tier floorplan gives more net power reduction than 2-tier 3D (-20.6mW more). 3-tier 3D achieves power reduction through both cell count reduction and wirelength saving. However, the wirelength saving contributes far more to this power reduction than the cell count reduction, which yields only small additional cell and leakage power savings. Lastly, we reduce the footprint by -67.5%, which is -14.3 percentage points more than the 2-tier 3D design. In terms of silicon area, 3-tier 3D still uses -2.6% less area than 2D. 3-tier 3D uses more silicon area than 2-tier 3D because it has to accommodate more TSVs at the top level. Nevertheless, the footprint and silicon area reductions relative to 2D stem from the significant wirelength and cell count reduction.
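The percentages quoted above follow from the raw Table IV values. The Python sketch below recomputes them as a worked check. It assumes each percentage is the change relative to the 2D value, rounded to one decimal, and that "more reduction" versus 2-tier 3D is the difference between the two percentages in percentage points. The names `TABLE_IV`, `percent_vs_2d`, and `summarize` are illustrative.

```python
# Values transcribed from Table IV as (2D, 2-tier 3D, 3-tier 3D).
TABLE_IV = {
    "footprint_mm2": (3.08, 1.44, 1.00),
    "wirelength_m": (22.4, 18.0, 14.3),
    "cells_k": (523.4, 420.8, 403.9),
    "total_power_mW": (348.3, 271.7, 248.1),
    "net_power_mW": (175.7, 137.9, 117.3),
}


def percent_vs_2d(value, baseline):
    """Relative change against the 2D baseline, in percent."""
    return round(100.0 * (value - baseline) / baseline, 1)


def summarize(metric):
    """Return (2-tier %, 3-tier %, extra reduction in percentage points)."""
    baseline, two_tier, three_tier = TABLE_IV[metric]
    two = percent_vs_2d(two_tier, baseline)
    three = percent_vs_2d(three_tier, baseline)
    return two, three, round(three - two, 1)


if __name__ == "__main__":
    for metric in TABLE_IV:
        print(metric, summarize(metric))
    # Absolute net power gap between 2-tier and 3-tier, in mW.
    _, net_2tier, net_3tier = TABLE_IV["net_power_mW"]
    print("net power gap (mW):", round(net_2tier - net_3tier, 1))
```

Reading the comparison in percentage points avoids confusing the extra saving relative to 2D with the saving relative to the 2-tier design itself.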

# V. BONDING STYLE IMPACT STUDY

The previous section covered 3-tier designs in F2B-only (TSV) bonding. This section studies how the 3-tier bonding styles described in Section II-B, combined with block folding, enhance design quality and reduce power.

## *A. Bonding Impact On Floorplan*

*1) F2B-only vs. F2F+F2B Bonding:* As described in Section II-B, F2F bonding provides many advantages over F2B bonding. Even in 2-tier 3D ICs, F2F reduces more power than the F2B-only bonding style. Thus, it is advantageous to use F2F bonding in 3-tier designs too. However, if one bonding layer uses the F2F style, the other bonding layer must use F2B, as shown in Figure 5. Therefore, with the non-folded F2B-only T2 Core as our baseline, we compare how the top-level design quality changes when we apply F2F+F2B bonding in 3-tier designs.

Figure 6 compares how the top-level design changes in Die 0 under F2F+F2B bonding. Note that the floorplan is exactly the same in both designs. First, we see that the F2F via placement quality is much better than that of the TSV placement. Many top-level 3D connections form between Die 0 and Die 1 (2176 TSVs/F2F vias), and placing 2176 TSVs consumes a large space due to the relatively large TSV size. In addition, TSV landing pads in Die 1 must not overlap with the top-metal PDN. Placing 2176 TSVs at the top level therefore requires even more space, which forces the TSVs into sub-optimal locations. As shown in Figure 6 (a), the TSVs are crowded and their locations are sub-optimal. In contrast, since F2F vias occupy a smaller footprint than TSVs, they can be placed at their optimal locations and are less affected by the PDN. Second, because of the better F2F via locations and small RC parasitics, the top-level design quality in F2F bonding improves significantly. In Die 0, wirelength reduces by -31.9% and buffer count reduces by -39.3%. This translates to -54.5% less top-level power in Die 0 compared with F2B-only.

Fig. 5. Choosing bonding layer in 3-tier 3D ICs. (a) One bonding layer in 2-tier 3D IC using F2F, (b) two bonding layers in 3-tier 3D IC: At least one layer must use F2B if the other is F2F bonding style.

Fig. 6. F2F vias for better design in F2F+F2B bonding under the same floorplan: (a) Die 0 in F2B-only bonding (TSVs for 3D connection), (b) Die 0 in F2F+F2B bonding (F2F vias for 3D connection).

## *B. Bonding Impact On Block-Folding*

*1) F2F+F2B Bonding on Folded Blocks:* Block-folding in mixed bonding requires the designer to choose the right 3D bonding for the right purpose. In a 2-tier design, once the bonding style is decided to be F2F (or F2B), both the folded blocks and the top-level design use the F2F (or F2B) layer. However, in 3-tier designs, we must decide how to use the F2F layer, since the bonding technology allows only one. The more the designer uses the F2F layer for block-folding, the less it can be used for top-level design, and vice versa. To study which is more beneficial in T2 Core, we studied two floorplans: (1) using the F2F layer for top-level design (F2F+F2B V1), and (2) using the F2F layer for block-folding (F2F+F2B V2) (see Figure 8).

Fig. 7. GDSII layouts of various 3-tier 3D IC designs: (a) 2D based on [3], (b) 3-tier non-folding in F2B-only, (c) 3-tier block-folding in F2B-only, and (d) 3-tier block-folding in F2F+F2B.

Our results show that F2F+F2B V1 reduces more power than F2F+F2B V2: compared with 2D, F2F+F2B V1 shows -36.0% power reduction, while F2F+F2B V2 shows -34.7%. We explain this through the following reasons. First, the extra power reduction from F2F bonding in folded blocks is not significant. Block-folding-based 3-tier designs must consider (1) the power reduction of the block itself from block-folding, and (2) options for better connectivity at the top level. For the power reduction of single blocks, Table VI shows a standalone power analysis of the four block-folding candidates using F2B bonding or F2F bonding. Each standalone design was done based on its optimal partition and pin locations. The total power reduction from F2F bonding is only -5.3mW, which is -1.5% of the total T2 Core power. We do not see significant power reduction from folded blocks in F2F bonding because 3-tier floorplanning rules out many partitioning options for block-folding in F2F.

Second, the top-level design quality in F2F+F2B V1 is better than in F2F+F2B V2. F2F+F2B V1 and V2 use 52% more top-level 3D connections (TSV count: 2,573) than the F2B-only–block-folding design (TSV count: 1,693). In V2, these connections are TSVs, and since the optimal white spaces for TSV locations are limited, this leads to worse TSV locations and worse design quality at the top level. In fact, the top-level design quality in V2 is worse than in the F2B-only–block-folding design. F2F+F2B V1, however, uses the F2F layer for top-level design. Despite having more top-level F2F vias than the F2B-only–block-folding design has TSVs, F2F+F2B V1 provides better top-level design quality and more power reduction than the F2B-only–block-folding design (top-level design quality: F2F+F2B V1 > F2B-only–block-folding design > F2F+F2B V2). Comparing the top-level design quality, V1 achieves -17.3% cell count, -20.4% wirelength, and -29.4% total top-level power reduction relative to the F2B-only–block-folding design. Better top-level design quality also leads to more power reduction in blocks, because the blocks need fewer resources to optimize their boundaries. Therefore, the impact of better top-level design on overall design quality cannot be ignored. However, in other designs where using F2F bonding for block-folding also yields good top-level routing, the V2 style could lead to more power reduction than V1.

## *C. Overall Comparison*

Table V compares all designs presented in this paper according to whether the block-folding technique is applied and which bonding style is used. GDSII layouts of our designs are illustrated in Figure 7, and designs that are not shown in the figure (such as non-folding–F2F+F2B) are similar to those shown in Figure 7. First, we emphasize that we achieve a maximum of -36% power reduction in the block-folded–F2F+F2B design. This is 14.8% more reduction than what was reported in [3], and the largest power reduction among the previous studies. Second, block-folding provides more power reduction than non-folding. In terms of bonding style, F2F+F2B reduces more power than the F2B-only style. However, we note that obtaining more power reduction from these design techniques requires more careful floorplanning and design.

TABLE V COMPARISON AMONG 3-TIER 3D IC DESIGNS BUILT WITH VARIOUS OPTIONS IN FOLDING AND BONDING STYLES. ALL FOLDED DESIGNS TARGET 4 BLOCKS (LSU, IFU, TLU, AND FGU) TO BE FOLDED.

|                  | 2D [3]         | Non-folding, F2B-only | Non-folding, F2F+F2B | Block-folding, F2B-only | Block-folding, F2F+F2B |
|------------------|----------------|-----------------------|----------------------|-------------------------|------------------------|
| Clock period     | 1.5ns          | 1.5ns                 | 1.5ns                | 1.5ns                   | 1.5ns                  |
| Footprint (mm²)  | 3.08           | 1.00 (-67.5%)         | 1.00 (-67.5%)        | 1.00 (-67.5%)           | 1.00 (-67.5%)          |
| Si. Area (mm²)   | 3.08           | 3.00 (-2.6%)          | 3.00 (-2.6%)         | 3.00 (-2.6%)            | 3.00 (-2.6%)           |
| Wirelength (m)   | 22.4           | 14.3 (-36.2%)         | 13.8 (-38.4%)        | 13.4 (-40.2%)           | 13.0 (-42.0%)          |
| # Cells          | 523.4K         | 403.9K (-22.8%)       | 394.3K (-24.7%)      | 370.9K (-29.1%)         | 368.8K (-29.5%)        |
| # Buffers        | 221.7K         | 130.7K (-41.0%)       | 124.9K (-43.7%)      | 117.8K (-46.9%)         | 114.2K (-48.5%)        |
| HVT cells        | 370.6K [70.7%] | 377.4K [93.4%]        | 372.0K [94.3%]       | 348.6K [93.9%]          | 346.0K [93.8%]         |
| # TSV            |                | 4,118                 | 4,118                | 8,688                   | 9,231                  |
| Total power (mW) | 348.3          | 248.1 (-28.8%)        | 242.6 (-30.3%)       | 229.7 (-34.0%)          | 223.1 (-36.0%)         |
| Cell power (mW)  | 71.6           | 62.6 (-12.6%)         | 62.1 (-13.3%)        | 54.1 (-24.4%)           | 54.0 (-24.6%)          |
| Net power (mW)   | 175.7          | 117.3 (-33.2%)        | 113.3 (-35.5%)       | 107.7 (-38.7%)          | 102.0 (-41.9%)         |
| Leak. power (mW) | 101.1          | 68.2 (-32.5%)         | 67.1 (-33.6%)        | 67.9 (-32.8%)           | 67.2 (-33.5%)          |

Fig. 8. F2F bonding choice for more power reduction in F2F+F2B. (a) V1: F2F bonding for top-level connection, (b) V2: F2F bonding for block-folding (folded blocks in orange font).

TABLE VI F2F VS. F2B POWER COMPARISON IN FOLDED BLOCKS.

|       | F2B            | F2F             | Power reduction |
|-------|----------------|-----------------|-----------------|
| FGU   | 28.1mW (-5.7%) | 26.7mW (-10.4%) | -1.4mW          |
| TLU   | 23.2mW (-2.9%) | 22.8mW (-4.6%)  | -0.4mW          |
| LSU   | 76.3mW (-7.3%) | 74.1mW (-10.0%) | -2.2mW          |
| IFU   | 57.2mW (-1.4%) | 55.9mW (-3.6%)  | -1.3mW          |
| Total |                |                 | -5.3mW          |
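The standalone block numbers in Table VI can be tallied to see why F2F bonding inside folded blocks contributes little. The Python sketch below sums the per-block savings and expresses the total as a share of chip power. It assumes the quoted -1.5% is taken against the 2D total power of 348.3mW from Table V, which reproduces the value in the text. The names `FOLDED_BLOCK_POWER_MW` and `f2f_savings_mw` are illustrative.

```python
# Standalone block power in mW, transcribed from Table VI as (F2B, F2F).
FOLDED_BLOCK_POWER_MW = {
    "FGU": (28.1, 26.7),
    "TLU": (23.2, 22.8),
    "LSU": (76.3, 74.1),
    "IFU": (57.2, 55.9),
}

# Assumption: the share is measured against the 2D total power (Table V).
TOTAL_2D_POWER_MW = 348.3


def f2f_savings_mw():
    """Power saved per block when F2F replaces F2B inside the fold."""
    return {name: round(f2b - f2f, 1)
            for name, (f2b, f2f) in FOLDED_BLOCK_POWER_MW.items()}


if __name__ == "__main__":
    savings = f2f_savings_mw()
    total = round(sum(savings.values()), 1)
    share = round(100.0 * total / TOTAL_2D_POWER_MW, 1)
    print("per-block savings (mW):", savings)
    print("total saving (mW):", total)
    print("share of 2D total power (%):", share)
```

A saving of this size is smaller than the gap between V1 and V2 at the top level, which is consistent with the conclusion that the single F2F layer is better spent on top-level connections in this design.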

# VI. CONCLUSIONS

In this paper, we demonstrated the power reduction benefits that 3-tier 3D IC design provides in an OpenSPARC T2 Core. First, we showed that one additional tier in 3-tier 3D ICs offers more power savings than 2-tier 3D ICs. Second, 3 tiers can be bonded in mixed styles, and these mixed styles provide additional power reduction. However, more careful floorplanning, TSV management, and block-folding considerations are required. Lastly, to demonstrate the maximum power reduction of 3-tier 3D ICs, we developed CAD tools that seamlessly integrate into commercial 2D tools for design and optimization. With the aforementioned methods and design techniques combined, we achieved -14.8% more reduction than the 2-tier 3D IC, and -36.0% total power saving against the 2D counterpart. Our future work will investigate 3-tier power saving in full-chip microprocessors, thermal issues in various 3-tier design styles, and circuit techniques to reduce more power.

# REFERENCES

- [1] K. Ahmed and K. Schuegraf, "Transistor wars," *Spectrum, IEEE*, vol. 48, no. 11, pp. 50–66, November 2011.
- [2] B. Black *et al.*, "Die Stacking (3D) Microarchitecture," in *Microarchitecture, 2006. MICRO-39. 39th Annual IEEE/ACM International Symposium on*, Dec 2006, pp. 469–479.
- [3] M. Jung *et al.*, "How to Reduce Power in 3D IC Designs: A Case Study with OpenSPARC T2 Core," in *Custom Integrated Circuits Conference (CICC), 2013 IEEE*, Sept 2013, pp. 1–4.
- [4] Y.-J. Lee and S. K. Lim, "On GPU bus power reduction with 3D IC technologies," in *Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014*, March 2014, pp. 1–6.
- [5] U. Kang *et al.*, "8Gb 3D DDR3 DRAM using through-silicon-via technology," in *Solid-State Circuits Conference - Digest of Technical Papers, 2009. ISSCC 2009. IEEE International*, Feb 2009, pp. 130–131, 131a.
- [6] M. Jung *et al.*, "On enhancing power benefits in 3D ICs: Block folding and bonding styles perspective," in *Design Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE*, June 2014, pp. 1–6.
- [7] Oracle, "OpenSPARC T2." [Online]. Available: http://www.oracle.com/.
- [8] Synopsys, 32/28nm Generic Library. [Online]. Available: http://www.synopsys.com/.
- [9] G. Katti *et al.*, "Electrical Modeling and Characterization of Through Silicon via for Three-Dimensional ICs," *Electron Devices, IEEE Transactions on*, vol. 57, no. 1, pp. 256–262, Jan 2010.
- [10] D. Kim, K. Athikulwongse, and S. Lim, "A study of Through-Silicon-Via Impact on the 3D Stacked IC Layout," in *Proc. IEEE Int. Conf. on Computer-Aided Design*, Nov 2009, pp. 674–680.
- [11] U. Nawathe *et al.*, "An 8-core 64-thread 64b power-efficient SPARC SoC," in *Solid-State Circuits Conference, 2007. ISSCC 2007. Digest of Technical Papers. IEEE International*, Feb 2007, pp. 108–590.