# A Design Methodology for Back-side Power and Clock Routing Co-Optimization

Pruek Vanna-iampikul, Hang Yang, Jungyoun Kwak, Joyce X Hu, Amaan Rahman, Nesara Eranna Bethur, Cong Hao, Shimeng Yu, Sung Kyu Lim

School of Electrical and Computer Engineering, Georgia Institute of Technology, USA, Email: v.pruek@gatech.edu

## Abstract

This paper presents a backside (BS) design methodology for optimizing both the power delivery network (PDN) and clock routing in a 3nm technology. A unit converter (UC) is integrated on the backside with the BS-PDN to minimize dynamic IR-drop. Additionally, our new buffer cell with backside contacts enables backside clock routing. Experimental results show that BS-PDN reduces IR-drop by 57.7% compared with FS-PDN, and that the UC further reduces IR-drop by 10.3% and package IR-drop by 83.9%. Our backside clock routing reduces clock power by 32% and improves the full-chip power-delay product by 13.6%.

## Introduction

Backside metallization techniques, such as buried power rail (BPR) and backside metal (BSM) layers, have been rapidly adopted in advanced technologies to mitigate IR-drop. These wide backside metals have lower parasitic resistance, which makes them well suited to delivering power to the cells with minimal IR-drop. Researchers have also begun to ask whether clock delivery would benefit from being routed on the backside as well [1]. The lower resistance is expected to improve clock latency and skew, which in turn reduces full-chip critical path delay. This paper quantifies both benefits on a RISC-V OpenPiton design implemented and simulated in a 3nm technology [2].

## **Back-side Power Delivery Network**

Our BS-PDN structure is illustrated in Fig. 1. The PDN uses almost 100% of the BSM resources, which decouples power routing from signal routing on the front side.

**A. Back-side DC-DC Converter:** The on-chip DC-DC unit converter (UC) provides high-efficiency conversion and block-level voltage regulation [3]. Packaging parasitics cause unwanted IR-drop and bounce, which affect both frontside (FS) and BS-PDN. On-chip UCs can mitigate the voltage drop from the package and bonding; however, their large size makes them impractical for FS integration. In contrast, the backside offers sufficient space, enabling dense UC integration without causing routing congestion.

**B. Integration of BS-UC:** Our 4:1 backside UC (BS-UC) converts 3.3V down to an on-chip supply voltage of 0.7V. To separate the two voltage domains, two additional backside metal layers, MB3 and MB4, are added (see Table I). MB3 is dedicated to BS-UC routing; MB4 supplies the 3.3V VDD and 0V VSS inputs to the BS-UC. Fig. 2 shows our BS-UC stack-up. Our voltage domain decoupling ensures that there is no connectivity between the MB4 and MB2 layers, preserving the BS-PDN configuration. For BS-UC placement, we apply an interleaving strategy for compactness. The BS-UC PDN metal layer breakdown and BS-UC placement are shown in Fig. 3.
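One way to see why delivering power at 3.3V can help with package IR-drop is that, for a fixed load power, the current drawn through the package scales inversely with the delivery voltage. The sketch below illustrates that arithmetic with a lumped package resistance. The load power, package resistance, and converter efficiency are hypothetical placeholders, not values from this work, and the resulting percentage is not meant to reproduce the reported 83.9%.

```python
def package_ir_drop(load_power_w, supply_v, package_res_ohm, efficiency=1.0):
    """IR-drop across a lumped package resistance for a given delivery voltage."""
    # Power drawn through the package includes converter losses, if any.
    input_power_w = load_power_w / efficiency
    current_a = input_power_w / supply_v
    return current_a * package_res_ohm


# All three constants below are hypothetical, for illustration only.
LOAD_POWER_W = 0.080      # assumed on-chip load power
PACKAGE_RES_OHM = 0.05    # assumed lumped package + bump resistance
UC_EFFICIENCY = 0.90      # assumed unit-converter efficiency

# Case 1: the package delivers the 0.7V on-chip supply directly.
drop_direct_v = package_ir_drop(LOAD_POWER_W, 0.7, PACKAGE_RES_OHM)

# Case 2: the package delivers 3.3V and an on-chip UC steps it down to 0.7V.
drop_with_uc_v = package_ir_drop(LOAD_POWER_W, 3.3, PACKAGE_RES_OHM, UC_EFFICIENCY)

reduction = 1.0 - drop_with_uc_v / drop_direct_v
print(f"direct: {drop_direct_v * 1e3:.2f} mV, with UC: {drop_with_uc_v * 1e3:.2f} mV")
print(f"package IR-drop reduction: {reduction:.1%}")
```

Under these assumptions the reduction depends only on the voltage ratio and the assumed efficiency, not on the chosen load power or package resistance.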

**C. Benefits of BS-UC:** The BS-UC reduces both the worst-instance dynamic IR-drop and the layer-wise minimal voltage drop (see Fig. 4). In addition, the decoupling strategy enables higher C4/micro-bump density without incurring significant power pad area overhead.

## **Back-side Power and Clock Delivery Network**

After the backside PDN and UC have been integrated into the backside metal layers to achieve an acceptable IR-drop, we use the remaining area in MB1-MB2 for back-side clock routing. Unlike power TSVs, clock TSVs connect the frontside layer to the backside layer. Their specifications are listed in Table I.

**A. Backside Buffers:** Because the BPR layer sits between the backside and frontside layers, TSVs are required to transition from front to back for backside routing, and from back to front to connect to the clock pins of the flip-flops (FFs) located on the front side. Our strategy is to integrate the TSVs directly into the clock buffer cells, which we call backside buffers (BS\_BUF), instead of searching for empty space in the thin silicon substrate. We created two types of backside buffers, BS\_OUT and BS\_IN, where the suffix indicates whether the net leaves or enters the frontside layer (see Fig. 5).

**B. Backside Clock Routing:** Given an initial clock tree, the trunk nets of a selected group of FFs are serviced on the backside to optimize clock metrics such as delay and skew. We manually select a subset of violating FFs (negative slack) to route with back-side wires to improve clock arrival time. In addition, we use BS\_OUT and BS\_IN buffers to transition between the two sides of the substrate. Fig. 6 shows a cross-section view of our clock tree partitioning and metal layer usage strategy.
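The selection step above is performed manually in this work. As a minimal stand-in, assuming per-FF slack values are available from a timing report, the filter could look like the sketch below. The `FlipFlop` record, the function name, and the slack values are hypothetical and serve only to illustrate the negative-slack criterion.

```python
from dataclasses import dataclass


@dataclass
class FlipFlop:
    name: str
    slack_ps: float  # slack at the FF endpoint; negative means a timing violation


def select_backside_candidates(flops, slack_threshold_ps=0.0):
    """Return FFs whose slack is below the threshold, worst violation first."""
    violating = [ff for ff in flops if ff.slack_ps < slack_threshold_ps]
    return sorted(violating, key=lambda ff: ff.slack_ps)


# Hypothetical endpoints; real values would come from static timing analysis.
flops = [
    FlipFlop("ff_a", 12.0),
    FlipFlop("ff_b", -20.0),
    FlipFlop("ff_c", -3.5),
    FlipFlop("ff_d", 0.0),
]

candidates = select_backside_candidates(flops)
names = [ff.name for ff in candidates]
print(names)  # ['ff_b', 'ff_c']
```

The trunk nets driving the selected FFs would then be the ones assigned to backside wires, with BS\_OUT and BS\_IN buffers inserted at the transitions.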

**C. Benefits of Back-side Clock Routing:** Table III shows that our back-side clock routing (BS-CDN) outperforms its front-side counterpart (FS-CDN) on both clock and full-chip metrics. Both FS-CDN and BS-CDN have their PDN routed on the backside; however, only the BS-CDN design includes BS-UCs, while FS-CDN uses no unit converters. BS-CDN uses fewer cells at comparable wirelength. The power saving comes from the reduction in clock buffers, thanks to the lower parasitics of the backside metal layers. The performance improvement in BS-CDN is due to better clock latency at critical FF endpoints. As a result, BS-CDN achieves a 13.6% better power-delay product (PDP) than FS-CDN.
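The Δ% figures quoted from Table III can be reproduced as relative reductions against the FS-CDN baseline. The short sketch below does this for three rows, using the values printed in the table. The helper name is ours, and reading the PDP as total power divided by effective frequency is an inference from the BS-CDN row, not a definition stated in the text.

```python
def reduction_pct(fs_value, bs_value):
    """Relative reduction of BS-CDN versus the FS-CDN baseline, in percent."""
    return (fs_value - bs_value) / fs_value * 100.0


# Values copied from Table III (FS-CDN, BS-CDN).
clock_power_pct = reduction_pct(12.50, 8.48)   # clock power in mW
total_power_pct = reduction_pct(85.3, 75.91)   # total power in mW
pdp_pct = reduction_pct(79.86, 69.01)          # power-delay product

# Assumption: PDP = total power / effective frequency (mW / GHz).
bs_pdp_check = 75.91 / 1.10

print(f"clock power: {clock_power_pct:.2f}%")  # 32.16%
print(f"total power: {total_power_pct:.2f}%")  # 11.01%
print(f"PDP:         {pdp_pct:.2f}%")          # 13.59%
print(f"BS-CDN PDP from power/frequency: {bs_pdp_check:.2f}")  # 69.01
```

The same check on the FS-CDN row with the rounded 1.07 GHz gives a slightly lower PDP than the tabulated 79.86, presumably because the table was computed from an unrounded frequency.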

## **Conclusion and Acknowledgements**

We showed that backside metals and backside unit converters improve IR-drop in an advanced node. Moreover, our backside clock routing improves clock-related and full-chip performance and power metrics. Our ongoing work investigates the benefits and challenges of backside metals for global signal routing.

This research is funded by CHIMES JUMP 2.0 Center of Semiconductor Research Corporation (SRC) and DARPA, and Interuniversity Microelectronics Centre (IMEC).

## References

[1] A. Veloso, et al., "Insights into Scaled Logic Devices Connected from Both Wafer Sides," IEDM, 2022.

[2] S. Shaji, et al., "A Comparative Study on Front-Side, Buried and Back-Side Power Rail Topologies in 3nm Technology Node," ISLPED, 2023.

[3] J. Kwak, et al., "A Reconfigurable Monolithic 3D Switched-Capacitor DC-DC Converter with Back-End-of-Line Oxide Channel Transistor," MWSCAS, 2023.



Fig. 1 Our front and back-side metal structures.







Fig. 3 Our metal layer usage. (a) MB4 with UCs, VDD, and VSS wires interleaving. (b) MB2-MB1 used for VDD and VSS wires. (c) MBPR layer used for buried power rails and nano-TSVs. (d) M1-M6 front-side metals for clock & signal routing.

#### TABLE I

Technology specifications of the wires and vias used in this work. Note that there are two types of TSVs: Power-TSV to connect MB1 to MBPR, and Clock-TSV to connect MB1 to M1. The clock TSV specification is based on [1].

| Details | Metal / Via          | Width (µm) | Pitch (µm) | Resistance |
|---------|----------------------|------------|------------|------------|
| UC      | MB3                  | 34         | 100        | 0.19 Ω/m   |
| FS-PDN  | M6-M5                | 0.5        | 1.344      | 0.96 Ω/m   |
|         | MBPR                 | 0.24       | 0.025      | 6.77 Ω/m   |
| BS-PDN  | Power-TSV (MB1-MBPR) | 0.06       | 0.5        | 5 Ω        |
|         | Clock-TSV (MB1-M1)   | 0.09       | 0.18       | 10 Ω       |
|         | MB1-MB2              | 0.17       | 1          | 0.17 Ω/m   |
|         | MB4 (for UC)         | 0.17       | 1          | 0.17 Ω/m   |



Fig. 4 Dynamic IR-drop comparison among FS-PDN, BS-PDN, and BS-PDN with UC.



Fig. 5 Back-side clock routing and back-side buffers.

#### TABLE III

PPA and clock metrics between FS-CDN and BS-CDN designs. BEOL: 2+6 (back + front). Design: OpenPiton (1.1GHz).

| Metric                    | FS-CDN | BS-CDN | Δ%     |
|---------------------------|--------|--------|--------|
| # Unit Converters         | -      | 16     | -      |
| UC Total Power (mW)       | -      | 2.107  | -      |
| Eff. Freq (GHz)           | 1.07   | 1.10   | 2.89%  |
| # Cell                    | 332K   | 320K   | 3.61%  |
| Wirelength (m)            | 1.26   | 1.27   | -      |
| Total power (mW)          | 85.3   | 75.91  | 11.01% |
| Worst Negative Slack (ps) | 27.1   | 0      | 1X     |
| Power Delay Product       | 79.86  | 69.01  | 13.59% |
| Clock wirelength (mm)     | 46.4   | 46.9   | -1.08% |
| # Clock Buffer            | 1,201  | 1,111  | 7.49%  |
| Clock Power (mW)          | 12.50  | 8.48   | 32.16% |
| Sequential Power (mW)     | 11.40  | 6.70   | 41.23% |

Fig. 6 Metal layer utilization of back-side clock (BS-CDN) design. (a) Backside metal layers. (b) Frontside metal layers.