## A Novel Backside Signal Inter/Intra-Cell Routing Method Beyond Backside Power for Angstrom nodes

J. Lee<sup>1,3</sup>, S. Lee<sup>1</sup>, S. Lee<sup>1</sup>, Y. Ahn<sup>1</sup>, M. Kim<sup>1</sup>, G. Cho<sup>1</sup>, S. C. Song<sup>2</sup>, U. Roh<sup>2</sup>, M. Cai<sup>2</sup>, D. Greenlaw<sup>2</sup>, S. Molloy<sup>2</sup>, S.-K. Lim<sup>3</sup>, and R.-H. Baek<sup>1</sup>

<sup>1</sup>Department of Electrical Engineering, Pohang University of Science and Technology (POSTECH), Pohang, Korea 
<sup>2</sup>Google LLC, Mountain View, CA, USA

<sup>3</sup>School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA

Phone: +82-54-279-2220, E-mail: rh.baek@postech.ac.kr

**Abstract:** For the first time, we investigate the power-performance-area (PPA) benefit of novel backside (BS) signal (BSS) routing using BS gate/source-drain contacts, targeting the Angstrom node. INVx1 with a BS-pin has a smaller Miller capacitance (C<sub>miller</sub>), and BSS inter-cell routing yields a 3.0 ~ 3.3 % higher ring oscillator (RO) frequency at iso-power. Even without a BS-pin, standard cells can relieve frontside (FS) routing congestion and improve the energy-delay product (EDP) by using BSS intra-cell routing. A BSS intra-cell routing based chip has a larger IR drop, but this is mitigated when the μBump pitch is small. The BSS intra-cell routing based chip shows an 8.61 ~ 9.46 % lower power-delay product (PDP).

**Introduction:** Over the past few years, the BS power (BSP) delivery network has been extensively studied because it greatly improves IR drop and FS routing congestion [1]-[2]. Among various BSP methods, connecting power lines through BS source-drain contacts has been reported as the most effective [3]. The BS space has great potential, and placing only power lines there can waste it. Thus, a recent study implemented a BS clock tree using nanoTSV, improving chip power and performance (Fig. 1a) [4]. Furthermore, a BS gate contact (BSGC) that enables BSS without nanoTSV has also been implemented (Fig. 1b) [5]-[6]. A BSGC enables a more sophisticated design than nanoTSV, including BSS intra-cell and inter-cell routing using a BS-pin. However, BSS using a BSGC has only been implemented in a single device, and its detailed utilization methods and benefits have not been studied. In this paper, we analyze the BSS intra/inter-cell routing method using a BSGC and its PPA benefit at both the cell and chip level.

**Device and cell design assumptions:** For the BSP (Fig. 2a) and BSS cells (Fig. 2b), forksheet-FET (FSFET) based 4-track standard cells with an 80 nm cell height are assumed to target the Angstrom node. In the BSS cell, two BSS routing tracks are used, and the power line and BSC are narrower than in the BSP cell. We used a 4-channel FSFET with a 20 nm channel width (Fig. 3a). A wrap-around contact (WAC) improves contact resistance and drive current [7]. Thus, the WAC process should be applied before forming the S/D BSC and BSGC (Fig. 3b). There is no performance difference between BSP and BSS devices because WAC is used for both; thus, we use the same BSIM parameters. Interconnect materials and resistivity are assumed to be the same as in the previous paper [8] (Table I). We used Synopsys tools and modified the previous PDK to create our Angstrom node PDK. Bi-directional metal lines are applied for cell design.

**Standard cell design options:** In the BSS cell, FS and BS pins can be used selectively, so the INVx1 can be designed in four different layouts (Fig. 4a). If the input and output pins are formed on different sides, the pin capacitance (C<sub>pin</sub>) is reduced because C<sub>miller</sub> is reduced (Fig. 4b). A lower C<sub>pin</sub> reduces the interconnect load capacitance (C<sub>load</sub>), which can improve overall chip power and performance. Placing both pins on the BS gives the highest C<sub>pin</sub> because the power line is also on the BS, which introduces a parasitic capacitance (C<sub>para</sub>) between the BS pin and the power line (**Fig. 4c**). A 15-stage RO (fan-out = 3) simulation shows that an RO using only BS routing has a slightly lower frequency than an RO using only FS routing (Fig. 5). In contrast, an RO utilizing routing on both sides shows a 3.0 ~ 3.3 % higher frequency due to the reduced C<sub>miller</sub>. The BS-pin clearly improves the cell, but EDA tools for chip design that can handle BSS inter-cell routing are not yet available. To avoid this chip design issue of the BS-pin, BSS intra-cell routing can be an alternative.

**Backside signal intra-cell routing:** In the BSP cell, both pin metal and non-pin metal are located on the FS (Fig. 6a). In contrast, in the BSS cell, C<sub>miller</sub> can be reduced while keeping the FS-pin by relocating only the non-pin metal to the BS (Fig. 6b); we call this a BSS intra-cell routing cell. It also increases the FS routing space, simplifying place and route. We designed 42 standard cells for BSP and BSS intra-cell routing (Fig. 7a) and assumed only one thin BS metal layer for intra-cell signal routing (Fig. 7b). However, some BSS cells, including INVx1, have no non-pin metal and do not use the BSS routing tracks (Fig. 7c). In this case, the only difference between the BSP and BSS cells is the MB1 power line width. Thus, INVx1 shows almost no change between BSP and BSS (**Table II**). BUFx1 uses BSS routing and shows a smaller C<sub>pin</sub>. However, BUFx1 shows little change in EDP because its simple layout leaves the internal parasitics nearly unchanged. DFFHQNx1 has a more complex layout and many non-pin metal lines. Since relocating all non-pin metals to the BS is impossible, we placed them selectively on the BS. DFFHQNx1 utilizing BSS has significantly lower parasitics, showing a 4.4 % smaller C<sub>pin</sub> and a 4.5 ~ 6.0 % lower EDP across the fast and slow cases.

**IR drop analysis:** Three BS metal layers and μBumps were applied for chip simulation (**Fig. 8a**). The power line of the BSS cell is narrower (**Fig. 8b**) and thinner, and has a higher resistivity (**Table I**), all of which affect the IR drop of the power mesh. We also considered an ideal BSP case (BSP\_ideal) with a large MB1 width. The BSS chip has a 17 ~ 19 % higher IR drop than BSP and a 51 ~ 63 % higher IR drop than BSP\_ideal (**Fig. 9a**). At a typical μBump pitch of 40 μm [9], BSS has an IR drop of more than 100 mV, about 40 mV larger than BSP\_ideal. However, if a μBump pitch below 20 μm is achieved through package innovation, the IR drop is very low even in the BSS chip. Even with a large BS power mesh pitch, which leaves a lot of spare space on the BS, BSS has a very low IR drop at a small μBump pitch (**Fig. 9b**). Thus, a large power mesh pitch can be used, which shows the potential of applying BSS inter-cell routing in the spare BS space.

**Chip PPA analysis:** In all benchmarks with an ideal power mesh, the BSS intra-cell routing based chip shows smaller internal (P<sub>internal</sub>) and switching power (P<sub>switch</sub>) than the BSP based chip, owing to the increased FS routing space and the reduced cell C<sub>pin</sub> and P<sub>int</sub>, resulting in 4.1 ~ 4.7 % lower total power consumption (P<sub>total</sub>) (Table III). The BSS chip also improves the worst negative slack (WNS) and total negative slack (TNS) and shows a 4.7 ~ 5.3 % higher effective frequency (Freq<sub>effective</sub>). Overall, the BSS chip shows an 8.61 ~ 9.46 % PDP improvement across all benchmarks. BSS intra-cell routing is thus a novel scheme that can improve the chip beyond BSP without concerns about power mesh IR drop or BS-pin design.

**Conclusion:** We presented the PPA benefit of BSS inter-cell and intra-cell routing schemes compared to BSP. BSS inter-cell routing improves the RO frequency, but BS-pin chip design remains a concern. BSS intra-cell routing using an FS-pin achieves lower EDP, especially for complex cells. At an ideal μBump pitch, the BSS chip shows a low IR drop, and the BSS intra-cell routing based chip shows an 8.61 ~ 9.46 % PDP reduction. BSS intra-cell routing is therefore a promising scheme beyond BSP that improves the chip without the BS-pin design issue.

**Acknowledgment:** This work was supported by the Ministry of Trade, Industry & Energy (MOTIE, Korea) (20019450, RS-2023-00234828) and the National Research Foundation of Korea (NRF-2022R1C1C1004925).

**References:** [1] D. Prasad *et al.*, *IEDM*, 2019, pp. 446–449. [2] J. Lee *et al.*, *VLSI*, 2023, pp. 1–2. [3] S. Yang *et al.*, *VLSI*, 2023, pp. 1–2. [4] P. Vannaiampikul *et al.*, *VLSI*, 2024, pp. 1–2. [5] J. Park *et al.*, *VLSI*, 2024, pp. 1–2. [6] M. Kobrinsky *et al.*, *IEDM*, 2023, pp. 1–4. [7] W.-L. Sung *et al.*, *IEEE TED*, vol. 68, no. 6, pp. 3124–3128, 2021. [8] V. Vashishtha *et al.*, *Microelectronics Journal*, 2022, pp. 1–2. [9] S. B. Samavedam *et al.*, *IEDM*, 2020, pp. 1–10.



Fig. 1. Two different methods using BS region. (a) BS clock tree using nanoTSV, (b) BSS using BSGC

Fig. 3 inset (NFET, TCAD key parameters; column labels not recoverable from the figure): Case1: 77.4, 65.79, 56.55, 1.176; Case2: 122.34, 86.78, 56.75, 1.410.



Fig. 3. (a) Two different BSC cases. BSC without WAC shows poor performance. (b) WAC and BSC process flow and key parameters of TCAD simulation. We assumed WAC for both BSP and BSS



Fig. 4. (a) Four possible INVx1 designs considering FS and BS-pin. (b) Using both side (FS/BS) pins reduces Cpin. (c) Two INVx1 layouts have different pin configurations.



Fig. 2. Metal and layout configuration of (a) FSFET & BSP and (b) FSFET & BSS. For both cases, 4 metal tracks and FSFET with 80 nm cell height are assumed.

TABLE I. Assumption of MOL & BEOL metal layers and dielectric.

※ BSP: MB1 = 40 nm / BSP\_ideal: MB1 = 70 nm

※ Aspect ratio of M1~M9 = 2, M0 = 3, MB1~MB3 = 1 (MB1, BSS = 2)

| Dielectric         | Permittivity |
|--------------------|--------------|
| Low-k              | 2.7          |
| Dielectric barrier | 4.8          |

| Layer           | Liner | Material |
|-----------------|-------|----------|
| M0              | 2 nm  | Tungsten |
| M1~M3, MB1      | 1 nm  | Cobalt   |
| M4~M9, MB2~MB3  | 2 nm  | Copper   |

| FS layer | Width (nm), BSP and BSS | ρ (Ω·μm), BSP and BSS |
|----------|-------------------------|-----------------------|
| M0       | 10                      | 0.356                 |
| M1~M3    | 10                      | 0.189                 |
| M4~M5    | 15                      | 0.073                 |
| M6~M7    |                         | 0.057                 |
| M8~M9    | 30                      | 0.043                 |

| BS layer | Width (nm), BSP | Width (nm), BSS | ρ (Ω·μm), BSP | ρ (Ω·μm), BSS |
|----------|-----------------|-----------------|---------------|---------------|
| MB1      | 40/70           | 10/30           | 0.103/0.084   | 0.189/0.127   |
| MB2      | 90/140          | 70              | 0.029/0.025   | 0.032         |
| MB3      | 140             | 140             | 0.025         | 0.025         |



Fig. 5. 15-stage RO (fan-out = 3) simulation results for different routing schemes. RO\_ver3 effectively reduces C<sub>miller</sub> and shows a 3.0 ~ 3.3 % frequency improvement at iso-power.



Fig. 6. BUFx1 layout with (a) BSP and (b) BSS intra-cell routing. BSS intra-cell routing can reduce the C<sub>miller</sub> between the pin and non-pin metal. Also, the BSS cell secures additional M1 routing space.

TABLE II. Comparison of BSP and BSS intra-cell routing based standard cells. BSS utilized cell shows smaller Cpin. Complex standard cell utilizing BSS shows a significantly improved EDP.

※ EDP unit: 10<sup>-39</sup> J⋅s

|                                                  | INVx1 BSP | INVx1 BSS | BUFx1 BSP | BUFx1 BSS | DFFHQNx1 BSP | DFFHQNx1 BSS |
|--------------------------------------------------|-----------|-----------|-----------|-----------|--------------|--------------|
| Area                                             | 0.007     | 0.007     | 0.013     | 0.013     | 0.054        | 0.054        |
| C<sub>pin</sub> (fF)                             | 0.189     | 0.189     | 0.187     | 0.183     | 0.195 (clk)  | 0.186 (clk)  |
| Fast case: Input slew: 10 ps, Output cap: 1.44 fF |          |           |           |           |              |              |
| Delay (ps)                                       | 7.96      | 7.96      | 8.70      | 8.70      | 15.92        | 15.70        |
| t<sub>tran</sub> (ps)                            | 10.76     | 10.76     | 10.13     | 10.11     | 11.15        | 11.13        |
| P<sub>int</sub> (fJ)                             | 0.01258   | 0.01255   | 0.04636   | 0.04628   | 0.2247       | 0.2172       |
| EDP                                              | 797       | 795       | 3508      | 3502      | 5694         | 5353         |
| Slow case: Input slew: 40 ps, Output cap: 5.76 fF |          |           |           |           |              |              |
| Delay (ps)                                       | 30.71     | 30.71     | 25.48     | 25.49     | 34.86        | 34.58        |
| t<sub>tran</sub> (ps)                            | 42.11     | 42.11     | 38.79     | 38.76     | 39.17        | 39.15        |
| P<sub>int</sub> (fJ)                             | 0.01749   | 0.01746   | 0.07154   | 0.07141   | 0.2504       | 0.2429       |
| EDP                                              | 16494     | 16466     | 46446     | 46398     | 304290       | 290454       |
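The cell-level improvements quoted for DFFHQNx1 can be cross-checked directly against Table II. The following is a minimal sketch, not the authors' flow: it only computes relative reductions from the rounded table entries, and the helper name `pct_reduction` is introduced here for illustration. Because the table values are rounded, small deviations from the percentages quoted in the text are to be expected.

```python
# Relative BSS-vs-BSP reductions recomputed from the rounded Table II entries.
# pct_reduction is a hypothetical helper, not part of any tool used in the paper.

def pct_reduction(bsp: float, bss: float) -> float:
    """Reduction of the BSS value relative to the BSP value, in percent."""
    return (bsp - bss) / bsp * 100.0


# (BSP, BSS) pairs transcribed from the DFFHQNx1 and BUFx1 columns of Table II.
DFFHQNX1 = {
    "C_pin (clk, fF)": (0.195, 0.186),
    "EDP, fast case": (5694, 5353),
    "EDP, slow case": (304290, 290454),
}
BUFX1 = {
    "C_pin (fF)": (0.187, 0.183),
    "EDP, fast case": (3508, 3502),
    "EDP, slow case": (46446, 46398),
}

for cell_name, metrics in (("DFFHQNx1", DFFHQNX1), ("BUFx1", BUFX1)):
    for metric, (bsp, bss) in metrics.items():
        print(f"{cell_name:9s} {metric:16s} {pct_reduction(bsp, bss):5.2f} % lower with BSS")
```

The BUFx1 rows illustrate the point made in the text: its C<sub>pin</sub> drops, but its EDP changes by well under one percent, whereas the DFFHQNx1 EDP drops by several percent.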



Fig. 9. (a) IR drop comparison of power meshes for different μBump pitches. (b) BSS has a higher IR drop, but even in the large power mesh pitch case, BSS shows a low IR drop at a small μBump pitch.



Fig. 7. (a) Distinction of BSS intra-cell routing utilized and unutilized standard cells. (b) BSS cell has a small and thin MB1 layer, and only MB1 has a BSS line. (c) Example of BSS-unutilized cell layout.



Fig. 8. (a) Cross-section that illustrates our chip configuration. MB1~3, μBump are considered. (b) Top view of MB1, MB2 power mesh for BSP, BSP\_ideal, and BSS chips.

TABLE III. Power and performance metrics between BSP and BSS intra-cell routing based chip for various benchmarks. BSS chip improves P<sub>Total</sub> & Freq<sub>effective</sub>.

※ Assuming ideal power mesh (low IR drop)

| Chip                             | AES BSP | AES BSS | JPEG BSP | JPEG BSS | FFT BSP | FFT BSS |
|----------------------------------|---------|---------|----------|----------|---------|---------|
| Freq<sub>target</sub> (GHz)      | 6.67    |         | 2.78     |          | 4       |         |
| P<sub>internal</sub> (mW)        | 3.50    | 3.34    | 64.0     | 61.4     | 385     | 370     |
| P<sub>switching</sub> (mW)       | 2.69    | 2.56    | 17.6     | 16.9     | 99.2    | 94.6    |
| P<sub>leakage</sub> (mW)         | 0.023   | 0.023   | 1.06     | 1.02     | 2.17    | 2.13    |
| P<sub>Total</sub> (mW)           | 6.21    | 5.92    | 82.7     | 79.3     | 486     | 466     |
| ΔP<sub>Total</sub> (%)           | -4.7 %  |         | -4.1 %   |          | -4.1 %  |         |
| WNS (ps)                         | -16.33  | -8.44   | -65.74   | -47.18   | -45.73  | -31.19  |
| WNS (% period)                   | -10.88  | -5.62   | -18.26   | -13.11   | -18.29  | -12.48  |
| TNS (ps)                         | -1080   | -340    | -9226    | -8389    | -2914   | -1627   |
| Freq<sub>effective</sub> (GHz)   | 6.01    | 6.31    | 2.35     | 2.46     | 3.38    | 3.56    |
| ΔFreq<sub>effective</sub> (%)    | +4.99 % |         | +4.7 %   |          | +5.3 %  |         |
| ΔPDP (%)                         | -9.46 % |         | -8.61 %  |          | -9.18 % |         |
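The chip-level metrics in Table III can be related to each other with a short calculation. The sketch below rests on two assumptions that the paper does not spell out: that the effective frequency is the reciprocal of the target period extended by the magnitude of the WNS, and that PDP is total power divided by effective frequency. The names `Chip`, `effective_freq_ghz`, and `summarize` are introduced here for illustration only. Under these assumptions the tabulated effective frequencies are reproduced to two decimals, while the PDP change lands within a few tenths of a percentage point of the reported values, presumably because the table entries are rounded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    p_total_mw: float  # total power from Table III (mW)
    wns_ps: float      # worst negative slack from Table III (ps, negative)


def effective_freq_ghz(f_target_ghz: float, wns_ps: float) -> float:
    """Assumed definition: f_eff = 1 / (T_target - WNS), with WNS <= 0."""
    period_ps = 1000.0 / f_target_ghz
    return 1000.0 / (period_ps - wns_ps)


def pct_change(ref: float, new: float) -> float:
    """Change of `new` relative to `ref`, in percent."""
    return (new - ref) / ref * 100.0


# benchmark -> (target frequency in GHz, BSP chip, BSS chip), from Table III
TABLE_III = {
    "AES": (6.67, Chip(6.21, -16.33), Chip(5.92, -8.44)),
    "JPEG": (2.78, Chip(82.7, -65.74), Chip(79.3, -47.18)),
    "FFT": (4.0, Chip(486.0, -45.73), Chip(466.0, -31.19)),
}


def summarize(benchmark: str) -> dict:
    f_target, bsp, bss = TABLE_III[benchmark]
    f_bsp = effective_freq_ghz(f_target, bsp.wns_ps)
    f_bss = effective_freq_ghz(f_target, bss.wns_ps)
    # Assumed PDP definition: power x delay = P_total / f_eff (mW/GHz = pJ).
    pdp_bsp = bsp.p_total_mw / f_bsp
    pdp_bss = bss.p_total_mw / f_bss
    return {
        "f_bsp_ghz": f_bsp,
        "f_bss_ghz": f_bss,
        "d_power_pct": pct_change(bsp.p_total_mw, bss.p_total_mw),
        "d_freq_pct": pct_change(f_bsp, f_bss),
        "d_pdp_pct": pct_change(pdp_bsp, pdp_bss),
    }


for name in TABLE_III:
    s = summarize(name)
    print(
        f"{name:4s} f_eff {s['f_bsp_ghz']:.2f} -> {s['f_bss_ghz']:.2f} GHz, "
        f"dP {s['d_power_pct']:+.1f} %, df {s['d_freq_pct']:+.1f} %, "
        f"dPDP {s['d_pdp_pct']:+.1f} %"
    )
```

Writing PDP as P<sub>Total</sub> / Freq<sub>effective</sub> makes it clear why the PDP gain is roughly the sum of the power reduction and the frequency gain.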