# A Comparative Study on Front-Side, Buried and Back-Side Power Rail Topologies in 3nm Technology Node

Sandra Maria Shaji, Lingjun Zhu, Junsik Yoon, Sung Kyu Lim School of ECE, Georgia Institute of Technology, Atlanta, GA sshaji3@gatech.edu, limsk@ece.gatech.edu

Abstract—Standard cells are becoming increasingly smaller due to aggressive device down-scaling, and power rails take up a sizable portion of the available space. Buried Power Rail (BPR) and Back-Side Power (BSP) have been gaining attention owing to their capacity to reduce the standard cell height from 6-Track in the traditional Front-Side Power Rail (FS-PR) to 5-Track and 4-Track, respectively. In this paper, we provide a comprehensive comparison of power rail topologies at the device, standard cell, and full chip design level in terms of Power, Performance and Area (PPA). Our experiments show that nanosheet width scaling for BPR and BSP reduces device gate capacitance by 26% and 40%, respectively, resulting in an internal power improvement of over 33% and 40%, respectively, at the standard cell level, and a total power drop of over 24% and 30%, respectively, at the full chip level. Additionally, the floorplan can be shrunk by 7% with BPR compared to FS-PR, and by an additional 17% with BSP. This study also demonstrates the IR-drop benefits of the Back-Side Power Delivery Network (BS-PDN) for the BPR and BSP topologies.

## I. INTRODUCTION

Aggressive scaling over the last decade allowed three-dimensional fin-structured FETs (FinFETs) to replace planar MOSFET structures at the 14 nm technology node and beyond [1]. At sub-3 nm nodes, vertically stacked nanosheet FETs (NSFETs) are now gaining attention due to their capacity to mitigate short-channel effects (SCEs) [2]. However, due to lithography and process limitations, the scaling of device dimensions has slowed down, and new device architectures like Forksheet devices and Complementary FETs (CFETs) are being explored. At the standard cell level, scaling techniques like Contact Over Active Gate (COAG) and Single Diffusion Break (SDB) have been successful in minimizing cell footprints. However, as device size decreases with advanced technology nodes, the standard cell footprint shrinks accordingly, and the power rails occupy a significant portion of it. Therefore, standard cell layouts that reposition the power rail to create more room will be necessary as cells continue to scale down.

A conventional standard cell with a Front-Side Power Rail (FS-PR) in the 3 nm technology node has a cell height of 6-Track, with the power rails taking up two of the tracks [3]. With the Buried Power Rail (BPR) technology, only one track needs to be reserved to draw connections from the power rail, and the cell height can be reduced to 5-Track [3]. The novel Back-Side Contact (BSC) technology introduced in [4] removes even this single-track requirement for power rails, reducing the cell height to 4-Track. Power rails in this technology are moved to a backside metal layer underneath the active region and connected through the BSC.

979-8-3503-1175-4/23/\$31.00 ©2023 IEEE

Numerous works study the scalability and performance of BPR technologies with respect to traditional FS-PR at the standard cell level [3], [5]. However, there is no complete study on how the down-scaled BPR cells compare against the FS-PR cells in terms of power, performance, and area in full chip designs. While the study presented in [4] introduces the potential of the BSC technology to further downscale the cells, no work has been presented on its implementation and its impact on the performance and power of standard cells and full chip designs.

In this paper, we present a comprehensive study comparing different power rail topologies, highlighting their impact on PPA at three levels: device, standard cell, and full chip design. In addition to implementing the FS-PR and BPR topologies, we present, for the first time, a thorough study on the use of BSC to develop a third power rail topology, the Back-Side Power rail (BSP). As part of this study, we develop Process Design Kits (PDKs) and cell libraries for each of the three power rail architectures. We then use them in tandem with an appropriate power delivery network design in full chip design simulation to evaluate the PPA and supply voltage drop of the different power rail technologies.

## II. STANDARD CELL DESIGN

In this section, we design the layout of a single drive strength inverter cell, INVx1, for the three power rail configurations and discuss the design constraints involved. Fig. 1 presents the FS-PR, BPR and BSP layouts for INVx1. The layers used in the layouts are derived from the ASAP7 [6] cell library and are scaled to the 3 nm technology node. The section views of the three power rail topologies are presented in Fig. 2.

In the FS-PR configuration, the power rails are in the M1 metal layer and are $3\times$ the Critical Dimension (CD) wide. The cell height for this configuration is 6-Track, i.e., 144 nm. The nanosheet width for the device is fixed at 32 nm.



Fig. 1. INVx1 cell layout comparison for the (a) FS-PR, (b) BPR and (c) BSP power rail configurations.



Fig. 2. Standard cell section view comparison of the FS-PR, BPR and BSP power rail configurations.

In the BPR configuration, the power rails are moved to the buried metal layer, MBPR, and are connected to the Source/Drain (S/D) epi through $12 \text{ nm} \times 18 \text{ nm}$ VBPR vias. We assume the following design rules: 25 nm wide MBPR metal, 9.5 nm MBPR-to-nanosheet spacing, and a nanosheet tip-to-tip gap of 35 nm [4]. The NS width that complies with these design requirements and fits within the 5-Track cell height is calculated to be 21 nm.

In the BSP topology, the power rails are placed in the backside metal layer, MB1, and connect to the S/D epi directly from the bottom through the BSC. Scaling the nanosheet width in proportion to the cell height could lead to insufficient drive current and degrade performance at the chip level, so we scale the nanosheet only as far as needed to meet the 35 nm nanosheet tip-to-tip spacing rule, which gives a nanosheet width of 13 nm. As only four tracks can be used for interconnects inside a cell, the gate cut for 4-Track cells must be scaled down to one CD to prevent open circuits on the outermost tracks.
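The cell heights and nanosheet widths used in this section can be cross-checked with a short calculation. The sketch below is one reading of the stated design rules, not the authors' derivation. It assumes a uniform 24 nm track pitch (144 nm / 6 tracks), that the 35 nm tip-to-tip rule applies both between the n and p nanosheets inside a cell and across the boundary to the neighboring cell, and that each 25 nm MBPR rail is centered on a cell boundary so that half its width falls inside the cell. All function names are illustrative.

```python
# Design-rule values quoted in Section II (all in nm).
CELL_HEIGHT_6T_NM = 144.0
TIP_TO_TIP_NM = 35.0         # nanosheet tip-to-tip gap
MBPR_WIDTH_NM = 25.0         # buried power rail width
MBPR_TO_NS_SPACE_NM = 9.5    # buried rail to nanosheet spacing

# Assumption: uniform track pitch, so 144 nm / 6 tracks = 24 nm per track.
TRACK_PITCH_NM = CELL_HEIGHT_6T_NM / 6


def cell_height_nm(tracks: int) -> float:
    """Cell height for a given track count under the uniform-pitch assumption."""
    return tracks * TRACK_PITCH_NM


def bpr_ns_width_nm(tracks: int = 5) -> float:
    """NS width for a BPR cell.

    Assumption: each MBPR rail straddles a cell boundary, so only half of its
    width plus the rail-to-nanosheet spacing is consumed at each cell edge.
    """
    edge_nm = MBPR_WIDTH_NM / 2 + MBPR_TO_NS_SPACE_NM
    return (cell_height_nm(tracks) - TIP_TO_TIP_NM - 2 * edge_nm) / 2


def bsp_ns_width_nm(tracks: int = 4) -> float:
    """NS width for a BSP cell.

    Assumption: the 35 nm tip-to-tip rule holds inside the cell and across the
    cell boundary, i.e. 17.5 nm of clearance at each cell edge.
    """
    return (cell_height_nm(tracks) - 2 * TIP_TO_TIP_NM) / 2


for tracks in (6, 5, 4):
    print(f"{tracks}-Track cell height: {cell_height_nm(tracks):.0f} nm")
print(f"BPR NS width estimate: {bpr_ns_width_nm():.1f} nm")
print(f"BSP NS width estimate: {bsp_ns_width_nm():.1f} nm")
```

Under these assumptions the BSP case reproduces the 13 nm width exactly, while the BPR case gives 20.5 nm, close to the 21 nm used in this work. The remaining 0.5 nm is not explained by this simplified model.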

## III. PDK DEVELOPMENT FOR THREE PR TOPOLOGIES

To compare the three power rail configurations, we develop PDKs with NSFET devices of nanosheet widths 32 nm, 21 nm and 13 nm for the FS-PR, BPR and BSP configurations, respectively. We follow the three-stage flow in Fig. 3 for this purpose.



Fig. 3. DTCO flow implemented in this work to develop PDK and cell library for Place and Route in Synopsys EDA tools [5]

#### A. Device Structure Design

To develop the 3 nm PDK, we start with the NSFET device structure design. We simulate three NSFET devices with nanosheet widths of 32 nm, 21 nm and 13 nm, as derived in Section II. The contacted poly pitch (CPP) is scaled to 42 nm from 48 nm in the TSMC 5 nm FinFET device [7]. The gate length and spacer length are scaled to 12 nm and 5 nm, respectively, as shown in Table I. The nanosheets in the device design are 5 nm thick and spaced 10 nm apart. Because the access resistance of the lower nanosheet layers increases with the number of nanosheets [8], we limit the device design to 3 nanosheets. The S/D epi is made rectangular to avoid epi merging in standard cells. We adopt punch-through stopper (PTS) doping to avoid current leakage through the substrate.

TABLE I Comparison of device physical parameters of FinFET [7] and NSFET device.

| Physical parameters (nm) | TSMC 5nm | NSFET 3nm  |
|--------------------------|----------|------------|
| CPP                      | 48       | 42         |
| Fin/NS pitch             | 25       | NA         |
| Gate length              | 15       | 12         |
| spacer length            | 6        | 5          |
| # Fin/NS                 | 2        | 3          |
| Fin/NS thickness         | 6        | 5          |
| Fin/NS width             | NA       | 32, 21, 13 |
| Fin/NS stack height      | 55       | 55         |

#### B. TCAD Simulation and Device Modelling

With the Synopsys sdevice platform, we perform the steps in the process simulation flow shown in Fig. 4. The nanosheet stack fin is formed by alternating Si and SiGe epitaxy and patterned like a wide fin. This is followed by Shallow Trench Isolation (STI), poly gate deposition and patterning, and inner spacer deposition. For S/D epitaxy, we choose Si for the n-type FET and Si<sub>0.5</sub>Ge<sub>0.5</sub> for the p-type FET. The S/D epi and PTS doping concentrations for both n-type and p-type NSFETs are $4 \times 10^{20}$ and $4 \times 10^{18}$ cm$^{-3}$, respectively. The anneal temperature is 1000 °C, and the anneal times for the n-type and p-type FETs are 0.4 s and 0.6 s, respectively. The SiGe sacrificial layers between nanosheets are removed after the S/D epitaxy. The polysilicon dummy gate is replaced with a High-K Metal Gate (HKMG) stack. Gate, source, and drain contacts are established, and the structure is replicated to form the other half.

Fig. 4. Steps involved in TCAD process simulation of NSFET device.

We perform device simulation to obtain the output and transfer characteristics of the device. The device simulations follow the equations and models used in [5]. For reliable estimation of device performance, the NSFET device is fully calibrated to the IMEC N3 prediction model [9], as shown in Fig. 5. Performance deviations with structure and doping profile changes are verified against the sensitivity table in [10]. We simulate the NSFET device to obtain the output characteristics at multiple gate biases, as well as the transfer characteristics at drain biases of 0.05 V and the operating voltage, 0.7 V. We also perform AC simulation in TCAD to generate a gate capacitance ($C_{gg}$) versus gate voltage curve. The compact model card for the device is generated by curve fitting the device simulation outputs to the Berkeley Short-channel IGFET Model (BSIM) Common Multi-Gate (CMG) model with HSPICE.



Fig. 5. TCAD calibration to IMEC N3 prediction model [9].

#### C. Technology and Interconnect Files

The BEOL stack is adopted from ASAP7 [6] and scaled down to the 3 nm node, as shown in Table II. The width and resistance of the buried metal layer, MBPR, and of the via connecting it to the M0 metal, VBPR, are derived from [5]. The MBPR power rails in the BPR cell are connected to the backside metal layers through wide micro-TSVs with a low resistance of 5 $\Omega$. The BSC in BSP cells is assumed to be Ru with WAC TiN and has a resistance of 74.9 $\Omega$ [4] [11]. The backside metals are made wide to provide a low-resistance path for power delivery. The BEOL metal dimensions and rules are included in the technology file (.tf). The interconnect file with the parasitics of the metal and dielectric layers is compiled in Synopsys StarRC to obtain the NXTGRD and TLUPlus files for parasitic extraction of standard cells and full chip designs, respectively.

TABLE II Metal and via dimensions and resistance used in this work.

|                | Metal   | W (nm) | Resistance ($\Omega$) |
|----------------|---------|--------|-----------------------|
| Back-side      | MB2-MB1 | 61     | 34                    |
| BPR cell layer | MBPR    | 25     | 65                    |
| Front-side     | M0      | 20     | 523                   |
|                | M1-M3   | 12     | 347                   |
|                | M4, M5  | 18     | 101                   |
|                | M6, M7  | 24     | 44                    |

|                | Via        | $W \times L$ (nm$^2$) | Resistance ($\Omega$) |
|----------------|------------|-----------------------|-----------------------|
| Back-side      | VB1        | $40 \times 40$        | 4.0                   |
| BPR cell layer | VBPR       | $20 \times 12$        | 74.6                  |
|                | $\mu$-TSV  | $60 \times 60$        | 5                     |
| BSP cell layer | BSC        | $20 \times 20$        | 74.9                  |
| Front-side     | V0-V3      | $12 \times 12$        | 63.5                  |
|                | V4, V5     | $18 \times 18$        | 19.8                  |
|                | V6, V7     | $24 \times 24$        | 10.8                  |

#### D. Standard cell characterization

The "cell library" section of the flow in Fig. 3 defines the steps followed to derive the LIB file, which captures the electrical behavior, and the LEF file, which captures the pin shapes and locations. The layouts for the 57 standard cells listed in Table III are drawn manually for each of the three power rail configurations. The electrical connectivity is verified with Layout Versus Schematic (LVS) verification, and the LEF is exported from the layout abstract. Parasitic extraction of the layouts with Synopsys StarRC generates the RC netlist for each of the 57 standard cells, and cell characterization on this RC netlist generates the LIB file for each of the three cell libraries.

TABLE III Standard cells in the cell library

| Std Cells       | (input)x(#parallel transistors)                    |
|-----------------|----------------------------------------------------|
| INV             | x1, x2, x3, x4, x5, x6, x7, x8, x10, x12, x14, x16 |
| BUF             | x1, x2, x3, x4, x5, x6, x7, x8, x10                |
| NAND, NOR       | 2x1, 3x1, 4x1, 2x2, 3x2                            |
| AND, OR         | 2x1, 3x1, 4x1                                      |
| XOR, XNOR       | 2x1, 3x1, 2x2                                      |
| AOI, OAI        | 21x1, 22x2, 221x1, 222x1, 11x1, 31x1               |
| MUX             | 2x1                                                |
| DFF (flip-flop) | Hx1, HQNx1                                         |
| DHL (latch)     | x1                                                 |

## IV. STANDARD CELL PPA

The three cell libraries have different cell heights and employ devices of different NS widths. The libraries, therefore, differ in cell power and performance. Table IV compares the device metrics in each cell library. Compared to the NSFET in FS-PR, the on-current ($I_{on}$) reduces by 35% and 50% as the effective width ($W_{eff}$) decreases from 222 nm in the FS-PR device to 156 nm and 108 nm in the BPR and BSP devices, respectively. The low mobility of holes along the dominant transport direction at the (100) surface orientation reduces $I_{on}/W_{eff}$ in the p-type NSFET compared to the n-type NSFET. The gate capacitance reduces with nanosheet width: the smaller inversion charge lowers the inversion capacitance, and the lower outer fringing lowers the parasitic capacitance ($C_{par}$).
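The effective widths in Table IV are consistent with counting the full perimeter of each sheet, $W_{eff} = N_{sheets} \times 2 \times (W_{NS} + T_{NS})$, with three 5 nm thick sheets. The sketch below assumes that perimeter formula, which is a common gate-all-around convention but is not stated explicitly in this paper, and uses it to normalize the Table IV on-currents. The function and variable names are illustrative.

```python
def effective_width_nm(ns_width_nm: float,
                       ns_thickness_nm: float = 5.0,
                       n_sheets: int = 3) -> float:
    """Assumed GAA convention: the gate wraps the full perimeter of every sheet."""
    return n_sheets * 2 * (ns_width_nm + ns_thickness_nm)


def ion_per_width_ma_per_um(ion_ma: float, weff_nm: float) -> float:
    """On-current normalized to effective width, in mA/um."""
    return ion_ma / (weff_nm * 1e-3)


# (NS width in nm, PFET Ion in mA, NFET Ion in mA), values taken from Table IV.
DEVICES = {
    "6T FS-PR": (32, 0.14, 0.20),
    "5T BPR": (21, 0.09, 0.13),
    "4T BSP": (13, 0.07, 0.10),
}

for name, (ns_width, ion_p, ion_n) in DEVICES.items():
    weff = effective_width_nm(ns_width)
    p_norm = ion_per_width_ma_per_um(ion_p, weff)
    n_norm = ion_per_width_ma_per_um(ion_n, weff)
    print(f"{name}: Weff = {weff:.0f} nm, "
          f"Ion/Weff PFET = {p_norm:.2f} mA/um, NFET = {n_norm:.2f} mA/um")
```

With the tabulated currents, the normalized PFET on-current comes out lower than the NFET value in each library, in line with the hole mobility argument above.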

TABLE IV Device merit comparison of NSFETs in the three cell libraries: 6-Track FS-PR, 5-Track BPR and 4-Track BSP.

| Metric         | 6T FS-PR PFET | 6T FS-PR NFET | 5T BPR PFET | 5T BPR NFET | 4T BSP PFET | 4T BSP NFET |
|----------------|---------------|---------------|-------------|-------------|-------------|-------------|
| NS width (nm)  | 32            | 32            | 21          | 21          | 13          | 13          |
| $W_{eff}$ (nm) | 222           | 222           | 156         | 156         | 108         | 108         |
| $I_{on}$ (mA)  | 0.14          | 0.2           | 0.09        | 0.13        | 0.07        | 0.10        |
| $C_{par}$ (fF) | 0.11          | 0.13          | 0.08        | 0.09        | 0.07        | 0.08        |
| $C_{gg}$ (fF)  | 0.15          | 0.16          | 0.11        | 0.11        | 0.09        | 0.09        |

At the standard cell level, a larger device $I_{on}$ indicates a larger cell drive current and hence faster cells. However, the higher gate capacitance of wider nanosheets leads to higher pin capacitance, which can slightly slow down the cell. Table V shows the cell delay degrading and the pin capacitance reducing from FS-PR to BPR and BSP. Fig. 6 shows that the cell rise delay is longer than the cell fall delay, because the smaller $I_{on}$ of the p-type NSFET relative to the n-type NSFET causes asymmetry between the cell pull-up and pull-down circuits.

TABLE V Standard cell metric comparison of 6-Track, 5-Track and 4-Track INVx1.

|                                                       | 6T INVx1 | 5T INVx1 | 4T INVx1 |
|-------------------------------------------------------|----------|----------|----------|
| Cell height (nm)                                      | 144      | 120      | 96       |
| Cell width (nm)                                       | 84       | 84       | 84       |
| Cpin (fF)                                             | 0.325    | 0.249    | 0.217    |
| Lkg Power (pW)                                        | 8649     | 275      | 269      |
| Slow case: input slew = 10 ps, output load = 1.44 fF  |          |          |          |
| Cell delay (ps)                                       | 5.103    | 8.166    | 9.434    |
| Transition delay (ps)                                 | 7.16     | 10.92    | 13.33    |
| Int. Power (fJ)                                       | 0.067    | 0.045    | 0.04     |
| Fast case: input slew = 40 ps, output load = 5.76 fF  |          |          |          |
| Cell delay (ps)                                       | 19.5721  | 31.03    | 35.9     |
| Transition delay (ps)                                 | 28.0645  | 42.2684  | 51.94    |
| Int. Power (fJ)                                       | 0.105    | 0.050    | 0.044    |
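The delay trend in Table V can be related to the device currents in Table IV through the textbook first-order metric $t \approx C_{load} V_{DD} / I_{on}$. The sketch below is only a rough proxy and not the library characterization used in this work: it ignores input slew, cell parasitics and the actual switching trajectory. It uses the 1.44 fF load case of Table V and the 0.7 V operating voltage, with the PFET current standing in for the pull-up (rise) and the NFET current for the pull-down (fall). Names are illustrative.

```python
VDD_V = 0.7           # operating voltage used in this work
C_LOAD_F = 1.44e-15   # output load of the 1.44 fF case in Table V

# On-currents from Table IV, converted from mA to A.
ION_A = {
    "6T FS-PR": {"pfet": 0.14e-3, "nfet": 0.20e-3},
    "5T BPR": {"pfet": 0.09e-3, "nfet": 0.13e-3},
    "4T BSP": {"pfet": 0.07e-3, "nfet": 0.10e-3},
}


def cv_over_i_ps(c_load_f: float, vdd_v: float, ion_a: float) -> float:
    """First-order CV/I delay proxy in picoseconds (not a characterized delay)."""
    return c_load_f * vdd_v / ion_a * 1e12


for name, ion in ION_A.items():
    rise_ps = cv_over_i_ps(C_LOAD_F, VDD_V, ion["pfet"])  # pull-up limited
    fall_ps = cv_over_i_ps(C_LOAD_F, VDD_V, ion["nfet"])  # pull-down limited
    print(f"{name}: rise ~ {rise_ps:.1f} ps, fall ~ {fall_ps:.1f} ps")
```

In this proxy the PFET-limited rise estimate exceeds the NFET-limited fall estimate in every library, and both grow from FS-PR to BPR to BSP, which matches the direction of the asymmetry and the delay degradation reported here.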

The cell leakage power improves as we move to BPR and BSP, since the NS width is smaller. In the wider-nanosheet cells, the higher pin capacitance increases dynamic power, while the shorter cell delay shortens the time during which both the pull-up and pull-down circuits are active, lowering short-circuit current and power. The pin capacitance effect, however, dominates, and we see internal power reducing from FS-PR to BPR to BSP cells.

Fig. 6. Cell rise and fall delay comparison of 6T FS-PR, 5T BPR and 4T BSP INVx1 cells.

## V. FULL CHIP DESIGN

#### A. Full chip design and simulation setup

For the design-level PPA analysis, we choose two standard cell-only benchmark designs, the Elliptical Curve Group (ECG) core and the JPEG Encoder core, and a CPU design, the Arm<sup>®</sup> Cortex<sup>®</sup>-A7 64-bit dual-core processor. The physical design implementation steps followed in this work are illustrated in Fig. 7. We perform logic synthesis on the three design RTL netlists using each of the three cell libraries in Synopsys Design Compiler. The floorplan and I/O pin placement for the standard cell-only designs are automatically derived by Synopsys IC Compiler II (ICC2) based on cell density settings optimal for design performance. The CPU design floorplan is derived from the design manual, as shown in Fig. 9(a). In the power planning step, the P/G mesh is designed and routed for the design. ICC2 then performs placement, clock tree synthesis and route optimization on the design.



Fig. 7. Design logic synthesis and P&R flow followed in this work.

#### B. PDN design

The conventional Front-Side Power Delivery Network (FS-PDN) is used in tandem with FS-PR library. To enable a small power source pad pitch, we assume power is delivered to the chip from the off-chip circuit through a Redistribution Layer (RDL) [12] as shown in Fig 8.

For experiments using the BPR and BSP PDKs, we employ the Back-Side Power Delivery Network (BS-PDN) scheme, which leverages the backside metals under the silicon substrate to route the P/G mesh. In experiments with BPR technology, the P/G pins on the buried metal layer are connected to the backside metals with micro-scale TSVs. The silicon substrate is thinned down to achieve reasonable TSV pitch dimensions. With BSP cells, we use the BS-PDN architecture since the P/G pins in these cells are on the backside metal layers. The P/G grids for both FS-PDN and BS-PDN are generated automatically with ICC2 using a pattern-based power grid generation flow.



Fig. 8. Power Delivery through RDL in (a) FSPDN and (b) BSPDN

#### C. Memory macro modeling and implementation

The CPU design, Arm® Cortex®-A7, contains memory macros. Since our work is based on simulations and there are no published 3 nm memory compilers available for use, we build an estimate of the memory macro model. We scale the delay and power tables in the memory LIB files, and the footprint and area in the memory LEF files, from the 16 nm TSMC technology node to the 3 nm node, using scaling factors assumed from studying the PPA trend across technology nodes. The memory library so generated has P/G pins on the front-side M4 metal layer and is used for the FS-PR configuration. For the BPR and BSP configurations, the delay and power models are scaled to reflect the trend observed with standard cells in each power rail technology. The memory LEF files for these configurations are also altered to reposition the power rail to the buried metal and backside metal layers, respectively [3]. The memory footprint is scaled down to reflect the cell height reduction in BPR and BSP cells, respectively.

#### D. PPA Analysis

Table VI tabulates the PPA summary of the three flows (the FS-PR 6-Track cell library with FS-PDN, the BPR 5-Track cell library with BS-PDN, and the BSP 4-Track cell library with BS-PDN) for each of the three chip designs: ECG, JPEG and Cortex-A7.

The die footprint of the designs reduces with shorter cells. While the BPR cells yield an average instance cell area saving of 9%, the 4-Track BSP cells save over 22% compared to the traditional FS-PR cells. With the smaller footprints enabled by the smaller cell heights, along with the use of backside metals for power delivery, the wirelength generally improves. The resulting reduction in wire capacitance, together with the smaller pin capacitance of the shorter cells outlined in Section IV, reduces the design switching power. The improved cell internal power with BPR and BSP translates to a large reduction of design internal power in all three designs. Across the three designs, the total power drops by 24% to 30% with BPR and by 30% to 36% with BSP cells. Hence, the scaling enabled by BSC technology presents opportunities for low-power design applications.
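The area and power savings discussed above follow directly from Table VI. The sketch below recomputes them per design from the cell area and total power entries as tabulated. Only ratios are used, so the result does not depend on the units. The dictionary layout and helper name are illustrative.

```python
# (FS-PR, BPR, BSP) entries copied from Table VI.
TABLE_VI = {
    "ECG": {"cell_area": (3638, 3456, 2832), "total_power": (304, 214, 195)},
    "JPEG": {"cell_area": (13924, 12275, 10133), "total_power": (888, 672, 583)},
    "Cortex-A7": {"cell_area": (181481, 163922, 138639), "total_power": (262, 194, 184)},
}


def saving_pct(baseline: float, value: float) -> float:
    """Relative saving with respect to the FS-PR baseline, in percent."""
    return 100.0 * (baseline - value) / baseline


def savings(metric: str) -> dict:
    """Per-design (BPR, BSP) savings for one Table VI metric."""
    result = {}
    for design, rows in TABLE_VI.items():
        fs_pr, bpr, bsp = rows[metric]
        result[design] = (saving_pct(fs_pr, bpr), saving_pct(fs_pr, bsp))
    return result


for metric in ("cell_area", "total_power"):
    per_design = savings(metric)
    for design, (bpr_pct, bsp_pct) in per_design.items():
        print(f"{metric:11s} {design:9s} BPR: {bpr_pct:5.1f}%  BSP: {bsp_pct:5.1f}%")
    avg_bpr = sum(v[0] for v in per_design.values()) / len(per_design)
    avg_bsp = sum(v[1] for v in per_design.values()) / len(per_design)
    print(f"{metric:11s} average   BPR: {avg_bpr:5.1f}%  BSP: {avg_bsp:5.1f}%")
```

Reading the 9% BPR figure as an average over the three designs is an interpretation: the per-design cell area savings differ noticeably, with ECG saving the least.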

The performance metrics in Table VI degrade by up to about 20% with BPR and drop further with BSP cells. This is mainly due to the longer cell delays and smaller driving currents in these cells, as discussed in Section IV. The instance count also increases with shorter cells due to the addition of repeaters to compensate for the long cell delay and output slew of BPR and BSP cells.

The energy efficiency of the three flows is presented in terms of the Power Delay Product (PDP). The PDP improves by 9.7% in ECG, 6.2% in JPEG and 21% in the Cortex-A7 CPU with the 5-Track BPR technology compared to the flow using FS-PR cells. The PDP of the designs implemented with the BSP library does not improve further due to the degradation of RC with narrow nanosheets [13].
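The effective frequency and PDP entries in Table VI appear consistent with stretching the target clock period by the reported WNS and multiplying the total power by the resulting period. The sketch below assumes exactly that relation, which is inferred from the tabulated numbers rather than stated by the authors. It treats the WNS column as the magnitude of the timing violation and the total power as milliwatts. Function names are illustrative.

```python
def effective_freq_ghz(target_freq_ghz: float, wns_ps: float) -> float:
    """Assumed relation: effective period = target period + |WNS|."""
    period_ps = 1e3 / target_freq_ghz + wns_ps
    return 1e3 / period_ps


def pdp_pj(total_power_mw: float, eff_freq_ghz: float) -> float:
    """Power-delay product; mW / GHz = 1e-3 W * 1e-9 s = 1 pJ."""
    return total_power_mw / eff_freq_ghz


# ECG entries from Table VI: (WNS magnitude in ps, total power in mW, assumed unit).
ECG_TARGET_GHZ = 10.0
ECG_FLOWS = {
    "FS-PR": (4.0, 304.0),
    "BPR": (33.5, 214.0),
    "BSP": (51.0, 195.0),
}

for flow, (wns_ps, power_mw) in ECG_FLOWS.items():
    f_eff = effective_freq_ghz(ECG_TARGET_GHZ, wns_ps)
    print(f"ECG {flow}: f_eff = {f_eff:.2f} GHz, PDP = {pdp_pj(power_mw, f_eff):.1f} pJ")
```

Because the PDP is the product of power and effective period, a library that saves power but lengthens the critical path can end up with a similar PDP, which is the behavior described for the BSP flow.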

#### E. IR analysis

In the advanced 3nm technology node, the front-side metal layer widths are aggressively narrowed resulting in high resistance paths for FSPDN and compromising power integrity. Moving power delivery mesh to the low resistance backside metals with wider wire width reduces the voltage drop in the power grid. Hence, it is argued BSPDN improves the power integrity of the chip [3].

We perform power integrity analysis with Ansys RedHawk on the fully routed designs obtained from ICC2. Table VI presents the worst static voltage drop for the three designs (ECG, JPEG and Arm<sup>®</sup> Cortex<sup>®</sup>-A7) in each of the three experiment flows: FS-PR with FS-PDN, BPR with BS-PDN and BSP with BS-PDN. Due to the high cell density in the standard cell-only designs, ECG and JPEG, the voltage drop with FS-PDN is well over 10% of the 700 mV supply voltage. When BS-PDN is adopted, the IR drop falls below the 70 mV threshold.

For the FS-PDN in Cortex-A7, we find the static IR-drop hotspot is located in the right side of the floorplan, as shown in Fig. 9 (b). This region has high standard cell density and it is challenging for front-side power delivery due to the high resistivity of the FS metals. For the BS-PDN designs with BPR and BSP, we find the hotspot region shrinks and the worst IR drop reduces by 89% and 87%, respectively, thanks to the less resistive path from the back side. This demonstrates the benefits of BS-PDN in power integrity.
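The IR-drop comparison reduces to two checks per design: whether the worst static drop stays within 10% of the 700 mV supply, and how much BS-PDN lowers it relative to FS-PDN. The sketch below applies both checks to the ECG and JPEG entries of Table VI, read as millivolts. Helper names are illustrative.

```python
VDD_MV = 700.0
IR_BUDGET_MV = 0.10 * VDD_MV  # 10% of the supply voltage, i.e. the 70 mV threshold

# Worst static IR drop from Table VI, read as mV.
WORST_IR_MV = {
    "ECG": {"FS-PR": 124.2, "BPR": 33.5, "BSP": 40.6},
    "JPEG": {"FS-PR": 137.2, "BPR": 32.4, "BSP": 53.9},
}


def within_budget(ir_mv: float, budget_mv: float = IR_BUDGET_MV) -> bool:
    """True if the worst IR drop does not exceed the allowed budget."""
    return ir_mv <= budget_mv


def reduction_pct(baseline_mv: float, ir_mv: float) -> float:
    """IR-drop reduction relative to the FS-PR / FS-PDN baseline, in percent."""
    return 100.0 * (baseline_mv - ir_mv) / baseline_mv


for design, flows in WORST_IR_MV.items():
    baseline = flows["FS-PR"]
    for flow, ir_mv in flows.items():
        status = "within" if within_budget(ir_mv) else "exceeds"
        print(f"{design} {flow}: {ir_mv:.1f} mV ({status} budget, "
              f"{reduction_pct(baseline, ir_mv):.0f}% below FS-PR)")
```

The same two helpers apply to the Cortex-A7 values once they are read from Table VI.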

TABLE VI PPA comparison between the three flows - FS-PR cell library with FS-PDN, BPR cell library with BS-PDN and BSP cell library with BS-PDN - for the three designs: ECG, JPEG and Cortex-A7.

|                     | ECG    |        |        | JPEG   |        |        | CORTEX A7 |         |         |
|---------------------|--------|--------|--------|--------|--------|--------|-----------|---------|---------|
|                     | FS-PR  | BPR    | BSP    | FS-PR  | BPR    | BSP    | FS-PR     | BPR     | BSP     |
| PDN                 | FS-PDN | BS-PDN | BS-PDN | FS-PDN | BS-PDN | BS-PDN | FS-PDN    | BS-PDN  | BS-PDN  |
| Target Freq. (GHz)  |        | 10     |        |        | 8      |        |           | 1.6     |         |
| Cell area           | 3638   | 3456   | 2832   | 13924  | 12275  | 10133  | 181481    | 163922  | 138639  |
| #Instance           | 92 K   | 113 K  | 113 K  | 335 K  | 349 K  | 366 K  | 486 K     | 489 K   | 502 K   |
| Total WL (µm)       | 264424 | 249034 | 232981 | 831999 | 846903 | 786862 | 3035500   | 2604374 | 2595484 |
| WNS (ps)            | 4      | 33.5   | 51     | 46.6   | 87.1   | 119.6  | 46.9      | 63.44   | 97      |
| Eff. Freq. (GHz)    | 9.6    | 7.49   | 6.62   | 5.8    | 4.71   | 4.08   | 1.48      | 1.45    | 1.38    |
| Total wire cap (pF) | 37.5   | 33.9   | 32.8   | 115.7  | 109.6  | 106.4  | 438       | 352.5   | 336.7   |
| Total pin cap (pF)  | 82.7   | 71.9   | 65.8   | 327.8  | 260.9  | 239.3  | 458.6     | 358.2   | 323.7   |
| Total cap (pF)      | 120.2  | 105.8  | 98.6   | 443.5  | 370.5  | 345.7  | 896.6     | 710.7   | 700.4   |
| Total Power (mW)    | 304    | 214    | 195    | 888    | 672    | 583    | 262       | 194     | 184     |
| Sw Power (mW)       | 90.3   | 82.5   | 74.3   | 306    | 294    | 251    | 76        | 61.2    | 58.9    |
| Int Power (mW)      | 212    | 131    | 121    | 574    | 378    | 331    | 175       | 132     | 123     |
| Lkg Power (mW)      | 2.1    | 0.1    | 0.1    | 8.45   | 0.3    | 0.3    | 11.6      | 1.1     | 0.9     |
| PDP (pJ)            | 31.6   | 28.6   | 29.4   | 151.8  | 142.4  | 142.8  | 176       | 138.    | 132.8   |
| Worst IR drop (mV)  | 124.2  | 33.5   | 40.6   | 137.2  | 32.4   | 53.9   | 23.8      | 2.6     | 3.4     |

Fig. 9. (a) Manually derived floorplan of Cortex-A7. Static IR drop map of Cortex-A7 in the flows (b) FS-PR with FS-PDN, (c) BPR with BS-PDN and (d) BSP with BS-PDN.

## VI. CONCLUSION

In this work, we simulate and implement NSFETs with three power rail topologies: the conventional FS-PR, BPR, and BSP enabled by the novel BSC. This paper is the first demonstration of the BSC technology implementation from device to standard cells to full chip design, thoroughly comparing it with other rail topologies at each stage. We find that BPR and BSC enable further scaling of standard cells by repositioning the power rails, resulting in an average floorplan area reduction of 7% and 24%, respectively, at the full chip level. In addition, the smaller nanosheet width devices used in these technologies offer power-efficient standard cells. Given the area and power savings, along with the IR-drop reduction from BS-PDN, we find BSP cells best suited for low-power applications.

## ACKNOWLEDGEMENT

This research is partially funded by Samsung Semiconductor, Inc.

## REFERENCES

- [1] A. Razavieh, P. Zeitzoff, and E. J. Nowak, "Challenges and limitations of cmos scaling for finfet and beyond architectures," *IEEE Transactions on Nanotechnology*, vol. 18, pp. 999–1004, 2019.

- [2] G. Bae, D.-I. Bae, M. Kang, S. Hwang *et al.*, "3nm gaa technology featuring multi-bridge-channel fet for low power and high performance applications," in 2018 IEEE International Electron Devices Meeting (IEDM), 2018, pp. 28.7.1–28.7.4.
- [3] D. Prasad, S. S. Teja Nibhanupudi, S. Das, O. Zografos *et al.*, "Buried power rails and back-side power grids: Arm® cpu power delivery network design beyond 5nm," in 2019 IEEE International Electron Devices Meeting (IEDM), 2019, pp. 19.1.1–19.1.4.
- [4] S. Song, G. Nallapati, I. Khan, N. Nikfar *et al.*, "System design technology co-optimization for 3d integration at <5nm nodes," in 2021 *IEEE International Electron Devices Meeting (IEDM)*, 2021, pp. 22.3.1–22.3.4.
- [5] J.-S. Yoon, J. Jeong, S. Lee *et al.*, "Performance, Power, and Area of Standard Cells in Sub 3 nm Node Using Buried Power Rail," *IEEE Transactions on Electron Devices*, vol. 69, no. 3, pp. 894–899, 2022.
- [6] L. Clark, V. Vashishtha, L. Shifren *et al.*, "ASAP7: A 7-nm finFET predictive process design kit," *Microelectronics Journal*, vol. 53, pp. 105–115, 07 2016.
- [7] G. Yeap, S. S. Lin, Y. M. Chen *et al.*, "5nm cmos production technology platform featuring full-fledged euv, and high mobility channel finfets with densest 0.021µm2 sram cells for mobile soc and high performance computing applications," in 2019 IEEE International Electron Devices Meeting (IEDM), 2019, pp. 36.7.1–36.7.4.
- [8] F. M. Bufler, D. Jang, G. Hellings *et al.*, "Monte Carlo Comparison of n-Type and p-Type Nanosheets With FinFETs: Effect of the Number of Sheets," *IEEE Transactions on Electron Devices*, vol. 67, no. 11, pp. 4701–4704, 2020.
- [9] G. Rzepa, M. Karner, O. Baumgartner *et al.*, "Reliability and Variability-Aware DTCO Flow: Demonstration of Projections to N3 FinFET and Nanosheet Technologies," in 2021 IEEE International Reliability Physics Symposium (IRPS), 2021, pp. 1–6.
- [10] H.-H. Park, W. Choi, M. A. Pourghaderi *et al.*, "NEGF simulations of stacked silicon nanosheet FETs for performance optimization," in 2019 International Conference on Simulation of Semiconductor Processes and Devices (SISPAD), 2019, pp. 1–3.
- [11] A. Gupta, O. V. Pedreira, G. Arutchelvan, H. Zahedmanesh *et al.*, "Buried power rail integration with finfets for ultimate cmos scaling," *IEEE Transactions on Electron Devices*, vol. 67, no. 12, pp. 5349–5354, 2020.
- [12] H. Lu, R. Furuya *et al.*, "Design, modeling, fabrication and characterization of 2–5- μm redistribution layer traces by advanced semiadditive processes on low-cost panel-based glass interposers," *IEEE Transactions on Components, Packaging and Manufacturing Technology*, vol. 6, no. 6, pp. 959–967, 2016.
- [13] J.-S. Yoon, J. Jeong *et al.*, "Optimization of nanosheet number and width of multi-stacked nanosheet fets for sub-7-nm node system on chip applications," *Japanese Journal of Applied Physics*, vol. 58, p. SBBA12, 04 2019.