# Automated I/O Library Generation for Interposer-Based System-in-Package Integration of Multiple Heterogeneous Dies

Minah Lee, *Student Member, IEEE*, Arvind Singh, *Student Member, IEEE*, Hakki Mert Torun, *Student Member, IEEE*, Jinwoo Kim, *Student Member, IEEE*, Sung Kyu Lim, *Senior Member, IEEE*, Madhavan Swaminathan, *Fellow, IEEE*, and Saibal Mukhopadhyay, *Fellow, IEEE*

Abstract-System-in-package (SiP) integration of multiple dies in a single package can achieve much higher performance than onboard integration of integrated circuits (ICs) while reducing the design cost/effort compared to a large system on chips (SoCs). However, a major challenge in the design of SiPs with many dies is automated design and insertion of input/output (I/O) cells to minimize energy and delay of the wire traces. This article presents an automated cell library generation flow for all-digital I/O circuits for SiP integration. Given parameterized models of SiP wire traces, our method automatically designs, optimizes, and generates layouts of I/O cells for delay/energy minimization. The proposed flow is demonstrated on interposer-based SiP integration considering 28-nm CMOS technology and 65-nm BEOL technology. Given a multidie SiP design and associated interposer wire traces, this article demonstrates that automated I/O library cell generation can reduce the maximum die-to-die communication delay or energy. We demonstrate the proposed flow for various interposer parameters and SiP designs to show the feasibility of chip-interposer codesign.

*Index Terms*—2.5-D integration, automated flow, input/output (I/O library), interface circuits, system-in-package (SiP).

## I. INTRODUCTION

SYSTEM-ON-CHIP (SoC) integration of diverse functional units has been the driver of electronic and computing systems. However, the complexity and cost of designing a complex SoC in advanced CMOS nodes have increased significantly over the last decade [1]. Consequently, alternative packaging technologies, such as interposers (2.5-D), 3-D integrated circuits (ICs), and multichip modules (MCMs), have received major attention to integrate diverse functions [2]–[4]. The system in package (SiP) allows integration of digital logic, memory, analog, mixed-signal, and RF functions that are

Manuscript received July 15, 2019; revised October 15, 2019; accepted October 21, 2019. Date of publication November 14, 2019; date of current version January 15, 2020. This work was supported by the Defense Advanced Research Projects Agency (DARPA) Common Heterogeneous Integration and IP Reuse Strategies (CHIPS) Program under Grant No. N00014-17-1-2950. Recommended for publication by Associate Editor M. G. Telescu upon evaluation of reviewers' comments. (*Corresponding author: Minah Lee.*)

The authors are with the Department of Electrical and Computing Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA (e-mail: minah.lee@gatech.edu).

Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCPMT.2019.2953659



Fig. 1. (a) On-chip wires without any need for I/O circuits for SoC integration. (b) I/O circuits are required for SiP integration to drive long interposer wires.

potentially designed in heterogeneous technologies, in a single module [5]–[9]. Recent breakthroughs in silicon interposer-based 2.5-D integration technologies [7]–[9] demonstrate scalable systems with performance comparable to SoC solutions and ease of integration similar to conventional packaging. The ability to reuse intellectual property (IP) as individual dies in an SiP promises amortization of design effort/cost over a longer lifecycle of IPs [10]. Overall, SiP promises SoC-like performance while reducing design cost and complexity and increasing yield [5]–[9]. However, the lack of design tools remains a critical challenge for large-scale commercial adoption of 2.5-D-based SiP integration [10]. This article develops an automation approach to address the input/output (I/O) design tool challenge associated with die-to-die (D2D) on-interposer signaling for a given multidie SiP design and the associated interposer wire traces.

In an SoC, different IPs communicate through on-chip wires [see Fig. 1(a)]. The on-chip wires in advanced CMOS processes, which are highly diffusive in nature, can be modeled as a distributed *RC* network [11]. CMOS inverter-/buffer-based transmitters/receivers can drive on-chip wires. Design automation tools exist to characterize on-chip wires, optimize their drivers/receivers, perform buffer insertion to recover signal slew, and minimize wire delay/energy. However, when the same SoC is partitioned into multiple dies and integrated as an SiP, the on-chip wires between IPs are replaced by D2D interconnects in the interposer [see Fig. 1(b)]. To minimize the performance (or energy) loss, signaling through on-interposer wires must be optimized for minimum communication delay or energy, similar to the case for on-chip wires.


Unlike on-chip wires, where delay/energy minimization is performed by optimal insertion/placement of inverters/buffers, on-interposer wires require the minimization to be performed by optimally designing the I/O cells. In addition, wires in silicon interposers have larger linewidths and show inductive properties. Hence, the transceiver circuits that minimize the delay/energy of D2D signals while ensuring good signal quality must be designed with the transmission line behavior of on-interposer wires taken into consideration [12].

Moreover, traditional I/O cells for off-chip signaling are usually designed to match a target impedance. As there are many on-interposer wires with varying impedance characteristics in an SiP, it is critical to develop an automated approach for the optimal design of I/O cells for on-interposer signaling. Such optimization needs to go beyond matching a target impedance and explicitly consider delay and/or energy as a cost function. In addition, the traditional I/O cells are complex mixed-signal circuits, consume appreciable power, and require custom design. The total number of I/O cells connecting on-interposer wires in an SiP will be much larger than the number of off-chip I/Os in the original SoC (see Fig. 1). Hence, directly adopting the complex I/O cells used for off-chip signaling in SoCs for D2D signaling in SiP will reduce power efficiency and increase design effort. I/O cells (drivers/receivers) for SiP should be simple and provide optimal delay/energy.

The drivers/receivers for SiP need to be generated automatically to provide an optimal design for on-interposer wires with low design complexity and cost. On one hand, driver/receiver circuits for on-interposer wires should function similarly to I/O circuits for off-die communication in traditional SoCs to maintain high signal quality through inductive wires. For example, similar to I/O cells for traditional packaging, the I/O cells for dies in SiP should be designed to cope with the coupled, frequency-dependent RLGC properties of on-interposer wires, instead of only the RC properties of on-chip wires. On the other hand, driver/receiver circuits for on-interposer wires should be as small and simple as I/O circuits for on-chip communication, so that they can be generated automatically for large SiP designs at reduced design cost. All-digital I/O cells with full-swing signaling, similar to on-chip wires, are desirable to achieve this goal.

In this article, we present an automated library generation flow of all-digital I/O cells for a given 2.5-D (interposer) technology and varying trace lengths. Fig. 2 shows the overview of our proposed cell library generation, which takes the package specification and design goals and generates an I/O cell with its layout and its timing/power library. Our tool can be applied to both SiP and system-on-interposer integration as long as one is using all-digital, full-swing (single-ended or differential), and moderate-frequency (1–5 GHz) signaling. Such signaling is feasible mostly in low-to-moderate distance (1–10 mm) interconnects in SiP and system-on-interposer integrations. However, to demonstrate the tool flow, we mostly focus on system-on-interposer systems for wire modeling. We first present a chip-interposer cosimulation environment that couples SPICE models for I/O cells (drivers and receivers) with parametric models of interposer wire traces. The cosimulation characterizes delay and energy of the physical link (driver, wire trace, and receiver),



Fig. 2. Overview of the proposed I/O cell library generation.

which is designed with full-swing digital signaling and digital CMOS inverters, similar to on-chip communication. Using cosimulation, we develop a design flow that automatically generates the all-digital I/O cell library. The tool generates the driver/receiver pair that gives minimum delay/energy (design goal, i.e., electrical cost function) for a given interposer technology and wire length (design specification) under a 90% voltage swing constraint at the input of the receiver (optimization constraint). Our flow allows a designer to define a cost function that includes delay, energy, target impedance, area of I/O cells, and so on. To demonstrate our flow, we use delay minimization and energy minimization as the cost functions throughout this article.

Our tool generates the cell library both as a soft macro (register transfer level, RTL) and as a hard macro (layout) in a target CMOS technology. The soft macro can be integrated with the RTL description of the IP, facilitating early-stage design space exploration of the SiP, while the hard macro can be integrated with the layout of an IP, facilitating the physical design of the multidie SiP. We demonstrate autogeneration of I/O cells for various design goals (minimum delay/energy), different interposer parameters, different wire lengths, and ESD protection. With a case study on an SiP-based multicore mesh NOC structure, we show that wire distribution-dependent optimization of the I/O library cell can help enhance delay/energy characteristics of D2D communication in SiP compared to the design of fixed I/O cells for a target output impedance. For various case studies, such as SiP design with nonneighboring connections or heterogeneous signaling, we present an I/O design methodology using our generation flow to meet the design goal.

The rest of this article is organized as follows. Section II reviews related work. Section III presents the cosimulation flow of chip and interposer, and Section IV presents automated I/O cell library generation flow. Section V shows some experimental results of interposer model/wire length-dependent I/O library generation and applications on various SiP designs. Section VI concludes this article.

## II. RELATED WORK

### A. I/O Circuits for On-Chip Wire

In the past few decades, researchers have explored different signaling schemes to drive long on-chip wires at high data rates and low energy [12]. However, most of these works have only looked at the *RC* characteristics of on-chip wires, employing current-mode [13] or low-voltage differential signaling [14] or utilizing complex capacitor-based preemphasis and equalization circuits [15], [16] as well as expensive calibration techniques to improve timing/voltage margins [17]. In addition, these high-data-rate signaling techniques implement source-synchronous links to remove any mismatch between clock and data lines resulting from variations in operating conditions or crosstalk [18]. Most of these schemes are not designed to consider the inductive characteristic of interposer wires. Moreover, the designs lead to custom cells and are difficult to integrate into an RTL-level tool.

### B. Various Off-Chip Signaling

Recently, some researchers have explored silicon-/glass-based 2.5-D package technologies [19], [20] and signaling schemes for the D2D interconnects for high data rates at low energy [21], [22]. Sawyer et al. [19] demonstrate redistribution layers (RDLs) on the surface of glass for very high speed (28 Gb/s) signaling, while Sundaram et al. [20] demonstrate the feasibility of a low-cost and low-loss 3-D silicon interposer without TSVs for high-bandwidth logic-to-memory interconnects. Lee et al. [21] present an energy-efficient current-mode signaling scheme for glass-based interposer wires for data rates of up to 3 Gb/s. It utilizes an open-drain transmitter with one-tap preemphasis and a current sense amplifier as a receiver. Even though this scheme achieves very good energy efficiency, the driver and receiver circuits are not friendly to digital synthesis, place, and route flows, and glass interposer technologies are not easily integrable with silicon-based CMOS processes. Liao et al. [22] present a heterogeneous system consisting of an RF receiver, baseband processor, and DRAM, all in different technologies integrated into 3-D on CoWoS. However, the focus of their work is on electrical characterization with a very fast built-in self-test (BIST) algorithm targeted for the heterogeneous integration. Similarly, Lin et al. [23] present an eDRAM PHY operating at a very low voltage swing (0.3 V) on 2.5-D CoWoS. More recently, Dinakarrao et al. [24] propose Q-learning-based self-adaptive output-voltage swing adjustment and further present a 2.5-D integrated multicore network-on-chip, which consists of a microprocessor die, memory die, and accelerator die with 2.5-D silicon interposer I/Os. Jeon et al. [25] propose an on-silicon-interposer passive equalizer for the next-generation high bandwidth memory (HBM). However, most of these schemes adopt I/Os in analog mixed-signal circuits that consume a large amount of energy. Also, they require custom design that leads to high design cost, especially for a large heterogeneous SiP system.

### C. Contribution of This Article

In this article, we focus on very large-scale integration of IPs in 2.5-D systems based on silicon interposer with D2D interconnects running at full swing (similar to on-chip all-digital signaling) at high data rates (2 Gb/s). In addition, unlike prior work, we demonstrate an automated flow to generate



Fig. 3. All-digital I/O and full-swing digital signaling.

I/O cells to drive various lengths of wires designed for a given interposer technology and optimized for a specified goal (energy/delay). Our automated flow generates the RTL and layout for an I/O cell that can be treated as a soft/hard macro (with timing and layout library) in the synthesis, place, and route flow.

We have previously introduced the concept of automated generation of I/O library for SiP integration in [26]. This article significantly extends the prior work. First, we add the differential receiver that was presented in [27] in our flow as an option to reduce delay, energy, and area of I/O cells for SiP design with nonneighboring connections. We consider one fixed size of the differential receiver, so our tool can still automatically generate cell library as soft/hard macro. We also present an area analysis of I/O cells for a given wire length distribution. Area analysis was redundant when our I/Os were all-digital, but it becomes critical after adding a differential receiver. More importantly, in this article, our tool is improved to generate I/O cells for heterogeneous integration between dies in different technologies or supply voltages. I/O design for heterogeneous SiP integration should also consider technologies and supply voltages to achieve minimum delay or energy, so our automated I/O generation tool shows more benefits. Given transceiver technology and interposer parameters, we present delay-/energy-minimized I/O cells generated by our flow for heterogeneous signaling between 28- and 180-nm dies.

## III. CHIP-INTERPOSER COSIMULATION

We develop a chip-interposer cosimulation flow to accurately characterize delay and energy in the physical link (driver, wire, and receiver) of an interposer wire. The transceiver circuits and signaling mimic the driving of on-chip wires. Our design uses full-swing digital signaling and all-digital I/Os based on CMOS inverters as transceivers, as shown in Fig. 3. All-digital I/O requires full-swing signaling at the receiver interface, eliminating receiver-side termination, which helps in minimizing the total power. However, compared to on-chip wires, interposer wires in SiP have significant inductance, specifically for longer wires, and show transmission line behavior even at moderate frequencies ($\sim$1–2 GHz). Therefore, an accurate interconnect model that includes all the full-wave EM effects of the interconnects is necessary for cosimulation.

As mentioned earlier, interconnects in the interposer show a strong inductive behavior that cannot be ignored in the SPICE model. In order to capture the impedance and coupling profiles of these interconnects accurately, a full-wave EM solver needs to be utilized. However, such solvers tend to be CPU intensive, especially for the multiscale structures seen in chip-to-chip traces on the interposer.



Fig. 4. Package model generation [28].

To overcome this CPU-intensive process and efficiently automate the SPICE model generation process without losing accuracy, we leverage machine learning (ML) techniques. First, a moderate amount of training data from a full-wave EM solver, Ansys HFSS, is collected using single-frequency simulations by storing the full RLGC matrices of the interconnects. Note that the interconnect thicknesses on the interposer are of the same order of magnitude as the skin depth at the desired frequency of operation. Hence, the transition of self and mutual R and L from dc to higher frequencies constitutes the majority of the frequency-dependent behavior. Since the proposed technique utilizes full-wave EM simulations to extract the RLGC parameters that account for the complete skin, proximity, and edge effects, this behavior is accurately captured in the final model.
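The remark about skin depth can be checked with a quick back-of-the-envelope calculation. The sketch below uses the classical skin-depth formula and assumes bulk copper resistivity; the conductor material is an assumption made here for illustration, since the text does not state it.

```python
import math

MU_0 = 4e-7 * math.pi   # permeability of free space [H/m]
RHO_CU = 1.68e-8        # ASSUMPTION: bulk copper resistivity [ohm*m]

def skin_depth_m(freq_hz, resistivity=RHO_CU):
    """Classical skin depth: delta = sqrt(rho / (pi * f * mu_0))."""
    return math.sqrt(resistivity / (math.pi * freq_hz * MU_0))

# Compare against the 1-2 um trace thicknesses used later in the article.
for f_ghz in (1.0, 2.0, 5.0):
    delta_um = skin_depth_m(f_ghz * 1e9) * 1e6
    print(f"{f_ghz:.0f} GHz: skin depth ~ {delta_um:.2f} um")
```

Under this assumption, the skin depth is roughly 2 µm at 1 GHz and shrinks with the square root of frequency, so it is indeed comparable to micrometer-scale trace thicknesses in the frequency range of interest.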

As training data are collected only by using single-frequency simulations as opposed to a high-bandwidth frequency sweep ranging from dc to the high-GHz regime, the training data collection time is significantly reduced. Then, we train an additive Gaussian process (ADD-GP) [29] that takes geometric parameters of the interconnects and a range of frequency as input and outputs the frequency-dependent RLGC matrices. These are then converted into S-parameters, which are then used by the broadband SPICE generator of Keysight ADS to generate the final SPICE model. The same steps are repeated for modeling C4 bumps, but the ADD-GP model is trained to directly predict S-parameters for this case. The framework is summarized in Fig. 4, and a detailed description can be found in [30]. The ADD-GP model shows $\sim$97% accuracy and requires only 2 s to generate the broadband SPICE model as opposed to 2 h required by the full-wave EM solver. The total training time required to derive the model is only 5 h since there are no high-bandwidth frequency sweeps involved in this step.
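To make the RLGC-to-S-parameter step concrete, the following is a minimal sketch for a single, uncoupled uniform line using standard transmission line theory. It is a simplification chosen here for illustration: the flow described above works with full coupled RLGC matrices and a commercial tool, and the function names and example values below are hypothetical.

```python
import cmath
import math

def line_params(r, l, g, c, freq_hz):
    """Characteristic impedance and propagation constant from per-unit-length
    R [ohm/m], L [H/m], G [S/m], C [F/m] at one frequency."""
    w = 2.0 * math.pi * freq_hz
    series_z = complex(r, w * l)   # R + jwL
    shunt_y = complex(g, w * c)    # G + jwC
    z0 = cmath.sqrt(series_z / shunt_y)
    gamma = cmath.sqrt(series_z * shunt_y)
    return z0, gamma

def line_sparams(r, l, g, c, freq_hz, length_m, z_ref=50.0):
    """S11 and S21 of the line as a symmetric two-port referenced to z_ref,
    obtained from the ABCD matrix of a uniform transmission line."""
    z0, gamma = line_params(r, l, g, c, freq_hz)
    ch = cmath.cosh(gamma * length_m)
    sh = cmath.sinh(gamma * length_m)
    denom = 2.0 * ch + (z0 / z_ref + z_ref / z0) * sh
    s11 = (z0 / z_ref - z_ref / z0) * sh / denom
    s21 = 2.0 / denom
    return s11, s21

# Sanity check with a lossless line whose sqrt(L/C) equals the 50-ohm reference
# (illustrative values, not taken from the article).
s11, s21 = line_sparams(0.0, 2.5e-7, 0.0, 1e-10, 1e9, 1e-3)
print(abs(s11), abs(s21))
```

Evaluating these expressions at each frequency predicted by the surrogate model yields the frequency-dependent S-parameters that a broadband SPICE generator can then fit.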

The final HSPICE-compatible models are coupled with circuit-level models of the driver/receiver in HSPICE. Hence, we can simulate the whole physical link in HSPICE and obtain propagation delay and energy. As our I/O generation tool considers the full-wave EM effects of the interconnects, the generated I/O design accounts for not only the loss/crosstalk but also the skin/proximity effect and nonuniform current distribution along the width of the interconnects, along with all higher modes of propagation that occur at discontinuities, such as the bump-to-via transition.

## IV. CELL LIBRARY GENERATION FLOW

For a given interposer model and wire length, the transceiver sizes can be optimized for different goals under some constraints. For systems with high-performance requirements, the driver and receiver can be sized to have a minimum end-to-end delay. Similarly, for systems with constraints on energy, the driver/receiver sizes can be optimized for minimum total energy consumption.

Fig. 5 shows our proposed I/O cell library generation flow. The transceiver circuits are modeled as inverter chains. We define the sizes of the first and last inverters in the driver chain as 1 and D, respectively. Likewise, we define the sizes of the first and last inverters in the receiver chain as R and 1. The design of the I/O cell can then be defined as the design of the entire driver and receiver chains, i.e., selecting the final driver (D) and receiver (R) sizes, as well as the number of inverters in the driver ($N_{\text{driver}}$) and receiver ($N_{\text{receiver}}$) chains. Our tool flow consists of two main steps: I/O design specification and I/O library generation.

1) *I/O Design Specification:* For each driver (D) and receiver (R) pair, we find an optimal ratio (f) between successive stages of the driver and receiver inverter chains, as shown in Fig. 6. Consider energy minimization as an example. For very large ratios (f), the number of stages required to drive a fixed final stage is small, which reduces the switching power but increases the short-circuit power, because the slow slew rate makes short-circuit power dominate the total. Conversely, for a large number of stages (smaller f), the total power is dominated by switching power. Therefore, an optimal number of stages exists for energy minimization with respect to the ratio f, and we obtain f = 8 as the optimum ratio. On the other hand, for propagation delay minimization, the driver and receiver chains are sized based on the effective fan-out ($C_{drv}/C_{invx1}$), and the optimum stage ratio is found to be 4.
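The elaboration of a chain from its end sizes and a target stage ratio can be sketched as follows. The exact elaboration rule is not spelled out in the text, so the rounding rule below (pick the stage count whose geometric taper is closest to f, then spread the sizes evenly on a log scale) is an assumption, and the function names are hypothetical.

```python
import math

def elaborate_chain(final_size, stage_ratio):
    """Taper an inverter chain geometrically from size 1 up to final_size.

    ASSUMED rule: choose the number of size steps closest to
    log(final_size) / log(stage_ratio), then use a uniform ratio.
    """
    if final_size < 1 or stage_ratio <= 1:
        raise ValueError("need final_size >= 1 and stage_ratio > 1")
    if final_size == 1:
        return [1.0]
    steps = max(1, round(math.log(final_size) / math.log(stage_ratio)))
    actual_ratio = final_size ** (1.0 / steps)
    return [actual_ratio ** i for i in range(steps + 1)]

def receiver_chain(first_size, stage_ratio):
    """The receiver mirrors the driver: it starts at R and tapers down to 1."""
    return list(reversed(elaborate_chain(first_size, stage_ratio)))

# Example with D = 59 and R = 4, using the delay-oriented ratio f = 4.
driver = elaborate_chain(59, 4)
receiver = receiver_chain(4, 4)
print(len(driver), [round(s, 2) for s in driver])
print(len(receiver), [round(s, 2) for s in receiver])
```

In this sketch, the chain lengths play the role of $N_{\text{driver}}$ and $N_{\text{receiver}}$; a real flow would additionally snap the sizes to available standard cells and may constrain the parity of the chain.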

The next step is to select the optimal driver/receiver for energy and delay minimization. Consider the example of delay minimization for a target wire length and interposer technology. The overall flow starts with a set of available driver and receiver sizes (i.e., a set of R and D). For each pair in the set, we first perform elaboration of the entire driver/receiver chain based on f = 4. Next, for all the driver/receiver options, we perform cosimulation where the wire model incorporates interposer technology and length properties. We select the subset of the driver/receiver pairs in which interposer output swing is greater than 90% of the full swing, and finally, from this subset, we select the optimum I/O cell for minimum delay.



Fig. 5. Proposed I/O cell library generation flow. The table and layout show an example of delay minimized I/O for 1-mm interposer wire generated with the flow.



Fig. 6. Methodology of the I/O design specification.

The same process can be performed for minimum energy as well by using f = 8 for elaboration.
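The selection procedure described above (simulate every candidate, keep those meeting the swing constraint, pick the minimum-cost one) can be sketched as a simple filtered search. The `simulate` callback stands in for the HSPICE cosimulation and is hypothetical, as are the class and function names; the toy numbers in the usage example are made up purely to exercise the code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LinkResult:
    delay_ps: float
    energy_pj: float
    swing_fraction: float  # receiver-input swing divided by full swing

def select_io_cell(candidates, simulate, goal="delay", min_swing=0.9):
    """Return ((D, R), result) for the feasible pair with the lowest cost,
    or None if no candidate exceeds the swing constraint."""
    if goal not in ("delay", "energy"):
        raise ValueError("goal must be 'delay' or 'energy'")
    best = None
    for d_size, r_size in candidates:
        res = simulate(d_size, r_size)       # placeholder for cosimulation
        if res.swing_fraction <= min_swing:  # keep only swing > 90%
            continue
        cost = res.delay_ps if goal == "delay" else res.energy_pj
        if best is None or cost < best[0]:
            best = (cost, (d_size, r_size), res)
    return None if best is None else best[1:]

# Toy stand-in for the simulator: MADE-UP numbers, not measured results.
toy = {
    (2, 1): LinkResult(150.0, 0.05, 0.80),
    (3, 1): LinkResult(160.0, 0.08, 0.91),
    (60, 4): LinkResult(45.0, 0.12, 0.97),
}
print(select_io_cell(toy.keys(), lambda d, r: toy[(d, r)], goal="delay"))
print(select_io_cell(toy.keys(), lambda d, r: toy[(d, r)], goal="energy"))
```

Because the cost is computed inside the loop, other designer-defined cost functions (for example, a weighted sum of delay, energy, and area) could be substituted without changing the structure of the search.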

2) *I/O Library Generation:* Once the driver/receiver chains are finalized, our flow generates the RTL for these drivers/receivers. We automatically insert the modified RTL into a baseline template consisting of rest of the functional logic for the I/O cell. Using standard cell library, the RTL is synthesized and placed and routed to generate the layout for the I/O cell. The final layout and extracted netlist can be passed to a cell library characterization tool, such as SiliconSmart, to generate the final timing and power library of the I/O cell.
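As an illustration of the RTL generation step, the sketch below emits a structural Verilog module for an inverter chain from a list of sizes. The cell naming pattern (`INV_X<n>`), the pin names (`A`, `Y`), and the port names are hypothetical placeholders; a real flow would use the names defined by the target standard cell library and the baseline I/O template.

```python
def emit_chain_verilog(module_name, sizes, cell_fmt="INV_X{size}"):
    """Emit a structural Verilog inverter chain, one instance per stage.

    cell_fmt, pin names (.A/.Y), and port names are ASSUMED placeholders.
    """
    if not sizes:
        raise ValueError("chain needs at least one stage")
    n = len(sizes)
    lines = [f"module {module_name} (input wire din, output wire dout);"]
    for i in range(n - 1):
        lines.append(f"  wire n{i};")  # internal nets between stages
    for i, size in enumerate(sizes):
        src = "din" if i == 0 else f"n{i - 1}"
        dst = "dout" if i == n - 1 else f"n{i}"
        cell = cell_fmt.format(size=int(round(size)))  # snap to integer drive
        lines.append(f"  {cell} u{i} (.A({src}), .Y({dst}));")
    lines.append("endmodule")
    return "\n".join(lines)

print(emit_chain_verilog("io_driver", [1, 4, 15, 59]))
```

The generated text can then be dropped into the template RTL before synthesis and place and route.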

## V. EXPERIMENTAL RESULTS

In this section, we show applications of the proposed design flow for the generation of I/O cells under various conditions. Sections V-A and V-B show generated I/Os for various interposer models or wire lengths. Section V-C presents a design methodology of I/Os for an SiP with many dies and a comparison between traditional I/Os and I/Os generated by the proposed flow. Section V-D compares single-ended and differential receivers and suggests considering both receivers for an SiP with nonneighboring connections. Section V-E shows the design methodology of I/Os for heterogeneous signaling, and Section V-F presents how the generated I/Os change for ESD protection. For all sections, we present driver/receiver sizes as I/O designs for both delay and energy minimization scenarios. Drivers/receivers are considered as inverter chains, and their sizes are defined as the final/first inverter sizes. An inverter of size *n* is *n* times wider than an inverter of size 1, which is the minimum inverter size that our CMOS technology allows. The results are based on 28-nm CMOS technology for the transceiver and 65-nm BEOL technology for the silicon interposer.

### A. Cell Library for Different Interposers

The interposer wire parasitics are dependent on wire dimensions as well as spacing/shielding between wires. A higher wiring density is required for large-bandwidth SiP systems. However, it leads to a finer wire pitch and, therefore, more resistive wires and more coupling capacitance. This limits the achievable data rates, which, in turn, reduces the system bandwidth. To understand the role of transceiver optimization and to demonstrate the feasibility of our flow for varying package wire dimensions, we have chosen three cases for package wire dimensions, as described in Fig. 7 and Table I. We assume 65-nm BEOL technology to determine the sample space for the interconnect geometry. Case 1 has minimum achievable line dimensions that provide the highest interconnect density and represents a high-bandwidth SiP system. Case 2 has lower wiring density and represents an SiP system, which can



Fig. 7. (a) Transmission line [28]. (b) Microbump.

TABLE I

PHYSICAL DIMENSIONS OF VARIOUS PACKAGE MODELS

| Parameter                                            | Case 1 | Case 2 | Case 3 |
|------------------------------------------------------|--------|--------|--------|
| Line width $l_w$ [µm]                                | 0.4    | 1.6    | 1.6    |
| Spacing $S$ [µm]                                     | 0.4    | 1.6    | 1.6    |
| Thickness $t_c$ [µm]                                 | 1      | 2.0    | 2.0    |
| Bump diameter $d_{bump}$ [µm]                        | 25     | 25     | 15     |
| Bump pitch $d_{pitch}$ [µm]                          | 50     | 50     | 30     |
| Chip-to-interposer via diameter $d_{via}$ [µm]       | 0.4    | 1.6    | 1.6    |
| Chip-to-interposer via pad $P_{via}$ [µm]            | 0.7    | 2.4    | 2.4    |
| Chip-to-interposer via height $h_{via}$ [µm]         | 5.0    | 2.0    | 2.0    |

achieve higher data rates. Case 3 has reduced bump size/pitch with respect to the other two cases to reduce wire lengths.

The generated delay- and energy-optimized I/O cells for these interposers with a 1-mm wire are shown in Table II. Case 1 has smaller wire dimensions than case 2 and, hence, requires a stronger I/O driver (i.e., a larger I/O cell) to drive the more resistive wires. Likewise, case 3 has smaller bump dimensions than case 2, which contributes significant parasitics to the interposer channel and requires a larger I/O. As cases 1 and 3 are more resistive than case 2, they require bigger driver/receiver sizes than case 2 for delay minimization. On the other hand, driver/receiver sizes for minimum energy are nearly the same because the x3 driver is the smallest size that satisfies the 90% voltage swing constraint for all interposer cases. The delay of the energy-minimized I/O is much larger for cases 1 and 3 than for case 2.
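The difference in trace resistance between the cases can be estimated from the Table I cross sections. The sketch below computes only the dc resistance of the trace, assuming bulk copper resistivity and dimensions in micrometers; both are assumptions made for illustration, and bump and via parasitics as well as frequency-dependent effects are ignored.

```python
RHO_CU = 1.68e-8  # ASSUMPTION: bulk copper resistivity [ohm*m]

def dc_resistance_ohm(length_mm, width_um, thickness_um, rho=RHO_CU):
    """DC resistance of a rectangular trace: R = rho * length / (w * t)."""
    area_m2 = (width_um * 1e-6) * (thickness_um * 1e-6)
    return rho * (length_mm * 1e-3) / area_m2

# 1-mm traces with the line width and thickness of cases 1 and 2 in Table I.
r_case1 = dc_resistance_ohm(1.0, 0.4, 1.0)
r_case2 = dc_resistance_ohm(1.0, 1.6, 2.0)
print(f"case 1: {r_case1:.2f} ohm, case 2: {r_case2:.2f} ohm, "
      f"ratio: {r_case1 / r_case2:.1f}x")
```

Under these assumptions, the case 1 trace is about eight times more resistive per unit length than the case 2 trace, which is consistent with the larger delay-optimized driver reported for case 1. Case 3 shares the trace cross section of case 2, so its additional loading comes from the bump rather than the trace.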

### B. Cell Library for Different Wire Lengths

For large-scale integration of dies in an SiP, the D2D communication will cover a wide range of wire lengths. It is essential to design the I/O circuit optimized for different ranges of wire lengths to achieve high data rates as well as to minimize energy consumption. We demonstrate the application of our flow for generating I/O cell for delay or energy minimization for different wire lengths.

Table III shows driver/receiver sizes, delay, and energy for various lengths for delay or energy minimization considering the interposer technology from case 2 in Fig. 7. In general, driver size increases with increasing wire lengths for both energy and delay minimizations. Moreover, as expected, driver/receiver sizes are bigger for delay minimization and smaller for energy minimization.

TABLE II I/O Cells for Various Interposer Models (1-mm Wire)

|                         | Delay min., Case 1 | Delay min., Case 2 | Delay min., Case 3 | Energy min., Case 1 | Energy min., Case 2 | Energy min., Case 3 |
|-------------------------|--------------------|--------------------|--------------------|---------------------|---------------------|---------------------|
| TX sizes                | x72                | x59                | x80                | x3                  | x3                  | x3                  |
| RX sizes                | x5                 | x4                 | x5                 | x3                  | x1                  | x1                  |
| Propagation delay [ps]  | 45                 | 43                 | 43                 | 193                 | 164                 | 193                 |
| Energy per bit [pJ/bit] | 0.144              | 0.117              | 0.145              | 0.093               | 0.084               | 0.088               |

TABLE III I/O Cells for Various Wire Lengths (Package Case 2)

|                         | Delay min., 1 mm | Delay min., 5 mm | Delay min., 10 mm | Energy min., 1 mm | Energy min., 5 mm | Energy min., 10 mm |
|-------------------------|------------------|------------------|-------------------|-------------------|-------------------|--------------------|
| TX sizes                | x59              | x79              | x151              | x3                | x12               | x28                |
| RX sizes                | x4               | x5               | x5                | x1                | x1                | x3                 |
| Propagation delay [ps]  | 43               | 69               | 104               | 164               | 192               | 162                |
| Energy per bit [pJ/bit] | 0.117            | 0.451            | 0.814             | 0.084             | 0.337             | 0.639              |

### C. Case Study on an Illustrative SiP

To demonstrate the feasibility of our proposed flow for a large-scale system, we have applied the proposed flow to an illustrative SiP design, as shown in Fig. 8(a). It consists of a CPU, GPU, baseband, and several other modules in a mesh structure. The layout of the interposer routing for the SiP system is generated separately, and different colors represent different metal layers [see Fig. 8(b)]. In this design, two metal layers on top of the interposer are used for the routing. The wire length distribution is shown as a histogram of the interconnections in an interposer layer [see Fig. 8(c)]; it spans a small range of wire lengths because the design does not contain nonneighboring connections.

Traditionally, off-chip I/O cells are designed to match a target impedance ($\sim 50\ \Omega$) to minimize reflection in off-chip wires. Therefore, for comparison, we first designed I/O cells to match a target impedance. Table IV (A) shows the worst delay and average energy of these I/O cells, referred to as the conventional I/O cells. Table IV (B and C) summarizes all the I/O cells that are created with the optimization methods discussed previously. The I/O cells are optimized (for delay or energy) individually for different wire lengths (referred to as "individually optimized I/O"). The worst-case delay and average energy (= total energy of all wires divided by the number of wires) are reported for analysis. Individually optimized I/Os for minimum delay show 13% less worst-case delay and 33% less average energy consumption compared to the conventional I/O. Likewise, individually optimized I/Os for minimum energy show 198% higher worst-case delay but 52% less energy consumption compared to the conventional I/O. Table IV (C) shows the result when only one I/O cell is generated using the proposed flow, considering delay or energy minimization for the maximum wire length, and placed for all wire lengths. We refer to this design as "optimized I/O for longest wire." Using the optimized I/O for longest wire for minimum delay results in 7% less worst-case delay and 36% less average energy consumption compared to the conventional I/O cell. Likewise, when optimized I/O to minimize energy



Fig. 8. (a) Floor plan, (b) interposer routing layout, and (c) wire length distribution of a mesh NOC structure.

TABLE IV I/O Cells for an Illustrative SIP (Interposer Case 2)

|                         | Conventional       | Individual opt | imized I/O (B) | Optimized I/O | for longest wire (C) |
|-------------------------|--------------------|----------------|----------------|---------------|----------------------|
|                         | $I/O(4/\Omega)(A)$ | Delay Min.     | Energy Min.    | Delay Min.    | Energy Min.          |
| TX, RX sizes            | x128, x4           | x55-x82, x5    | x2-x6, x1      | x82, x5       | x6, x1               |
| Worst delay [ps]        | 55                 | 51             | 167            | 51            | 167                  |
| Average energy [pJ/bit] | 0.187              | 0.089          | 0.060          | 0.099         | 0.063                |

dissipation for the longest wire is used, we observe 174% higher worst-case delay but 66% less average energy. In summary, we observe that I/O cells generated by the proposed flow can lower worst-case delay as well as reduce average energy dissipation compared to the conventional I/O.
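The two figures of merit used in Table IV (worst-case delay and average energy, i.e., total energy of all wires divided by the number of wires) can be computed from a wire length histogram and a per-length I/O characterization. The following is a minimal sketch under stated assumptions, not the implementation used in this work: the function name `summarize_io_set`, the histogram, and the delay/energy values are hypothetical placeholders and are not data from Fig. 8(c) or Table IV.

```python
from typing import Dict, Tuple


def summarize_io_set(
    wire_counts: Dict[float, int],
    io_metrics: Dict[float, Tuple[float, float]],
) -> Tuple[float, float]:
    """Return (worst delay [ps], average energy [pJ/bit]) for a set of I/Os.

    wire_counts maps wire length [mm] -> number of wires of that length.
    io_metrics maps wire length [mm] -> (delay [ps], energy [pJ/bit]) of the
    I/O cell assigned to wires of that length.
    """
    total_wires = sum(wire_counts.values())
    if total_wires == 0:
        raise ValueError("wire length distribution is empty")
    # Worst-case delay: slowest I/O among the lengths that actually occur.
    worst_delay = max(
        io_metrics[length][0] for length, count in wire_counts.items() if count > 0
    )
    # Average energy: total energy of all wires divided by the number of wires.
    total_energy = sum(
        count * io_metrics[length][1] for length, count in wire_counts.items()
    )
    return worst_delay, total_energy / total_wires


# Hypothetical histogram and per-length characterization (illustration only).
wire_counts = {0.5: 10, 1.0: 30, 1.5: 10}
io_metrics = {0.5: (40.0, 0.05), 1.0: (45.0, 0.10), 1.5: (50.0, 0.15)}
worst_delay, avg_energy = summarize_io_set(wire_counts, io_metrics)
```

For the "optimized I/O for longest wire" case, the same function applies with `io_metrics` filled from a single cell characterized at each wire length.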

## D. Structure of Receivers

So far, we have considered only single-ended receiver designs for I/Os and adjusted driver/receiver sizes for minimum delay and energy. A single-ended receiver has a small area and energy but is vulnerable to noise and PVT variations. On the other hand, a differential receiver is robust to noise and PVT variations but has a larger area and energy consumption. We add a differential receiver to the I/O generation flow so that our tool can select single-ended or differential receivers (see Fig. 9). The single-ended receiver is a chain of inverters and thus requires a full-swing signal as input. In contrast, the differential receiver can accept a low-swing signal as input. We set a 90% voltage swing constraint at the receiver input for the single-ended receiver and 40% for the differential receiver. These constraints cause the two receivers to behave differently in propagation delay, energy, and area. In this section, we use our flow to analyze the propagation delay, energy, area, and reach (i.e., maximum wire length supported) of I/O circuits with single-ended drivers and either single-ended or differential receivers. Given a wire length distribution, our flow suggests a methodology to choose the optimal receiver design for each I/O depending on the wire lengths in an SiP design.

1) I/Os With Fixed Driver Sizes: We first consider a design where the size of the driver is fixed for all I/O cells in an SiP. This saves design cost and effort for a large design. However, a fixed-size driver can only drive a single-ended signal up to a maximum wire length, as the voltage swing of the signal at the input of the receiver reduces as wires get longer. The I/O circuits with differential receivers can correctly detect input signals with much lower voltage swing than the I/Os with single-ended receivers. Hence, for a given driver size, the I/O circuits with differential receivers can drive much longer wires than the I/Os with single-ended receivers (see Fig. 10). The maximum wire length for both single-ended and differential receivers depends on the package properties. The case 1 package is more resistive than case 2, so for a given driver size, the maximum drivable wire length for the case 1 interposer is smaller than that for the case 2 interposer. For a given driver size and wire length, the delay and energy of I/Os with single-ended and differential receivers are nearly the same (0%–4.8% and 4%–7.5% differences, respectively) (see Table V). Therefore, we can use I/Os with single-ended receivers for shorter wires (up to a maximum wire length) and differential receivers for longer wires.

![](_page_6_Figure_9.jpeg)

Fig. 9. (a) Single-ended and (b) differential receiver circuits.

![](_page_6_Figure_11.jpeg)

Fig. 10. Maximum wire length that can be driven with single-ended and differential receivers on (a) case 2 and (b) case 1 interposers.

TABLE V Delay/Energy/Area of I/O with Single-Ended/Differential Receiver for Given Driver Sizes

| Driver size | Delay [ps], single-ended | Delay [ps], differential | Energy [pJ/bit], single-ended | Energy [pJ/bit], differential | Area [$\mu m^2$], single-ended | Area [$\mu m^2$], differential |
|-------------|--------------------------|--------------------------|-------------------------------|-------------------------------|--------------------------------|--------------------------------|
| x5          | 123                      | 129                      | 0.089                         | 0.093                         | 2.268                          | 4.002                          |
| x10         | 84                       | 84                       | 0.088                         | 0.093                         | 2.898                          | 4.632                          |
| x15         | 71                       | 71                       | 0.096                         | 0.101                         | 4.914                          | 6.648                          |
| x20         | 64                       | 61                       | 0.095                         | 0.103                         | 5.418                          | 7.152                          |

Fig. 11(a) shows a chipletized design of a generic SoC, including CPU and GPU. The layout of interposer routing for the SiP system is generated separately, and different colors represent different metal layers [see Fig. 11(b)]. In this design, three metal layers on top of the interposer are used for routing. The wire length distribution is shown as a histogram of the interconnections in an interposer layer [see Fig. 11(c)]. As this design contains nonneighboring connections, it has a large range of wire lengths ( $\sim 6$  mm) compared to the design in Fig. 8(a), so using only single-ended receivers would require strong drivers with large energy and area. The maximum wire length of the differential receiver with an x5 driver (7 mm) is longer than the longest wire in the distribution (6 mm), so an x5 driver can be used for all wire lengths. The maximum wire length of the single-ended receiver with an x5 driver is 1 mm, so the single-ended receiver is used for 1-mm wires and the differential receiver is used for 2–6-mm wires. This set of I/Os has a 306-ps worst delay, 0.115-pJ/bit average energy consumption, and 22.3-$\mu m^2$ area.
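The receiver assignment for a fixed driver size can be sketched as a simple reach check. The reach values below (1 mm for the single-ended receiver and 7 mm for the differential receiver with an x5 driver) are taken from the text; the function name `select_receiver` and the 1-mm wire length bins are assumptions made for illustration only.

```python
def select_receiver(
    wire_length_mm: float,
    single_ended_reach_mm: float,
    differential_reach_mm: float,
) -> str:
    """Pick the receiver type for one wire when the driver size is fixed."""
    # Prefer the single-ended receiver wherever the driver can still meet
    # its voltage swing constraint.
    if wire_length_mm <= single_ended_reach_mm:
        return "single-ended"
    # Otherwise fall back to the differential receiver, which tolerates a
    # lower swing and therefore reaches longer wires.
    if wire_length_mm <= differential_reach_mm:
        return "differential"
    raise ValueError(f"{wire_length_mm} mm exceeds the reach of this driver size")


# Reach of the x5 driver as stated in the text.
X5_SINGLE_ENDED_REACH_MM = 1.0
X5_DIFFERENTIAL_REACH_MM = 7.0

# Assumed 1-mm bins covering the 1-6 mm range of the distribution.
assignment = {
    length_mm: select_receiver(
        length_mm, X5_SINGLE_ENDED_REACH_MM, X5_DIFFERENTIAL_REACH_MM
    )
    for length_mm in range(1, 7)
}
```

With these inputs, the 1-mm bin maps to the single-ended receiver and the 2–6-mm bins map to the differential receiver, matching the assignment described above.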

2) Energy-Minimized I/Os: The energy of the I/O for a given wire is proportional to the size of the driver, so the minimum driver size that satisfies the voltage swing constraint at the receiver input may achieve both energy and area minimization. Fig. 12(a) and (b) shows the area of I/Os with single-ended or differential receivers and minimum drivers for each wire length. When the wire is short, the I/O with the single-ended receiver is smaller than the I/O with the differential receiver. This is because the area of a differential receiver (dark red) is bigger than the area of a single-ended receiver (dark blue). However, as the wire becomes longer, the size of the driver for the single-ended receiver (light blue) grows faster than that for the differential receiver (light red) because of the larger voltage swing constraint. Therefore, the I/O with the single-ended receiver occupies a larger area than the I/O with the differential receiver for long wires. On the other hand, for all wire lengths, the I/O with the differential receiver has a longer delay (25%–105%) and less energy consumption (6%–70%) compared to the I/O with the single-ended receiver [see Fig. 12(c) and (d)]. This is because I/Os with differential receivers always have a smaller driver size, resulting in longer delay and smaller energy consumption.

![](_page_7_Figure_6.jpeg)

Fig. 11. (a) Floor plan, (b) interposer routing layout, and (c) wire length distribution of a chipletized generic SoC.

The critical wire length beyond which the I/O with a single-ended receiver becomes larger than the I/O with a differential receiver varies with the interposer design [see Fig. 12(a) and (b)]. Due to the higher wire resistance, the critical wire length for the case 1 interposer is shorter than that for case 2.

Consider the wire length distribution in Fig. 11 again. Given a wire length distribution, we now have three approaches to design energy-minimized I/O circuits (see Table VI).

- 1) All I/Os with single-ended receivers and corresponding energy-minimized drivers. This set of I/Os decreases the worst delay because the single-ended receiver always has a smaller delay than the differential receiver.
- 2) All I/Os with differential receivers and corresponding energy-minimized drivers. In this case, the average energy is reduced because the differential receiver always has smaller energy consumption.
- 3) A mix of I/Os with single-ended receivers and I/Os with differential receivers, each with corresponding energy-minimized drivers. I/Os can have a single-ended receiver for a range of short wires and a differential receiver for longer wires, which leads to area reduction.

![](_page_8_Figure_1.jpeg)

Fig. 12. (a) and (b) Area of driver and receiver for case 2 and case 1 interposers, respectively. (c) Propagation delay and (d) energy of I/O with single-ended and differential receivers for several wire lengths.

TABLE VI I/O Cells With All Single-Ended, All Differential, and Mix of Single-Ended and Differential Receiver for a Wire Length Distribution

|                         | All single-ended | All differential | Single-ended & differential |
|-------------------------|------------------|------------------|-----------------------------|
| Driver sizes            | x3–x16           | x2–x5            | x3–x5                       |
| Worst delay [ps]        | <b>189</b>       | 315              | 315                         |
| Average energy [pJ/bit] | 0.170            | <b>0.107</b>     | 0.152                       |
| Area $[\mu m^2]$        | 18.1             | 21.9             | <b>18.0</b>                 |

In summary, worst delay, average energy, or area can be decreased by choosing single-ended or differential receiver for each length of wires.
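The per-length receiver choice summarized above can be sketched as a small selection routine. This is an illustrative sketch rather than the flow itself: the type `IOMetrics`, the function `choose_receivers`, and all numeric values are hypothetical, chosen only to mirror the reported trends (single-ended is always faster, differential always consumes less energy, and the area ordering flips beyond a critical wire length).

```python
from typing import Dict, NamedTuple


class IOMetrics(NamedTuple):
    delay_ps: float
    energy_pj_per_bit: float
    area_um2: float


def choose_receivers(
    single_ended: Dict[float, IOMetrics],
    differential: Dict[float, IOMetrics],
    objective: str,
) -> Dict[float, str]:
    """For each wire length [mm], pick the receiver with the smaller objective."""
    if objective not in IOMetrics._fields:
        raise ValueError(f"unknown objective: {objective}")
    choice = {}
    for length_mm, se_metrics in single_ended.items():
        se_value = getattr(se_metrics, objective)
        diff_value = getattr(differential[length_mm], objective)
        choice[length_mm] = "single-ended" if se_value <= diff_value else "differential"
    return choice


# Hypothetical energy-minimized characterization per wire length (not measured data).
single_ended = {
    1.0: IOMetrics(100.0, 0.10, 2.0),
    3.0: IOMetrics(150.0, 0.16, 4.0),
    6.0: IOMetrics(190.0, 0.25, 8.0),
}
differential = {
    1.0: IOMetrics(130.0, 0.09, 3.5),
    3.0: IOMetrics(220.0, 0.11, 3.8),
    6.0: IOMetrics(315.0, 0.13, 4.5),
}

by_area = choose_receivers(single_ended, differential, "area_um2")
by_delay = choose_receivers(single_ended, differential, "delay_ps")
by_energy = choose_receivers(single_ended, differential, "energy_pj_per_bit")
```

Under these assumed numbers, minimizing area yields a mix of receivers, minimizing delay yields all single-ended receivers, and minimizing energy yields all differential receivers, which corresponds to the three approaches in Table VI.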

## E. Heterogeneous Signaling

The ability to signal between dies with different supply voltages or different technologies is one of the most important advantages of 2.5-D SiP integration. I/O design for heterogeneous integration should also take into account the supply voltages and technologies of the two dies to achieve minimum delay or energy in the interconnect. Therefore, our automated I/O generation flow shows more benefit for heterogeneous integration. In this section, we present I/Os for signaling between two dies in 28- and 180-nm technologies with 0.9- and 1.8-V supply voltages, respectively, as an example. Fig. 13 shows two scenarios for heterogeneous signaling. Fig. 13(a) uses low-voltage (0.9 V) signaling from the driver to the interconnect and shifts to high voltage (1.8 V) at I/O 2 (180 nm). Notice that the differential receiver shown in Fig. 9 can also behave as a level shifter, so an additional level shifter is not required at the slave. On the other hand, Fig. 13(b) uses high-voltage (1.8 V) signaling from the driver to the interconnect and shifts to low voltage (0.9 V) using the differential receiver at I/O 1 (28 nm). We do not consider signaling voltages other than 0.9 or 1.8 V since they require level shifters at the driver inputs of both I/Os and result in larger delay and energy consumption.

![](_page_8_Figure_9.jpeg)

Fig. 13. Two scenarios of heterogeneous signaling: (a) low-voltage signaling and (b) high-voltage signaling at the interconnect.

TABLE VII I/O Cells for Heterogeneous Signaling Between 28- and 180-nm Dies

|                       | Low-V signaling, delay min. | Low-V signaling, energy min. | High-V signaling, delay min. | High-V signaling, energy min. |
|-----------------------|-----------------------------|------------------------------|------------------------------|-------------------------------|
| TX sizes (28 nm)      | x85                         | x11                          | x2 (HV)                      | x2 (HV)                       |
| TX sizes (180 nm)     | x38                         | x9                           | x38                          | x3                            |
| Worst delay [ps]      | 1023                        | 1199                         | 977                          | 977                           |
| Worst energy [pJ/bit] | 0.209                       | 0.157                        | 0.629                        | 0.629                         |

HV: high-voltage device.

Table VII presents the worst delay and energy of delay-/energy-minimized I/Os for heterogeneous integration between 28- and 180-nm dies. Low-voltage signaling in the interconnect [see Fig. 13(a)] results in smaller worst energy consumption but larger worst delay because driver 2 (180 nm) operates at low voltage (0.9 V) when the signal goes from the 180-nm die to the 28-nm die. On the other hand, high-voltage signaling [see Fig. 13(b)] results in larger worst energy but smaller worst delay because driver 1 (28 nm) uses high-voltage devices. Therefore, in heterogeneous integration, energy-minimized I/Os should use low-voltage signaling, and delay-minimized I/Os should use high-voltage signaling.
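The choice of signaling voltage can be expressed as a lookup over the values in Table VII. The sketch below uses the worst delay and worst energy numbers from the table; the dictionary layout and the function name `best_signaling` are assumptions introduced for illustration.

```python
# Worst delay [ps] and worst energy [pJ/bit] from Table VII, keyed by
# (signaling scheme, optimization objective).
TABLE_VII = {
    ("low-voltage", "delay"): {"worst_delay_ps": 1023, "worst_energy_pj_per_bit": 0.209},
    ("low-voltage", "energy"): {"worst_delay_ps": 1199, "worst_energy_pj_per_bit": 0.157},
    ("high-voltage", "delay"): {"worst_delay_ps": 977, "worst_energy_pj_per_bit": 0.629},
    ("high-voltage", "energy"): {"worst_delay_ps": 977, "worst_energy_pj_per_bit": 0.629},
}


def best_signaling(objective: str) -> str:
    """Return the signaling scheme with the smaller worst-case metric.

    objective is "delay" or "energy"; only the I/Os optimized for that
    objective are compared.
    """
    if objective not in ("delay", "energy"):
        raise ValueError(f"unknown objective: {objective}")
    metric = "worst_delay_ps" if objective == "delay" else "worst_energy_pj_per_bit"
    candidates = {
        scheme: row[metric]
        for (scheme, optimized_for), row in TABLE_VII.items()
        if optimized_for == objective
    }
    return min(candidates, key=candidates.get)


delay_choice = best_signaling("delay")
energy_choice = best_signaling("energy")
```

The lookup selects high-voltage signaling for delay minimization (977 ps versus 1023 ps) and low-voltage signaling for energy minimization (0.157 pJ/bit versus 0.629 pJ/bit).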

## F. Cell Library With ESD Protection

Transistor-based ESD protection guards ICs against a sudden flow of electricity. The delay-/energy-minimized I/O cells with and without ESD protection are shown in Table VIII.

TABLE VIII I/O Cells Without and With ESD Protection (Interposer Case 2, 1 mm)

|                         | Delay min., w/o ESD | Delay min., w/ ESD | Energy min., w/o ESD | Energy min., w/ ESD |
|-------------------------|---------------------|--------------------|----------------------|---------------------|
| TX, RX sizes            | x59, x4             | x68, x5            | x3, x1               | x3, x1              |
| Propagation delay [ps]  | 43                  | 44                 | 164                  | 164                 |
| Energy per bit [pJ/bit] | 0.117               | 0.125              | 0.084                | 0.087               |

As the ESD protection increases the load capacitance, I/O with ESD protection requires bigger driver/receiver sizes for delay minimization. On the other hand, driver/receiver sizes for minimum energy are the same, but I/O with ESD protection consumes more energy.
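The overhead of ESD protection can be quantified directly from Table VIII. The following sketch computes the relative increase in propagation delay and energy per bit; the table values are from this article, while the dictionary layout and the function name `esd_overhead_percent` are illustrative assumptions.

```python
# (propagation delay [ps], energy per bit [pJ/bit]) from Table VIII.
TABLE_VIII = {
    "delay_min": {"without_esd": (43, 0.117), "with_esd": (44, 0.125)},
    "energy_min": {"without_esd": (164, 0.084), "with_esd": (164, 0.087)},
}


def esd_overhead_percent(objective: str) -> tuple:
    """Return (delay overhead [%], energy overhead [%]) of adding ESD protection."""
    delay_without, energy_without = TABLE_VIII[objective]["without_esd"]
    delay_with, energy_with = TABLE_VIII[objective]["with_esd"]
    delay_overhead = 100.0 * (delay_with - delay_without) / delay_without
    energy_overhead = 100.0 * (energy_with - energy_without) / energy_without
    return delay_overhead, energy_overhead


delay_min_overhead = esd_overhead_percent("delay_min")
energy_min_overhead = esd_overhead_percent("energy_min")
```

For the delay-minimized cell, this gives roughly 2.3% more delay and 6.8% more energy; for the energy-minimized cell, the delay is unchanged and the energy increases by roughly 3.6%.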

## VI. CONCLUSION

This article presents an automated flow for generating all-digital I/O library cells for large-scale 2.5-D SiP integration. Given a 2.5-D packaging (interposer) technology, our flow automatically generates the I/O layout and timing/power library with the objective of minimizing delay or energy. It takes 7.9 min to generate one delay-/energy-minimized I/O library for a given interposer technology/wire length. Our flow includes chip-interposer cosimulation to consider the inductive property of on-interposer wires and, at the same time, minimizes communication delay/energy, similar to buffer design/insertion for on-chip signaling. We demonstrate our flow for various wire lengths, package dimensions, and ESD protection. We also present case studies of our flow on various SiP designs to show its feasibility. We first apply our flow to generate I/O cells for an illustrative SiP design with a mesh structure. The generated I/O cells show better delay/energy characteristics than the traditional impedance-matched I/O, and a delay-/energy-minimizing design methodology for I/Os in large SiP designs is suggested. Our flow provides both single-ended and differential receiver options, and we propose a design methodology for I/Os in large SiP designs with nonneighboring connections that uses both receivers to meet the design goal. We also show that our flow generates delay-/energy-minimized I/Os for heterogeneous signaling between 28- and 180-nm dies.

Interposer-based SiP integration is gaining traction in many industrial designs. There has been a significant recent effort in developing standards for on-interposer signaling, for example, Intel's AIB [9]. Our proposed flow can integrate with such emerging standards to enable automated I/O design for on-interposer wires. In addition, I/O cells generated by our electronic design automation (EDA) flow can be easily integrated with the EDA flow for full-chip design. For example, Kim *et al.* [31] have adopted hard-macro I/O cells generated by our flow and merged them into the EDA flow for full 2.5-D IC design.

In this article, we demonstrate experimental results based on delay or energy minimization as cost functions, motivated by on-chip signaling. Considering cost functions beyond energy and/or delay minimization, such as impedance matching or the area of I/O cells, might be valuable in future work. Moreover, a codesign of I/O cells and interposer dimensions may provide a more holistic design solution for SiPs.

#### REFERENCES

- [1] G. Gielen *et al.*, "Emerging yield and reliability challenges in nanometer CMOS technologies," in *Proc. Design, Automat. Test Eur.*, Mar. 2008, pp. 1322–1327.
- [2] G. Van der Plas *et al.*, "Design issues and considerations for low-cost 3-D TSV IC technology," *IEEE J. Solid-State Circuits*, vol. 46, no. 1, pp. 293–307, Jan. 2011.
- [3] K. Parat and C. Dennison, "A floating gate based 3D NAND technology with CMOS under array," in *IEDM Tech. Dig.*, Dec. 2015, pp. 3.3.1–3.3.4.
- [4] T. Fukushima *et al.*, "New heterogeneous multi-chip module integration technology using self-assembly method," in *IEDM Tech. Dig.*, Dec. 2008, pp. 1–4.
- [5] I. Bolsens, "Pushing the boundaries of Moore's Law to transition from FPGA to all programmable platform," in *Proc. ACM Int. Symp. Phys. Design*, Mar. 2017, p. 23.
- [6] J. Wang, S. Ma, P. D. S. Manoj, M. Yu, R. Weerasekera, and H. Yu, "High-speed and low-power 2.5D I/O circuits for memory-logic-integration by through-silicon interposer," in *Proc. IEEE Int. 3D Syst. Integr. Conf. (3DIC)*, Oct. 2013, pp. 1–4.
- [7] K. Saban, "Xilinx stacked silicon interconnect technology delivers breakthrough FPGA capacity, bandwidth, and power efficiency," Xilinx, San Jose, CA, USA, Tech. Rep. WP380, 2011.
- [8] E. McGill, GLOBALFOUNDRIES Demonstrates 2.5D High-Bandwidth Memory Solution for Data Center, Networking, and Cloud Applications. Santa Clara, CA, USA: GlobalFoundries, 2017.
- [9] M. Deo, "Enabling next-generation platforms using Intel's 3D system-in-package technology," White Paper, 2017.
- [10] D. Kehlet, "Accelerating innovation through a standard chiplet interface: The advanced interface bus (AIB)," Intel Technol., Santa Clara, CA, USA, White Paper WP-01285-1.1, 2017. Accessed: Nov. 27, 2019.
- [11] H. Zhang, V. George, and J. M. Rabaey, "Low-swing on-chip signaling techniques: Effectiveness and robustness," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 8, no. 3, pp. 264–272, Jun. 2000.
- [12] K. Chandrasekar, D. Oh, and A. Rahman, "Timing analysis for wide IO memory interface applications with silicon interposer," in *Proc. IEEE Int. Symp. Electromagn. Compat. (EMC)*, Aug. 2014, pp. 46–51.
- [13] S.-K. Lee, S.-H. Lee, D. Sylvester, D. Blaauw, and J.-Y. Sim, "A 95 fJ/b current-mode transceiver for 10 mm on-chip interconnect," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2013, pp. 262–263.
- [14] V. Raghunathan, M. B. Srivastava, and R. K. Gupta, "A survey of techniques for energy efficient on-chip communication," in *Proc. Design Automat. Conf.*, Jun. 2003, pp. 900–905.
- [15] J. Bae, J.-Y. Kim, and H.-J. Yoo, "A 0.6pJ/b 3Gb/s/ch transceiver in 0.18 μm CMOS for 10 mm on-chip interconnects," in *Proc. IEEE Int. Symp. Circuits Syst.*, May 2008, pp. 2861–2864.
- [16] R. Ho *et al.*, "High-speed and low-energy capacitively-driven on-chip wires," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2007, pp. 412–612.
- [17] J.-S. Seo, R. Ho, J. Lexau, M. Dayringer, D. Sylvester, and D. Blaauw, "High-bandwidth and low-energy on-chip signaling with adaptive preemphasis in 90nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2010, pp. 182–183.
- [18] D. Walter *et al.*, "A source-synchronous 90 Gb/s capacitively driven serial on-chip link over 6 mm in 65 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2012, pp. 180–182.
- [19] B. Sawyer, B. C. Chou, S. Gandhi, J. Mateosky, V. Sundaram, and T. Tummala, "Modeling, design, and demonstration of 2.5D glass interposers for 16-channel 28Gbps signaling applications," in *Proc. IEEE 65th Electron. Compon. Technol. Conf. (ECTC)*, May 2015, pp. 2188–2192.
- [20] V. Sundaram, Q. Chen, Y. Suzuki, G. Kumar, F. Liu, and R. Tummala, "Low-cost and low-loss 3D silicon interposer for high bandwidth logic-to-memory interconnections without TSV in the logic IC," in *Proc. IEEE 62nd Electron. Compon. Technol. Conf.*, May 2012, pp. 292–297.

- [21] S.-H. Lee, S.-K. Lee, B. Kim, H.-J. Park, and J.-Y. Sim, "Current-mode transceiver for silicon interposer channel," *IEEE J. Solid-State Circuits*, vol. 49, no. 9, pp. 2044–2053, Sep. 2014.
- [22] W. S. Liao *et al.*, "3D IC heterogeneous integration of GPS RF receiver, baseband, and DRAM on CoWoS with system BIST solution," in *Proc. Symp. VLSI Circuits*, Jun. 2013, pp. C18–C19.
- [23] M.-S. Lin *et al.*, "An extra low-power 1Tbit/s bandwidth PLL/DLL-less eDRAM PHY using 0.3V low-swing IO for 2.5D CoWoS application," in *Proc. Symp. VLSI Technol.*, Jun. 2013, pp. C16–C17.
- [24] S. Manoj P. D., H. Yu, H. Huang, and D. Xu, "A Q-learning based self-adaptive I/O communication for 2.5D integrated many-core microprocessor and memory," *IEEE Trans. Comput.*, vol. 65, no. 4, pp. 1185–1196, Apr. 2016.
- [25] Y. Jeon, H. Kim, J. Kim, and M. Je, "Design of an on-silicon-interposer passive equalizer for next generation high bandwidth memory with data rate up to 8 Gb/s," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 65, no. 7, pp. 2293–2303, Jul. 2018.
- [26] M. Lee *et al.*, "Automated generation of all-digital I/O library cells for system-in-package integration of multiple dies," in *Proc. IEEE 27th Conf. Elect. Perform. Electron. Packag. Syst. (EPEPS)*, Oct. 2018, pp. 65–67.
- [27] M. Lee *et al.*, "On the design of energy-efficient I/O circuits for interposer-based 2.5D system-in-package," in *Proc. IEEE SOI-3D-Subthreshold Microelectron. Technol. Unified Conf. (S3S)*, Oct. 2018, pp. 1–3.
- [28] H. M. Torun, M. Larbi, and M. Swaminathan, "A Bayesian framework for optimizing interconnects in high-speed channels," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Aug. 2018, pp. 1–4.
- [29] D. K. Duvenaud, H. Nickisch, and C. E. Rasmussen, "Additive Gaussian processes," in *Proc. Adv. Neural Inf. Process. Syst. (NIPS)*, 2011, pp. 226–234.
- [30] H. M. Torun and M. Swaminathan, "High-dimensional global optimization method for high-frequency electronic design," *IEEE Trans. Microw. Theory Techn.*, vol. 67, no. 6, pp. 2128–2142, Jun. 2019.
- [31] J. Kim *et al.*, "Architecture, chip, and package co-design flow for 2.5D IC design enabling heterogeneous IP reuse," in *Proc. 56th Annu. Design Automat. Conf.*, 2019, pp. 178:1–178:6.

![](_page_10_Picture_12.jpeg)

**Hakki Mert Torun** (S'15) received the B.Sc. degree in electrical and electronics engineering from Bilkent University, Ankara, Turkey, in 2016, and the M.S. degree in electrical engineering from Georgia Institute of Technology (Georgia Tech), Atlanta, GA, USA, in 2019, where he is currently pursuing the Ph.D. degree with the School of Electrical and Computer Engineering.

His research interests include developing machine learning models and algorithms for system-level design optimization and modeling with the applications in signal and power integrity in high-speed channels, microwave electronics, and VLSI systems.

Mr. Torun was a recipient of the Best Student Paper Award of the IEEE 27th Conference on Electrical Performance of Electronic Packaging and Systems (EPEPS) in 2018.

![](_page_10_Picture_17.jpeg)

**Jinwoo Kim** (S'19) received the B.S. degree in electrical and computer engineering and the M.S. degree in electrical engineering and computer science from Seoul National University, Seoul, South Korea, in 2011 and 2013, respectively. He is currently pursuing the Ph.D. degree with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA.

His current research interests include interposer-based 2.5-D IC design and coanalysis, and 3-D IC design methodology.

![](_page_10_Picture_20.jpeg)

**Minah Lee** (S'18) received the joint B.S. degree in creative IT engineering and electrical engineering from the Pohang University of Science and Technology (POSTECH), Pohang, South Korea, in 2017. She is currently pursuing the Ph.D. degree in electrical and computer engineering with the Georgia Institute of Technology, Atlanta, GA, USA.

Her current research interests include energy-efficient and robust edge device with embedded deep learning.

Ms. Lee's works have been nominated for the Best Student Paper Award of the IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S) and the IEEE Electrical Performance of Electronic Packaging and Systems (EPEPS) in 2018.

![](_page_10_Picture_25.jpeg)

**Arvind Singh** (S'15) received the B.S. and M.S. degrees in electrical engineering from IIT Kanpur, Kanpur, India, in 2010. He is currently pursuing the Ph.D. degree in electrical and computer engineering with the Georgia Institute of Technology, Atlanta, GA, USA, under the supervision of Prof. S. Mukhopadhyay.

From 2010 to 2014, he was with the ASIC Design Team, NVIDIA, Bengaluru, India, where he was involved in tapeouts of multiple generations of Tegra and graphics processors. He was an Intern with the Circuits Research Labs, Intel, Hillsboro, OR, USA, and the ASIC/VLSI Research Group, Qualcomm, San Diego, CA, USA, in the summer of 2016 and 2017, respectively. His current research interests include hardware security, side-channel attacks, lightweight cryptography, and secure and energy-efficient architectures.

![](_page_10_Picture_29.jpeg)

**Sung Kyu Lim** (SM'05) received the B.S., M.S., and Ph.D. degrees from the Computer Science Department, University of California at Los Angeles (UCLA), Los Angeles, CA, USA, in 1994, 1997, and 2000, respectively.

In 2001, he joined the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA, where he is currently a Professor. His research is featured as Research Highlight in the *Communications of the ACM* in January 2014. He is the author of *Practical Problems in VLSI Physical Design Automation* (Springer, 2008) and *Design for High Performance, Low Power, and Reliable 3D Integrated Circuits* (Springer, 2013). He has published more than 300 articles on 2.5-D and 3-D ICs. He has been leading two projects (CHIPS and 3DSOC) under the DARPA Electronics Resurgence Initiative (ERI) since 2017. His research focus is on the architecture, design, test, and electronic design automation (EDA) solutions for 2.5-D and 3-D ICs.

Dr. Lim received the National Science Foundation Faculty Early Career Development (CAREER) Award in 2006, the ACM SIGDA Distinguished Service Award in 2008, and the Best Paper Award from the 2012 IEEE Asian Test Symposium (ATS), the 2014 IEEE International Interconnect Technology Conference (IITC), and the 2017 IEEE Electrical Design of Advanced Packaging and Systems Symposium (EDAPS). His works have been nominated for the best paper award at several top venues in EDA and circuit/packaging design. He was an Associate Editor of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS from 2007 to 2019 and the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS from 2013 to 2018.

![](_page_11_Picture_1.jpeg)

**Madhavan Swaminathan** (F'06) received the M.S. and Ph.D. degrees in electrical engineering from Syracuse University, Syracuse, NY, USA, in 1989 and 1991, respectively.

He was with IBM working on packaging for supercomputers. He was the Founding Director of the Center for Co-Design of Chip, Package, System (C3PS), the Joseph M. Pettit Professor of Electronics in ECE and the Deputy Director of the Packaging Research Center (NSF ERC), Georgia Institute of Technology (GT), Atlanta, GA, USA. He is currently the John Pippin Chair in Microsystems Packaging & Electromagnetics with the School of Electrical and Computer Engineering (ECE), a Professor of ECE with a joint appointment in the School of Materials Science and Engineering (MSE), and the Director of the 3D Systems Packaging Research Center (PRC), GT. He also serves as the Site Director for the NSF Center for Advanced Electronics Through Machine Learning (CAEML) and the Theme Leader for Heterogeneous Integration, SRC JUMP ASCENT Center. He is also the founder and a co-founder of two start-up companies. He is the author of more than 500 refereed technical publications and the primary author and a co-editor of three books. He holds 30 patents.

Dr. Swaminathan's research has been recognized with 22 best paper and best student paper awards. His most recent awards include the Distinguished Alumnus Award from the National Institute of Technology Tiruchirappalli (NITT), Tiruchirappalli, India, in 2014, the Outstanding Sustained Technical Contribution Award from the IEEE Components, Packaging, and Manufacturing Technology Society in 2014, the Georgia Tech Outstanding Achievement in Research Program Development Award in 2017, and the D. Scott Wills ECE Distinguished Mentor Award in 2018. He has served as the Distinguished Lecturer for the IEEE EMC Society. He is also the Founder of the IEEE Conference Electrical Design of Advanced Packaging and Systems (EDAPS), a premier conference sponsored by the EP Society.

![](_page_11_Picture_6.jpeg)

**Saibal Mukhopadhyay** (S'99–M'07–SM'11–F'18) received the B.E. degree in electronics and telecommunication engineering from Jadavpur University, Kolkata, India, in 2000, and the Ph.D. degree in electrical and computer engineering from Purdue University, West Lafayette, IN, USA, in 2006.

He is currently the Joseph M. Pettit Professor with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA. He has authored or coauthored over 200 articles in refereed journals and conferences. He holds five U.S. patents. His research interests include the design of energy-efficient, intelligent, and secure systems. His research explores a cross-cutting approach to design spanning algorithm, architecture, circuits, and emerging technologies.

Dr. Mukhopadhyay was a recipient of the Office of Naval Research Young Investigator Award in 2012, the National Science Foundation CAREER Award in 2011, the IBM Faculty Partnership Awards in 2009 and 2010, the SRC Inventor Recognition Award in 2008, the SRC Technical Excellence Award in 2005, and the IBM PhD Fellowship Award for the years 2004–2005. He received the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS Best Paper Award in 2014, the IEEE TRANSACTIONS ON COMPONENTS, PACKAGING, AND MANUFACTURING TECHNOLOGY Best Paper Award in 2014, and multiple best paper awards in the International Symposium on Low Power Electronics and Design from 2014 to 2016.