# Voltage Boosted Fail Detecting Circuit for Selective Write Assist and Cell Current Boosting for High-Density Low-Power SRAM

Jaehyun Park, Sangheon Lee, and Hanwool Jeong, Member, IEEE

Abstract—In low supply voltage ( $V_{DD}$ ), the wordline (WL) underdrive read assist and the negative bitline (NBL) write assist circuits are widely used for stable operation of SRAM. However, NBL consumes enormous energy and WL underdrive read assist degrades performance. A voltage-boosted fail-detecting circuit for selective write assist (VBFD-SWA) and selective cell current boosting (VBFD-SCCB) is proposed for high-density low-power FinFET static RAM (SRAM). VBFD-SWA detects the bitline status and then selectively triggers NBL only when a write failure is detected, reducing the write energy consumption. VBFD-SCCB detects slow bitcell and selectively triggers cell current boosting to improve read performance. The simulation results show that the write energy consumption is improved by 19% and the read performance is improved by 36% by applying the VBFD-SWA and VBFD-SCCB, respectively, with an area overhead of 10%.

Index Terms—Low-power static RAM (SRAM), SRAM write-assist circuit, SRAM read-assist circuit.

## I. INTRODUCTION

AS THE demand for wearable devices and the Internet of Things (IoT) grows, the importance of low-power, high-performance systems-on-chip (SoCs) is increasing. Static random-access memory (SRAM) is one of the main components of an SoC. One simple way to achieve low-power SRAM is lowering the supply voltage ( $V_{DD}$ ); however, lowering  $V_{DD}$  degrades stability and performance. Recently, it has become common to use assist circuits for read and write operations to maintain the stability and performance of SRAM at low  $V_{DD}$ . The assist circuits dynamically adjust the bitcell control signals to make the bitcell robust to operation failures.

The read-assist circuits are divided into column-based assist circuits and row-based assist circuits. The row-based circuit is the wordline (WL) underdrive circuit. The column-based assist circuits, such as boosting cell  $V_{DD}$  (CVDD) or cell  $V_{SS}$  (CVSS) [1], cause a large dynamic power overhead.

Manuscript received 8 June 2022; revised 29 September 2022 and 27 October 2022; accepted 30 November 2022. Date of publication 8 December 2022; date of current version 30 January 2023. This work was supported in part by the Research Grant of Kwangwoon University, in 2021; and in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education under Grant 2018R1A6A1A03025242. This article was recommended by Associate Editor C. Wang. (*Corresponding author: Hanwool Jeong.*)

The authors are with the Department of Electronic Engineering, Kwangwoon University, Seoul 01897, South Korea (e-mail: hwjeong@kw.ac.kr).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSI.2022.3226464.

Digital Object Identifier 10.1109/TCSI.2022.3226464

On the other hand, the row-based WL underdrive read assist circuit (WLUD-RA) incurs relatively little power overhead and also relieves the row half-selected cell (RHSC) issue. For those reasons, WLUD-RA is most frequently used [2], [3], [4]. However, WLUD-RA degrades the write ability and read performance because the reduced WL voltage weakens the pass-gate (PG) transistor strength, especially at low V<sub>DD</sub>.

WL overdrive (WLOD) [5], [6], transient voltage collapse (TVC) [7], [8], [9], raising CVSS [10], [11], and negative bit-line (NBL) [12], [13], [14], [15] are used for write assist techniques. Although TVC and NBL effectively increase the write ability without harming the half-selected bitcells, they inevitably incur large power consumption. This is because TVC results in massive pull-down current to change CVDD node with a large capacitance, and NBL normally uses a huge coupling capacitor to change bitline (BL) with a large capacitance.

To reduce the write assist power, the selective negative bitline write assist (SNBL-WA) technique [12] was proposed. However, write failure detection takes too much time because a static CMOS inverter is used as the write failure detector. To avoid this problem, a complex BL precharge conditioning circuit is used, but the additional BL conditioning makes the circuit more complex and susceptible to noise. In [16], the conditional biasing WA (CBWA) is proposed, which utilizes a floating CVSS for the write assist to decrease the power overhead. However, it takes a significant amount of time to raise CVSS, because the large CVSS capacitance must be charged by weak bitcell transistors. Thus, it is highly challenging to achieve the target write yield without speed degradation.

The write- and performance-assist cell (W- and P-AC) circuit [17] restores BL and WL during write and read operations, respectively. These circuits increase the write ability by reducing the effects of interconnect parasitic resistance. However, compared to the write assist circuits that directly boost BL or collapse the supply voltage, the write ability enhancement is highly limited. In [18], [19], and [20], charge recycling or redistribution techniques are used to reduce the power overhead of the write assist. However, in [18] and [19], the CVSS boosting during the write operation inevitably degrades the half-selected bitcell stability, while in [20], the requirement for an additional charge-sharing phase significantly increases the cycle time.

1549-8328 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.



Fig. 1. Structure of the conventional 6T SRAM bitcell: (a) schematic and (b) layout by 2  $\times$  1 bitcell.

In this study, we propose read and write assist circuits with BL status sensing to address the read performance degradation and large write assist energy overhead. First, unlike the conventional NBL, our proposed selective write assist circuits do not generate the NBL for every write operation. Instead, the write assist is generated only when the write failure is detected by the BL status sensing. Thus, the write assist power can be saved without performance degradation. Second, the BL status sensing circuit is also used to detect slow bitcell and trigger cell current ( $I_{CELL}$ ) boosting to improve the read performance. Again, the  $I_{CELL}$  boosting is enabled only for the slow bitcells, so the power overhead is minimized. It should be noted that the same BL status sensing circuit is used to save the write energy, improve read performance, and reduce the circuit overhead.

In the next section of this paper, the background is explained. Then, Section III describes the structure and operation of the proposed selective write assist circuits through BL status sensing. In Section IV, selective  $I_{CELL}$  boosting is presented using the same sensing circuit. Section V describes the simulation results and compares the proposed structure with that of the conventional SRAM. Finally, Section VI concludes the study.

## II. BACKGROUND

## A. SRAM Bitcell Structure

Fig. 1(a) and (b) show the schematic and layout of the conventional 6T SRAM bitcell, respectively. The layout is based on the 7-nm ASAP technology [21]. The bitline pair (BLL and BLR) and CVDD are routed vertically with M2, while CVSS is routed using M2 and M3 for the power mesh, and WLs are routed horizontally with M3.

## B. SRAM Read Operation and Assist Circuits

The read operation of SRAM is performed by sensing the voltage difference between the BL pair, which is pre-charged to  $V_{DD}$ . When the WL is asserted, one of the BLs is discharged to a low voltage level according to the data stored in the bitcell. After the BL discharges enough to meet the read sensing yield, the sense amplifier (SA) enable signal (SAE) is asserted. Then, from the voltage difference of the BL pair, the SA resolves the data as "low" or "high."

Fig. 2 shows the state of the bitcell and the waveform for the read operation. It is assumed that the initial condition of the left bitcell storage node (Q) is "high" and that the right node (QB) is "low". When the BLR is discharged, the voltage of QB increases due to  $I_{CELL}$  through the PG<sub>R</sub>. If QB rises above the trip point of the inverter, the stored data in the bitcell is flipped, which results in a read failure. To alleviate this read failure, the driving strength of the PG must be weaker than that of the pull-down (PD). As previously stated, the WLUD-RA technique is frequently used to improve read stability by suppressing the WL voltage level. However, the underdriven WL decreases  $I_{CELL}$ , which results in serious read performance degradation, especially at low  $V_{DD}$ .

Fig. 2. Conventional 6T SRAM bitcell read operation and waveforms.

Fig. 3. Conventional 6T SRAM bitcell write operation and waveforms corresponding to write pass and fail.

## C. SRAM Write Operation and Assist Circuits

Fig. 3 shows the SRAM bitcell under the write operation and the waveforms for the write pass and fail cases. Assuming that D = 0 is written when the selected bitcell initially stores Q = 1 and QB = 0, BLL is discharged to GND. After WL is enabled, PU<sub>L</sub> and PG<sub>L</sub> contend at the Q node. For a successful write operation, the drive strength of PG<sub>L</sub> should be sufficiently larger than that of PU<sub>L</sub>.

When using WLUD-RA, which is common at low  $V_{DD}$ , WL should be underdriven for write operations as well to protect the row half-selected bitcells from data flip. Accordingly, WLUD-RA weakens PG, resulting in decreased write ability. As a result, a write assist circuit is required to meet the target write yield. The negative BL write assist circuit (NBL-WA) is one of the most widely used techniques, which biases the low BL to a negative voltage instead of GND. Fig. 4 shows the conventional NBL-WA structure and waveforms. After the negative enable signal (/NegEn) is asserted, NVSS, which supplies the write driver, falls to a negative voltage through capacitive coupling. In order to discharge BL to a negative voltage, the coupling capacitor should be large because of the high BL capacitance. Thus, NBL-WA, which is triggered for all the selected columns in every write operation, incurs a significant amount of dynamic energy overhead.

Fig. 4. Conventional NBL-WA structure and waveforms.

## III. SELECTIVE WRITE ASSIST CIRCUIT

As explained, the use of the conventional NBL-WA results in a huge dynamic power overhead. In particular, assisting every bitcell for every write operation is inefficient in terms of energy, because the write can be successfully performed without assist circuits for most bitcells. To resolve this issue, a selective write assist, which is enabled only when the write assist is required, is proposed in this section.

The structure of the proposed voltage boosted fail detecting selective write assist (VBFD-SWA) circuit is shown in Fig. 5. It should be noted that a voltage boosted failure detecting (VBFD) circuit is implemented to selectively assert /NegEn by sensing the BL states. In VBFD, M<sub>PDL</sub> and M<sub>PDR</sub> are used to sense the voltages of BLL and BLR, respectively. The supply voltage of M<sub>PDL</sub> and M<sub>PDR</sub>, VDD<sub>H</sub>, is designed to be raised through the coupling capacitor  $C_{C0}$ , while  $C_S$  is used as the storing capacitor. For reasons to be explained later, M<sub>P2</sub> is added to form positive feedback between /NegEn falling and OUT rising. The overall operation of VBFD-SWA is composed of three steps: 1) the BLs are first driven by the write drivers without NBL-WA, 2) the write failure or success of the selected bitcell is detected, and 3) NBL is enabled only when a write failure is detected and is not enabled when a write success is detected.

Fig. 5(b) shows the write operations of VBFD-SWA. It is assumed that D = 0 is written, while the selected bitcell initially stores Q = 1. At the beginning of the write operation,  $M_{N0}$  is turned on by the high /SEL. Thus, OUT = 0 and /NegEn = 1, which drives NVSS to GND. After WREN is asserted, DBWB is pulled high, while DWB stays low (because D = 0). Thus, the selected BLL, whose write MUX is on, is driven to GND by  $M_{N1}$ , while BLR is floated. At the same time, in VBFD, the overdrive enable signal (ODEN) is raised to turn off  $M_H$  and boost  $VDD_H$  to  $V_{DD} + |V_{thp,MPD}|$ . Then, after SEL is raised,  $M_{PCHL}$ ,  $M_{PCHR}$ , and  $M_{N0}$  are turned off, while  $M_{PL}$  and  $M_{PR}$  are turned on. It should be noted that in the proposed VBFD, the conventional read MUX control signal is replaced with a combination of the column select signal (ColSel) and DW or DBW, so that one of the read MUXs is turned on during the write operation according to the write data value. Thus,  $M_{RMXR}$  is turned on to connect BLR to  $X_R$ .

The operational waveforms are shown in Fig. 5(b) for two cases: 1) the write succeeds without NBL-WA and 2) the write initially fails without NBL-WA, so NBL-WA must be triggered for the write to succeed. If the write succeeds without the NBL-WA trigger, BLR stays near V<sub>DD</sub>, so |V<sub>GS</sub>| of M<sub>PDR</sub> is almost the same as  $|V_{thp}|$ . Thus, current barely flows through M<sub>PDR</sub>, and OUT is kept low until the end of the write operation. On the other hand, if the write fails without NBL-WA, the stored data Q and QB are not flipped: Q is still high and QB is still low. Owing to the low QB, the floated "high" BLR is slowly discharged, as in the read operation. As a result, after SEL is enabled, the lowered BLR causes  $|V_{GS}|$  of  $M_{PDR}$  to exceed  $|V_{thp}|$ , resulting in current flow from VDD<sub>H</sub> to OUT. This pulls OUT up, and /NegEn is then pulled down, which triggers NBL-WA. To fully charge OUT to V<sub>DD</sub> even with a small current through M<sub>PDR</sub>, the positive feedback circuit of M<sub>P2</sub>, which is triggered by the lowered OUTb, is added.

Fig. 6 shows the flow chart and its matched waveforms for control signals of VBFD-SWA to understand the operation more clearly.
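The three-step decision described above can be sketched as a purely behavioral model. This is a minimal illustration, not the circuit itself: the function and field names are hypothetical, the 200 mV value for |V<sub>thp</sub>| is a placeholder that does not come from this paper, and timing effects such as T<sub>SEL</sub> and the charge stored on C<sub>S</sub> are ignored. Voltages are integers in millivolts so that the comparisons are exact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VbfdParams:
    vdd_mv: int = 600        # V_DD used in the paper's simulations (0.6 V)
    vthp_abs_mv: int = 200   # |V_thp| of the sensing pMOS: hypothetical placeholder


def boosted_vddh_mv(p: VbfdParams) -> int:
    """VDD_H settles at V_DD + |V_thp,MPD| after ODEN is raised."""
    return p.vdd_mv + p.vthp_abs_mv


def sensing_pmos_on(v_bl_mv: int, p: VbfdParams) -> bool:
    """The sensing pMOS conducts when |V_GS| = VDD_H - V_BL exceeds |V_thp|."""
    return (boosted_vddh_mv(p) - v_bl_mv) > p.vthp_abs_mv


def selective_write_assist(v_floated_bl_mv: int, p: VbfdParams = VbfdParams()) -> dict:
    """Sense the floated BL after the unassisted write and decide on NBL."""
    out_high = sensing_pmos_on(v_floated_bl_mv, p)  # OUT is pulled up on a write failure
    neg_en_bar = not out_high                       # /NegEn falls when OUT rises
    return {"OUT": int(out_high), "/NegEn": int(neg_en_bar), "NBL triggered": out_high}


# Write success: the floated BL stays at V_DD, so NBL is not triggered.
print(selective_write_assist(600))
# Write failure: the floated BL droops like a read, so NBL is triggered.
print(selective_write_assist(450))
```

In this idealized model any droop below V<sub>DD</sub> trips the detector; in the actual circuit the trip point also depends on how much charge reaches OUT within T<sub>SEL</sub>.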

VDD<sub>H</sub> is designed to be overdriven to V<sub>DD</sub> + |V<sub>thp,MPD</sub>|. This is because if VDD<sub>H</sub> exceeds V<sub>DD</sub> + |V<sub>thp,MPD</sub>|, the M<sub>PD</sub>s (M<sub>PDR</sub> and M<sub>PDL</sub>) are turned on. Then VDD<sub>H</sub> is discharged through the M<sub>PD</sub>s until VDD<sub>H</sub> becomes V<sub>DD</sub> + |V<sub>thp,MPD</sub>|, at which point the M<sub>PD</sub>s are turned off. It should be noted that this process does not affect node OUT, because SEL is not enabled yet and OUT is held low by the turned-on M<sub>N0</sub>. Accordingly, the boosted level of VDD<sub>H</sub> tracks the V<sub>th</sub> variation of the M<sub>PD</sub>s, where |V<sub>thp,MPD</sub>| is the smaller of |V<sub>thp,MPDL</sub>| and |V<sub>thp,MPDR</sub>|. Thus, the current through M<sub>PD</sub> is robust to V<sub>th</sub> process variation, which prevents unexpected leakage current through the M<sub>PD</sub>s and M<sub>H</sub>. To realize this, the V<sub>th</sub>-tracking M<sub>PD</sub>s are designed with LVT devices, while M<sub>H</sub> is designed with an RVT device.

Capacitor  $C_{C1}$  in the negative voltage generator (NVG) is set to satisfy the target write yield (=  $6\sigma$ ), and C<sub>C0</sub> is set large enough to boost VDD<sub>H</sub> to  $V_{DD} + |V_{thp,MPD}|$ . To achieve the target write yield when VBFD-SWA is used, /NegEn should be properly lowered whenever a write failure occurs. Accordingly,  $C_S$  should be set large enough that the VDD<sub>H</sub> node stores enough charge to fully pull OUT up when M<sub>PD</sub> is turned on by a write failure. In addition, the time duration of SEL (T<sub>SEL</sub>) should be set large enough to provide sufficient time for OUT to reach V<sub>DD</sub> through the M<sub>PD</sub> current. On the other hand, a too large C<sub>S</sub> or a too long T<sub>SEL</sub> raises OUT even in the write success case. This means that an unnecessary write assist is performed, wasting energy. In this context, the write assist rate should be minimized to reduce the energy overhead, provided that the target write yield is satisfied. Thus, C<sub>C0</sub>, C<sub>S</sub>, and T<sub>SEL</sub> should be carefully optimized.
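The role of C<sub>C0</sub> in reaching the clamped level can be illustrated with a first-order capacitive-divider estimate. This is a rough sketch under assumptions that are not stated in the paper: ODEN is assumed to swing by V<sub>DD</sub>, C<sub>S</sub> is assumed to load the VDD<sub>H</sub> node against a fixed rail, parasitics are lumped into a hypothetical `c_par`, and the |V<sub>thp</sub>| value of 0.2 V is a placeholder.

```python
def ideal_boost_step(v_swing, c_c0, c_s, c_par=0.0):
    """Capacitive-divider step on VDD_H when the coupling node swings by v_swing."""
    return v_swing * c_c0 / (c_c0 + c_s + c_par)


def settled_vddh(vdd, vthp_abs, v_swing, c_c0, c_s, c_par=0.0):
    """Boosted VDD_H, clamped at V_DD + |V_thp| by the sensing pMOS devices."""
    unclamped = vdd + ideal_boost_step(v_swing, c_c0, c_s, c_par)
    return min(unclamped, vdd + vthp_abs)


VDD = 0.6        # volts, from the paper
VTHP_ABS = 0.2   # volts, hypothetical placeholder
C_C0 = C_S = 1.5  # fF, the values chosen in the paper

step = ideal_boost_step(VDD, C_C0, C_S)
print(f"ideal boost step: {step:.3f} V")
print(f"settled VDD_H:    {settled_vddh(VDD, VTHP_ABS, VDD, C_C0, C_S):.3f} V")
```

Under these assumptions the unclamped step exceeds the placeholder |V<sub>thp</sub>|, so the node settles at the clamp level; a C<sub>C0</sub> that is too small relative to C<sub>S</sub> would leave VDD<sub>H</sub> below the intended level.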



Fig. 5. Proposed VBFD-SWA structure (a) and operational waveforms (b) when write assist not triggered and assist triggered.



Fig. 6. Control signal flow chart of VBFD-SWA and its matched waveforms.

Fig. 7 shows the write yield and assist rate for three different values of  $C_S$  and  $T_{SEL}$  at a fixed  $C_{C0}$  of 1.5 fF. The results are derived from an HSPICE Monte Carlo (MC) simulation with the most probable failure point (MPFP) method [22];  $V_{DD}$  is 0.6 V and the WL underdrive voltage (V<sub>WL</sub>) is 0.52 V. The global process corner is TT and the temperature is set to -40 °C (the worst temperature for the write ability). To achieve the target write yield of  $6\sigma$ , C<sub>S</sub> should be large. However, a large C<sub>S</sub> (1.8 fF) not only results in a high write yield but also generates an excessive write assist rate. Similarly, as  $T_{SEL}$  becomes larger, the assist rate and write yield increase rapidly. To minimize the assist rate while meeting the target write yield,  $T_{SEL} = 1.2$  ns and  $C_S = 1.5$  fF are chosen; the resulting write assist rate, obtained from MC simulation, is 14%.

Fig. 7. Write yield and write assist rate according to T<sub>SEL</sub> and C<sub>S</sub>.

To optimize the two capacitors ( $C_{C0}$ ,  $C_S$ ), Fig. 8 shows the shmoo plot of the write operation for different values of  $C_S$  and  $C_{C0}$  for the target write yield of  $6\sigma$  at a fixed  $T_{SEL}$  of 1.2 ns. Among the options satisfying the target write yield,  $C_{C0} = C_S = 1.5$  fF is chosen to minimize the area overhead and maximize the power saving. In addition, choosing these balanced sizes for  $C_{C0}$  and  $C_S$  avoids small capacitances and thus minimizes the capacitance variation [23].

Fig. 8. Shmoo plot of write operation that satisfies target write yield of  $6\sigma$  for different values of C<sub>S</sub> and C<sub>C0</sub>.

In addition, even if a slow bitcell is accessed, OUT should be raised as quickly as possible so that the whole VBFD-SWA operation completes within the WL duration predetermined by the read operation. To achieve this goal, the capacitance of OUT is reduced by minimizing the sizes of  $M_{PDL}$ ,  $M_{PDR}$ ,  $M_{N0}$ ,  $M_{P2}$ , and the CMOS inverter driven by OUT. Then,  $C_{C0}$  and  $C_S$  are adjusted accordingly. Thus, with minimum-sized transistors on the OUT node and properly adjusted  $C_{C0}$  and  $C_S$ , VBFD-SWA is carried out within the WL duration without cycle time overhead.

## IV. SELECTIVE CELL CURRENT BOOSTING CIRCUIT

As previously mentioned, the use of WLUD-RA at low V<sub>DD</sub> significantly degrades the read performance. To resolve this problem, I<sub>CELL</sub> can be boosted by driving CVSS to a negative voltage at the expense of increased energy. To minimize the energy overhead, the I<sub>CELL</sub> boosting can be applied only to the slow bitcells, which critically determine the overall read speed. In this section, we propose a VBFD-based selective cell current boosting circuit (VBFD-SCCB), which reuses the VBFD proposed in Section III. By sharing VBFD with the NBL-WA, the proposed scheme can be implemented in an area-efficient way. Besides the speed improvement, energy can also be saved through the proposed VBFD-SCCB. In read and write operations, the BLs in both the selected and unselected columns are unnecessarily discharged, which accounts for a large portion of the total energy consumption. The reduced WL duration (RWL) enabled by the proposed VBFD-SCCB decreases the BL discharge amount, leading to energy savings.

Fig. 9(a) and (b) show the schematic and operational waveforms of the proposed VBFD-SCCB, respectively. It should be noted that VBFD is shared with VBFD-SWA, which is used for sensing BL status. We shared the same capacitance of  $C_{C1}$ , which is used as a negative voltage generator in both the VBFD-SWA and VBFD-SCCB for area efficiency. Utilizing VBFD, the slow bitcell can be detected to selectively trigger  $I_{CELL}$  boosting. To incorporate the  $I_{CELL}$  boosting operation, two transmission gates – TG<sub>1</sub> and TG<sub>2</sub> – and a NAND2 gate are added. TG<sub>1</sub> and TG<sub>2</sub> are both designed to turn on when SRAM read and write operations are not enabled, while during the operation, one of the TGs is turned off depending on the READ and WREN signals. For read operation, by READ = 1,  $TG_1$  is turned off, and  $TG_2$  is still turned on. NAND2 is triggered by the read select signal (RSEL), which is enabled only when performing a read operation.

To switch the CVSS connection between GND and the negative voltage, the CVSS footers  $M_{FT1}$  and  $M_{FT2}$  are added. Through  $M_{FT1}$  and  $M_{FT2}$ , depending on BLRS, CVSS is connected to NVSS or GND. In addition, CVSS is separately routed column by column, as shown in Fig. 10. It should be noted that, compared with the conventional structure shown in Fig. 1(b), the effective resistance from GND to CVSS is increased, and the capacitance of CVSS is decreased.

Fig. 9(b) shows the read operation for two cases, a nominal bitcell and a slow bitcell, where the SCCB is triggered only for the slow bitcell. For the read operation, it is assumed that the bitcell stores Q = 1; after WL is enabled, BLR is discharged, and OUT is low before /SEL is enabled. Then, ODEN is raised to turn off M<sub>H</sub> and boost VDD<sub>H</sub> to V<sub>DD</sub> + |V<sub>thp,MPD</sub>|. Up to this point, the operation of VBFD is the same as in Section III. Then, because DW and DBW are both high ( $\because$  WREN = 0), both M<sub>RMXR</sub> and M<sub>RMXL</sub> are turned on.

In a slow bitcell, OUT is still low because BLR has not fallen enough to turn on  $M_{PDR}$ , which means OUTb is still high. Consequently, after RSEL is enabled, NAND2 is triggered and /NegEn is asserted to generate a negative NVSS, which pulls CVSS down to a negative voltage and boosts I<sub>CELL</sub>. In a nominal bitcell, OUT has already been raised to high and OUTb pulled down to low when RSEL is enabled. Thus, /NegEn stays high and CVSS is kept connected to GND, which means I<sub>CELL</sub> boosting is not asserted. To understand the operation more clearly, the flow chart and its matched waveforms for the control signals of VBFD-SCCB are shown in Fig. 11.
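The read-side decision described above reduces to a NAND of RSEL and OUTb. The following behavioral sketch only mirrors that logic; the function names and the string labels for the CVSS connection are hypothetical, and the transmission gates, footers, and all timing are abstracted away.

```python
def neg_en_bar_read(rsel: bool, outb: bool) -> bool:
    """NAND2 output: /NegEn goes low only when RSEL and OUTb are both high."""
    return not (rsel and outb)


def cvss_connection(neg_en_bar: bool) -> str:
    """CVSS follows negative NVSS when boosting is triggered, otherwise GND."""
    return "GND" if neg_en_bar else "NVSS (negative)"


def read_decision(bl_fell_enough: bool, rsel: bool = True) -> str:
    """Map the sensed BL status at the RSEL window to the CVSS connection."""
    out = bl_fell_enough   # OUT rises once the BL has turned on the sensing pMOS
    outb = not out
    return cvss_connection(neg_en_bar_read(rsel, outb))


print("slow bitcell:   ", read_decision(bl_fell_enough=False))
print("nominal bitcell:", read_decision(bl_fell_enough=True))
print("RSEL off:       ", read_decision(bl_fell_enough=False, rsel=False))
```

The last case shows why the RSEL window matters: with RSEL low, the NAND2 output stays high regardless of OUTb, so no boosting occurs.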

In the VBFD circuit, BL failure detection is performed in both read and write operations; thus, the NAND2 gate and RSEL are used to trigger the NVG so that read and write detection can be distinguished. It should be noted that the timing of RSEL must be carefully determined. First, the rising edge of RSEL should be delayed to provide enough time for OUTb to be pulled down in the nominal bitcell case. In this manner, unnecessary I<sub>CELL</sub> boosting is prevented, minimizing the energy overhead. Second, the time duration of RSEL (T<sub>RSEL</sub>) should be considered. T<sub>RSEL</sub> is equal to the time duration of overdriving CVSS. If T<sub>RSEL</sub> is not large enough, the slow BL is not discharged to the target level required for the SA to sense data stably. On the contrary, if T<sub>RSEL</sub> is too large, unwanted read assist occurs frequently, increasing the energy overhead. The slow bitcell case in Fig. 9(b) shows an example of the role of RSEL: although OUTb is triggered to "high" at the end of T<sub>SEL</sub>, RSEL is already turned off, so the assist does not occur. Thus, T<sub>RSEL</sub> should be large enough to sufficiently improve the sensing yield, while minimizing the unwanted read assist for energy saving.

Fig. 9. Proposed VBFD-SCCB structure (a) and operational waveforms (b) for a nominal bitcell, where cell current boosting is not triggered, and for a slow bitcell, where cell current boosting is triggered.

Fig. 10. Layout of separated CVSS column routing and the associated SRAM array.

Fig. 12 shows the improvement in the WL time duration ( $T_{WL}$ ) and the read energy according to  $T_{RSEL}$ . The simulation result of  $T_{WL}$  is derived from importance sampling [24], [25];  $V_{DD}$  is 0.6 V and  $V_{WL}$  is 0.52 V, where  $V_{WL}$  is underdriven to a level determined to obtain the target read stability yield. As  $T_{RSEL}$  becomes larger,  $T_{WL}$  becomes shorter, and the energy is decreased thanks to the reduced WL duration (RWL). However, if  $T_{RSEL}$  exceeds 0.7 ns,  $T_{WL}$  does not decrease any further. Thus, we set  $T_{RSEL}$  to 0.7 ns, for which the  $T_{WL}$  improvement is 36% (2.6 ns  $\rightarrow$  1.67 ns) and the read energy is 27.7 fJ.
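To illustrate why importance sampling makes rare-event estimates tractable, the sketch below estimates a one-sided 6σ tail probability of a standard normal variable by sampling from a mean-shifted distribution. It is a generic textbook stand-in, not the method of [24] or [25], and it operates on a scalar Gaussian rather than on circuit parameters; the function names, sample count, and seed are arbitrary choices.

```python
import math
import random


def tail_prob_exact(t):
    """P(Z > t) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def tail_prob_importance_sampling(t, n_samples, seed=0):
    """Estimate P(Z > t) by drawing samples from N(t, 1) instead of N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(t, 1.0)
        if x > t:
            # Likelihood ratio phi(x) / phi(x - t) = exp(-t * x + t^2 / 2)
            total += math.exp(-t * x + 0.5 * t * t)
    return total / n_samples


estimate = tail_prob_importance_sampling(6.0, 50_000)
exact = tail_prob_exact(6.0)
print(f"importance sampling: {estimate:.3e}")
print(f"closed form:         {exact:.3e}")
```

Because about half of the shifted samples land in the failure region, tens of thousands of samples are enough here, whereas unshifted sampling would almost never observe a failure at this sample count.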

## V. COMPARISON

All simulation results are based on 7-nm ASAP FinFET post-layout simulations. We compare the proposed VBFD-SWA and VBFD-SCCB with the conventional SRAM design in terms of power, performance, and area (PPA). An SRAM bitcell array of 256 rows and 128 columns with 4-to-1 BL MUXs is considered, and a conventional voltage latched sense amplifier (VLSA) is used. V<sub>DD</sub> is 0.6 V with a 5% worst voltage corner considered, and V<sub>WL</sub> is 0.52 V, which is determined to meet the target read stability yield of  $6\sigma$ .

Fig. 11. Control signal flow chart of VBFD-SCCB and its matched waveforms.

Fig. 12. T<sub>WL</sub> improvement and read energy according to T<sub>RSEL</sub>.

To evaluate each SRAM design, MC and HSPICE simulations were performed with  $V_{th}$  variation. To model the transistor variation,  $V_{th}$  is assumed to follow a Gaussian distribution with a standard deviation of  $\sigma_{Vth}$ , as indicated in the following equation (1) [26]:

$$\sigma_{Vth} = \frac{A_{\Delta Vt}/\sqrt{2}}{\sqrt{L_g W_g}} = \frac{A_{\Delta Vt}/\sqrt{2}}{\sqrt{L_g N_{fin}(2H_{fin} + T_{fin})}}, \qquad (1)$$

where  $A_{\Delta Vt}$  is the Pelgrom coefficient, whose value is determined based on the silicon measurement results of 7-nm FinFET in [27], and  $L_g$ ,  $W_g$ ,  $N_{fin}$ ,  $H_{fin}$ , and  $T_{fin}$  are the gate length, gate width, number of fins, fin height, and fin thickness of a FinFET, respectively.
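Equation (1) can be evaluated directly, as in the short sketch below. The geometry values and the Pelgrom coefficient used in the example are hypothetical placeholders chosen only to make the code runnable; they are not the values used in this paper or reported in [27].

```python
import math


def sigma_vth(a_dvt, l_g, n_fin, h_fin, t_fin):
    """Eq. (1): sigma_Vth = (A_dVt / sqrt(2)) / sqrt(L_g * N_fin * (2*H_fin + T_fin)).

    With a_dvt in mV*nm and all lengths in nm, the result is in mV.
    """
    w_g = n_fin * (2.0 * h_fin + t_fin)  # effective gate width of the FinFET
    return (a_dvt / math.sqrt(2.0)) / math.sqrt(l_g * w_g)


# Hypothetical placeholder values, for illustration only.
A_DVT = 1000.0   # mV*nm
L_G = 20.0       # nm
H_FIN = 30.0     # nm
T_FIN = 7.0      # nm

for n_fin in (1, 2, 3):
    print(f"N_fin = {n_fin}: sigma_Vth = {sigma_vth(A_DVT, L_G, n_fin, H_FIN, T_FIN):.1f} mV")
```

The loop makes the scaling in (1) visible: doubling the number of fins reduces σ<sub>Vth</sub> by a factor of √2, which is why minimum-sized single-fin bitcell transistors dominate the variation.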

Read stability yield and write ability yield are obtained based on the MPFP method [22]. This is because it is impractical to verify a target read and write yield of  $6\sigma$  with brute-force MC simulation (~2000 years are needed). In addition,  $T_{WL,6\sigma}$  is determined to meet the read sensing yield of  $6\sigma$ , which is formulated by the following equation (2):
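The scale of the brute-force problem can be checked with a few lines of arithmetic. The sketch below computes the one-sided 6σ failure probability and the number of MC runs needed to observe a handful of failures; the per-run simulation time is a hypothetical parameter, since the assumptions behind the runtime estimate quoted above are not stated here.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def fail_prob(n_sigma):
    """One-sided standard normal tail probability, 1 - CDF_STD(n_sigma)."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


def brute_force_runs(n_sigma, expected_failures=10):
    """MC runs needed to observe about `expected_failures` failing samples."""
    return expected_failures / fail_prob(n_sigma)


def years_of_simulation(runs, seconds_per_run):
    """Wall-clock years for `runs` sequential simulations (hypothetical run time)."""
    return runs * seconds_per_run / SECONDS_PER_YEAR


p6 = fail_prob(6.0)
runs = brute_force_runs(6.0)
print(f"failure probability at 6 sigma: {p6:.3e}")
print(f"runs for ~10 observed failures: {runs:.3e}")
print(f"years at 1 s per run:           {years_of_simulation(runs, 1.0):.0f}")
```

The resulting duration scales linearly with the assumed time per run and with the number of failures one wants to observe, so the exact figure depends on those choices.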

$$P[\Delta V_{BL}(T_{WL,6\sigma}) - V_{off} > 0] = CDF_{STD}(6), \qquad (2)$$

where  $\Delta V_{BL}$  is the voltage difference between the BL pair at time  $T_{WL,6\sigma}$ , whose distribution is obtained by simulation;  $V_{off}$  is the sense amplifier offset voltage, which is assumed to follow a Gaussian distribution with a standard deviation of 30 mV [28]; and  $CDF_{STD}(Z)$  is the cumulative distribution function of the standard normal distribution. According to (2), the  $T_{WL,6\sigma}$  required to meet the target  $6\sigma$  read sensing yield is closely related to the BL discharge speed: if one of the BLs discharges slowly,  $T_{WL,6\sigma}$  must be large. It should be noted that in the VBFD-SCCB circuit,  $T_{WL,6\sigma}$  is obtained using simulation with importance sampling [24], [25], because the BL distribution does not follow the Gaussian distribution due to overdriving CVSS to a negative voltage.
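For the case where  $\Delta V_{BL}$  is itself Gaussian, the condition in (2) has a closed form, sketched below. The sketch assumes that  $\Delta V_{BL}$  and  $V_{off}$  are independent Gaussians and that  $V_{off}$  has zero mean; neither assumption is stated in the text, and as noted above the Gaussian assumption does not hold for VBFD-SCCB. The numeric inputs in the example are hypothetical apart from the 30 mV offset sigma.

```python
import math
from statistics import NormalDist


def sensing_sigma(mu_dv, sigma_dv, sigma_off=0.030, mu_off=0.0):
    """Sigma level z with P[dV_BL - V_off > 0] = CDF_STD(z) for Gaussian inputs."""
    return (mu_dv - mu_off) / math.hypot(sigma_dv, sigma_off)


def sensing_yield(mu_dv, sigma_dv, sigma_off=0.030, mu_off=0.0):
    """Read sensing yield as a probability."""
    return NormalDist().cdf(sensing_sigma(mu_dv, sigma_dv, sigma_off, mu_off))


def meets_6sigma(mu_dv, sigma_dv, sigma_off=0.030):
    """True when the BL difference at the chosen T_WL satisfies Eq. (2)."""
    return sensing_sigma(mu_dv, sigma_dv, sigma_off) >= 6.0


# Hypothetical BL statistics (volts) at two candidate WL durations.
for label, mu, sigma in (("short T_WL", 0.20, 0.040), ("long T_WL", 0.35, 0.045)):
    z = sensing_sigma(mu, sigma)
    print(f"{label}: {z:.2f} sigma, meets 6 sigma: {meets_6sigma(mu, sigma)}")
```

Sweeping the WL duration and picking the smallest one for which `meets_6sigma` holds mirrors how  $T_{WL,6\sigma}$  is defined, under the stated Gaussian assumption.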

Fig. 13. Q-Q plot of  $T_{\rm WL}$  distribution with conventional read and VBFD-SCCB.

Fig. 14. The comparison of  $T_{WL}$  and its components with conventional and VBFD-SCCB read operation for the slow bitcell.

Fig. 13 shows the Q-Q plot of the  $T_{WL}$  distribution with the conventional read and VBFD-SCCB. In the low sigma region, which is near the nominal bitcell, BL discharging with VBFD-SCCB is slower than with the conventional circuit. This is because the effective resistance of CVSS is increased and its capacitance is reduced owing to the use of the separated CVSS with the footer. Thus, the transient rise of CVSS during the read operation is larger in VBFD-SCCB. However, as observed in the high sigma region, which represents the slow bitcells that actually determine  $T_{WL}$ , triggering the proposed VBFD-SCCB effectively accelerates the BL discharging speed. By selectively boosting  $I_{CELL}$ ,  $T_{WL,6\sigma}$  is improved by 36% compared with the case without the assist circuit.
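For readers unfamiliar with Q-Q plots, the sketch below shows how the plotted coordinates are typically formed from a set of samples: each sorted sample is paired with the standard normal quantile of its rank. The plotting position `(i + 0.5) / n` is one common convention and an assumption here, as is the function name; plain sampling of this kind only reaches a few sigma, which is why the high sigma region in the paper relies on importance sampling instead.

```python
from statistics import NormalDist


def qq_points(samples):
    """Pair each sorted sample with the standard normal quantile of its rank."""
    n = len(samples)
    ordered = sorted(samples)
    std_normal = NormalDist()
    # (i + 0.5) / n is a common plotting position; other conventions exist.
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]


# Hypothetical T_WL samples in ns, for illustration only.
samples = [1.9, 1.2, 1.5, 1.4, 1.6, 1.3, 2.4]
for sigma_level, t_wl in qq_points(samples):
    print(f"{sigma_level:+.2f} sigma -> {t_wl:.2f} ns")
```

A Gaussian distribution appears as a straight line in this representation, so curvature at the high sigma end indicates a non-Gaussian tail.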

Fig. 14 shows the breakdown of  $T_{WL}$  for the conventional SRAM and the SRAM with the proposed VBFD-SCCB. In the conventional read operation,  $T_{WL}$  is determined by the BL development time (2.6 ns). In VBFD-SCCB, the BL development time, the slow bitcell detection, and the I<sub>CELL</sub> boosting occupy 39%, 15%, and 36% of the whole  $T_{WL}$  (1.67 ns), respectively. Thus,  $T_{WL}$  of VBFD-SCCB is reduced due to the I<sub>CELL</sub> boosting even when the additional circuit delay is considered.

According to the simulation results, to meet the target write yield of  $6\sigma$ ,  $C_{C1,6\sigma}$  should be 12 fF for every four columns in the conventional structure. However, with the VBFD-SWA, the BL capacitance is larger than in the conventional structure because of added circuits such as VBFD. Thus, the required  $C_{C1,6\sigma}$  is 14 fF for four columns.

Table I summarizes the parameters and simulation results of comparison between conventional SRAM and proposed VBFD read and write assist SRAM. The capacitances of  $C_{C1,6\sigma}$ ,  $C_{C0}$ , and  $C_S$  are implemented using the metal-oxide-metal (MOM) capacitor for high density and simplicity.

TABLE I Comparison Between the Conventional and Proposed Structures

| Method                                     | Conventional           | Proposed                      |
|--------------------------------------------|------------------------|-------------------------------|
| $V_{DD}$                                   | 0.6 V                  | 0.6 V                         |
| $V_{WL,r6\sigma}$                          | 0.52 V                 | 0.52 V                        |
| $T_{WL,r6\sigma}$                          | 2.6 ns                 | 1.67 ns                       |
| $C_{C1,6\sigma}$                           | 12 fF                  | 14 fF                         |
| Write-assist rate                          | 100%                   | 14%                           |
| Read energy for four columns               | 29.4 fJ                | 27.7 fJ (32.4 fJ w/o RWL)     |
| Write energy for four columns              | 39.5 fJ                | 31.8 fJ (34.5 fJ w/o RWL)     |
| Average energy                             | 34.5 fJ                | 29.7 fJ                       |
| Area of bitcell array (256 × 4)            | 27.648 μm<sup>2</sup>  | 27.648 μm<sup>2</sup>         |
| Area of column peripheral and I/O circuits | 7.652 μm<sup>2</sup>   | 11.85 μm<sup>2</sup>          |
| Total area (4 columns)                     | 35.3 μm<sup>2</sup>    | 39.5 μm<sup>2</sup>           |

The area of one bitcell is 0.023 μm<sup>2</sup> (width: 0.108 μm, height: 0.216 μm) in 7-nm ASAP. The area of the four-column bitcell array is 27.648 μm<sup>2</sup>, and the area of the bitcell array together with the column peripheral circuits is 39.5 μm<sup>2</sup>, where the peripheral circuits include the sense amplifier, column MUX, write driver, I/O D-latch, VBFD, and NVG with capacitors. In particular, the capacitors occupy a large portion of the peripheral circuit area (C<sub>C1</sub>: 7 μm<sup>2</sup>, C<sub>C0</sub>: 0.75 μm<sup>2</sup>, C<sub>S</sub>: 0.75 μm<sup>2</sup>, total: 8.5 μm<sup>2</sup>). The total area overhead for a 256 × 4 SRAM array is 10% (35.3 μm<sup>2</sup>  $\rightarrow$  39.5 μm<sup>2</sup>). In terms of PPA, by applying the VBFD read and write assist circuits, the average energy is reduced by 14% (34.5 fJ  $\rightarrow$  29.7 fJ) and the read performance is improved by 36% (2.6 ns  $\rightarrow$  1.67 ns), with an area overhead of 10%.

Fig. 15. Write energy consumption comparison when negative BL is used and when VBFD-SWA is applied without and with RWL.

Fig. 15 compares the overall write energy for four columns between the conventional NBL with a 12 fF coupling capacitor and the VBFD-SWA with a 14 fF coupling capacitor, the latter shown both without and with RWL. Without RWL, the write energy of VBFD-SWA is larger than that of the conventional scheme when the assist is triggered and smaller when it is not; with a selective write-assist rate of 14%, the average energy saving is about 13.1% (39.5 fJ $\rightarrow$ 34.3 fJ). With RWL applied, VBFD-SWA saves write energy in every case, and the energy saving reaches 19.4% (39.5 fJ $\rightarrow$ 31.8 fJ).
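The averaging over the assist and no-assist cases described above can be written as a weighted sum. This is only a sketch of the bookkeeping: the symbols $E_{\text{assist}}$, $E_{\text{no-assist}}$, and $E_{\text{NBL}}$ (write energy of VBFD-SWA with the assist triggered, without it, and of the conventional always-on NBL, respectively) and the assist rate $p$ are notation introduced here for illustration.

$$
\begin{aligned}
E_{\text{avg}} &= p\,E_{\text{assist}} + (1-p)\,E_{\text{no-assist}}, \qquad p = 0.14,\\
E_{\text{avg}} < E_{\text{NBL}} &\iff p < \frac{E_{\text{NBL}} - E_{\text{no-assist}}}{E_{\text{assist}} - E_{\text{no-assist}}}, \qquad \text{assuming } E_{\text{assist}} > E_{\text{NBL}} > E_{\text{no-assist}}.
\end{aligned}
$$

In this form, the selective scheme pays off on average as long as the assist rate stays below the break-even ratio on the right-hand side, which is why a low assist rate matters even though each assisted write costs more than a conventional one.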



Fig. 16. Read energy consumption comparison when conventional read operation is used and when VBFD-SCCB is applied with and without RWL.



Fig. 17. Read performance improvement and write energy saving under the global corner variation.

Fig. 16 shows the average read energy comparison for four columns between the conventional SRAM read operation and the VBFD-SCCB. Because the VBFD is added, the average read energy of the proposed circuit without RWL is 9.2% higher than that of the conventional circuit (29.4 fJ $\rightarrow$ 32.4 fJ). With RWL, however, the average read energy decreases by 5.8% (29.4 fJ $\rightarrow$ 27.7 fJ) owing to the elevated BL discharge level, even though the circuit is more complex.

If there is a size mismatch in the MOM capacitors, the capacitors should be enlarged to satisfy the write yield. Assuming a worst-case capacitance variation of 5% [23], $C_{C0}$ and $C_S$ should be sized up to 1.575 fF to have a 5% margin. Under this condition, the area overhead increases from 10.7% to 11.3%, and the average energy saving decreases from 13.9% to 13.1%.
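The sizing step above is a one-line scaling. The sketch below reproduces it under the assumption that the nominal value of $C_{C0}$ and $C_S$ is 1.5 fF, which is inferred here from 1.575 fF / 1.05 rather than quoted from this passage. The function and constant names are hypothetical.

```python
def sized_up_capacitance(nominal_ff: float, variation: float) -> float:
    """Scale a nominal capacitance up by the worst-case fractional variation."""
    return nominal_ff * (1.0 + variation)


NOMINAL_FF = 1.5   # assumption: inferred from 1.575 fF / 1.05, not stated in this passage
VARIATION = 0.05   # worst-case MOM capacitance variation quoted from [23]

sized_ff = sized_up_capacitance(NOMINAL_FF, VARIATION)
# Value of the enlarged capacitor if it then lands at the low end of the spread.
worst_case_ff = sized_ff * (1.0 - VARIATION)

print(f"sized-up value         : {sized_ff:.3f} fF")
print(f"value at -5% variation : {worst_case_ff:.3f} fF")
```

Under these assumptions the enlarged capacitor is 1.575 fF; at the low end of a symmetric 5% spread it would sit near 1.496 fF, so how fully the margin covers the low side depends on how the variation in [23] is defined.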

Fig. 17 shows the read performance improvement rate and the write energy saving rate across the global process corners. At the fast nFET corner, the read performance improvement rate is high and the write energy saving rate is low. Conversely, at the slow nFET corner, the read performance improvement rate is low and the write energy saving rate is high.

Owing to the use of NBL, $V_{MIN}$ can be improved by 190 mV, which is almost the same as the $V_{MIN}$ result reported for a 7 nm FinFET SRAM with a similar configuration [29].

## VI. CONCLUSION

VBFD-SWA and VBFD-SCCB are proposed for write assist and read assist, respectively. By detecting the BL status, the VBFD triggers the NVG for selective write assist and read assist only when a failure is detected. Selective write assist is performed using VBFD-SWA to reduce write energy consumption. In addition, applying VBFD-SCCB significantly improves the read performance, and thanks to the reduced WL duration, the read energy also decreases even though the circuits are more complex. The simulation results obtained under the same $V_{DD}$ condition indicate that the performance and energy consumption are improved by 36% and 14%, respectively, by applying the proposed VBFD-based circuits. However, an area overhead of 10% is unavoidable.

## REFERENCES

- [1] Y. Yang, H. Jeong, F. Yang, J. Wang, G. Yeap, and S.-O. Jung, "Read-preferred SRAM cell with write-assist circuit using back-gate ETSOI transistors in 22-nm technology," *IEEE Trans. Electron Devices*, vol. 59, no. 10, pp. 2575–2581, Oct. 2012.
- [2] E. Karl et al., "A 4.6 GHz 162 Mb SRAM design in 22 nm tri-gate CMOS technology with integrated active VMIN-enhancing assist circuitry," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2012, pp. 230–232.
- [3] J. Chang et al., "A 20 nm 112 Mb SRAM in high-κ metal-gate with assist circuitry for low-leakage and low-VMIN applications," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2013, pp. 316–317.
- [4] T. Song et al., "A 10 nm FinFET 128 Mb SRAM with assist adjustment system for power, performance, and area optimization," *IEEE J. Solid-State Circuits*, vol. 52, no. 1, pp. 240–249, Jan. 2017.
- [5] A. Raychowdhury et al., "PVT-and-aging adaptive wordline boosting for 8T SRAM power reduction," in *Proc. IEEE ISSCC Dig. Tech. Papers*, Feb. 2010, pp. 352–353.
- [6] K. Takeda et al., "Multi-step word-line control technology in hierarchical cell architecture for scaled down high-density SRAMs," *IEEE J. Solid-State Circuits*, vol. 46, no. 4, pp. 806–814, Apr. 2011.
- [7] E. Karl et al., "A 4.6 GHz 162 Mb SRAM design in 22 nm tri-gate CMOS technology with integrated read and write assist circuitry," *IEEE J. Solid-State Circuits*, vol. 48, no. 1, pp. 150–158, Jan. 2013.
- [8] O. Hirabayashi et al., "A process-variation-tolerant dual-power-supply SRAM with 0.179 μm<sup>2</sup> cell in 40 nm CMOS using level-programmable wordline driver," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2009, pp. 458–459.
- [9] E. Karl et al., "A 0.6 V, 1.5 GHz 84 Mb SRAM in 14 nm FinFET CMOS technology with capacitive charge-sharing write assist circuitry," *IEEE J. Solid-State Circuits*, vol. 51, no. 1, pp. 222–229, Jan. 2016.
- [10] K. Sohn et al., "A 100 nm double-stacked 500 MHz 72 Mb separate-I/O synchronous SRAM with automatic cell-bias scheme and adaptive block redundancy," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2008, pp. 386–622.
- [11] S. Mukhopadhyay, R. M. Rao, J.-J. Kim, and C.-T. Chuang, "SRAM write-ability improvement with transient negative bit-line voltage," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 19, no. 1, pp. 24–32, Jan. 2011.
- [12] H. Jeong et al., "Offset-compensated cross coupled PFET bit-line conditioning and selective negative bit-line write assist for high-density low-power SRAM," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 62, no. 4, pp. 1062–1070, Apr. 2015.
- [13] J. Chang et al., "A 7 nm 256 Mb SRAM in high-κ metal-gate FinFET technology with write-assist circuitry for low-VMIN applications," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2017, pp. 206–207.
- [14] C.-Y. Lu et al., "A 0.325 V, 600-kHz, 40-nm 72-kb 9T subthreshold SRAM with aligned boosted write wordline and negative write bitline write-assist," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 23, no. 5, pp. 958–962, May 2015.
- [15] Y.-H. Chen et al., "A 16 nm 128 Mb SRAM in high-κ metal-gate FinFET technology with write-assist circuitry for low-VMIN applications," *IEEE J. Solid-State Circuits*, vol. 50, no. 1, pp. 170–177, Jan. 2015.
- [16] C.-R. Huang and L.-Y. Chiou, "An energy-efficient conditional biasing write assist with built-in time-based write-margin-tracking for low-voltage SRAM," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 29, no. 8, pp. 1586–1590, Aug. 2021, doi: 10.1109/TVLSI.2021.3084041.
- [17] K. Cho et al., "SRAM write- and performance-assist cells for reducing interconnect resistance effects increased with technology scaling," *IEEE J. Solid-State Circuits*, vol. 57, no. 4, pp. 1039–1048, Apr. 2022, doi: 10.1109/JSSC.2021.3138785.
- [18] H. Jeong et al., "Bitline charge-recycling SRAM write assist circuitry for VMIN improvement and energy saving," *IEEE J. Solid-State Circuits*, vol. 54, no. 3, pp. 896–906, Mar. 2019, doi: 10.1109/JSSC.2018.2883725.

- [19] K. Kim, H. Jeong, J. Park, and S.-O. Jung, "Transient cell supply voltage collapse write assist using charge redistribution," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 63, no. 10, pp. 964–968, Oct. 2016, doi: 10.1109/TCSII.2016.2536258.
- [20] W. Choi and J. Park, "A charge-recycling assist technique for reliable and low power SRAM design," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 63, no. 8, pp. 1164–1175, Aug. 2016, doi: 10.1109/TCSI.2016.2589118.
- [21] L. T. Clark et al., "ASAP7: A 7-nm FinFET predictive process design kit," *Microelectron. J.*, vol. 53, pp. 105–115, Jul. 2016.
- [22] D. Khalil, M. Khellah, N.-S. Kim, Y. Ismail, T. Karnik, and V. K. De, "Accurate estimation of SRAM dynamic stability," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 16, no. 12, pp. 1639–1647, Dec. 2008, doi: 10.1109/TVLSI.2008.2001941.
- [23] Y. Yong et al., "MoM capacitor variation models for FinFET era," in *Proc. 14th IEEE Int. Conf. Solid-State Integr. Circuit Technol. (ICSICT)*, Oct. 2018, pp. 1–3.
- [24] J. Wang, S. Yaldiz, X. Li, and L. T. Pileggi, "SRAM parametric failure analysis," in *Proc. 46th Annu. Design Autom. Conf.*, 2009, pp. 496–501.
- [25] K. Katayama, S. Hagiwara, H. Tsutsui, H. Ochi, and T. Sato, "Sequential importance sampling for low-probability and high-dimensional SRAM yield analysis," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD)*, Nov. 2010, pp. 703–708.
- [26] K. J. Kuhn et al., "Process technology variation," *IEEE Trans. Electron Devices*, vol. 58, no. 8, pp. 2197–2208, Aug. 2011.
- [27] D. Ha et al., "Highly manufacturable 7 nm FinFET technology featuring EUV lithography for low power and high performance applications," in *Proc. Symp. VLSI Technol.*, Jun. 2017, pp. T68–T69, doi: 10.23919/VLSIT.2017.7998202.
- [28] S.-H. Woo, H. Kang, K. Park, and S.-O. Jung, "Offset voltage estimation model for latch-type sense amplifiers," *IET Circuits, Devices Syst.*, vol. 4, no. 6, pp. 503–513, Nov. 2010.
- [29] T. Song et al., "A 7 nm FinFET SRAM using EUV lithography with dual write-driver-assist circuitry for low-voltage applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2018, pp. 198–200.



**Jaehyun Park** was born in Seoul, South Korea, in 1996. He received the B.S. degree in electronics engineering from Kwangwoon University, Seoul, in 2021, where he is currently pursuing the M.S. degree in electronic engineering.

His research interests include FinFET-based high-density and low-power static random-access memory (SRAM).



**Sangheon Lee** was born in Seoul, South Korea, in 1997. He received the B.S. degree in electronic engineering from Kwangwoon University, Seoul, in 2022, where he is currently pursuing the M.S. degree in electronics engineering.

His research interests include FinFET-based high-speed and low-power static random-access memory (SRAM).



**Hanwool Jeong** (Member, IEEE) was born in Seoul, South Korea, in 1987. He received the B.S. and Ph.D. degrees in electrical and electronics engineering from Yonsei University, Seoul, in 2012 and 2017, respectively.

From 2017 to 2019, he was with the Foundry Division, Samsung Electronics Company Ltd., Hwaseong, South Korea, where he was involved with the circuit design and verification of 4 nm/5 nm memory compiler. Since 2019, he has been a Professor at Kwangwoon University, Seoul. His current research interests include memory circuit design, low-voltage/low-power digital logic, and neuromorphic/machine learning circuit design.