An Ultra Low Power and High Speed Domino for Wide Fan-In Gates

K. Karthi,
PG scholar, ME VLSI DESIGN,
Aksaya college of engineering and technology,
k.karthifire@gmail.com

Abstract— In this paper, a new domino circuit is proposed that has lower leakage and higher noise immunity without dramatic speed degradation for wide fan-in gates. The technique is based on comparing the mirrored current of the pull-up network with its worst-case leakage current. The proposed circuit technique decreases the parasitic capacitance on the dynamic node, so a smaller keeper suffices to implement fast and robust wide fan-in gates. Thus, the contention current, and consequently power consumption and delay, are reduced. The leakage current is also decreased by using a footer transistor in diode configuration, which results in increased noise immunity. Simulation results for wide fan-in gates designed using a 16-nm high-performance predictive technology model demonstrate a 51% power reduction and at least a 2.41× noise-immunity improvement at the same delay compared to standard domino circuits for 64-bit OR gates.

Index Terms— Domino logic, leakage-tolerant, noise immunity, wide fan-in.

I. INTRODUCTION

Noise in deep-submicron circuits arises mainly from higher leakage current, crosstalk, supply noise, and charge sharing, while noise at the inputs of the evaluation transistors may increase because of increased crosstalk. In domino logic, scaling the supply voltage and the capacitance of the dynamic (pre-charge) node reduces the amount of charge stored at the dynamic node. Due to these concurrent factors, the noise immunity of domino gates decreases substantially with technology scaling. Leakage immunity is more problematic in high fan-in domino circuits because the larger number of parallel evaluation paths produces more leakage. Since the leakage current is proportional to the fan-in of the domino OR gate, the noise immunity also decreases as the fan-in increases (Peiravi et al. 2009). Leakage and noise immunity are major issues for wide fan-in domino OR logic because the evaluation transistors are all in parallel, leaking charge from the precharge node (Moradi et al. 2004).

Keeper transistor upsizing is the conventional method of improving the robustness of a domino circuit: a keeper is added at the precharge node to make the dynamic node more robust. As the keeper transistor is upsized, the contention between the keeper transistor and the evaluation network increases in the evaluation phase, which increases the evaluation delay and power consumption and degrades performance (David and Bhat 2008). Keeper upsizing therefore trades delay and power for noise immunity and leakage control, and may not be a viable solution to the leakage immunity problem in scaled domino circuits (Peiravi and Asyaei 2012). The wide fan-in domino OR logic proposed here makes the domino circuit more robust, leakage tolerant, and scalable without considerable performance degradation or increase in power consumption.

II. LITERATURE REVIEW

The main goal of these circuit design techniques is to improve noise immunity and circuit performance, especially for wide fan-in circuits. Vulnerability to noise increases, especially in ultra-deep-submicron technologies. Conventionally, a keeper transistor is added to the circuit to keep the state of the dynamic node resistant to noise and leakage. However, adding the PMOS keeper transistor degrades performance and increases power dissipation. Upsizing the keeper transistor is one way to improve robustness, but it increases current contention between the keeper transistor and the evaluation network. Therefore, a small keeper is desirable for high-speed applications. The keeper strength is commonly expressed by the keeper ratio

$$K = \frac{\mu_p \,(W/L)_{\text{keeper}}}{\mu_n \,(W/L)_{\text{evaluation network}}}$$

where $W$ and $L$ denote the transistor size, and $\mu_n$ and $\mu_p$ are the electron and hole mobilities, respectively. However, the traditional keeper approach is less effective in new generations of CMOS technology: although keeper upsizing improves noise immunity, it also increases the contention described above.
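The keeper sizing trade-off can be made concrete with a small calculation. The sketch below assumes the common keeper-ratio definition $K = \mu_p (W/L)_{keeper} / (\mu_n (W/L)_{eval})$; the function name, the mobility ratio, and the transistor dimensions are hypothetical values chosen for illustration, not values from this work.

```python
def keeper_ratio(mu_p, w_keeper, l_keeper, mu_n, w_eval, l_eval):
    """Strength of the PMOS keeper relative to the NMOS evaluation network."""
    return (mu_p * w_keeper / l_keeper) / (mu_n * w_eval / l_eval)

# Hypothetical numbers: hole mobility taken as half the electron mobility,
# all widths and lengths in arbitrary but consistent units.
small_keeper = keeper_ratio(mu_p=0.5, w_keeper=1.0, l_keeper=1.0,
                            mu_n=1.0, w_eval=4.0, l_eval=1.0)
large_keeper = keeper_ratio(mu_p=0.5, w_keeper=4.0, l_keeper=1.0,
                            mu_n=1.0, w_eval=4.0, l_eval=1.0)

# A larger K means a more robust dynamic node but more contention
# (slower, more power-hungry evaluation).
print(small_keeper, large_keeper)
```

A larger ratio corresponds to the upsized keeper discussed above: better noise immunity at the cost of more contention.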

2.1 FOOTERLESS DOMINO LOGIC

A domino logic circuit includes a pre-charge circuit pre-charging a first dynamic node in response to a clock signal, a first logic network determining a logic level of the first dynamic node in response to first data signals, an inverter receiving the clock signal, a discharge circuit discharging a second dynamic node in response to an output signal of the inverter, and a second logic network determining a logic level of the second dynamic node in response to at least one second data signal and an output signal of the first dynamic node.

![Fig 2: Conventional High Fan in Domino OR Gate with Footed Domino Logic (FDL)](image1)

![Fig 3: Conventional High Fan in Domino OR Gate with Footer Less Domino Logic (FLDL)](image2)

The operation of Footerless Domino Logic (FLDL), shown in Fig. 3, is similar to that of Footed Domino Logic (FDL), shown in Fig. 2. The advantage of FDL over FLDL is its higher noise immunity, which results from the stacking effect introduced by the footer transistor added at the bottom of the evaluation network. FDL is preferred for noise-immune applications, but its speed is lower than that of FLDL (Moradi and Peiravi 2005).

2.2 HIGH-SPEED DOMINO

One of the existing leakage-tolerant domino circuits is High-Speed Domino (HSD) logic, shown in Fig 3. At the beginning of the evaluation phase, the output of the delay element is low and the clock is high. PMOS transistor MP3 is ON and therefore turns OFF the keeper transistor MP2. After a delay equal to the delay of the inverters, when the delayed clock is high, if the output node is high, MN1 remains in the OFF state and keeper transistor MP2 also remains OFF. If instead the dynamic node is still high and the output node is low, the gate of MP2 is pulled low. This turns ON PMOS transistor MP2 (the keeper transistor), which keeps the dynamic node strongly connected to $V_{DD}$.

The turned OFF keeper transistor at the beginning of the evaluation phase helps to remove the contention between the keeper and NMOS evaluation network, thus achieving less power consumption and higher performance. However, the dynamic node is floating at the beginning of the evaluation phase since the keeper is turned OFF. Therefore, if there is noise at the inputs at the onset of evaluation, the dynamic node can be discharged.
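The keeper control described above can be summarized as a small behavioural model. This is a logic-level sketch only, not a transistor-level description; the function name and the boolean encoding of the signals are assumptions introduced here.

```python
def hsd_keeper_on(clk_delayed: bool, output_high: bool) -> bool:
    """Return True when keeper MP2 conducts during the evaluation phase."""
    if not clk_delayed:
        # Start of evaluation: MP3 is ON and holds MP2 OFF,
        # so the dynamic node is floating and there is no contention.
        return False
    # After the inverter delay, the keeper turns ON only if the gate has
    # not evaluated (output still low, dynamic node still high).
    return not output_high

# Window at the onset of evaluation: keeper is OFF regardless of the output.
print(hsd_keeper_on(clk_delayed=False, output_high=False))
# After the delay with the dynamic node still high: keeper is ON.
print(hsd_keeper_on(clk_delayed=True, output_high=False))
```

The first case is the interval in which input noise can discharge the floating dynamic node.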

2.3 CONDITIONAL KEEPER DOMINO LOGIC

Another existing leakage-tolerant domino circuit is Conditional Keeper Domino (CKD) logic. The circuit schematic of the conditional keeper is shown in Fig 4. The circuit works as follows: at the beginning of the evaluation phase, the smaller keeper (K1) is ON to hold the state of the dynamic node. After the delay of the inverters, if the dynamic node is still high, the output of the NAND gate goes low and turns ON K2 (Zhao et al. 2007). This keeper transistor is sized larger than K1 to maintain the state of the dynamic node for the rest of the evaluation period. However, the conditional keeper remains OFF if the dynamic node is discharged to ground. CKD logic has drawbacks: there is a limit to how far the delays of the inverters and the NAND gate can be reduced to improve noise immunity. Noise immunity can be improved by upsizing the delay inverters, but this significantly increases power dissipation.
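The condition that enables the larger keeper can be sketched at the logic level. The model below assumes that the NAND gate combines the delayed clock with the dynamic node, which matches the description above but is not confirmed by a schematic here; the function name is hypothetical.

```python
def ckd_large_keeper_on(clk_delayed: bool, dyn_high: bool) -> bool:
    """Return True when the large conditional keeper K2 conducts."""
    # Assumed NAND inputs: delayed clock and dynamic node.
    nand_out = not (clk_delayed and dyn_high)
    # K2 is a PMOS device, so it conducts when the NAND output is low.
    return not nand_out

# Before the inverter delay has elapsed, only the small keeper K1 holds the node.
print(ckd_large_keeper_on(clk_delayed=False, dyn_high=True))
# After the delay, K2 joins in only if the dynamic node is still high.
print(ckd_large_keeper_on(clk_delayed=True, dyn_high=True))
```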

2.4 LEAKAGE CURRENT REPLICA

The leakage current replica (LCR) keeper for dynamic domino gates uses an analog current mirror to replicate the leakage current of a dynamic gate's pull-down stack, and thus tracks process, voltage, and temperature.

The LCR keeper addresses the shortcomings of the conventional keeper and of previously proposed enhancements. It uses a conventional analog current mirror that tracks any process corner as well as voltage and temperature. The only variation that the LCR keeper cannot track is random on-die variation, which must still be addressed using conventional margining. A single current mirror structure can be shared among several dynamic gates, so the overhead of the LCR keeper is one pFET per dynamic gate plus a portion of the shared current mirror circuit.

2.5 CONTROLLED KEEPER BY CURRENT COMPARISON DOMINO

In domino logic, the PMOS keeper must be upsized to increase the noise margin. For example, if the required noise margin is 10% of $V_{DD}$, the PMOS keeper width can be sized to 10% of the worst-case pull-down width. However, upsizing the keeper transistor increases power consumption and the contention between the keeper transistor and the pull-down network (PDN). These problems are solved if the keeper transistor is off when the gate needs to pull down the dynamic node. The voltage of the dynamic node can fall to zero in two different situations: either the input vector forms a conduction path to ground, or the leakage current of a PDN whose transistors are all OFF increases enough to discharge the dynamic node, because of increased temperature or the existence of several parallel leakage paths from the dynamic node to ground. The keeper transistor should not turn off in the latter situation. Because the current in the former situation is larger than in the latter, the two can be distinguished by means of a reference current that corresponds to the PDN leakage and the temperature of the chip.

The reference current is compared with the pull-down network current. If there is no conducting path from the dynamic node to ground and the only current in the PDN is the leakage current, the keeper transistor will not turn off, because the reference current is greater than the leakage current. This idea is conceptually illustrated in Fig. 9. In effect, there is a race between the pull-down network current and the reference current: if the PDN current is greater, it wins the race and turns off the keeper PMOS transistor; otherwise the keeper stays on. Transistor $M_{pre2}$ is removed to discharge node K and thus turning on the keeper transistor in the pre-charge phase. This results in improved noise immunity. Therefore, unlike circuit designs such as HS domino, in which the keeper transistor is off at the beginning of the evaluation phase, the keeper transistor is on in this design.
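The race between the two currents reduces to a simple comparison. The sketch below is a behavioural illustration under assumed current values; the function name and all numbers are hypothetical and are not simulation results.

```python
def keeper_turns_off(i_pdn: float, i_ref: float) -> bool:
    """The keeper is switched off only when the PDN current beats the reference."""
    return i_pdn > i_ref

I_REF = 1.0e-6  # hypothetical reference current (A), set above worst-case leakage

# Only leakage flows in the PDN: the reference wins and the keeper stays on.
leak_only = keeper_turns_off(i_pdn=0.2e-6, i_ref=I_REF)
# An input opens a conducting path: the PDN current wins and the keeper turns off.
conducting = keeper_turns_off(i_pdn=40e-6, i_ref=I_REF)

print(leak_only, conducting)
```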

The proposed domino circuit is shown in Fig. 10. In this circuit $M_i$ is added in series with the evaluation network such as the wide OR gate, as illustrated in this schematic.

Moreover, $M_i$ is added in a diode configuration to provide more leakage current reduction when all inputs in the OR gate are at the low level or the circuit is set in the standby mode. Addition of $M_i$ results in a reduction of the subthreshold leakage of the evaluation network due to the stacking effect [37]. The voltage drop across $M_i$ due to the leakage current decreases the subthreshold leakage in the following ways. First, it makes the gate to source voltage of the evaluation transistors negative. Second, it increases the body effect and the threshold voltage of the evaluation transistors. Third, it decreases the drain to source voltage and DIBL of the evaluation transistors. Therefore, the leakage power of the proposed circuit is decreased especially in standby mode.
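The three mechanisms listed above can be illustrated with a first-order subthreshold model, $I \approx I_0\, e^{(V_{GS}-V_{th})/(n V_T)}\,(1 - e^{-V_{DS}/V_T})$, with a threshold voltage that falls with DIBL and rises with body bias. Every parameter value below, including the voltage drop across $M_i$ and the linearized body-effect term, is an assumption chosen for illustration rather than a value extracted from the 16-nm model.

```python
import math

def subthreshold_current(vgs, vds, vsb=0.0, i0=1e-7, vth0=0.3, n=1.5,
                         eta=0.1, gamma=0.2, v_t=0.026):
    """First-order subthreshold current in amperes (all parameters assumed)."""
    # DIBL (eta * vds) lowers the threshold; body bias (gamma * vsb) raises it.
    vth = vth0 - eta * vds + gamma * vsb
    return i0 * math.exp((vgs - vth) / (n * v_t)) * (1.0 - math.exp(-vds / v_t))

vdd = 0.8
v_x = 0.1  # assumed voltage drop across the diode-connected footer M_i

# Without the footer: source at ground, full VDD across the OFF transistor.
no_footer = subthreshold_current(vgs=0.0, vds=vdd)
# With the footer: negative VGS, body effect, and reduced VDS act together.
with_footer = subthreshold_current(vgs=-v_x, vds=vdd - v_x, vsb=v_x)

reduction = no_footer / with_footer
print(f"leakage reduced by a factor of about {reduction:.0f}")
```

With these assumed numbers the reduction exceeds a factor of ten, and most of it comes from the exponential dependence on the negative gate-to-source voltage.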

Since the leakage current of the pull down network is considerably low, a minimum keeper size is sufficient. However, increasing the keeper size increases the noise immunity especially in wide OR gates with fan-in of more than 32 inputs. Moreover, increasing the ratio of $W_3/W_6$ increases the reference current resulting in increased noise immunity. However, decreasing the ratio of $W_{keeper}/W_4$ increases the speed.

The circuit of the reference leakage current consists of transistors $M_5$, $M_6$, $M_7$ and $M_8$. The transistor $M_5$ is off in active mode and will be on in standby mode to reduce standby power. The size of the mirror transistor $M_3$ is chosen based on the leakage of the pull down transistors. The mirror current must be greater than the pull down leakage and smaller than the minimum PDN discharge current with at least one input at the high logic level to ensure correct operation. Since the reference circuit is a replica circuit of the PDN, the reference current varies with temperature just like the PDN leakage current. Thus, the design is almost insensitive to temperature variations.
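The sizing constraint on the mirror current, and the benefit of a reference that tracks temperature, can be checked numerically. The sketch below assumes a rule-of-thumb exponential temperature dependence of leakage and hypothetical current values; none of the numbers come from the simulations in this paper.

```python
def leakage_at(temp_c, i_25c, doubling_c=20.0):
    """Assumed rule of thumb: leakage doubles every `doubling_c` degrees Celsius."""
    return i_25c * 2.0 ** ((temp_c - 25.0) / doubling_c)

def reference_in_window(i_leak, i_ref, i_on_min):
    """True when PDN leakage < reference current < minimum discharge current."""
    return i_leak < i_ref < i_on_min

I_LEAK_25C = 0.2e-6  # hypothetical worst-case PDN leakage at 25 C (A)
MIRROR_GAIN = 3.0    # hypothetical ratio that places the reference above leakage
I_ON_MIN = 50e-6     # hypothetical discharge current with one input high (A)

checks = []
for temp_c in (25.0, 75.0, 110.0):
    i_leak = leakage_at(temp_c, I_LEAK_25C)
    # The replica circuit sees the same temperature as the PDN, so the
    # reference scales with the leakage instead of staying fixed.
    i_ref = MIRROR_GAIN * leakage_at(temp_c, I_LEAK_25C)
    checks.append(reference_in_window(i_leak, i_ref, I_ON_MIN))

print(checks)
```

A fixed reference would have to be margined for the hottest corner; a replica keeps the same ratio to the leakage across the sweep.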

2.6 CCD TECHNIQUE

In wide fan-in gates, the capacitance of the dynamic node is large, so speed decreases dramatically. In addition, the noise immunity of the gate is reduced by the many parallel leaky paths. Although upsizing the keeper transistor can improve noise robustness, power consumption and delay increase because of the larger contention. These problems would be solved if the PDN, which implements the logic function, were separated from the keeper transistor by a comparison stage in which the current of the pull-up network (PUN) is compared with the worst-case leakage current. This idea utilizes the PUN instead of the PDN. In effect, there is a race between the PUN current and the reference current. Transistor MK is added in series with the reference current to reduce power consumption when the voltage of the output node has fallen to ground.

The proposed circuit for generating the reference current for all gates is shown in Fig. 5. It is similar to the replica leakage circuit proposed in [7], with a series diode-connected transistor M6, similar to M1, added. As shown in Fig. 5, the circuit is a replica of the worst-case leakage current of the PUN, so that it correctly tracks leakage current variations due to process variations. Therefore, the gate of transistor M7 is connected to VDD, and its size is derived from the sizes of the pMOS transistors of the PUN in the worst case, i.e., a 16-input OR gate; its width is set equal to the sum of the widths of the 16 pMOS transistors of the PUN.

The proposed circuit can be considered as two stages. The first stage, the pre-evaluation network, includes the PUN and transistors MPre, MEval, and M1. Unlike in traditional dynamic logic circuits, the PUN, which implements the desired logic function, is disconnected from the dynamic node and changes the dynamic voltage only indirectly. The second stage looks like a footless domino with one input [node A as input in Fig. 5], without any charge sharing, with one transistor M2 regardless of the Boolean function implemented in the PUN, and with a controlled keeper consisting of two transistors. Only one pull-up transistor is connected to the dynamic node, instead of the n transistors of an n-bit OR gate, which reduces the capacitance on the dynamic node and yields a higher speed. The input signal of the second stage is prepared by the first stage.

In the evaluation phase, the dynamic power consumption thus consists of two parts: one for the first stage and one for the second stage. For a fixed frequency, power supply, and temperature, the dynamic power consumption depends directly on the capacitance, voltage swing, and contention current on the switching node. The first stage, with n inputs, has a lower voltage swing (VDD to VTHP) and no contention. The second stage has a rail-to-rail voltage swing with minimum contention. Although the proposed circuit has some area overhead, it has lower dynamic power consumption than footless domino.
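The effect of the reduced voltage swing on the first stage can be estimated with the usual switching-power expression $P = \alpha\, C\, V_{swing}\, V_{DD}\, f$. The sketch below ignores contention current and uses assumed values for the node capacitance, supply, threshold, and frequency; it illustrates the dependence only and does not reproduce the reported 51% figure.

```python
def dynamic_power(c_node, v_swing, v_dd, freq, activity=1.0):
    """Switching power of one node: activity * C * V_swing * V_DD * f (watts)."""
    return activity * c_node * v_swing * v_dd * freq

V_DD = 0.8     # assumed supply voltage (V)
V_THP = 0.3    # assumed magnitude of the pMOS threshold voltage (V)
C_WIDE = 20e-15  # assumed capacitance of the wide n-input node (F)
FREQ = 1e9     # assumed clock frequency (Hz)

# Same wide node, switched rail to rail as in a conventional domino gate.
full_swing = dynamic_power(C_WIDE, V_DD, V_DD, FREQ)
# Same node with the swing limited to VDD - VTHP, as in the first stage.
reduced_swing = dynamic_power(C_WIDE, V_DD - V_THP, V_DD, FREQ)

print(f"{full_swing * 1e6:.1f} uW vs {reduced_swing * 1e6:.1f} uW")
```

The saving scales linearly with the swing reduction; the rail-to-rail second stage adds only the small capacitance of a single pull-up transistor.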

Transistor M1 is configured in diode connection, i.e., its gate and drain terminal are connected together.

In the evaluation mode, the current of the PUN transistors establishes a voltage drop across M1. This voltage is low if all inputs are at the high level and only leakage current flows in the PUN and mirror transistor M2. Otherwise, if at least one conductive path exists between node A and ground (for example, when one input of the OR gate goes low), the voltage drop rises, turning on mirror transistor M2 and changing the output voltage. The voltage drop across transistor M1 also makes the gate-source voltage of the off transistors in the PUN positive, yielding an exponential reduction in subthreshold leakage through the stacking effect.

Pre-charge Phase: Input signals and clock voltage are at the high and low levels, respectively [CLK = "0", $\overline{\text{CLK}}$ = "1" in Fig. 3.1], in this phase. Therefore, the dynamic node (Dyn) is pulled to the low level by transistor MDis and node A is raised to the high level by transistor MPre. Hence, transistors MPre, MDis, Mk1, and Mk2 are on and transistors M1, M2, and MEval are off. The output voltage is raised to the high level by the output inverter.

Evaluation Phase: In this phase, the clock voltage is at the high level [CLK = "1", $\overline{\text{CLK}}$ = "0" in Fig. 5] and input signals can be at the low level. Hence, transistors MPre and MDis are off, transistors M1, M2, Mk2, and MEval are on, and transistor Mk1 can be on or off depending on the input voltages. Two states may occur: either all of the input signals remain high, or at least one input falls to the low level. In the first state, a small voltage is established across transistor M1 due to the leakage current. Although this leakage current is mirrored by transistor M2, the keeper transistors of the second stage (Mk1 and Mk2) compensate for the mirrored leakage current. In the second state, the pull-up current through M1 increases, and so does the voltage across it. This voltage is equal to the drain-source voltage of M1 and depends on the size of M1 and its current. Increasing the pull-up current increases the mirrored current in transistor M2, so the dynamic node is charged to VDD, which discharges the output node and turns off the main keeper transistor Mk1. This technique mitigates the contention current between the keeper transistor and the mirror transistor.
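The two phases described above can be condensed into a logic-level model of the gate. This sketch captures only the node levels stated in the text (Dyn, Out) and abstracts away the current mirror and keeper; the function name and the 0/1 encoding are assumptions made for illustration.

```python
def ccd_gate(clk: int, inputs):
    """Return (dyn, out) logic levels for one clock phase."""
    if clk == 0:
        # Pre-charge phase: MDis pulls Dyn low and the output inverter
        # drives Out high.
        return 0, 1
    # Evaluation phase: a low input opens a path in the PUN; the mirrored
    # current through M2 charges Dyn, and the inverter discharges Out.
    path_exists = any(level == 0 for level in inputs)
    dyn = 1 if path_exists else 0
    return dyn, 1 - dyn

# All inputs stay high: only leakage flows and the keeper holds Dyn low.
print(ccd_gate(1, [1, 1, 1, 1]))
# One input falls low: Dyn is charged to VDD and Out is discharged.
print(ccd_gate(1, [1, 0, 1, 1]))
```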

III. EXPERIMENTAL RESULTS

The circuit described above was designed in DSCH, the schematic editor supplied with Microwind. The schematic was then simulated and the corresponding Verilog file was generated.

IV. CONCLUSION

The leakage current of the evaluation network of dynamic gates increases dramatically with technology scaling, especially in wide domino gates, which reduces noise immunity and increases power consumption. New designs are therefore necessary to obtain the desired noise robustness in very wide fan-in circuits. Moreover, increasing the fan-in not only increases the worst-case delay but also increases the contention between the keeper transistor and the evaluation network. A new circuit design, called CCD, was presented in this paper. The main goal was to make domino circuits more robust and less leaky without significant performance degradation or increased power consumption. This was done by comparing the evaluation current of the gate with the leakage current.

V. REFERENCES