## **UNIVERSIDAD SAN FRANCISCO DE QUITO USFQ**

Colegio de Ciencias e Ingenierías

### **Dual Mode Logic – Single-clock-cycle binary full-comparator**

## **Ricardo Paul Escobar Gavilanez**

### Electronics and Automation Engineering

Capstone project submitted as a requirement for the degree of Electronics Engineer

Quito, July 22, 2020

CAPSTONE PROJECT GRADING SHEET

**Dual Mode Logic – Single-clock-cycle binary full-comparator** 

### **Ricardo Paul Escobar Gavilanez**

Professor name, academic title: Ramiro Taco, PhD; Luis Miguel Prócel, PhD

Quito, July 22, 2020

### **COPYRIGHT**

I hereby certify that I have read all the Policies and Manuals of Universidad San Francisco de Quito USFQ, including the USFQ Intellectual Property Policy, and that I agree with their content. The intellectual property rights of this work are therefore subject to the provisions of those Policies.

I also authorize USFQ to digitize and publish this work in its virtual repository, in accordance with Art. 144 of the Ley Orgánica de Educación Superior.

| Full name:          | Ricardo Paul Escobar Gavilanez |
|---------------------|--------------------------------|
| Student code:       | 00132631                       |
| National ID number: | 1721514766                     |
| Place and date:     | Quito, July 2020               |

### **UNPUBLISHED DOCUMENT**

**Note:** The following capstone project is available through Universidad San Francisco de Quito USFQ institutional repository. Nonetheless, this project – in whole or in part – should not be considered a publication. This statement follows the recommendations presented by the Committee on Publication Ethics COPE described by Barbour et al. (2017) Discussion document on best practice for issues around theses publishing available on http://bit.ly/COPETheses.

#### ABSTRACT

The purpose of this work is to design and simulate an energy-efficient single-clock-cycle binary full-comparator in Dual Mode Logic (DML), starting from a comparator designed with Domino Logic (DL). Alternatives to low-voltage CMOS designs have been investigated for optimal performance. DL was proposed, but problems of charge sharing, susceptibility to glitches, crosstalk noise, and sensitivity to process variations have emerged in nanometric technologies. The DML design offers the advantages of both CMOS and dynamic gates, and it allows different modes of operation. This work was carried out in a 32 nm technology, where the gates were sized at a supply voltage of 1.2 V under worst-case input conditions. The objective is to understand the behavior of the device over a range of supply voltages in order to optimize the trade-off between energy and delay. The comparator designed with Type A DML gates operating in dynamic mode achieves a maximum frequency of 6.29 *GHz* and an energy dissipation of 0.69  $\mu W/MHz$ .

Keywords: Dual mode logic (DML), Complementary metal-oxide-semiconductor (CMOS), full comparator, energy efficiency, nanoscaled technology nodes.

### TABLE OF CONTENTS

- I. Introduction
- II. Methodology
- III. VLSI implementation and results
- IV. Conclusions
- References

### TABLE INDEX

- Table 1. Sizing of Gates

### **FIGURE INDEX**

- Figure 1. DML gate topology. (a) Type A unfooted. (b) Type B unfooted. (c) Type A footed. (d) Type B footed.
- Figure 2. Top-level architecture of the full-comparator.
- Figure 3. Simulated benchmark.
- Figure 4. Simulated waveforms for the worst-case switching of the input signals.
- Figure 5. Full-comparator E-D plot as a function of VDD.

#### I. INTRODUCTION

The comparison of two n-bit numbers  $A_{[n-1:0]}$  and  $B_{[n-1:0]}$  is used in almost all digital systems. A full-comparator performs this operation by recognizing the three possible conditions between A and B: A = B, A > B, and A < B. Full-comparators have been designed in different technologies using algorithms that aim for minimum delay while remaining energy efficient. The parallel-prefix approach has been shown to be well suited to efficient full-comparators because of its reduced number of computational steps (Frustaci et al., 2012). This approach has been used to design comparators with dynamic logic, specifically Domino Logic (DL).

The semiconductor industry tries to follow Moore's Law, which states that the number of transistors in an integrated circuit (IC) doubles about every two years. There is great interest in developing new techniques and technologies to sustain this trend, and the most direct idea is to make transistors smaller. This has been possible thanks to research in many fields of physics, but manufacturing processes are approaching a physical limit beyond which the trend can no longer be followed because of physical effects at the nanoscale.

There is also a need for faster and more energy-efficient electronic devices for different applications. The standard technology for IC design is CMOS, which uses complementary and symmetrical pairs of n-type and p-type MOSFETs to perform logic functions. The p-type transistors form the Pull-Up Network (PUN), which is connected to the supply voltage (VDD). The n-type transistors form the Pull-Down Network (PDN), which is connected to ground (GND). Little room remains to optimize standard CMOS gates, so new topologies are being developed. This work presents a full-comparator designed with the Dual Mode Logic (DML) topology that achieves a fast response while remaining energy efficient.

DML is a topology with two modes of operation, static and dynamic, that can be switched at runtime by a control signal, thanks to the addition of one or two extra transistors driven by that signal (Kaizerman et al., 2013). The static mode is obtained when the control signal is kept constant, so the gate behaves like a CMOS gate. This mode leads to a slower device that dissipates less energy. The dynamic mode is obtained when the control signal is connected to a clock signal (CLK). In this mode the gate no longer behaves like a CMOS gate, because two phases can be distinguished: pre-charge and evaluation. During pre-charge, the transistor driven by the control signal, which is in parallel with either the PUN or the PDN, is turned on, so the output of the gate depends only on the control signal. During evaluation, the gate behaves like a CMOS gate and the output is evaluated. This mode leads to a faster device that dissipates more energy than in static mode. The main trade-off the designer has to manage to get the best performance from this topology is therefore delay versus energy. What makes DML interesting is the possibility of combining these two modes in the same circuit, improving performance while reducing energy dissipation where possible. Usually, the gates along the critical path are driven in dynamic mode, while the other gates are driven in static mode.
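The two modes can be illustrated with a purely logical model. The sketch below is not taken from this work: it ignores timing, charge effects and contention, and it assumes that pre-charge happens while CLK is low and drives the output high (a p-type clocked transistor in parallel with the PUN). The names `dml_gate_output`, `nand2` and `precharge_level` are hypothetical.

```python
from typing import Callable, Sequence


def nand2(inputs: Sequence[int]) -> int:
    """Static CMOS behavior of a 2-input NAND."""
    return 0 if all(inputs) else 1


def dml_gate_output(logic: Callable[[Sequence[int]], int],
                    inputs: Sequence[int],
                    clk: int,
                    dynamic: bool,
                    precharge_level: int = 1) -> int:
    """Idealized logic level at the output of a DML gate (no timing, no contention)."""
    if dynamic and clk == 0:
        # Pre-charge (assumed to occur while CLK is low): the clocked transistor
        # conducts and fixes the output regardless of the data inputs.
        return precharge_level
    # Static mode (control held constant) or evaluation phase:
    # the gate behaves like its CMOS counterpart.
    return logic(inputs)


print(dml_gate_output(nand2, (1, 1), clk=0, dynamic=True))   # pre-charge phase
print(dml_gate_output(nand2, (1, 1), clk=1, dynamic=True))   # evaluation phase
print(dml_gate_output(nand2, (1, 1), clk=1, dynamic=False))  # static mode
```

For a gate whose clocked transistor is n-type and sits in parallel with the PDN, the same sketch would apply with `precharge_level=0` and the opposite clock polarity.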

DML gates can be categorized as Type A or Type B, and each type can be footed or unfooted. Type A DML gates have a p-type transistor in parallel with the PUN. If the gate is unfooted, no more transistors are added; if the gate is footed, one extra transistor is added in series with the PDN, connected to GND. Type B DML gates have an n-type transistor in parallel with the PDN. If the gate is unfooted (or unheaded), no more transistors are added; if the gate is footed (or headed), one extra transistor is added in series with the PUN, connected to VDD. Footed DML gates allow a considerable decrease in pre-charge time, but the extra transistor has a negative effect on performance (Kaizerman et al., 2013). Unfooted DML gates, on the other hand, have a small gate-drain capacitance that allows better performance, but pre-charge takes longer and short circuits must be prevented when the CLK signal activates the transistor in parallel with either the PUN or the PDN. The Type A and Type B, footed and unfooted topologies are shown in Figure 1.



Figure 1. DML gate topology. (a) Type A unfooted. (b) Type B unfooted. (c) Type A footed. (d) Type B footed.

The main purpose of this work is to design and simulate a full-comparator using the DML topology in nanoscaled technology nodes to achieve better performance and lower energy dissipation. Different trade-offs are taken into account, and optimizing them is also part of the work. To do so, the behavior of the device is analyzed over a range of supply voltages. The remainder of this work is organized as follows. Section II describes the methodology and the work done before designing the full-comparator. Section III presents the full-comparator design and the results of the simulations performed over the range of supply voltages. Finally, Section IV concludes this work.

#### **II. METHODOLOGY**

For this work, the simulations are carried out in Custom Compiler from Synopsys with the library corresponding to the 32 nm technology. The supply voltage (VDD) used to size every logic gate is 1.2 V. The first step in designing a logic gate is to understand the physical behavior of the transistor, in this case a MOSFET. The parameters that most influence the behavior of the transistor are the threshold voltage (VTH), transistor width (W), channel length (L), oxide thickness, and body voltage (Levi & Fish, 2013). The minimum transistor width allowed by the design rules of the 32 nm technology is  $0.23\mu m$ . Starting from this value, all the gates were sized (that is, the value of W was found for each transistor) for the combination of inputs that triggers the worst-case condition. The worst-case pull-down transition happens when every transistor in the PDN except one has a constant input, and similarly for the worst-case pull-up transition in the PUN.

The easiest gate to size is the inverter, which has just one transistor in the PUN and one in the PDN. Because the mobility of charge carriers differs between the two types of transistors, we first determine the value  $\beta = \frac{W_p}{W_n}$ , where  $W_p$  is the width of the p-type transistor and  $W_n$  is the width of the n-type transistor of the inverter. Once the widths of both transistors are found, the coefficients known as " $\alpha$ -stack" have to be found for the PUN as well as for the PDN. NAND and NOR gates are used for this because they have all their transistors in series either in the PDN (NAND) or in the PUN (NOR). Under worst-case conditions, the correct sizing is found from the voltage transfer curve, ensuring that the PUN and the PDN "pull" the same amount of current.

This is the typical procedure to size transistors in CMOS technology. For the DML gates, every transistor in the network that has the CLK-controlled transistor in parallel is sized to the minimum width, including the CLK-controlled transistor itself. A total of seven gates were used to obtain every coefficient needed for the different logic gates. The following transistor widths are obtained for the CMOS and DML gates.

Table 1. Sizing of Gates (transistor widths in $\mu m$)

| Gate     | Network | CMOS      | DML Type A, unfooted | DML Type A, footed | DML Type B, unfooted | DML Type B, footed |
|----------|---------|-----------|-----------------|-----------|-----------|-----------|
| Inverter | PUN     | 1.096     | 0.230           | 0.230     | 1.096     | 1.541     |
| Inverter | PDN     | 0.460     | 0.460           | 0.768     | 0.230     | 0.230     |
| NAND2    | PUN     | 1.096     | 0.230           | 0.230     | 1.069     | 1.541     |
| NAND2    | PDN     | 0.768     | 0.768           | 1.096     | 0.230     | 0.230     |
| NAND3    | PUN     | 1.096     | 0.230           | 0.230     | 1.096     | 1.541     |
| NAND3    | PDN     | 1.047     | 1.047           | 1.311     | 0.230     | 0.230     |
| NAND4    | PUN     | 1.096     | 0.230           | 0.230     | 1.096     | 1.541     |
| NAND4    | PDN     | 1.311     | 1.311           | 1.563     | 0.230     | 0.230     |
| NAND5    | PUN     | 1.096     | 0.230           | 0.230     | 1.096     | 1.541     |
| NAND5    | PDN     | 1.563     | 1.563           | -         | 0.230     | 0.230     |
| NOR2     | PUN     | 1.541     | 0.230           | 0.230     | 1.541     | 1.946     |
| NOR2     | PDN     | 0.460     | 0.460           | 0.768     | 0.230     | 0.230     |
| NOR3     | PUN     | 1.946     | 0.230           | 0.230     | 1.946     | -         |
| NOR3     | PDN     | 0.460     | 0.460           | 0.768     | 0.230     | 0.230     |

Note that the DML sizing for the footed Type A NAND5 and the footed Type B NOR3 gates is not shown (marked with a dash in Table 1). Sizing them would require the NAND6 and NOR4 coefficients, and no transistor widths beyond those listed are needed to design the binary comparator.
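As a rough illustration of how the CMOS column of Table 1 relates to the sizing procedure, the sketch below computes $\beta$ for the inverter and the ratio of each series-stack width to the corresponding inverter width. The text does not give a numerical definition of the $\alpha$-stack coefficients, so reading them as these width ratios is an assumption, and the variable names are hypothetical.

```python
# CMOS widths taken from Table 1, in micrometres.
W_MIN = 0.230
INVERTER = {"PUN": 1.096, "PDN": 0.460}
NAND_PDN = {2: 0.768, 3: 1.047, 4: 1.311, 5: 1.563}  # n-type transistors in series
NOR_PUN = {2: 1.541, 3: 1.946}                       # p-type transistors in series

# beta = Wp / Wn for the inverter, as defined in the text.
beta = INVERTER["PUN"] / INVERTER["PDN"]

# Assumption: a stack coefficient is the stacked width divided by the
# width of the single transistor of the same type in the inverter.
nand_stack_ratio = {n: w / INVERTER["PDN"] for n, w in NAND_PDN.items()}
nor_stack_ratio = {n: w / INVERTER["PUN"] for n, w in NOR_PUN.items()}

print(f"beta = {beta:.2f}")
for n, ratio in nand_stack_ratio.items():
    print(f"NAND{n} PDN / inverter PDN = {ratio:.2f}")
for n, ratio in nor_stack_ratio.items():
    print(f"NOR{n} PUN / inverter PUN = {ratio:.2f}")
```

The ratios grow with the number of series transistors, which is the expected trend when a stack has to deliver the same current as a single device.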

With all the coefficients needed to size the gates of the full-comparator available, the top-level architecture of the full-comparator can be reviewed.



Figure 2. Top-level architecture of the full-comparator.

The stages of the full-comparator in the parallel-prefix algorithm are pre-processing, parallel recursive, and post-processing. Each symbol in Figure 2 represents a set of gates that perform a particular logic function. The first stage takes the 128 bits of the two 64-bit numbers to be compared and produces the grouped generate (GG) and grouped propagate (GP) signals. Each symbol takes 2 bits from A and 2 bits from B and produces two signals, GG and GP. From now on, a generic *i*th processing element (PE) is discussed. The signals GG and GP take their values according to the following rule:

$$GP_{[i]} = 1 \text{ if } A_{[2i+1:2i]} = B_{[2i+1:2i]} \text{, else } GP_{[i]} = 0$$
$$GG_{[i]} = 1 \text{ if } A_{[2i+1:2i]} < B_{[2i+1:2i]} \text{, else } GG_{[i]} = 0$$

The logic expressions for the signals GG and GP can be found directly from the truth table. It is worth mentioning that the expression for the GG signal is reformulated to reuse an already calculated quantity, the XNOR operation on the bits  $A_{[2i+1]}$ and  $B_{[2i+1]}$ , which saves one inverter in each PE. The resulting expressions are:

$$GG_{[i]} = B_{[2i+1]} \cdot \left[ B_{[2i]} \cdot \left( A_{[2i]} \oplus B_{[2i]} \right) + \overline{A_{[2i+1]}} \cdot \left( A_{[2i+1]} \oplus B_{[2i+1]} \right) \right] + B_{[2i]} \cdot \overline{A_{[2i+1]}} \cdot \left( A_{[2i]} \oplus B_{[2i]} \right)$$
$$GP_{[i]} = \overline{A_{[2i+1]} \oplus B_{[2i+1]}} \cdot \overline{A_{[2i]} \oplus B_{[2i]}}$$
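The expressions can be checked against the rule that defines GG and GP with an exhaustive truth table. The sketch below is only an illustration: it reads $\oplus$ as XOR, reads GP as the product of the two XNORs (the overline covering each XOR), and uses a hypothetical function name `processing_element`.

```python
from itertools import product


def processing_element(a1: int, a0: int, b1: int, b0: int) -> tuple[int, int]:
    """GG and GP of one 2-bit processing element, written gate by gate."""
    not_a1 = 1 - a1
    xor1 = a1 ^ b1   # A[2i+1] XOR B[2i+1]
    xor0 = a0 ^ b0   # A[2i]   XOR B[2i]
    gg = (b1 & ((b0 & xor0) | (not_a1 & xor1))) | (b0 & not_a1 & xor0)
    gp = (1 - xor1) & (1 - xor0)  # both bit pairs equal
    return gg, gp


# Compare against the arithmetic definition for all 16 input combinations.
for a1, a0, b1, b0 in product((0, 1), repeat=4):
    a = 2 * a1 + a0
    b = 2 * b1 + b0
    assert processing_element(a1, a0, b1, b0) == (int(a < b), int(a == b))

print("GG and GP match A < B and A = B for every 2-bit input pair")
```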

A total of 32 modules are needed to calculate the 32 GG and GP signals required by the parallel recursive stage. In this stage, the signals GGG and GGP are produced using DOT operators connected in the Brent-Kung parallel-prefix fashion (Brent & Kung, 1982). The generic *i*th operator receives the subsets  $GG_{[4i+3:4i]}$  and  $GP_{[4i+3:4i]}$  and produces the signals GGG and GGP according to the following expressions:

$$GGG_{[i]} = 0 \text{ and } GGP_{[i]} = 0 \quad \text{if } A_{[8i+7:8i]} > B_{[8i+7:8i]}$$
$$GGG_{[i]} = 1 \text{ and } GGP_{[i]} = 0 \quad \text{if } A_{[8i+7:8i]} < B_{[8i+7:8i]}$$
$$GGG_{[i]} = 0 \text{ and } GGP_{[i]} = 1 \quad \text{if } A_{[8i+7:8i]} = B_{[8i+7:8i]}$$

The logic equations can be expressed as:

$$GGG_{[i]} = GG_{[4i+3]} + \sum_{k=0}^{2} \left( GG_{[4i+k]} \cdot \prod_{l=k+1}^{3} GP_{[4i+l]} \right)$$
$$GGP_{[i]} = \prod_{l=0}^{3} GP_{[4i+l]}$$
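A small sketch can confirm that these equations implement a comparison of the eight bits covered by four consecutive processing elements. It assumes that the sum and product denote logical OR and AND, and it models the GG and GP inputs behaviorally (as "less than" and "equal" on each 2-bit pair). The names `dot4` and `pair_signals` are hypothetical.

```python
def dot4(gg: list[int], gp: list[int]) -> tuple[int, int]:
    """DOT operator on four (GG, GP) pairs; index 0 is the least significant pair."""
    ggg = gg[3]
    for k in range(3):
        term = gg[k]
        for l in range(k + 1, 4):
            term &= gp[l]          # every more significant pair must be equal
        ggg |= term
    ggp = gp[0] & gp[1] & gp[2] & gp[3]
    return ggg, ggp


def pair_signals(a: int, b: int) -> tuple[list[int], list[int]]:
    """Behavioral GG/GP for the four 2-bit pairs of two 8-bit numbers."""
    gg, gp = [], []
    for j in range(4):
        pa = (a >> (2 * j)) & 0b11
        pb = (b >> (2 * j)) & 0b11
        gg.append(int(pa < pb))
        gp.append(int(pa == pb))
    return gg, gp


# Exhaustive check over all pairs of 8-bit numbers.
for a in range(256):
    for b in range(256):
        assert dot4(*pair_signals(a, b)) == (int(a < b), int(a == b))

print("GGG and GGP match A < B and A = B on every 8-bit slice")
```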

A total of 8 DOT-operator modules are needed to calculate the 8 GGG and GGP signals required by the post-processing stage. This final stage is composed of two levels of logic. In the first level, the signals G1, G2, P1, and P2 are calculated using DOT operators. These signals are fed into the second level of logic, which calculates the output of the full-comparator. The logic equations of the OUT signals can be expressed as:

$$OUT_{[0]} = G_2 + G_1 \cdot P_2$$
$$OUT_{[1]} = P_1 \cdot P_2$$

The parallel-prefix algorithm does not permit the condition  $OUT_{[0]} = OUT_{[1]} = 1$  (Frustaci et al., 2012). There are just three possible outputs that correspond to the comparison of the two numbers *A* and *B*. These outputs are:

$$OUT_{[0]} = 0 \text{ and } OUT_{[1]} = 0 \text{ if } A_{[63:0]} > B_{[63:0]}$$
$$OUT_{[0]} = 1 \text{ and } OUT_{[1]} = 0 \text{ if } A_{[63:0]} < B_{[63:0]}$$
$$OUT_{[0]} = 0 \text{ and } OUT_{[1]} = 1 \text{ if } A_{[63:0]} = B_{[63:0]}$$
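The three stages can be put together in a behavioral sketch of the 64-bit comparator. This is an illustration under stated assumptions, not the transistor-level design: the processing elements are modeled arithmetically, G1 and P1 are assumed to combine the four least significant GGG/GGP signals and G2 and P2 the four most significant ones, and the outputs are returned by meaning ("less than", "equal") rather than by index. The names `dot` and `compare64` are hypothetical.

```python
import random


def dot(g: list[int], p: list[int]) -> tuple[int, int]:
    """DOT operator over generate/propagate lists; index 0 is least significant."""
    g_out, p_out = 0, 1
    for gi, pi in zip(g, p):          # walk from least to most significant
        g_out = gi | (pi & g_out)     # a higher group decides, or passes the lower result
        p_out &= pi
    return g_out, p_out


def compare64(a: int, b: int) -> tuple[int, int]:
    """Return (less, equal) for two 64-bit unsigned integers."""
    # Pre-processing: 32 processing elements, 2 bits of A and B each.
    gg, gp = [], []
    for i in range(32):
        pa = (a >> (2 * i)) & 0b11
        pb = (b >> (2 * i)) & 0b11
        gg.append(int(pa < pb))
        gp.append(int(pa == pb))
    # Parallel recursive stage: 8 DOT operators, 4 (GG, GP) pairs each.
    ggg, ggp = zip(*(dot(gg[4 * i:4 * i + 4], gp[4 * i:4 * i + 4]) for i in range(8)))
    # Post-processing, first level (assumed grouping of lower and upper halves).
    g1, p1 = dot(list(ggg[0:4]), list(ggp[0:4]))
    g2, p2 = dot(list(ggg[4:8]), list(ggp[4:8]))
    # Post-processing, second level.
    less = g2 | (g1 & p2)
    equal = p1 & p2
    return less, equal


random.seed(0)
for _ in range(10_000):
    a = random.getrandbits(64)
    b = random.choice([a, a ^ 1, random.getrandbits(64)])
    assert compare64(a, b) == (int(a < b), int(a == b))

print("behavioral model agrees with integer comparison")
```

Note that `less` and `equal` are never 1 at the same time, which matches the statement that only three output combinations are possible.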

These logic equations are implemented at transistor level, and transient simulations are run in Custom Compiler to check that the full-comparator behaves as expected. The quantities of interest are the delay and the energy dissipated by the device. Two delays are measured with respect to the CLK signal: one on the rising edge of the  $GGG_{[0]}$ signal and the other on the rising edge of  $OUT_{[0]}$ , when the value of  $B_{[0]}$  changes from 0 to 1 with A = B = 0 as the initial condition. The larger of these two delays, multiplied by a factor of 2, is taken as the maximum delay of the device. To measure the energy dissipated, the calculator tool of the software is used to compute the average power in the worst-case condition, that is, when the least significant bits of A and B are changing and the device is running at its maximum frequency. The energy dissipated is equal to the average power divided by the maximum frequency of operation.
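The arithmetic behind these definitions can be sketched as follows, using the maximum delay (159 ps) and energy (0.69 $\mu W/MHz$) reported in the conclusions. The average power printed at the end is only the value implied by those two figures, not a number reported in this work, and the function names are hypothetical.

```python
def max_delay_s(delay_ggg_s: float, delay_out_s: float) -> float:
    """Maximum delay: twice the larger of the two measured delays."""
    return 2.0 * max(delay_ggg_s, delay_out_s)


def max_frequency_hz(delay_s: float) -> float:
    """Maximum operating frequency for a given maximum delay."""
    return 1.0 / delay_s


def energy_per_cycle_j(avg_power_w: float, f_max_hz: float) -> float:
    """Energy dissipated per cycle: average power divided by frequency."""
    return avg_power_w / f_max_hz


REPORTED_MAX_DELAY_S = 159e-12   # 159 ps
REPORTED_ENERGY_J = 0.69e-12     # 0.69 uW/MHz is 0.69 pJ per cycle

f_max = max_frequency_hz(REPORTED_MAX_DELAY_S)
implied_power_w = REPORTED_ENERGY_J * f_max  # derived, not a reported value

print(f"f_max = {f_max / 1e9:.2f} GHz")
print(f"implied average power = {implied_power_w * 1e3:.2f} mW")
```

The unit $\mu W/MHz$ is an energy: one microwatt per megahertz equals one picojoule per clock cycle.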

#### **III. VLSI IMPLEMENTATION AND RESULTS**

A transistor-level implementation of the 64-bit comparator was realized in DML using a 32 nm, 1.2 V process. The inputs are passed through D-type registers and then fed to the full-comparator, where each output signal drives 4x-type flip-flops as loads. Typical-typical (TT) simulations were performed at  $25^{\circ}C$  and a supply voltage of 1.2 V. The simulated benchmark is shown below.



Figure 3. Simulated benchmark.

The gates implemented at transistor level with the correct sizing start with the XOR/XNOR gate, which uses 10 transistors to perform the two operations. This is the only gate that was not designed with DML, because of its specific topology, which includes an inverter, a pass transistor, and a feedback loop to avoid weak logic outputs (Amini-Valashani et al., 2018). All other gates were implemented in the dynamic mode of operation, because the goal of this work is to improve the performance of the device and study its behavior for different supply voltages.



Figure 4. Simulated waveforms for the worst-case switching of the input signals.

The energy-delay (E-D) plot shown below is the main result of this work, together with the DML implementation of the full-comparator.



Figure 5. Full-comparator E-D plot as a function of VDD.

#### **IV. CONCLUSIONS**

A high-speed parallel-prefix binary full-comparator designed with the DML topology has been presented, with the goal of exploring its performance in the subthreshold region and improving its immunity to process variations. DML gates can operate from a supply voltage as low as 300 mV (Kaizerman et al., 2013). The design achieves high computational speed and low energy dissipation because the parallel-prefix algorithm was used. It is worth mentioning that this algorithm reduces the switching activity of internal nodes and that some possible short circuits present in previous designs have been prevented, so the energy consumption is expected to decrease significantly.

Various trade-offs among delay, energy, and area had to be taken into account when the design of this comparator was first proposed. The number of transistors increases by approximately 150% compared to the DL design because a PUN is needed. Fortunately, all transistors in the PUN have the minimum possible size, so the area needed for this device will not necessarily increase by the same amount. The maximum delay of the device is 159 ps, which corresponds to a frequency of 6.29 GHz, and it dissipates  $0.69 \,\mu W/MHz$  of energy at a supply voltage of 1.2 V. This was possible thanks to the smaller technology node and the DML topology.

This work is significant because the semiconductor industry always tends toward smaller devices with low power consumption. As mentioned before, the efforts to make DL an alternative to CMOS have been fading because of its sensitivity to process variations. The DML topology could become a strong alternative for low-power electronic applications in the future, and this work shows that the design can be difficult but not impossible. A lot of work remains for the future: for example, operating the device in static mode, comparing the performance against a DL full-comparator implemented in 32 nm, designing the comparator with Type B DML gates, performing Monte Carlo simulations, and re-testing the device after the parasitic elements are extracted from the layout.

#### REFERENCES

- Amini-Valashani, M., Ayat, M., & Mirzakuchaki, S. (2018). Design and analysis of a novel low-power and energy-efficient 18T hybrid full adder. *Microelectronics Journal*, 74, 49-59.
- Amirtharajah, R. & Baas. (2011). CMOS Logic. University of California, Intel Corporation.
- Brent, R. P., & Kung, H. T. (1982). A regular layout for parallel adders. *IEEE Transactions on Computers*, (3), 260-264.
- Frustaci, F., Perri, S., Lanuzza, M., & Corsonello, P. (2012). Energy-efficient single-clock-cycle binary comparator. *International Journal of Circuit Theory and Applications*, 40(3), 237-246.
- Kaizerman, A., Fisher, S., & Fish, A. (2013). Subthreshold Dual Mode Logic. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 21(5), 979-983.
- Levi, I., Bass, O., Kaizerman, A., Belenky, A., & Fish, A. (2012). High speed Dual Mode Logic Carry Look Ahead Adder. *2012 IEEE International Symposium on Circuits and Systems (ISCAS)*, 3037-3040.
- Levi, I., and Fish, A. (2013). Dual Mode Logic—Design for Energy Efficiency and High Performance. *IEEE Access*, *1*, 258-265.
- Horowitz, P., & Hill, W. (1989). *The Art of Electronics* (2nd ed.). Cambridge University Press.
- Perri, S., & Corsonello, P. (2008). Fast low-cost implementation of single-clock-cycle binary comparator. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 55(12), 1239-1243.
- Rabaey, J., Chandrakasan, A., & Nikolić, B. (2003). *Digital Integrated Circuits: A Design Perspective*. Prentice Hall Electronics and VLSI Series.
- Shavit, N., Stanger, I., Taco, R., & Fish, A. (2018). Process Variation-Aware Datapath Employing Dual Mode Logic. *2018 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S)*, Burlingame, CA, USA, 1-3.
- Shavit, N., Taco, R., & Fish, A. (2018). Efficiency of Dual Mode Logic in Nanoscale Technology Nodes. *2018 IEEE International Conference on the Science of Electrical Engineering in Israel (ICSEE)*, 1-4.
- Yuzhaninov, V., Levi, I., & Fish, A. (2015). Design Flow and Characterization Methodology for Dual Mode Logic. *IEEE Access*, 3, 3089-3101.