# Design and Evaluation of a Four-bit Carry Propagate Adder

Eric Blattler, William Goh, Casey Morrison, and Saeed Sadrameli

Abstract—The design and evaluation of a four-bit carry propagate adder (CPA) is presented. Methods for transistor sizing, low-power design, and area-efficient layout are discussed in detail. In particular, the tradeoff between power and clock frequency is analyzed. The benefits of having two separate supply voltages, one for critical-path logic and one for non-critical-path logic, are explored in regards to the power-delay product. This metric, which is presented as a function of clock period, reveals the tradeoffs that exist between speed and power dissipation of a circuit.

*Index Terms*—Carry propagate adder, energy-delay analysis, low-power, VLSI.

## I. INTRODUCTION

The design of low-power data paths is a well-studied topic with major implications, especially for portable or high-performance applications. Several techniques for low-power design are proposed and evaluated in [1], including parallel data paths, pipelining, and gated clocks. This paper focuses on the use of a dual power supply to minimize the dynamic power dissipation of a four-bit carry propagate adder (CPA). This technique is commonly used to reduce power consumption on the non-critical paths of a circuit. While it incurs some area overhead from the addition of a third supply rail, it offers the potential for marked power savings. Area-efficient and delay-efficient design techniques are also discussed, culminating in an energy-delay performance analysis.

## II. DESIGN METHODOLOGY

This design explores the effect of decreasing the supply voltage on non-critical paths to reduce the overall dynamic power dissipation. The CPA is designed for minimum critical-path delay to allow a higher maximum clock frequency, while the supply voltage on the non-critical paths is lowered to find the optimum power dissipation.

#### A. Specifications and Requirements

The performance of a simple dual-supply data path consisting of a 4-bit CPA is to be evaluated in terms of area, power dissipation, delay, and energy-delay product. The tradeoffs caused by decreasing the supply voltage are documented. Worst-case power dissipation for the entire system is measured with respect to the worst-case input. The circuit area is considered during the sizing of the transistors in each stage. The project specifications are provided in Table I.

TABLE I PROJECT SPECIFICATIONS

| Parameter              | Value                         |
|------------------------|-------------------------------|
| Equivalent input load  | Minimum-size inverter (2.5/1) |
| Equivalent output load | FO4 inverter                  |
| V<sub>DDH</sub>        | 2.5 V                         |
| Technology             | TSMC 0.25 µm Deep Submicron   |
| Minimum channel length | 0.24 µm                       |
| Temperature            | 27 °C                         |
| Input signal slopes    | 100 ps                        |
| Clock duty cycle       | 50%                           |



Fig. 1. Top-level architecture for a four-bit carry propagate adder with latched inputs and outputs.

#### B. High-Level Architecture

The CPA architecture consists of five blocks: input latches, adder, level converter, output latches, and clock buffers. Fig. 1 illustrates the high-level architecture of this design. Signals entering the adder need to be synchronized to provide valid outputs. This synchronization task is performed using the input latches A, B, and C<sub>in</sub>. To obtain a power-efficient design, a multiple-supply-voltage topology is employed. Paths with higher throughput and/or lower latency requirements are designated as critical paths and are supplied by the higher V<sub>DD</sub>. In this design, the path to C<sub>out</sub> is the most critical, so it is allocated the higher supply voltage (V<sub>DDH</sub>). The rest of the adder structure is supplied by a lower voltage, V<sub>DDL</sub>.

The level converter is responsible for boosting the signals driven by $V_{DDL}$ to the higher voltage, $V_{DDH}$. Latch S performs the same function as the input latches. In addition, with the use of two opposite-edge latches on either side of the data path, this design can easily be integrated with other latched data paths to form a flip-flop-based data path. The clock signal drives a high capacitive load and needs to be buffered. Thus, a clock buffer is designed to strengthen the clock signal and provide the required drive current to the succeeding stages.

#### C. Module-Level Design

Module-level designs are optimized for delay according to the logical effort delay model. Upon reaching an optimum delay, certain steps are taken to reduce power consumption, sometimes at the expense of delay.

1) Input/Output Latch: The input/output latch is transparent when the clock is high and opaque when the clock is low. To minimize size and power consumption, the transistors responsible for the feedback mechanism are kept at minimum size with a p-to-n ratio of 2.5/1. The minimum size is defined as $4\lambda$ for the width of PMOS and NMOS transistors. The output-driving inverter is designed to drive an FO4 load (14C load capacitance) with an input capacitance of 7C. The input capacitance is defined as the sum of the gate capacitances seen at the input.

The inverter in Fig. 2 is designed using the equation

$$C_{in} = \frac{g \cdot C_{out}}{\hat{f}}$$

where $\hat{f} = 2$, $g = 1$, and $C_{out} = 14$, which yields $C_{in} = 7$. The inverter is scaled to be twice the minimum-sized inverter.
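
As a quick numerical check of the sizing relation above, the following Python sketch evaluates $C_{in} = g\,C_{out}/\hat{f}$ for the values quoted in the text. The helper name `input_capacitance` is hypothetical, capacitances are expressed in multiples of the unit capacitance C, and the minimum-inverter input capacitance of 3.5C is an assumption inferred from the 2.5/1 p-to-n ratio.

```python
def input_capacitance(g: float, c_out: float, f_hat: float) -> float:
    """Logical-effort sizing: input capacitance needed for a target stage effort."""
    return g * c_out / f_hat

# Values from the text: an inverter (g = 1) driving an FO4 load of 14C
# with a target stage effort of 2.
c_in = input_capacitance(g=1.0, c_out=14.0, f_hat=2.0)

# Assumption: a minimum-size inverter presents 2.5C + 1C = 3.5C at its input.
C_MIN_INVERTER = 3.5
scale = c_in / C_MIN_INVERTER

print(c_in, scale)  # 7.0 2.0
```

The resulting scale factor of 2 agrees with the statement that the output inverter is twice the minimum size.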

2) Carry Propagate Adder: To determine a delay-, area-, and power-efficient design for a one-bit carry propagate adder, the structure illustrated in Fig. 3 is adopted and analyzed. Initially, the transistors are sized to provide equivalent resistance on the pull-up and pull-down networks. Multipliers, $K_i$, are assigned to each stage, and a logical effort analysis is carried out on two paths: $C_{IN}$-to-$C_{OUT}$ and $C_{IN}$-to-S. In a cascaded adder design such as the one implemented in this paper, $C_{IN}$-to-$C_{OUT}$ is the dominant sub-path within the overall worst-case path, A0-to-S3. As a result, every effort is made to reduce the delay along this path in a power-efficient manner. The equations below and the calculations in Table II represent the logical effort analysis for the $C_{IN}$-to-$C_{OUT}$ path.

$$G = \prod_{i} g_{i} = 2 \qquad H = \frac{C_{out-path}}{C_{in-path}} = \frac{7k_{1} + 14k_{3}}{7k_{1} + 14k_{3}} = 1 \qquad P = \sum_{i} p_{i} = 5$$
$$B = \prod_{i} b_{i} = 1 + 2\frac{k_{3}}{k_{2}} \qquad F = GHB = 2\left(1 + 2\frac{k_{3}}{k_{2}}\right)$$
$$D_{\min} = N \cdot F^{\frac{1}{N}} + P = 2\sqrt{2\left(1 + 2\frac{k_{3}}{k_{2}}\right)} + 5$$
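
To make the dependence of $D_{\min}$ on the sizing multipliers concrete, the sketch below evaluates the expression above numerically. The function name `cin_to_cout_delay` is hypothetical, the result is in units of the technology time constant $\tau$, and the equal-multiplier case is an illustrative assumption rather than a design point from this paper.

```python
def cin_to_cout_delay(k2: float, k3: float) -> float:
    """Minimum C_IN-to-C_OUT path delay from the logical effort expressions."""
    G = 2.0                   # path logical effort
    H = 1.0                   # path electrical effort
    B = 1.0 + 2.0 * k3 / k2   # path branching effort
    P = 5.0                   # total parasitic delay
    N = 2                     # number of stages on the path
    F = G * H * B             # path effort
    return N * F ** (1.0 / N) + P

d_final = cin_to_cout_delay(k2=3.0, k3=0.5)  # K-factors reported in Table IV
d_equal = cin_to_cout_delay(k2=1.0, k3=1.0)  # hypothetical unscaled case

print(round(d_final, 3), round(d_equal, 3))  # 8.266 9.899
```

Under this model, increasing $k_2$ relative to $k_3$ lowers the branching effort and therefore the path delay.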

The  $C_{IN}$ -to-S path analysis is shown in Table III and in the following equations.



Fig. 2. Simplification of inverter design to drive FO4.



Fig. 3. One-bit carry propagate adder structure. Two paths are analyzed: CIN-to-S, and CIN-to-COUT.

TABLE II CIN-TO-COUT LOGICAL EFFORT ANALYSIS

| Delay component         | Symbol | Stage 1                        | Stage 2                       |
|-------------------------|--------|--------------------------------|-------------------------------|
| Stage logical effort    | g      | $\frac{7}{3.5} = 2$            | 1                             |
| Stage electrical effort | h      | $\frac{7k_3+3.5k_2}{7k_1}$     | $\frac{7k_1 + 14k_3}{3.5k_2}$ |
| Stage parasitic delay   | p      | $\frac{14}{3.5} = 4$           | 1                             |
| Stage branch effort     | b      | $\frac{3.5k_2 + 7k_3}{3.5k_2}$ | 1                             |

TABLE III CIN-TO-S LOGICAL EFFORT ANALYSIS

| Delay component         | Symbol | Stage 1                      | Stage 2               | Stage 3              |
|-------------------------|--------|------------------------------|-----------------------|----------------------|
| Stage logical effort    | g      | $\frac{7}{3.5} = 2$          | $\frac{7}{3.5} = 2$   | 1                    |
| Stage electrical effort | h      | $\frac{7k_3 + 3.5k_2}{7k_1}$ | $\frac{3.5k_4}{7k_3}$ | $\frac{3.5}{3.5k_4}$ |
| Stage parasitic delay   | p      | $\frac{14}{3.5} = 4$         | $\frac{14}{3.5} = 4$  | 1                    |
| Stage branch effort     | b      | $\frac{3.5k_2 + 7k_3}{7k_3}$ | 1                     | 1                    |

TABLE IV ONE-BIT CARRY PROPAGATE ADDER SIZING RESULTS

| i | K<sub>i</sub> | A0-to-Cout avg. delay [ps] | A0-to-S3 avg. delay [ps] |
|---|---------------|----------------------------|--------------------------|
| 1 | 2             | 653.5                      | 777                      |
| 2 | 3             |                            |                          |
| 3 | 0.5           |                            |                          |
| 4 | 1             |                            |                          |

The delay values apply to the final design with all four K-factors.

The conclusions drawn from this analysis and supported by [2] are as follows. Delay is best improved by scaling up the $K_1$ and $K_2$ transistor networks. These factors appear in the denominator for both delay equations, and are not offset by factors in the numerator. Similarly, improvement in delay can be achieved by reducing the $K_3$ transistor network. Numerous simulations were performed using these findings as a starting point for transistor sizing. Table IV shows the final K-factors chosen for this design to achieve an acceptable delay with reasonable power consumption.

To further reduce power consumption, multiple supply voltages are used within the adder circuit. The critical path, $C_{IN}$-to-$C_{OUT}$, is supplied by the higher voltage, $V_{DDH}$. The less-critical paths, Input-to-S, are supplied by the lower voltage, $V_{DDL}$. All transistor bodies are connected to the highest potential, $V_{DDH}$. Although this technique results in a greater delay through the Input-to-S path, the power savings obtained make it a worthwhile design choice. The power-delay improvement identified by [3] can be achieved through the use of this dual-supply approach.
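
The appeal of a lower rail follows from the standard first-order model of dynamic power, $P = \alpha C V^2 f$. The sketch below is a minimal illustration under the assumption that switched capacitance and activity factor are unchanged between the two supplies; it ignores short-circuit and leakage power, and the function name `dynamic_power_ratio` is hypothetical.

```python
def dynamic_power_ratio(v_low: float, v_high: float, f_ratio: float = 1.0) -> float:
    """Ratio of dynamic power at v_low to that at v_high, using P = a*C*V^2*f."""
    return (v_low / v_high) ** 2 * f_ratio

# Example: logic moved from a 2.5 V rail to a 1.8 V rail at the same clock frequency.
ratio = dynamic_power_ratio(v_low=1.8, v_high=2.5)
print(f"{ratio:.3f}")  # 0.518
```

In this simplified model, logic supplied from 1.8 V dissipates roughly half the dynamic power it would at 2.5 V, which is why only the non-critical paths are moved to $V_{DDL}$.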

3) Level Shifter: Because of the dual-supply nature of this design, a level converter is required. Several level shifter designs were investigated to analyze their power and delay efficiency. The circuit that exhibited the most desirable power and delay characteristics is used in this design. Transistor sizes are chosen to achieve equal resistance on the pull-up and pull-down networks. Since the level shifter drives a minimum-sized inverter, all of its transistors can be sized minimally. Transistor sizing complies with a p-to-n ratio of 2.5/1.

4) Clock Buffers: The clock signal for the system drives a high capacitive load and therefore needs to be buffered. The clock buffer is designed to strengthen the clock signal. The buffer requires an odd number of inverters. Calculations show that each transistor needs to be scaled up to provide the desired drive current. This allows the signal to reach all circuits of the system without degradation.

#### D. Physical Design

The physical design of this four-bit CPA is carried out in a structured and consistent manner, using many of the conventions suggested by [5]. The layout is partitioned according to Fig. 5. Layouts for the adder circuit and the top-level design are discussed in detail.

1) Carry Propagate Adder: Each component of the 1-bit CPA, namely the majority circuit and the sum circuit, has a separate layout. Since the majority circuit contains PMOS transistors as wide as $10W_{min}$ (4.8 µm), and since the adder must accommodate three supply nets (GND, V<sub>DDH</sub>, and V<sub>DDL</sub>), the cell height for all cells within this design is defined to be the height of the adder cell: 10.08 µm ($84\lambda$). As shown in Fig. 4, the PMOS sources in the majority circuit (left side) are connected to V<sub>DDH</sub>, and the PMOS sources in the sum circuit (right side) are connected to V<sub>DDL</sub>.

2) Top-level Design: The individual functional units of the CPA are positioned within the layout according to Fig. 5. The physical layout of the dual-supply 4-bit CPA is illustrated in Fig. 6. The final size of the layout is measured to be $58.50\,\mu\text{m} \times 45.18\,\mu\text{m}$, for a total area of $2643\,\mu\text{m}^2$.
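
The layout dimensions quoted in this subsection can be cross-checked with a few lines of arithmetic. The sketch below assumes $\lambda = 0.12\,\mu\text{m}$, that is, half of the 0.24 µm minimum channel length in Table I; that value is inferred rather than stated in the text.

```python
LAMBDA_UM = 0.12  # assumption: lambda is half the 0.24 um minimum channel length

w_min_um = 4 * LAMBDA_UM         # minimum transistor width (4 lambda)
widest_pmos_um = 10 * w_min_um   # widest PMOS in the majority circuit
cell_height_um = 84 * LAMBDA_UM  # common cell height (84 lambda)
area_um2 = 58.50 * 45.18         # bounding box of the complete layout

print(round(w_min_um, 2), round(widest_pmos_um, 2),
      round(cell_height_um, 2), round(area_um2))
# 0.48 4.8 10.08 2643
```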

## III. DESIGN EVALUATION

The CPA design is evaluated using a combination of functional tests and performance characterizations. In addition to module-level simulations, design rule checking (DRC), and layout versus schematic (LVS) checking, extensive simulations are conducted at the top level.



Fig. 4. One-bit carry propagate adder layout.

| C<sub>in</sub> latch  | A<sub>0</sub> latch | B<sub>0</sub> latch | Majority 0 | Sum 0 | LC 0 | S<sub>0</sub> latch |
|-----------------------|---------------------|---------------------|------------|-------|------|---------------------|
| CLK buffer            | A<sub>1</sub> latch | B<sub>1</sub> latch | Majority 1 | Sum 1 | LC 1 | S<sub>1</sub> latch |
| CLKZ buffer           | A<sub>2</sub> latch | B<sub>2</sub> latch | Majority 2 | Sum 2 | LC 2 | S<sub>2</sub> latch |
| C<sub>out</sub> latch | A<sub>3</sub> latch | B<sub>3</sub> latch | Majority 3 | Sum 3 | LC 3 | S<sub>3</sub> latch |

Fig. 5. Functional partitioning of the layout. Inputs go to the units in green, and outputs come from the units in blue.



Fig. 6. Dual supply 4-bit carry propagate adder layout.

#### A. Functional Verification

The circuit is tested under its worst-case input condition: A[0..3]=1, B[0..3]=0, and Cin=1. Fig. 7 shows the resulting output. On the clock's falling edge (point A), the latches pass the input data to the adder. The adder's logic is combinational and thus requires an output latch to synchronize the data. The sum outputs, S[0..3], transition to logic zero, and the carry-out, S[4], transitions to logic one. Since the carry-out is driven by V<sub>DDH</sub>, the delay at point B is less than that at point C, which is driven by V<sub>DDL</sub>.
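
The expected logic values for this test vector can be reproduced with a small behavioral model. The sketch below is a bit-level ripple-carry model built from a majority function and a three-input XOR; it is an illustration of the logic only, not the transistor-level structure of Fig. 3, and the function names are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Carry-out of a full adder: 1 when at least two inputs are 1."""
    return (a & b) | (a & c) | (b & c)

def ripple_add(a_bits, b_bits, c_in):
    """Add two little-endian bit lists; return (sum_bits, per-stage carries)."""
    sum_bits, carries = [], []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)   # sum bit of this stage
        carry = majority(a, b, carry)    # carry into the next stage
        carries.append(carry)
    return sum_bits, carries

# Worst-case vector from the text: A = 1111, B = 0000, Cin = 1.
s, c = ripple_add([1, 1, 1, 1], [0, 0, 0, 0], 1)
print(s, c)  # [0, 0, 0, 0] [1, 1, 1, 1]
```

Every stage generates a carry from the incoming carry, so the input carry must ripple through all four majority circuits before the final sum bit and the carry-out settle, which is what makes this vector the worst case.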





#### B. Energy-Delay Analysis

The first step in analyzing the energy-delay product is to set the adder for single-supply operation, where $V_{DDL} = V_{DDH} = 2.5$ V. Starting from the minimum clock period for single-supply operation, the clock period is incremented by 0.05 ns for each trial. The simulation start and end times are calculated to capture precisely 10 clock cycles of data, so that the total energy is normalized across all simulations. Then, the dual-supply configuration is simulated by decreasing $V_{DDL}$ in each iteration until it reaches 1.2 V. The minimum clock period for each trial is determined by decreasing the clock period until the system fails; the shortest period at which the system still operates correctly is taken as the minimum clock period for the given dual-supply voltage. Table V contains the data extracted from simulations. The energy-period relationship is plotted in Fig. 8.

Fig. 8 shows that a dual-supply design consumes less energy on average than a single-supply system. The reason for this improvement is that the less-critical path can tolerate a larger delay (a lower $V_{DDL}$) than the critical path, while the throughput remains the same for both the dual- and single-supply systems.

TABLE V ENERGY-DELAY DATA FOR MULTIPLE VDD VALUES

| Trial | Low supply voltage V<sub>DDL</sub> (V), with V<sub>DDH</sub> = 2.5 V | Minimum clock period T<sub>CLK-Min</sub> (ns) | Maximum clock frequency F<sub>CLK-Max</sub> (MHz) | Avg. current from V<sub>DDH</sub>, I<sub>Avg-VDDH</sub> (µA) | Avg. current from V<sub>DDL</sub>, I<sub>Avg-VDDL</sub> (µA) | Avg. power from V<sub>DDH</sub>, P<sub>VDDH</sub> (mW) | Avg. power from V<sub>DDL</sub>, P<sub>VDDL</sub> (mW) | Total avg. power P<sub>AVG</sub> (mW) | Total energy E (pJ) |
|-------|----------------------------------------------------|----------------------------|-------------------------------|------------------------------------------|------------------------------------------|----------------------------------------------------|----------------------------------------------------|---------------------------|-----------------|
| 1     | 2.50                                               | 1.415                      | 706.71                        | 1683.67                                  | 119.47                                   | 4.209                                              | 0.299                                              | 4.508                     | 6.379           |
| 2     | 2.50                                               | 1.470                      | 680.27                        | 1613.28                                  | 114.95                                   | 4.033                                              | 0.287                                              | 4.321                     | 6.351           |
| 3     | 2.50                                               | 1.520                      | 657.89                        | 1559.77                                  | 111.07                                   | 3.899                                              | 0.278                                              | 4.177                     | 6.349           |
| 4     | 2.50                                               | 1.570                      | 636.94                        | 1510.82                                  | 107.49                                   | 3.777                                              | 0.269                                              | 4.046                     | 6.352           |
| 5     | 2.50                                               | 1.620                      | 617.28                        | 1464.75                                  | 104.22                                   | 3.662                                              | 0.261                                              | 3.922                     | 6.354           |
| 6     | 2.50                                               | 1.670                      | 598.80                        | 1421.16                                  | 101.05                                   | 3.553                                              | 0.253                                              | 3.806                     | 6.355           |
| 7     | 2.50                                               | 1.720                      | 581.40                        | 1379.73                                  | 98.12                                    | 3.449                                              | 0.245                                              | 3.695                     | 6.355           |
| 8     | 2.50                                               | 1.770                      | 564.97                        | 1340.49                                  | 95.33                                    | 3.351                                              | 0.238                                              | 3.590                     | 6.354           |
| 9     | 2.50                                               | 1.820                      | 549.45                        | 1303.75                                  | 92.70                                    | 3.259                                              | 0.232                                              | 3.491                     | 6.354           |
| 10    | 2.50                                               | 1.870                      | 534.76                        | 1269.19                                  | 90.25                                    | 3.173                                              | 0.226                                              | 3.399                     | 6.355           |
| 1     | 2.50                                               | 1.415                      | 706.71                        | 1683.67                                  | 119.47                                   | 4.209                                              | 0.299                                              | 4.508                     | 6.379           |
| 2     | 2.40                                               | 1.422                      | 703.23                        | 1676.51                                  | 111.19                                   | 4.191                                              | 0.267                                              | 4.458                     | 6.339           |
| 3     | 2.30                                               | 1.430                      | 699.30                        | 1672.10                                  | 103.84                                   | 4.180                                              | 0.239                                              | 4.419                     | 6.319           |
| 4     | 2.20                                               | 1.455                      | 687.29                        | 1633.71                                  | 96.15                                    | 4.084                                              | 0.212                                              | 4.296                     | 6.250           |
| 5     | 2.10                                               | 1.465                      | 682.59                        | 1626.28                                  | 90.21                                    | 4.066                                              | 0.189                                              | 4.255                     | 6.234           |
| 6     | 2.00                                               | 1.485                      | 673.40                        | 1603.54                                  | 84.12                                    | 4.009                                              | 0.168                                              | 4.177                     | 6.203           |
| 7     | 1.90                                               | 1.510                      | 662.25                        | 1575.78                                  | 78.21                                    | 3.939                                              | 0.149                                              | 4.088                     | 6.173           |
| 8     | 1.80                                               | 1.545                      | 647.25                        | 1538.04                                  | 72.22                                    | 3.845                                              | 0.130                                              | 3.975                     | 6.142           |
| 9     | 1.70                                               | 1.570                      | 636.94                        | 1518.14                                  | 67.00                                    | 3.795                                              | 0.114                                              | 3.909                     | 6.138           |
| 10    | 1.60                                               | 1.605                      | 623.05                        | 1494.24                                  | 61.50                                    | 3.736                                              | 0.098                                              | 3.834                     | 6.154           |
| 11    | 1.50                                               | 1.665                      | 600.60                        | 1440.36                                  | 55.36                                    | 3.601                                              | 0.083                                              | 3.684                     | 6.134           |
| 12    | 1.40                                               | 1.750                      | 571.43                        | 1376.22                                  | 48.80                                    | 3.441                                              | 0.068                                              | 3.509                     | 6.141           |
| 13    | 1.30                                               | 1.870                      | 534.76                        | 1306.54                                  | 41.79                                    | 3.266                                              | 0.054                                              | 3.321                     | 6.210           |
| 14    | 1.20                                               | 1.400                      | 714.29                        | 1825.82                                  | 41.88                                    | 4.565                                              | 0.050                                              | 4.615                     | 6.461           |
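
The power and energy columns of Table V appear to follow directly from the measured currents and the clock period. The sketch below reproduces one row under the assumption that each power is the average current times its supply voltage and that the tabulated energy equals the total average power times the clock period; these relations are inferred from the data rather than stated in the text, and the function name `trial_metrics` is hypothetical.

```python
V_DDH = 2.5  # volts, from Table I

def trial_metrics(v_ddl, t_clk_ns, i_ddh_ua, i_ddl_ua):
    """Return (P_VDDH, P_VDDL, P_AVG) in mW and energy in pJ for one trial."""
    p_ddh = i_ddh_ua * V_DDH * 1e-3   # uA * V = uW, scaled to mW
    p_ddl = i_ddl_ua * v_ddl * 1e-3
    p_avg = p_ddh + p_ddl
    energy = p_avg * t_clk_ns         # mW * ns = pJ
    return p_ddh, p_ddl, p_avg, energy

# Dual-supply trial 8 from Table V (V_DDL = 1.8 V).
p_h, p_l, p_avg, e = trial_metrics(1.80, 1.545, 1538.04, 72.22)
print(f"{p_h:.3f} {p_l:.3f} {p_avg:.3f} {e:.3f}")  # 3.845 0.130 3.975 6.142
```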



The trends in Table V and Fig. 8 show that the minimum power-delay product is achieved for $V_{DDL}$ in the range of $0.6V_{DDH}$ to $0.7V_{DDH}$. This agrees with the claim made by [4] that optimum power reduction can be achieved on non-critical paths by using a lower supply voltage in this range.
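
The location of the energy minimum can be read directly from the dual-supply rows of Table V. The sketch below copies the $V_{DDL}$ and total-energy columns from those rows and searches for the minimum; the 0.01 pJ tolerance used to list near-optimal points is an arbitrary illustrative choice.

```python
V_DDH = 2.5  # volts

# (V_DDL in volts, total energy in pJ) for the dual-supply trials in Table V
dual_supply = [
    (2.50, 6.379), (2.40, 6.339), (2.30, 6.319), (2.20, 6.250),
    (2.10, 6.234), (2.00, 6.203), (1.90, 6.173), (1.80, 6.142),
    (1.70, 6.138), (1.60, 6.154), (1.50, 6.134), (1.40, 6.141),
    (1.30, 6.210), (1.20, 6.461),
]

# Trial with the lowest total energy.
best_v, best_e = min(dual_supply, key=lambda row: row[1])
print(best_v, best_e, round(best_v / V_DDH, 2))  # 1.5 6.134 0.6

# Supply voltages whose energy is within 0.01 pJ of the minimum.
near_optimal = [v for v, e in dual_supply if e <= best_e + 0.01]
print(near_optimal)  # [1.8, 1.7, 1.5, 1.4]
```

The energy curve is shallow around its minimum, so several $V_{DDL}$ values between roughly $0.56V_{DDH}$ and $0.72V_{DDH}$ give nearly the same energy.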

It is interesting to note that the energy benefits of reducing the supply voltage diminish beyond a certain $V_{DDL}$ threshold. Lowering $V_{DDL}$ beyond this point causes the critical path delay to increase at a faster rate than the power consumption decreases. Thus, the power-delay product (energy) actually begins to increase with successive decreases in $V_{DDL}$.

## IV. CONCLUSION

This paper has demonstrated how dual power supply techniques can yield a significant reduction in power consumption. A higher potential is assigned to the critical path, carry-in to carry-out, and the non-critical paths are fed by a lower potential. This results in an energy-efficient design, as shown in Fig. 8. There is, however, a drawback to lowering the second supply voltage. As the difference between $V_{DDH}$ and $V_{DDL}$ increases, the overall delay of the system increases as well. Thus, $V_{DDL}$ must be selected carefully to achieve acceptable delay and power performance. A $V_{DDL}$ of 1.8 V results in a maximum clock frequency of 647.25 MHz and a total energy dissipation of 6.142 pJ.

## REFERENCES

- [1] A. P. Chandrakasan and R. W. Brodersen, "Minimizing power consumption in digital CMOS circuits," *Proceedings of the IEEE*, no. 4, April 1995, pp. 498-523.
- [2] N. Weste and D. Harris, *CMOS VLSI Design: A Circuits and Systems Perspective*. Boston, MA: Pearson, 2005, pp. 638-647.
- [3] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-Power CMOS Digital Design," *IEEE Journal of Solid-State Circuits*, vol. 27, no. 4, April 1992, pp. 473-484.
- [4] T. Kuroda, "Low-Power, High-Speed CMOS VLSI Design," *IEEE Proceedings on Computer Design*, 2002.
- [5] A. Neureuther. (2006, January). Standard Cell Template Definitions. Berkeley EE141 course website. [Online]. Available: http://bwrc.eecs.berkeley.edu/Classes/ICDesign/EE141\_s00/Project/STANDARD%20CELL%20TEMPLATE%20DEFINITIONS\_.htm
- [6] V. Kursun, R. M. Secareanu, and E. G. Friedman, "Low Power CMOS Bi-Directional Voltage Converter," *Proceedings of the IEEE EDS/CAS Activities in Western New York Conference*, November 2001, pp. 6-7.