# **Design and Implementation of Low Power ALU Design**

*Mr. Arindam Baul<sup>1</sup>, Mrs. Akansha Awasthi<sup>2</sup>* E.C.E. Department, Dr. C. V. Raman University, Kargi Road Kota, Bilaspur (C.G.)

**ABSTRACT:** Today, electronic devices must be realized with low power, area-optimized architectures, because power consumption and area are primary considerations alongside other performance parameters. Low power consumption reduces heat dissipation and increases battery life and reliability. The Arithmetic and Logic Unit (ALU) is one of the most frequently used and most fundamental components in low power processor design. It is a major component of the central processing unit of a computer system and performs all the arithmetic and logic operations required on instruction words. As the operations become more complex, the ALU also becomes more complex and more expensive and takes up more space in the CPU; hence power consumption is a major issue. In this research work, the power flow of ALU circuits is analyzed and the ALU power supply unit is optimized.

**Keywords** — Arithmetic and Logic Unit, Clock Gating, FPGA, CMOS, VHDL, Low power, Power Delay Product, clock power, register transfer level, dynamic power, leakage power.

## I. INTRODUCTION

In the early days, VLSI designers were mainly concerned with circuit area, performance, reliability, and cost; power consumption was a minor consideration. Nowadays, power is given equal importance to area and speed. Today's demand is for more functional, energy-efficient, power-optimized devices. As modern computers become faster and, by extension, consume more power, there is a drive to design new computers with lower power consumption. With advances in technology, the number of transistors on a single CPU has increased. Integrating more transistors to enhance computing power also affects power consumption, because adding more transistors increases the heat dissipated in the device [3]. Since most portable devices are battery driven, their power consumption must be low so that battery life and reliability improve. For these reasons, power management has become an important design constraint for most computationally intensive and sophisticated applications. The ALU is one of the most important units in a microprocessor; it performs most of the computational operations in a CPU, and hence power consumption is an important issue in an ALU. Because the ALU is a basic, integral part of any processor, developing a low power processor requires a low power ALU. Clock power is a major component of microprocessor power, mainly because the clock is fed to most of the circuit blocks in the processor and switches every cycle. Thus the total clock power is a substantial component of total microprocessor power dissipation. Most of the power dissipation is dynamic, which necessitates a reduction in switching power dissipation. Fig 1 shows the power distribution among the different units of a recent high-performance CPU.
The clock is the largest power-consuming component; it includes the clock generator, clock drivers, clock distribution tree, latches, and the clock loading due to all the clocked elements. Of these, clock loading accounts for the bulk of the power. The main effect of power dissipation is the heat dissipated by the device. The increase in temperature shortens the lifetime of the transistors, which affects the reliability of the devices.



Fig1: Processor Power Distribution

In earlier designs, designers regarded the clock as a clean signal that should not be disabled. However, it is one of the major sources of power dissipation, because the clock signal is fed to most of the circuit blocks in the architecture.

This leads to unnecessary dynamic power consumption. Large VLSI circuits such as processors [1] contain register files, arithmetic units, and control logic. The register file is typically not accessed in every clock cycle. Similarly, in an arbitrary sequential circuit, the values of particular registers need not be updated in every clock cycle. If simple conditions that determine the inaction of particular registers can be found, power can be reduced by gating the clocks of these registers. When these conditions are satisfied, the switching activity within the registers is reduced to negligible levels. The same method can be applied to "turn off" or "power down" arithmetic units when they are not in use in a particular clock cycle. In this architecture, clock gating is used so that not all blocks operate at the same time, which makes it possible to reduce the power consumption of unused blocks. This increases the energy efficiency of the target design and fulfills our obligation to green computing. The technique also reduces the dynamic current and junction temperature of the ALU design. The functionality of the proposed architecture is verified using Xilinx tools, and power is analyzed using the XPower power analysis tool.

The arithmetic logic unit is one of the main components inside a microprocessor. It is responsible for performing arithmetic and logic operations such as addition, subtraction, multiplication, increment, decrement, logical operations, rotate operations, and shift operations. The ALU is also one of the highest power-density locations on the processor, as it is clocked at the highest speed and is kept busy most of the time. This strongly motivates energy-efficient ALU designs that satisfy high-performance requirements while reducing peak and average power dissipation.

As technology scales down, short-circuit and leakage power become comparable with dynamic power dissipation. Identifying and controlling the various leakage and switching components is essential for estimating and reducing power consumption in high speed, low power applications. A latch based clock gated ALU is an effective way to reduce power consumption. Power reduction is involved at all levels: system architecture, logic design, block design, and gates.

Nowadays, power management (power density in W/mm²) is a mounting issue in all parts of chip design. For a chip manufacturer, low power techniques need to be employed from the architecture stage onward, from RTL to GDSII. Aggressive leakage currents cause high power (I²R) loss in CMOS circuitry. The two main types of power dissipation in a CMOS circuit are static power, caused by leakage current, and dynamic power, caused by the charging and discharging of capacitance due to the switching activity of the circuit.

Some of the equations related to these design issues are as follows.

$$P_{\text{leakage}} = \Sigma\, \text{cell leakage} \qquad (1)$$

Dynamic power is consumed by the device when it switches from one state to another. It consists of switching power, consumed while charging and discharging the loads on a device, and internal power (also referred to as short-circuit power), consumed internally by the device while it is changing state.

$$P_{\text{dynamic}} = P_{\text{internal}} + P_{\text{wires}} \qquad (2)$$

$$P_{\text{internal}} = \Sigma\, \text{cell dynamic power} \qquad (3)$$

$$P_{\text{wires}} = \tfrac{1}{2} \times C_{I} \times V^{2} \times T_{P} \qquad (4)$$

Leakage power (also referred to as static power) is consumed even when the device is not changing state. A device consumes leakage power whether it is static or switching, but the main concern is leakage in the inactive state, since all the power consumed in that state is considered "wasted" power. Power in VLSI circuits is optimized through various clock gating styles. In this research work, the power flow of ALU circuits is analyzed and the ALU power supply unit is optimized through a special handling technique.

#### A. Conventional ALU:

The existing method is a simple Arithmetic and Logic Unit design supporting different arithmetic and logic operations. This basic design consists of conventional arithmetic and logic circuits that perform the required operations, as shown in Fig 2. Here the clock signal is applied to the ALU directly, along with the input signals.



Fig2: Circuit module without clock gating technique

When the architecture is simulated, it is found to consume more power, which is the main disadvantage of existing systems. Fig 3 shows the top level diagram for the 8 bit ALU. Here a, b, and sel are three input signals and the clock is the fourth input, as shown in the figure. The outputs op and mop return the result of the ALU operation. The sel input determines which operation is performed. The ALU generates four flags: Zero (Z), Carry (C), Sign (S), and Parity (P). Table I shows the power consumption of the ALU at different operating frequencies without clock gating.

Table I: Power consumption of ALU without clock gating.

| Frequency | Clock (W) | Logic (W) | IO (W)  | Signal (W) |
|-----------|-----------|-----------|---------|------------|
| 100 MHz   | 0.01093   | 0.00321   | 0.01558 | 0.00448    |
| 500 MHz   | 0.05466   | 0.01605   | 0.07805 | 0.02242    |
| 1000 MHz  | 0.10932   | 0.01605   | 0.04446 | 0.04446    |
| 10 GHz    | 1.09319   | 0.08982   | 1.22533 | 0.22533    |

#### B. Clock gating:

Several techniques have been developed to reduce power consumption, of which clock gating is predominant. It is the most popular method for reducing the power of clock signals and functional units. Clock signals are synchronizing signals that provide timing references for computation and communication in synchronous digital systems. Clock gating techniques [2][3][4] have been developed to avoid unnecessary power consumption. The basic principle behind this power optimization technique is that the clock signal is gated with an enable signal, so the gated clock is produced only when both signals are active. Whenever the output of a particular operation is not needed, unwanted switching activity is suppressed and hence power is reduced. Clock gating is thus a technique in which the clock signal is prevented from reaching the modules of the ALU that are not in use. Here the enable signal acts as the control signal, as shown in Fig 3. However, clock gating does not come for free.



Fig 3: Circuit module with Clock gating technique

Clock gating is achieved by ANDing the clock signal with a control signal to form a gated clock, which is then applied to the different components of the circuit. The control signal decides which module receives the gated clock: only when both the enable signal and the clock are active is a particular operation performed, while the other operations remain idle. In the present design, the enable signal is controlled by the selection control. The results show that clock gating helps reduce the power consumption of the ALU, and that the reduction is greatest at high frequencies. Extra logic and interconnect are required to generate the clock enable signals, and the resulting area overhead must be considered. Table II shows the power consumption of the ALU at different operating frequencies after clock gating is applied.

| Frequency | Clock (W) | Logic (W) | IO (W)  | Signal (W) |
|-----------|-----------|-----------|---------|------------|
| 100 MHz   | 0.01077   | 0.00295   | 0.01598 | 0.00397    |
| 500 MHz   | 0.05385   | 0.01476   | 0.07605 | 0.01983    |
| 1 GHz     | 0.10771   | 0.02929   | 0.15051 | 0.03943    |
| 10 GHz    | 0.07707   | 0.08269   | 0.57343 | 0.21024    |

Table II: Power consumption of ALU with clock gating
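As a rough cross-check of Tables I and II, the per-component figures can be summed and compared. The sketch below copies the 100 MHz and 10 GHz rows exactly as printed; the dictionary and function names are illustrative and are not part of the original design flow.

```python
# Power components in watts, copied from Table I (without clock gating)
# and Table II (with clock gating), in the order (clock, logic, IO, signal).
WITHOUT_GATING = {
    "100 MHz": (0.01093, 0.00321, 0.01558, 0.00448),
    "10 GHz": (1.09319, 0.08982, 1.22533, 0.22533),
}
WITH_GATING = {
    "100 MHz": (0.01077, 0.00295, 0.01598, 0.00397),
    "10 GHz": (0.07707, 0.08269, 0.57343, 0.21024),
}


def percent_reduction(before, after):
    """Relative saving of `after` with respect to `before`, in percent."""
    return (before - after) / before * 100.0


def total_reduction(freq):
    """Percent reduction in the summed power components at one frequency."""
    return percent_reduction(sum(WITHOUT_GATING[freq]), sum(WITH_GATING[freq]))


if __name__ == "__main__":
    for freq in WITHOUT_GATING:
        clock_saving = percent_reduction(WITHOUT_GATING[freq][0], WITH_GATING[freq][0])
        print(f"{freq}: clock {clock_saving:.2f}%, total {total_reduction(freq):.2f}%")
```

With these figures the saving is small at 100 MHz and much larger at 10 GHz, in line with the observation that clock gating helps most at high frequencies.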

#### C. ALU Temperature analysis:

The effective thermal resistance TJA (C/W) does not change when the operating frequency of the device is varied, and it is independent of the clock gating technique. Junction temperature increases with frequency. The junction temperature of the ALU is reduced when clock gating is applied, so clock gating is suitable for heat-efficient designs. Table III gives the junction temperature readings of the ALU with and without clock gating.

| Frequency | Junction Temperature Without Clock Gating (°C) | Junction Temperature With Clock Gating (°C) |
|-----------|------------------------------------------------|---------------------------------------------|
| 100 MHz   | 55.4                                           | 54.8                                        |
| 500 MHz   | 55.7                                           | 55.0                                        |
| 1 GHz     | 56.0                                           | 55.2                                        |
| 10 GHz    | 61.1                                           | 59.0                                        |

Table III: Junction temperature of ALU with and without clock gating
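Power analysis tools commonly derive junction temperature from the standard thermal relation $T_J = T_A + \theta_{JA} \times P$. The sketch below illustrates that relation only; the ambient temperature, thermal resistance, and power values are hypothetical and are not taken from the paper.

```python
def junction_temperature(ambient_c, theta_ja_c_per_w, power_w):
    """Junction temperature in deg C from T_J = T_A + theta_JA * P."""
    return ambient_c + theta_ja_c_per_w * power_w


if __name__ == "__main__":
    # Hypothetical values chosen only to illustrate the relation.
    ambient = 50.0   # deg C
    theta_ja = 2.0   # deg C per watt, unchanged by frequency or gating
    for total_power in (1.0, 2.0):
        print(total_power, junction_temperature(ambient, theta_ja, total_power))
```

With the thermal resistance fixed, any reduction in total power lowers the temperature rise above ambient in proportion, which matches the direction of the change reported for the gated design.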

#### D. ALU device utilization summary:

Table IV gives the device utilization summary of the ALU with and without clock gating.

|                  | Used Without Clock Gate | Used With Clock Gate | Total Available |
|------------------|-------------------------|----------------------|-----------------|
| Slices           | 80                      | 82                   | 301448          |
| Slice Flip Flops | 80                      | 82                   | 301448          |
| 4 Input LUT      | 192                     | 234                  | 150720          |
| Bonded IOB       | 41                      | 51                   | 600             |

Table IV: Device utilization summary

## II. Latch Based Arithmetic And Logic Unit

Clock gating saves power but increases the overall area of the ALU architecture.

To optimize both the power and the area of the ALU design, a latch based clock gated ALU technique is used. A low power ALU needs an architecture with an efficient adder for the propagate and generate block. The adder is disabled whenever a logical operation is to be performed. Clock gating follows the same approach: the arithmetic operations are switched off while a logical operation is in use, and vice versa.

At present, power optimization has shifted from the circuit level to the register level, using clock gating to switch off the unused sections of the design and reduce the power consumed. Clock gating itself is divided into latch based and latch free designs.

#### A. Using a Latch free Clock Gating:

A latch free clock gate uses an AND gate for logic triggered on the rising edge of the clock and an OR gate for logic triggered on the falling edge, as shown in the figure below.



Fig 4. Latch free clock gated design

#### Disadvantage in Latch Free clock gating:

As shown in the figure below, if the enable signal goes low before the falling edge of the clock, the gated clock pulse is terminated prematurely.



Fig 5 Problem in latch-free clock gated design

#### B. Overcome using Latch based Clock Gated ALU:

To overcome this disadvantage of latch free clock gating, the enable signal must be held from the active edge to the falling edge of the clock, so that the gated clock remains active for the complete clock pulse.



Fig 6 Latch based clock gated ALU design
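The difference between the two gating styles can be illustrated with a small sampled-waveform model. This is a behavioural sketch, not the circuit of Fig 6: it assumes the common arrangement in which the enable passes through a latch that is transparent while the clock is low, and the function names and waveforms are made up for illustration.

```python
def latch_free_gate(clk, en):
    """Gated clock from a plain AND gate: follows `en` at every instant."""
    return [c & e for c, e in zip(clk, en)]


def latch_based_gate(clk, en):
    """Gated clock with a latch on `en` that is transparent while clk is low."""
    gated = []
    latched_en = 0
    for c, e in zip(clk, en):
        if c == 0:
            latched_en = e  # latch follows the enable only while the clock is low
        gated.append(c & latched_en)  # enable is frozen during the high phase
    return gated


# Two clock pulses; the enable drops in the middle of the first high phase.
CLK = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
EN = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    print("latch free :", latch_free_gate(CLK, EN))
    print("latch based:", latch_based_gate(CLK, EN))
```

In the latch free output the first pulse is cut to half its width, while the latch based output keeps the full pulse and suppresses only the second one.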



Fig. 7 Arithmetic and logic unit

## III. FUNCTIONING OF ALU

#### A. Clear Function:

It resets the output of the ALU. If a clock gate is used instead of a demultiplexed signal, power consumption can be reduced by 93.75%.



#### B. Save Operand Register Value in ALU:

This saves the final operand destination value in the ALU output section. It is done by resetting all the other 15 submodules; hence power can be reduced by 93.75%.



Fig. 9 Save Operand value

#### C. Invert Operand Register Value in ALU:

ALU out= ~A; Passes the complemented value of A to the ALU output. Hence power is reduced by 93.75%.



Fig. 10 Invert Operand Register

#### D. Hold Data Bus Value:

ALU out=A; Passes the value of A to the ALU output.



Fig. 11 Hold Data bus value

#### E. Decrement Data Bus Value:

ALU out=A-1;



Fig. 12 Decrement Data Bus Value

#### F. Increment Data Bus Value:

ALU out=A+1;



Fig. 13 Increment Data Value

#### G. Addition Operation in ALU:

ALU out=A+B;



Fig. 14 Addition





#### H. Subtraction Operation in ALU:

ALU out=A-B;

Fig. 15 Subtraction

#### I. Logical AND Operation:

ALU out=A&B; Computes the logical AND of A and B and passes the value to ALU out. With clock gating, the other 15 functional units are turned off, as shown in Fig.18. Hence power is reduced by 93.75%.



Fig. 16 Logical AND

The remaining operations are handled similarly, each reducing power consumption by 93.75%.
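The operations walked through above can be summarised in a small behavioural model in which the select input is decoded into a one-hot enable vector, so that only one of the 16 functional units receives a clock. This is a sketch under stated assumptions: the opcode numbering is hypothetical (the paper does not give the sel encoding), only operations with an explicit expression are modelled, and subtraction is assumed to be A-B from its figure caption.

```python
WIDTH = 8                      # 8 bit ALU, as stated for the top level diagram
MASK = (1 << WIDTH) - 1
NUM_UNITS = 16                 # one dedicated module per function

# Hypothetical opcode assignment, for illustration only.
OPS = {
    0: ("clear", lambda a, b: 0),
    1: ("invert", lambda a, b: ~a),
    2: ("hold", lambda a, b: a),
    3: ("decrement", lambda a, b: a - 1),
    4: ("increment", lambda a, b: a + 1),
    5: ("add", lambda a, b: a + b),
    6: ("subtract", lambda a, b: a - b),
    7: ("and", lambda a, b: a & b),
}


def decode_enables(sel):
    """One-hot enable vector: only the selected unit keeps its clock."""
    return [int(i == sel) for i in range(NUM_UNITS)]


def alu_step(sel, a, b):
    """Evaluate the selected unit; return (name, 8 bit result, units gated off)."""
    if sel not in OPS:
        raise ValueError(f"opcode {sel} is not modelled in this sketch")
    enables = decode_enables(sel)
    name, unit = OPS[sel]
    result = unit(a, b) & MASK          # truncate to the 8 bit output width
    gated_off = NUM_UNITS - sum(enables)
    return name, result, gated_off


if __name__ == "__main__":
    print(alu_step(5, 200, 100))
    print(alu_step(1, 0x0F, 0))
```

Whatever operation is selected, 15 of the 16 units are gated off in this model, which is the situation the 93.75% figure refers to.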

## V. RTL TECHNOLOGY SCHEMATIC

The schematic obtained when synthesizing at the RTL level is shown below.



Fig. 17 RTL Schematic

## VI. RESULTS

A low power ALU is designed using Xilinx ISE 14.2 and synthesized on a Spartan-6 FPGA.

**Power**

The power dissipated in driving the input of a flip-flop consists of switching power, short-circuit power, and leakage power [11].

$$P = P_{\text{switching}} + P_{\text{short circuit}} + P_{\text{leakage}} \qquad (5)$$

The switching activity factor $\alpha$ depends on the type of signal. If the signal is a clock, $\alpha = 1$; if the signal switches once per cycle, $\alpha = \frac{1}{2}$. For dynamic gates, the output switches either 0 or 2 times per cycle, so $\alpha = \frac{1}{2}$. For static gates, $\alpha$ depends on the design, but typically $\alpha = 0.1$.

$$P_{\text{switching}} = \alpha \times f \times C_{\text{eff}} \times V_{\text{dd}}^2 \qquad (6)$$

Short-circuit power is dissipated when a direct path between VDD and GND exists during a transition.

$$P_{\text{short circuit}} = I_{\text{sc}} \times V_{\text{dd}} \times f \qquad (7)$$

Leakage power is a function of the supply voltage, the threshold voltage, and the transistor width-to-length ratio.

$$P_{\text{leakage}} = f(V_{\text{dd}}, V_{\text{th}}, W/L) \qquad (8)$$
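Equation (6) can be evaluated numerically to see how switching power scales with frequency and activity factor. The sketch below is only an illustration: the effective capacitance and supply voltage are hypothetical values, not measurements from the design.

```python
def switching_power(alpha, freq_hz, c_eff_farads, vdd_volts):
    """Equation (6): P_switching = alpha * f * C_eff * Vdd^2, in watts."""
    return alpha * freq_hz * c_eff_farads * vdd_volts ** 2


# Hypothetical operating point, chosen only for illustration.
C_EFF = 1e-9   # effective switched capacitance in farads
VDD = 1.2      # supply voltage in volts

if __name__ == "__main__":
    for label, alpha in (("clock", 1.0), ("static gate", 0.1)):
        for freq in (100e6, 1e9):
            watts = switching_power(alpha, freq, C_EFF, VDD)
            print(f"{label:11s} f={freq:.0e} Hz  P={watts:.4f} W")
```

Because the expression is linear in both $\alpha$ and $f$, a clock net ($\alpha = 1$) dissipates ten times the switching power of a typical static gate output ($\alpha = 0.1$) with the same capacitance, which is why gating the clock is an effective lever.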

#### ALU Power Affected by Clock Frequency:

Power is directly proportional to frequency.

| Frequency | Clock Power | Logic Power | Signal Power | IOs Power |
|-----------|-------------|-------------|--------------|-----------|
| 100 GHz   | 1679 mW     | 153 mW      | 802 mW       | 410 mW    |
| 1000 GHz  | 16795 mW    | 1198 mW     | 7983 mW      | 4099 mW   |

Table V: Power and clock frequencies

In the next phase, clock gating is used to turn off the other 15 modules while any one module is executing, so the theoretical power reduction is 93.75%. Table VI shows an 88.23% clock power reduction using latch based clock gating.

| Latch Based Clock Gating | Total Power | Dynamic Power | Clock Power |
|--------------------------|-------------|---------------|-------------|
| Without Clock Gate       | 94 mW       | 41 mW         | 17 mW       |
| With Clock Gate          | 77 mW       | 25 mW         | 2 mW        |

Table VI: Latch based clock gating

Table VII shows a 70.58% clock power reduction using latch free clock gating.

| Latch Free Based Clock Gating | Total Power | Dynamic Power | Clock Power |
|-------------------------------|-------------|---------------|-------------|
| Without Clock Gate            | 94 mW       | 41 mW         | 17 mW       |
| With Clock Gate               | 80 mW       | 28 mW         | 5 mW        |

Table VII: Latch free based clock gating
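The clock power reductions quoted for Tables VI and VII follow directly from the clock power columns. The short sketch below recomputes them; the function and constant names are illustrative.

```python
def percent_reduction(without_mw, with_mw):
    """Percent reduction when power drops from `without_mw` to `with_mw`."""
    return (without_mw - with_mw) / without_mw * 100.0


# Clock power in mW from Tables VI and VII.
LATCH_BASED = percent_reduction(17, 2)
LATCH_FREE = percent_reduction(17, 5)

if __name__ == "__main__":
    print(f"latch based clock power reduction: {LATCH_BASED:.3f}%")
    print(f"latch free clock power reduction:  {LATCH_FREE:.3f}%")
    # The same function applies to the total power columns.
    print(f"latch based total power reduction: {percent_reduction(94, 77):.3f}%")
    print(f"latch free total power reduction:  {percent_reduction(94, 80):.3f}%")
```

The computed clock power values agree with the 88.23% and 70.58% quoted in the text once the third decimal place is dropped.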

## VII. CONCLUSION

Thus the power consumption is optimized in the modified design using the latch based clock gated ALU technique, which is found to be efficient. In the conventional arithmetic and logic unit, which executes all operations at the same time, power dissipation is uncontrolled. Power reduction has moved from the circuit level to the register level. The Register Transfer Level approach is important because hardware designers generally verify power only at the gate level, and any change to the Register Transfer Level then requires many design iterations to reduce power. Each function has one dedicated module. When an instruction executes in its module, the other modules, which are not used by the currently executing instruction, must be gated off by the clock gate. The theoretical power reduction is given by

$$\text{Power Reduction } \% = \frac{\text{Number of Units Gated}}{\text{Total Number of Units}} \times 100 \qquad (9)$$
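Equation (9) is simple enough to check directly for the 16-module ALU described here. The helper below is a minimal sketch with an illustrative name; it reproduces the theoretical figure only and says nothing about measured power.

```python
def power_reduction_percent(units_gated, total_units):
    """Equation (9): share of functional units gated off, in percent."""
    if total_units <= 0 or not 0 <= units_gated <= total_units:
        raise ValueError("need 0 <= units_gated <= total_units and total_units > 0")
    return units_gated / total_units * 100.0


if __name__ == "__main__":
    # 15 of the 16 modules are gated off while one module executes.
    print(power_reduction_percent(15, 16))
```

The formula counts units rather than watts, so it is best read as an idealised estimate that treats every module as drawing the same power; the measured clock power reduction in Table VI is 88.23%.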

## VIII. FUTURE SCOPE

Clock gating reduces the dynamic power consumed, but leakage power must also be reduced. The latest FPGAs are based on 28 nm technology, which contributes a certain amount of leakage power. There is therefore a need to reduce leakage power along with dynamic power.

## REFERENCES

[1]. J.P.Oliver, J. Curto, D. Bouvier, M. Ramos, and E. Boemo, "Clock gating and clock enable for FPGA power reduction", in Proc. 8th Southern Conference on Programmable Logic (SPL), pp. 1-5, 2012.

[2]. Jagrit Kathuria, M. Ayoubkhan and Arti Noor, Centre for Development of Advanced Computing, NOIDA, India, "Review of Clock gating technique", MIT International Journal of Electronics and Communication Engineering, Vol.1, No.2, Aug 2011, pp. 106-114, ISSN 2230-7672.

[3]. J. Shinde and S. S. Salankar, "Clock gating-A power optimizing technique for VLSI circuits", in Proc. Annual IEEE India Conference (INDICON), pp. 1-4.

[4]. J. Castro, P. Parra and A. J. Acosta, "Optimization of clock-gating structures for low-leakage high-performance applications" in Proceedings of IEEE International Symposium on Efficient Embedded Computing, pp. 3220-3223, 2010.

[5]. V. Khorasani, B. V. Vahdat, and M. Mortazavi, "Design and implementation of floating point ALU on a FPGA processor," IEEE International Conference on Computing, Electronics and Electrical Technologies (ICCEET), pp. 772-776, 2012.

[6]. S.Cisneros, J. J. Panduro, J. Muro, and E. Boemo, "Rapid prototyping of a self-timed ALU with FPGAs," in Proc. International Conference on Reconfigurable Computing and FPGAs, pp. 26-33, 2012.

[7]. B.S. Ryu, J. S. Yi, K. Y. Lee, and T. W. Cho, "A design of low power 16-bit ALU," in Proceedings of the IEEE TENCON Conference, pp.868-871, 1999.

[8]. M. Afghahi and J. Yuan, "Double edge-triggered D-flip-flops for high-speed circuits," IEEE J. Solid-State Circuits, vol.26, no.8, pp.1168-1170, Aug. 1991.

[9]. A. Gago, R. Escano and J. A. Hidalgo, "Reduced implementation of D-type DET flip-flops", IEEE J. Solid-State Circuits, vol.28, no.3, pp.400-402, Mar. 1993.

[10]. R. Hossain, L. D. Wronski and A. Albicki, "Low Power Design using Edge Triggered Flip-flops", IEEE Trans. VLSI Systems, Vol.2, no.2, pp.261-265, June 1994.

[11]. N. H. E. Weste and K. Eshraghian, "Principles of CMOS VLSI Design: A Systems Perspective", Reading, MA: Addison-Wesley, 1993.

[12]. S. M. Kang, "Accurate simulation of power dissipation in VLSI Systems", IEEE J. Solid-State Circuits, Vol.21, no.5, pp.889-891, Oct. 1986.

[13]. V. Khorasani, B. V. Vahdat, and M. Mortazavi, "Design and implementation of floating point ALU on a FPGA processor," IEEE International Conference on Computing, Electronics and Electrical Technologies (ICCEET), pp. 772-776, 2012.

[14]. J. P. Oliver, J. Curto, D. Bouvier, M. Ramos, and E. Boemo, "Clock gating and clock enable for FPGA power reduction," in Proc. 8th Southern Conference on Programmable Logic (SPL), pp. 1-5, 2012.

[15]. J. Shinde and S. S. Salankar, "Clock gating-A power optimizing technique for VLSI circuits," in Proc. Annual IEEE India Conference

[16]. Computer system organization and architecture by William

[17]. S. Cisneros, J. J. Panduro, J. Muro, and E. Boemo, "Rapid prototyping of a self-timed ALU with FPGAs," in *Proc. International Conference on Reconfigurable Computing and FPGAs*, pp. 26-33, 2012.

[18]. B. S. Ryu, J. S. Yi, K. Y. Lee, and T. W. Cho, "A design of low power 16-bit ALU," in *Proceedings of the IEEE TENCON Conference*, pp.868-871, 1999.

[19]. Bill Moyer, "Low Power Design for Embedded Processor", Proc. of IEEE, Vol.89, No.11, 2001, pp.1576-1586.

[20]. Priya Singh, Ravi Goel, "Clock gating: a comprehensive power optimization technique for sequential techniques", IJARCST, Vol.2, Issue 2, April-June 2014.

[21]. Bishwajeet Pandey, Jyotsana Yadav, M. Pattanaik, Nitish Rajoria, "Clock Gating Based Energy Efficient ALU Design and Implementation on FPGA", IEEE, 2013.

[22]. Jagrit Kthuria, M. Ayoub Khan, Arti Noor, "A Review of clock gating techniques", MIT International Journal of Electronics and Communication, Vol.2, No.2, Aug 2011.

[23]. R. Ramya, S. Sudha, "LUT Optimization Using Combined APC-OMS Technique For Memory-Based Computation", International Journal of Computer Applications in Engineering Sciences, ISSN: 2231-4946, Volume III, Special Issue, August 2013.

[24]. Indrajit Shankar Acharya and Ruhan Bevi, "LUT Optimization for Memory Based Computation using Modified OMS Technique", International Journal of Advanced Electrical and Electronics Engineering (IJAEEE), ISSN (Print): 2278-8948, Volume-2, Issue-5, 2013.

[25]. Dr. M. Kamaraju and G. China Venkateswara Rao, "Low power reduced instruction set architecture using clock gating technique", International Journal of VLSI design & Communication Systems (VLSICS), Vol.4, No.5, October 2013.

[26]. M. Morris Mano, "Register transfer and Micro Operation", in Computer System Architecture, Pearson, 3rd Edition.