# Low power Canonical Signed Digit Multiplier using Spurious Power Suppression Technique Adder 

Sruthin Balachandran V V ${ }^{1}$, Raghavendra Havaldar ${ }^{2}$<br>${ }^{1,2}$ Assistant Professor, Department of Electronic and Communication, AJ Institute of Engineering and Technology, Mangaluru 575006, Karnataka, India


#### Abstract

Power utilization is the critical parameter in designing integrated chips for smart handheld devices, because lower power extends the battery lifetime so that the device can be used for a longer period. With the exponential growth of wireless technology and of electronic devices such as smartphones and smart TVs, digital signal processing (DSP) applications are increasingly used in these environments. Because DSP uses complex algorithms for some applications, its processing consumes considerable power. Hence low power techniques are required for designing DSP applications in very large scale integrated (VLSI) circuits. Various techniques have been developed to reduce power consumption, but they have little effect on the dynamic power consumption, which governs the total power dissipation. This paper aims to design a low power multiplier using the spurious power suppression technique (SPST). In this method, the arithmetic unit is separated into a most significant part (MSP) and a least significant part (LSP), and the MSP is switched off when it does not affect the computation result, thereby reducing the dynamic power and hence the total power consumption of the VLSI combinational circuit. The proposed system also uses the canonical signed digit (CSD) recoding technique, which takes advantage of one of the characteristics of the CSD representation to reduce the number of partial products.


The proposed system is designed in Cadence software. The results obtained for the 32 bit SPST adder show a significant reduction of $38.5 \%$ in dynamic power consumption with respect to a carry ripple adder, and the overall power consumption of the proposed multiplier is 0.561 mW. The proposed system was further used in a power and area efficient 256 point FFT architecture, where the results showed a reduction of $86.6 \%$ in power consumption. This design can be implemented in real time applications such as orthogonal frequency division multiplexing systems.

## I. INTRODUCTION

The multiplier plays a vital role in data processing in a DSP architecture. In every multiplier, adders sum the partial products obtained during the multiplication process. The number of adders depends on the number of partial products, so more partial products result in higher power consumption. As technology advances, multipliers with reduced power consumption are required. Hence low power techniques are needed both to minimize the number of partial products and to add the partial products efficiently.

Some existing low power techniques that reduce the dynamic power consumption are explained in [1]-[7]. The design in [1] describes the partially guarded computation (PGC) technique, in which the arithmetic unit is separated into two parts, the MSP and the LSP, and the non-effective part is switched off to lower the power consumption. Experimental results show that the PGC technique reduces the power consumption of an array multiplier by $10 \%$ to $44 \%$. The design in [2] presents a low power adder that first identifies the effective dynamic range of the operands and then performs the addition only in that part of the functional unit. The sum obtained is then extended to match the original word length. The simulation results show that this low power adder performs the computation more efficiently than conventional adders. The design in [3] proposes a low power multiplier that reduces the switching activity of the partial products by using the Booth algorithm (radix 4). The results show lower power consumption at the cost of increased delay and area. The work in [4] explains a technique to minimize glitch power by replacing existing gates with gates that have a control input. The experimental results show a reduction in glitch power of $14 \%$, giving a total power reduction of $6.3 \%$ with a $2.8 \%$ increase in area. A double switch circuit-block switch-off scheme is described in [5], which reduces the settling time after reactivation in order to decrease the minimum power-down time, thereby minimizing power dissipation. The results for this scheme show a reduction in power consumption of $55 \%$.

The proposed system uses (a) the Spurious Power Suppression Technique and (b) Canonical Signed Digit recoding. The SPST separates each of the two N bit binary operands into a most significant part (MSP) and a least significant part (LSP). The MSP computation is performed only when it affects the computation result. Otherwise, the MSP result is produced by a logic circuit in the SPST adder, whose output signals compensate for the MSP result [6].
The number of partial products generated during the multiplication process depends on the number of nonzero digits in the multiplier operand. By using the Canonical Signed Digit technique, the number of nonzero digits can be reduced, thereby decreasing the number of partial products [7].

## II. DESIGN

## A. Block Diagram



Figure 1: Block diagram of Spurious Power Suppression enabled Canonical Signed Digit Multiplier.

The multiplier in Figure 1 takes two 16 bit inputs $A$ and $B$. Input $A$ is given to the partial product candidate generator block, which generates three 32 bit partial product candidates, $\{-A, 0, A\}$. The CSD recoding block recodes the 16 bit input $B$ and generates 17 bit magnitude values and 17 bit sign values. These sign and magnitude values decide which of the candidates is selected for each digit position. The seventeen partial products generated from the CSD values are given to the partial product selection block. All except the first partial product are passed to the shifting block, where they are shifted left. From the seventeen partial products, only those corresponding to nonzero magnitude values from the CSD block are then selected, so that only nine partial products remain. Setting aside the ninth partial product, the other eight are added in consecutive pairs by four SPST adders. The sums from the four SPST adders are given to conventional adders, and the outputs of two such adders are again given to a conventional adder, from which the final result is obtained.
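The data flow described above can be sketched as a small behavioural model. This is a minimal Python sketch, not the authors' Verilog design: the function names are hypothetical, the recoding uses the standard non-adjacent-form rule rather than the bypass circuit of Section C, and the point where the ninth partial product enters the adder tree is an assumption, since the text does not state it.

```python
def csd_digits(b, width=17):
    """Recode integer b into `width` signed digits in {-1, 0, 1}, LSB first."""
    digits = []
    for _ in range(width):
        if b & 1:
            d = 2 - (b & 3)  # +1 when b mod 4 == 1, -1 when b mod 4 == 3
            b -= d
        else:
            d = 0
        digits.append(d)
        b >>= 1
    if b != 0:
        raise ValueError("input does not fit in the requested number of digits")
    return digits


def csd_multiply(a, b):
    """Multiply two 16 bit integers by summing CSD-selected partial products."""
    candidates = {-1: -a, 0: 0, 1: a}          # the three candidates {-A, 0, A}
    digits = csd_digits(b)                     # 17 CSD digits of B
    # Keep only the shifted candidates whose CSD digit is nonzero.
    partials = [candidates[d] << i for i, d in enumerate(digits) if d != 0]
    assert len(partials) <= 9                  # no two adjacent digits are nonzero
    partials += [0] * (9 - len(partials))      # pad unused slots with zero
    # Four first-level adders on consecutive pairs (SPST adders in the paper).
    first_level = [partials[k] + partials[k + 1] for k in range(0, 8, 2)]
    # Conventional adders combine the sums; adding the ninth term last is assumed.
    second_level = [first_level[0] + first_level[1], first_level[2] + first_level[3]]
    return second_level[0] + second_level[1] + partials[8]


if __name__ == "__main__":
    print(csd_digits(7)[:4])       # [-1, 0, 0, 1], i.e. 7 = 8 - 1
    print(csd_multiply(-1234, 5678))
```

The model uses Python integers, so sign extension to 32 bits is implicit; a hardware description would make the width of each partial product explicit.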

## B. SPST Adder



Figure 2: Examples showing the spurious transitions

Figure 2 illustrates the cause of spurious signal transitions. In the first and second cases, it can be observed that when the two operands are added, the MSP result is not changed whether or not there is a carry from the LSP, and the MSP result can be predicted from the sum obtained in both cases. The computation of the MSP of the two operands can therefore be avoided, which reduces the switching activity in that part, decreasing the power consumption in the adder stage and also reducing glitching noise. Based on this analysis, an SPST adder is designed that separates the adder into two parts and freezes the input data to the MSP when they do not affect the final sum.

To know if the MSP results are affecting the computation results, a detection logic circuit is designed to find out the effective range of input. This logic circuit is designed using the Boolean expression which is shown below:

$$
\begin{align*}
& A_{\text{MSP}}=A[31: 16], \quad B_{\text{MSP}}=B[31: 16]  \tag{1}\\
& A_{\text{AND}}=A[31] \times A[30] \times \cdots \times A[16]  \tag{2}\\
& B_{\text{AND}}=B[31] \times B[30] \times \cdots \times B[16]  \tag{3}\\
& A_{\text{NOR}}=\{A[31]+A[30]+\cdots+A[16]\}^{\prime}  \tag{4}\\
& B_{\text{NOR}}=\{B[31]+B[30]+\cdots+B[16]\}^{\prime} \tag{5}
\end{align*}
$$

Here $A[m]$ is the $m$th bit of operand $A$ and $B[n]$ is the $n$th bit of operand $B$. $A_{\text{MSP}}$ and $B_{\text{MSP}}$ are the MSPs of inputs $A$ and $B$. When the bits of $A_{\text{MSP}}$ and $B_{\text{MSP}}$ are all zeros, $A_{\text{NOR}}$ and $B_{\text{NOR}}$ are one. When the bits of $A_{\text{MSP}}$ and $B_{\text{MSP}}$ are all ones, $A_{\text{AND}}$ and $B_{\text{AND}}$ are one. The detection logic unit generates three output signals: close, carrctrl and sign. The close signal decides whether or not to disable the MSP. If close is zero, the MSP is closed in order to save power. In that case zero inputs are fed to the MSP, so that switching activity in this part is suppressed and its dynamic power consumption is greatly reduced. The MSP result is then computed in the detection logic unit, and the MSP bits are compensated by the sign and carrctrl signals. The Boolean expressions for the sign and carrctrl signals are derived from the Karnaugh maps shown in Figures 4 and 5.

The detection logic circuit is shown in Figure 3.


Figure 3. Detection logic circuit design

The values of sign, carrctrl and close for the eight possible combinations of the inputs $A$ and $B$, from which $A_{\text{AND}}$, $B_{\text{AND}}$, $A_{\text{NOR}}$ and $B_{\text{NOR}}$ follow, are shown in Table 1.

Table 1: Computation of sign, carrctrl and close for eight combinations of inputs A and B

| $A_{\text{MSP}}$ | $B_{\text{MSP}}$ | $C_{\text{LSP}}$ | Close | Carrctrl | Sign |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 0000000000000000 | 0000000000000000 | 0 | 0 | 0 | 000000000000000 |
| 0000000000000000 | 0000000000000000 | 1 | 0 | 1 | 000000000000000 |
| 0000000000000000 | 1111111111111111 | 0 | 0 | 1 | 111111111111111 |
| 0000000000000000 | 1111111111111111 | 1 | 0 | 0 | 000000000000000 |
| 1111111111111111 | 0000000000000000 | 0 | 0 | 1 | 111111111111111 |
| 1111111111111111 | 0000000000000000 | 1 | 0 | 0 | 000000000000000 |
| 1111111111111111 | 1111111111111111 | 0 | 0 | 0 | 111111111111111 |
| 1111111111111111 | 1111111111111111 | 1 | 0 | 1 | 111111111111111 |


| Carrctrl |  | $C_{\text{LSP}}, A_{\text{AND}}, A_{\text{NOR}}$ |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | 000 | 001 | 011 | 010 | 100 | 101 | 111 | 110 |
| $B_{\text {AND }}, B_{\text {NOR }}$ | 00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 01 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
|  | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 10 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |

Figure 4. Karnaugh map for Carrctrl expression

| Sign |  | $C_{\text {LSP }}, A_{\text {AND }}, A_{\text {NOR }}$ |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | 000 | 001 | 011 | 010 | 100 | 101 | 111 | 110 |
| $B_{\text {AND }}, B_{\text {NOR }}$ | 00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 01 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
|  | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 10 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |

Figure 5. Karnaugh map for Sign expression

The expressions for carrctrl and sign derived from the Karnaugh maps are given below:

$$
\begin{align*}
\text{Carrctrl} ={} & \left(\overline{C_{\text{LSP}}} \times \overline{A_{\text{AND}}} \times A_{\text{NOR}} \times B_{\text{AND}} \times \overline{B_{\text{NOR}}}\right)+\left(\overline{C_{\text{LSP}}} \times A_{\text{AND}} \times \overline{A_{\text{NOR}}} \times \overline{B_{\text{AND}}} \times B_{\text{NOR}}\right) \\
& +\left(C_{\text{LSP}} \times \overline{A_{\text{AND}}} \times A_{\text{NOR}} \times \overline{B_{\text{AND}}} \times B_{\text{NOR}}\right)+\left(C_{\text{LSP}} \times A_{\text{AND}} \times \overline{A_{\text{NOR}}} \times B_{\text{AND}} \times \overline{B_{\text{NOR}}}\right)  \tag{7}\\
\text{Sign} ={} & \overline{C_{\text{LSP}}} \times\left(\overline{A_{\text{AND}}} \times A_{\text{NOR}} \times B_{\text{AND}} \times \overline{B_{\text{NOR}}}+A_{\text{AND}} \times \overline{A_{\text{NOR}}} \times \overline{B_{\text{AND}}} \times B_{\text{NOR}}+A_{\text{AND}} \times \overline{A_{\text{NOR}}} \times B_{\text{AND}} \times \overline{B_{\text{NOR}}}\right) \\
& +\left(C_{\text{LSP}} \times A_{\text{AND}} \times \overline{A_{\text{NOR}}} \times B_{\text{AND}} \times \overline{B_{\text{NOR}}}\right) \tag{8}
\end{align*}
$$
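Equations (7) and (8) can be checked against Table 1 with a short script. The following is a minimal Python sketch rather than the authors' circuit. The function names are hypothetical, and it assumes, based on the bit widths listed in Table 1, that the sign value fills the upper 15 bits of the 16 bit MSP result while carrctrl supplies its least significant bit.

```python
MSP_MASK = 0xFFFF  # 16 bit most significant part


def inv(x):
    """Logical complement of a single bit."""
    return x ^ 1


def and_nor_flags(msp):
    """Eqs. (2)-(5): AND and NOR taken over all 16 bits of an MSP."""
    return int(msp == MSP_MASK), int(msp == 0)


def detect(a_msp, b_msp, c_lsp):
    """Return (carrctrl, sign) as written in Eqs. (7) and (8)."""
    a_and, a_nor = and_nor_flags(a_msp)
    b_and, b_nor = and_nor_flags(b_msp)
    a_zero = inv(a_and) & a_nor   # A_MSP is all zeros
    a_ones = a_and & inv(a_nor)   # A_MSP is all ones
    b_zero = inv(b_and) & b_nor   # B_MSP is all zeros
    b_ones = b_and & inv(b_nor)   # B_MSP is all ones
    c = c_lsp
    carrctrl = ((inv(c) & a_zero & b_ones) | (inv(c) & a_ones & b_zero)
                | (c & a_zero & b_zero) | (c & a_ones & b_ones))
    sign = ((inv(c) & ((a_zero & b_ones) | (a_ones & b_zero) | (a_ones & b_ones)))
            | (c & a_ones & b_ones))
    return carrctrl, sign


def compensated_msp(carrctrl, sign):
    """Assumed compensation: 15 sign bits on top, carrctrl as the lowest bit."""
    return (0xFFFE if sign else 0x0000) | carrctrl


if __name__ == "__main__":
    for a_msp in (0x0000, 0xFFFF):
        for b_msp in (0x0000, 0xFFFF):
            for c_lsp in (0, 1):
                carrctrl, sign = detect(a_msp, b_msp, c_lsp)
                true_msp = (a_msp + b_msp + c_lsp) & MSP_MASK
                print(f"{a_msp:04X} {b_msp:04X} {c_lsp} -> carrctrl={carrctrl} "
                      f"sign={sign} match={compensated_msp(carrctrl, sign) == true_msp}")
```

Under that assumption, the compensated value equals the true 16 bit MSP sum for all eight rows of Table 1, which is what allows the MSP adder inputs to be frozen in these cases.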

Figure 6 shows the 32 bit SPST adder design. In this design the two 32 bit inputs $A$ and $B$ are each divided into an MSP and an LSP. The LSP computation is done separately by the LSP adder. For the MSP, latches implemented with AND gates control the inputs to the MSP adder. The MSP inputs are also given to the detection logic circuit, which decides whether to turn the MSP on or off. If the MSP computation is needed, the detection logic circuit enables the latches, which pass the two MSP inputs to the MSP adder, and the MSP computation is performed. If the MSP computation is not necessary, the detection logic circuit disables the latches, which freeze the MSP inputs so that zero inputs are given to the MSP adder, and the MSP sum is compensated by the sign extension circuit. The inputs to the sign extension circuit are the three signals from the detection logic circuit.


Figure 6: SPST adder
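The behaviour of the adder in Figure 6 can be modelled in a few lines. The sketch below is a hedged Python model, not the authors' implementation. `spst_add32` is a hypothetical name, and the condition used for closing the MSP (both MSPs all zeros or all ones) is an assumption consistent with Table 1, since the close expression itself is not given in the text. The carrctrl and sign expressions are simplified forms that agree with Eqs. (7) and (8) only in those closed cases.

```python
def spst_add32(a, b):
    """Return (32 bit sum, msp_closed) for a behavioural SPST adder model."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    # The LSP adder always runs and produces the carry into the MSP.
    lsp = (a & 0xFFFF) + (b & 0xFFFF)
    c_lsp = lsp >> 16
    lsp &= 0xFFFF
    a_msp, b_msp = a >> 16, b >> 16
    a_and, a_nor = a_msp == 0xFFFF, a_msp == 0
    b_and, b_nor = b_msp == 0xFFFF, b_msp == 0
    # Assumed closing condition: each MSP is all zeros or all ones.
    msp_closed = (a_and or a_nor) and (b_and or b_nor)
    if msp_closed:
        # Latches freeze the MSP inputs; the result is rebuilt from two signals.
        carrctrl = c_lsp ^ int(a_and) ^ int(b_and)
        sign = (a_and and b_and) if c_lsp else (a_and or b_and)
        msp = (0xFFFE if sign else 0x0000) | carrctrl
    else:
        # Latches pass the operands and the MSP adder computes normally.
        msp = (a_msp + b_msp + c_lsp) & 0xFFFF
    return (msp << 16) | lsp, msp_closed


if __name__ == "__main__":
    print(spst_add32(5, 7))                  # small positive operands
    print(spst_add32(-3 & 0xFFFFFFFF, 5))    # small negative plus positive
    print(spst_add32(0x12340000, 1))         # MSP carries real data
```

In this model the MSP adder is bypassed whenever both operands are small in magnitude, which is the situation in which the paper reports the dynamic power saving.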

## C. CSD recoding Block

Canonical signed digit (CSD) is a number representation used for arithmetic operations. It is also a recoding technique, which recodes a number into a unique form having the minimum number of nonzero digits. With this technique, the number of nonzero digits in an $n$ digit number never exceeds $\lceil n / 2\rceil$. An important characteristic of the canonical signed digit representation is that there cannot be adjacent nonzero digits, which makes the CSD representation unique. Taking advantage of this characteristic, a CSD recoding circuit is designed that takes three input bits and converts them into a single CSD digit, as shown in Figure 7. This converter recodes three binary bits $b_{i+1}, b_{i}$ and $b_{i-1}$ into a single CSD digit $x_{i}$, which is represented by a magnitude bit $x_{i, m}$ and a sign bit $x_{i, s}$.


Figure 7: Binary to CSD conversion block

In the sign-magnitude encoding, 0 is represented as 00, 1 is represented as 01 and -1 is represented as 11. Two bypass signals are also used: $p_{i}$ is the input bypass signal and $p_{i+1}$ is the output bypass signal generated by the converter block. The output bypass signal $p_{i+1}$ has the same value as the magnitude bit $x_{i, m}$. The magnitude bit depends on the values of $b_{i}$ and $b_{i-1}$, and the sign bit depends on $b_{i+1}$ and $x_{i, m}$. When $p_{i}=1$, all the outputs are set to zero irrespective of the inputs. In this case the inputs to the converter are ignored (bypassed), and the next bypass signal $p_{i+1}$, which is generated for the next digit position, is set to zero [8].

Table 2: Binary to CSD conversion truth table

| $\mathbf{p}_{\mathbf{i}}$ | $\mathbf{b}_{\mathbf{i}+\mathbf{1}}$ | $\mathbf{b}_{\mathbf{i}}$ | $\mathbf{b}_{\mathbf{i}-\mathbf{1}}$ | $\mathbf{x}_{\mathbf{i}}$ | $\mathbf{x}_{\mathbf{i}, \mathbf{s}}$ | $\mathbf{x}_{\mathbf{i}, \mathbf{m}}$ | $\mathbf{p}_{\mathbf{i}+\mathbf{1}}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 |
| 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
| 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 1 | -1 | 1 | 1 | 1 |
| 0 | 1 | 1 | 0 | -1 | 1 | 1 | 1 |
| 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 1 | d | d | d | 0 | 0 | 0 | 0 |

The 16 bit CSD recoding block is shown in Figure 8. This circuit generates the 17 bit magnitude values and 17 bit sign values of the 17 digit CSD representation of the input. The recoding block is built from the single digit binary to CSD conversion circuit, which generates the sign bit and magnitude bit of each digit.


Figure 8: 16 bit CSD recoding circuit
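The truth table in Table 2 and the chaining of the bypass signal can be expressed directly in software. The sketch below is an illustrative Python model with hypothetical function names. It assumes that the bit below the least significant bit is zero, that the initial bypass signal is zero, and that the input is sign extended above bit 15, none of which is stated explicitly in the text.

```python
def csd_digit(p_in, b_next, b_cur, b_prev):
    """One converter cell of Table 2: returns (sign, magnitude, p_out)."""
    if p_in:
        return 0, 0, 0               # bypassed: all outputs forced to zero
    mag = b_cur ^ b_prev             # magnitude depends on b_i and b_(i-1)
    sign = b_next & mag              # sign depends on b_(i+1) and the magnitude
    return sign, mag, mag            # output bypass equals the magnitude bit


def csd_recode16(b):
    """Recode a 16 bit integer into 17 sign bits and 17 magnitude bits."""
    def bit(k):
        # Python's shift sign-extends negative integers; b_(-1) is taken as 0.
        return (b >> k) & 1 if k >= 0 else 0

    signs, mags = [], []
    p = 0                            # assumed initial bypass signal
    for i in range(17):
        s, m, p = csd_digit(p, bit(i + 1), bit(i), bit(i - 1))
        signs.append(s)
        mags.append(m)
    return signs, mags


def csd_value(signs, mags):
    """Evaluate the sign-magnitude digits back to an integer."""
    return sum((-m if s else m) * (1 << i)
               for i, (s, m) in enumerate(zip(signs, mags)))


if __name__ == "__main__":
    signs, mags = csd_recode16(7)
    print(signs[:4], mags[:4])       # digits for 7 = 8 - 1, LSB first
    print(csd_value(signs, mags))
```

Because the output bypass equals the magnitude bit, a nonzero digit always forces the next digit to zero, which is how the cell chain enforces the no-adjacent-nonzero property of CSD.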

## D. Power and area efficient 256 FFT architecture

The Fast Fourier Transform (FFT) is a digital signal processing technique used to convert a signal from the time domain to the frequency domain and vice versa. Using this technique, a power and area efficient 256 point FFT architecture is designed, as shown in Figure 9 [9]. For multiplying by the twiddle factors, the SPST enabled CSD multiplier is used inside the complex multiplier block.

The truth table for binary to CSD conversion is shown in Table 2. It can be observed that when $p_{i}=0$, a single CSD digit $x_{i}$ and the new bypass signal $p_{i+1}$ for the next digit position are generated from the three binary bits $b_{i+1}, b_{i}$ and $b_{i-1}$.


Figure 9: A 256 point FFT architecture

## III. RESULTS

The SPST based CSD multiplier is coded in Verilog using Cadence software and implemented in 90 nm technology. Figures 10 and 11 show the RTL design and the output waveform of the SPST adder. The output waveform shows that although the LSP output is present during the negative half cycle, the MSP computation is not performed during that cycle. Instead, the MSP operation is performed only at the positive edge of the clock. Hence the final sum is available at the positive edge of the clock.


Figure 10: RTL schematic of the SPST adder


Figure 11: Output waveform of the SPST adder

The performance parameters of the SPST adder are shown in Table 3.

Table 3: Performance parameters of the SPST adder

| Performance parameters | SPST adder |
| :--- | :--- |
| Area $\left(\mu \mathrm{m}^{2}\right)$ | 2624 |
| Power (nW) | 20033.024 |
| Delay (ps) | 6867 |

The dynamic power results for the MSP section of the SPST adder and of the ripple carry adder are shown in Table 4. The table shows a significant reduction in the dynamic power consumption of the MSP adder. This is because the MSP adder does not add the two operands when their MSPs are non-effective. Instead, the MSP sum is compensated by the detection logic unit of the SPST adder. Hence unwanted switching activity in the MSP is reduced, and the dynamic power consumption decreases.

Table 4: Comparison of dynamic power results of SPST adder and ripple carry adder

| Adder | Total dynamic power (nW) | Dynamic power of LSP section (nW) | Dynamic power of MSP section (nW) |
| :--- | :--- | :--- | :--- |
| SPST adder | 16731.97 | 10796.795 | 700.321 |
| Carry ripple adder | 27215.149 | 10796.795 | 13495.817 |
The output waveform obtained for the signed multiplication for all the signed input combinations is shown in figure 12.


Figure 12: Output waveform for signed multiplication

The performance parameters of the SPST based CSD multiplier are shown in Table 5.

Table 5: Performance parameters of the SPST based CSD multiplier

| Performance parameters | SPST enabled CSD multiplier |
| :--- | :--- |
| Power (nW) | 561606.766 |
| Area $\left(\mu \mathrm{m}^{2}\right)$ | 10879 |
| Delay (ps) | 12277 |

In the power and area efficient 256 point FFT architecture, the SPST enabled CSD multiplier is used in place of the Baugh-Wooley multiplier. The results obtained are compared with those of the same architecture using the Baugh-Wooley multiplier in Table 6. The implemented design shows a reduction in number of cells, power consumption and area.

Table 6: Comparison of Power and Area of modified system for signed multiplication

| Power and area efficient 256 point FFT architecture | Using Baugh-Wooley multiplier | Using SPST enabled CSD multiplier | Percentage reduction |
| :---: | :---: | :---: | :---: |
| Number of cells | 51990 | 10994 | 78.8\% |
| Power (mW) | 23.228 | 3.1 | 86.6\% |
| Area $\left(\mu \mathrm{m}^{2}\right)$ | 367019 | 70070 | 80.9\% |
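As a check on the last column of Table 6, each percentage can be recomputed as the reduction relative to the Baugh-Wooley based design. The worked arithmetic below uses only the values in the table; the symbol $R$ is introduced here for illustration.

$$
\begin{align*}
R_{\text{cells}} &= \frac{51990-10994}{51990} \approx 0.7885 \\
R_{\text{power}} &= \frac{23.228-3.1}{23.228} \approx 0.8665 \\
R_{\text{area}} &= \frac{367019-70070}{367019} \approx 0.8091
\end{align*}
$$

These ratios agree with the reported $78.8 \%$, $86.6 \%$ and $80.9 \%$ to within rounding of the last digit.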

## IV. CONCLUSIONS

The SPST adder results showed a significant decrease in dynamic power consumption. A reduction of $38.5 \%$ with respect to the carry ripple adder is obtained for an input set in which $50 \%$ of the data have a dynamic range of 16 bits out of 32 bits (that is, $50 \%$ of the input combinations require the MSP adder to be enabled). The power consumption of the proposed multiplier is 0.561 mW. The proposed system was then used in a power and area efficient 256 point FFT architecture, where there was a reduction of $86.6 \%$ in total power consumption with respect to the same architecture using the Baugh-Wooley multiplier.

## REFERENCES

[1]. J. Choi, J. Jeon, and K. Choi, "Power minimization of functional units by partially guarded computation," in Proc. IEEE Int. Symp. Low Power Electron. Des., 2000, pp. 131-136.
[2]. O. Chen, R. Sheen, and S. Wang, "A low-power adder operating on effective dynamic data ranges," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 4, pp. 435-453, Aug. 2002.
[3]. O. Chen, S. Wang, and Y. W. Wu, "Minimization of switching activities of partial products for designing low-power multipliers," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 3, pp. 418-433, Jun. 2003.
[4]. L. Benini, G. D. Micheli, A. Macii, E. Macii, M. Poncino, and R. Scarsi, "Glitch power minimization by selective gate freezing," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 8, no. 3, pp. 287-298, Jun. 2000.
[5]. S. Henzler, G. Georgakos, J. Berthold, and D. Schmitt-Landsiedel, "Fast power-efficient circuit-block switch-off scheme," Electron. Lett., vol. 40, no. 2, pp. 103-104, Jan. 2004.
[6]. K.-H. Chen and Y.-S. Chu, "A spurious-power suppression technique for multimedia/DSP applications," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 1, Jan. 2009.
[7]. B. R. Vishwanath and T. S. Theerthesha, "Multiplier using canonical signed digit code," International Journal for Research in Applied Science \& Engineering Technology (IJRASET), vol. 3, issue V, May 2015, ISSN: 2321-9653.
[8]. M. Faust, O. Gustafsson, and C. H. Chang, "Fast and VLSI efficient binary-to-CSD encoder using bypass signal," Electron. Lett., vol. 47, no. 1, Jan. 2011.
[9]. Sagar M, Sayed Saber Ali, Sharath, Shashidhara N J, Sharon Thomas, and Vijay Ganesh P C, "Power and area efficient 256 FFT architecture," 2016.

