DESIGN & SIMULATION OF FAST AND EFFICIENT MULTIPLICATION ALGORITHM IN VEDIC MATHEMATICS USING VERILOG

Ugra Mohan Kumar*1, Monika Gupta2, Vishal Ramola2

1 M.Tech Student, Faculty of Technology, Uttarakhand Technical University, Dehradun, India.
2 Assistant Professor, Faculty of Technology, Uttarakhand Technical University, Dehradun, India.

ABSTRACT

Multiplication is the most fundamental and commonly used operations in a CPU. These operations furthermore form the origin for other complex operations. With ever increasing requirement for faster clock frequency it becomes essential to have faster arithmetic unit. In this paper a new structure of Mathematics – Vedic Mathematics is used to execute operations. In this paper mainly two types of algorithms one is vedic and other one is vedic wallace algorithms are implemented for Multiplication in Verilog and performance is evaluated in Xilinx ISE Design Suite 13.2 platform then compared with different parameters like delay time and area (number of LUT) for several bits algorithms.

Keywords: Vedic mathematics, multiplication, delay time, Verilog.

1. INTRODUCTION

Multiplication is an important essential function in arithmetic operations. Multiplication-based different operations considered as Multiply and Accumulate (MAC) and internal product are among some of the commonly used Computation- Intensive Arithmetic Functions (CIAF) presently implemented and designed in many Digital Signal Processing (DSP) appliances considered as convolution of two or more than two information, Fast Fourier Transform (FFT) of different sequences, filtering of signals or information and all microprocessors its used in arithmetical and logical unit (ALU) [1]. Since multiplying is the most important factor for the implementation time for most of the DSP algorithms or techniques, so there is a need of most efficient and high speed multiplier. Currently, multiplication time is still the most important factor in determining the instruction cycle time and the delay time of a DSP chip.

The requirement for high speed processing has been growing as a result of increasing work for computer and signal processing applications. Higher throughput arithmetical and logical operations are important to accomplish the required performance in various real-time signal and image processing applications [2]. The main key of arithmetical and logical operations in these applications is multiplication techniques and the development and designing of fast and efficient multiplier circuit has been a subject of interest over last few years. Sinking the execution time and power consumption of required circuits are very necessary requirements for various applications such as in digital signal processing and in digital image processing [2,3]. This work presents different multiplier techniques and architectures. Multiplier based on Vedic or ancient Mathematics is one of the fast and efficient with low propagation delay and low power consumption multiplier.

2. VEDIC MATHEMATICS

Vedic Mathematics introduces the magnificent applications to Arithmetical calculation and verification, theory of numbers, complex multiplications, fundamental algebraic operations, complex factorizations, simple quadratic and advanced order equations, concurrent quadratic equations, partial fractions, in differential calculus and integral calculus, squaring of complex number, cubing, square root of complex number, cube root, 2-Dimensional and 3-Dimensional coordinate geometry and brilliant Vedic Numerical code.

(a) Vedic Mathematics Sutras and Up-sutras: Entire mechanics of Vedic mathematics is based on 16 sutras – formulas and 13 up-sutras meaning – corollaries.

Sutras

1. Ekadhikena Purvena

*Corresponding Author www.ijesr.org
2. Nikhilam Navatascharamam Dashatah
3. Urdhva-tiryagbyham
4. Paravartya Yojayet
5. Shunyam Samyasamucchayeye
6. Anurupe Sunyamanyat
7. Sankalana vyavakalanabhyam
8. Puranaprranabhyyam
9. Calana – Kalanabhyyam
10. Yavadunam
11. Vyastisamashthih
12. Sheshanykena Charmena
13. Sopantadvyamantyam
14. Ekyunena Purvena
15. Ginitasamucchayah
16. Gunaksamucchayah

Up-sutras
1. Anurupyena
2. Shishyate Sheshamjnah
3. Adyamadye Nantyamantyena
4. Kevalaih Saptakam Gunyat
5. Vestanam
6. Yavadunam Tavadunam
7. Yavadunam Tavadunikuta Varganka Yojayet
8. Antyayordshakepi
9. Anvatoreva
10. Samuccayagunitah
11. Lopanasthapanabhyam
12. Vilokanam
13. Ginitasamucchayah Samuccayagunitah

(b) Urdhva-tiryagbyham: The Nikhilam and Anurupyena are for special cases, whereas Urdhva-tiryagbyham is general formula applicable to all [20]. Its algebraic principle is based on multiplication of polynomials. Consider we want to multiply two 4th degree polynomials

\[ A x^4 + B x^3 + C x^2 + D x + E \]

\[ Z x^4 + Y x^3 + X x^2 + W x + V \]

\[ AZ x^8 + (AY+BZ) x^7 + (AX+BY+CY) x^6 + (AW+BX+CY+DZ) x^5 + (AV+BW+CX+DY+EZ) x^4 + (BV+CW+DX+EY+VZ) x^3 + (CV+DW+XE) x^2 + (DV+EW) x + EV \]

Fig 1: Multiplication of two fourth degree polynomials

Highest degree coefficient can be obtained by multiplication of two highest degree coefficients of individual polynomial namely A and Z. A next degree coefficient is obtained by addition of cross multiplication of coefficients of 4th degree and 3rd degree of other polynomials. It means A which is 4th degree coefficient of polynomial-1 is multiplied by 3rd degree
coefficient of polynomial-2 is added to 4th degree coefficient of polynomial-2 multiplied by 3rd degree coefficient of polynomial-1 to get (AY+BZ).

A  B  C  D  E

Z  Y  X  W  V

Fig 2: Vertically Crosswise First Cross Product

Similar logic of cross multiplication and addition can be extended till all 5 coefficients of both polynomials are used as follows. Every iteration gives a coefficient of product.

A  B  C  D  E

Z  Y  X  W  V

Fig 3: Vertically Crosswise Intermediate Cross Product

In this iteration, coefficient of degree 4 of product is obtained. For next iteration we drop A and Z which are the highest degree polynomial coefficients. The resulting operation gives coefficient of the degree 3 of multiplication of polynomials. As follows

A  B  C  D  E

Z  Y  X  W  V

Fig 4: Vertically Crosswise Intermediate Cross Product

Continuing with this process last coefficient is obtained by multiplication of 0th degree terms of both polynomials as E*V. This process can be done in both ways as it is symmetric. In summary the process can be stated as, process of addition of product of coefficients of two polynomials in crosswise manner with increase and then decrease in number of coefficients from left to right with crosswise meaning product of coefficients for one polynomial going rightwards while for other leftwards.

Any decimal number can be thought as a polynomial with unknown or $x$ equal to 10. Being said that, formula stated above can be utilized to calculate product of two decimal numbers. Each digit of decimal number is though as coefficient of power of 10. Only restriction in this case is each cross product should be only one digit, if not it is added to the next power of 10.

3. PROPOSED ALGORITHM

In binary system only 0 and 1 are used hence multiplication in *Urdhva-tiryagbhhyam* or vertically-crosswise formula is replaced by AND logic. Each AND will be a bit wide and these bits are added together to generate cross-product. Rules for vertically-crosswise multiplication remains same as starting from MSBs – Most Significant Bit, of both multiplicands considered for first cross product. Then increasing one bit in each further calculation with cross product taken for bits of multiplicands till all bits are used. Further dropping bits from MSB process of cross-product is continued till only LSB is used for cross-product. In binary number system the maximum width of cross-product depends on width of
multiplicands. For example, in 8 bit multiplication maximum cross-product width will be \( \log_2 8 + 1 = 4 \). In 16 bit it will be 5 and in 23 bits it will be 5 again.

(a) Comparison of Vedic and Conventional Multiplier: In this section comparison between Vedic multiplier by Urdhva-tiryagbhyam and conventional multiplier is made. Conventional multiplier of width \( N \times N \) will generate \( N \) number of partial products with each product containing \( N \) bits with 0 to 7 zeros added at the end. Vedic method generates \( (2N - 1) \) cross products of different widths which when combined forms \( (\log_2 N + 1) \) partial products for \( N \) bit multiplier. In case of number of partial products there is significant decrease in number for Vedic Mathematics. But partial products generated in case of conventional multipliers are just by AND one multiplier by digits of another multiplier, whereas in case of Vedic, partial products are obtained after cross products are generated which requires some logic. Hence in Vedic mathematics delay for partial products is equal to adder delay. Critical path would consist of adders adding maximum number of bits in cross product. In all cases it will be the cross product in which all bits of multipliers are considered. Different techniques are used to combine these partial products efficiently to reduce the total time required for multiplication. One such technique Wallace tree addition is discussed in next section.

(b) Combining Partial Products: Combining partial products is the most critical part in any multiplier which decides the performance of it. Different types of adders like Carry Save adders (CSA) or Carry Look-ahead Adders (CLA) are used frequently. To further improve the performance more parallelism is achieved by combining three or more partial products at a time until two are left and then to add these two partial products to get the final answer. This technique is called as Wallace tree adder.

(c) Vedic – Wallace: If Wallace tree structure with full and half adder is used then the class of multiplier is named as Vedic – Wallace multiplier. This type of multiplier essentially differs from conventional multiplier in the way the different bits are combined to form the final answer. 8, 16, and 23 bit multiplier are designed in Verilog with Vedic – Wallace structure with each of them customized to that particular width of multiplication to achieve high performance. These multipliers are designed as combinational blocks similar to equivalent Verilog design. Each of these designed are synthesized with Xilinx in two ways 1. To achieve the highest possible clock frequency with which they work 2. Relaxed clock frequency to check the area generated.

(d) Analysis: Wallace tree addition makes use of full adder extensively for reducing partial products. Figure 3.3 shows a gate level diagram of a full adder. It can be observed that to generation of sum has to pass through two XOR gates and is inherently slower than generation of carry.

Critical Path Analysis of a 4 bit multiplier

In this section comparison between critical path in 4 bit conventional and Vedic multiplier is done. In a 4 bit multiplier 4 partial products will be generated, in the diagram below named as P0 to P3.
In above multiplier a 3:2 reduction is used. After first level of Wallace tree 4 partial products are reduced to 3 partial products. As there are 4 bits to combine at power $2^3$, in first level only 3 can be combined leaving fourth for next level. It is shown in blue background. This extra bit is combined in next level of reduction. After this level only 2 partial products are left and hence added together to get final answer. The block level diagram of the process is shown below with critical path marked in red. Delay in critical path in terms of half and full adder delays are as follows.

\[
\text{Delay} = \text{FAS} + \text{FAS} + \text{HAC} + \text{FAC} + \text{FAC} + \text{FAS} \\
= 3 \text{ FAS} + 2 \text{ FAC} + \text{HAC}
\]

Where FAS = Full Adder Carry, FAC = Full Adder Carry, HAC = Half Adder Carry, HAS = Half Adder Sum, FA = Full Adder and HA = Half Adder.

In case of Vedic mathematics as bits at same power of two are added in one level. Hence 4 bits which are at power $2^3$ are added together to get a 3 bit sum. The MSB of this sum is at power $2^5$. If we compare the combination of partial product in Vedic and conventional the extra bit is at power 5 in Vedic compared to at power 3 for conventional. This carry forwarding is achieved with extra delay of a Half Adder Sum (HAS). Multiplication with Vedic structure is shown below. Figure 8 shows the critical path in red.
Critical Delay

\[ \text{Critical Delay} = \text{HAS} + \text{HAC} + \text{HAS} + \text{HAS} + \text{FAC} + \text{FAC} + \text{FAS} \]

\[ = 2 \text{FAC} + \text{FAS} + 3\text{HAS} + \text{HAC} \]

Critical delay for conventional

\[ 3 \text{FAS} + 2 \text{FAC} + \text{HAC} ?? 2 \text{FAC} + \text{FAS} + 3 \text{HAS} + \text{HAC} \]

\[ 3 - 2 \text{FAS} + 2 \text{FAC} + \text{HAC} ?? 2 \text{FAC} + \text{FAS} + 3 \text{HAS} + \text{HAC} \]

\[ 2 \text{FAS} \quad \text{XOR} \quad 3 \text{HAS} \]

\[ 4 \text{XOR Delay} > 3 \text{XOR Delay} \]

After comparing this with critical delay of conventional multiplier and cancelling equal terms we get 2 FAS – Full Adder Sums for conventional against 3 HAS - Half Adder Sums for Vedic. In terms of XOR gate this delay corresponds to 2×2 = 4 XOR gate delay for conventional against 3×1 = 3 XOR gate delay for Vedic multiplier. This analysis shows that 4 bit Vedic multiplier has less carry propagation delay than Wallace tree multiplier. Also for 4 bit design there is no difference between Vedic-Vedic and Vedic-Wallace structures. Depending upon the number of bits in multiplication Vedic mathematics structure and Wallace tree structure critical path varies and hence there is variable improvement in Vedic-Vedic and Vedic-Wallace structure.

Fig 10: Block level structure of a 4x4 multiplier by Vedic reduction

4. RESULTS

4.1 simulation results of 8x8 bit Vedic multiplier

Fig 11: Simulation result of 8X8 bits Vedic Multiplier
4.2 Simulation results of 16x16 bit Vedic multiplier

Fig 12: Simulation result of 16X16 bits Vedic Multiplier

4.3 Simulation results of 23x23 bit Vedic multiplier

Fig 13: Simulation result of 23X23 bits Vedic Multiplier

4.4 Simulation results of 8x8 bit Wallace multiplier

Fig 14: Simulation result of 8X8 bits Wallace Multiplier

4.5 Simulation results of 16x16 bit Wallace multiplier

Fig 15: Simulation result of 16X16 bits Wallace Multiplier
4.6 simulation results of 23x23 bit wallace multiplier

Fig 16: Simulation result of 23X23 bits Wallace Multiplier

5. CONCLUSION

The designs of 8x8, 16x16 bits and 23x23 bits Vedic multiplier have been implemented on Spartan3E (3s500fg320-4) device. The computation delay for 8x8 bits Wallace Multiplier was 65.065ns and for 8x8 bits Vedic multiplier was 16.484ns. 16x16 bits Wallace multiplier was 22.814ns and for 16x16 bits Vedic multiplier was 21.752ns. Also computation delays for 23x23 bits Wallace multiplier was 23.613ns and 23x23 bits Vedic multiplier was obtained 23.613ns respectively. It is therefore seen that the Vedic multipliers are much faster than the conventional multipliers for higher order bits. The algorithms of Vedic mathematics are much more efficient than of conventional mathematics.

Table 1: Comparative analysis of Vedic multiplier and Wallace multiplier

<table>
<thead>
<tr>
<th>FPGA Device Package</th>
<th>Type of Multiplier</th>
<th>Area (in no. of LUT's)</th>
<th>Delay (ns)</th>
</tr>
</thead>
<tbody>
<tr>
<td>SPARTAN3E (3s500fg320-4)</td>
<td>Vedic Mul. 8x8</td>
<td>162</td>
<td>16.484</td>
</tr>
<tr>
<td></td>
<td>Wallace Mul. 8x8</td>
<td>158</td>
<td>15.065</td>
</tr>
<tr>
<td></td>
<td>Vedic Mul. 16x16</td>
<td>810</td>
<td>21.752</td>
</tr>
<tr>
<td></td>
<td>Wallace Mul. 16x16</td>
<td>777</td>
<td>22.514</td>
</tr>
<tr>
<td></td>
<td>Vedic Mul. 23x23</td>
<td>1731</td>
<td>23.613</td>
</tr>
<tr>
<td></td>
<td>Wallace Mul. 23x23</td>
<td>1716</td>
<td>25.549</td>
</tr>
</tbody>
</table>

6. FUTURE SCOPE

In future this work can be extended to Division which can be implemented using Vedic Mathematics. Floating Point Vedic Processor could be also a good extension of this work.

REFERENCES


