

# DATA AWARE LOW POWER HIGH THROUGHPUT MODIFIED RADIX FFT PROCESSOR FOR WPAN APPLICATIONS

# **R. SAKTHIVEL and M. VANITHA**<sup>\*</sup>

School of Electronics Engineering, VIT University, VELLORE (T.N.) INDIA

# ABSTRACT

This paper presents a high-speed low-complexity modified radix  $2^5$ -512-point Fast Fourier transform (FFT) processor with an input of eight samples taken in pipelined approach for high rate wireless personal area network applications. The main advantage of using this method is to reduce the number of complex multiplications and the size of the twiddle factor memory. The proposed architecture has been implemented in 180 nm CMOS technology with a supply voltage of 1.8v. The results demonstrate that total number of cells used is 9937 occupying a core area of 265337.251 um<sup>2</sup>.

Key words: Discrete Fourier Transform, twiddle factor, booth multiplier.

# **INTRODUCTION**

For designing high rate<sup>1</sup> WPAN FFT's we need to have a throughput rate in GS/sec and it is achieved by increasing number of data- paths which results in increasing hardware cost .Many FFT processor architectures are introduced utilizing OFDM transmission like Single Path Delay Commutator (SDC) and multipath delay feedback. One OFDM symbol in the IEEE 802.11ad standards consists of a length of 512 subcarriers. Therefore, FFT computations are performed with 512-point arithmetic and should provide a high throughput rate<sup>3</sup>. Hence effective algorithm has been proposed which reduces the number of stages for an FFT. In the previous study radix 2<sup>5</sup> FFT based processor designed by using booth's multiplier. The disadvantage of booth architecture is the complexity of the circuit to generate partial products. Among the various FFT architectures, the multipath delay feedback (MDF) architecture is frequently used as a solution to provide a throughput rate of more than 1GS/s. A small radix is desirable because it results in a simple butterfly structure<sup>2</sup>. The radix-4 algorithm is primarily used for high data throughput FFT architectures, but requires a 4-

<sup>\*</sup>Author for correspondence; E-mail: mvanitha@vit.ac.in

point butterfly unit with high complexity. Recently, the radix- $2^5$  FFT algorithm and its architecture have been studied in order to reduce the number of complex multipliers<sup>2</sup>.

#### Proposed modified radix FFT algorithm

The proposed FFT processor uses constant multipliers based on the canonical signed digit (CSD) representation for the complex multiplication<sup>5</sup>. The equation (1) shows the number of butterfly stages and twiddle factor multiplications for each stage. In this paper a 512-point modified FFT<sup>3</sup> processor is proposed in which an input of eight samples taken in parallel for achieving high throughput rate. There are two modules based on the modified radix 2 algorithm that reduce the number of twiddle factor multiplications. The first module, which consists of five processing elements (PEs), is realized using the modified radix-2 algorithm, and the second module is implemented by cascading of the Butterfly 1 stage. The proposed architecture consists of butterfly units, complex Booth multipliers, complex constant multipliers, first-in first-out (FIFO), and a control unit. Fig. 1(a) shows the butterfly structure with 2's complement and adder circuit.

$$X(k) = \sum_{n=0}^{N-1} x(n) W_n^k, k = 0, 1, ..., N - 1$$
  

$$n = \left\langle \frac{N}{2} n_1 \frac{N}{4} n_2 + \frac{N}{8} n_3 + \frac{N}{16} n_4 + \frac{N}{32} N_5 + n_6 \right\rangle_N$$
  

$$k = \left\langle k_1 + 2k_2 + 4k_3 + 8k_4 \right\rangle_N \qquad \dots (1)$$

#### **Butterfly unit & FIFO**

The butterfly units perform complex additions and complex subtractions of two inputs X(n) and X(n+N/2) and all input values are saved into the FIFO until the N/2<sup>th</sup> input is entered .Then, the butterfly units perform calculations between the input values and FIFO outputs after entering the  $(N/2)+1^{st}$  input. During the last N/2 clock cycles all butterfly calculations are performed at each stage<sup>7</sup>. The butterfly outputs having the complex addition are given to the next stage as inputs and the complex subtraction outputs are saved in the FIFO and then during the next N/2 clock cycles, the FIFO outputs are fed to the next stage. Input real and imaginary values that are given are saved into the FIFO until the N/2<sup>th</sup> input is entered. Then, the butterfly unit performs the calculations between the input values and FIFO outputs, after entering then N/2+1 input. Among the butterfly outputs, the complex addition outputs are fed to the next stage and the complex subtraction outputs are saved in the FIFO, and then during the next N/2 clock cycles, the FIFO outputs are fed to the next stage addition outputs are fed to the next stage and the complex subtraction outputs are saved in the FIFO, and then during the next N/2 clock cycles, the FIFO outputs are fed to the next stage.

#### **Complex constant multiplier**

Constant coefficient multiplication is a common function in Finite Impulse Response (FIR) filtering. In this case the constant coefficients represent filter parameters and are changed much less frequently than the input data values<sup>4</sup>. The twiddle factor multiplication for FFT computation is performed using fixed-width complex multiplier. The twiddle factor  $W_8$  has just one coefficient, but twiddle factor  $W_{16}$  and  $W_{32}$  have 3 coefficients and 7 coefficients, respectively. In most of previous FFT architectures the twiddle factor  $W_{32}$  multiplication was designed using the complex Booth multipliers. If the complex Booth multipliers for the twiddle factor  $W_{32}$  multiplication are replaced by CSD constant multipliers with CSS technique, the hardware complexity will reduced. In addition, the memory size for storing twiddle factor LUT is only half size than using complex Booth multipliers. Fig. 1(b) and Fig.1 (c) shows the stage -1 and 2 of the proposed architecture.



Fig. 1: (a) Butterfly structure with 2's compliment and adder circuit; (b) Stage 1 of the proposed architecture; (c) Stage 2 of the proposed architecture

### Simulation and synthesized results

The architecture of the proposed FFT/IFFT processor was designed in Verilog HDL and simulated to verify its functionality. Both steps simulation and synthesis steps were performed using CADENCE RC compiler tool using 180 nm CMOS technology.

So from Table 2 we found that in 45 nm technology leakage power is less compared to other technologies. From Table 1 total power usage of Baugh Woolley multiplier is less compared to booth multiplier. Table 3 the complete ASIC chip of the proposed architecture with design specifications.

|                                  | Table 1: Powe<br>circuit usi<br>mult | r comparison of<br>ng different<br>ipliers | Table 2: Comparison of proposedand modified architectures indifferent technologies |                         |  |
|----------------------------------|--------------------------------------|--------------------------------------------|------------------------------------------------------------------------------------|-------------------------|--|
| Power                            | Previous 2 <sup>4</sup>              | Proposed 2 <sup>5</sup>                    | Previous 2 <sup>4</sup>                                                            | Proposed 2 <sup>5</sup> |  |
|                                  | FFT processor <sup>1</sup>           | FFT architecture                           | FFT processor <sup>1</sup>                                                         | FFT architecture        |  |
| Total power (nw)                 | 686296.368                           | 686294.971                                 | 686296.368                                                                         | 686294.971              |  |
| Leakage power 88.181<br>(nw)     |                                      | 88.181                                     | 88.181                                                                             | 88.181                  |  |
| Dynamic power 686208.187<br>(nw) |                                      | 686207.791                                 | 686208.187                                                                         | 686207.791              |  |

| Tal | ole . | 3: | Chip | layout | and | its | specifications |
|-----|-------|----|------|--------|-----|-----|----------------|
|-----|-------|----|------|--------|-----|-----|----------------|



# CONCLUSION

In this paper the modified radix  $-2^5$  FFT algorithm and Data path for 512 point modified radix  $-2^5$ FFT Processes have been proposed and we observed that booth multiplier will take much power and takes Long time for computation so we go for baugh-wooley Multiplier which takes less power consumption and computation time which in turn decreases the leakage power The proposed architecture has potential applications in high rate OFDM based WPAN applications.

## REFERENCES

1. Y. Lin, H. Liu and C. Lee, A 1-GS/s FFT/IFFT Processor for UWB Applications, IEEE J. Solid-State Circuits, **40(8)**, 1726-1735 (2005).

- 2. K. Cho, K. Lee, J. Chung and K. Parhi, Design of Low-Error Fixed Width Modified Booth Multiplier, IEEE Trans, Very Large Scale Integer, (VLSI) Syst., **12(5)**, 522-531 (2004).
- 3. Y.-W. Lin, H.-Y. Liu and C.-Y. Lee, A 1-GS/s FFT/IFFT Processor for UWB Applications, IEEE J. Solid-State Circuits, **40(8)**, 1726-1735 (2005).
- 4. E. Bidet, D. Castelain, C. Joanblanq and P. Senn, A Fast Single Chip Implementation of 8192 Complex Point FFT, IEEE J. Solid-State Circuits, **30**(**3**), 300-305 (1995).

Accepted : 11.10.2016