

# High Throughput Arbitrary Sample Rate Converter for Software Radios

Nitin Mathur and Dr. B. Lakshmi

Department of Electronics and Communication Engineering  
National Institute of Technology,  
Warangal, India

**Abstract**— In modern digital communication systems, arbitrary sample rate conversion is one of the most computation-intensive tasks. In addition, a reconfigurable sample rate converter is often required to meet the sampling rate requirements of different radio standards. This paper proposes a pipelined architecture for FPGA implementation of an arbitrary sample rate converter that employs cut-set retiming and Sum-of-Power-of-Two (SOPOT) techniques to achieve high throughput while reducing hardware. The proposed architecture is designed for 16-bit precision and implemented using Xilinx ISE 14.2 on an XC3S500E-4FG320 FPGA device. The implementation results show that the proposed architecture improves throughput by about 4.5 times.

**Keywords**—Farrow structure, Sample rate converter, cut-set retiming, SOPOT, Lagrange interpolation

## I. INTRODUCTION

In modern digital communication systems, the sample rate of the input signal often has to be altered by either an integer or a non-integer factor. Sample rate conversion by an integer factor can be done using various types of FIR filters, viz. halfband, equiripple and Cascaded Integrator-Comb (CIC) filters. However, some applications require new sample values at arbitrary points between the existing samples. Examples of such systems include audio technology, speech coding, echo cancellation in digital modems and symbol synchronization in digital receivers [1]. Also, in conventional sample rate conversion (SRC) implementations, different SRC ratios require different filters, which limits the flexibility of covering several SRC ratios with the same architecture. Fractional-delay digital filters (FD-DF) are very useful in arbitrary sample rate conversion. In 1988, Farrow proposed a structure capable of interpolating between samples in the input data stream of a band-limited signal. Polynomial approximation filters can be implemented efficiently using the Farrow structure for any arbitrary interpolation factor [2]. The Farrow structure is preferred for its reconfigurability, as the output sample rate is varied just by varying the fractional delay parameter. The Farrow structure approximates the impulse response of an ideal fractional-delay digital filter, for a given delay, by polynomial interpolation. To calculate the coefficients of the sub-filters of the Farrow structure, Lagrange's time-domain method of interpolation is employed [3]. The number of sub-filters required equals the order of the polynomial approximation plus one. As the order of the polynomial approximation increases, the number of sub-filters and hence the number of multipliers increases. The speed of this architecture is also low because of the large number of multipliers in the critical path.

In this paper, an efficient implementation of the Farrow structure using Sum-of-Power-of-Two (SOPOT) coefficients [4] and cut-set retiming [6] is proposed. The coefficients of the sub-filters in the Farrow structure are represented as SOPOT coefficients. The SOPOT technique is attractive for VLSI implementation because filter coefficients can be implemented efficiently using hard-wired shifters and adders. To further reduce the number of adders and subtractors required in this structure, minimum-adder multiplier blocks are used [7]. Reference [7] discusses the use of SOPOT coefficients and the multiplier block technique to reduce the number of arithmetic operations, but that architecture is not practical for high-frequency applications. Therefore, the transposed form of the FIR filter and cut-set retiming are used to increase the speed of the architecture by reducing the critical path delay.

The paper is organized as follows: Section I presents the introduction, Section II gives an overview of the Farrow structure, Section III describes the design and optimization of the Farrow structure, Section IV gives the implementation results and Section V presents the conclusions.

## II. OVERVIEW OF FARROW STRUCTURE

Arbitrary sample rate conversion is a critical task in multirate signal processing. Although different architectures for arbitrary rate conversion are available in the literature, fractional delay filters based on the Farrow structure provide a flexible architecture because the fractional delay parameter is reconfigurable [2]. Fig. 1 shows the Farrow structure, and the output $y(n)$ is given by

$$y(n) = \sum_{k=0}^M C_k D^k \quad (1)$$



Figure 1. General Farrow structure

where $D$ is the adjustable delay parameter, called the fractional interval, such that $0 \leq D \leq 1$. The coefficients $C_k$ are obtained by solving a set of $(M+1)$ linear equations. If the fractional delay value is the same for all inputs, this structure gives a version of the input delayed by $D$. If the delay $D$ varies for every input, the Farrow structure performs sample rate conversion. The coefficients of the FIR filters are fixed for a given filter order even if the fractional delay changes; the only variable parameter in the design is $D$. The Farrow implementation of polynomial-based filters has better anti-imaging characteristics and is hence suitable for upsampling. The Farrow structure requires $(N(M+1) + (M+1))$ multiplications and $N(M+1)$ additions per output sample, where $N$ is the order of each FIR filter, so the computational cost grows with the square of the interpolation order.
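
To make the data flow concrete, the following is a minimal Python sketch of one Farrow output computation. It assumes that the term weighting $D^k$ in (1) is the output of the $k$-th FIR sub-filter and evaluates the polynomial in $D$ with Horner's rule. The function name `farrow_output` and the two-tap linear interpolator used as the example are illustrative choices, not part of the original design.

```python
def farrow_output(subfilter_taps, window, d):
    """Evaluate one Farrow output sample (illustrative sketch).

    subfilter_taps[k] holds the FIR taps of the sub-filter that weights d**k.
    window holds the input samples seen by every sub-filter, one per tap.
    """
    # Each sub-filter is an ordinary FIR dot product over the same input window.
    v = [sum(c * x for c, x in zip(taps, window)) for taps in subfilter_taps]
    # Horner's rule: ((v_M * d + v_{M-1}) * d + ...) * d + v_0
    y = 0.0
    for vk in reversed(v):
        y = y * d + vk
    return y


# Hypothetical first-order example: y = x0 + d * (x1 - x0)
linear_taps = [[1.0, 0.0], [-1.0, 1.0]]
y = farrow_output(linear_taps, [2.0, 4.0], 0.25)
```

Only `d` changes from one output sample to the next; the sub-filter taps stay fixed, which is the property that makes the structure reconfigurable.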

## III. PROPOSED ARCHITECTURE

The general Farrow structure [2] uses fixed FIR filter coefficients. However, the structure has a large number of multipliers and adders in the critical path, resulting in low speed. The coefficients of the Farrow filter are obtained using the Lagrange interpolation method, which is a time-domain technique. This technique determines the $M^{\text{th}}$ order polynomial passing through $(M+1)$ sample points. The Lagrange interpolation method has three advantages: 1) a fractional delay filter (FDF) with polynomial-defined coefficients allows the Farrow structure to be used, 2) the FDF magnitude frequency response is flat at low frequencies, and 3) the FDF coefficients are easily computed from one closed-form equation. In this work, 3<sup>rd</sup> order Lagrange polynomials are used; this method is called cubic Lagrange interpolation. The interpolator is designed using four FIR filters. Fig. 2 shows the magnitude and phase responses of the designed third-order Lagrange interpolator for different delay values. The coefficients of the FIR filters computed using Lagrange's interpolation technique are  $\{-1/6, 1/2, -1/2, 1/6\}$ ,  $\{-1/2, 2, -5/2, 1\}$ ,  $\{-1/3, 3/2, -3, 11/6\}$ ,  $\{1, 0, 0, 0\}$  [3].
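
The sub-filter coefficients can be cross-checked by expanding the cubic Lagrange basis polynomials in powers of $D$. The Python sketch below assumes four samples at positions 0 to 3 and interpolation at position $D$; the helper names are hypothetical. Under this assumed convention the coefficient magnitudes agree with the values quoted above, while the tap order and some signs depend on the indexing convention.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply two polynomials given as ascending-power coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def lagrange_farrow_taps(order):
    """Return taps[k][j]: weight of sample j in the sub-filter multiplying D**k.

    Assumption of this sketch: samples sit at integer positions 0..order
    and the output is interpolated at position D.
    """
    nodes = range(order + 1)
    basis = []
    for j in nodes:
        poly = [Fraction(1)]
        for m in nodes:
            if m != j:
                # Multiply by the factor (D - m) / (j - m).
                poly = poly_mul(poly, [Fraction(-m, j - m), Fraction(1, j - m)])
        basis.append(poly)
    # Regroup: row k collects the D**k coefficient of every basis polynomial.
    return [[basis[j][k] for j in nodes] for k in range(order + 1)]


taps = lagrange_farrow_taps(3)
for k, row in enumerate(taps):
    print(f"D^{k}:", [str(c) for c in row])
```

Exact rational arithmetic is used so that the fractions can be compared directly with the listed values before any quantization to 16 bits.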

Figure 2. Magnitude and phase delay response of 3rd order Lagrange interpolator

The Farrow structure requires a large number of multiplications for implementation. In the proposed architecture, multipliers with fixed coefficients are implemented using simple shifts and additions by employing the Sum-of-Power-of-Two (SOPOT) technique [4]. The coefficients of each FIR filter in the proposed architecture can be implemented with SOPOT terms as:

$$g[k] = \sum_{i=1}^L 2^{b_{i,k}} a_{i,k} \quad (2)$$

where $a_{i,k} \in \{-1, 1\}$ and $b_{i,k} \in \{-t, \dots, u\}$; $t$ and $u$ determine the dynamic range (word length) of each filter coefficient, and $L$ is the number of terms used in the coefficient. The main idea of the algorithm in [4] is to obtain the base-2 logarithm of each $g[k]$ element. The proposed architecture computes the filter coefficients as sums of power-of-two terms, as presented in TABLE I. In addition to reducing the number of multipliers, the proposed architecture further reduces the number of adders by eliminating the redundancy in the filter coefficients. This is achieved by using a minimum-adder multiplier block [5] in the transposed form of the cascaded FIR filters used in the proposed architecture.
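
One way to obtain the terms of (2) is a greedy search that repeatedly rounds the base-2 logarithm of the remaining error. The Python sketch below is such a search under assumed exponent limits (`min_exp`, `max_exp`); it is an illustration only and is not claimed to be the exact procedure of [4]. For the coefficients tried here it returns the same terms as TABLE I.

```python
import math


def sopot_terms(value, num_terms, min_exp=-7, max_exp=1):
    """Greedy signed power-of-two approximation of `value` (sketch).

    Returns (sign, exponent) pairs with sign in {-1, +1} such that
    sum(sign * 2**exponent) approximates `value`.
    """
    terms = []
    residual = value
    for _ in range(num_terms):
        if residual == 0:
            break
        # Round log2 of the remaining error, then clamp to the allowed range.
        exp = round(math.log2(abs(residual)))
        exp = max(min_exp, min(max_exp, exp))
        sign = 1 if residual > 0 else -1
        terms.append((sign, exp))
        residual -= sign * 2.0 ** exp
    return terms


def sopot_value(terms):
    """Numerical value represented by a list of (sign, exponent) pairs."""
    return sum(sign * 2.0 ** exp for sign, exp in terms)


one_third = sopot_terms(1 / 3, 3)
eleven_sixths = sopot_terms(11 / 6, 3)
```

Allowing negative signs is what lets 11/6 be written as $2 - 2^{-3} - 2^{-5}$ with three terms instead of a longer all-positive sum.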

The delay of the direct-form FIR filter structure used in the traditional Farrow structure can be computed by determining its critical path delay, $t_{cp1}$. It may be observed from Fig. 1 that this critical path is the sum of the delay of one multiplier ($t_{mul}$) and the delay of $N$ adders ($N \cdot t_{add}$), i.e. $t_{cp1} = t_{mul} + N \cdot t_{add}$, where $N$ is the order of the FIR filter. In the proposed architecture, this critical path delay is reduced to $t_{cp1} = t_{mul} + t_{add}$ by using the transposed-form structure.
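
The equivalence of the two FIR forms can be checked with a short behavioural model. The Python sketch below is an illustrative simulation, not the VHDL of the implemented design; the function names are hypothetical. In the transposed form each register sits between two adders, so only one multiplication and one addition lie between consecutive registers.

```python
def fir_direct(taps, samples):
    """Direct form: a delay line feeds all multipliers, then a chain of adders."""
    delay_line = [0.0] * len(taps)
    out = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]
        out.append(sum(c * d for c, d in zip(taps, delay_line)))
    return out


def fir_transposed(taps, samples):
    """Transposed form: the input is broadcast to every multiplier."""
    state = [0.0] * (len(taps) - 1)  # registers between the adders
    out = []
    for x in samples:
        products = [c * x for c in taps]
        # Output: first product plus the value held in the first register.
        y = products[0] + (state[0] if state else 0.0)
        # Register update: state[i] <- products[i + 1] + old state[i + 1]
        new_state = []
        for i in range(len(state)):
            nxt = state[i + 1] if i + 1 < len(state) else 0.0
            new_state.append(products[i + 1] + nxt)
        state = new_state
        out.append(y)
    return out


taps = [-0.5, 2.0, -2.5, 1.0]
xs = [1.0, -2.0, 3.0, 0.5, 4.0, -1.0]
direct = fir_direct(taps, xs)
transposed = fir_transposed(taps, xs)
```

Both functions produce the same output sequence; only the position of the registers, and therefore the longest register-to-register path, differs.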



Figure 3. Optimized Farrow structure

**TABLE I**  
Farrow Coefficients as Sum-of-Power-of-Two

| Farrow Coefficient | SOPOT Coefficient                 |
|--------------------|-----------------------------------|
| 1/2                | $\{2^{-1}\}x_n$                   |
| 1/3                | $\{2^{-2} + 2^{-4} + 2^{-6}\}x_n$ |
| 1/6                | $\{2^{-3} + 2^{-5} + 2^{-7}\}x_n$ |
| 3/2                | $\{2 - 2^{-1}\}x_n$               |
| 11/6               | $\{2 - 2^{-3} - 2^{-5}\}x_n$      |
| 5/2                | $\{2 + 2^{-1}\}x_n$               |
| 3                  | $\{2 + 1\}x_n$                    |
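
The entries of TABLE I map directly onto shift-and-add hardware. The Python sketch below applies them to an integer sample using only shifts, additions and subtractions; the dictionary and function names are hypothetical, and right shifts truncate, as a plain hard-wired shift would. Note that the three-term entries for 1/3, 1/6 and 11/6 are approximations (for example, $2^{-2} + 2^{-4} + 2^{-6} = 0.328125$).

```python
# (sign, exponent) pairs transcribed from TABLE I.
TABLE_I = {
    "1/2": [(+1, -1)],
    "1/3": [(+1, -2), (+1, -4), (+1, -6)],
    "1/6": [(+1, -3), (+1, -5), (+1, -7)],
    "3/2": [(+1, 1), (-1, -1)],
    "11/6": [(+1, 1), (-1, -3), (-1, -5)],
    "5/2": [(+1, 1), (+1, -1)],
    "3": [(+1, 1), (+1, 0)],
}


def shift_add(x, terms):
    """Multiply integer x by a SOPOT coefficient using shifts and adds only."""
    acc = 0
    for sign, exp in terms:
        # Positive exponents shift left, negative exponents shift right.
        shifted = x << exp if exp >= 0 else x >> -exp
        acc = acc + shifted if sign > 0 else acc - shifted
    return acc


x = 4096  # example 16-bit sample value
approx_third = shift_add(x, TABLE_I["1/3"])
approx_eleven_sixths = shift_add(x, TABLE_I["11/6"])
```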

A further reduction in the critical path delay of the proposed architecture is achieved by applying the cut-set retiming technique to block 2 of the Farrow structure, shown within dotted lines in Fig. 1. Retiming is a transformation technique used to change the location of delay elements in an architecture without affecting its transfer function. Cut-set retiming only affects the weights of the edges in the cut-set [6]. As shown in Fig. 3, a register is added on each edge of a cut-set (indicated by a dotted line), which reduces the critical path delay ($t_{cp2}$) to $t_{add} + t_{mul}$. This is in contrast to the critical path delay $t_{cp2} = M \cdot (t_{mul} + t_{add})$ of block 2 in Fig. 1.
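
The effect of placing a register after every multiply-add stage of block 2 can be illustrated with a small behavioural model. The Python sketch below is an assumption-laden illustration rather than the implemented circuit: it assumes $M \geq 1$, and it carries the delayed copies of $D$ and of the sub-filter outputs along with each partial result, which corresponds to the registers added on the cut-set edges. The function names are hypothetical.

```python
def horner_combinational(v, d):
    """v = [v0, v1, ..., vM]; all M multiply-add stages in one clock period."""
    acc = v[-1]
    for vk in reversed(v[:-1]):
        acc = acc * d + vk
    return acc


def horner_pipelined(v_stream, d_stream):
    """Same computation with one register after every multiply-add stage.

    v_stream[n] and d_stream[n] are the values presented in cycle n.
    Results emerge in input order once the pipeline has filled (M >= 1).
    """
    m = len(v_stream[0]) - 1
    regs = [None] * m  # regs[s] = (partial sum, d, v) registered after stage s
    outputs = []
    for v, d in zip(v_stream, d_stream):
        new_regs = [None] * m
        # Stage 0 consumes the fresh inputs of this cycle.
        new_regs[0] = (v[m] * d + v[m - 1], d, v)
        # Stage s consumes what stage s-1 registered in the previous cycle.
        for s in range(1, m):
            if regs[s - 1] is not None:
                acc, dd, vv = regs[s - 1]
                new_regs[s] = (acc * dd + vv[m - 1 - s], dd, vv)
        regs = new_regs
        if regs[m - 1] is not None:
            outputs.append(regs[m - 1][0])
    return outputs


vs = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0], [3.0, 0.0, 1.0, -2.0],
      [1.0, 1.0, 1.0, 1.0], [0.0, 2.0, 0.0, 2.0]]
ds = [0.1, 0.25, 0.5, 0.75, 0.9]
pipelined = horner_pipelined(vs, ds)
reference = [horner_combinational(v, d) for v, d in zip(vs, ds)]
```

The pipelined model returns the same values as the combinational one, only later, which is the sense in which retiming leaves the transfer function unchanged apart from added latency.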

It may be noted from Fig. 1 and Fig. 3 that the resource utilization of the proposed architecture is the same as that of the general architecture. However, the proposed architecture achieves a reduction in the critical path delay, resulting in a significant speed enhancement.

## IV. IMPLEMENTATION RESULTS

The proposed architecture and the general architecture are modelled in VHDL for 16-bit precision. These models are simulated using ISE 14.2 to verify their functionality and implemented on a Xilinx Spartan-3E FPGA device (XC3S500E-4FG320, 90 nm technology). It may be observed from the resource utilization presented in TABLE II that a substantial reduction in hardware is achieved by the proposed architecture. It may also be noted that the proposed architecture operates at 90.09 MHz while the general architecture operates at 19.685 MHz. This significant improvement in operating frequency results in high throughput.
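
As a quick check of the reported speed-up, the two operating frequencies quoted above can be compared directly. The calculation assumes that both designs produce one output sample per clock cycle, so that throughput scales with clock frequency; this assumption is not stated explicitly in the text.

$$\frac{f_{\text{proposed}}}{f_{\text{general}}} = \frac{90.09\ \text{MHz}}{19.685\ \text{MHz}} \approx 4.58, \qquad T_{\text{general}} = \frac{1}{19.685\ \text{MHz}} \approx 50.8\ \text{ns}, \qquad T_{\text{proposed}} = \frac{1}{90.09\ \text{MHz}} \approx 11.1\ \text{ns}$$

The ratio is consistent with the roughly 4.5-fold throughput improvement stated in the abstract.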

**TABLE II**  
Comparison of hardware resource utilization on XC3S500E-4FG320

| Hardware Resources      | General Farrow Structure [2] | Optimized proposed Farrow Structure |
|-------------------------|------------------------------|-------------------------------------|
| No. of Slices           | 1019                         | 240                                 |
| No. of Slice flip flops | 191                          | 315                                 |
| No. of 4 input LUTs     | 1636                         | 396                                 |
| No. of MULT18X18SIOs    | 17                           | 3                                   |
| Frequency (MHz)         | 19.685                       | 90.09                               |

## V. CONCLUSION

This paper presents the design of a Farrow-based arbitrary sample rate converter that computes the coefficients using the SOPOT technique to reduce hardware and employs cut-set retiming to reduce the critical path delay. The implementation results for the proposed architecture and the general Farrow structure show that the proposed architecture offers high throughput while consuming less hardware. The proposed architecture is well suited to applications requiring frequent variation of the fractional interpolation parameter.

## REFERENCES

- [1] M. Blok, "Fractional delay filter design for sample rate conversion," Proceedings of the Federated Conference on Computer Science and Information Systems, 2012, pp. 701-706.
- [2] C. W. Farrow, "A continuously variable digital delay element," IEEE International Symposium on Circuits and Systems, Espoo, 1988, pp. 2641-2645.
- [3] V. Välimäki, "A new filter implementation strategy for Lagrange interpolation," IEEE International Symposium on Circuits and Systems, vol. 1, 1995, pp. 361-364.
- [4] C. Damian and E. Lunca, "A low area FIR filter for FPGA implementation," Telecommunications and Signal Processing (TSP), 2011, pp. 521-524.
- [5] A. G. Dempster and M. D. Macleod, "Use of minimum-adder multiplier blocks in FIR digital filters," IEEE Trans. CAS II, vol. 42, no. 9, 1995, pp. 569-577.
- [6] K. K. Parhi, "VLSI digital signal processing systems: design and implementation," First Edition, Wiley India, Chapter 4, pp. 97-106.
- [7] C. K. S. Pun, Y. C. Wu, S. C. Chan, and K. L. Ho, "On the design and efficient implementation of the Farrow structure," IEEE Signal Processing Lett., vol. 10, no. 7, pp. 189-192, July 2003.