# Fast Polynomial Multiplication (from Chapter 2.6)

Notes by Gene Cooperman, © 2009 (may be freely copied as long as this copyright notice remains)

We all know how to multiply polynomials by multiplying all the cross terms. Given a polynomial `P` of degree `d` (the largest power is `d`), and a polynomial `Q` of degree `e` (the largest power is `e`), the product, `P Q`, will have degree `d e`. To compute the product `P Q`, there will be up to `(d+1) (e+1)` cross terms to multiply.

In general, if two polynomials are both of degree `d`, then there will be `(d+1)2 = O(d2)` cross terms.

But using the idea of the fast Fourier transform, one can actually reduce the number of multiplications from `O(d2)` to `O(d log d)`. multiply the polynomials in `O(d log d)` time. Let `A(x)` and `B(x)` be polynomials of degree `d`. For example, if `d=3`, we might have `A(x) = 4 + 3x + 5x2 + 6x3`.

In outline, the idea for polynomial multiplication for `C(x) = A(x) B(x)` of degree `d` is:

1. Choose `d` points `x0,…, xd/2` and `x-1,…, x-d/2`.
2. Evaluation: `A(x0),…,A(xd/2` and `A(-x1),…,A(-xd/2)`; Do the same for `B(x)`
3. Multiplication: Compute `C(x)=A(x)B(x)` at each of the points.
So `C(xi) = A(xi) B(xi)` and `C(x-i) = A(x-i) B(x-i)`. Let `yi` be the constants such that `C(x-i) = yi` and ```C(x-i) = y-i```
4. Interpolation: Compute the unknown coefficients of `C(x)` using `C(x0) = y0`, and ```C(x1) = y1,…,C(xd/2) = yd/2``` and ```C(x-1) = y-1,…,C(x-d/2) = y-d/2```.

## Step 1 (Select Points):

We choose `d+1` points `x0, x1,…,xd` at which we want to evaluate `A(x)`, where

`xd/2+i = -xi`.

## Step 2 (Evaluation):

Let's see how to do step 2 the fast way. The polynomial `A(x)` can be split into even and odd terms (for `d` even):
`A(x) = (c0 + c2x2 +…+ cd/2xd/2) + (c1x + c3x3 +…+ cd/2-1xd/2-1)`
We can write this as:
`A(x) = Aeven(x) + Aodd(x)`
where

`Aeven(x) = c0 + c2x2 +…+ cd/2xd/2`
`Aodd(x) = c1x + c3x3 +…+ cd/2-1xd/2-1`

(Note that our `Aeven(x)` and Aodd(x) are different from the textbook's `Ae(x)` and `Ao(x)`. Specifically, `Ae(x2) = Aeven(x)` and `xAo(x2) = Aodd(x)`.)

Next, we compute `A(xi) = Aeven(x) + Aodd(x)`
and `A(-xi) = Aeven(-x) + Aodd(-x) = Aeven(x) - Aodd(x)`

Recall that we chose the `d+1` points `x0, x1,…,xd` at which we want to evaluate `A(x)`, where

`xd/2+i = -xi`.

So, we do step 2 by evaluating `Aeven(xi)` and `Aodd(xi)` for `x0, s1,…, xd/2`. After this, we achieve evaluations for `xd/2+i`free:`

``` Aeven(xd/2+i) = Aeven(-xi) = Aeven(xi) Aodd(xd/2+i) = Aodd(-xi) = -Aeven(xi) Then, we have d+1 evaluations of A(x), since A(x) = Aeven(x) + Aeven(x) Evaluating the d+1 terms of A() at all d+1 points would have required evaluating (d+1) (d+1) terms. By doing it this alternative way, we only need to evaluate about (d+1) (1+d/2) terms. Do likewise for another polynomial B(). Step 3 (Multiplication): Multiply at the d+1 points xi. C(xi) = A(xi) B(xi) When we are done, we have computed constants yi for which: ```
``` ```

`C(xi) = yi`

This requires `d+1` multiplications. Our goal is to find a fast multiplication in time `O(d log d)`. This step is `O(d)` and so it's already fast enough.

## Step 4 (Interpolation):

In general, interpolation requires a lot of steps. Given the unknown polynomial `C(x)` for which we only know:
`C(x0) = y0,…, C(xd) = yd`
we wish to find `c0,…, cd` such that
`C(x) = c0 + c1x + c2x2 +…+ cdxd`

As you would expect, one plugs in and solves. The `ci` are the unknowns that we have to solve for. After plugging into `C(x) = y` for ```x0, x1,…, xd```, we have `d+1` linear equations:

`c0 + c1(x0) + c2(x0)2 +…+ cd(x0)d = y0`
`c0 + c1(x1) + c2(x1)2 +…+ cd(x1)d = y1`
...

This means solving the linear simultaneous equations.

The advantage if we use `x1,…, xd/2` and and `x-1,…, x-d/2` is that we can replace the equations to solve:
`C(xi)=yi` and `C(-xi)=y-i`
by:
`C(xi) + C(-xi) = yi + y-i`
`C(xi) - C(-xi) = yi - y-i`
These last two equations are the same as:

`2 Ceven(xi) = yi + y-i`
`2 Codd(xi) = yi + y-i`

Now, we again plug in, but this time we have only half as many unknowns, ci, in each equation:
`2c0 + 2c2(xi)2 +…+ 2cd-1(xi)d-1 = yi + y-i`
`2c1x + 2c3(xi)3 +…+ 2cd(xi)d = yi - y-i`

So, we still have `d+1` linear equations to solve, but each linear equation now has only half as many terms.

# Generalizing to four polynomials instead of Aeven() and Aodd():

We have seen that the polynomial `Aeven(x)` has only half as many terms. So, we win by breaking up `A(x) = Aodd(x) + Aodd(x)`. How can we extend this further?

The answer is to go to complex numbers. Recall that the imaginary number `i` satisfies `i2 = -1`. Some simple algebra then shows that `i3 = -i` and `i4 = 1`. (Note that we will also continue to use `i` for the subscript index. It will be clear from context whether we mean `i` the imaginary number, or `i` the subscript index.)

So, instead of using `A(x)` and `A(-x)` for `A()` and replacing them by `A(x) + A(-x)` and `A(x) - A(-x)` for `A()`, let's use `P(x), P(ix), P(i2x), P(i3x)` (or the equivalent `P(x), P(ix), P(-x), P(-ix)`) for `P()`. Let's solve for some polynomials below. From these, we won't construct `2Peven(x) = P(x) + P(-x)` and `2Podd(x) = P(x) - P(-x)`, but four polynomials based on four linear combinations. We'll guess the right polynomials below.

`P(x)=1` `P(x)=x` `P(x)=x2` `P(x)=x3` `P(x)=x4` Short Name for Expression
`P(x) + P(ix) + P(-x) + P(-ix)`: 4 0 0 0 4x4 4P0 mod 4(x)
`P(x) - P(ix) + P(-x) - P(-ix)`: 0 0 4x2 0 0 4P2 mod 4(x)
`P(x) + P(ix) - P(-x) - P(-ix)`: 0 (2+2i)x 0 (2-2i)x3 0
`P(x) - P(ix) - P(-x) + P(-ix)`: 0 (2-2i)x 0 (2+2i)x3 0
`P(x) + iP(ix) - P(-x) - iP(-ix)`: 0 0 0 4x3 0 4P3 mod 4(x)
`P(x) - iP(ix) - P(-x) + iP(-ix)`: 0 4x 0 0 0 4P1 mod 4(x)

The good polynomials are:

`4P0 mod 4(x) = P(x) + P(ix) + P(-x) + P(-ix)`
`4P1 mod 4(x) = P(x) - iP(ix) - P(-x) + iP(-ix)`
`4P2 mod 4(x) = P(x) - P(ix) + P(-x) - P(-ix)`
`4P3 mod 4(x) = P(x) + iP(ix) - P(-x) - iP(-ix)`

The subscript names were chosen to indicate which terms are non-zero. For example `P1 mod 4()` has terms identical to `P()` for those terms of degree `1 mod 4`. The other terms of `P1 mod 4()` are all zero. These are the polynomials for which 3/4 of the terms are zero. So, these polynomials can be evaluated in 1/4 of the time.

Note that

`P(x) = P0 mod 4(x) + P1 mod 4(x) + P2 mod 4(x) + P3 mod 4(x)`

If we're going to make this work, we need an efficient way to convert between P(x), P(ix), P(-x), P(-ix) and the four polynomials above. This is the essence of the Fast Fourier Transform in the text. It allows you to convert from from an ordinary polynomial `P()` into the four transformed polynomials above. (This is analogous to the previous case in which we transformed into `Aeven()` and `Aodd()`.) Using the transformed polynomials, we can evaluate (Step 2) fast. Step 3 is always fast. And we can interpolate (Step 4) fast.

### Step 1 (Select Points):

We choose `d+1` points `x0, x1,…,xd` at which we want to evaluate `A(x)`, where

`xd/4+i = i xi`   for `1≤i<d/2`
`x2d/4+i = - xi` for `1≤i<d/2`
`x3d/4+i = -i xi` for `1≤i<d/2`

Note that the coefficients `i`, -1, and -i are the powers of the imaginary number `i`.

### Step 2 (Evaluation):

We can write this as:
`P(x) = P0 mod 4(x) + P1 mod 4(x) + P1 mod 4(x) + P2 mod 4(x) + P3 mod 4(x)`
where

`P0 mod 4(x) = c0 + c4x4 + …`
`P1 mod 4(x) = c1x + c5x5 + …`
...

So, we do step 2 by evaluating `P0 mod 4(xi)`, `P1 mod 4(xi)`, `P2 mod 4(xi)` and `P3 mod 4(xi)`, for `x0, s1,…, xd/4`. After this, we achieve evaluations of `P0 mod 4(xj)` with `j>d/4` for free:

```P0 mod 4(xd/4+i) = P0 mod 4(ixi) = P0 mod 4(xi)```
```P0 mod 4(x2d/4+i) = P0 mod 4(-xi) = P0 mod 4(xi)```
```P0 mod 4(x3d/4+i) = P0 mod 4(-ixi) = P0 mod 4(xi)```

and evaluations of `P1 mod 4(xj)` with `j>d/4` for free:

```P1 mod 4(xd/4+i) = P1 mod 4(ixi) = i P1 mod 4(xi)```
```P1 mod 4(x2d/4+i) = P1 mod 4(-xi) = - P1 mod 4(xi)```
```P1 mod 4(x3d/4+i) = P1 mod 4(-ixi) = -i P1 mod 4(xi)```

and evaluations of `P2 mod 4(xj)` with `j>d/4` for free:

```P2 mod 4(xd/4+i) = P2 mod 4(ixi) = - P2 mod 4(xi)```
```P2 mod 4(x2d/4+i) = P2 mod 4(-xi) = P2 mod 4(xi)```
```P2 mod 4(x3d/4+i) = P2 mod 4(-ixi) = - P2 mod 4(xi)```

and evaluations of `P3 mod 4(xj)` with `j>d/4` for free:

```P3 mod 4(xd/4+i) = P3 mod 4(ixi) = -i P3 mod 4(xi)```
```P3 mod 4(x2d/4+i) = P3 mod 4(-xi) = - P3 mod 4(xi)```
```P3 mod 4(x3d/4+i) = P3 mod 4(-ixi) = i P3 mod 4(xi)```

Then `P(x)` is computed as:

`P(x) = P0 mod 4(x) + P1 mod 4(x) + P1 mod 4(x) + P2 mod 4(x) + P3 mod 4(x)`

`P0 mod 4()`, `P1 mod 4()`, `P2 mod 4()` and `P3 mod 4()` can each be evaluated fast, since each has only 1/4 as many terms. Each of the four functions is applied to each of `x0, x1,…, xd/4`. So, this requires an evaluation of `(d+1) (d/4)` terms.

Do likewise for another polynomial, `Q(x)`.

### Step 3 (Multiplication):

Multiply at the `d+1` points `xi`.

`R(xi) = P(xi) Q(xi)`

When we are done, we have computed constants `yi` for which:

`R(xi) = yi`

This requires `d+1` multiplications. Our goal is to find a fast multiplication in time `O(d log d)`. This step is `O(d)` and so it's already fast enough.

### Step 4 (Interpolation):

Recall that with the interpolation phase of `Aeven()` and `Aodd()`, we needed to determine the unknown coefficients `c0, c1,…,cd` for the polynomial `C(x)`, given that:

`C(xi) = yi`

We again need to do again find unknown coefficients `c0, c1,…,cd`, but now for:

`R(xi) = yi`

We use the good polynomials found from the table at the beginning of this subsection:

`4R0 mod 4(x) = R(x) + R(ix) + R(-x) + R(-ix)`
`4R1 mod 4(x) = R(x) - iR(ix) - R(-x) + iR(-ix)`
`4R2 mod 4(x) = R(x) - R(ix) + R(-x) - R(-ix)`
`4R3 mod 4(x) = R(x) + iR(ix) - R(-x) - iR(-ix)`

We expand:

 `4R0 mod 4(xi)` `= R(xi) + R(ixi) + R(-xi) + R(-ixi)` ```= R(xi) + R(xd/4+i) + R(x2d/4+i) + R(x3d/4+i)``` `= yi + yd/4+i + y2d/4+i + y3d/4+i` `4R1 mod 4(x)` `= R(xi) - iR(ixi) - R(-xi) + iR(-ixi)` `= yi - iyd/4+i - y2d/4+i + iy3d/4+i` `4R2 mod 4(xi)` = R(xi) - R(ixi) + R(-xi) - R(-ixi) `= yi - yd/4+i + y2d/4+i - y3d/4+i` `4R3 mod 4(xi)` `= R(xi) + iR(ixi) - R(-xi) - iR(-ixi)` `= yi + iyd/4+i - y2d/4+i - iy3d/4+i`

So, we have to find the unknowns, `ci`:

```c0 + c4xi4 + … = yi + yd/4+i + y2d/4+i + y3d/4+i```
```c1xi + c5xi5 + … = yi - iyd/4+i - y2d/4+i + iy3d/4+i```
```c2xi + c6xi6 + … = yi - yd/4+i + y2d/4+i - y3d/4+i```
```c3xi + c7xi7 + … = yi + iyd/4+i - y2d/4+i - iy3d/4+i```

But when we do this step, each equation already has 3/4 of the coefficients set to zero, and so each of the `d+1` linear equations will have only 1/4 as many terms.

Furthermore, there are savings in computing the sums of the `yi` on the right hand side, since many of the partial sums are repeated. In the general case, there will be only `O(d log d)` sums of the `yi`.

## Generalizing Further:

From the previous two subsections (case of `Aeven(), Aodd()`, and case of ```P0 mod 4(), P1 mod 4(), P2 mod 4(), P3 mod 4()```), we can derive the general pattern.

The textbook does this in a somewhat more abstract (and more compact) manner, by using recursion. Now read the general case there. In converting between our notation and theirs, it's important to recognize that the textbook uses `Ae()` and `Ao()`, where:

`Aeven(x) = Ae(x2)`, and
`Aodd(x) = xAo(x2)`.

In general, if the polynomial is of degree `d = n-1` for some `n = 2k`, then we let ω = e2πin/k and choose `xi = ωi`:

`x0 = 1`
`x1 = ω = e2πi(n/k)`
`x2 = ω2 = e2πi(2n/k)`
`x3 = ω3 = e2πi(3n/k)`
...

For further reading about fast multiplication (both for integers and polynomials), try http://en.wikipedia.org/wiki/Multiplication_algorithm.