Notes by Gene Cooperman, © 2009 (may be freely copied as long as this copyright notice remains)
We all know how to multiply polynomials by multiplying all the cross
terms. Given a polynomial P
of degree d
(the largest power is
d
), and a polynomial Q
of degree e
(the largest power is e
),
the product, P Q
, will have degree d e
. To compute the product
P Q
, there will be up to (d+1) (e+1)
cross terms to multiply.
In general, if two polynomials are both of degree d
, then there
will be (d+1)2 = O(d2)
cross terms.
But using the idea of the fast Fourier transform, one can actually reduce
the number of multiplications from O(d2)
to O(d log d)
.
multiply the polynomials in O(d log d)
time. Let A(x)
and B(x)
be polynomials of degree d
. For example, if d=3
, we
might have A(x) = 4 + 3x + 5x2 + 6x3
.
In outline, the idea for polynomial multiplication for C(x) = A(x) B(x)
of degree d
is:
d
points x0,…, xd/2
and x-1,…, x-d/2
.A(x0),…,A(xd/2
and
A(-x1),…,A(-xd/2)
; Do the same for B(x)
C(x)=A(x)B(x)
at each of the points. C(xi) = A(xi) B(xi)
and C(x-i) = A(x-i) B(x-i)
.
Let yi
be the constants such that
C(x-i) = yi
and C(x-i)
= y-i
C(x)
using C(x0) = y0
,
and C(x1) = y1,…,C(xd/2)
= yd/2
and C(x-1) = y-1,…,C(x-d/2)
= y-d/2
.We choose d+1
points
x0, x1,…,xd
at which we want to evaluate A(x)
, where
xd/2+i = -xi
.
Let's see how to do step 2 the fast way. The polynomial A(x)
can be split
into even and odd terms (for d
even):
A(x) = (c0 + c2x2 +…+ cd/2xd/2) + (c1x + c3x3 +…+ cd/2-1xd/2-1)
We can write this as:
A(x) = Aeven(x) + Aodd(x)
where
Aeven(x) = c0 + c2x2 +…+ cd/2xd/2
Aodd(x) = c1x + c3x3 +…+ cd/2-1xd/2-1
(Note that our Aeven(x)
and Aodd(x) are different
from the textbook's Ae(x)
and Ao(x)
. Specifically,
Ae(x2) = Aeven(x)
and
xAo(x2) = Aodd(x)
.)
Next, we compute A(xi) = Aeven(x) + Aodd(x)
and A(-xi) = Aeven(-x) + Aodd(-x) = Aeven(x) - Aodd(x)
Recall that we chose the d+1
points
x0, x1,…,xd
at which we want to evaluate A(x)
, where
xd/2+i = -xi
.
So, we do step 2 by evaluating
Aeven(xi)
and Aodd(xi)
for x0, s1,…, xd/2
.
After this, we achieve evaluations for xd/2+i`free:
Aeven(xd/2+i) = Aeven(-xi) = Aeven(xi)
Aodd(xd/2+i) = Aodd(-xi) = -Aeven(xi)
Then, we have d+1
evaluations of A(x)
,
since A(x) = Aeven(x) + Aeven(x)
Evaluating the d+1
terms of A()
at all d+1
points would have
required evaluating
(d+1) (d+1)
terms. By doing it this alternative way, we only need
to evaluate about (d+1) (1+d/2)
terms.
Do likewise for another polynomial B()
.
Multiply at the d+1
points xi
.
C(xi) = A(xi) B(xi)
When we are done, we have computed constants yi
for which:
C(xi) = yi
This requires d+1
multiplications. Our goal is to
find a fast multiplication in time O(d log d)
. This step is O(d)
and so it's already fast enough.
In general, interpolation requires a lot of steps. Given the unknown
polynomial C(x)
for which we only know:
C(x0) = y0,…, C(xd) = yd
we wish to find c0,…, cd
such that
C(x) = c0 + c1x + c2x2 +…+ cdxd
As you would expect, one plugs in and solves.
The ci
are the unknowns that we have to solve for.
After plugging into C(x) = y
for x0, x1,…,
xd
,
we have d+1
linear equations:
c0 + c1(x0) + c2(x0)2 +…+ cd(x0)d = y0
c0 + c1(x1) + c2(x1)2 +…+ cd(x1)d = y1
...
This means solving the linear simultaneous equations.
The advantage if we use
x1,…, xd/2
and
and x-1,…, x-d/2
is that we can replace
the equations to solve:
C(xi)=yi
and C(-xi)=y-i
by:
C(xi) + C(-xi) = yi + y-i
C(xi) - C(-xi) = yi - y-i
These last two equations are the same as:
2 Ceven(xi) = yi + y-i
2 Codd(xi) = yi + y-i
Now, we again plug in, but this time we have only half as many unknowns,
ci, in each equation:
2c0 + 2c2(xi)2 +…+ 2cd-1(xi)d-1 = yi + y-i
2c1x + 2c3(xi)3 +…+ 2cd(xi)d = yi - y-i
So, we still have d+1
linear equations to solve, but
each linear equation now has only half as many terms.
We have seen that the polynomial Aeven(x)
has only half
as many terms. So, we win by breaking up
A(x) = Aodd(x) + Aodd(x)
.
How can we extend this further?
The answer is to go to complex numbers. Recall that the imaginary
number i
satisfies i2 = -1
. Some simple algebra then
shows that i3 = -i
and i4 = 1
.
(Note that we will also continue to use i
for the subscript index.
It will be clear from context whether we mean i
the imaginary number,
or i
the subscript index.)
So, instead of using
A(x)
and A(-x)
for A()
and replacing them by A(x) + A(-x)
and
A(x) - A(-x)
for A()
,
let's use P(x), P(ix), P(i2x), P(i3x)
(or the equivalent P(x), P(ix), P(-x), P(-ix)
) for P()
. Let's solve for
some polynomials below.
From these, we won't construct 2Peven(x) = P(x) + P(-x)
and 2Podd(x) = P(x) - P(-x)
,
but four polynomials based on four linear combinations.
We'll guess the right polynomials below.
P(x)=1 |
P(x)=x |
P(x)=x2 |
P(x)=x3 |
P(x)=x4 |
Short Name for Expression | |
---|---|---|---|---|---|---|
P(x) + P(ix) + P(-x) + P(-ix) : |
4 | 0 | 0 | 0 | 4x4 | 4P0 mod 4(x) |
P(x) - P(ix) + P(-x) - P(-ix) : |
0 | 0 | 4x2 | 0 | 0 | 4P2 mod 4(x) |
P(x) + P(ix) - P(-x) - P(-ix) : |
0 | (2+2i)x | 0 | (2-2i)x3 | 0 | |
P(x) - P(ix) - P(-x) + P(-ix) : |
0 | (2-2i)x | 0 | (2+2i)x3 | 0 | |
P(x) + iP(ix) - P(-x) - iP(-ix) : |
0 | 0 | 0 | 4x3 | 0 | 4P3 mod 4(x) |
P(x) - iP(ix) - P(-x) + iP(-ix) : |
0 | 4x | 0 | 0 | 0 | 4P1 mod 4(x) |
The good polynomials are:
4P0 mod 4(x) = P(x) + P(ix) + P(-x) + P(-ix)
4P1 mod 4(x) = P(x) - iP(ix) - P(-x) + iP(-ix)
4P2 mod 4(x) = P(x) - P(ix) + P(-x) - P(-ix)
4P3 mod 4(x) = P(x) + iP(ix) - P(-x) - iP(-ix)
The subscript names were chosen to indicate which terms are non-zero.
For example P1 mod 4()
has terms identical to P()
for those terms of degree 1 mod 4
.
The other terms of P1 mod 4()
are all zero.
These are the polynomials for which 3/4 of the terms are zero. So, these
polynomials can be evaluated in 1/4 of the time.
Note that
P(x) = P0 mod 4(x) + P1 mod 4(x) + P2 mod 4(x) + P3 mod 4(x)
If we're going to make this work, we need an efficient way to convert
between P(x), P(ix), P(-x), P(-ix) and the four polynomials above.
This is the essence of the Fast Fourier Transform in the text.
It allows you to convert from from an ordinary polynomial P()
into the four transformed polynomials above. (This is analogous
to the previous case in which we transformed into Aeven()
and Aodd()
.) Using the transformed
polynomials, we can evaluate (Step 2) fast. Step 3 is
always fast. And we can interpolate (Step 4) fast.
We choose d+1
points
x0, x1,…,xd
at which we want to evaluate A(x)
, where
xd/4+i = i xi
for1≤i<d/2
x2d/4+i = - xi
for1≤i<d/2
x3d/4+i = -i xi
for1≤i<d/2
Note that the coefficients i
, -1, and -i are the powers of the imaginary
number i
.
We can write this as:
P(x) = P0 mod 4(x) + P1 mod 4(x) + P1 mod 4(x) + P2 mod 4(x) + P3 mod 4(x)
where
P0 mod 4(x) = c0 + c4x4 + …
P1 mod 4(x) = c1x + c5x5 + …
...
So, we do step 2 by evaluating
P0 mod 4(xi)
, P1 mod 4(xi)
,
P2 mod 4(xi)
and P3 mod 4(xi)
,
for x0, s1,…, xd/4
.
After this, we achieve evaluations of P0 mod 4(xj)
with j>d/4
for free:
P0 mod 4(xd/4+i) = P0 mod 4(ixi) = P0 mod 4(xi)
P0 mod 4(x2d/4+i) = P0 mod 4(-xi) = P0 mod 4(xi)
P0 mod 4(x3d/4+i) = P0 mod 4(-ixi) = P0 mod 4(xi)
and evaluations of P1 mod 4(xj)
with j>d/4
for free:
P1 mod 4(xd/4+i) = P1 mod 4(ixi) = i P1 mod 4(xi)
P1 mod 4(x2d/4+i) = P1 mod 4(-xi) = - P1 mod 4(xi)
P1 mod 4(x3d/4+i) = P1 mod 4(-ixi) = -i P1 mod 4(xi)
and evaluations of P2 mod 4(xj)
with j>d/4
for free:
P2 mod 4(xd/4+i) = P2 mod 4(ixi) = - P2 mod 4(xi)
P2 mod 4(x2d/4+i) = P2 mod 4(-xi) = P2 mod 4(xi)
P2 mod 4(x3d/4+i) = P2 mod 4(-ixi) = - P2 mod 4(xi)
and evaluations of P3 mod 4(xj)
with j>d/4
for free:
P3 mod 4(xd/4+i) = P3 mod 4(ixi) = -i P3 mod 4(xi)
P3 mod 4(x2d/4+i) = P3 mod 4(-xi) = - P3 mod 4(xi)
P3 mod 4(x3d/4+i) = P3 mod 4(-ixi) = i P3 mod 4(xi)
Then P(x)
is computed as:
P(x) = P0 mod 4(x) + P1 mod 4(x) + P1 mod 4(x) + P2 mod 4(x) + P3 mod 4(x)
P0 mod 4()
, P1 mod 4()
, P2 mod 4()
and P3 mod 4()
can each be evaluated fast, since each has only 1/4 as many terms.
Each of the four functions is applied to each of x0, x1,…, xd/4
.
So, this requires an evaluation of (d+1) (d/4)
terms.
Do likewise for another polynomial, Q(x)
.
Multiply at the d+1
points xi
.
R(xi) = P(xi) Q(xi)
When we are done, we have computed constants yi
for which:
R(xi) = yi
This requires d+1
multiplications. Our goal is to
find a fast multiplication in time O(d log d)
. This step is O(d)
and so it's already fast enough.
Recall that with the interpolation phase of
Aeven()
and Aodd()
, we needed to determine
the unknown coefficients c0, c1,…,cd
for the polynomial C(x)
, given that:
C(xi) = yi
We again need to do again find unknown coefficients
c0, c1,…,cd
, but now for:
R(xi) = yi
We use the good polynomials found from the table at the beginning of this subsection:
4R0 mod 4(x) = R(x) + R(ix) + R(-x) + R(-ix)
4R1 mod 4(x) = R(x) - iR(ix) - R(-x) + iR(-ix)
4R2 mod 4(x) = R(x) - R(ix) + R(-x) - R(-ix)
4R3 mod 4(x) = R(x) + iR(ix) - R(-x) - iR(-ix)
We expand:
4R0 mod 4(xi)
= R(xi) + R(ixi) + R(-xi) + R(-ixi)
= R(xi) + R(xd/4+i) + R(x2d/4+i) + R(x3d/4+i)
= yi + yd/4+i + y2d/4+i + y3d/4+i
4R1 mod 4(x)
= R(xi) - iR(ixi) - R(-xi) + iR(-ixi)
= yi - iyd/4+i - y2d/4+i + iy3d/4+i
4R2 mod 4(xi)
= R(xi) - R(ixi) + R(-xi) - R(-ixi) = yi - yd/4+i + y2d/4+i - y3d/4+i
4R3 mod 4(xi)
= R(xi) + iR(ixi) - R(-xi) - iR(-ixi)
= yi + iyd/4+i - y2d/4+i - iy3d/4+i
So, we have to find the unknowns, ci
:
c0 + c4xi4 + … = yi + yd/4+i + y2d/4+i + y3d/4+i
c1xi + c5xi5 + … = yi - iyd/4+i - y2d/4+i + iy3d/4+i
c2xi + c6xi6 + … = yi - yd/4+i + y2d/4+i - y3d/4+i
c3xi + c7xi7 + … = yi + iyd/4+i - y2d/4+i - iy3d/4+i
But when we do this step, each equation already has 3/4 of the
coefficients set to zero, and so each of the d+1
linear equations will
have only 1/4 as many terms.
Furthermore, there are savings in computing the sums of the yi
on the right hand side, since many of the partial sums are repeated.
In the general case, there will be only O(d log d)
sums of the
yi
.
From the previous two subsections (case of Aeven(), Aodd()
,
and case of P0 mod 4(), P1 mod 4(),
P2 mod 4(), P3 mod 4()
), we can derive
the general pattern.
The textbook does this in a somewhat more abstract (and more compact)
manner, by using recursion. Now read the general case there.
In converting between our notation and theirs, it's important to
recognize that the textbook uses Ae()
and Ao()
,
where:
Aeven(x) = Ae(x2)
, and
Aodd(x) = xAo(x2)
.
In general, if the polynomial is of degree d = n-1
for some
n = 2k
, then we let ω = e2πin/k and choose
xi = ωi
:
x0 = 1
x1 = ω = e2πi(n/k)
x2 = ω2 = e2πi(2n/k)
x3 = ω3 = e2πi(3n/k)
...
For further reading about fast multiplication (both for integers and polynomials), try http://en.wikipedia.org/wiki/Multiplication_algorithm.