Notes by Gene Cooperman, © 2009 (may be freely copied as long as this copyright notice remains)
(An alternative model of computation from the one above might be used for very large integers. In that case, we might decide that accessing one digit or a sum, product, etc., of two digits is to be the unit of work. This model is better if we are thinking about integers that are much larger than what fits in the primitive types of our computer language.)
Note that a fast CPU will execute a unit of work faster than a slow CPU.
We measure time in units of work instead of CPU-seconds, so as to make
the complexity
analysis independent of the type of CPU. Because complexities will usually
be quoted using O()
notation anyway, any constant factor of speedup by a
particular CPU can be ignored. (See definition of O()
below.)
The size of the input is usually the number of units of work required
to access or copy the full input. For example, if the input
consists of an array with n
entries, we'll usually say that the
input is of size n
.
We want to know the time complexity of an algorithm (number of units
of work consumed by the pseudo-code of the algorithm) in order to
produce the output. For an input of size n
, we say that
the time of the algorithm, T(n)
, is some function of n
.
Since two people may have slightly different numbers depending
on what they consider to be a memory access, comparison, etc.
So, we use O()
notation, T(n) = O(…)
, so that
those different constants won't matter.
Pseudo-code should be as short as possible and as close to plain English as possible, while still maintaining precision. Pseudo-code is precise if, after any two competent programmers convert the pseudo-code to an implementation, the two implementations would be substantially equivalent.
If lim n→ ∞( g(x) / f(x) )
is equal
to some positive constant, then we say that
g(x) = O(f(x))
.
Hence, cf(x)
(for some constant c
)
is an upper bound for g(x)
in the limit of large n
.
Note that there are many possible g(x)
that have cf(x)
as
an upper bound. Hence, O(f(x))
is not a single function,
but really a family of functions.
Other notations, such as Ω() (lower bound) and Θ() (upper and lower bound) also exist, but we will not define them here. O() is especially common, since computer scientists are most interested in an upper bound on the running time.
A recurrence relation is an an equation for the time consumed by a recursive algorithm. (See example below.)
The textbook recommends the "Master Theorem". However, the
Master Theorem only handles certain cases, and it produces a
O()
complexity formula that hides the constants. A "Guess and Check"
method will show the constants, and it will allow you to produce answers
where the Master Theorem doesn't apply.
Let T(n)
be the time (number of units of work) for pseudo-code to
produce an answer. Consider the following example recurrence relation.
T(n) = 3T(n/2)+2n
(This can be interpreted as follow: An algorithm is defined
recursively. The algorithm with input of size n
recursively divides a problem into 3 subproblems, each with
input of size n/2
. The algorithm requires up to 2n
units
of work to split the problem into subproblems and up to 2n
units
of work to combine the answers from the subproblems.
Let us guess T(n) = c nk
for constants c
and k
.
If our guess is correct, we will solve for the constants. Since we
only care about complexity in the case of large n
, let's solve
the recurrence relation in that limit. If we take
lim n→ ∞
of the left hand side (l.h.s.)
and right hand side (r.h.s.) of the recurrence relation, we only get
infinity = infinity.
For the example recurrence relation, we first divide both sides by the
r.h.s. So, we will take the limit of the ratio of the l.h.s and
the r.h.s. The recurrence relation now looks as follows.
lim n→ ∞ l.h.s. / r.h.s. = 1
Let's try our guess, T(n) = c nk
.
lim n→ ∞ T(n) / (3T(n/2)+2n)
= cnk / (3c(n/2)k +2n)
Taking a limit now would yield infinity / infinity.
So, we divide top and bottom by cnk
.
lim n→ ∞ cnk / (3(cn/2)k +2n)
= lim n→ ∞ 1 / (3/2k + 2n/(cnk))
We want the limit to equal 1. If k<1
, then the limit equal zero.
So, we have to try k=1
or k>1
. For k=1
, the limit is less than one.
So, we need k>1
. Now, in the limit, 2n/(cnk)
goes to zero.
So, for 'k>1', the limit is: 1 / (3/2k)
.
We want the limit to equal 1
. So,
1 / (3/2k) = 1
2k = 3
So, k=log23
and T(n) = nlog23
.
In this case, the constant c
is arbitrary, and we would compute
it (if we cared) only by examining a base case, or by specializing
to T(n) = cnlog23
, and subtracting an
equal dominant term from left and right hand side. In any case,
T(n) = O( nlog23 ) ≈ O( n1.584 )
Note: Sometimes you will find that the constant k
cancels out,
and you can't solve for it. In this case, you will usually find that
the l.h.s and the r.h.s of the recurrence relation have a common
dominant term. Subtract the dominant term from the l.h.s. and r.h.s.,
and then take the limit of the ratio of the remaining l.h.s and r.h.s.