Cache Basics
by Gene Cooperman
Copyright (c) 2003, Gene Cooperman;
Rights to copy for non-commercial purposes are freely granted as long
as this copyright notice remains.
Basic Concepts
The cache holds data destined for RAM, but it is faster than RAM.
For simplicity, we assume a single cache (L1 only) below.
We begin by describing a direct-mapped cache (1-way set associative).
A cache is divided into cache blocks (also known as cache
lines). To fully specify a cache, you should specify
- the size of the cache,
- the size of a cache block,
- the set associativity,
- the cache write policy (write-through vs. write-back),
- and the cache set replacement policy (if the cache is set associative).
We will say that the n-th block of a cache is at index n.
(Some texts call this the group number or the set number.)
A CPU address can be decoded into the fields used by a cache.
The fields of a cache are (from highest order bits to lowest):
- tag
- index
- offset (within a cache block)
A cache address
can be specified simply by index and offset.
The tag is kept to allow the cache to translate from a cache
address (tag, index, and offset) to a unique CPU address.
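The decoding of a CPU address into these fields can be sketched in a few lines of Python. This is only an illustration; the field widths below (6 offset bits, 5 index bits) are hypothetical parameters, not those of any particular CPU.

```python
# Decode a 32-bit CPU address into (tag, index, offset) fields.
# Hypothetical widths: 2^6 = 64-byte cache blocks, 2^5 = 32 indices.

OFFSET_BITS = 6   # log2(cache block size in bytes)
INDEX_BITS = 5    # log2(number of indices)

def decode(address):
    offset = address & ((1 << OFFSET_BITS) - 1)
    index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return (tag, index, offset)

tag, index, offset = decode(0x12345678)   # → (149130, 25, 56)
```

Note that the three fields together recover the unique CPU address: shifting the tag and index back into place and adding the offset reassembles the original 32 bits.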
A cache hit means that the CPU tried to access an address, and
a matching cache block (index, offset, and matching tag) was available in
cache. So, the cache did not need to access RAM. In a cache miss,
the CPU tries to access an address, and there is no matching cache block.
So, the cache is forced to access RAM.
Example
The original Pentium 4 had a 4-way set associative L1 data cache of size
8 KB with 64 byte cache blocks. Hence, there are 8KB/64
= 128 cache blocks. If it's 4-way set associative, this
implies 128/4=32 sets (and hence 2^5 = 32 different indices).
There are 64=2^6 possible offsets.
Since the CPU address is
32 bits, this implies 32=21+5+6, and hence 21 bits of tag field.
The original Pentium 4 also had an 8-way set associative L2 integrated cache
of size 256 KB with 128 byte cache blocks. This implies
32=17+8+7, and hence 17 bits of tag field.
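The arithmetic in both examples can be checked mechanically. A small sketch, using only the figures given in the text:

```python
from math import log2

def field_widths(cache_size, block_size, ways, address_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a set-associative cache."""
    blocks = cache_size // block_size   # number of cache blocks
    sets = blocks // ways               # number of sets (indices)
    offset_bits = int(log2(block_size))
    index_bits = int(log2(sets))
    tag_bits = address_bits - index_bits - offset_bits
    return (tag_bits, index_bits, offset_bits)

# Pentium 4 L1 data cache: 8 KB, 64-byte blocks, 4-way
print(field_widths(8 * 1024, 64, 4))      # → (21, 5, 6)
# Pentium 4 L2 cache: 256 KB, 128-byte blocks, 8-way
print(field_widths(256 * 1024, 128, 8))   # → (17, 8, 7)
```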
State Transitions (write-back, write-allocate, direct-mapped cache)
Every cache block has associated with it at least the Modify and
Valid bits, and
a tag address. The Valid bit says if the cache block is used (has
valid data) or is unused. The Modify bit makes sense only if the Valid
bit is set. The Modify bit says whether the data in the cache block
is different from RAM (modified) or is the same as RAM.
A tag
is said to be a matching tag if, after decoding a CPU address
from a pending read or write, the tag field of the CPU address
matches the tag associated with a cache block at the cache address
given by (index, offset).
Note that an optimized cache will take shortcuts, as compared to the
state transitions below. But it will result in the same final state.
- V = 0, read/write pending:
  Read from RAM into cache block;
  Set Tag according to RAM address; Set V = 1; Set M = 0;
  Go to "V = 1, M = 0, tag matching, read or write pending"
- V = 1, M = 0, tag matching, read pending:
  Read from portion of cache block to CPU; Done
- V = 1, M = 0, tag matching, write pending:
  Write from CPU to portion of cache block; Set M = 1; Done
- V = 1, M = 0, tag not matching, read/write pending:
  Set V = 0; Go to "V = 0, read/write pending"
- V = 1, M = 1, tag matching, read pending:
  Read from portion of cache block to CPU; Done
- V = 1, M = 1, tag matching, write pending:
  Write from CPU to portion of cache block; Done
- V = 1, M = 1, tag not matching, read/write pending:
  Write cache block to RAM;
  Set V = 0; Go to "V = 0, read/write pending"
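The transitions above can be collected into a short simulation of a single cache block of a direct-mapped, write-back, write-allocate cache. This is a minimal sketch for illustration only: RAM is modeled as a dict keyed by tag, and the hypothetical block size of 4 words is an assumption, not from the text.

```python
BLOCK_SIZE = 4   # hypothetical: 4 words per cache block

class CacheBlock:
    def __init__(self):
        self.valid = False      # V bit
        self.modified = False   # M bit
        self.tag = None
        self.data = [0] * BLOCK_SIZE

def access(block, ram, tag, offset, write_value=None):
    """Read (write_value is None) or write one word; return the word accessed."""
    if block.valid and block.tag != tag:
        if block.modified:
            ram[block.tag] = list(block.data)   # Write cache block to RAM
        block.valid = False                     # Set V = 0
    if not block.valid:
        # V = 0: read from RAM into cache block.
        block.data = list(ram.get(tag, [0] * BLOCK_SIZE))
        block.tag = tag                         # Set Tag
        block.valid = True                      # Set V = 1
        block.modified = False                  # Set M = 0
    if write_value is None:
        return block.data[offset]               # Read portion of block to CPU
    block.data[offset] = write_value            # Write from CPU to portion of block
    block.modified = True                       # Set M = 1
    return write_value

ram = {}
blk = CacheBlock()
access(blk, ram, tag=1, offset=0, write_value=42)   # write miss: allocate, then write
```

Note the write-back behavior: after the write above, RAM is still empty; the modified data reaches RAM only when the block is evicted by an access with a non-matching tag.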
Set Associativity
If the cache is n-way set associative, then a cache address (index, offset)
specifies not just one cache block, but a set of n cache blocks.
On a read from or write to cache, if any cache block in the set has a matching
tag, then it is a cache hit, and that cache block is used.
If no cache block in the set has a matching tag, then it is a cache
miss, and one of the cache blocks of the set is chosen for replacement.
Data from that cache block is written back to RAM to make room for
new data exactly as in the case of tag not matching in the state transitions.
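The hit/miss test over a set can be sketched as a scan of the n blocks for a matching tag. This is an illustration only; a real cache compares all tags in the set in parallel, and the example tags are arbitrary.

```python
# A set is modeled as a list of (valid, tag) pairs, one per cache block.
# Return the index of the matching block on a hit, or None on a miss
# (in which case one block of the set must be chosen for replacement).

def lookup(cache_set, tag):
    for i, (valid, block_tag) in enumerate(cache_set):
        if valid and block_tag == tag:
            return i          # cache hit: use this block
    return None               # cache miss: replacement needed

# A 4-way set with one unused (invalid) block:
s = [(True, 0x1A), (True, 0x2B), (False, None), (True, 0x3C)]
```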
Cache Policies
A write-through cache writes a cache block back immediately after
the CPU write to the cache block. A write-back cache writes
the cache block back later. Most caches are write-allocate
caches. This means that if the CPU attempts to write, and the cache
contains no matching cache block, then the cache will first allocate
(load) the matching cache block from RAM. Only after that can the
CPU write to the cache. Most caches today are also non-blocking
or lockup-free. This means that in case of a cache miss on
write, the cache will buffer the CPU request to write to cache, and allow
the CPU to continue.
If the cache is set associative and if a cache miss occurs, then the
cache set replacement policy determines which cache block is
chosen for replacement. A common choice is some approximation of
LRU (Least Recently Used).
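Exact LRU bookkeeping for one set can be sketched as an ordering of the set's block indices from most- to least-recently used. (Hardware typically uses a cheaper approximation; this sketch shows the exact policy.)

```python
# lru_order lists the block indices of one set, most recent first.

def touch(lru_order, block):
    """Mark `block` as most recently used (on a hit or after a refill)."""
    lru_order.remove(block)
    lru_order.insert(0, block)

def victim(lru_order):
    """Block to replace on a miss: the least recently used."""
    return lru_order[-1]

order = [0, 1, 2, 3]   # 4-way set; block 0 most recently used
touch(order, 2)        # block 2 just accessed: moves to the front
```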
Other Concepts
CPUs today often have L1 (level one), L2, and possibly L3 caches.
They are typically organized as inclusive caches. In this
case, the L1 cache holds a fast copy of data from the L2 cache, the L2
cache from the L3, and the L3 cache holds a fast copy of data from RAM.
If the cache sizes are L1: 32 KB, L2: 512 KB, L3: 4 MB,
then the caches hold 4 MB of data from RAM.
In an exclusive cache, no two caches hold the same data from RAM.
If the cache sizes are L1: 32 KB, L2: 512 KB, L3: 4 MB,
then the caches hold 4.53 MB of data from RAM. As of this
writing (2003), the AMD processors use an exclusive cache, while the
Intel processors use an inclusive cache.
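The two capacity figures follow from simple arithmetic on the cache sizes given above:

```python
# Cache sizes in KB, from the example in the text.
l1, l2, l3 = 32, 512, 4 * 1024

inclusive = l3             # L3 already contains copies of the L1 and L2 data
exclusive = l1 + l2 + l3   # no two levels hold the same data

print(inclusive / 1024)    # → 4.0 (MB)
print(exclusive / 1024)    # → 4.53125 (MB), i.e., about 4.53 MB
```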