Yuhao Zhu

A Framework For Understanding Cache Coherence

Aug 12, 2014

The Premise

The correctness of a shared-memory multiprocessor means that each processor (or, from a programmer’s perspective, each process/thread) has a coherent view of memory. That is, all processors agree on the value of any given memory location, so there can never be two “correct” values for an address at the same point in time. Whatever else a cache coherence protocol does, it has to guarantee a consistent memory view across all the processors and memory.
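The “agree upon the value” requirement can be checked mechanically for a single address. Below is a minimal sketch of a hypothetical checker. The name `serialized` and the log format are assumptions for illustration, not part of any real tool. It assumes every write stores a unique value, so each processor's reads of the address can be logged as a sequence of write IDs. It then tests whether all those sequences fit one total order of writes, which is the write serialization property. It only checks a necessary condition on what was observed.

```python
from graphlib import CycleError, TopologicalSorter


def serialized(logs):
    """Check that per-processor observations of ONE address fit a single write order.

    `logs[p]` is the sequence of write IDs that processor p saw when reading
    the address (hypothetical format; assumes every write stores a unique value).
    """
    preds = {}  # write ID -> set of writes that must precede it
    for log in logs:
        order = []  # distinct writes this processor saw, in first-seen order
        for w in log:
            if order and order[-1] == w:
                continue  # re-reading the same value is fine
            if w in order:
                return False  # went back to an older write: no single order fits
            order.append(w)
        for earlier, later in zip(order, order[1:]):
            preds.setdefault(later, set()).add(earlier)
    try:
        # static_order() raises CycleError if no single total order exists.
        tuple(TopologicalSorter(preds).static_order())
    except CycleError:
        return False
    return True


print(serialized([["w0", "w1", "w1", "w2"], ["w0", "w2"], ["w1", "w2"]]))  # True
print(serialized([["w1", "w2"], ["w2", "w1"]]))                      # False
print(serialized([["a", "b"], ["b", "c"], ["c", "a"]]))              # False
```

The third call shows why pairwise comparison is not enough. Each pair of processors agrees on the writes they both saw, yet the three observations together form a cycle that no single write order can explain.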

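To ensure correctness, cache coherence has to guarantee two properties: write propagation (a write by one processor eventually becomes visible to all other processors) and write serialization (all processors observe the writes to a given location in the same order).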

The Framework

Any cache coherence implementation can be described at three levels: protocol, communication, and interconnection. The three levels are orthogonal to each other, so any cache coherence implementation is a particular combination of choices at the three levels.

Protocol: MSI => MESI/MOSI => MOESI

Communication: two flavors

Interconnection: two flavors
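The MSI => MESI/MOSI => MOESI progression can be summarized by what each state allows a cache to do. The sketch below encodes the usual textbook meaning of each state as three flags. The names `Props`, `STATES`, and `added_states` are hypothetical. E adds a clean-but-exclusive state, so a line that was read with no other sharers can later be written without a coherence transaction. O adds a dirty-but-shared state, so a modified line can be shared without first being written back to memory.

```python
from typing import NamedTuple


class Props(NamedTuple):
    valid: bool     # the cache holds usable data
    writable: bool  # may write without a coherence transaction (no other copies)
    dirty: bool     # this cache is responsible for supplying/writing back the data


# Usual textbook meaning of each state (illustrative encoding).
STATES = {
    "M": Props(valid=True, writable=True, dirty=True),     # Modified
    "O": Props(valid=True, writable=False, dirty=True),    # Owned: dirty but shared
    "E": Props(valid=True, writable=True, dirty=False),    # Exclusive: clean, sole copy
    "S": Props(valid=True, writable=False, dirty=False),   # Shared
    "I": Props(valid=False, writable=False, dirty=False),  # Invalid
}


def added_states(base: str, extended: str) -> dict:
    """States an extended protocol adds over a base one, with their properties."""
    return {s: STATES[s] for s in extended if s not in base}


for proto in ("MESI", "MOSI", "MOESI"):
    print(proto, "adds", added_states("MSI", proto))
```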

At the protocol layer: the protocol is mainly meant to guarantee correctness (one can certainly optimize it for better performance, but how to do so depends on the specific choices made at the other two layers). Essentially, it has to ensure the two write properties (write propagation and write serialization).
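As a concrete instance of the protocol layer, here is a minimal sketch of a snooping MSI controller for a single cache line. It assumes a broadcast medium where every request reaches every other cache. It uses the common textbook event names: PrRd/PrWr for processor accesses and BusRd/BusRdX for snooped requests. It models states only, not data, and `MSI`, `access`, and `swmr_ok` are hypothetical names. Invalidating the other copies on a BusRdX is what provides write propagation. The single-writer/multiple-reader check captures the invariant the protocol maintains.

```python
from enum import Enum


class State(Enum):
    M = "Modified"  # only valid copy, dirty; may read and write
    S = "Shared"    # clean, read-only copy; others may also hold it
    I = "Invalid"   # no valid copy


# (current state, event) -> (next state, bus action issued or performed)
MSI = {
    (State.I, "PrRd"): (State.S, "BusRd"),
    (State.I, "PrWr"): (State.M, "BusRdX"),
    (State.S, "PrRd"): (State.S, None),
    (State.S, "PrWr"): (State.M, "BusRdX"),
    (State.M, "PrRd"): (State.M, None),
    (State.M, "PrWr"): (State.M, None),
    (State.I, "BusRd"): (State.I, None),
    (State.I, "BusRdX"): (State.I, None),
    (State.S, "BusRd"): (State.S, None),
    (State.S, "BusRdX"): (State.I, None),     # invalidate: write propagation
    (State.M, "BusRd"): (State.S, "Flush"),   # supply dirty data, downgrade
    (State.M, "BusRdX"): (State.I, "Flush"),  # supply dirty data, invalidate
}


def access(states, cpu, op):
    """Apply processor access `op` ("PrRd"/"PrWr") by `cpu` to one cache line.

    `states` holds one State per cache. Any bus request the access generates
    is snooped by every other cache, as on a shared bus.
    """
    states[cpu], bus_req = MSI[(states[cpu], op)]
    if bus_req is not None:
        for other in range(len(states)):
            if other != cpu:
                states[other], _ = MSI[(states[other], bus_req)]
    return states


def swmr_ok(states):
    """Single-writer/multiple-reader: an M copy excludes every other valid copy."""
    n_m = states.count(State.M)
    n_valid = sum(s is not State.I for s in states)
    return n_m == 0 or (n_m == 1 and n_valid == 1)


states = [State.I] * 3
access(states, 0, "PrRd")  # P0: I -> S
access(states, 1, "PrRd")  # P1: I -> S
access(states, 2, "PrWr")  # P2: I -> M, invalidating P0 and P1
print([s.name for s in states], swmr_ok(states))  # ['I', 'I', 'M'] True
```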

At the communication layer: the goal of the communication layer is to provide the substrate for correctly implementing the protocol, since processors have to somehow talk to each other to achieve a consistent view of memory. The protocol only specifies what each individual processor does; it is the communication layer that turns it into a realistic implementation. Note that the communication scheme is independent of the underlying interconnection: it is a conceptual layer. Of course, some interconnects work best with certain communication schemes.

In summary, the choice between the two communication schemes is really a tradeoff among bandwidth, latency, and ease of ordering control.
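The text does not name the two communication flavors here. Assuming they are broadcast (as in snooping) and point-to-point through a home directory (as in the snoopy and directory case studies below), the tradeoff can be made concrete with a back-of-the-envelope cost model. The sketch counts messages (a bandwidth proxy) and critical-path hops (a latency proxy) for a write miss to a clean line held by `n_sharers` caches. It assumes a simple 3-hop directory protocol in which sharers acknowledge directly to the requester. The function names and the cost model are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    messages: int  # total messages delivered (bandwidth proxy)
    hops: int      # messages on the critical path (latency proxy)


def broadcast_write_miss(n_procs: int, n_sharers: int) -> Cost:
    """Write miss when the request is broadcast to every other cache.

    The request reaches all n_procs - 1 other caches regardless of how many
    actually share the line (n_sharers is kept only for a uniform signature);
    memory then replies with data. On an ordered broadcast medium, sharers
    invalidate on seeing the request, so no acks are needed and ordering is
    provided by the medium itself.
    """
    return Cost(messages=(n_procs - 1) + 1, hops=2)


def directory_write_miss(n_procs: int, n_sharers: int) -> Cost:
    """Write miss under a simple 3-hop directory protocol.

    Requester -> home; home replies with data and sends one invalidation per
    sharer; each sharer acks the requester. The home node is the ordering
    point, and traffic scales with n_sharers rather than n_procs.
    """
    if n_sharers == 0:
        return Cost(messages=2, hops=2)  # request + data reply
    return Cost(messages=2 + 2 * n_sharers, hops=3)


for k in (0, 2, 16):
    print(k, broadcast_write_miss(64, k), directory_write_miss(64, k))
```

With 64 processors and 2 sharers, broadcast costs 64 messages but only 2 hops, while the directory costs 6 messages but 3 hops. The directory saves bandwidth at the price of indirection through the home node.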

At the interconnection layer: the particular communication scheme used by the protocol is strongly tied to the underlying interconnection of the system. It is the interconnection that ultimately decides how coherence is implemented.
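Why the interconnect matters can be seen with a deliberately naive toy, which is not a real protocol. Several processors broadcast racing writes to one address, and each cache simply applies them in the order they arrive. On a totally ordered interconnect such as a bus, every cache sees the same arrival order and ends up with the same value. On an unordered network, arrival orders can differ per cache, so caches can disagree. This is why such systems need an explicit ordering point, for example a directory's home node. The function `final_values` and the random-delay model are hypothetical.

```python
import random


def final_values(writes, n_caches, ordered, rng):
    """Value each cache holds after applying racing writes in arrival order.

    Toy model (hypothetical): every write is broadcast to all caches and a
    cache keeps whichever write it received last. Nothing resolves races.
    """
    if ordered:
        # Totally ordered interconnect (e.g., a bus): arbitration picks one
        # global order and every cache observes exactly that order.
        order = rng.sample(writes, len(writes))
        arrivals = [order] * n_caches
    else:
        # Unordered network: independent per-destination delays, so each
        # cache may receive the racing writes in a different order.
        arrivals = [rng.sample(writes, len(writes)) for _ in range(n_caches)]
    return [arrival[-1] for arrival in arrivals]


rng = random.Random(42)
print("ordered:  ", final_values([10, 20], 4, ordered=True, rng=rng))
print("unordered:", final_values([10, 20], 4, ordered=False, rng=rng))
```

The ordered run always prints four identical values; the unordered run may print a mix of 10 and 20, depending on the random delays.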

Case Studies of Some Cache Coherence Implementations

The traditional snoopy-based cache coherence uses:

The traditional directory-based cache coherence (e.g., MIPS R10k and Origin 2000) uses:

The implementation based on destination-set prediction (DSP) uses:

The token coherence uses: