The CPU can only directly fetch instructions and data from cache memory, located directly on the processor chip. Cache memory must be loaded in from the main system memory (the Random Access Memory, or RAM). RAM however, only retains it's contents when the power is on, so needs to be stored on more permanent storage.
We call these layers of memory the memory hierarchy
Table 3-1. Memory Hierarchy
Speed | Memory | Description |
---|---|---|
Fastest | Cache | Cache memory is memory actually embedded inside the CPU. Cache memory is very fast, typically taking only once cycle to access, but since it is embedded directly into the CPU there is a limit to how big it can be. In fact, there are several sub-levels of cache memory (termed L1, L2, L3) all with slightly increasing speeds. |
RAM | All instructions and storage addresses for the processor must come from RAM. Although RAM is very fast, there is still some significant time taken for the CPU to access it (this is termed latency). RAM is stored in separate, dedicated chips attached to the motherboard, meaning it is much larger than cache memory. | |
Slowest | Disk | We are all familiar with software arriving on a floppy disk or CDROM, and saving our files to the hard disk. We are also familiar with the long time a program can take to load from the hard disk -- having physical mechanisms such as spinning disks and moving heads means disks are the slowest form of storage. But they are also by far the largest form of storage. |
The important point to know about the memory hierarchy is the trade offs between speed an size -- the faster the memory the smaller it is. Of course, if you can find a way to change this equation, you'll end up a billionaire!
Cache is one of the most important elements of the CPU architecture. To write efficient code developers need to have an understanding of how the cache in their systems works.
The cache is a very fast copy of the slower main system memory. Cache is much smaller than main memories because it is included inside the processor chip alongside the registers and processor logic. This is prime real estate in computing terms, and there are both economic and physical limits to it's maximum size. As manufacturers find more and more ways to cram more and more transistors onto a chip cache sizes grow considerably, but even the largest caches are tens of megabytes, rather than the gigabytes of main memory or terrabytes of hard disk otherwise common. (XXXexample)
The cache is made up of small chunks of mirrored main memory. The size of these chunks is called the line size, and is typically something like 64 kilobytes. When talking about cache, it is very common to talk about the line size, or a cache line, which refers to one chunk of mirrored main memory. The cache can only load and store memory in sizes a multiple of a cache line.
As the cache is quite small compared to main memory, it will obviously fill up quite quickly as a process goes about its execution. Once the cache is full the processor needs to get rid of a line to make room for a new line. There are many algorithms by which the processor can choose which line to evict; for example least recently used (LRU) is an algorithm where the oldest unused line is discarded to make room for the new line.
When data is only read from the cache there is no need to ensure consistency with main memory. However, when the processor starts writing to cache lines it needs to make some decisions about how to update the underlying main memory. A write-through cache will write the changes directly into the main system memory as the processor updates the cache. This is slower since the process of writing to the main memory is, as we have seen, slower. Alternatively a write-back cache delays writing the changes to RAM until absolutely necessary. The obvious advantage is that less main memory access is required when cache entries are written. Cache lines that have been written but not committed to memory are referred to as dirty. The disadvantage is that when a cache entry is evicted, it may require two memory accesses (one to write dirty data main memory, and another to load the new data).
XXX: associativity, thrashing?