You hear the word all the time, but you’re not quite sure what it means. Let’s see what this article can do to help.
Let’s start by explaining a growing problem in system design: engineers have found ways to greatly increase CPU (central processing unit) clock speed and performance, but equally speedy memory technology is still in the works. That leaves designers scrambling to find a way to make sure those CPU speeds actually translate into faster overall computer performance.
A common analogy used to explain the problem and its solution involves a downtown furniture workshop and a lumberyard that keeps moving further and further out of the city and into rural land. That movement represents the widening divide between the speed of the CPU and the speed of the memory (the memory is the lumberyard). No matter what size the trucks are that ship lumber from the lumberyard to the furniture shop, they’re going to take longer and longer to arrive after the furniture shop places its order.
OK, conflict understood. Possible solution: rent a smaller warehouse in the city and have it act as a cache for the workshop. It could keep a driver on hand to fetch whatever the furniture shop needs the moment a need comes up. The bigger the cache, the better, because it can store more of the raw materials the furniture shop could possibly need.
Now think of that city warehouse as the level 1 (L1) cache. The L1 cache can be accessed extremely quickly by the CPU, so it’s a sensible place to store all of the most relevant and predictably needed data. The L1 can be so fast because it’s made of the fastest and most expensive type of memory: static random-access memory (SRAM). The four to six transistors that make up each SRAM cell trump the one transistor per cell of dynamic random-access memory (DRAM) for speed, but they also cost quite a lot more, so engineers generally want to be conservative with them.
When the processor reaches for data that isn’t in the L1, it’s called a cache miss. That’s a situation worth avoiding: people who pay for an ultra-high-clock-rate processor like the Pentium 4 don’t expect to be forced to wait while data loads from main memory. (And it’s not just a matter of waiting for something to load; the delay may keep a program from functioning properly.)
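To make the hit-versus-miss idea concrete, here’s a toy model of a tiny direct-mapped cache. The sizes, the address-splitting scheme, and the address stream are all illustrative assumptions for this sketch, not a description of any real chip:

```python
# Toy direct-mapped cache: each address maps to exactly one cache line.
# CACHE_LINES and LINE_SIZE are made-up, deliberately tiny values.

CACHE_LINES = 4      # number of cache lines in our toy cache
LINE_SIZE = 16       # bytes per cache line

def line_index_and_tag(addr):
    """Split an address into (index, tag) for a direct-mapped cache."""
    block = addr // LINE_SIZE          # which memory block the address falls in
    return block % CACHE_LINES, block // CACHE_LINES

def run(addresses):
    cache = [None] * CACHE_LINES       # each slot remembers the tag it holds
    hits = misses = 0
    for addr in addresses:
        index, tag = line_index_and_tag(addr)
        if cache[index] == tag:
            hits += 1                  # data already cached: a hit
        else:
            misses += 1                # not present: a cache miss, fetch from memory
            cache[index] = tag
    return hits, misses

# Re-reading the same small region misses once (the "cold" fetch), then hits:
print(run([0, 4, 8, 0, 4, 8]))   # all six addresses fall in one 16-byte line → (5, 1)
```

The first access pays the trip to main memory; every later access to that line is served from the cache, which is exactly why keeping the working set small and cache-resident pays off.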
The solution is to build a second cache. Expanding the L1 is an option only insofar as you’re willing to pay for more and more of those expensive transistors. If you want faster access to memory without footing that bill, you can build an L2 cache that sits between the L1 and main memory. And so the cache (or memory) hierarchy begins to form. The L2 isn’t as fast as the L1, and the L3 isn’t as fast as the L2, but each slower tier is also bigger and holds data that’s less likely to be needed right away. At the end of the day, data still moves more efficiently with the tiers than without them.
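The payoff of the hierarchy can be sketched with a little arithmetic: the average cost of an access is the L1 latency plus the slower tiers’ latencies weighted by how often you miss. The latencies and hit rates below are illustrative assumptions, not measurements of any real processor:

```python
# Average memory access time for a toy two-level hierarchy.
# Latencies (in cycles) are illustrative, not real hardware figures.

L1_LATENCY = 1
L2_LATENCY = 10
MEM_LATENCY = 100

def average_access_time(l1_hit_rate, l2_hit_rate):
    """Average cycles per access, given the hit rate at each level."""
    l1_miss = 1 - l1_hit_rate
    l2_miss = 1 - l2_hit_rate
    return (L1_LATENCY
            + l1_miss * L2_LATENCY              # pay L2 latency only on an L1 miss
            + l1_miss * l2_miss * MEM_LATENCY)  # pay memory latency when both miss

# With 90% L1 hits and 80% L2 hits, most accesses stay fast:
print(average_access_time(0.90, 0.80))  # 1 + 0.1*10 + 0.1*0.2*100 = 4.0 cycles
```

Even with these modest hit rates, the average access costs about 4 cycles instead of the 100-cycle trip to main memory, which is the whole point of stacking the tiers.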