Why cache is faster

Cache memory in computer systems is used to improve system performance. Cache memory operates in the same way as RAM in that it is volatile: when the system is shut down, the contents of cache memory are cleared.

Cache memory allows for faster access to data for two reasons. First, RAM has to be periodically refreshed, which means it takes longer to retrieve data from main memory. Second, cache memory keeps copies of some of the data held in RAM, so that data can be served without a trip to main memory.

To simplify the process, the cache works on the understanding that most programs access data in sequence. If the processor is currently processing data from locations 0 to 32, the cache will copy the contents of locations 33 to 64 in anticipation that they will be needed next. If you are confused about cache memory, I suggest you read the top part of this story.

Locality is a fancy way of saying data that is "close together," either in time or space. Caching with a smaller, faster, but generally more expensive memory works because, typically, a relatively small portion of the overall data is the data that is accessed most often.
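
To make spatial locality concrete, here is a minimal C sketch (the matrix size is arbitrary) that sums the same array twice: row by row, so consecutive accesses touch neighboring addresses that share cache lines, and column by column, so each access jumps a whole row ahead and keeps missing. On most machines the first loop is several times faster even though both perform exactly the same additions.

```c
#include <stdio.h>
#include <time.h>

#define N 2048

static double a[N][N];   /* stored row-major: a[i][0..N-1] are contiguous */

static double sum_rows(void)        /* good spatial locality */
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];           /* consecutive addresses, cache lines reused */
    return s;
}

static double sum_cols(void)        /* poor spatial locality */
{
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];           /* stride of N * sizeof(double) bytes each step */
    return s;
}

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = 1.0;

    clock_t t0 = clock();
    double r = sum_rows();
    clock_t t1 = clock();
    double c = sum_cols();
    clock_t t2 = clock();

    printf("row-major:    %.3fs (sum %.0f)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, r);
    printf("column-major: %.3fs (sum %.0f)\n", (double)(t2 - t1) / CLOCKS_PER_SEC, c);
    return 0;
}
```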

DRAM today has a cycle time of around 70 ns. Cache is on-die static RAM and has an access time of around 6 ns (figures from the Computer Science Wiki). This is a basic concept in computer science, and it is a large part of why memory that sits as close to the CPU as the L1 cache does can be accessed so much faster.
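
To put those figures in processor terms: assuming a 3 GHz core clock (a cycle of roughly 0.33 ns), 70 ns of DRAM cycle time is on the order of 200 clock cycles, while 6 ns of on-die SRAM access is closer to 20 cycles.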

Other answers have already covered all the relevant bits: locality and the associated data-transfer cost, bus width and clock, and so on; the speed of light, again associated with transfer costs, bus width and throughput; and the different memory technology (SRAM vs. DRAM). One bit that was left out, and is mentioned only in Darkhogg's comment: larger caches have better hit rates but longer latency. Multiple levels of cache were also introduced to address this tradeoff.
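
A rough back-of-the-envelope example of that tradeoff (the numbers here are invented for illustration): with a 1 ns L1 cache that hits 90% of the time in front of 100 ns DRAM, the average access time is 1 + 0.10 × 100 = 11 ns. Add a 5 ns L2 that catches 90% of the L1 misses and it drops to about 1 + 0.10 × (5 + 0.10 × 100) = 2.5 ns, which is why an extra level of slower-but-larger cache still pays off.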

There is an excellent question and answer on this point on the Electronics SE site. From the answers, it seems to me that a point worth highlighting is this: the logic which performs all the required operations for a cache read is not that simple, especially if the cache is set-associative, like most caches today.

It requires gates and logic. So, even if we rule out cost and die space, if someone tried to implement a ridiculously large L1 cache, the logic which performs all the required operations for a cache read would also become large.

At some point, the propagation delay through all this logic would be too long and the operations which had taken just a single clock cycle beforehand would have to be split into several clock cycles.

This would raise the latency.

There are a lot of good points raised in the other answers, but one factor appears to be missing: address decoding latency. The following is a vast oversimplification of how memory address decoding works, but it gives a good idea of why large DRAM chips are generally quite slow.

When the processor needs to access memory, it sends a command to the memory chip to select the specific word it wants to use. This command is called a Column Address Select (we'll ignore row addresses for now). The memory chip now has to activate the column requested, which it does by sending the address down a cascade of logic gates to drive a single wire that connects to all the cells in the column.

Depending on how it's implemented, there will be a certain amount of delay for each bit of address until the result comes out the other end. This is called the CAS latency of the memory. Because those bits have to be examined sequentially, this process takes a lot longer than a processor cycle (which usually has only a few transistors in sequence to wait for). It also takes a lot longer than a bus cycle (which is usually a few times slower than a processor cycle).

A CAS command on a typical memory chip is likely to take on the order of 5 ns (IIRC; it's been a while since I looked at timings), which is more than an order of magnitude slower than a processor cycle. Fortunately, we break addresses into three parts (column, row, and bank), which allows each part to be smaller and lets those parts be processed concurrently; otherwise the latency would be even longer. Processor cache, however, does not have this problem.
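
As a toy illustration of that split (the field widths below are made up for the example and do not correspond to any particular DRAM part), an address can be carved into bank, row, and column fields, and each narrow field gets its own, much shallower decoder:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout: 3 bank bits, 14 row bits, 10 column bits.
 * Real parts differ; the point is only that each field is far
 * narrower than the full address, so each decoder is shallower. */
#define COL_BITS  10
#define ROW_BITS  14
#define BANK_BITS 3

typedef struct {
    unsigned bank, row, col;
} dram_addr;

static dram_addr split_address(uint32_t addr)
{
    dram_addr d;
    d.col  =  addr                           & ((1u << COL_BITS)  - 1);
    d.row  = (addr >> COL_BITS)              & ((1u << ROW_BITS)  - 1);
    d.bank = (addr >> (COL_BITS + ROW_BITS)) & ((1u << BANK_BITS) - 1);
    return d;
}

int main(void)
{
    dram_addr d = split_address(0x0123ABCDu);
    printf("bank %u, row %u, column %u\n", d.bank, d.row, d.col);
    return 0;
}
```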

Not only is it much smaller, so that decoding the address is an easier job, it actually doesn't need to decode more than a small fragment of the address (in some variants, none of it at all), because it is associative. That means that alongside each cached line of memory there are extra memory cells that store part or all of the address. Obviously this makes the cache even more expensive, but it means that all of the cells can be queried simultaneously to see whether they hold the particular line of memory we want, and then the only one (hopefully) that has the right data will dump it onto a bus that connects the entire memory to the main processor core.
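
Here is a minimal software sketch of that tag match (sizes invented, and only a single set shown, ignoring any index bits). Each line stores a tag next to its data, and a lookup compares the requested tag against every stored tag. The loop below is sequential only because it is software; in hardware all of the comparators fire at the same time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WAYS      4      /* lines whose tags are compared in parallel in hardware */
#define LINE_SIZE 64     /* bytes per cache line                                  */

typedef struct {
    bool     valid;
    uint32_t tag;              /* address bits stored alongside the data */
    uint8_t  data[LINE_SIZE];
} cache_line;

static cache_line lines[WAYS]; /* one set of a small set-associative cache */

/* Returns a pointer to the cached line holding 'addr', or NULL on a miss.
 * In hardware the WAYS comparisons all happen simultaneously; this loop
 * is just a software stand-in for that parallel match. */
static const uint8_t *lookup(uint32_t addr)
{
    uint32_t tag = addr / LINE_SIZE;   /* drop the offset-within-line bits */
    for (int w = 0; w < WAYS; w++)
        if (lines[w].valid && lines[w].tag == tag)
            return lines[w].data;      /* hit: this way drives the data bus */
    return NULL;                       /* miss: fall back to the next level */
}
```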

In hardware, all of this happens in less than a cycle, because it is so much simpler.

One of the philosophies I studied was the obtain-maximum-throughput-in-minimum-hardware movement, which applies when we talk about any cache-based memory, be it the CPU cache, the buffer cache, or a memory cache for that matter.

The CPU cache is a smaller, faster memory space which stores copies of the data from the most recently used main memory locations. The buffer cache is a main memory area which stores copies of the data from the most recently used disk locations.

The browser cache is a directory or similar space which stores copies of the data from the websites most recently visited by the user. Reference: How Computer Memory Works.
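
All three are built around keeping the most recently used items close at hand. One common way to implement that idea is a least-recently-used (LRU) replacement policy; the following is a minimal sketch of one (the capacity, key, and value types are chosen only for illustration), not the policy any particular system actually uses.

```c
#include <stdio.h>
#include <string.h>

#define CAPACITY 4   /* deliberately tiny, just for illustration */

typedef struct {
    int  key;        /* e.g. a block number or address         */
    char value[32];  /* the cached copy of that block's data   */
} entry;

/* entries[0] is the most recently used, entries[count-1] the least */
static entry entries[CAPACITY];
static int   count = 0;

/* Move entry i to the front (most recently used position). */
static void promote(int i)
{
    entry e = entries[i];
    memmove(&entries[1], &entries[0], (size_t)i * sizeof(entry));
    entries[0] = e;
}

/* Look up a key; on a hit, refresh its recency. Returns the value or NULL. */
static const char *cache_get(int key)
{
    for (int i = 0; i < count; i++)
        if (entries[i].key == key) {
            promote(i);
            return entries[0].value;
        }
    return NULL;     /* miss: the caller must fetch from the slower level */
}

/* Insert a key/value pair, evicting the least recently used entry if full. */
static void cache_put(int key, const char *value)
{
    if (count < CAPACITY)
        count++;
    /* shift everything down one slot, dropping the last (LRU) entry if needed */
    memmove(&entries[1], &entries[0], (size_t)(count - 1) * sizeof(entry));
    entries[0].key = key;
    snprintf(entries[0].value, sizeof entries[0].value, "%s", value);
}

int main(void)
{
    cache_put(7, "block seven");
    const char *hit  = cache_get(7);
    const char *miss = cache_get(9);
    printf("7 -> %s\n", hit  ? hit  : "(miss)");
    printf("9 -> %s\n", miss ? miss : "(miss)");
    return 0;
}
```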


