Magnetic RAM: an unexpected choice for low-power cache memory

Magnetic memory (MRAM) was not on my list of candidates for future cache memories, but in a paper for the latest issue of IEEE Transactions on VLSI Systems, a group of engineers from Xi'an Jiaotong University make the case for using the technology to replace SRAM in level-one (L1) caches despite a very obvious drawback: MRAM's write speed and write power are poor. Hongbin Sun and colleagues focus their attention on the more recent variant of MRAM, spin-transfer torque (STT), rather than traditional MRAM, which does not scale so well with process. After all, it is going to take a while for MRAM to become a serious candidate for cache memory in mainstream SoCs. However, STT has some crossover with memristor technologies and seems to have potential beyond 28nm or 22nm.

The problem with SRAM at these smaller geometries is leakage, which is not getting any better, and it is hard to drop the supply voltage while maintaining good read and write performance. As MRAM is non-volatile, it is possible to cut power to the cells when they are not being read. But for typical L1 cache traffic, the write power alone can approach the leakage power of an equivalent SRAM array.
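To see why write energy is the sticking point, a back-of-envelope comparison helps. This is my own illustration, not a calculation from the paper, and the figures in it are hypothetical: SRAM pays leakage power continuously, while a power-gated MRAM array pays energy only per write, so the advantage evaporates once write traffic is heavy enough.

# Illustrative sketch only: function names and figures are my assumptions,
# not numbers from the Xi'an Jiaotong paper.

def mram_write_power(write_rate_hz, energy_per_write_j):
    # Dynamic power of a power-gated MRAM array: energy per write x write rate.
    return write_rate_hz * energy_per_write_j

def breakeven_write_rate(sram_leakage_w, energy_per_write_j):
    # Write rate at which MRAM write power matches SRAM leakage power.
    return sram_leakage_w / energy_per_write_j

# Hypothetical example: 5mW of SRAM leakage vs 2pJ per MRAM write.
print(breakeven_write_rate(5e-3, 2e-12))  # ~2.5e9 writes/s: L1-class traffic

In other words, at plausible per-write energies an L1 cache busy enough can burn as much power writing MRAM as the SRAM it replaced burned simply leaking, which is the overhead the buffering scheme below sets out to avoid.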
By adding a further layer to the memory hierarchy, the researchers reckon much of that write power overhead can be avoided. The technique they tried is not quite a small 'L0' cache, but it's not far off. The proposed architecture places two small SRAM buffers in front of the MRAM array to hold the most recently accessed blocks of data, taking advantage of the access locality found in most programs: a lot of the write activity hits the same few variables while a loop is executing. There is little point in letting all those writes through to the MRAM when the data will most likely be overwritten a few cycles later.

The architecture has both a filter buffer and a victim buffer, similar to the victim caches used to improve the performance of small L1 caches where two frequently accessed blocks of data wind up being hashed to the same cache line. The filter buffer responds to most of the cache accesses during normal operation. Because the filter buffer is direct-mapped, there is a strong chance of cache-line conflicts, so the victim buffer acts as an overspill for the unlucky blocks of data.
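A rough sketch of how such a buffer pair might behave is below. This is my own minimal trace-driven model under assumed parameters (64-byte blocks, FIFO victim replacement, and every block falling out of the victim buffer counted as one MRAM write); the policies in the actual paper may well differ.

import random

BLOCK = 64                     # cache block size in bytes (assumed)
FILTER_BLOCKS = 2048 // BLOCK  # 2Kbyte direct-mapped filter buffer
VICTIM_BLOCKS = 512 // BLOCK   # 512byte FIFO victim buffer

class BufferedMramCache:
    def __init__(self):
        self.filter = [None] * FILTER_BLOCKS  # direct-mapped: index -> block tag
        self.victim = []                      # tags spilled from the filter buffer
        self.mram_writes = 0                  # writes that actually reach the MRAM
        self.hits = 0
        self.accesses = 0

    def access(self, addr):
        self.accesses += 1
        tag = addr // BLOCK
        if self.filter[tag % FILTER_BLOCKS] == tag:  # filter buffer hit
            self.hits += 1
        elif tag in self.victim:                     # rescued from the overspill
            self.hits += 1
            self.victim.remove(tag)
            self._install(tag)
        else:                                        # miss in both buffers
            self._install(tag)

    def _install(self, tag):
        idx = tag % FILTER_BLOCKS
        evicted, self.filter[idx] = self.filter[idx], tag
        if evicted is not None:
            self.victim.append(evicted)              # conflict victim spills over
            if len(self.victim) > VICTIM_BLOCKS:
                self.victim.pop(0)                   # oldest victim leaves SRAM...
                self.mram_writes += 1                # ...and is written back to MRAM

random.seed(1)
cache = BufferedMramCache()
hot = [random.randrange(1 << 20) * BLOCK for _ in range(8)]  # a loop's working set
for _ in range(100_000):
    addr = random.choice(hot) if random.random() < 0.9 else random.randrange(1 << 26)
    cache.access(addr)
print(f"buffer hit rate: {cache.hits / cache.accesses:.1%}")
print(f"writes reaching MRAM: {cache.mram_writes}")

Run on a synthetic trace with a small hot working set, the two buffers absorb the vast majority of accesses, so only evicted blocks ever pay MRAM's write cost; that coalescing is the core of the paper's argument.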
A 2Kbyte filter buffer paired with a 512byte victim buffer proved relatively insensitive to the size of the backing cache: the hit rate hovered very close to 96.25 per cent for three cache sizes in the typical L1 range of 16Kbyte to 64Kbyte. Performance varied with benchmarks, as you would expect; gcc did not perform well, for example. On average, though, the extra buffers used with MRAM halved the area needed for the cache, although this has to be weighed against increased process complexity, and cut overall power consumption by more than 75 per cent.

One other advantage of MRAM is that it is far more resilient to soft errors than SRAM. This may not matter much for typical L1 cache sizes in mainstream systems, but it may prove valuable for high-reliability hardware destined for high altitudes or other radiation-prone environments.