“Architecting Energy-Efficient STT-RAM Based Register File on GPGPUs via Delta Compression”, Hang Zhang, Xuhao Chen, Nong Xiao, Fang Liu, 2016-06-05:

To facilitate efficient context switches, GPUs usually employ a large-capacity register file to accommodate a massive amount of context information. However, the large register file introduces high power consumption, owing to the high leakage power of SRAM cells. Emerging non-volatile STT-RAM memory has recently been studied as a potential replacement to alleviate the leakage challenge when constructing register files on GPUs. Unfortunately, due to the long write latency and high energy consumption associated with write operations in STT-RAM, simply replacing SRAM with STT-RAM for register files would incur non-trivial performance overhead and bring only marginal energy benefits.

In this paper, we propose to optimize STT-RAM based GPU register files for better energy-efficiency and performance via 2 techniques. First, we employ a light-weight compression framework with awareness of register value similarity. It is coupled with a group-based write driver control to mitigate the high energy overhead caused by STT-RAM writes. Second, to address the long write latency overhead of STT-RAM, we propose a centralized SRAM-based write buffer design to efficiently absorb STT-RAM writes with better buffer usage, rather than the conventional design with distributed per-bank write buffers. The experimental results show that our STT-RAM based register file design consumes only 37.4% of the energy of the SRAM baseline, while incurring only negligible performance degradation.
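
To make the idea of compressing similar register values concrete, here is a minimal base-plus-delta sketch in Python. It is not the paper's actual scheme: the warp size, word width, candidate delta widths, choice of the first thread's value as the base, and all function names are assumptions made for illustration. It only shows why values that lie close together across a warp's threads need far fewer bits written than a full-width store.

```python
from typing import List, Tuple

WARP_SIZE = 32             # threads per warp (assumed)
WORD_BITS = 32             # bits per register value (assumed)
DELTA_WIDTHS = (0, 8, 16)  # candidate delta widths in bits (assumed)


def compress_warp_register(values: List[int]) -> Tuple[int, int, List[int]]:
    """Return (base, delta_bits, payload).

    delta_bits == WORD_BITS means the values were left uncompressed and
    payload holds the raw values; otherwise payload holds signed deltas
    relative to base.
    """
    base = values[0]  # hypothetical choice: first thread's value is the base
    deltas = [v - base for v in values]
    for bits in DELTA_WIDTHS:
        if bits == 0:
            fits = all(d == 0 for d in deltas)
        else:
            lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
            fits = all(lo <= d <= hi for d in deltas)
        if fits:
            return base, bits, deltas
    return base, WORD_BITS, list(values)


def decompress_warp_register(base: int, bits: int, payload: List[int]) -> List[int]:
    """Rebuild the original per-thread values."""
    if bits == WORD_BITS:
        return list(payload)
    return [base + d for d in payload]


def bits_written(bits: int, n: int = WARP_SIZE) -> int:
    """Bits that would be written: one full-width base plus n narrow deltas."""
    if bits == WORD_BITS:
        return n * WORD_BITS
    return WORD_BITS + n * bits


# Example: per-thread addresses with a stride of 4 differ only by small deltas.
strided = [1000 + 4 * i for i in range(WARP_SIZE)]
base, bits, payload = compress_warp_register(strided)
print(bits, bits_written(bits), WARP_SIZE * WORD_BITS)
```

In this sketch the strided example needs 8-bit deltas, so 288 bits would be written instead of 1024. One plausible reading of the group-based write driver control is that drivers for the bit groups left unused by a compressed value can stay idle, but how the paper actually partitions and gates those groups is not stated in the abstract.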