ARP: An adaptive runtime mechanism to partition shared cache in SMT architecture

Xiufeng Sui*, Junmin Wu, Guoliang Chen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Simultaneous multithreading (SMT) is a latency-tolerant architecture that usually employs a shared L2 cache. It can execute instructions from multiple threads each cycle, which increases the pressure on the memory hierarchy. This paper studies the problem of partitioning a shared cache between multiple concurrently executing threads in an SMT architecture, in particular the issue of fairness in cache sharing and its relation to throughput. The commonly used LRU policy implicitly partitions a shared cache on a demand basis, giving more cache resources to the application with the higher demand. LRU therefore manages the cache unfairly and can lead to serious problems such as thread starvation and priority inversion. An adaptive runtime partition (ARP) mechanism is implemented to manage the shared cache. ARP takes fairness as the metric of cache partitioning and uses a dynamic partitioning algorithm to optimize it. The algorithm is easy to implement and requires little or no profiling; it uses a classical monitor circuit to collect the stack-distance information of each thread and incurs less than 0.25% storage overhead. The evaluation shows that, on average, ARP improves the fairness of a 2-way SMT by a factor of 2.26 while increasing throughput by 14.75%, compared with LRU-based cache partitioning.
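To make the idea concrete, the sketch below illustrates fairness-driven way partitioning from per-thread stack-distance histograms of the kind a hardware monitor circuit could collect. It is a minimal illustration, not the paper's actual algorithm: the histogram layout, the brute-force search, and the specific fairness definition (equalizing each thread's ratio of shared-cache misses to dedicated-cache misses) are assumptions for the example, and the function names (`misses`, `fairness_partition`) are hypothetical.

```python
from itertools import product

def misses(stack_hist, ways):
    """Misses a thread would incur if given `ways` cache ways: every access whose
    stack distance is at least the allocation misses (the final histogram bucket
    holds accesses that miss regardless of allocation)."""
    return sum(count for dist, count in enumerate(stack_hist) if dist >= ways)

def fairness_partition(histograms, total_ways):
    """Brute-force search over way allocations (one per thread, summing to
    total_ways) for the allocation that minimizes the spread of relative
    slowdowns Miss_shared / Miss_alone, i.e. the fairest partition."""
    alone = [max(misses(h, total_ways), 1) for h in histograms]
    best, best_spread = None, float("inf")
    for alloc in product(range(1, total_ways), repeat=len(histograms)):
        if sum(alloc) != total_ways:
            continue
        ratios = [misses(h, w) / a for h, w, a in zip(histograms, alloc, alone)]
        spread = max(ratios) - min(ratios)
        if spread < best_spread:
            best, best_spread = alloc, spread
    return best

# Example: two threads share a 16-way cache; each histogram has one bucket per
# stack distance 0..15 plus a final bucket of accesses that always miss.
hist_a = [900, 400, 200, 100, 50] + [10] * 11 + [30]
hist_b = [300, 250, 220, 200, 180, 150, 120, 100, 80,
          60, 50, 40, 30, 20, 10, 5] + [60]
print(fairness_partition([hist_a, hist_b], total_ways=16))
```

A runtime mechanism would recompute such a partition periodically from the monitored histograms and enforce it in the replacement logic; the exhaustive search shown here is only practical for the small thread counts of an SMT core.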

Original language: English
Pages (from-to): 1269-1277
Number of pages: 9
Journal: Jisuanji Yanjiu yu Fazhan/Computer Research and Development
Volume: 45
Issue number: 7
Publication status: Published - Jul 2008
Externally published: Yes
