A utility based cache optimization mechanism for multi-thread workloads

Yixuan Tang^*, Junmin Wu, Guoliang Chen, Xiufeng Sui, Jing Huang

^*Corresponding author for this work

Research output: Contribution to journal › Article › Peer-reviewed

1 citation (Scopus)

Abstract

Modern multi-core processors usually employ a shared level 2 cache to support fast data access among concurrent threads. However, under high resource demand, the commonly used LRU policy may cause interference among threads and degrade overall performance. Partitioning the shared cache is a relatively flexible resource allocation method, but most previous partitioning approaches targeted multi-programmed workloads and ignored the difference between the shared and private data access patterns of multi-threaded workloads, which reduces the utility of shared data. Most traditional cache partitioning methods also target a single memory access pattern and neglect the frequency and recency information of cache lines. In this paper, we study the access characteristics of private and shared data in multi-threaded workloads and propose a utility-based pseudo-partition cache partitioning mechanism (UPP). UPP dynamically collects utility information for each thread and for shared data, and uses the overall marginal utility as the metric for cache partitioning. In addition, UPP exploits both the frequency and the recency information of a workload, so as to evict dead cache lines early and filter out less-reused blocks through a dynamic insertion and promotion mechanism.
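The abstract does not give the details of UPP, so the following is only a minimal sketch of the general idea of partitioning by marginal utility, not the authors' algorithm. It assumes that some monitor has already produced a hit curve for each consumer (each thread's private data, plus the shared data), where entry `w` is the number of hits that consumer would get with `w` cache ways. The function name `partition_ways`, the consumer labels, and the sample numbers are all hypothetical.

```python
def partition_ways(hit_curves, total_ways):
    """Greedy sketch: hand out cache ways one at a time, each time to the
    consumer whose hit count would rise the most from one extra way.

    hit_curves maps a consumer name to a list where index w holds the
    (assumed, pre-measured) number of hits with w ways.
    """
    alloc = {name: 0 for name in hit_curves}
    for _ in range(total_ways):
        best_name, best_gain = None, -1
        for name, curve in hit_curves.items():
            have = alloc[name]
            if have + 1 >= len(curve):
                continue  # no data beyond this point for this consumer
            gain = curve[have + 1] - curve[have]  # marginal utility of one more way
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            break  # every curve is exhausted
        alloc[best_name] += 1
    return alloc


# Hypothetical hit curves for two threads' private data and the shared data.
curves = {
    "thread0": [0, 50, 80, 90, 95],
    "thread1": [0, 20, 30, 35, 38],
    "shared": [0, 60, 100, 125, 140],
}
print(partition_ways(curves, total_ways=4))
```

With these made-up curves the shared data and `thread0` each end up with two ways and `thread1` with none, because its marginal gain is always the smallest. Treating shared data as its own consumer is one simple way to reflect the abstract's point that shared and private data should be accounted for separately.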

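Original language	English
Pages (from-to)	170-180
Number of pages	11
Journal	Jisuanji Yanjiu yu Fazhan/Computer Research and Development
Volume	50
Issue number	1
Publication status	Published - Jan 2013
Externally published	Yes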

Cite this

@article{2762dd25fee2453c8283440926c77d74,

title = "A utility based cache optimization mechanism for multi-thread workloads",

abstract = "Modern multi-core processors usually employ a shared level 2 cache to support fast data access among concurrent threads. However, under high resource demand, the commonly used LRU policy may cause interference among threads and degrade overall performance. Partitioning the shared cache is a relatively flexible resource allocation method, but most previous partitioning approaches targeted multi-programmed workloads and ignored the difference between the shared and private data access patterns of multi-threaded workloads, which reduces the utility of shared data. Most traditional cache partitioning methods also target a single memory access pattern and neglect the frequency and recency information of cache lines. In this paper, we study the access characteristics of private and shared data in multi-threaded workloads and propose a utility-based pseudo-partition cache partitioning mechanism (UPP). UPP dynamically collects utility information for each thread and for shared data, and uses the overall marginal utility as the metric for cache partitioning. In addition, UPP exploits both the frequency and the recency information of a workload, so as to evict dead cache lines early and filter out less-reused blocks through a dynamic insertion and promotion mechanism.",

keywords = "Insertion policy, Multi-core processors, Multi-threaded program, Replacement algorithm, Shared cache partitioning",

author = "Yixuan Tang and Junmin Wu and Guoliang Chen and Xiufeng Sui and Jing Huang",

year = "2013",

month = jan,

language = "English",

volume = "50",

pages = "170--180",

journal = "Jisuanji Yanjiu yu Fazhan/Computer Research and Development",

issn = "1000-1239",

publisher = "Science China Press",

number = "1",

}

TY - JOUR

T1 - A utility based cache optimization mechanism for multi-thread workloads

AU - Tang, Yixuan

AU - Wu, Junmin

AU - Chen, Guoliang

AU - Sui, Xiufeng

AU - Huang, Jing

PY - 2013/1

Y1 - 2013/1

N2 - Modern multi-core processors usually employ a shared level 2 cache to support fast data access among concurrent threads. However, under high resource demand, the commonly used LRU policy may cause interference among threads and degrade overall performance. Partitioning the shared cache is a relatively flexible resource allocation method, but most previous partitioning approaches targeted multi-programmed workloads and ignored the difference between the shared and private data access patterns of multi-threaded workloads, which reduces the utility of shared data. Most traditional cache partitioning methods also target a single memory access pattern and neglect the frequency and recency information of cache lines. In this paper, we study the access characteristics of private and shared data in multi-threaded workloads and propose a utility-based pseudo-partition cache partitioning mechanism (UPP). UPP dynamically collects utility information for each thread and for shared data, and uses the overall marginal utility as the metric for cache partitioning. In addition, UPP exploits both the frequency and the recency information of a workload, so as to evict dead cache lines early and filter out less-reused blocks through a dynamic insertion and promotion mechanism.

AB - Modern multi-core processors usually employ a shared level 2 cache to support fast data access among concurrent threads. However, under high resource demand, the commonly used LRU policy may cause interference among threads and degrade overall performance. Partitioning the shared cache is a relatively flexible resource allocation method, but most previous partitioning approaches targeted multi-programmed workloads and ignored the difference between the shared and private data access patterns of multi-threaded workloads, which reduces the utility of shared data. Most traditional cache partitioning methods also target a single memory access pattern and neglect the frequency and recency information of cache lines. In this paper, we study the access characteristics of private and shared data in multi-threaded workloads and propose a utility-based pseudo-partition cache partitioning mechanism (UPP). UPP dynamically collects utility information for each thread and for shared data, and uses the overall marginal utility as the metric for cache partitioning. In addition, UPP exploits both the frequency and the recency information of a workload, so as to evict dead cache lines early and filter out less-reused blocks through a dynamic insertion and promotion mechanism.

KW - Insertion policy

KW - Multi-core processors

KW - Multi-threaded program

KW - Replacement algorithm

KW - Shared cache partitioning

UR - http://www.scopus.com/inward/record.url?scp=84874676743&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84874676743

SN - 1000-1239

VL - 50

SP - 170

EP - 180

JO - Jisuanji Yanjiu yu Fazhan/Computer Research and Development

JF - Jisuanji Yanjiu yu Fazhan/Computer Research and Development

IS - 1

ER -
