TY - GEN
T1 - ARP
T2 - 2010 International Conference on Electronics and Information Engineering, ICEIE 2010
AU - Tang, Yixuan
AU - Wu, Junmin
AU - Sui, Xiufeng
AU - Chen, Guoliang
AU - Yin, Wei
AU - Jin, Yingqi
PY - 2010
Y1 - 2010
N2 - With the growth of on-chip communication delays and of the working sets of commercial and scientific workloads, the L2 caches of chip multiprocessors (CMPs) are under heavy pressure. There are two basic L2 cache designs: a shared L2 cache maximizes aggregate cache capacity and minimizes off-chip memory requests, while a private L2 cache minimizes delays on global wires and cache access time. Recent hybrid designs offer replication to balance latency and capacity; however, they require complicated lookup and coherence mechanisms that increase latency or fail to optimize core counts. Our experiments with a tiled architecture show that communication traffic is imbalanced across tiles and that utilization differs significantly among L2 caches. Based on this observation, we propose a novel adaptive replication policy (ARP) based on tiled shared caches, a mechanism that regularly checks workload behavior to control replication. ARP replicates cache blocks only when the benefit of replication is larger than the cost. Simulations of 16-core CMPs show that ARP provides better performance: communication traffic is reduced by 3%-48%, average access distance is reduced by 3%-52%, and the utilization ratio of aggregate L2 cache capacity is increased by 60%-350%.
AB - With the growth of on-chip communication delays and of the working sets of commercial and scientific workloads, the L2 caches of chip multiprocessors (CMPs) are under heavy pressure. There are two basic L2 cache designs: a shared L2 cache maximizes aggregate cache capacity and minimizes off-chip memory requests, while a private L2 cache minimizes delays on global wires and cache access time. Recent hybrid designs offer replication to balance latency and capacity; however, they require complicated lookup and coherence mechanisms that increase latency or fail to optimize core counts. Our experiments with a tiled architecture show that communication traffic is imbalanced across tiles and that utilization differs significantly among L2 caches. Based on this observation, we propose a novel adaptive replication policy (ARP) based on tiled shared caches, a mechanism that regularly checks workload behavior to control replication. ARP replicates cache blocks only when the benefit of replication is larger than the cost. Simulations of 16-core CMPs show that ARP provides better performance: communication traffic is reduced by 3%-48%, average access distance is reduced by 3%-52%, and the utilization ratio of aggregate L2 cache capacity is increased by 60%-350%.
KW - NOC
KW - Replication
KW - Tiled chip multiprocessor
UR - http://www.scopus.com/inward/record.url?scp=78049339509&partnerID=8YFLogxK
U2 - 10.1109/ICEIE.2010.5559726
DO - 10.1109/ICEIE.2010.5559726
M3 - Conference contribution
AN - SCOPUS:78049339509
SN - 9781424476800
T3 - ICEIE 2010 - 2010 International Conference on Electronics and Information Engineering, Proceedings
SP - V2123-V2127
BT - ICEIE 2010 - 2010 International Conference on Electronics and Information Engineering, Proceedings
Y2 - 1 August 2010 through 3 August 2010
ER -