ARP: An adaptive replication policy in tiled chip multiprocessor

Yixuan Tang; Junmin Wu; Xiufeng Sui; Guoliang Chen; Wei Yin; Yingqi Jin

doi:10.1109/ICEIE.2010.5559726

Yixuan Tang*, Junmin Wu, Xiufeng Sui, Guoliang Chen, Wei Yin, Yingqi Jin

*Corresponding author for this work

University of Science and Technology of China

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

As on-chip communication delays and the working sets of commercial and scientific workloads grow, the L2 caches of chip multiprocessors (CMPs) come under heavy pressure. There are two basic L2 cache designs. The first, a shared L2 cache, maximizes aggregate cache capacity and minimizes off-chip memory requests. The second, private L2 caches, minimizes delays on global wires and cache access time. Recent hybrid designs use replication to balance latency and capacity; however, they require complicated lookup and coherence mechanisms that either increase latency or fail to scale well as core counts grow. Our experiments with a tiled architecture show that communication traffic is imbalanced across tiles and that L2 cache utilization differs significantly from tile to tile. Based on this observation, we propose a novel adaptive replication policy (ARP) for tiled shared caches: a mechanism that periodically monitors workload behavior to control replication. ARP replicates a cache block only when the benefit of replication exceeds its cost. Simulations of a 16-core CMP show that ARP improves performance: communication traffic is reduced by 3%-48%, average access distance is reduced by 3%-52%, and utilization of the aggregate L2 cache capacity is increased by 60%-350%.
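The abstract states only that ARP periodically monitors workload behavior and replicates a block when the benefit outweighs the cost; it does not give the cost model. The Python sketch below illustrates one way such a rule, and the "average access distance" metric, could be expressed. Everything in it is an assumption rather than the authors' method: the 16 tiles are taken to form a 4x4 mesh with XY routing, blocks are statically interleaved across the shared L2 slices by address, benefit is counted as network hops saved, and cost is a fixed eviction penalty scaled by local cache utilization. All names (`should_replicate`, `EpochStats`, `EVICTION_PENALTY_HOPS`, `average_access_distance`) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical parameters, not taken from the paper.
MESH_DIM = 4                    # assume the 16 tiles form a 4x4 mesh
NUM_TILES = MESH_DIM * MESH_DIM
EVICTION_PENALTY_HOPS = 100.0   # assumed cost, in hop-equivalents, of displacing useful local data


def hops(a: int, b: int) -> int:
    """Manhattan (XY-routing) distance between two tiles on the mesh."""
    ra, ca = divmod(a, MESH_DIM)
    rb, cb = divmod(b, MESH_DIM)
    return abs(ra - rb) + abs(ca - cb)


def home_tile(block_addr: int) -> int:
    """Assumed static mapping: blocks are interleaved across the shared L2 slices."""
    return block_addr % NUM_TILES


@dataclass
class EpochStats:
    """Counters a hypothetical monitor might collect for one (tile, block) pair per epoch."""
    accesses: int             # accesses by the requesting tile in the last epoch
    local_utilization: float  # fraction (0..1) of the requester's L2 slice holding live data


def should_replicate(block_addr: int, requester: int, stats: EpochStats) -> bool:
    """Replicate only if the estimated benefit exceeds the estimated cost."""
    home = home_tile(block_addr)
    if home == requester:
        return False  # block is already local; a replica would save nothing
    # Benefit: network hops avoided if the same accesses recur and hit a local replica.
    benefit = stats.accesses * hops(requester, home)
    # Cost: the replica consumes local capacity; a busier slice is more likely to lose useful data.
    cost = EVICTION_PENALTY_HOPS * stats.local_utilization
    return benefit > cost


def average_access_distance(trace: list[tuple[int, int]],
                            replicas: set[tuple[int, int]]) -> float:
    """Mean hops per access; a request served by a local replica costs 0 hops."""
    total = 0
    for requester, block in trace:
        if (requester, block) not in replicas:
            total += hops(requester, home_tile(block))
    return total / len(trace)


if __name__ == "__main__":
    trace = [(0, 15)] * 10  # tile 0 repeatedly reads a block homed on the opposite corner
    print(average_access_distance(trace, set()))        # 6.0
    print(should_replicate(15, 0, EpochStats(10, 0.2)))  # True
    print(average_access_distance(trace, {(0, 15)}))    # 0.0
```

Scaling the cost by local utilization mirrors the abstract's observation that L2 utilization differs across tiles: under this assumed model, a lightly used slice accepts replicas readily, while a busy slice accepts them only for heavily reused, distant blocks.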

Original language	English
Title of host publication	ICEIE 2010 - 2010 International Conference on Electronics and Information Engineering, Proceedings
Pages	V2123-V2127
DOIs	https://doi.org/10.1109/ICEIE.2010.5559726
Publication status	Published - 2010
Externally published	Yes
Event	2010 International Conference on Electronics and Information Engineering, ICEIE 2010 - Kyoto, Japan Duration: 1 Aug 2010 → 3 Aug 2010

Publication series

Name	ICEIE 2010 - 2010 International Conference on Electronics and Information Engineering, Proceedings
Volume	2

Conference

Conference	2010 International Conference on Electronics and Information Engineering, ICEIE 2010
Country/Territory	Japan
City	Kyoto
Period	1/08/10 → 3/08/10

Keywords

NOC
Replication
Tiled chip multiprocessor

Cite this

Tang, Y., Wu, J., Sui, X., Chen, G., Yin, W., & Jin, Y. (2010). ARP: An adaptive replication policy in tiled chip multiprocessor. In ICEIE 2010 - 2010 International Conference on Electronics and Information Engineering, Proceedings (pp. V2123-V2127). Article 5559726 (ICEIE 2010 - 2010 International Conference on Electronics and Information Engineering, Proceedings; Vol. 2). https://doi.org/10.1109/ICEIE.2010.5559726

@inproceedings{b6b8879765ae4ba0a78016c0b4c5e31a,

title = "ARP: An adaptive replication policy in tiled chip multiprocessor",

keywords = "NOC, Replication, Tiled chip multiprocessor",

author = "Yixuan Tang and Junmin Wu and Xiufeng Sui and Guoliang Chen and Wei Yin and Yingqi Jin",

year = "2010",

doi = "10.1109/ICEIE.2010.5559726",

language = "English",

isbn = "9781424476800",

series = "ICEIE 2010 - 2010 International Conference on Electronics and Information Engineering, Proceedings",

pages = "V2123--V2127",

booktitle = "ICEIE 2010 - 2010 International Conference on Electronics and Information Engineering, Proceedings",

note = "2010 International Conference on Electronics and Information Engineering, ICEIE 2010 ; Conference date: 01-08-2010 Through 03-08-2010",

}

Tang, Y, Wu, J, Sui, X, Chen, G, Yin, W & Jin, Y 2010, ARP: An adaptive replication policy in tiled chip multiprocessor. in ICEIE 2010 - 2010 International Conference on Electronics and Information Engineering, Proceedings., 5559726, ICEIE 2010 - 2010 International Conference on Electronics and Information Engineering, Proceedings, vol. 2, pp. V2123-V2127, 2010 International Conference on Electronics and Information Engineering, ICEIE 2010, Kyoto, Japan, 1/08/10. https://doi.org/10.1109/ICEIE.2010.5559726

ARP: An adaptive replication policy in tiled chip multiprocessor. / Tang, Yixuan; Wu, Junmin; Sui, Xiufeng et al.
ICEIE 2010 - 2010 International Conference on Electronics and Information Engineering, Proceedings. 2010. p. V2123-V2127 5559726 (ICEIE 2010 - 2010 International Conference on Electronics and Information Engineering, Proceedings; Vol. 2).

TY - GEN

T1 - ARP: An adaptive replication policy in tiled chip multiprocessor

T2 - 2010 International Conference on Electronics and Information Engineering, ICEIE 2010

AU - Tang, Yixuan

AU - Wu, Junmin

AU - Sui, Xiufeng

AU - Chen, Guoliang

AU - Yin, Wei

AU - Jin, Yingqi

PY - 2010

Y1 - 2010

KW - NOC

KW - Replication

KW - Tiled chip multiprocessor

UR - http://www.scopus.com/inward/record.url?scp=78049339509&partnerID=8YFLogxK

U2 - 10.1109/ICEIE.2010.5559726

DO - 10.1109/ICEIE.2010.5559726

M3 - Conference contribution

AN - SCOPUS:78049339509

SN - 9781424476800

T3 - ICEIE 2010 - 2010 International Conference on Electronics and Information Engineering, Proceedings

SP - V2123

EP - V2127

BT - ICEIE 2010 - 2010 International Conference on Electronics and Information Engineering, Proceedings

Y2 - 1 August 2010 through 3 August 2010

ER -

Tang Y, Wu J, Sui X, Chen G, Yin W, Jin Y. ARP: An adaptive replication policy in tiled chip multiprocessor. In ICEIE 2010 - 2010 International Conference on Electronics and Information Engineering, Proceedings. 2010. p. V2123-V2127. 5559726. (ICEIE 2010 - 2010 International Conference on Electronics and Information Engineering, Proceedings). doi: 10.1109/ICEIE.2010.5559726
