NxtSPR: A deadlock-free shortest path routing dedicated to relaying for Triplet-Based many-core Architecture

Chunfeng Li; Karim Soliman; Fei Yin; Jin Wei; Feng Shi

doi:10.1016/j.parco.2024.103094

NxtSPR: A deadlock-free shortest path routing dedicated to relaying for Triplet-Based many-core Architecture

Chunfeng Li, Karim Soliman, Fei Yin, Jin Wei, Feng Shi^*

^*Corresponding author for this work

School of Computer Science and Technology

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Deadlock-free routing is a significant challenge in Network-on-Chip (NoC) design as it affects the network's latency, power consumption, and load balance, impacting the performance of multi-processor systems-on-chip. However, achieving deadlock-free routing will routinely result in expensive overhead as previous solutions either sacrifice performance or power efficiency to proactively avoid deadlocks or impose high hardware complexity to resolve deadlocks when they occur reactively. Utilizing the various characteristics of NoC to implement deadlock-free routing can be significantly more cost-effective with less impact on performance. This paper proposes a relay routing algorithm (NxtSPR) with a shortest path property and a deadlock prevention mechanism based on a synchronized Hamiltonian ring. The proposal is based on an in-depth study of the characteristics of a Triplet-Based many-core Architecture (TriBA) NoC. We establish various important topology-related theories and perform a formal verification (proof-based) for them. By utilizing the critical subgraph and apex of TriBA, NxtSPR can pre-calculate downstream nodes forwarding ports for packets by using a concise judgment strategy. This significantly reduces the computational overhead required for data transmission while optimizing the pipeline design of routers to decrease packet transmission latency and power consumption compared to other TriBA routing algorithms. We group the data transmissions according to the levels of maximum Hamiltonian edges a packet will traverse during its transmission life cycle. Independent data transmissions between groups can avoid mutual interference and resource competition, eliminating potential deadlocks. Gem5 simulation results show that, under the synthetic traffic patterns, compared to the representative (Table) and up-to-date (SPR4T) routing algorithms, NxtSPR achieves a 20.19%, 14.76%, and 5.54%, 4.66% reduction in average packet latency and per-packet power consumption, respectively. Moreover, it has an average of 18.50% and 4.34% improvement in throughput, as compared to them. PARSEC benchmark results show that NxtSPR reduces application runtime by up to a maximum of 22.30% and 12.82% compared to Table and SPR4T, and running the same applications with TriBA results in a maximum runtime reduction of 10.77% compared to 2D-Mesh.

Original language	English
Article number	103094
Journal	Parallel Computing
Volume	121
DOIs	https://doi.org/10.1016/j.parco.2024.103094
Publication status	Published - Sept 2024

Keywords

Chip multi-processor
Deadlock-free
Hamiltonian ring
Network-on-Chip (NoC)
Routing algorithm
Triplet-Based many-core Architecture (TriBA)

Access to Document

10.1016/j.parco.2024.103094

Cite this

@article{ddcdad52414847068a3fab5a1507c434,

title = "NxtSPR: A deadlock-free shortest path routing dedicated to relaying for Triplet-Based many-core Architecture",

abstract = "Deadlock-free routing is a significant challenge in Network-on-Chip (NoC) design as it affects the network's latency, power consumption, and load balance, impacting the performance of multi-processor systems-on-chip. However, achieving deadlock-free routing will routinely result in expensive overhead as previous solutions either sacrifice performance or power efficiency to proactively avoid deadlocks or impose high hardware complexity to resolve deadlocks when they occur reactively. Utilizing the various characteristics of NoC to implement deadlock-free routing can be significantly more cost-effective with less impact on performance. This paper proposes a relay routing algorithm (NxtSPR) with a shortest path property and a deadlock prevention mechanism based on a synchronized Hamiltonian ring. The proposal is based on an in-depth study of the characteristics of a Triplet-Based many-core Architecture (TriBA) NoC. We establish various important topology-related theories and perform a formal verification (proof-based) for them. By utilizing the critical subgraph and apex of TriBA, NxtSPR can pre-calculate downstream nodes forwarding ports for packets by using a concise judgment strategy. This significantly reduces the computational overhead required for data transmission while optimizing the pipeline design of routers to decrease packet transmission latency and power consumption compared to other TriBA routing algorithms. We group the data transmissions according to the levels of maximum Hamiltonian edges a packet will traverse during its transmission life cycle. Independent data transmissions between groups can avoid mutual interference and resource competition, eliminating potential deadlocks. Gem5 simulation results show that, under the synthetic traffic patterns, compared to the representative (Table) and up-to-date (SPR4T) routing algorithms, NxtSPR achieves a 20.19%, 14.76%, and 5.54%, 4.66% reduction in average packet latency and per-packet power consumption, respectively. Moreover, it has an average of 18.50% and 4.34% improvement in throughput, as compared to them. PARSEC benchmark results show that NxtSPR reduces application runtime by up to a maximum of 22.30% and 12.82% compared to Table and SPR4T, and running the same applications with TriBA results in a maximum runtime reduction of 10.77% compared to 2D-Mesh.",

keywords = "Chip multi-processor, Deadlock-free, Hamiltonian ring, Network-on-Chip (NoC), Routing algorithm, Triplet-Based many-core Architecture (TriBA)",

author = "Chunfeng Li and Karim Soliman and Fei Yin and Jin Wei and Feng Shi",

note = "Publisher Copyright: {\textcopyright} 2024 Elsevier B.V.",

year = "2024",

month = sep,

doi = "10.1016/j.parco.2024.103094",

language = "English",

volume = "121",

journal = "Parallel Computing",

issn = "0167-8191",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - NxtSPR

T2 - A deadlock-free shortest path routing dedicated to relaying for Triplet-Based many-core Architecture

AU - Li, Chunfeng

AU - Soliman, Karim

AU - Yin, Fei

AU - Wei, Jin

AU - Shi, Feng

PY - 2024/9

Y1 - 2024/9

N2 - Deadlock-free routing is a significant challenge in Network-on-Chip (NoC) design as it affects the network's latency, power consumption, and load balance, impacting the performance of multi-processor systems-on-chip. However, achieving deadlock-free routing will routinely result in expensive overhead as previous solutions either sacrifice performance or power efficiency to proactively avoid deadlocks or impose high hardware complexity to resolve deadlocks when they occur reactively. Utilizing the various characteristics of NoC to implement deadlock-free routing can be significantly more cost-effective with less impact on performance. This paper proposes a relay routing algorithm (NxtSPR) with a shortest path property and a deadlock prevention mechanism based on a synchronized Hamiltonian ring. The proposal is based on an in-depth study of the characteristics of a Triplet-Based many-core Architecture (TriBA) NoC. We establish various important topology-related theories and perform a formal verification (proof-based) for them. By utilizing the critical subgraph and apex of TriBA, NxtSPR can pre-calculate downstream nodes forwarding ports for packets by using a concise judgment strategy. This significantly reduces the computational overhead required for data transmission while optimizing the pipeline design of routers to decrease packet transmission latency and power consumption compared to other TriBA routing algorithms. We group the data transmissions according to the levels of maximum Hamiltonian edges a packet will traverse during its transmission life cycle. Independent data transmissions between groups can avoid mutual interference and resource competition, eliminating potential deadlocks. Gem5 simulation results show that, under the synthetic traffic patterns, compared to the representative (Table) and up-to-date (SPR4T) routing algorithms, NxtSPR achieves a 20.19%, 14.76%, and 5.54%, 4.66% reduction in average packet latency and per-packet power consumption, respectively. Moreover, it has an average of 18.50% and 4.34% improvement in throughput, as compared to them. PARSEC benchmark results show that NxtSPR reduces application runtime by up to a maximum of 22.30% and 12.82% compared to Table and SPR4T, and running the same applications with TriBA results in a maximum runtime reduction of 10.77% compared to 2D-Mesh.

AB - Deadlock-free routing is a significant challenge in Network-on-Chip (NoC) design as it affects the network's latency, power consumption, and load balance, impacting the performance of multi-processor systems-on-chip. However, achieving deadlock-free routing will routinely result in expensive overhead as previous solutions either sacrifice performance or power efficiency to proactively avoid deadlocks or impose high hardware complexity to resolve deadlocks when they occur reactively. Utilizing the various characteristics of NoC to implement deadlock-free routing can be significantly more cost-effective with less impact on performance. This paper proposes a relay routing algorithm (NxtSPR) with a shortest path property and a deadlock prevention mechanism based on a synchronized Hamiltonian ring. The proposal is based on an in-depth study of the characteristics of a Triplet-Based many-core Architecture (TriBA) NoC. We establish various important topology-related theories and perform a formal verification (proof-based) for them. By utilizing the critical subgraph and apex of TriBA, NxtSPR can pre-calculate downstream nodes forwarding ports for packets by using a concise judgment strategy. This significantly reduces the computational overhead required for data transmission while optimizing the pipeline design of routers to decrease packet transmission latency and power consumption compared to other TriBA routing algorithms. We group the data transmissions according to the levels of maximum Hamiltonian edges a packet will traverse during its transmission life cycle. Independent data transmissions between groups can avoid mutual interference and resource competition, eliminating potential deadlocks. Gem5 simulation results show that, under the synthetic traffic patterns, compared to the representative (Table) and up-to-date (SPR4T) routing algorithms, NxtSPR achieves a 20.19%, 14.76%, and 5.54%, 4.66% reduction in average packet latency and per-packet power consumption, respectively. Moreover, it has an average of 18.50% and 4.34% improvement in throughput, as compared to them. PARSEC benchmark results show that NxtSPR reduces application runtime by up to a maximum of 22.30% and 12.82% compared to Table and SPR4T, and running the same applications with TriBA results in a maximum runtime reduction of 10.77% compared to 2D-Mesh.

KW - Chip multi-processor

KW - Deadlock-free

KW - Hamiltonian ring

KW - Network-on-Chip (NoC)

KW - Routing algorithm

KW - Triplet-Based many-core Architecture (TriBA)

UR - http://www.scopus.com/inward/record.url?scp=85200220398&partnerID=8YFLogxK

U2 - 10.1016/j.parco.2024.103094

DO - 10.1016/j.parco.2024.103094

M3 - Article

AN - SCOPUS:85200220398

SN - 0167-8191

VL - 121

JO - Parallel Computing

JF - Parallel Computing

M1 - 103094

ER -

NxtSPR: A deadlock-free shortest path routing dedicated to relaying for Triplet-Based many-core Architecture

Abstract

Keywords

Access to Document

Other files and links

Fingerprint

Cite this