TY - JOUR
T1 - WBSP
T2 - A novel synchronization mechanism for architecture parallel simulation
AU - Wu, Junmin
AU - Zhu, Xiaodong
AU - Li, Tao
AU - Sui, Xiufeng
N1 - Publisher Copyright:
© 1968-2012 IEEE.
PY - 2016/3/1
Y1 - 2016/3/1
N2 - Parallelization is an efficient approach to accelerate multi-core, multi-processor and cluster architecture simulators. Nevertheless, frequent synchronization can significantly hinder the performance of a parallel simulator. A common practice in alleviating synchronization cost is to relax synchronization using lengthened synchronous steps. However, as a side effect, simulation accuracy deteriorates considerably. Through analyzing various factors contributing to the causality error in lax synchronization, we observe that a coherent speed across all nodes is critical to achieve high accuracy. To this end, we propose wall-clock based synchronization (WBSP), a novel mechanism that uses wall-clock time to maintain a coherent running speed across the different nodes by periodically synchronizing simulated clocks with the wall clock within each lax step. Our proposed method only results in a modest precision loss while achieving performance close to lax synchronization. We implement WBSP in a many-core parallel simulator and a cluster parallel simulator. Experimental results show that at a scale of 32-host threads, it improves the performance of the many-core simulator by 4.3× on average with less than a 5.5 percent accuracy loss compared to the conservative mechanism. On the cluster simulator with 64 nodes, our proposed scheme achieves an 8.3× speedup compared to the conservative mechanism while yielding only a 1.7 percent accuracy loss. Meanwhile, WBSP outperforms the recent proposed adaptive mechanism on simulations that exhibit heavy traffic.
AB - Parallelization is an efficient approach to accelerate multi-core, multi-processor and cluster architecture simulators. Nevertheless, frequent synchronization can significantly hinder the performance of a parallel simulator. A common practice in alleviating synchronization cost is to relax synchronization using lengthened synchronous steps. However, as a side effect, simulation accuracy deteriorates considerably. Through analyzing various factors contributing to the causality error in lax synchronization, we observe that a coherent speed across all nodes is critical to achieve high accuracy. To this end, we propose wall-clock based synchronization (WBSP), a novel mechanism that uses wall-clock time to maintain a coherent running speed across the different nodes by periodically synchronizing simulated clocks with the wall clock within each lax step. Our proposed method only results in a modest precision loss while achieving performance close to lax synchronization. We implement WBSP in a many-core parallel simulator and a cluster parallel simulator. Experimental results show that at a scale of 32-host threads, it improves the performance of the many-core simulator by 4.3× on average with less than a 5.5 percent accuracy loss compared to the conservative mechanism. On the cluster simulator with 64 nodes, our proposed scheme achieves an 8.3× speedup compared to the conservative mechanism while yielding only a 1.7 percent accuracy loss. Meanwhile, WBSP outperforms the recent proposed adaptive mechanism on simulations that exhibit heavy traffic.
KW - cluster system
KW - full system simulation
KW - lax synchronization
KW - parallel simulation
UR - http://www.scopus.com/inward/record.url?scp=84962054034&partnerID=8YFLogxK
U2 - 10.1109/TC.2015.2439253
DO - 10.1109/TC.2015.2439253
M3 - Article
AN - SCOPUS:84962054034
SN - 0018-9340
VL - 65
SP - 992
EP - 1005
JO - IEEE Transactions on Computers
JF - IEEE Transactions on Computers
IS - 3
M1 - 7115101
ER -