Abstract
Parallelization is an effective way to accelerate simulators of multi-core, multi-processor and cluster architectures. Nevertheless, frequent synchronization can significantly hinder the performance of a parallel simulator. A common way to reduce synchronization cost is to relax synchronization by lengthening the synchronous steps. As a side effect, however, simulation accuracy deteriorates considerably. By analyzing the factors that contribute to causality errors in lax synchronization, we observe that a coherent speed across all nodes is critical to achieving high accuracy. To this end, we propose wall-clock-based synchronization (WBSP), a novel mechanism that uses wall-clock time to maintain a coherent running speed across nodes by periodically synchronizing simulated clocks with the wall clock within each lax step. The proposed method incurs only a modest precision loss while achieving performance close to that of lax synchronization. We implement WBSP in a many-core parallel simulator and a cluster parallel simulator. Experimental results show that, at a scale of 32 host threads, it improves the performance of the many-core simulator by 4.3× on average with less than a 5.5 percent accuracy loss compared to the conservative mechanism. On the cluster simulator with 64 nodes, the proposed scheme achieves an 8.3× speedup over the conservative mechanism while yielding only a 1.7 percent accuracy loss. Moreover, WBSP outperforms the recently proposed adaptive mechanism on simulations that exhibit heavy traffic.
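The idea of pacing simulated clocks against the wall clock inside a lax step can be illustrated with a toy model. The sketch below is not the paper's implementation: the function `max_skew`, its parameters, and the choice of pacing every node to the slowest node's speed are assumptions made purely for illustration. It measures the largest gap between the simulated clocks of any two nodes during one lax step, with and without pacing.

```python
def max_skew(speeds, step_len, rate=None, dt=0.001):
    """Largest simulated-clock gap between any two nodes during one lax step.

    speeds   -- native speed of each node, in simulated units per wall-clock
                second (hypothetical toy parameter)
    step_len -- length of the lax step, in simulated units
    rate     -- target pace in simulated units per wall-clock second;
                None disables wall-clock pacing (plain lax synchronization)
    dt       -- wall-clock interval between pacing checks, in seconds
    """
    clocks = [0.0] * len(speeds)
    wall = 0.0
    worst = 0.0
    # The step ends only when every node has reached the step boundary.
    while min(clocks) < step_len:
        wall += dt
        # No node may pass the step boundary; with pacing enabled, no node
        # may run ahead of the wall-clock target either.
        limit = step_len if rate is None else min(step_len, rate * wall)
        for i, v in enumerate(speeds):
            clocks[i] = min(limit, clocks[i] + v * dt)
        worst = max(worst, max(clocks) - min(clocks))
    return worst


if __name__ == "__main__":
    speeds = [1.0, 2.0]  # one slow node, one fast node (assumed values)
    print("lax only:", max_skew(speeds, step_len=1.0))
    print("paced   :", max_skew(speeds, step_len=1.0, rate=min(speeds)))
```

In this toy setting the unpaced fast node runs far ahead of the slow one before the step boundary stops it, whereas pacing keeps the two simulated clocks close together throughout the step. How the actual mechanism selects its pace and check interval is not stated in the abstract.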
Original language | English |
---|---|
Article number | 7115101 |
Pages (from-to) | 992-1005 |
Number of pages | 14 |
Journal | IEEE Transactions on Computers |
Volume | 65 |
Issue number | 3 |
Publication status | Published - 1 Mar 2016 |
Externally published | Yes |
Keywords
- cluster system
- full system simulation
- lax synchronization
- parallel simulation