TY - GEN
T1 - PCAsim
T2 - 2010 International Conference on Computer Design and Applications, ICCDA 2010
AU - Zhu, Xiaodong
AU - Wu, Junmin
AU - Sui, Xiufeng
AU - Yin, Wei
AU - Wang, Qingbo
AU - Gong, Zhe
PY - 2010
Y1 - 2010
N2 - With the approach of the multi-core era, chip multiprocessor (CMP) architectures pose a challenge for efficient simulation, given the need for a detailed simulator that runs realistic workloads. Parallelization, which exploits the inherent parallelism in CMP simulation, is a common way to reduce simulation time. We design and implement PCAsim, a parallel, cycle-accurate, user-level CMP simulator that runs on a shared-memory platform. The simulator is parallelized with POSIX threads according to the target system architecture. Each core thread and the manager thread are synchronized with the slack mechanism [11]. However, we find that the slack mechanism cannot protect the simulator against time violations among events generated by network activity and the cache coherence protocol. To solve this problem, we propose an effective synchronization method called the pending barrier. This method strengthens the traditional conservative parallel synchronization mechanism and improves simulation accuracy with negligible performance degradation. Besides synchronization, we encountered many other troublesome issues in implementing PCAsim. This paper describes some common ones and illustrates how we address them. The evaluations show that PCAsim achieves reasonable speedup and scalability.
KW - Architectural simulation
KW - Chip multiprocessors
KW - Parallel simulation
UR - http://www.scopus.com/inward/record.url?scp=77955917222&partnerID=8YFLogxK
U2 - 10.1109/ICCDA.2010.5540881
DO - 10.1109/ICCDA.2010.5540881
M3 - Conference contribution
AN - SCOPUS:77955917222
SN - 9781424471638
T3 - 2010 International Conference on Computer Design and Applications, ICCDA 2010
SP - V1597-V1601
BT - 2010 International Conference on Computer Design and Applications, ICCDA 2010
Y2 - 25 June 2010 through 27 June 2010
ER -