SPA-GPT: General Pulse Tailor for Simple Power Analysis Based on Reinforcement Learning

Ziyu Wang; Yaoling Ding; An Wang; Yuwei Zhang; Congming Wei; Shaofei Sun; Liehuang Zhu

doi:10.46586/tches.v2024.i4.40-83

Ziyu Wang, Yaoling Ding, An Wang^*, Yuwei Zhang, Congming Wei, Shaofei Sun, Liehuang Zhu

^*Corresponding author for this work

School of Cyberspace Science and Technology

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

In side-channel analysis of public-key algorithms, we usually recover secret information such as private keys by classifying basic operations (for example, modular squaring or modular multiplication) according to the differences in the power traces they produce. The more accurate the segmentation of power traces, the higher the efficiency of their classification. Two segmentation methods are commonly used: equidistant segmentation, which requires a fixed number of basic operations and similar trace lengths for each type of operation and therefore applies to only a limited range of scenarios; and peak-based segmentation, which relies on personal experience to configure parameters and therefore lacks flexibility and generality. In this paper, we propose an automated trace segmentation method based on reinforcement learning that is applicable to a wide range of common implementations of public-key algorithms. This work marks the first use of reinforcement learning, which does not need labels, in trace processing for side-channel analysis. Our method generalizes well to traces with varying segment lengths and differing peak heights. By using a Deep Q-Network algorithm optimized with prioritized experience replay, we reduce the number of required parameters to one: the key length. We also employ several techniques to improve segmentation, such as a clustering algorithm and envelope-based feature enhancement. We validate the new method in nine scenarios involving hardware and software implementations of different public-key algorithms executed on diverse platforms such as microcontrollers, SAKURA-G, and smart cards. Notably, one of these implementations is protected by time randomization countermeasures. Experimental results show that a basic version of our method can correctly segment most traces. The enhanced version can reconstruct the sequence of operations during trace segmentation, achieving 100% accuracy for the majority of the traces. For traces that cannot be entirely restored, we use the reward values of reinforcement learning to correct errors and achieve full recovery. We also conducted comparative experiments with supervised seq2seq methods, which showed that our approach achieves 42% higher accuracy in operation recovery and 96% better time efficiency. In addition, we applied our method to the post-quantum cryptographic scheme Kyber and successfully recovered an intermediate value crucial for deriving the secret key. The power traces collected from these devices have also been uploaded as open databases, which are available to researchers working on public-key algorithms for conducting related experiments or verifying our method.
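
As a rough illustration of the two baseline methods the abstract contrasts, the following Python sketch segments a toy trace both ways. It is not the paper's reinforcement-learning method. The function names, the synthetic trace, and the sample counts per operation are all assumptions made up for this example.

```python
def equidistant_boundaries(trace_len, n_ops):
    """Start index of each segment when the trace is cut into n_ops equal parts."""
    return [round(i * trace_len / n_ops) for i in range(n_ops)]


def peak_boundaries(trace, threshold, min_distance):
    """Start index of each segment, taken as samples at or above a hand-picked threshold.

    A peak closer than min_distance to the previously accepted one is skipped.
    """
    starts = []
    for i, sample in enumerate(trace):
        if sample >= threshold and (not starts or i - starts[-1] >= min_distance):
            starts.append(i)
    return starts


def synthetic_trace(op_lengths, spike=1.0, baseline=0.1):
    """Toy trace: each operation is one spike followed by a flat baseline."""
    trace = []
    for length in op_lengths:
        trace.extend([spike] + [baseline] * (length - 1))
    return trace


# Hypothetical operation sequence: "squares" last 4 samples, "multiplies" last 6.
op_lengths = [4, 6, 4, 4, 6]
trace = synthetic_trace(op_lengths)
true_starts = [0, 4, 10, 14, 18]

# Equal-width cuts drift away from the true boundaries when lengths differ.
equal_cut = equidistant_boundaries(len(trace), len(op_lengths))

# Peak-based cuts match here, but only because the parameters were tuned by hand.
peak_cut = peak_boundaries(trace, threshold=0.5, min_distance=3)

# A slightly different min_distance silently drops the short operations.
bad_peak_cut = peak_boundaries(trace, threshold=0.5, min_distance=5)
```

In this toy setting the equidistant cut misplaces boundaries because the operations have unequal lengths, and the peak-based cut is correct only for well-chosen `threshold` and `min_distance` values, which is the parameter sensitivity the abstract describes.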

Original language	English
Pages (from-to)	40-83
Number of pages	44
Journal	IACR Transactions on Cryptographic Hardware and Embedded Systems
Volume	2024
Issue number	4
DOIs	https://doi.org/10.46586/tches.v2024.i4.40-83
Publication status	Published - 5 Sept 2024

Keywords

Deep Q-Network
Kyber
Power Trace Segmentation
Public-key Algorithms
Reinforcement Learning
Side-channel Analysis

Access to Document

10.46586/tches.v2024.i4.40-83

Cite this

@article{913744a0a2ed402c87dfccc7670d185b,

title = "SPA-GPT: General Pulse Tailor for Simple Power Analysis Based on Reinforcement Learning",

keywords = "Deep Q-Network, Kyber, Power Trace Segmentation, Public-key Algorithms, Reinforcement Learning, Side-channel Analysis",

author = "Ziyu Wang and Yaoling Ding and An Wang and Yuwei Zhang and Congming Wei and Shaofei Sun and Liehuang Zhu",

year = "2024",

month = sep,

day = "5",

doi = "10.46586/tches.v2024.i4.40-83",

language = "English",

volume = "2024",

pages = "40--83",

journal = "IACR Transactions on Cryptographic Hardware and Embedded Systems",

issn = "2569-2925",

publisher = "Ruhr-University of Bochum",

number = "4",

}

TY - JOUR

T1 - SPA-GPT

T2 - General Pulse Tailor for Simple Power Analysis Based on Reinforcement Learning

AU - Wang, Ziyu

AU - Ding, Yaoling

AU - Wang, An

AU - Zhang, Yuwei

AU - Wei, Congming

AU - Sun, Shaofei

AU - Zhu, Liehuang

PY - 2024/9/5

Y1 - 2024/9/5

KW - Deep Q-Network

KW - Kyber

KW - Power Trace Segmentation

KW - Public-key Algorithms

KW - Reinforcement Learning

KW - Side-channel Analysis

UR - http://www.scopus.com/inward/record.url?scp=85204339108&partnerID=8YFLogxK

U2 - 10.46586/tches.v2024.i4.40-83

DO - 10.46586/tches.v2024.i4.40-83

M3 - Article

AN - SCOPUS:85204339108

SN - 2569-2925

VL - 2024

SP - 40

EP - 83

JO - IACR Transactions on Cryptographic Hardware and Embedded Systems

JF - IACR Transactions on Cryptographic Hardware and Embedded Systems

IS - 4

ER -
