TY - JOUR
T1 - Speed-Oriented Architecture for Binary Field Point Multiplication on Elliptic Curves
AU - Li, Jiakun
AU - Zhong, Shun'An
AU - Li, Zhe
AU - Cao, Shan
AU - Zhang, Jingqi
AU - Wang, Weijiang
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2019
Y1 - 2019
AB - This paper introduces a novel speed-oriented architecture for point multiplication in elliptic curve cryptography. A balanced full-precision multiplier is proposed to shorten latency, and a new modular inversion architecture is integrated to reduce the total number of clock cycles in point multiplication. A modified Montgomery Ladder algorithm that takes three clock cycles to process one input bit is proposed to best utilize hardware resources. A mixed-pipeline technique is used to balance the delay of different paths and increase the clock frequency. The proposed architecture is implemented over GF(2^163) and GF(2^571) on Xilinx Virtex-5 and Virtex-7 FPGAs. For GF(2^163), the design reaches 211 MHz with 29309 LUTs and 547 clock cycles, or 2.6 μs latency, on Virtex-5, and 320.5 MHz with 28911 LUTs and 1.7 μs latency on Virtex-7. For GF(2^571), the design reaches 186 MHz with 286400 LUTs and 1813 clock cycles, or 9.6 μs latency, on Virtex-5, and 267 MHz with 290001 LUTs and 6.79 μs latency on Virtex-7. The proposed design achieves the lowest latency among all existing works, and its performance is also among the best. Furthermore, it is demonstrated that the proposed architecture maintains a high speed for larger binary fields, making it more suitable for large-bit-length platforms with a higher security level. Since the multiplier and its segments operate at different bit lengths and correspond to different fields, the proposed architecture can also be upgraded to a reconfigurable design that supports multiple-field point multiplication in the future.
KW - ECC (elliptic curve cryptography)
KW - FPGA implementation
KW - ITA (Itoh Tsujii algorithm)
KW - Montgomery Ladder
KW - point multiplication
UR - http://www.scopus.com/inward/record.url?scp=85064595996&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2019.2903170
DO - 10.1109/ACCESS.2019.2903170
M3 - Article
AN - SCOPUS:85064595996
SN - 2169-3536
VL - 7
SP - 32048
EP - 32060
JO - IEEE Access
JF - IEEE Access
M1 - 8660394
ER -