http://dx.doi.org/10.13089/JKIISC.2021.31.3.463

Optimization Study of Toom-Cook Algorithm in NIST PQC SABER Utilizing ARM/NEON Processor  

Song, JinGyo (Department of Financial Information Security, Kookmin University)
Kim, YoungBeom (Department of Financial Information Security, Kookmin University)
Seo, Seog Chung (Department of Financial Information Security, Kookmin University)
Abstract
Since 2016, the National Institute of Standards and Technology (NIST) has been running a post-quantum cryptography standardization project in preparation for the quantum computing era. The third round is currently in progress, and most of the finalists (5 of 7) are lattice-based. Lattice-based post-quantum cryptography is considered suitable even for resource-constrained embedded environments because it offers efficient computation and reasonable key sizes. Among these schemes, SABER KEM uses an efficient modulus and the Toom-Cook algorithm to handle polynomial multiplication, a computation-intensive operation. In this paper, we present an optimized implementation of the evaluation and interpolation steps of SABER's Toom-Cook algorithm using ARM/NEON on the ARMv8-A platform. For the evaluation step, we propose an efficient method of interleaving ARM and NEON instructions; for the interpolation step, we introduce an optimized implementation methodology applicable to a variety of embedded environments. As a result, the proposed implementation is 3.5 times faster in the evaluation step and 5 times faster in the interpolation step than the existing reference implementation.
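As a rough illustration of the evaluation and interpolation steps named in the abstract, the following Python sketch multiplies two polynomials with a 4-way Toom-Cook split. It is not the paper's ARM/NEON implementation: the evaluation points, the function names, and the use of exact rational arithmetic for interpolation are assumptions chosen for readability, whereas an optimized implementation would work with fixed-width integers and vector instructions. The helper reduce_negacyclic assumes SABER's ring Z_q[x]/(x^n + 1) with q = 2^13.

```python
from fractions import Fraction

# Seven evaluation points (an illustrative choice): the product of two
# 4-limb polynomials has 7 limbs, so 7 points determine it uniquely.
# "inf" stands for the point at infinity (the leading limb).
POINTS = [0, 1, -1, 2, -2, 3, "inf"]


def schoolbook(a, b):
    """Plain O(n^2) polynomial multiplication over the integers."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def evaluate(limbs, point):
    """Evaluation step: combine the limbs at one point using Horner's rule."""
    if point == "inf":
        return list(limbs[-1])
    acc = [0] * len(limbs[0])
    for limb in reversed(limbs):
        acc = [point * x + c for x, c in zip(acc, limb)]
    return acc


def interpolate(values):
    """Interpolation step: recover the 7 product limbs from 7 point values.

    Solves the Vandermonde-style system by Gauss-Jordan elimination over
    exact fractions (an assumption made for clarity, not for speed).
    """
    n = len(POINTS)
    rows = []
    for p, v in zip(POINTS, values):
        if p == "inf":
            coeffs = [Fraction(0)] * (n - 1) + [Fraction(1)]
        else:
            coeffs = [Fraction(p) ** j for j in range(n)]
        rows.append(coeffs + [Fraction(x) for x in v])
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = 1 / rows[col][col]
        rows[col] = [x * inv for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[col])]
    return [row[n:] for row in rows]


def toom4_multiply(a, b):
    """Multiply two equal-length polynomials whose length is a multiple of 4."""
    assert len(a) == len(b) and len(a) % 4 == 0
    k = len(a) // 4
    a_limbs = [a[i * k:(i + 1) * k] for i in range(4)]
    b_limbs = [b[i * k:(i + 1) * k] for i in range(4)]
    # Evaluate both operands at every point, then multiply pointwise.
    values = [schoolbook(evaluate(a_limbs, p), evaluate(b_limbs, p))
              for p in POINTS]
    # Interpolate and add the overlapping limbs back together.
    out = [Fraction(0)] * (2 * len(a) - 1)
    for i, limb in enumerate(interpolate(values)):
        for j, c in enumerate(limb):
            out[i * k + j] += c
    assert all(c.denominator == 1 for c in out)
    return [int(c) for c in out]


def reduce_negacyclic(c, n, q):
    """Reduce a product of length at most 2n modulo x^n + 1 and modulo q."""
    out = [0] * n
    for i, x in enumerate(c):
        if i < n:
            out[i] += x
        else:
            out[i - n] -= x
    return [v % q for v in out]


if __name__ == "__main__":
    a = list(range(1, 17))
    b = list(range(16, 0, -1))
    print(toom4_multiply(a, b) == schoolbook(a, b))
```

The sketch makes the cost structure visible: one length-n multiplication is replaced by seven multiplications of length n/4, at the price of the additions, small-constant multiplications, and exact divisions in the evaluation and interpolation steps, which are the steps the paper targets.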
Keywords
SABER; Toom-Cook; ARM/NEON; Parallel Implementation; Internet of Things;