請用此 Handle URI 來引用此文件:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85677完整後設資料紀錄
| DC 欄位 | 值 | 語言 |
|---|---|---|
| dc.contributor.advisor | 李彥寰(Yen-Huan Li) | |
| dc.contributor.author | Vincent Bert Hwang | en |
| dc.contributor.author | 黃柏文 | zh_TW |
| dc.date.accessioned | 2023-03-19T23:21:17Z | - |
| dc.date.copyright | 2022-07-05 | |
| dc.date.issued | 2022 | |
| dc.date.submitted | 2022-06-21 | |
| dc.identifier.citation | [ABB+20] Nicolas Aragon, Paulo Barreto, Slim Bettaieb, Loic Bidoux, Olivier Blazy, Jean-Christophe Deneuville, Phillipe Gaborit, Shay Gueron, Tim Guneysu, Carlos Aguilar Melchor, Rafael Misoczki, Edoardo Persichetti, Nicolas Sendrier, Jean-Pierre Tillich, Gilles Zemor, Valentin Vasseur, Santosh Ghosh, and Jan Richter-Brokmann. BIKE. Submission to the NIST Post-Quantum Cryptography Standardization Project [NIS], 2020. https://bikesuite.org/. [ABC+20] Martin R. Albrecht, Daniel J. Bernstein, Tung Chou, Carlos Cid, Jan Gilcher, Tanja Lange, Varun Maram, Ingo von Maurich, Rafael Misoczki, Ruben Niederhagen, Kenneth G. Paterson, Edoardo Persichetti, Christiane Peters, Peter Schwabe, Nicolas Sendrier, Jakub Szefer, Cen Jung Tjhai, Martin Tomlinson, and Wen Wang. Classic McEliece. Submission to the NIST Post-Quantum Cryptography Standardization Project [NIS], 2020. https://classic.mceliece.org/. [ABCG20] Erdem Alkim, Yusuf Alper Bilgin, Murat Cenk, and François Gérard. Cortex-M4 optimizations for {R, M} LWE schemes. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2020(3):336–357, 2020. https://tches.iacr.org/index.php/TCHES/article/view/8593. [ABD+20a] Roberto Avanzi, Joppe Bos, Léo Ducas, Eike Kiltz, Tancr`ede Lepoint, Vadim Lyubashevsky, John M. Schanck, Peter Schwabe, Gregor Seiler, and Damien Stehlé. CRYSTALS–Dilithium. Submission to the NIST Post-Quantum Cryptography Standardization Project [NIS], 2020. https://pq-crystals.org/dilithium/. [ABD+20b] Roberto Avanzi, Joppe Bos, Léo Ducas, Eike Kiltz, Tancr`ede Lepoint, Vadim Lyubashevsky, John M. Schanck, Peter Schwabe, Gregor Seiler, and Damien Stehlé. CRYSTALS–Kyber. Submission to the NIST Post-Quantum Cryptography Standardization Project [NIS], 2020. https://pq-crystals.org/kyber/. [ACC+21] Erdem Alkim, Dean Yun-Li Cheng, Chi-Ming Marvin Chung, Hülya Evkan, Leo Wei-Lun Huang, Vincent Hwang, Ching-Lin Trista Li, Rube nNiederhagen, Cheng-Jhih Shih, Julian Wälde, and Bo-Yin Yang. Polynomial Multiplication in NTRU Prime Comparison of Optimization Strategies on Cortex-M4. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2021(1):217–238, 2021. https://tches.iacr.org/ index.php/TCHES/article/view/8733. [ACC+22] Amin Abdulrahman, Jiun-Peng Chen, Yu-Jia Chen, Vincent Hwang, Matthias J. Kannwischer, and Bo-Yin Yang. Multi-moduli NTTs for Saber on Cortex-M3 and Cortex-M4. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2022(1):127–151, 2022. https://tches.iacr.org/index.php/TCHES/article/view/9292. [ACD74] Thomas L Adam, K. Mani Chandy, and JR Dickson. A Comparison of List Schedules for Parallel Processing Systems. Communications of the ACM, 17(12):685–690, 1974. [AHKS22] Amin Abdulrahman, Vincent Hwang, Matthias J. Kannwischer, and Dann Sprenkels. Faster Kyber and Dilithium on the Cortex-M4. 2022. To appear at ACNS 2022, available as https://eprint.iacr.org/2022/ 112. [AP21] Alexandre Adomnicai and Thomas Peyrin. Fixslicing AES-like Ciphers. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2021(1):402–425, 2021. https://tches.iacr.org/index.php/TCHES/article/view/8739. [ARM10a] ARM. Cortex-M3 Technical Reference Manual, 2010. https:// developer.arm.com/documentation/ddi0337/h. [ARM10b] ARM. Cortex-M4 Technical Reference Manual, 2010. https:// developer.arm.com/documentation/ddi0439/b/. [ARM15] ARM. Cortex-A72 Software Optimization Guide, 2015. https:// developer.arm.com/documentation/uan0016/a/. [ARM21a] ARM. Arm Architecture Reference Manual, Armv8, for Armv8-A architecture profile, 2021. https://developer.arm.com/documentation/ddi0487/gb/?lang=en. [ARM21b] ARM. Armv7-M Architecture Refernce Manual, 2021. https://developer.arm.com/documentation/ddi0403/ed. [ARM21c] ARM. Procedure Call Standard for the Arm® 64-bit Architecture (AArch64), 2021. https://github.com/ARM-software/abi-aa/releases. [Bar86] Paul Barrett. Implementing the Rivest Shamir and Adleman Public Key Encryption Algorithm on a Standard Digital Signal Processor. In CRYPTO 1986, LNCS, pages 311–323. SV, 1986. [BBC+20] Daniel J. Bernstein, Billy Bob Brumley, Ming-Shing Chen, Chitchanok Chuengsatiansup, Tanja Lange, Adrian Marotzke, Bo-Yuan Peng, Nicola Tuveri, Christine van Vredendaal, and Bo-Yin Yang. NTRU Prime. Submission to the NIST Post-Quantum Cryptography Standardization Project [NIS], 2020. [BBCT21] Daniel J. Bernstein, Billy Bob Brumley, Ming-Shing Chen, and Nicola Tuveri. OpenSSLNTRU: Faster post-quantum TLS key exchange. arXiv preprint arXiv:2106.08759, 2021. [BBELG16] Mounir Bahtat, Said Belkouch, Philippe Elleaume, and Philippe Le Gall. Instruction scheduling heuristic for an efficient FFT in VLIW processors with balanced resource usage. EURASIP Journal on Advances in Signal Processing, 2016(1):1–21, 2016. [BDL+12] Daniel J Bernstein, Niels Duif, Tanja Lange, Peter Schwabe, and Bo-Yin Yang. High-speed high-security signatures. Journal of cryptographic engineering, 2(2):77–89, 2012 [Ber01] Daniel J. Bernstein. Multidigit multiplication for mathematicians. 2001. [BHK+22] Hanno Becker, Vincent Hwang, Matthias J. Kannwischer, Bo-Yin Yang, and Shang-Yi Yang. Neon NTT: Faster Dilithium, Kyber, and Saber on Cortex-A72 and Apple M1. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2022(1):221–244, 2022. https://tches. iacr.org/index.php/TCHES/article/view/9295. [BMK+22] Hanno Becker, Jose Maria Bermudo Mera, Angshuman Karmakar, Joseph Yiu, and Ingrid Verbauwhedeg. Polynomial multiplication on embedded vector architectures. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2022(1):482–505, 2022. https://tches. iacr.org/index.php/TCHES/article/view/9305. [Bou89] Nicolas Bourbaki. Algebra I. Springer, 1989. [BY19] Daniel J. Bernstein and Bo-Yin Yang. Fast constant-time gcd computation and modular inversion. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2019(3):340–398, 2019. https://tches.iacr.org/index.php/TCHES/article/view/8298. [CDH+20] Cong Chen, Oussama Danba, Jeffrey Hoffstein, Andreas Hulsing, Joost Rijneveld, John M. Schanck, Peter Schwabe, William Whyte, Zhenfei Zhang, Tsunekazu Saito, Takashi Yamakawa, and Keita Xagawa. NTRU. Submission to the NIST Post-Quantum Cryptography Standardization Project [NIS], 2020. https://ntru.org/. [CF94] Richard Crandall and Barry Fagin. Discrete Weighted Transforms and Large-integer Arithmetic. Mathematics of computation, 62(205):305– 324, 1994. [CF13] Jean Cardinal and Samuel Fiorini. On Generalized Comparison-Based Sorting Problems. In Space-Efficient Data Structures, Streams, and Algorithms, pages 164–175. 2013. [CFJ+10] Jean Cardinal, Samuel Fiorini, Gwenaël Joret, Raphaël M Jungers, and J Ian Munro. An Efficient Algorithm for Partial Order Production. SIAM journal on computing, 39(7):2927–2940, 2010. [CFMR+20] A. Casanova, J.-C. Faugere, G. Macario-Rat, J. Patarin, L. Perret, and J. Ryckeghem. GeMSS. Submission to the NIST Post-Quantum Cryptography Standardization Project [NIS], 2020. https://www-polsys.lip6.fr/Links/NIST/GeMSS.html. [Che21] Yun-Li Cheng. Number Theoretic Transform for Polynomial Multiplication in Lattice-based Cryptography on ARM Processors. Master’s thesis, 2021. https://github.com/dean3154/ntrup_m4. [CHK+21] Chi-Ming Marvin Chung, Vincent Hwang, Matthias J. Kannwischer, Gregor Seiler, Cheng-Jhih Shih, and Bo-Yin Yang. NTT Multiplication for NTT-unfriendly Rings New Speed Records for Saber and NTRU on Cortex-M4 and AVX2. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2021(2):159–188, 2021. https://tches.iacr.org/index.php/TCHES/article/view/8791. [CT65] James W. Cooley and John W. Tukey. An Algorithm for the Machine Calculation of Complex Fourier Series. Mathematics of Computation, 19(90):297–301, 1965. [DCP+20] Jintai Ding, Ming-Shing Chen, Albrecht Petzoldt, Dieter Schmidt, Bo-Yin Yang, Matthias Kannwischer, and Jacques Patarin. Rainbow. Submission to the NIST Post-Quantum Cryptography Standardization Project [NIS], 2020. https://www.pqcrainbow.org/. [Für09] Martin Fürer. Faster Integer Multiplication. SIAM Journal on Computing, 39(3):979–1005, 2009. https://doi.org/10.1137/070711761. [GKP94] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics: a Foundation for Computer Science. Addison-Wesley, second edition, 1994. [GKS21] Denisa O. C. Greconici, Matthias J. Kannwischer, and Daan Sprenkels. Compact Dilithium Implementations on Cortex-M3 and Cortex-M4. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2021(1):1–24, 2021. https://tches.iacr.org/index.php/TCHES/article/view/8725. [Goo58] Irving John Good. The Interaction Algorithm and Practical Fourier Analysis. Journal of the Royal Statistical Society: Series B (Methodological), 20(2):361–372, 1958. [GS66] W. M. Gentleman and G. Sande. Fast Fourier Transforms: For Fun and Profit. In Proceedings of the November 7-10, 1966, Fall Joint Computer Conference, AFIPS’66 (Fall), pages 563–578. Association for Computing Machinery, 1966. https://doi.org/10.1145/1464291.1464352. [Har14] David Harvey. Faster arithmetic for number-theoretic transforms. Journal of Symbolic Computation, 60:113–119, 2014. [HBD+20] Andreas Hulsing, Daniel J. Bernstein, Christoph Dobraunig, Maria Eichlseder, Scott Fluhrer, Stefan-Lukas Gazdag, Panos Kampanakis, Stefan Kolbl, Tanja Lange, Martin M. Lauridsen, Florian Mendel, Ruben Niederhagen, Christian Rechberger, Joost Rijneveld, Peter Schwabe, Jean-Philippe Aumasson, Bas Westerbaan, and Ward Beul- lens. SPHINCS+. Submission to the NIST Post-Quantum Cryptography Standardization Project [NIS], 2020. [HMCS77] D. Harris, J. McClellan, D. Chan, and H. Schuessler. Vector Radix Fast Fourier Transform. In ICASSP’77. IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 2, pages 548–551, 1977. [HvdH21] David Harvey and Joris van der Hoeven. Integer multiplication in time O (n log n). Annals of Mathematics, 193(2):563–617, 2021. [IKPC20] Írem Keskinkurt Paksoy and Murat Cenk. TMVP-based Multiplication for Polynomial Quotient Rings and Application to Saber on ARM Cortex-M4. Cryptology ePrint Archive, 2020. https://eprint.iacr.org/2020/1302. [IKPC22] Írem Keskinkurt Paksoy and Murat Cenk. Faster NTRU on ARM Cortex-M4 with TMVP-based multiplication. 2022. https://ia.cr/2022/300. [Jac12] Nathan Jacobson. Basic Algebra I. Courier Corporation, 2012. [JAC+20] David Jao, Reza Azarderakhsh, Matthew Campagna, Craig Costello, Luca De Feo, Basil Hess, Amir Jalali, Brian Koziel, Brian LaMacchia, Patrick Longa, Michael Naehrig, Joost Renes, Vladimir Soukharev, David Urbanik, Geovandro Pereira, Koray Karabina, and Aaron Hutchinson. SIKE. Submission to the NIST Post-Quantum Cryptography Standardization Project [NIS], 2020. https://sike.org/. [Knu14] Donald E. Knuth. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. Addison-Wesley Professional, third edition, 2014. [KRS19] Matthias J Kannwischer, Joost Rijneveld, and Peter Schwabe. Faster Multiplication in Z2m[x] on Cortex-M4 to Speed up NIST PQC Candidates. In International Conference on Applied Cryptography and Network Security, pages 281–301. Springer, 2019. [KRSS] Matthias J. Kannwischer, Joost Rijneveld, Peter Schwabe, and Ko Stoffelen. PQM4: Post-quantum crypto library for the ARM Cortex-M4. https://github.com/mupq/pqm4. [LBK+16] Xavier Leroy, Sandrine Blazy, Daniel Kästner, Bernhard Schommer, Markus Pister, and Christian Ferdinand. CompCert-A Formally Verified Optimizing Compiler. In ERTS 2016: Embedded Real Time Software and Systems, 8th European Congress, 2016. [Li21] Ching-Lin Li. Implementation of Polynomial Modular Inversion in Lattice-based cryptography on ARM. Master’s thesis, 2021. https: //github.com/trista5658321/polyinv-m4. [MAB+20] Carlos Aguilar Melchor, Nicolas Aragon, Slim Bettaieb, Loïc Bidoux, Olivier Blazy, Jean-Christophe Deneuville, Philippe Gaborit, Edoardo Persichetti, Gilles Zémor, and Jurjen Bos. HQC. Submission to the NIST Post-Quantum Cryptography Standardization Project [NIS], 2020. http://pqc-hqc.org/. [MKV20] Jose Maria Bermudo Mera, Angshuman Karmakar, and Ingrid Verbauwhede. Time-memory trade-off in Toom-Cook multiplication: an application to module-lattice based cryptography. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2020(2):222–244, 2020. https://tches.iacr.org/index.php/TCHES/article/view/8550. [Mon85] Peter L. Montgomery. Modular Multiplication Without Trial Division. Mathematics of computation, 44(170):519–521, 1985. [NAB+20] Michael Naehrig, Erdem Alkim, Joppe Bos, Léo Ducas, Karen Easterbrook, Brian LaMacchia, Patrick Longa, Ilya Mironov, Valeria Nikolaenko, Christopher Peikert, Ananth Raghunathan, and Douglas Stebil. FrodoKEM. Submission to the NIST Post-Quantum Cryptography Standardization Project [NIS], 2020. [NG21] Duc Tri Nguyen and Kris Gaj. Optimized Software Implementations of CRYSTALS-Kyber, NTRU, and Saber Using NEON-Based Special Instructions of ARMv8, 2021. Third PQC Standardization Conference. [NIS] NIST, the US National Institute of Standards and Technology. Post-quantum cryptography standardization project. https://csrc.nist.gov/Projects/post-quantum-cryptography. [Nus80] Henri Nussbaumer. Fast polynomial transform algorithms for digital convolution. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(2):205–215, 1980. [PFH+20] Thomas Prest, Pierre-Alain Fouque, Jeffrey Hoffstein, Paul Kirchner, Vadim Lyubashevsky, Thomas Pornin, Thomas Ricosset, Gregor Seiler, William Whyte, and Zhenfei Zhang. Falcon. Submission to the NIST Post-Quantum Cryptography Standardization Project [NIS], 2020. https://falcon-sign.info/. [Rau94] B. Ramakrishna Rau. Iterative Modulo Scheduling: An Algorithm For Software Pipelining Loops. In Proceedings of the 27th annual international symposium on Microarchitecture, pages 63–74, 1994. [RMD+15] Stephen Richardson, Dejan Marković, Andrew Danowitz, John Brunhaver, and Mark Horowitz. Building Conflict-Free FFT Schedules. IEEE Transactions on Circuits and Systems I: Regular Papers, 62(4):1146–1155, 2015. [Sch77] Arnold Schönhage. Schnelle multiplikation von polynomen über körpern der charakteristik 2. Acta Informatica, 7(4):395–398, 1977. [Sei18] Gregor Seiler. Faster AVX2 optimized NTT multiplication for Ring-LWE lattice cryptography. 2018. https://eprint.iacr.org/2018/039. [Shi20] Cheng-Jhih Shih. Personal communication, June 2020. [Sho94] Peter W. Shor. Algorithms for quantum computation: discrete logarithms and factoring. In FOCS 1994, pages 124–134. IEEE, 1994. https://ieeexplore.ieee.org/abstract/document/365700. [SS71] Arnold Schönhage and Volker Strassen. Schnelle Multiplikation großer Zahlen. Computing, 7(3-4):281–292, 1971. [vdH04] Joris van der Hoeven. The Truncated Fourier Transform and Applications. In Proceedings of the 2004 international symposium on Symbolic and algebraic computation, pages 290–296, 2004. [Win78] Shmuel Winograd. On Computing the Discrete Fourier Transform. Mathematics of computation, 32(141):175–199, 1978. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85677 | - |
| dc.description.abstract | 這篇論文收集在碩士研究中所發表或是投稿的相關論文。 我們專注於數論變換在 Cortex-M3、Cortex-M4、Cortex-A72 上的組合語言實作。 在 Cortex-M3 上,我們用 Armv7-M 來實作;在 Cortex-M4 上,我們用 Armv7E-M 來實作;在 Cortex- A72 上,我們用 Armv8.0-A 來實作。為了呈現優化的效果,我們將數論變換的實作應用於 Dilithium、Kyber、NTRU、NTRU Prime 和 Saber 這些晶格密碼系統中。 我們在 Cortex-M3 上實作了 Saber;在 Cortex-M4 上實作了 NTRU、NTRU Prime 和 Saber;在 Cortex-A72 上實作了 Dilithium、Kyber 和 Saber。 我們針對速度、記憶體和程式碼大小來優化 Saber 在 Cortex-M3 和 Cortex-M4 上的實作、 針對速度和程式碼大小來優化 NTRU 和 NTRU Prime 在 Cortex-M4 上的實作、 針對速度來優化 Dilithium、Kyber 和 Saber 在 Cortex-A72 上的實作。 | zh_TW |
| dc.description.abstract | This thesis collects selected works published or submitted during the study of the master’s program. We focus on the assembly realizations of number–theoretic transforms (NTTs) on Cortex-M3, Cortex-M4, and Cortex-A72. In particular, we implement the NTTs in Armv7-M for Cortex-M3, Armv7E-M for Cortex-M4, and Armv8.0- A for Cortex-A72. To demonstrate the optimizations, we select the lattice-based cryptosystems Dilithium, Kyber, NTRU, NTRU Prime, and Saber for evaluation. We apply our optimizations to Saber on Cortex-M3; NTRU, NTRU Prime, and Saber on Cortex-M4; and Dilithium, Kyber, and Saber on Cortex-A72. We optimize Saber on Cortex-M3 and Cortex-M4 for speed, memory usage, and code size; NTRU and NTRU Prime on Cortex-M4 for speed and code size; and Dilithium, Kyber, and Saber on Cortex-A72 for speed. | en |
| dc.description.provenance | Made available in DSpace on 2023-03-19T23:21:17Z (GMT). No. of bitstreams: 1 U0001-0105202214311500.pdf: 1277256 bytes, checksum: dffe39ca2035e9c452d1fd29c8dc6bab (MD5) Previous issue date: 2022 | en |
| dc.description.tableofcontents | 摘要 v Abstract vii List of Figures xiii List of Tables xv List of Algorithms xviii List of Listings xix 1 Introduction 1 1.1 SelectedSubjectsandPlatforms..................... 2 1.2 Contributions ............................... 4 1.2.1 Cortex-M3and Cortex-M4.................... 4 1.2.2 Cortex-A72 ............................ 6 1.3 Other works................................ 6 1.3.1 Other Works During My Master’s Program. . . . . . . . . . . 6 1.3.2 Other Related Works Prior to My Master’s Program . . . . . 7 1.4 Organization of this Thesis........................ 8 1.5 Source Code Management ........................ 8 2 Schemes 9 2.1 Dilithium ................................. 9 2.2 Kyber ................................... 9 2.3 NTRU................................... 10 2.4 NTRUPrime ............................... 10 2.5 Saber.................................... 11 3 Mathematical Background 13 3.1 Sets .................................... 13 3.1.1 Sets, Subsets, Power Sets, Cartesian Products . . . . . . . . . 13 3.1.2 Maps, Compositions, Cartesian Products . . . . . . . . . . . . 14 3.1.3 Equivalence Relations, Partitions, Quotient sets . . . . . . . . 15 3.1.4 Factorization of Maps ...................... 16 3.2 Basic Algebra............................... 17 3.2.1 Monoid and Group Homomorphisms .............. 18 3.2.2 Rings, Cartesian Product..................... 18 3.2.3 Ring Homomorphisms ...................... 19 3.2.4 Ring Congruences, Ideals..................... 19 3.2.5 Quotient Rings .......................... 20 3.2.6 Kernel, Factorization of Ring Homomorphisms . . . . . . . . . 20 3.2.7 Examples ............................. 20 3.3 Abstract Algebra ............................. 21 3.3.1 Modules, Tensor Product of Modules . . . . . . . . . . . . . . 21 3.3.2 Associative Algebras, Tensor Product of Associative Algebras 23 4 Number-Theoretic Transforms 25 4.1 Convolutions ............................... 25 4.1.1 Convolutions ........................... 25 4.1.2 Weighted Convolutions...................... 26 4.1.3 Toeplitz Matrices......................... 26 4.2 The Chinese Remainder Theorem for Rings . . . . . . . . . . . . . . 27 4.3 Number-Theoretic Transforms...................... 28 4.3.1 Principal n-th Root of Unity................... 28 4.3.2 The Number-Theoretic Transform................ 29 4.3.3 Realizing ·R[x]/⟨xn−ζn⟩ via NTTR[x]/⟨xn−ζn⟩:ω . . . . . . . . . . . 30 4.3.4 Incomplete NTTs......................... 31 4.3.5 Näive Computation of NTTs................... 31 4.4 Fast Fourier Transforms ......................... 32 4.4.1 General Digit Reversal ...................... 33 4.4.2 Cooley–Tukey FFT........................ 34 4.4.3 Gentleman–Sande FFT...................... 35 4.4.4 Good–Thomas FFT........................ 36 4.4.5 Comparisons between Cooley–Tukey, Gentleman–Sande, and Good–Thomas FFTs ....................... 37 4.4.6 Vector-Radix FFT ........................ 39 5 Instruction Set Architectures 43 5.1 Armv7-M ................................. 43 5.1.1 General-Purpose Registers.................... 43 5.1.2 Thumb Instruction Sets ..................... 43 5.1.3 Instructions ............................ 44 5.1.4 Digital Signal Processing Extension . . . . . . . . . . . . . . . 45 5.1.5 FPv4-SPExtension........................ 48 5.2 64-bit Armv8-A.............................. 48 5.2.1 General-Purpose Registers.................... 48 5.2.2 SIMD Registers.......................... 49 5.2.3 Instruction Format ........................ 49 5.2.4 Instructions ............................ 50 6 Central Processing Units 57 6.1 Cortex-M4 Instruction Timing...................... 57 6.2 Cortex-M3 Instruction Timing...................... 58 6.3 Cortex-A72 ................................ 58 6.3.1 Pipeline .............................. 58 6.3.2 Instruction Timing ........................ 58 7 Building Blocks 61 7.1 Modular Reductions and Multiplications . . . . . . . . . . . . . . . . 61 7.1.1 BarrettReduction ........................ 62 7.1.2 Montgomery Reduction and Multiplication . . . . . . . . . . . 62 7.1.3 Barrett–Montgomery Correspondences . . . . . . . . . . . . . 63 7.1.4 Cortex-M3 Implementations ................... 64 7.1.5 Cortex-M4 Implementations ................... 66 7.1.6 Cortex-A72 Implementations................... 68 7.2 Butterflies................................. 72 7.2.1 Butterflies for NTTs ....................... 72 7.2.2 Radix-2Butterflies on Cortex-M4................ 74 7.2.3 Radix-2Butterflies on Cortex-M3................ 76 7.2.4 Radix-2Butterflies on Cortex-A72 ............... 77 7.2.5 CT–GS butterflies over x8−1 on Cortex-M4 . . . . . . . . . . 80 7.2.6 Improving Non-Radix-2 Butterflies on Cortex-M4 . . . . . . . 82 7.2.7 Dedicated Vector–Radix Butterflies on Cortex-M4 . . . . . . . 84 8 Implementations of Schemes 85 8.1 NTT Multiplications for NTT-Unfriendly Rings . . . . . . . . . . . . 87 8.2 More on Incomplete NTTs........................ 87 8.2.1 Incomplete NTTs: Review.................... 87 8.2.2 Code Size Consideration of Good–Thomas FFT . . . . . . . . 87 8.2.3 Combining Cooley–Tukey, Good–Thomas, and Vector–RadixFFTs................................ 88 8.3 MatrixVectorMul............................. 89 8.3.1 Computing MatrixVectorMul with Monomorphisms . . . . . . 90 8.3.2 Asymmetric Multiplication.................... 90 8.3.3 Time-Memory Tradeoffs for Saber................ 93 8.4 Stack Optimizations on Cortex-M4 ................... 94 8.5 Cortex-M4: NTRU, NTRU Prime, and Saber . . . . . . . . . . . . . . 96 8.5.1 NTRU and NTRU Prime..................... 96 8.5.2 Saber ............................... 99 8.6 Cortex-M3:Saber .............................102 8.6.1 Saber ...............................102 8.7 Cortex-A72 :Dilithium, Kyber, and Saber . . . . . . . . . . . . . . . .103 8.7.1 MatrixVectorMul in Dilithium, Kyber, and Saber . . . . . . . 103 8.7.2 The Need for Permutations in the NTTs . . . . . . . . . . . .103 8.7.3 NTTs in Saber ..........................103 8.7.4 NTTs in Dilithium ........................104 8.7.5 NTTs in Kyber ..........................104 9 Results 105 9.1 Cortex-M4 Results ............................106 9.1.1 Benchmark Environment.....................107 9.1.2 Polynomial Multiplications in NTRU and NTRU Prime . . . . 107 9.1.3 Performance of NTRU ......................110 9.1.4 Performanceof NTRUPrime ..................110 9.1.5 MatrixVectorMul and InnerProd in Saber. . . . . . . . . . .112 9.1.6 Performance of Saber.......................114 9.2 Cortex-M3 Results ............................115 9.2.1 Benchmark Environment.....................115 9.2.2 MatrixVectorMul and InnerProd in Saber . . . . . . . . . . . 115 9.2.3 Performance of Saber.......................116 9.3 Cortex-A72 Results............................117 9.3.1 Benchmark Environment.....................117 9.3.2 Benchmark of NTT-Related Functions . . . . . . . . . . . . .117 9.3.3 MatrixVectorMul and InnerProd in Dilithium, Kyber, and Saber ...............................118 9.3.4 Performance of Dilithium, Kyber, and Saber . . . . . . . . . . 118 10 Future Works 121 10.1 Directions for Future Research . . . . . . . . . . . . . . . . . . . . . . 121 10.2 On Systemization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 10.3 OnVerification ..............................124 11 Bibiography..............127 | |
| dc.language.iso | en | |
| dc.subject | Cortex-A72 | zh_TW |
| dc.subject | 晶格密碼系統 | zh_TW |
| dc.subject | 數論變換 | zh_TW |
| dc.subject | 快速傅立葉變換 | zh_TW |
| dc.subject | Arm 組合語言 | zh_TW |
| dc.subject | Cortex-M3 | zh_TW |
| dc.subject | Cortex-M4 | zh_TW |
| dc.subject | lattice-based cryptosystem | en |
| dc.subject | Cortex-A72 | en |
| dc.subject | Cortex-M4 | en |
| dc.subject | Cortex-M3 | en |
| dc.subject | Arm assembly | en |
| dc.subject | fast Fourier transform | en |
| dc.subject | number–theoretic transform | en |
| dc.title | 使用Armv7-M、Armv7E-M、Armv8-A 實作數論變換之案例研究 | zh_TW |
| dc.title | Case Studies on Implementing Number-Theoretic Transforms with Armv7-M, Armv7E-M, and Armv8-A | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 110-2 | |
| dc.description.degree | 碩士 | |
| dc.contributor.oralexamcommittee | 趙坤茂(Kun-Mao Chao),蕭旭君(Hsu-Chun Hsiao),楊柏因(Bo-Yin Yang) | |
| dc.subject.keyword | 晶格密碼系統,數論變換,快速傅立葉變換,Arm 組合語言,Cortex-M3,Cortex-M4,Cortex-A72, | zh_TW |
| dc.subject.keyword | lattice-based cryptosystem,number–theoretic transform,fast Fourier transform,Arm assembly,Cortex-M3,Cortex-M4,Cortex-A72, | en |
| dc.relation.page | 172 | |
| dc.identifier.doi | 10.6342/NTU202200738 | |
| dc.rights.note | 同意授權(全球公開) | |
| dc.date.accepted | 2022-06-23 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
| dc.date.embargo-lift | 2022-07-05 | - |
| 顯示於系所單位: | 資訊工程學系 | |
文件中的檔案:
| 檔案 | 大小 | 格式 | |
|---|---|---|---|
| U0001-0105202214311500.pdf | 1.25 MB | Adobe PDF | 檢視/開啟 |
系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。
