Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97730
Full metadata record (DC field [language]: value)
dc.contributor.advisor [zh_TW]: 陳尚澤
dc.contributor.advisor [en]: Shang-Tse Chen
dc.contributor.author [zh_TW]: 賴柏翰
dc.contributor.author [en]: Bo-Han Lai
dc.date.accessioned: 2025-07-16T16:05:04Z
dc.date.available: 2025-07-17
dc.date.copyright: 2025-07-16
dc.date.issued: 2025
dc.date.submitted: 2025-07-05
dc.identifier.citation[1] Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv preprint arXiv:2207.02696, 2022.
[2] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
[3] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.
[4] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
[5] Nicholas Carlini and David Wagner. Audio adversarial examples: Targeted attacks on speech-to-text. In IEEE security and privacy workshops (SPW), 2018.
[6] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras,and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR), 2018.
[7] Ali Shafahi, Mahyar Najibi, Mohammad Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S Davis, Gavin Taylor, and Tom Goldstein. Adversarial training for free! In Advances in Neural Information Processing Systems (NeurIPS), 2019.
[8] Zekai Wang, Tianyu Pang, Chao Du, Min Lin, Weiwei Liu, and Shuicheng Yan. Better diffusion models further improve adversarial training. In International Conference on Machine Learning (ICML), 2023.
[9] Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-gan: Protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations (ICLR), 2018.
[10] Nilaksh Das, Madhuri Shanbhogue, Shang-Tse Chen, Fred Hohman, Siwei Li, Li Chen, Michael E Kounavis, and Duen Horng Chau. Shield: Fast, practical defense and vaccination for deep learning using jpeg compression. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), 2018.
[11] Minjong Lee and Dongwoo Kim. Robust evaluation of diffusion-based adversarial purification. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023.
[12] Linyi Li, Tao Xie, and Bo Li. Sok: Certified robustness for deep neural networks. In 2023 IEEE symposium on security and privacy (SP), 2023.
[13] Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning (ICML), 2019.
[14] Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. In 2019 IEEE Symposium on Security and Privacy (SP), 2019.
[15] Greg Yang, Tony Duan, J Edward Hu, Hadi Salman, Ilya Razenshteyn, and Jerry Li. Randomized smoothing of all shapes and sizes. In International Conference on Machine Learning (ICML), 2020.
[16] Ruediger Ehlers. Formal verification of piece-wise linear feed-forward neural networks. In International Symposium on Automated Technology for Verification and Analysis (ATVA), 2017.
[17] Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. On the effectiveness of interval bound propagation for training verifiably robust models. arXiv preprint arXiv:1810.12715, 2018.
[18] Mark Niklas Mueller, Franziska Eckert, Marc Fischer, and Martin Vechev. Certified training: Small boxes are all you need. In The Eleventh International Conference on Learning Representations (ICLR), 2023.
[19] Zhouxing Shi, Yihan Wang, Huan Zhang, J Zico Kolter, and Cho-Jui Hsieh. Efficiently computing local Lipschitz constants of neural networks via bound propagation. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
[20] Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. Beta-crown: Efficient bound propagation with per-neuron split constraints for neural network robustness verification. In Advances in Neural Information Processing Systems (NeurIPS), 2021.
[21] Huan Zhang, Shiqi Wang, Kaidi Xu, Linyi Li, Bo Li, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. General cutting planes for bound-propagation-based neural network verification. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
[22] Yusuke Tsuzuku, Issei Sato, and Masashi Sugiyama. Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks. In Advances in Neural Information Processing Systems (NeurIPS), 2018.
[23] Qiyang Li, Saminul Haque, Cem Anil, James Lucas, Roger B Grosse, and Jörn-Henrik Jacobsen. Preventing gradient attenuation in Lipschitz constrained convolutional networks. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
[24] Aladin Virmaux and Kevin Scaman. Lipschitz regularity of deep neural networks: analysis and efficient estimation. In Advances in Neural Information Processing Systems (NeurIPS), 2018.
[25] Cem Anil, James Lucas, and Roger Grosse. Sorting out Lipschitz function approximation. In International Conference on Machine Learning (ICML), 2019.
[26] Artem Chernodub and Dimitri Nowicki. Norm-preserving orthogonal permutation linear unit activation functions (OPLU). arXiv preprint arXiv:1604.02313, 2016.
[27] Jan Müller, Reinhard Klein, and Michael Weinmann. Orthogonal wasserstein gans. arXiv preprint arXiv:1911.13060, 2019.
[28] Lechao Xiao, Yasaman Bahri, Jascha Sohl-Dickstein, Samuel Schoenholz, and Jeffrey Pennington. Dynamical isometry and a mean field theory of cnns: How to train 10,000-layer vanilla convolutional neural networks. In International Conference on Machine Learning (ICML), 2018.
[29] Haozhi Qi, Chong You, Xiaolong Wang, Yi Ma, and Jitendra Malik. Deep isometric learning for visual recognition. In International Conference on Machine Learning (ICML), 2020.
[30] Asher Trockman and J Zico Kolter. Orthogonalizing convolutional layers with the Cayley transform. In International Conference on Learning Representations (ICLR), 2021.
[31] Sahil Singla and Soheil Feizi. Skew orthogonal convolutions. In International Conference on Machine Learning (ICML), 2021.
[32] Xiaojun Xu, Linyi Li, and Bo Li. LOT: Layer-wise orthogonal training on improving l2 certified robustness. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
[33] Peter H Schönemann. A generalized solution of the orthogonal procrustes problem. Psychometrika, 31(1):1–10, 1966.
[34] Tan Yu, Jun Li, Yunfeng Cai, and Ping Li. Constructing orthogonal convolutions in an explicit manner. In International Conference on Learning Representations (ICLR), 2022.
[35] Kai Hu, Klas Leino, Zifan Wang, and Matt Fredrikson. A recipe for improved certifiable robustness. In The Twelfth International Conference on Learning Representations (ICLR), 2024.
[36] Bernd Prach and Christoph H Lampert. Almost-orthogonal layers for efficient general-purpose Lipschitz networks. In European Conference on Computer Vision (ECCV), 2022.
[37] Laurent Meunier, Blaise J Delattre, Alexandre Araujo, and Alexandre Allauzen. A dynamical system perspective for Lipschitz neural networks. In International Conference on Machine Learning (ICML), 2022.
[38] Alexandre Araujo, Aaron J Havens, Blaise Delattre, Alexandre Allauzen, and Bin Hu. A unified algebraic perspective on Lipschitz neural networks. In International Conference on Learning Representations (ICLR), 2023.
[39] Ruigang Wang and Ian Manchester. Direct parameterization of Lipschitz-bounded deep networks. In International Conference on Machine Learning (ICML), 2023.
[40] Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, and George Pappas. Efficient and accurate estimation of Lipschitz constants for deep neural networks. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
[41] Farzan Farnia, Jesse Zhang, and David Tse. Generalizable adversarial training via spectral normalization. In International Conference on Learning Representations (ICLR), 2019.
[42] Klas Leino, Zifan Wang, and Matt Fredrikson. Globally-robust neural networks. In International Conference on Machine Learning (ICML), 2021.
[43] Kai Hu, Andy Zou, Zifan Wang, Klas Leino, and Matt Fredrikson. Unlocking deterministic robustness certification on Imagenet. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
[44] Sahil Singla, Surbhi Singla, and Soheil Feizi. Improved deterministic l2 robustness on CIFAR-10 and CIFAR-100. In International Conference on Learning Representations (ICLR), 2022.
[45] Sahil Singla and Soheil Feizi. Improved techniques for deterministic l2 robustness. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
[46] G Dietrich. A new formulation of the hypermatrix householder-QR decomposition. Computer Methods in Applied Mechanics and Engineering, 9(3):273–280, 1976.
[47] Robert Schreiber and Beresford Parlett. Block reflectors: Theory and computation. SIAM Journal on Numerical Analysis, 25(1):189–205, 1988.
[48] El Mehdi Achour, François Malgouyres, and Franck Mamalet. Existence, stability and scalability of orthogonal convolutional neural networks. Journal of Machine Learning Research (JMLR), 2022.
[49] Anil K Jain. Fundamentals of digital image processing. Prentice-Hall, Inc., 1989.
[50] Louis Béthune, Thibaut Boissin, Mathieu Serrurier, Franck Mamalet, Corentin Friedrich, and Alberto Gonzalez Sanz. Pay attention to your loss: understand-ing misconceptions about Lipschitz neural networks. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
[51] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.
[52] Peter L Bartlett, Dylan J Foster, and Matus J Telgarsky. Spectrally-normalized margin bounds for neural networks. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
[53] Michel Ledoux and Michel Talagrand. isoperimetry and processes. Springer Science & Business Media, 2013. Probability in Banach Spaces:
[54] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollar. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017.
[55] Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
[56] Tero Karras, Miika Aittala, Jaakko Lehtinen, Janne Hellsten, Timo Aila, and Samuli Laine. Analyzing and improving the training dynamics of diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[57] Tero Karras, Miika Aittala, Tuomas Kynkäänniemi, Jaakko Lehtinen, Timo Aila, and Samuli Laine. Guiding a diffusion model with a bad version of itself. In Advances in Neural Information Processing Systems (NeurIPS), 2024.
[58] Bernd Prach, Fabio Brau, Giorgio Buttazzo, and Christoph H Lampert. 1-Lipschitz layers compared: Memory, speed, and certifiable robustness. arXiv preprint arXiv:2311.16833, 2023.
[59] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
[60] S Singla and S Feizi. Fantastic four: Differentiable bounds on singular values of convolution layers. In International Conference on Learning Representations (ICLR), 2021.
[61] Timothy Dozat. Incorporating Nesterov momentum into Adam. International Conference on Learning Representations workshop (ICLR workshop), 2016.
[62] Michael Zhang, James Lucas, Jimmy Ba, and Geoffrey E Hinton. Lookahead optimizer: k steps forward, 1 step back. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
[63] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of machine learning. MIT press, 2018.
[64] Logan Engstrom, Andrew Ilyas, and Anish Athalye. Evaluating and understanding the robustness of adversarial logit pairing. arXiv preprint arXiv:1807.10272, 2018.
[65] Tianlong Chen, Zhenyu Zhang, Sijia Liu, Shiyu Chang, and Zhangyang Wang. Robust overfitting may be mitigated by properly learned smoothening. In International Conference on Learning Representations (ICLR), 2021.
[66] Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In International Conference on Machine Learning (ICML), 2020.
[67] Yuhao Mao, Mark Müller, Marc Fischer, and Martin Vechev. Connecting certified and adversarial training. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
[68] Edward J Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. In International Conference on Learning Representations (ICLR), 2022.
-
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97730
dc.description.abstract [zh_TW]: Lipschitz 神經網路在深度學習中以具備可證明的穩定性而聞名。本研究提出一種新穎且高效的區塊鏡射正交(Block Reflector Orthogonal,簡稱 BRO)層,進一步提升正交層在建構更具表現力的 Lipschitz 神經網路架構上的能力。此外,我們從理論角度分析 Lipschitz 神經網路的特性,並引入一種新的損失函數,透過退火機制來擴大大多數資料點的分類間隔。這使得 Lipschitz 模型在分類結果的可認證穩健性上表現更佳。結合我們提出的 BRO 層與損失函數,我們設計出 BRONet ——一個簡潔且高效的 Lipschitz 神經網路,能達成目前最佳的可認證穩健性。我們在 CIFAR-10/100、Tiny-ImageNet 以及 ImageNet 上進行大量實驗與實證分析,結果顯示我們的方法優於現有的基準模型。
dc.description.abstract [en]: Lipschitz neural networks are well-known for providing certified robustness in deep learning. In this work, we present a novel, efficient Block Reflector Orthogonal (BRO) layer that enhances the capability of orthogonal layers in constructing more expressive Lipschitz neural architectures. In addition, by theoretically analyzing the nature of Lipschitz neural networks, we introduce a new loss function that employs an annealing mechanism to increase the margin for most data points. This enables Lipschitz models to provide better certified robustness. By employing our BRO layer and loss function, we design BRONet, a simple yet effective Lipschitz neural network that achieves state-of-the-art certified robustness. Extensive experiments and empirical analysis on CIFAR-10/100, Tiny-ImageNet, and ImageNet validate that our method outperforms existing baselines.
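As a concrete illustration of the abstract above: an orthogonal weight matrix W preserves the l2 norm (||Wx|| = ||x||), so a network composed of orthogonal layers and 1-Lipschitz activations has Lipschitz constant at most 1, and in the standard analysis a logit margin m then certifies robustness within an l2 radius of m / sqrt(2). The sketch below is a minimal, hypothetical PyTorch example of an orthogonal linear layer parameterized as a block (Householder) reflector, W = I - 2 V (V^T V)^{-1} V^T with a low-rank factor V; the class name, shapes, and initialization are illustrative assumptions, and the thesis's actual BRO layer and its convolutional variant may differ in detail.

import torch
import torch.nn as nn

class BlockReflectorLinear(nn.Module):
    """Illustrative sketch (not the thesis's implementation): an orthogonal
    linear layer parameterized as a block reflector W = I - 2 V (V^T V)^{-1} V^T."""

    def __init__(self, n: int, k: int):
        super().__init__()
        assert k <= n, "low-rank factor V must have at most n columns"
        self.n = n
        # V in R^{n x k}; any full-column-rank V yields an orthogonal W.
        self.V = nn.Parameter(torch.randn(n, k) / n ** 0.5)

    def weight(self) -> torch.Tensor:
        V = self.V
        gram = V.T @ V                          # (k, k) = V^T V
        inner = torch.linalg.solve(gram, V.T)   # (k, n) = (V^T V)^{-1} V^T
        eye = torch.eye(self.n, device=V.device, dtype=V.dtype)
        return eye - 2.0 * (V @ inner)          # closed form, no iterative scheme

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Orthogonal W implies ||W x|| = ||x||, so the layer is exactly 1-Lipschitz.
        return x @ self.weight().T

if __name__ == "__main__":
    torch.manual_seed(0)
    layer = BlockReflectorLinear(n=64, k=16)
    W = layer.weight()
    print("max |W^T W - I| =", (W.T @ W - torch.eye(64)).abs().max().item())

Because this construction only requires solving a k x k system, it needs no iterative orthogonalization, which is consistent with the "Iterative Approximation-Free" property listed in the table of contents; how the thesis extends the idea to convolutions and chooses the rank is detailed in Chapter 4 and Appendix A of the thesis itself.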
dc.description.provenance [en]: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-07-16T16:05:04Z. No. of bitstreams: 0
dc.description.provenance [en]: Made available in DSpace on 2025-07-16T16:05:04Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee
Acknowledgements
摘要 (Chinese Abstract)
Abstract
Contents
List of Figures
List of Tables
Denotation
Chapter 1 Introduction
Chapter 2 Preliminaries
2.1 Certified Robustness with Lipschitz Neural Networks
2.2 Lipschitz Constant Control & Orthogonality
Chapter 3 Related Work
3.1 Orthogonal Layers
3.2 Other 1-Lipschitz Layers
3.3 Lipschitz Regularization
3.4 Training Techniques of Lipschitz Neural Networks
Chapter 4 Methodology
4.1 BRO: Block Reflector Orthogonal Layer
4.1.1 Low-rank Orthogonal Parameterization Scheme
4.1.2 Properties of BRO Layer
4.1.2.1 Iterative Approximation-Free
4.1.2.2 Time and Memory Efficiency
4.1.2.3 Non-universal Orthogonal Parameterization
4.2 Logit Annealing Loss Function
Chapter 5 Experiments
5.1 Main Results
5.2 Ablation Studies
5.2.1 Extra Diffusion Data Augmentation
5.2.2 Backbone Comparison
5.2.3 LipConvNet Benchmark
5.2.4 Improvement on ImageNet
5.3 LA Loss Effectiveness
Chapter 6 Conclusion
References
Appendix A — BRO Layer Analysis
A.1 Proof of Proposition 1
A.2 Proof of Proposition 2
A.3 The Effect of Zero-padding
A.4 Analysis of Semi-Orthogonal BRO Layer
A.5 Complexity Comparison of Orthogonal Layers
Appendix B — Implementation Details
B.1 Computational Resources
B.2 Architecture Details
B.3 Architecture and Rank-n Configuration
B.4 LA Hyper-parameters
B.5 Table 5.1 Details
B.6 Table 5.2 Details
B.7 Table 5.3 Details
B.8 Table 5.4 Details
Appendix C — Logit Annealing Loss Function
C.1 Proof of Theorem 1
C.2 CR Loss Risk
C.3 CR Issues
C.4 Annealing Mechanism
Appendix D — Additional Experiments
D.1 Empirical Robustness
D.2 BRO Rank-n Ablation Experiments
D.3 LA Loss Ablation Experiments
D.4 LA Loss Hyper-parameters Experiments
D.5 LipConvNet Ablation Experiments
D.6 Instability of LOT Parameterization
Appendix E — Limitations
Appendix F — Impact Statement
dc.language.iso: en
dc.subject [zh_TW]: 對抗性防禦
dc.subject [zh_TW]: 可認證穩健性
dc.subject [zh_TW]: Lipschitz 神經網路
dc.subject [en]: Lipschitz neural network
dc.subject [en]: adversarial defense
dc.subject [en]: certified robustness
dc.title [zh_TW]: 基於區塊鏡射正交層和對數機率退火損失函數構造具可認證穩健性的深度神經網路
dc.title [en]: Enhancing Certified Robustness via Block Reflector Orthogonal Layers and Logit Annealing Loss
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee [zh_TW]: 游家牧; 林忠緯
dc.contributor.oralexamcommittee [en]: Chia-Mu Yu; Chung-Wei Lin
dc.subject.keyword [zh_TW]: 對抗性防禦, 可認證穩健性, Lipschitz 神經網路
dc.subject.keyword [en]: adversarial defense, certified robustness, Lipschitz neural network
dc.relation.page: 83
dc.identifier.doi: 10.6342/NTU202501507
dc.rights.note: 未授權 (not authorized for public release)
dc.date.accepted: 2025-07-08
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊工程學系 (Department of Computer Science and Information Engineering)
dc.date.embargo-lift: N/A
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in This Item:
File | Size | Format
ntu-113-2.pdf (not authorized for public access) | 6.94 MB | Adobe PDF


Except where their copyright terms state otherwise, all items in this repository are protected by copyright, with all rights reserved.
