Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87528
Full metadata record (DC field: value [language])
dc.contributor.advisor: 王奕翔 [zh_TW]
dc.contributor.advisor: I-Hsiang Wang [en]
dc.contributor.author: 陳智泓 [zh_TW]
dc.contributor.author: Chih-Hung Chen [en]
dc.date.accessioned: 2023-06-14T16:10:22Z
dc.date.available: 2026-02-10
dc.date.copyright: 2023-06-14
dc.date.issued: 2023
dc.date.submitted: 2023-02-13
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87528
dc.description.abstract: 在拜占庭聯邦式機器學習的問題設定中,有一個參數伺服器(parameter server)負責協調多個客戶 (clients) --也稱作工作機器 (worker machines)-- 去訓練出一個機器學習模型。然而,有一部分的客戶可能是拜占庭 (Byzantine client) --也稱作惡意攻擊者--,這些惡意攻擊者會傳送錯誤的運算結果給伺服器,目的是避免訓練表現的收斂。本論文考慮一個實際且具有挑戰性的設定,除了惡意攻擊者之外,其他非惡意攻擊者的客戶可能是消極參與者 (straggling client)。這些消極參與者雖然會按照伺服器的請求 (request),回覆正確的運算結果,但是因為運算速度和網路延遲等因素,消極參與者無法回覆伺服器的所有請求。儘管從實務的角度,同時容許惡意攻擊者和消極參與者的聯邦式學習問題至關重要,我們透過文獻回顧發現既有的演算法並不讓人滿意,主要的原因有以下兩者。首先,對於那些宣稱能夠同時容許惡意攻擊者和消極參與者的既有演算法,他們在理論上的錯誤率並不能保證會隨著迭代次數而遞減。再者,對於那些只考慮惡意攻擊者的既有演算法,在我們設計的實驗設定之下,他們的模型錯誤率會隨著迭代次數而發散。這些只考慮惡意攻擊的演算法之所以會失敗的原因,其實是因為落後者的存在造成非惡意攻擊者們的運算結果之間的變異量變大,因此,間接使得惡意攻擊者有更多的空間去設計錯誤的回傳結果,來達到影響力更大的攻擊。本論文也利用這個特性,設計了新的拜占庭攻擊去規避既有的聯邦式學習演算法。為了要實現穩健的聯邦式機器學習演算法,我們提出了一個以填補為概念的方法叫作 FILLUP,使用不同的向量去填補消極參與者來不及回覆的運算結果,以削弱消極參與者的影響。此外,針對那些獨立和同分布 (independent and identically distributed) 的消極參與者,我們提出了 REUPLOAD 方法,透過安排客戶們的重傳次數,來削弱惡意參與者的影響。在理論和實務上,我們利用 FILLUP 和 REUPLOAD 方法,讓只考慮惡意攻擊的既有演算法延伸到有消極參與者的問題設定。假設隨機梯度的雜訊的變異量有上界,並考慮非凸函數 (non-convex) 的最佳化問題,無論消極參與者是獨立且同分布的或是敵對的 (adversarial),我們提出的方法都保證理論上的錯誤率會隨著迭代次數遞減。假設隨機梯度和母體梯度的差有上界,並考慮一般凸函數 (generally-convex) 或是強凸函數 (strongly-convex) 的最佳化問題,在消極參與者是獨立且同分布的情況下,我們提出的方法保證理論上的錯誤率會隨著迭代次數遞減,並且吻合下界的次第 (order)。我們也在實務的資料集設計實驗,實驗結果顯示儘管在有消極參與者的情況下,我們提出的方法能有效地抵擋許多不同的拜占庭惡意攻擊。 [zh_TW]
dc.description.abstract: In Byzantine-robust federated learning, a parameter server coordinates multiple clients (also known as worker machines) to train a learning model. However, a fraction of these clients may be compromised; such Byzantine clients send incorrect responses to the server with the aim of preventing the convergence of the learning process. In this work, we consider a more practical but more challenging setting in which the non-Byzantine clients may be straggling clients, or stragglers, which always respond correctly but fail to answer a proportion of the requests from the server. Although robustness against both kinds of faulty clients (Byzantine clients and stragglers) is crucial in real-world federated learning systems, we argue that none of the existing robust algorithms is satisfactory, for two reasons. First, the existing algorithms that tolerate both kinds of faulty clients cannot guarantee a vanishing error rate in theory. Second, the existing algorithms that tolerate only Byzantine clients exhibit divergent error under mild straggler-present settings in our empirical evaluation. The underlying reason for the failure of existing Byzantine-robust algorithms is that the presence of stragglers introduces extra variance among the updates of non-Byzantine clients, which leaves more room for Byzantine attacks. In particular, we design a new, simple Byzantine attack that exploits the presence of stragglers and circumvents existing Byzantine-robust algorithms. To achieve robustness against both kinds of faulty clients, we propose FILLUP, a filler-based scheme that fills in the missing updates from stragglers to mitigate the extra variance they incur. Moreover, we propose REUPLOAD, a client-scheduling scheme tailored to stragglers whose straggling pattern is independent and identically distributed (i.i.d.) across the requests from the server. Our schemes adapt existing robust algorithms to the straggler-present setting both in theory and in practice. Under the variance-bounded noise assumption, our algorithm guarantees a vanishing error rate against both adversarial and i.i.d. stragglers for non-convex objective functions. Under the deviation-bounded noise assumption, our algorithm guarantees a tight, vanishing error rate against i.i.d. stragglers for generally-convex and strongly-convex objective functions. Experimental results on real-world datasets show that our schemes are effective against various attacks. [en] (A minimal code sketch of the FILLUP idea appears after this metadata record.)
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-06-14T16:10:22Z. No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2023-06-14T16:10:22Z (GMT). No. of bitstreams: 0 [en]
dc.description.tableofcontents:
誌謝 (Acknowledgements) i
摘要 (Abstract in Chinese) iii
Abstract v
Contents vii
List of Tables xiii
Chapter 1 Introduction 1
1.1 Contribution 5
1.1.1 Summary of Theoretical Results 6
1.2 Organization 8
Chapter 2 Background 11
2.1 Stochastic Optimization 11
2.2 Federated Learning 14
2.3 Robust Federated Learning 15
2.3.1 Byzantine Client 15
2.3.2 Straggling Client 17
Chapter 3 Failures of Existing Byzantine-Robust Methods 19
3.1 Experimental Setup 21
3.1.1 History-Aware Methods 21
3.1.2 Straggling Pattern and FILLUP 25
3.1.3 Byzantine-Robust Aggregators 26
3.1.4 Byzantine Attacks 29
3.1.4.1 Non-Omniscient Attacks 29
3.1.4.2 Omniscient Attacks 30
3.2 Experimental Results 33
3.2.1 Logistic Regression on MNIST 34
3.2.2 ResNet-20 on CIFAR-10 38
3.2.3 Discussion 40
Chapter 4 Byzantine-Robustness with I.I.D. Stragglers 43
4.1 Convergence Rate of RF-DSGDM 44
4.1.1 RF-DSGDM 44
4.1.2 Convergence Rate for Non-Convex Problem 46
4.1.3 Proof of Theorem 4.1 48
4.1.3.1 REUPLOAD to Ensure Byzantine Minority 49
4.1.3.2 Trade-off between Communication and Convergence 53
4.1.3.3 Convergence Analysis 58
4.2 Convergence Rate of RF-BYZSGD 62
4.2.1 RF-BYZSGD 62
4.2.2 Convergence Rate for Generally-Convex Problem 64
4.2.3 Convergence Rate for Strongly-Convex Problem 67
4.2.4 Proof of Theorem 4.2 69
4.2.4.1 Bounding VarErr and BiasErr by HA-AGG 70
4.2.4.2 FILLUP Adapts HA-AGG to Straggler-Present Setting 73
4.2.4.3 Trade-off between Communication and Convergence 78
4.2.4.4 Convergence Analysis 82
4.2.5 Proof of Theorem 4.3 84
4.3 Lower Bounds 86
4.3.1 Generally-Convex Problem 87
4.3.2 Strongly-Convex Problem 90
Chapter 5 Byzantine-Robustness with Adversarial Stragglers 95
5.1 Problem Formulation 95
5.2 Convergence Rate of RF-DSGDM for Non-Convex Problem 98
5.2.1 Convergence Rate for Non-Convex Problem 98
Chapter 6 Experiments 101
6.1 Experimental Setup 101
6.2 Impact of REUPLOAD 103
6.2.1 Logistic Regression on MNIST 103
6.2.2 ResNet-20 on CIFAR-10 104
Chapter 7 Conclusion and Future Works 107
7.1 Conclusion 107
7.2 Future Works 107
References 109
Appendix A — Deferred Proofs from Section 4.1 117
A.1 Additional Notations 117
A.2 Technical Lemmas 118
A.3 Deferred Proofs of Lemmas 121
A.3.1 Proof of Lemma 4.1 121
A.3.2 Proof of Lemma 4.3 123
A.3.3 Proof of Lemma 4.4 124
A.3.4 Proof of Lemma 4.5 124
A.3.5 Proof of Lemma A.1 126
A.3.6 Proof of Lemma A.2 130
A.3.7 Proof of Lemma A.3 132
A.3.8 Proof of Example 4.2 134
A.3.9 Proof of Example 4.3 135
A.3.10 Proof of Example 4.4 136
Appendix B — Deferred Proofs from Section 4.2 139
B.1 Technical Lemmas 139
B.2 Deferred Proofs of Lemmas 139
B.2.1 Proof of Lemma 4.7 140
B.2.2 Proof of Lemma 4.8 142
B.2.3 Proof of Lemma 4.9 145
B.2.4 Proof of Lemma 4.10 147
B.2.5 Proof of Lemma 4.11 147
B.2.6 Proof of Lemma 4.12 150
B.2.7 Proof of Lemma 4.13 153
B.2.8 Proof of Lemma 4.14 154
B.2.9 Proof of Example 4.5 156
B.2.10 Proof of Example 4.6 157
B.2.11 Proof of Example 4.7 158
B.2.12 Proof of Example 4.8 159
B.2.13 Proof of Lemma B.9 161
B.2.14 Proof of Lemma B.10 162
B.2.15 Proof of Lemma B.11 163
Appendix C — Deferred Proofs from Chapter 5 165
C.1 Proof of Theorem 5.1 165
C.2 Deferred Proofs of Lemmas 171
Appendix D — Proof of Lower Bounds 175
D.1 Technical Lemmas 175
D.2 Proof of Theorem 4.4 180
D.3 Proof of Theorem 4.5 182
D.4 Deferred Proofs of Lemmas 185
D.4.1 Proof of Lemma D.17 185
D.4.2 Proof of Lemma D.18 186
D.4.3 Proof of Lemma D.19 187
D.4.4 Proof of Lemma D.22 190
D.4.5 Proof of Lemma D.21 192
Appendix E — Additional Experiments 195
E.1 Weak Sense of Stragglers 195
dc.language.iso: en
dc.subject: 故障容許 [zh_TW]
dc.subject: 聯邦式機器學習 [zh_TW]
dc.subject: 分散式機器學習 [zh_TW]
dc.subject: 消極參與容許 [zh_TW]
dc.subject: 拜占庭惡意攻擊容許 [zh_TW]
dc.subject: 隨機最佳化 [zh_TW]
dc.subject: Stochastic optimization [en]
dc.subject: Distributed learning [en]
dc.subject: Byzantine-robustness [en]
dc.subject: Straggler-robustness [en]
dc.subject: Fault-tolerance [en]
dc.subject: Federated learning [en]
dc.title: 穩健聯邦式機器學習於惡意攻擊者和消極參與者之情境 [zh_TW]
dc.title: Robust Federated Learning in the Presence of Byzantine and Straggling Clients [en]
dc.type: Thesis
dc.date.schoolyear: 111-1
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 洪樂文;林士駿 [zh_TW]
dc.contributor.oralexamcommittee: Yao-Win Hong;Shih-Chun Lin [en]
dc.subject.keyword: 聯邦式機器學習,分散式機器學習,拜占庭惡意攻擊容許,消極參與容許,故障容許,隨機最佳化 [zh_TW]
dc.subject.keyword: Federated learning,Distributed learning,Byzantine-robustness,Straggler-robustness,Fault-tolerance,Stochastic optimization [en]
dc.relation.page: 196
dc.identifier.doi: 10.6342/NTU202300335
dc.rights.note: 同意授權(限校園內公開) (access authorized; restricted to campus)
dc.date.accepted: 2023-02-15
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering)
dc.date.embargo-lift: 2028-02-10
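
The abstract above describes two schemes: FILLUP, which fills in the updates missing from straggling clients before robust aggregation, and REUPLOAD, which schedules client uploads so that Byzantine responses remain a minority. Below is a minimal Python sketch of the FILLUP idea only; the choice of filler vector (the previous round's aggregate), the coordinate-wise-median aggregator, and all names in the snippet are illustrative assumptions, not the algorithm as specified in the thesis.

import numpy as np

def coordinate_wise_median(updates):
    # Robust aggregator: per-coordinate median over a list of (d,) update vectors.
    return np.median(np.stack(updates), axis=0)

def fillup_aggregate(received, n_clients, filler, aggregator=coordinate_wise_median):
    """FILLUP-style step (sketch): replace each missing (straggler) update with a
    filler vector, then apply a robust aggregator to the completed list.

    received  -- dict {client_id: update vector} for the clients that responded
    n_clients -- total number of clients the server requested updates from
    filler    -- vector used in place of a missing update, e.g. the previous
                 round's aggregate (an illustrative choice)
    """
    updates = [received.get(i, filler) for i in range(n_clients)]
    return aggregator(updates)

# Toy usage: 6 clients; clients 3 and 4 straggle (no reply), client 5 is Byzantine.
d = 4
rng = np.random.default_rng(0)
true_grad = np.ones(d)
received = {i: true_grad + 0.1 * rng.normal(size=d) for i in [0, 1, 2, 5]}
received[5] = -100.0 * np.ones(d)   # Byzantine client sends a poisoned update
prev_aggregate = np.ones(d)         # filler: last round's aggregate (assumed)
agg = fillup_aggregate(received, n_clients=6, filler=prev_aggregate)
print(agg)  # stays close to the true gradient despite stragglers and the attacker

REUPLOAD is not sketched here; per the abstract and table of contents, it schedules how often clients re-upload when straggling is i.i.d. across the server's requests, so that Byzantine clients remain a minority among the updates the aggregator actually receives.
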
Appears in Collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in This Item:
ntu-111-1.pdf (restricted access; not publicly available): 1.02 MB, Adobe PDF


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
