NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80203

Full metadata record (DC field: value [language])
dc.contributor.advisor: 劉邦鋒 (Pangfeng Liu)
dc.contributor.author: Kuan-Wei Lu [en]
dc.contributor.author: 盧冠維 [zh_TW]
dc.date.accessioned: 2022-11-23T09:31:37Z
dc.date.available: 2022-02-21
dc.date.available: 2022-11-23T09:31:37Z
dc.date.copyright: 2022-02-21
dc.date.issued: 2022
dc.date.submitted: 2022-01-27
dc.identifier.citation:
[1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283, 2016.
[2] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, et al. Large scale distributed deep networks. Advances in Neural Information Processing Systems, 25:1223–1231, 2012.
[3] P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
[4] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[5] Q. Ho, J. Cipar, H. Cui, S. Lee, J. K. Kim, P. B. Gibbons, G. A. Gibson, G. Ganger, and E. P. Xing. More effective distributed ML via a stale synchronous parallel parameter server. In Advances in Neural Information Processing Systems, pages 1223–1231, 2013.
[6] Z. Hu, J. Xiao, Z. Tian, X. Zhang, H. Zhu, C. Yao, N. Sun, and G. Tan. A variable batch size strategy for large scale distributed DNN training. In 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), pages 476–485. IEEE, 2019.
[7] N. S. Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy, and P. T. P. Tang. On large-batch training for deep learning: Generalization gap and sharp minima. In International Conference on Learning Representations, 2017.
[8] A. Krizhevsky. One weird trick for parallelizing convolutional neural networks. arXiv preprint arXiv:1404.5997, 2014.
[9] A. Krizhevsky, G. Hinton, et al. Learning multiple layers of features from tiny images. Citeseer, 2009.
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90, 2017.
[11] M. Li, D. G. Andersen, J. W. Park, A. J. Smola, A. Ahmed, V. Josifovski, J. Long, E. J. Shekita, and B.-Y. Su. Scaling distributed machine learning with the parameter server. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 583–598, 2014.
[12] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single shot multibox detector. In European Conference on Computer Vision, pages 21–37. Springer, 2016.
[13] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32:8026–8037, 2019.
[14] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.
[15] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6):1137–1149, 2016.
[16] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[17] S. L. Smith, P.-J. Kindermans, C. Ying, and Q. V. Le. Don't decay the learning rate, increase the batch size. In International Conference on Learning Representations, 2018.
[18] T. Takase, S. Oyama, and M. Kurihara. Why does large batch training result in poor generalization? A comprehensive explanation and a better strategy from the viewpoint of stochastic optimization. Neural Computation, 30(7):2005–2023, 2018.
[19] P. Yin, P. Luo, and T. Nakamura. Small batch or large batch? Gaussian walk with rebound can teach. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1275–1284, 2017.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80203
dc.description.abstract: Distributed machine learning is essential for applying deep learning models with large amounts of data and many parameters. Current research on distributed machine learning concentrates on using more hardware devices and powerful computing units for fast training, so model training tends to use a larger batch size to speed up training. However, large-batch training often suffers from low accuracy because of poor generalization. Researchers have proposed many sophisticated methods to address this accuracy problem for large batches, but these methods usually involve complex mechanisms that make training more difficult. In addition, the powerful training hardware required for large batches is expensive, and not all researchers can afford it.

We propose a dual batch size learning scheme to address the batch size problem. We use the maximum batch size the hardware supports to achieve the highest training efficiency we can afford. In addition, we introduce a smaller batch size during training to improve the generalization ability of the model. This method uses two different batch sizes in the same training run to reduce the test loss and attain good generalization, with only a slight increase in training time.

We implement our dual batch size learning scheme and conduct experiments. By increasing the training time by 5%, we can reduce the loss from 1.429 to 1.246 in some cases. Moreover, by appropriately adjusting the percentages of large and small batches, we can increase the accuracy by 2.8% in some cases. With an extra 10% of training time, we can reduce the loss from 1.429 to 1.193, and after moderately adjusting the numbers of large and small batches, the accuracy improves by 2.9%.

Using two different batch sizes in the same training run introduces two complications. First, the two batch sizes process data at different speeds, so we must distribute the data proportionally to maximize the overall processing speed. Second, because of this overall-speed consideration, the workers with the smaller batch size see less data, so we proportionally adjust their contribution to the global weight update in the parameter server, using the data ratio between the small and large batches. Experimental results show that this contribution adjustment improves the final accuracy by 0.9%. (A sketch of these two mechanisms follows the metadata record below.) [zh_TW]
dc.description.provenance: Made available in DSpace on 2022-11-23T09:31:37Z (GMT). No. of bitstreams: 1. U0001-2201202210284000.pdf: 2462756 bytes, checksum: 0f4eba33a23eb00520c630fb805039c5 (MD5). Previous issue date: 2022 [en]
dc.description.tableofcontents:
Oral Examination Committee Certification i
Acknowledgements ii
Abstract (Chinese) iii
Abstract v
Contents vii
List of Figures ix
List of Tables x
Chapter 1 Introduction 1
Chapter 2 Background 3
  2.1 Batch Size Effect 3
  2.2 Stochastic Gradient Descent 4
  2.3 Parameter Server Framework 6
  2.4 Synchronization Schemes 7
    2.4.1 Bulk Synchronous Parallel (BSP) 7
    2.4.2 Asynchronous Parallel (ASP) 7
    2.4.3 Stale Synchronous Parallel (SSP) 8
Chapter 3 Dual Batch Size Learning 9
  3.1 Overview 9
  3.2 Training Time Prediction 10
  3.3 Batch Size and Data Size 12
  3.4 Model-Update Factor 14
Chapter 4 Experimental Result 15
  4.1 Experimental Setup 15
    4.1.1 Dataset 15
    4.1.2 Model 15
    4.1.3 Implementation 15
  4.2 Using Model-Update Factor during Training 16
  4.3 Numbers of Workers with Small Batch Size 18
    4.3.1 k=1.05 19
    4.3.2 k=1.1 23
    4.3.3 Training with Eight Workers 26
Chapter 5 Conclusion 28
References 30
dc.language.iso: en
dc.subject: 分散式學習 (distributed learning) [zh_TW]
dc.subject: 機器學習 (machine learning) [zh_TW]
dc.subject: 深度神經網路 (deep neural networks) [zh_TW]
dc.subject: 批量尺寸 (batch size) [zh_TW]
dc.subject: 參數伺服器 (parameter server) [zh_TW]
dc.subject: deep neural networks [en]
dc.subject: machine learning [en]
dc.subject: distributed learning [en]
dc.subject: batch size [en]
dc.subject: parameter server [en]
dc.title: 基於參數伺服器之雙批量尺寸學習 (Dual Batch Size Learning on Parameter Server) [zh_TW]
dc.title: Dual Batch Size Learning on Parameter Server [en]
dc.date.schoolyear: 110-1
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 洪鼎詠 (Jian-Ye Ching), 吳真貞 (Hung-Hui Li), (Yi-Rong Yang)
dc.subject.keyword: 機器學習, 深度神經網路, 批量尺寸, 分散式學習, 參數伺服器 (machine learning, deep neural networks, batch size, distributed learning, parameter server) [zh_TW]
dc.subject.keyword: machine learning, deep neural networks, batch size, distributed learning, parameter server [en]
dc.relation.page: 32
dc.identifier.doi: 10.6342/NTU202200147
dc.rights.note: 同意授權(全球公開) (authorization granted, open access worldwide)
dc.date.accepted: 2022-01-27
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia) [zh_TW]
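
The abstract above describes two mechanisms: partitioning the training data in proportion to each worker's processing speed, and scaling the small-batch workers' contribution to the global weight update by the data ratio between the two batch sizes. The following is a minimal sketch of that idea, not the thesis's implementation; the batch sizes, worker counts, learning rate, and the helper names split_data, server_step, and update_factor are illustrative assumptions.

    # Minimal sketch (assumed values and names, not the thesis code): a synchronous
    # parameter-server round that mixes gradients from large-batch and small-batch
    # workers and down-weights the latter by the data ratio between the batch sizes.
    import numpy as np

    LARGE_BS, SMALL_BS = 1024, 128   # assumed per-worker batch sizes
    N_LARGE, N_SMALL = 3, 1          # assumed number of workers per batch size

    def split_data(num_samples, throughputs):
        """Assign each worker a data share proportional to its throughput
        (samples/second), so large- and small-batch workers finish a round
        at about the same time."""
        total = sum(throughputs)
        shares = [int(num_samples * t / total) for t in throughputs]
        shares[0] += num_samples - sum(shares)  # give the rounding remainder to worker 0
        return shares

    def server_step(weights, grads_large, grads_small, lr=0.1):
        """One parameter-server update: scale gradients from small-batch workers
        by the data ratio SMALL_BS / LARGE_BS, then apply the weighted average."""
        update_factor = SMALL_BS / LARGE_BS
        weighted = list(grads_large) + [update_factor * g for g in grads_small]
        avg_grad = sum(weighted) / (N_LARGE + N_SMALL * update_factor)
        return weights - lr * avg_grad

    # Toy usage: random vectors stand in for gradients computed by the workers.
    w = np.zeros(4)
    grads_large = [np.random.randn(4) for _ in range(N_LARGE)]
    grads_small = [np.random.randn(4) for _ in range(N_SMALL)]
    w = server_step(w, grads_large, grads_small)
    print(split_data(50000, throughputs=[900, 900, 900, 400]))

In a real run the gradients would come from workers training on the data shares rather than from random numbers; the abstract reports that adjusting contributions by this data ratio improves the final accuracy by 0.9%.
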
Appears in Collections: Graduate Institute of Networking and Multimedia

Files in This Item:
File: U0001-2201202210284000.pdf, Size: 2.41 MB, Format: Adobe PDF