NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80203

Full metadata record (DC field: value [language])
dc.contributor.advisor: 劉邦鋒 (Pangfeng Liu)
dc.contributor.author: Kuan-Wei Lu [en]
dc.contributor.author: 盧冠維 [zh_TW]
dc.date.accessioned: 2022-11-23T09:31:37Z
dc.date.available: 2022-02-21
dc.date.available: 2022-11-23T09:31:37Z
dc.date.copyright: 2022-02-21
dc.date.issued: 2022
dc.date.submitted: 2022-01-27
dc.identifier.citation:
[1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283, 2016.
[2] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, et al. Large scale distributed deep networks. Advances in Neural Information Processing Systems, 25:1223–1231, 2012.
[3] P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
[4] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[5] Q. Ho, J. Cipar, H. Cui, S. Lee, J. K. Kim, P. B. Gibbons, G. A. Gibson, G. Ganger, and E. P. Xing. More effective distributed ML via a stale synchronous parallel parameter server. In Advances in Neural Information Processing Systems, pages 1223–1231, 2013.
[6] Z. Hu, J. Xiao, Z. Tian, X. Zhang, H. Zhu, C. Yao, N. Sun, and G. Tan. A variable batch size strategy for large scale distributed DNN training. In 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), pages 476–485. IEEE, 2019.
[7] N. S. Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy, and P. T. P. Tang. On large-batch training for deep learning: Generalization gap and sharp minima. In International Conference on Learning Representations, 2017.
[8] A. Krizhevsky. One weird trick for parallelizing convolutional neural networks. arXiv preprint arXiv:1404.5997, 2014.
[9] A. Krizhevsky, G. Hinton, et al. Learning multiple layers of features from tiny images. Citeseer, 2009.
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90, 2017.
[11] M. Li, D. G. Andersen, J. W. Park, A. J. Smola, A. Ahmed, V. Josifovski, J. Long, E. J. Shekita, and B.-Y. Su. Scaling distributed machine learning with the parameter server. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 583–598, 2014.
[12] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single shot multibox detector. In European Conference on Computer Vision, pages 21–37. Springer, 2016.
[13] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32:8026–8037, 2019.
[14] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.
[15] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6):1137–1149, 2016.
[16] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[17] S. L. Smith, P.-J. Kindermans, C. Ying, and Q. V. Le. Don't decay the learning rate, increase the batch size. In International Conference on Learning Representations, 2018.
[18] T. Takase, S. Oyama, and M. Kurihara. Why does large batch training result in poor generalization? A comprehensive explanation and a better strategy from the viewpoint of stochastic optimization. Neural Computation, 30(7):2005–2023, 2018.
[19] P. Yin, P. Luo, and T. Nakamura. Small batch or large batch? Gaussian walk with rebound can teach. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1275–1284, 2017.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80203
dc.description.abstract: Distributed machine learning is essential for applying deep learning models with large amounts of data and many parameters. Current research on distributed machine learning concentrates on using more hardware devices and powerful computing units for fast training, so model training tends to use a larger batch size to speed up training. However, large-batch training often suffers from low accuracy because of poor generalization. Researchers have proposed many sophisticated methods to address this accuracy problem for large batches, but these methods usually involve complex mechanisms that make training more difficult. In addition, the powerful training hardware required for large batches is expensive, and not all researchers can afford it.

We propose a dual batch size learning scheme to address the batch size problem. We use the maximum batch size the hardware supports to achieve the highest training efficiency we can afford. In addition, we introduce a smaller batch size during training to improve the generalization ability of the model. This method uses two different batch sizes in the same training run to reduce the test loss and attain good generalization, with only a slight increase in training time.

We implement our dual batch size learning scheme and conduct experiments. By increasing the training time by 5%, we can reduce the loss from 1.429 to 1.246 in some cases. Moreover, by appropriately adjusting the percentages of large and small batches, we can increase the accuracy by 2.8% in some cases. With an extra 10% of training time, we can reduce the loss from 1.429 to 1.193, and after moderately adjusting the numbers of large and small batches, the accuracy improves by 2.9%.

Using two different batch sizes in the same training run introduces two complications. First, the two batch sizes process data at different speeds, so we must distribute the data proportionally to maximize the overall processing speed. Second, because of this overall-speed consideration, the workers with the smaller batch size see less data, so we proportionally adjust their contribution to the global weight update in the parameter server, using the data ratio between the small and large batches. Experimental results show that this contribution adjustment improves the final accuracy by 0.9%. (A sketch of these two mechanisms follows the metadata record below.) [zh_TW]
dc.description.provenance: Made available in DSpace on 2022-11-23T09:31:37Z (GMT). No. of bitstreams: 1. U0001-2201202210284000.pdf: 2462756 bytes, checksum: 0f4eba33a23eb00520c630fb805039c5 (MD5). Previous issue date: 2022 [en]
dc.description.tableofcontents:
Oral Examination Committee Certification i
Acknowledgements ii
Abstract (Chinese) iii
Abstract v
Contents vii
List of Figures ix
List of Tables x
Chapter 1 Introduction 1
Chapter 2 Background 3
  2.1 Batch Size Effect 3
  2.2 Stochastic Gradient Descent 4
  2.3 Parameter Server Framework 6
  2.4 Synchronization Schemes 7
    2.4.1 Bulk Synchronous Parallel (BSP) 7
    2.4.2 Asynchronous Parallel (ASP) 7
    2.4.3 Stale Synchronous Parallel (SSP) 8
Chapter 3 Dual Batch Size Learning 9
  3.1 Overview 9
  3.2 Training Time Prediction 10
  3.3 Batch Size and Data Size 12
  3.4 Model-Update Factor 14
Chapter 4 Experimental Result 15
  4.1 Experimental Setup 15
    4.1.1 Dataset 15
    4.1.2 Model 15
    4.1.3 Implementation 15
  4.2 Using Model-Update Factor during Training 16
  4.3 Numbers of Workers with Small Batch Size 18
    4.3.1 k=1.05 19
    4.3.2 k=1.1 23
    4.3.3 Training with Eight Workers 26
Chapter 5 Conclusion 28
References 30
dc.language.iso: en
dc.subject: 分散式學習 (distributed learning) [zh_TW]
dc.subject: 機器學習 (machine learning) [zh_TW]
dc.subject: 深度神經網路 (deep neural networks) [zh_TW]
dc.subject: 批量尺寸 (batch size) [zh_TW]
dc.subject: 參數伺服器 (parameter server) [zh_TW]
dc.subject: deep neural networks [en]
dc.subject: machine learning [en]
dc.subject: distributed learning [en]
dc.subject: batch size [en]
dc.subject: parameter server [en]
dc.title: 基於參數伺服器之雙批量尺寸學習 (Dual Batch Size Learning on Parameter Server) [zh_TW]
dc.title: Dual Batch Size Learning on Parameter Server [en]
dc.date.schoolyear: 110-1
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 洪鼎詠 (Jian-Ye Ching), 吳真貞 (Hung-Hui Li), (Yi-Rong Yang)
dc.subject.keyword: 機器學習, 深度神經網路, 批量尺寸, 分散式學習, 參數伺服器 (machine learning, deep neural networks, batch size, distributed learning, parameter server) [zh_TW]
dc.subject.keyword: machine learning, deep neural networks, batch size, distributed learning, parameter server [en]
dc.relation.page: 32
dc.identifier.doi: 10.6342/NTU202200147
dc.rights.note: 同意授權(全球公開) (authorization granted, open access worldwide)
dc.date.accepted: 2022-01-27
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia) [zh_TW]
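
The abstract above describes two mechanisms: partitioning the training data in proportion to each worker's processing speed, and scaling the small-batch workers' contribution to the global weight update by the data ratio between the two batch sizes. The following is a minimal sketch of that idea, not the thesis's implementation; the batch sizes, worker counts, learning rate, and the helper names split_data, server_step, and update_factor are illustrative assumptions.

    # Minimal sketch (assumed values and names, not the thesis code): a synchronous
    # parameter-server round that mixes gradients from large-batch and small-batch
    # workers and down-weights the latter by the data ratio between the batch sizes.
    import numpy as np

    LARGE_BS, SMALL_BS = 1024, 128   # assumed per-worker batch sizes
    N_LARGE, N_SMALL = 3, 1          # assumed number of workers per batch size

    def split_data(num_samples, throughputs):
        """Assign each worker a data share proportional to its throughput
        (samples/second), so large- and small-batch workers finish a round
        at about the same time."""
        total = sum(throughputs)
        shares = [int(num_samples * t / total) for t in throughputs]
        shares[0] += num_samples - sum(shares)  # give the rounding remainder to worker 0
        return shares

    def server_step(weights, grads_large, grads_small, lr=0.1):
        """One parameter-server update: scale gradients from small-batch workers
        by the data ratio SMALL_BS / LARGE_BS, then apply the weighted average."""
        update_factor = SMALL_BS / LARGE_BS
        weighted = list(grads_large) + [update_factor * g for g in grads_small]
        avg_grad = sum(weighted) / (N_LARGE + N_SMALL * update_factor)
        return weights - lr * avg_grad

    # Toy usage: random vectors stand in for gradients computed by the workers.
    w = np.zeros(4)
    grads_large = [np.random.randn(4) for _ in range(N_LARGE)]
    grads_small = [np.random.randn(4) for _ in range(N_SMALL)]
    w = server_step(w, grads_large, grads_small)
    print(split_data(50000, throughputs=[900, 900, 900, 400]))

In a real run the gradients would come from workers training on the data shares rather than from random numbers; the abstract reports that adjusting contributions by this data ratio improves the final accuracy by 0.9%.
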
Appears in Collections: Graduate Institute of Networking and Multimedia

Files in This Item:
File: U0001-2201202210284000.pdf, Size: 2.41 MB, Format: Adobe PDF