Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/22111
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 劉邦鋒(Pangfeng Liu) | |
dc.contributor.author | Cing-Fu Jhu | en |
dc.contributor.author | 朱清福 | zh_TW |
dc.date.accessioned | 2021-06-08T04:03:06Z | - |
dc.date.copyright | 2018-08-03 | |
dc.date.issued | 2018 | |
dc.date.submitted | 2018-08-03 | |
dc.identifier.citation | [1] Backpropagation. https://en.wikipedia.org/wiki/Backpropagation.
[2] cuBLAS. https://developer.nvidia.com/cublas.
[3] CUDA unified memory. https://devblogs.nvidia.com/unified-memory-in-cuda-6/.
[4] cuDNN. https://developer.nvidia.com/cudnn.
[5] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. J. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Józefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. G. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. A. Tucker, V. Vanhoucke, V. Vasudevan, F. B. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. CoRR, abs/1603.04467, 2016.
[6] T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, and Z. Zhang. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. CoRR, abs/1512.01274, 2015.
[7] T. Chen, B. Xu, C. Zhang, and C. Guestrin. Training deep nets with sublinear memory cost. CoRR, abs/1604.06174, 2016.
[8] T. Chilimbi, Y. Suzue, J. Apacible, and K. Kalyanaraman. Project Adam: Building an efficient and scalable deep learning training system. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI '14, pages 571–582, Berkeley, CA, USA, 2014. USENIX Association.
[9] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. P. Kuksa. Natural language processing (almost) from scratch. CoRR, abs/1103.0398, 2011.
[10] H. Cui, H. Zhang, G. R. Ganger, P. B. Gibbons, and E. P. Xing. GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server. In Proceedings of the Eleventh European Conference on Computer Systems, EuroSys '16, pages 4:1–4:16, New York, NY, USA, 2016. ACM.
[11] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, Q. V. Le, and A. Y. Ng. Large scale distributed deep networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1223–1231. Curran Associates, Inc., 2012.
[12] J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, June 2009.
[13] C. dos Santos and M. Gatti. Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 69–78. Dublin City University and Association for Computational Linguistics, 2014.
[14] I. Gelado, J. E. Stone, J. Cabezas, S. Patel, N. Navarro, and W.-m. W. Hwu. An asymmetric distributed shared memory model for heterogeneous parallel systems. SIGPLAN Not., 45(3):347–358, Mar. 2010.
[15] A. Graves and J. Schmidhuber. Framewise phoneme classification with bidirectional LSTM networks. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks, volume 4, pages 2047–2052, July 2005.
[16] S. Han, J. Pool, J. Tran, and W. J. Dally. Learning both weights and connections for efficient neural networks. CoRR, abs/1506.02626, 2015.
[17] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
[18] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. B. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. CoRR, abs/1408.5093, 2014.
[19] P. Judd, J. Albericio, T. H. Hetherington, T. M. Aamodt, N. D. E. Jerger, R. Urtasun, and A. Moshovos. Reduced-precision strategies for bounded memory in deep neural nets. CoRR, abs/1511.05236, 2015.
[20] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. In Neurocomputing: Foundations of Research, pages 696–699. MIT Press, Cambridge, MA, USA, 1988.
[21] X. Sierra-Canto, F. Madera-Ramirez, and V. Uc-Cetina. Parallel training of a back-propagation neural network using CUDA. In Proceedings of the 2010 Ninth International Conference on Machine Learning and Applications, ICMLA '10, pages 307–312, Washington, DC, USA, 2010. IEEE Computer Society.
[22] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
[23] C. Szegedy, S. Ioffe, and V. Vanhoucke. Inception-v4, Inception-ResNet and the impact of residual connections on learning. CoRR, abs/1602.07261, 2016.
[24] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CoRR, abs/1409.4842, 2014.
[25] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 1701–1708, June 2014. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/22111 | - |
dc.description.abstract | 許多大型的深層人工神經網絡模型在近幾年內已經被提出以實現更精確的訓練效果。這些巨大網路模型的訓練依賴大量的記憶體空間與通訊量,而這些都是在提高深度學習的效能方面具有挑戰性的問題。在本文中,我們分析了訓練深層神經網絡的資料存取模式,並提出一個減少圖形處理器上資料使用量與圖形處理器和主機之間的資料搬動量的資料固定演算法。我們證明了尋找一個最佳資料搬移的排程是NP完全的,並提出了一個可以找到最佳解的偽多項式時間的動態規劃演算法。亦即,我們觀察了深層神經網絡訓練的存取模式,並提出圖形處理器上專門的資料固定演算法,以最小化不必要的資料搬移。我們接著實施動態規劃,以訓練真正的深度學習模型。實驗證實,與 GeePS (一個先進的深度學習框架)相比,我們可以固定多20%的資料到圖形處理器的記憶體之中。此外,我們還提出了一個用於深度學習反向傳播計算的記憶需求減少技術。我們分析了深度學習反向傳播計算的存取模式,並認識到梯度計算與權重更新這兩個依序完成的主要步驟可以部分重疊。此外,我們分析了計算的語義,並意識到通過延遲權重更新,我們可以避免傳統的平行實作中由於讀寫衝突所需要的雙緩衝。我們接著實現我們的技術,並將參數梯度所需的記憶體使用率降低了75%。 | zh_TW |
dc.description.abstract | Many large deep neural network models have been proposed in recent years to achieve more accurate training results. Training these large models requires a huge amount of memory and communication, which has become a challenging issue in improving the performance of deep learning. In this paper, we analyze the data access pattern of training a deep neural network and propose a data pinning algorithm that reduces both the data usage on a GPU and the data movement between a GPU and its CPU host. We show that finding an optimal data movement schedule is NP-complete, and propose a dynamic programming algorithm that finds the optimal solution in pseudo-polynomial time. That is, we observe the access pattern of deep neural network training and propose a specialized GPU data pinning algorithm that minimizes unnecessary data movement. We then implement our dynamic programming to train real deep learning models. The experiments show that we can pin up to 20% more data into GPU memory than GeePS, a state-of-the-art deep learning framework. We also propose a memory reduction technique for back-propagation in deep learning. We analyze the access pattern of back-propagation and observe that gradient computation and weight update, two major steps that are traditionally performed sequentially, can be partially overlapped. In addition, we analyze the semantics of the computation and show that, by delaying the weight update, we can avoid the double buffering that read/write conflicts require in a traditional naive parallel implementation. We then implement our techniques and reduce the memory required for parameter gradients by up to 75%. | en |
dc.description.provenance | Made available in DSpace on 2021-06-08T04:03:06Z (GMT). No. of bitstreams: 1 ntu-107-R05922163-1.pdf: 1345212 bytes, checksum: 8276ce2da897793b81c83795ec9b1fef (MD5) Previous issue date: 2018 | en |
dc.description.tableofcontents | Thesis Oral Defense Committee Certification i
Acknowledgements ii
Chinese Abstract iii
Abstract iv
1 Introduction 1
2 Deep Learning 4
2.1 Deep Learning Algorithm 4
2.2 Forward Pass 4
2.3 Back-propagation 5
2.4 Weight Update 6
2.5 Deep Learning using GPU 6
3 Related Works 7
3.1 Model Parallelism 7
3.2 Memory Management 7
3.3 Memory Cost Reduction 8
3.4 Optimizations for Training Deep Neural Networks 9
4 GPU Memory Pinning 10
4.1 Problem Description 10
4.2 Problem Definition 12
4.3 NP-Completeness 13
4.4 Dynamic Programming 14
5 Backward Propagation Optimization 16
6 Experiments 19
6.1 Experiment Setup 19
6.2 Dataset and Models 19
6.3 Experiment Results 21
6.3.1 Batch Size 21
6.3.2 GPU Memory Pinning 22
6.3.3 Concurrent Update 22
7 Conclusion 25
Bibliography 26 | |
dc.language.iso | en | |
dc.title | 針對圖形處理器上的深度學習反向傳播計算之排程優化 | zh_TW |
dc.title | Scheduling Optimization of Backpropagation for Deep Learning on GPU | en |
dc.type | Thesis | |
dc.date.schoolyear | 106-2 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 吳真貞,徐慰中 | |
dc.subject.keyword | 深度學習, 反向傳播, 動態規劃, 記憶體優化 | zh_TW |
dc.subject.keyword | Deep Learning, Back-propagation, Dynamic Programming, Memory Optimization | en |
dc.relation.page | 29 | |
dc.identifier.doi | 10.6342/NTU201802320 | |
dc.rights.note | Not authorized | |
dc.date.accepted | 2018-08-03 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
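The delayed-update idea described in the abstract can be illustrated with a short sketch. This is an illustrative reconstruction under assumed names and shapes (`layers`, `grads`, `delta`, a plain SGD rule on small dense layers), not the thesis's implementation: in back-propagation, the error `delta` for layer i-1 is computed by reading layer i's weights, so a naive concurrent update of those weights would need a second buffer; delaying each layer's in-place update until after its weights have been read avoids that copy.

```python
import numpy as np

rng = np.random.default_rng(0)
layers = [rng.standard_normal((4, 4)) for _ in range(3)]       # weights W_1..W_3
activations = [rng.standard_normal(4) for _ in range(3)]       # saved forward activations
grads = [None] * 3
delta = rng.standard_normal(4)                                 # error signal at the output
lr = 0.01

# Backward pass: for each layer, compute its weight gradient, THEN propagate
# delta through W_i (a read of W_i), and only after that read is W_i free to
# be updated in place without a second weight buffer.
for i in reversed(range(3)):
    grads[i] = np.outer(delta, activations[i])   # gradient w.r.t. W_i
    delta = layers[i].T @ delta                  # reads W_i: must precede the update
    layers[i] -= lr * grads[i]                   # delayed in-place update, no double buffer
```

Updating `layers[i]` any earlier would let the write race with the read in the `delta` propagation, which is exactly the read/write conflict that would otherwise force double buffering.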
Appears in Collections: | Department of Computer Science and Information Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-107-1.pdf (currently not authorized for public access) | 1.31 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated in their license terms.