NTU Theses and Dissertations Repository

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80276
Full metadata record (DC field: value [language])
dc.contributor.advisor: 楊佳玲 (Chia-Lin Yang)
dc.contributor.author: Cheng-Yu Tsai [en]
dc.contributor.author: 蔡承佑 [zh_TW]
dc.date.accessioned: 2022-11-24T03:03:41Z
dc.date.available: 2021-11-16
dc.date.available: 2022-11-24T03:03:41Z
dc.date.copyright: 2021-11-16
dc.date.issued: 2021
dc.date.submitted: 2021-06-18
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80276
dc.description.abstract: Since AlexNet's breakthrough in the 2012 ImageNet challenge, deep neural networks (DNNs) have demonstrated their value in many domains. Most current DNN hardware accelerator designs pair a small on-chip cache with a large off-chip memory to keep frequent data transfers from costing too much time or energy. However, as technology and chip fabrication processes advance, hardware designers are gaining more memory-design options beyond this arrangement, so a tool for weighing the trade-offs of different memory configurations has become important.

Existing tools have the following limitations: 1) they support only inference, not training; 2) they use only image-recognition networks as the main performance benchmark; and 3) they model only the dataflow inside convolutional layers, ignoring the impact of other layers such as batch normalization and activation layers. We argue that training is essential both for extending DNNs to new application domains and for research into more efficient network architectures, and that layers other than convolutional and fully connected layers also have a non-negligible impact on training.

In this thesis, we propose a memory-focused analytical model for DNN training performance. The model takes the network structure, the on-chip cache capacity, and the off-chip memory bandwidth as input parameters; assumes a near-optimal software-managed cache so that implementation details of the cache design do not distort the results; and estimates the training performance achievable under those parameters, such as the execution time of one training iteration, the average bandwidth, and the amount of data movement (a rough sketch of this estimation flow follows the record below).

This thesis makes the following contributions: 1) an analytical model that evaluates the performance of the entire DNN training process and accounts for every layer, not just the most compute-intensive ones; 2) a thorough analysis of data reuse at various scales in DNNs; and 3) several observations on and suggestions for current networks that point out directions for future DNN research and optimization. [zh_TW]
dc.description.provenance: Made available in DSpace on 2022-11-24T03:03:41Z (GMT). No. of bitstreams: 1. U0001-1706202117241400.pdf: 4394598 bytes, checksum: 44b5af2ebad20a906c98b4f24221d71f (MD5). Previous issue date: 2021 [en]
dc.description.tableofcontents:
  口試委員會審定書 (Oral Defense Committee Certification) i
  致謝 (Acknowledgements) ii
  摘要 (Chinese Abstract) iii
  Abstract iv
  1 Introduction 1
  2 Related Work 4
    2.1 Analytical Model 6
    2.2 Simulator 6
    2.3 Benchmarking and Profiling 7
  3 Data Reuse in ML Training 9
    3.1 Deep Neural Network Training 9
    3.2 Main Data Types 9
    3.3 Reuse Scopes 10
      3.3.1 Intra-layer Reuse 11
      3.3.2 Adjacent-layer Reuse 11
      3.3.3 Block Scale Reuse 11
      3.3.4 Recurrent Weight Reuse 12
      3.3.5 Forward-backward Reuse 13
    3.4 Operational Intensity and Reuse Frequency 13
  4 Methodology 16
    4.1 Problem Definition 16
    4.2 Framework Overview 17
      4.2.1 Model Transformation 19
      4.2.2 Layer execution order scheduling 19
      4.2.3 access_list, prefetch_list construction 19
      4.2.4 Layer-wise execution 21
      4.2.5 Performance estimations 21
    4.3 Challenges and Limitations of this Work 23
      4.3.1 Hardware Platform 24
      4.3.2 Complexity of Cache Management Policy 24
      4.3.3 Scope of this Work 24
  5 Experiment Results 26
    5.1 Workloads 26
    5.2 Characteristic of the Workloads 27
    5.3 Overall Performance 30
      5.3.1 Memory Traffic 30
      5.3.2 Execution Time 33
      5.3.3 Average Bandwidth 36
    5.4 Layer Execution Time Breakdown 38
    5.5 Effect of Batch Size 39
  6 Conclusion 42
  Bibliography 44
dc.language.iso: en
dc.subject: 分析模型 (analytical model) [zh_TW]
dc.subject: 快取容量 (cache capacity) [zh_TW]
dc.subject: 頻寬 (bandwidth) [zh_TW]
dc.subject: 神經網路訓練 (neural network training) [zh_TW]
dc.subject: 資料再利用 (data reuse) [zh_TW]
dc.subject: 深度神經網路 (deep neural network) [zh_TW]
dc.subject: cache capacity [en]
dc.subject: data reuse [en]
dc.subject: analytical model [en]
dc.subject: Deep Neural Network [en]
dc.subject: training [en]
dc.subject: bandwidth [en]
dc.title: 著重於記憶體子系統的深度神經網路訓練效能分析模型 [zh_TW]
dc.title: A Performance Analytical Model for DNN Training with Focus on Memory Subsystem [en]
dc.date.schoolyear: 109-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 陳依蓉 (Hsin-Tsai Liu), 鄭湘筠 (Chih-Yang Tseng)
dc.subject.keyword: 深度神經網路, 神經網路訓練, 頻寬, 快取容量, 分析模型, 資料再利用 [zh_TW]
dc.subject.keyword: Deep Neural Network, training, bandwidth, cache capacity, analytical model, data reuse [en]
dc.relation.page: 48
dc.identifier.doi: 10.6342/NTU202101034
dc.rights.note: 同意授權(限校園內公開) (authorized for release; campus access only)
dc.date.accepted: 2021-06-18
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) [zh_TW]
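
The abstract above describes the model's inputs (network structure, on-chip cache capacity, off-chip memory bandwidth) and its outputs (per-iteration execution time, average bandwidth, data movement) under a near-optimal software-managed cache assumption. The Python sketch below only illustrates how such a layer-wise, memory-focused estimate could be organized; it is not the thesis's implementation, and the `Layer` fields, `estimate_training_pass`, the max(compute, memory) time bound, and the simple spill model are all assumptions made for this example.

```python
# Hypothetical sketch of a layer-wise, memory-focused training-time estimate.
# All names and the cost model below are illustrative assumptions, not the
# thesis's actual framework.
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    flops: float             # forward + backward floating-point operations
    weight_bytes: float      # parameter footprint of the layer
    activation_bytes: float  # activations kept for reuse / the backward pass

def estimate_training_pass(layers, cache_bytes, dram_gbps, peak_tflops):
    """Estimate (time in seconds, off-chip traffic in bytes) for one training
    iteration, assuming a near-ideal software-managed on-chip buffer: data
    that fits in the buffer is reused on-chip and fetched from DRAM only once."""
    total_time, total_traffic = 0.0, 0.0
    for layer in layers:
        footprint = layer.weight_bytes + layer.activation_bytes
        # Only the part of the working set that overflows the buffer adds
        # DRAM traffic beyond the one compulsory fetch of the working set.
        spill = max(0.0, footprint - cache_bytes)
        traffic = footprint + spill
        compute_time = layer.flops / (peak_tflops * 1e12)
        memory_time = traffic / (dram_gbps * 1e9)
        total_time += max(compute_time, memory_time)  # compute/transfer overlap
        total_traffic += traffic
    return total_time, total_traffic

# Toy example: two layers, 20 MB on-chip buffer, 900 GB/s DRAM, 35 TFLOP/s peak.
layers = [
    Layer("conv1", flops=2e9, weight_bytes=1e6, activation_bytes=50e6),
    Layer("fc", flops=5e8, weight_bytes=400e6, activation_bytes=4e6),
]
t, traffic = estimate_training_pass(layers, cache_bytes=20e6, dram_gbps=900, peak_tflops=35)
print(f"time: {t * 1e3:.3f} ms, traffic: {traffic / 1e6:.1f} MB, "
      f"average bandwidth: {traffic / t / 1e9:.1f} GB/s")
```

The printed average bandwidth (total traffic divided by total time) corresponds to one of the metrics the abstract lists; sweeping `cache_bytes` and `dram_gbps` in a sketch like this is the kind of memory-configuration trade-off exploration the model is meant to support.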
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in This Item:
  File: U0001-1706202117241400.pdf (4.29 MB, Adobe PDF)
  Access: restricted to NTU campus IPs (off-campus users should connect via the library VPN service)

