NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74357

Full metadata record (DC field [language]: value)
dc.contributor.advisor: 楊佳玲 (Chia-Lin Yang)
dc.contributor.author [en]: Tzu-Hsien Yang
dc.contributor.author [zh_TW]: 楊子賢
dc.date.accessioned: 2021-06-17T08:31:31Z
dc.date.available: 2020-08-20
dc.date.copyright: 2019-08-20
dc.date.issued: 2019
dc.date.submitted: 2019-08-12
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74357
dc.description.abstract [zh_TW]: Exploiting the sparsity of neural network models to reduce ineffectual computation is a widely used approach to building energy-efficient deep neural network inference accelerators. However, because of the tightly coupled crossbar structure, exploiting sparsity in ReRAM-based neural network accelerators has received comparatively little attention. Existing architectural studies of ReRAM-based neural network accelerators assume that an entire crossbar array can be activated in a single cycle. In practice, however, to preserve inference accuracy, matrix-vector computation must be performed at a smaller granularity, called an Operation Unit (OU). An OU-based architecture creates new opportunities to exploit the sparsity of deep neural networks. In this thesis, we propose the first practical Sparse ReRAM Engine, which exploits both weight and activation sparsity. Our evaluation shows that the proposed method effectively eliminates ineffectual computation and delivers substantial performance improvement and energy savings.
dc.description.abstract [en]: Exploiting model sparsity to reduce ineffectual computation is a commonly used approach to achieve energy efficiency for DNN inference accelerators. However, due to the tightly coupled crossbar structure, exploiting sparsity for ReRAM-based NN accelerators is a less explored area. Existing architectural studies on ReRAM-based NN accelerators assume that an entire crossbar array can be activated in a single cycle. However, due to inference accuracy considerations, matrix-vector computation must in practice be conducted at a smaller granularity, called an Operation Unit (OU). An OU-based architecture creates a new opportunity to exploit DNN sparsity. In this paper, we propose the first practical Sparse ReRAM Engine that exploits both weight and activation sparsity. Our evaluation shows that the proposed method is effective in eliminating ineffectual computation, and delivers significant performance improvement and energy savings. (A conceptual sketch of OU-level sparsity skipping is given after this metadata record.)
dc.description.provenance [en]: Made available in DSpace on 2021-06-17T08:31:31Z (GMT). No. of bitstreams: 1. ntu-108-R06922094-1.pdf: 1679337 bytes, checksum: 6baccfe53cd619a5eefcbb67bc3b7410 (MD5). Previous issue date: 2019
dc.description.tableofcontents:
1 Introduction 1
2 Background 5
2.1 ReRAM-based DNN Accelerator Architecture 5
2.2 Challenges in Exploiting DNN Sparsity in ReRAM-based DNN Accelerator 7
3 A Practical ReRAM Accelerator Architecture 10
4 New Opportunity for Exploiting Sparsity in OU-based ReRAM Accelerator 15
4.1 Weight Compression 17
4.2 Activation Compression 18
5 Sparse ReRAM Engine 21
5.1 Index Decoder 22
5.2 Dynamic OU Formation 24
5.3 SRE Pipeline 25
6 Evaluation Methodology 27
7 Experimental Results 31
7.1 Performance and Energy 31
7.2 Indexing Overhead Analysis 33
7.3 Sensitivity Studies 34
7.4 Non-SSL Sparse Neural Networks 37
7.5 Comparison with Over-Idealized Design 38
8 Related Work 40
9 Conclusion 43
Reference 44
dc.language.iso: zh-TW
dc.subject [zh_TW]: 神經網路 (neural network)
dc.subject [zh_TW]: 稀疏性 (sparsity)
dc.subject [zh_TW]: 可變電阻式記憶體 (ReRAM)
dc.subject [zh_TW]: 加速器架構 (accelerator architecture)
dc.subject [en]: accelerator architecture
dc.subject [en]: ReRAM
dc.subject [en]: sparsity
dc.subject [en]: Neural network
dc.title [zh_TW]: Sparse ReRAM Engine: 聯合探索壓縮神經網路之權重與激活稀疏性
dc.title [en]: Sparse ReRAM Engine: Joint Exploration of Activation and Weight Sparsity in Compressed Neural Networks
dc.type: Thesis
dc.date.schoolyear: 107-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 鄭湘筠 (Hsiang-Yun Cheng), 劉宗德 (Tsung-Te Liu)
dc.subject.keyword [zh_TW]: 神經網路, 稀疏性, 可變電阻式記憶體, 加速器架構
dc.subject.keyword [en]: Neural network, sparsity, ReRAM, accelerator architecture
dc.relation.page: 48
dc.identifier.doi: 10.6342/NTU201903060
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2019-08-12
dc.contributor.author-college [zh_TW]: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept [zh_TW]: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering)
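
The OU-based execution model described in the abstract can be illustrated with a small numerical sketch: a matrix-vector product is carried out one Operation Unit (a small tile of crossbar rows and columns) at a time, and an OU computation is skipped when its weight tile is entirely zero or when all of its input activations are zero. The Python sketch below is only a conceptual model; the 9x8 OU size, the function name ou_based_mvm, and the exact skipping policy are illustrative assumptions, not the Sparse ReRAM Engine microarchitecture itself.

import numpy as np

def ou_based_mvm(weights, activations, ou_rows=9, ou_cols=8):
    """Compute y = W.T @ x one Operation Unit (OU) at a time, skipping
    OUs whose weight tile is all zero and row tiles whose activations are all zero."""
    n_rows, n_cols = weights.shape              # crossbar rows = inputs, columns = outputs
    y = np.zeros(n_cols)
    ous_per_row_tile = -(-n_cols // ou_cols)    # ceil(n_cols / ou_cols)
    total_ous = 0
    skipped_ous = 0

    for r0 in range(0, n_rows, ou_rows):        # walk the crossbar in OU-sized tiles
        rows = slice(r0, min(r0 + ou_rows, n_rows))
        x_tile = activations[rows]
        total_ous += ous_per_row_tile
        if not np.any(x_tile):                  # activation sparsity: whole input tile is zero
            skipped_ous += ous_per_row_tile
            continue
        for c0 in range(0, n_cols, ou_cols):
            cols = slice(c0, min(c0 + ou_cols, n_cols))
            w_tile = weights[rows, cols]
            if not np.any(w_tile):              # weight sparsity: all-zero OU after pruning
                skipped_ous += 1
                continue
            y[cols] += w_tile.T @ x_tile        # one OU's worth of analog MAC operations
    return y, total_ous, skipped_ous

# Usage: a randomly pruned weight matrix (~80% zeros) and ReLU-style sparse activations.
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 64)) * (rng.random((128, 64)) > 0.8)
x = np.maximum(rng.standard_normal(128), 0.0)
y, total, skipped = ou_based_mvm(W, x)
assert np.allclose(y, W.T @ x)                  # skipping zero work must not change the result
print(f"skipped {skipped} of {total} OU activations")

Running the sketch on such a pruned weight matrix reports how many OU activations the zero-skipping policy avoids, which is the kind of ineffectual computation the abstract says the proposed engine eliminates.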

Appears in Collections: 資訊工程學系

Files in This Item:
File | Size | Format
ntu-108-1.pdf (未授權公開取用; restricted access) | 1.64 MB | Adobe PDF

