Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74357
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 楊佳玲(Chia-Lin Yang) | |
dc.contributor.author | Tzu-Hsien Yang | en |
dc.contributor.author | 楊子賢 | zh_TW |
dc.date.accessioned | 2021-06-17T08:31:31Z | - |
dc.date.available | 2020-08-20 | |
dc.date.copyright | 2019-08-20 | |
dc.date.issued | 2019 | |
dc.date.submitted | 2019-08-12 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74357 | - |
dc.description.abstract | Exploiting the sparsity of neural network models to reduce ineffectual computation is a widely used approach for building energy-efficient deep neural network inference accelerators. However, because of the tightly coupled crossbar structure, sparsity in ReRAM-based neural network accelerators has received relatively little attention. Existing architectural studies of ReRAM-based neural network accelerators assume that an entire crossbar array can be activated in a single cycle. In practice, however, to preserve inference accuracy, matrix-vector computation must be performed at a smaller granularity, called an Operation Unit (OU). An OU-based architecture creates new opportunities to exploit the sparsity of deep neural networks. In this thesis, we propose the first practical Sparse ReRAM Engine, which exploits both weight and activation sparsity. Our evaluation shows that the proposed method effectively eliminates ineffectual computation and delivers substantial performance improvement and energy savings. | zh_TW |
dc.description.abstract | Exploiting model sparsity to reduce ineffectual computation is a commonly used approach to achieving energy efficiency in DNN inference accelerators. However, due to the tightly coupled crossbar structure, exploiting sparsity in ReRAM-based NN accelerators remains a less explored area. Existing architectural studies on ReRAM-based NN accelerators assume that an entire crossbar array can be activated in a single cycle. However, due to inference accuracy considerations, matrix-vector computation must in practice be conducted at a smaller granularity, called an Operation Unit (OU). An OU-based architecture creates a new opportunity to exploit DNN sparsity. In this thesis, we propose the first practical Sparse ReRAM Engine that exploits both weight and activation sparsity. Our evaluation shows that the proposed method is effective in eliminating ineffectual computation, and delivers significant performance improvement and energy savings. | en |
dc.description.provenance | Made available in DSpace on 2021-06-17T08:31:31Z (GMT). No. of bitstreams: 1 ntu-108-R06922094-1.pdf: 1679337 bytes, checksum: 6baccfe53cd619a5eefcbb67bc3b7410 (MD5) Previous issue date: 2019 | en |
dc.description.tableofcontents | 1 Introduction 1
2 Background 5
2.1 ReRAM-based DNN Accelerator Architecture 5
2.2 Challenges in Exploiting DNN Sparsity in ReRAM-based DNN Accelerator 7
3 A Practical ReRAM Accelerator Architecture 10
4 New Opportunity for Exploiting Sparsity in OU-based ReRAM Accelerator 15
4.1 Weight Compression 17
4.2 Activation Compression 18
5 Sparse ReRAM Engine 21
5.1 Index Decoder 22
5.2 Dynamic OU Formation 24
5.3 SRE Pipeline 25
6 Evaluation Methodology 27
7 Experimental Results 31
7.1 Performance and Energy 31
7.2 Indexing Overhead Analysis 33
7.3 Sensitivity Studies 34
7.4 Non-SSL Sparse Neural Networks 37
7.5 Comparison with Over-Idealized Design 38
8 Related Work 40
9 Conclusion 43
Reference 44 | |
dc.language.iso | zh-TW | |
dc.title | Sparse ReRAM Engine: Joint Exploration of Weight and Activation Sparsity in Compressed Neural Networks | zh_TW |
dc.title | Sparse ReRAM Engine: Joint Exploration of Activation and Weight Sparsity in Compressed Neural Networks | en |
dc.type | Thesis | |
dc.date.schoolyear | 107-2 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 鄭湘筠(Hsiang-Yun Cheng),劉宗德(Tsung-Te Liu) | |
dc.subject.keyword | Neural network, sparsity, ReRAM, accelerator architecture | zh_TW |
dc.subject.keyword | Neural network, sparsity, ReRAM, accelerator architecture | en |
dc.relation.page | 48 | |
dc.identifier.doi | 10.6342/NTU201903060 | |
dc.rights.note | Paid authorization | |
dc.date.accepted | 2019-08-12 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |
Appears in Collections: | Department of Computer Science and Information Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-108-1.pdf (currently not authorized for public access) | 1.64 MB | Adobe PDF |
Unless otherwise indicated by their individual copyright notices, all items in this repository are protected by copyright, with all rights reserved.
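The abstract describes matrix-vector multiplication performed at Operation Unit (OU) granularity, where an OU whose weight sub-block is all zero, or whose input activations are all zero, contributes nothing and can be skipped. The following is a rough illustrative sketch of that idea only, not code from the thesis: the function name `ou_mvm` and the OU dimensions are invented for illustration, and real crossbar effects (ADC quantization, device variation) are ignored.

```python
import numpy as np

def ou_mvm(weights, activations, ou_rows=9, ou_cols=8):
    """Toy model of OU-granularity crossbar matrix-vector multiply.

    Each cycle activates an ou_rows x ou_cols sub-block (one OU).
    OUs with all-zero weights (weight sparsity) or all-zero input
    activations (activation sparsity) are skipped as ineffectual.
    Returns the output vector and the number of OU cycles spent.
    """
    n_rows, n_cols = weights.shape
    out = np.zeros(n_cols)
    cycles = 0
    for r in range(0, n_rows, ou_rows):
        a = activations[r:r + ou_rows]
        if not a.any():                       # activation sparsity: skip
            continue
        for c in range(0, n_cols, ou_cols):
            w = weights[r:r + ou_rows, c:c + ou_cols]
            if not w.any():                   # weight sparsity: skip
                continue
            out[c:c + ou_cols] += a @ w       # one OU computation
            cycles += 1
    return out, cycles
```

The result matches the dense product `activations @ weights`, while `cycles` counts only the effectual OUs, which is where the performance and energy gains in the abstract would come from in this simplified model.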