Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85079

Full metadata record
DC field: Value [Language]
dc.contributor.advisor: 張耀文 (Yao-Wen Chang)
dc.contributor.author: Cheng-Yuan Wang [en]
dc.contributor.author: 王政元 [zh_TW]
dc.date.accessioned: 2023-03-19T22:42:17Z
dc.date.copyright: 2022-08-18
dc.date.issued: 2022
dc.date.submitted: 2022-08-12
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85079
dc.description.abstract: Resistive random access memory (ReRAM) is a highly promising processing-in-memory (PIM) technology that can effectively reduce the cost of data movement between computation and memory units in large-scale, complex graph processing. ReRAM cells can be combined into crossbar arrays to accelerate graph processing, and partitioning a ReRAM crossbar array into operation units (OUs) can further improve the computational accuracy of the crossbar. Previous designs did not specifically optimize OU utilization, which incurred extra computation cost and energy loss. To remedy these shortcomings, this thesis proposes a two-stage algorithm that takes the crossbar operation unit as its optimization target and remaps the row and column order of a sparse graph so as to cluster its effective data, mitigating the excessive energy and computation cost caused by graph sparsity. By considering the given operation-unit size in the index remapping algorithm, we optimize both OU utilization and energy consumption. Experimental results show that, compared with the unoptimized baseline, our algorithm reduces the total number of crossbar OUs used by 65.0%, improves overall OU utilization by 36.34%, and saves 50.4% of energy consumption, on average. Compared with the existing algorithm, it reduces the total number of crossbar OUs used by 31.4%, improves overall OU utilization by 10.6%, and saves 17.2% of energy consumption, on average. [zh_TW]
dc.description.abstract: Resistive Random Access Memory (ReRAM) crossbars are a promising processing-in-memory (PIM) technology for reducing the enormous data-movement overheads between computation and memory units in large-scale graph processing. ReRAM cells can be combined with crossbar arrays to effectively accelerate graph processing, and partitioning ReRAM crossbar arrays into Operation Units (OUs) can further improve the computation accuracy of ReRAM crossbars. Operation-unit utilization was not optimized in previous work, incurring extra computation cost and energy consumption. In this thesis, we propose a two-stage algorithm with a crossbar OU-aware scheme for sparse graph index remapping for ReRAM (SGIRR) crossbars, mitigating the influence of graph sparsity. In particular, this thesis is the first to consider the given operation-unit size in the index remapping algorithm, optimizing operation-unit utilization and power dissipation. Experimental results show that, compared with the unoptimized baseline, our proposed algorithm reduces the total crossbar OU usage by 65.0%, improves overall OU utilization by 36.34%, and saves 50.4% of energy consumption, on average. Compared with the previous work, our proposed algorithm reduces the total crossbar OU usage by 31.4%, improves overall OU utilization by 10.6%, and saves 17.2% of energy consumption, on average. [en]
dc.description.provenance: Made available in DSpace on 2023-03-19T22:42:17Z (GMT). No. of bitstreams: 1. U0001-1108202212540500.pdf: 3364519 bytes, checksum: 661b05b4b0db2bb8b9dcdecfb6bbb443 (MD5). Previous issue date: 2022 [en]
dc.description.tableofcontents:
  Acknowledgements
  Abstract (Chinese)
  Abstract
  List of Tables
  List of Figures
  Chapter 1. Introduction
    1.1 Introduction
    1.2 Previous Work
    1.3 Related Work
    1.4 Our Contributions
    1.5 Thesis Organization
  Chapter 2. Preliminaries
    2.1 Challenges in Graph Processing
    2.2 Resistive Random Access Memory (ReRAM) Architecture
    2.3 ReRAM Crossbar Arrays
    2.4 Operation Unit
    2.5 Ego Network Structure
    2.6 Terminologies and Notations
    2.7 Problem Formulation
  Chapter 3. Our Proposed Algorithm
    3.1 Algorithm Overview
    3.2 Order Preprocessing
      3.2.1 Column Index Filtering by Crossbar OU Size
    3.3 Order Compression
  Chapter 4. Experimental Results
    4.1 Experimental Setup
    4.2 The Results of Crossbar OU Utilization
  Chapter 5. Conclusions and Future Work
    5.1 Conclusion
    5.2 Future Work
      5.2.1 Consider IR-Drop Effect on ReRAM Crossbar Arrays with Operation Unit
      5.2.2 Consider the Sparse Neural Networks for the ReRAM Crossbar Array
  Bibliography
  Publication List
dc.language.iso: en
dc.subject: Operation Unit (操作單元) [zh_TW]
dc.subject: Crossbar Array (交叉開關陣列) [zh_TW]
dc.subject: Sparse Graph (稀疏圖) [zh_TW]
dc.subject: Processing-In-Memory (記憶體內處理技術) [zh_TW]
dc.subject: Resistive Random Access Memory (可變電阻式記憶體) [zh_TW]
dc.subject: ReRAM [en]
dc.subject: Sparse Graph [en]
dc.subject: Operation Unit [en]
dc.subject: Crossbar Array [en]
dc.subject: Process-In-Memory [en]
dc.title: 基於交錯可變電阻式記憶體操作單位量及功耗優化之稀疏圖重映射演算法 [zh_TW]
dc.title: SGIRR: Sparse Graph Index Remapping for ReRAM Crossbar Operation Unit and Power Optimization [en]
dc.type: Thesis
dc.date.schoolyear: 110-2
dc.description.degree: Master (碩士)
dc.contributor.author-orcid: 0000-0002-8698-7015
dc.contributor.coadvisor: 張原豪 (Yuan-Hao Chang)
dc.contributor.oralexamcommittee: 江蕙如 (Hui-Ru Jiang), 陸寶森 (Peter B. Luh), 黃婷婷 (Ting-Ting Hwang)
dc.subject.keyword: ReRAM (可變電阻式記憶體), Processing-In-Memory (記憶體內處理技術), Crossbar Array (交叉開關陣列), Operation Unit (操作單元), Sparse Graph (稀疏圖) [zh_TW]
dc.subject.keyword: ReRAM, Process-In-Memory, Crossbar Array, Operation Unit, Sparse Graph [en]
dc.relation.page: 53
dc.identifier.doi: 10.6342/NTU202202293
dc.rights.note: Authorization granted (access restricted to campus) (同意授權,限校園內公開)
dc.date.accepted: 2022-08-15
dc.contributor.author-college: College of Electrical Engineering and Computer Science (電機資訊學院) [zh_TW]
dc.contributor.author-dept: Data Science Degree Program (資料科學學位學程) [zh_TW]
dc.date.embargo-lift: 2022-08-18
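
To make the optimization target described in the abstracts concrete, the following is a minimal Python sketch, not the thesis's actual SGIRR implementation: a crossbar is tiled into fixed-size operation units (OUs), only the tiles that contain at least one nonzero of the mapped adjacency matrix need to be activated, and remapping row/column indices so that nonzeros cluster into fewer tiles lowers the OU count while raising per-OU utilization. The ou_stats helper, the 2x2 OU size, the toy 4x4 matrix, and the hand-picked permutation are all illustrative assumptions.

import numpy as np

def ou_stats(matrix: np.ndarray, ou_rows: int, ou_cols: int):
    """Count activated OU tiles and their average utilization for a 0/1 matrix."""
    rows, cols = matrix.shape
    nonzeros = int(np.count_nonzero(matrix))
    activated = 0
    for r in range(0, rows, ou_rows):
        for c in range(0, cols, ou_cols):
            # An OU must be activated iff its tile holds at least one edge.
            if np.count_nonzero(matrix[r:r + ou_rows, c:c + ou_cols]):
                activated += 1
    utilization = nonzeros / (activated * ou_rows * ou_cols) if activated else 0.0
    return activated, utilization

# Toy 4x4 adjacency matrix with scattered nonzeros, mapped onto 2x2 OUs.
adj = np.array([[1, 0, 0, 1],
                [0, 0, 0, 0],
                [0, 0, 0, 1],
                [1, 0, 0, 0]])
print(ou_stats(adj, 2, 2))            # (4, 0.25): four OUs activated, each mostly empty

# Remap row/column indices so nonzeros cluster into fewer tiles
# (a hand-picked permutation standing in for the thesis's two-stage search).
perm = [0, 3, 1, 2]
remapped = adj[np.ix_(perm, perm)]
print(ou_stats(remapped, 2, 2))       # (2, 0.5): half the OUs, double the utilization

On this toy graph the remapping cuts the activated OUs from four to two and doubles utilization from 0.25 to 0.5; according to the abstract, the thesis's two-stage algorithm finds such row/column orders on real sparse graphs while taking the given OU size into account.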
Appears in collections: Data Science Degree Program (資料科學學位學程)

Files in this item:
File: U0001-1108202212540500.pdf
Access: NTU campus IP addresses only (off-campus users, please use the library's VPN service)
Size: 3.29 MB
Format: Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
