Intra- and Inter-Layer Transformation to Reduce Memory Traffic for CNN Computation

Pin-Wei Liao; 廖品崴

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/19620

Title:	Intra- and Inter-Layer Transformation to Reduce Memory Traffic for CNN Computation
Author:	Pin-Wei Liao 廖品崴
Advisor:	Shih-Wei Liao (廖世偉)
Keywords:	convolutional neural network (CNN), loop transformation, memory traffic
Publication year:	2021
Degree:	Master's
Abstract:	Edge inference has gained much popularity in recent years, and many AI accelerators have been proposed and extensively studied. Such devices are often packed with a large number of PEs (Processing Elements) and a large amount of on-chip SRAM. The key to successful AI acceleration is making effective use of the data transferred from off-chip DRAM to on-chip SRAM. Most prior studies optimize the use of on-chip SRAM for a single convolution layer, but they tend to overlook the opportunity for inter-layer data reuse. We propose an algorithm that schedules two adjacent CNN layers together. Rather than allocating the on-chip buffer to a single layer only, our goal is to reduce the traffic between DRAM and local memory across both layers. Our cross-layer scheduling effectively reduces memory traffic. We have also verified the validity of our memory traffic reduction model on the Gemmini simulator from UC Berkeley.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/19620
DOI:	10.6342/NTU202100638
Full-text license:	Not authorized
Appears in department:	Department of Computer Science and Information Engineering
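This record does not include the thesis's actual traffic model or scheduling algorithm. The following is only a hypothetical back-of-the-envelope sketch of why inter-layer reuse can cut DRAM traffic, assuming stride-1 "same"-padded convolutions, weights fetched from DRAM exactly once, traffic counted in words, and fusion being possible only when the entire intermediate feature map plus both layers' weights fit in on-chip SRAM. The names `ConvLayer`, `traffic_layer_by_layer`, and `traffic_fused` are illustrative and do not come from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical shape of a stride-1, 'same'-padded conv layer (toy model)."""
    h: int      # feature-map height
    w: int      # feature-map width
    c_in: int   # input channels
    c_out: int  # output channels
    k: int      # square kernel size

    def input_words(self) -> int:
        return self.h * self.w * self.c_in

    def output_words(self) -> int:
        return self.h * self.w * self.c_out

    def weight_words(self) -> int:
        return self.k * self.k * self.c_in * self.c_out


def _check_adjacent(l1: ConvLayer, l2: ConvLayer) -> None:
    # l2 must consume exactly the feature map that l1 produces.
    if (l1.h, l1.w, l1.c_out) != (l2.h, l2.w, l2.c_in):
        raise ValueError("l2 must consume l1's output feature map")


def traffic_layer_by_layer(l1: ConvLayer, l2: ConvLayer) -> int:
    """DRAM words moved when each layer is scheduled on its own."""
    _check_adjacent(l1, l2)
    # The intermediate map (l1's output) is written to DRAM by l1 and read
    # back by l2, so it crosses the DRAM boundary twice.
    return (l1.input_words() + l1.weight_words() + l1.output_words()
            + l2.input_words() + l2.weight_words() + l2.output_words())


def traffic_fused(l1: ConvLayer, l2: ConvLayer, sram_words: int) -> int:
    """DRAM words moved when both layers share the on-chip buffer."""
    _check_adjacent(l1, l2)
    # Simplifying assumption: fusion needs the whole intermediate map and both
    # weight sets resident in SRAM; otherwise fall back to layer-by-layer.
    on_chip = l1.output_words() + l1.weight_words() + l2.weight_words()
    if on_chip > sram_words:
        return traffic_layer_by_layer(l1, l2)
    # The intermediate map never leaves the chip.
    return (l1.input_words() + l1.weight_words()
            + l2.weight_words() + l2.output_words())
```

For two 56×56 layers with 64 input and 64 output channels and 3×3 kernels, the fused schedule saves exactly two passes over the 200,704-word intermediate map under these assumptions. Real cross-layer schedules typically tile the feature map, handle halo regions for k > 1, and trade recomputation against buffer size, none of which this sketch models.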

Files in this item:

File	Size	Format
U0001-0602202123400700.pdf (currently not authorized for public access)	1.49 MB	Adobe PDF

Unless their copyright terms are otherwise specified, all items in this system are protected by copyright, with all rights reserved.