A Swapping Behavior Analysis of Deep Learning Recommendation System Training (深度學習推薦系統訓練之記憶體置換行為分析)

賴宥儒; You-Ru Lai

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83107

Title:	A Swapping Behavior Analysis of Deep Learning Recommendation System Training (深度學習推薦系統訓練之記憶體置換行為分析)
Author:	You-Ru Lai (賴宥儒)
Advisor:	Chia-Lin Yang (楊佳玲)
Keywords:	deep learning recommendation model (DLRM), swapping system, SSD, I/O characteristics, garbage collection
Year of publication:	2022
Degree:	Master's
Abstract:	Recommendation systems are widely used to provide personalized suggestions, and the memory capacity required to train deep learning recommendation models (DLRM) keeps growing. Using swap to turn an SSD into an extension of memory can alleviate the DRAM capacity demand of training. However, swap introduces additional I/O latency, and because models are retrained and redeployed periodically, its effect on overall training efficiency must be taken into account. In this thesis, we observe that training with swap makes the training time 2 to 5 times as long. Our analysis identifies three factors that affect training efficiency: (1) Memory accesses during DLRM training are irregular, which leads to low memory utilization and frequent swapping, and therefore a large I/O volume. (2) Most I/O requests are smaller than 32 KB, which makes poor use of the SSD's internal bandwidth. (3) SSD read/write bandwidth drops as the total write volume grows, mainly because of the SSD's internal garbage collection (GC). We also use fio to simulate the I/O behavior and study ways to improve SSD I/O efficiency, with the following results: (1) increasing the I/O request size to 128 KB improves bandwidth by 1.75x; (2) switching the write pattern to sequential writes improves write bandwidth by 4.37x. Finally, we offer two suggestions for using swap in DLRM training: (1) aggregate swap data that will be used close together in time so that it can be read and written in larger requests, and use sequential writes when swapping out memory; (2) use an Open Channel SSD or ZNS SSD as the swap device, which lets the host schedule data reads/writes and garbage collection according to demand, thereby improving performance.
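The first suggestion in the abstract (aggregate small swap writes into larger, sequential ones) can be illustrated with a minimal sketch. This is not the thesis's implementation: the class `SwapOutBatcher`, the 4 KiB page size, and the log-file layout are all assumptions made for illustration; only the 128 KB request size comes from the abstract.

```python
import os
import tempfile

PAGE_SIZE = 4 * 1024        # assumed page size (4 KiB), not stated in the abstract
BATCH_BYTES = 128 * 1024    # flush unit, matching the 128 KB request size in the abstract


class SwapOutBatcher:
    """Hypothetical helper: buffer evicted pages and append them to a
    log file as large sequential writes instead of many small ones."""

    def __init__(self, path):
        self.file = open(path, "w+b")
        self.buffer = bytearray()
        self.pending = {}       # page id -> offset inside the in-memory buffer
        self.index = {}         # page id -> byte offset inside the log file
        self.tail = 0           # next append position in the log file
        self.write_calls = 0    # number of batched writes issued so far

    def swap_out(self, page_id, data):
        if len(data) != PAGE_SIZE:
            raise ValueError("page must be exactly PAGE_SIZE bytes")
        if page_id in self.pending:
            # Page is still buffered: overwrite it in place, no extra I/O.
            off = self.pending[page_id]
            self.buffer[off:off + PAGE_SIZE] = data
            return
        self.pending[page_id] = len(self.buffer)
        self.buffer += data
        if len(self.buffer) >= BATCH_BYTES:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # One large append at the tail of the log: a sequential write.
        self.file.seek(self.tail)
        self.file.write(self.buffer)
        self.file.flush()
        self.write_calls += 1
        for page_id, off in self.pending.items():
            self.index[page_id] = self.tail + off
        self.tail += len(self.buffer)
        self.buffer = bytearray()
        self.pending = {}

    def swap_in(self, page_id):
        if page_id in self.pending:
            off = self.pending[page_id]
            return bytes(self.buffer[off:off + PAGE_SIZE])
        self.file.seek(self.index[page_id])
        return self.file.read(PAGE_SIZE)

    def close(self):
        self.flush()
        self.file.close()


def demo():
    """Swap out 64 pages (256 KiB total) and read one back."""
    with tempfile.TemporaryDirectory() as tmp:
        batcher = SwapOutBatcher(os.path.join(tmp, "swap.log"))
        for i in range(64):
            batcher.swap_out(i, bytes([i]) * PAGE_SIZE)
        batcher.flush()
        page = batcher.swap_in(40)
        calls = batcher.write_calls
        batcher.close()
    return calls, page
```

In this sketch, 64 page evictions result in two 128 KiB appends rather than 64 separate 4 KiB writes. Superseded copies of a page are left behind in the log, so a real design would also need space reclamation, which is where host-managed devices such as the ZNS SSDs mentioned in the second suggestion would come in.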
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83107
DOI:	10.6342/NTU202210058
Full-text authorization:	Authorized (access limited to campus)
Appears in collections:	Graduate Institute of Networking and Multimedia

Files in this item:

File	Size	Format
U0001-1121221117414063.pdf (access restricted to NTU campus IP addresses; off campus, please use the VPN service)	3.45 MB	Adobe PDF
