LIRS: Enabling efficient machine learning on NVM-based storage via a lightweight implementation of random shuffling

Zhi-Lin Ke; 柯志霖

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72329

Title:	LIRS: Enabling efficient machine learning on NVM-based storage via a lightweight implementation of random shuffling
Author:	Zhi-Lin Ke (柯志霖)
Advisor:	Chia-Lin Yang (楊佳玲)
Keywords:	Machine learning, Support Vector Machine (SVM), Deep Neural Network (DNN), Random shuffling, Testing accuracy, Convergence rate, Hard disk drive (HDD), Non-volatile memory-based storage (NVM-based storage), Solid-state drive (SSD)
Year of publication:	2018
Degree:	Master's
Abstract:	Machine learning algorithms such as Support Vector Machine (SVM) and Deep Neural Network (DNN) have attracted considerable interest in recent years. When training a machine learning algorithm, randomly shuffling all the training data can improve the testing accuracy and boost the convergence rate. Nevertheless, realizing random shuffling of training data in a real system is not straightforward, because random access on a hard disk drive (HDD) is slow. To avoid frequent random disk accesses, existing approaches usually limit the effect of random shuffling. Emerging non-volatile memory-based storage devices, such as the Intel Optane SSD, provide fast random access. Building on this, we propose a lightweight implementation of random shuffling (LIRS), which randomly shuffles the indexes of the entire training dataset, reads the selected training instances directly from the storage device, and packs them into batches. Experimental results show that LIRS reduces the total training time of SVM and DNN by 49.9% and 43.5% on average, respectively, and improves the final testing accuracy of DNN by 1.01% on average.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72329
DOI:	10.6342/NTU201803514
Full-text license:	Paid license
Appears in department:	Department of Computer Science and Information Engineering
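
The index-shuffling idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the fixed-size binary record layout (`RECORD_FORMAT`), the helper `write_dataset`, and the generator `shuffled_batches` are all hypothetical names chosen for illustration. The sketch assumes every training instance occupies the same number of bytes, so that an instance can be located by its index alone.

```python
import os
import random
import struct

# Hypothetical fixed-size record: little-endian int32 label + float32 feature.
RECORD_FORMAT = "<if"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def write_dataset(path, records):
    """Write (label, feature) pairs to disk as fixed-size binary records."""
    with open(path, "wb") as f:
        for label, feature in records:
            f.write(struct.pack(RECORD_FORMAT, label, feature))


def shuffled_batches(path, batch_size, seed=None):
    """Yield the batches of one epoch by shuffling record indexes, not data.

    Each selected record is read directly from storage with a seek, which
    is only cheap when the device offers fast random access.
    """
    num_records = os.path.getsize(path) // RECORD_SIZE
    indexes = list(range(num_records))
    random.Random(seed).shuffle(indexes)  # shuffle only the index list

    with open(path, "rb") as f:
        for start in range(0, num_records, batch_size):
            batch = []
            for idx in indexes[start:start + batch_size]:
                f.seek(idx * RECORD_SIZE)  # jump straight to one record
                batch.append(struct.unpack(RECORD_FORMAT, f.read(RECORD_SIZE)))
            yield batch
```

Calling `shuffled_batches(path, batch_size)` once per epoch gives a fresh permutation over the whole dataset each time, while the data on storage is never rewritten. Only the index list, one integer per instance, is held in memory.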

Files in this item:

File	Size	Format
ntu-107-1.pdf (not currently authorized for public access)	1.58 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.