Adaptive Product Quantization for Effective Model Compression

葉彥廷; Yan-Ting Ye

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89053

Title:	可適性乘積量化方法用於有效深度學習模型壓縮 / Adaptive Product Quantization for Effective Model Compression
Author:	葉彥廷 Yan-Ting Ye
Advisor:	陳銘憲 Ming-Syan Chen
Keywords:	Model compression, Product quantization
Year of publication:	2023
Degree:	Master's
Abstract:	In this thesis, we propose a generalized product quantization (PQ) algorithm for neural network compression. Compared with scalar quantization, PQ has the potential to achieve extremely high compression rates. However, PQ constrains the block size, which makes it difficult to find an appropriate quantization configuration under a restricted storage budget. To overcome this limitation, we propose adaptive padding, an algorithm that lets PQ use arbitrary block sizes and makes the compression rate of a quantized model more flexible. Adaptive padding is orthogonal to previous PQ approaches, which focus on better optimization. Moreover, we employ a simple approach to determine a suitable block size for each layer. Experimental results demonstrate that our method generalizes PQ without additional accuracy drops and effectively improves performance when combined with existing PQ methods.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89053
DOI:	10.6342/NTU202301786
Full-text authorization:	Not authorized
Appears in department:	Graduate Institute of Communication Engineering
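
The abstract does not describe how adaptive padding works, so the following is only a minimal sketch of the general idea of applying PQ with a block size that does not divide a layer's input dimension. It assumes plain zero-padding of the weight matrix, a basic k-means codebook, and 32-bit float storage; the thesis's actual method may differ. All function names (`pad_to_multiple`, `product_quantize`, `reconstruct`, `compression_ratio`) are hypothetical and chosen for illustration.

```python
import math
import numpy as np

def pad_to_multiple(weights, block_size):
    """Zero-pad the columns so the width is a multiple of block_size (assumed scheme)."""
    cols = weights.shape[1]
    pad = (-cols) % block_size
    return np.pad(weights, ((0, 0), (0, pad))), pad

def kmeans(points, k, iters=20, seed=0):
    """Basic Lloyd's k-means; returns (centroids, assignment of each point)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = points[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dists.argmin(axis=1)

def product_quantize(weights, block_size, num_centroids):
    """Split each padded row into blocks and replace every block by a codebook index."""
    padded, pad = pad_to_multiple(weights, block_size)
    blocks = padded.reshape(-1, block_size)
    codebook, codes = kmeans(blocks, num_centroids)
    return codebook, codes, pad

def reconstruct(codebook, codes, shape, pad):
    """Look up each block in the codebook and drop the padded columns."""
    rows, cols = shape
    return codebook[codes].reshape(rows, cols + pad)[:, :cols]

def compression_ratio(shape, block_size, num_centroids, float_bits=32):
    """Original size divided by (index bits + codebook bits), assuming float_bits storage."""
    rows, cols = shape
    num_blocks = rows * math.ceil(cols / block_size)
    index_bits = num_blocks * math.ceil(math.log2(num_centroids))
    codebook_bits = num_centroids * block_size * float_bits
    return rows * cols * float_bits / (index_bits + codebook_bits)

if __name__ == "__main__":
    W = np.random.default_rng(1).standard_normal((8, 10))
    codebook, codes, pad = product_quantize(W, block_size=4, num_centroids=4)
    W_hat = reconstruct(codebook, codes, W.shape, pad)
    print("padding columns:", pad)
    print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
    print("compression ratio:", compression_ratio(W.shape, 4, 4))
```

In this sketch the block size becomes a free parameter: a width of 10 can be quantized with blocks of 4 by padding two zero columns, at the cost of storing indices for a slightly larger matrix.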

Files in this item:

File	Size	Format
ntu-111-2.pdf (currently not authorized for public access)	569.31 kB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.