Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89053
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳銘憲 | zh_TW |
dc.contributor.advisor | Ming-Syan Chen | en |
dc.contributor.author | 葉彥廷 | zh_TW |
dc.contributor.author | Yan-Ting Ye | en |
dc.date.accessioned | 2023-08-16T16:56:06Z | - |
dc.date.available | 2023-11-09 | - |
dc.date.copyright | 2023-08-16 | - |
dc.date.issued | 2023 | - |
dc.date.submitted | 2023-08-07 | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89053 | - |
dc.description.abstract | 在這篇論文中,我們提出了一種針對神經網絡壓縮的泛化乘積量化算法。相較於純量量化,乘積量化具有潛力達到極高的壓縮率。然而乘積量化存在區塊大小的限制,對於在給定記憶體空間內找到合適的量化參數構成了挑戰。為了克服這個限制,我們提出可適性補值,使得乘積量化可以使用任意大小的區塊,讓模型壓縮過程更靈活。可適性補值方法與以往基於最佳化方法的乘積量化是獨立的作法。此外,我們採用了一種簡單的方法來確定模型中各層的合適區塊大小,以達成更好的量化結果。實驗結果表明,我們的方法可以泛化乘積量化而不會明顯影響準確率,並能與以往的做法結合達到有效的提升表現。 | zh_TW |
dc.description.abstract | In this thesis, we propose a generalized product quantization (PQ) algorithm for neural network compression. Compared to scalar quantization, PQ can potentially achieve extremely high compression rates. However, its block-size constraint makes it difficult to find an appropriate quantization configuration under a restricted storage budget. To overcome this limitation, we propose adaptive padding, an algorithm that enables PQ to use arbitrary block sizes and makes the compression rate of a quantized model more flexible. Adaptive padding is orthogonal to previous PQ approaches, which focus on better optimization. Moreover, we employ a simple approach to determine a suitable block size for each layer. Experimental results demonstrate that our method generalizes PQ without additional accuracy loss and effectively improves performance when combined with existing PQ methods. (See the illustrative PQ sketch below this table.) | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-16T16:56:06Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2023-08-16T16:56:06Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Acknowledgements ii
Abstract (Chinese) iii
Abstract iv
Contents v
List of Figures vii
List of Tables viii
1 Introduction 1
2 Related work 5
2.1 Scalar quantization 5
2.2 Product quantization 6
3 Preliminaries and Notations 8
3.1 Basic Product quantization 8
3.1.1 Clustering 9
3.1.2 Inference 9
3.1.3 Finetune 10
4 Methodology 11
4.1 Adaptive padding 11
4.2 Block Size Selection 15
5 Experiments 19
5.1 Experimental setting 19
5.2 Generalized PQ 20
5.3 Mixed block size PQ 21
5.4 Huffman encoding 23
5.5 Ablation study 24
6 Conclusion 25
Bibliography 26 | - |
dc.language.iso | en | - |
dc.title | 可適性乘積量化方法用於有效深度學習模型壓縮 | zh_TW |
dc.title | Adaptive Product Quantization for Effective Model Compression | en |
dc.type | Thesis | - |
dc.date.schoolyear | 111-2 | - |
dc.description.degree | Master's | - |
dc.contributor.oralexamcommittee | 楊奕軒;王鈺強;帥宏翰 | zh_TW |
dc.contributor.oralexamcommittee | Yi-Hsuan Yang;Yu-Chiang Wang;Hong-Han Shuai | en |
dc.subject.keyword | 模型壓縮,乘積量化, | zh_TW |
dc.subject.keyword | Model compression, Product quantization | en |
dc.relation.page | 30 | - |
dc.identifier.doi | 10.6342/NTU202301786 | - |
dc.rights.note | Not authorized | - |
dc.date.accepted | 2023-08-09 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Graduate Institute of Communication Engineering | - |
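The abstract above describes block-wise product quantization and padding only at a high level. Below is a minimal, hypothetical Python sketch of the general idea: a weight matrix is zero-padded so that an arbitrary block size fits, split into sub-vectors, and clustered with scikit-learn's KMeans. The helper names (`pq_compress`, `pq_decompress`), the plain zero-padding, and the 256-entry codebook are illustrative assumptions, not the thesis's adaptive padding algorithm, whose details are not given in this record.

```python
# Minimal sketch of product quantization (PQ) with padding, assuming a
# 2-D weight matrix and scikit-learn's KMeans. Zero-padding here is a
# stand-in for the thesis's adaptive padding; it merely lets block_size
# be arbitrary rather than a divisor of the layer width.
import numpy as np
from sklearn.cluster import KMeans

def pq_compress(weight: np.ndarray, block_size: int, n_centroids: int = 256):
    """Quantize `weight` by clustering its sub-vectors of length block_size."""
    rows, cols = weight.shape
    # Pad the columns with zeros so block_size divides the padded width.
    pad = (-cols) % block_size
    padded = np.pad(weight, ((0, 0), (0, pad)))
    # Reshape into (num_blocks, block_size) sub-vectors and cluster them.
    blocks = padded.reshape(-1, block_size)
    km = KMeans(n_clusters=n_centroids, n_init=4).fit(blocks)
    codebook = km.cluster_centers_          # (n_centroids, block_size)
    codes = km.labels_.astype(np.uint8)     # one index per block (needs n_centroids <= 256)
    return codebook, codes, (rows, cols, pad)

def pq_decompress(codebook, codes, shape):
    rows, cols, pad = shape
    blocks = codebook[codes]                # gather centroids by stored index
    padded = blocks.reshape(rows, cols + pad)
    return padded[:, :cols]                 # strip the padding columns

# Example: a 64x100 layer quantized with block size 8 (8 does not divide 100).
w = np.random.randn(64, 100).astype(np.float32)
cb, idx, shape = pq_compress(w, block_size=8)
w_hat = pq_decompress(cb, idx, shape)
print("reconstruction MSE:", float(np.mean((w - w_hat) ** 2)))
```

The compression in this sketch comes from storing only the small codebook plus one byte-sized index per block instead of the full-precision weights; larger block sizes raise the compression rate at the cost of reconstruction error, which is the trade-off the thesis's block-size selection addresses.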
Appears in Collections: | Graduate Institute of Communication Engineering
Files in This Item:
File | Size | Format |
---|---|---|
ntu-111-2.pdf (currently not authorized for public access) | 569.31 kB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.