NTU Theses and Dissertations Repository › College of Electrical Engineering and Computer Science › Department of Electrical Engineering
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84784
Full metadata record
DC Field / Value / Language
dc.contributor.advisor: 陳銘憲 (Ming-Syan Chen)
dc.contributor.author: Kuan-Wei Tsai [en]
dc.contributor.author: 蔡冠偉 [zh_TW]
dc.date.accessioned: 2023-03-19T22:25:29Z
dc.date.copyright: 2022-09-05
dc.date.issued: 2022
dc.date.submitted: 2022-08-31
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84784
dc.description.abstract: In recent years, convolutional neural networks have achieved strong results in many fields, including image classification, natural language processing, and object detection. However, as accuracy rises, models grow ever larger and more complex; to deploy them on resource-constrained platforms such as edge devices, we must compress these large models or accelerate their inference. Mixed-precision quantization is one effective compression method: it assigns each layer a bit-width according to its importance, where a larger bit-width preserves more information, so a suitable per-layer bit-width configuration can compress the model further while improving accuracy. However, earlier search-based mixed-precision methods spend a great deal of time on training, and most prior work either gives activations the same bit-width as the weights or quantizes activations at a single fixed precision, which leaves substantial room for optimization. To address these problems, we propose using the magnitude of each layer's gradients during training as the basis for its quantization sensitivity, and we design a training method around this sensitivity to train more efficiently. We also exploit the properties of the ReLU activation function to assign different bit-widths to the activations of different layers, achieving better performance. Moreover, all the information we use is simple to obtain, adding negligible overhead to the training process. Experiments on CIFAR-10, Tiny ImageNet, and multiple models show that our method indeed outperforms prior work. [zh_TW]
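The bit-width trade-off the abstract describes (a larger bit-width preserves more information) can be made concrete with a standard uniform quantize-dequantize step. This is a generic sketch of uniform quantization, not the thesis's actual implementation:

```python
import numpy as np

def fake_quantize(x, bits):
    """Uniform quantize-dequantize: map x onto 2**bits levels spanning
    its own range, then map back to floats. Larger bit-widths leave a
    smaller reconstruction error."""
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((x - lo) / scale)          # integer grid in [0, levels]
    return q * scale + lo                    # back to float range

x = np.linspace(-1.0, 1.0, 101)
err8 = np.abs(fake_quantize(x, 8) - x).max()  # 8-bit: fine grid
err2 = np.abs(fake_quantize(x, 2) - x).max()  # 2-bit: only 4 levels
```

With 8 bits the worst-case error is about half a step of a 255-level grid, while at 2 bits the 4-level grid leaves errors orders of magnitude larger, which is why sensitive layers deserve more bits.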
dc.description.abstract: Recently, convolutional neural networks have shown promising results in image classification, natural language processing, and object detection. With increasing accuracy, however, models have become larger and more complex; to deploy them on platforms with limited hardware resources, it is necessary to compress these huge models or speed up inference. Mixed-precision quantization is one of the most effective compression methods. Compared with fixed-precision quantization, it assigns each layer a bit-width according to that layer's importance, so an appropriate per-layer bit-width configuration can compress the model further while improving accuracy. Among previous mixed-precision quantization methods, search-based approaches take a great deal of time to train, and many prior works either give activations the same bit-width as the weights or quantize activations at a single fixed precision, even though activations also have an optimal bit-width that differs from the weights'. To solve these problems, we show that gradients can serve as sensitivity indicators for a layer, and we design a training method that derives each layer's weight bit-width from this sensitivity. By exploiting the characteristics of the ReLU activation function, the activations of different layers are assigned different bit-widths to achieve better performance. Additionally, the information we use is readily available, adding little extra burden to the training process. Our experiments on CIFAR-10 and Tiny ImageNet show that our method achieves higher performance than prior works. [en]
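The abstract names two signals: per-layer gradient magnitude as a sensitivity proxy for the weight bit-width, and the ReLU zero-activation ratio for the activation bit-width. The sketch below is an illustrative reconstruction of those two ideas only; the thresholds and the linear bits-vs-ratio mapping are invented for the example and are not the thesis's GBMTQ procedure:

```python
import numpy as np

def weight_bitwidth(grad, candidate_bits=(2, 4, 8), thresholds=(0.01, 0.1)):
    """Map a layer's mean absolute gradient (its 'sensitivity') to a
    bit-width: more sensitive layers keep more bits. The thresholds
    here are illustrative, not taken from the thesis."""
    sensitivity = np.mean(np.abs(grad))
    if sensitivity < thresholds[0]:
        return candidate_bits[0]
    if sensitivity < thresholds[1]:
        return candidate_bits[1]
    return candidate_bits[2]

def activation_bitwidth(pre_activation, max_bits=8, min_bits=2):
    """After ReLU, zeros carry no information; a layer whose outputs
    are mostly zero tolerates a smaller bit-width. Scale the bits by
    the fraction of non-zero activations (an assumed linear rule)."""
    relu_out = np.maximum(pre_activation, 0.0)
    nonzero_ratio = np.count_nonzero(relu_out) / relu_out.size
    bits = int(round(min_bits + (max_bits - min_bits) * nonzero_ratio))
    return max(min_bits, min(max_bits, bits))

# Example layer: tiny gradients and mostly-negative pre-activations,
# so both rules pick small bit-widths.
rng = np.random.default_rng(0)
grads = rng.normal(0.0, 0.005, size=(64, 64))
pre_act = rng.normal(-1.0, 0.5, size=(1, 64))
wb = weight_bitwidth(grads)
ab = activation_bitwidth(pre_act)
```

Both quantities are byproducts of ordinary training (gradients from backprop, activations from the forward pass), which matches the abstract's claim that the signals add negligible overhead.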
dc.description.provenance: Made available in DSpace on 2023-03-19T22:25:29Z (GMT). No. of bitstreams: 1; U0001-2207202214550400.pdf: 528575 bytes, checksum: 0bbdbb77a9cbad1be6e18461f0c11c2e (MD5); Previous issue date: 2022 [en]
dc.description.tableofcontents:
Acknowledgements (誌謝) i
Chinese Abstract (摘要) ii
Abstract iii
Contents v
List of Figures vii
List of Tables viii
1 Introduction 1
2 Background 4
  2.1 Quantization 4
  2.2 PTQ and QAT 6
  2.3 ReLU Activation Function 8
3 Related Work 9
  3.1 Fixed-Precision Quantization 9
  3.2 Mixed-Precision Quantization 9
4 Methodology 11
  4.1 Weight 11
  4.2 Activation 13
  4.3 GBMTQ Framework 14
    4.3.1 Overview 14
    4.3.2 Training 14
      4.3.2.1 Training Diagram 14
      4.3.2.2 Training Details 16
5 Experiments 18
  5.1 Datasets 18
  5.2 Overall Comparisons 18
    5.2.1 Comparison to Other Methods 18
    5.2.2 Implementations 19
    5.2.3 Results 19
  5.3 Ablation Study 20
    5.3.1 AlexNet on CIFAR-10 20
    5.3.2 ResNet20 on CIFAR-10 21
    5.3.3 ResNet50 on Tiny ImageNet 21
6 Conclusion 23
References 24
Appendix A — Supplementary Material 29
  A.1 Training Cost Comparison 29
dc.language.iso: en
dc.subject: 模型壓縮 (model compression) [zh_TW]
dc.subject: 加速推理 (inference acceleration) [zh_TW]
dc.subject: 混精度量化 (mixed-precision quantization) [zh_TW]
dc.subject: 卷積神經網絡 (convolutional neural network) [zh_TW]
dc.subject: 模型量化 (model quantization) [zh_TW]
dc.subject: inference acceleration [en]
dc.subject: convolutional neural network [en]
dc.subject: quantization of convolutional neural networks [en]
dc.subject: mixed-precision quantization [en]
dc.subject: model compression [en]
dc.title: 基於梯度與零激活值比率的混合精度量化 [zh_TW]
dc.title: Gradient-Based Mixed-Precision Quantization with Ratios of Zero Activations [en]
dc.type: Thesis
dc.date.schoolyear: 110-2
dc.description.degree: Master's (碩士)
dc.contributor.oralexamcommittee: 黃彥男 (Yen-Nun Huang), 賴冠廷 (Kuan-Ting Lai), 戴志華 (Chih-Hua Tai)
dc.subject.keyword: 卷積神經網絡, 模型量化, 混精度量化, 模型壓縮, 加速推理 [zh_TW]
dc.subject.keyword: convolutional neural network, quantization of convolutional neural networks, mixed-precision quantization, model compression, inference acceleration [en]
dc.relation.page: 29
dc.identifier.doi: 10.6342/NTU202201640
dc.rights.note: Access granted (restricted to on-campus use)
dc.date.accepted: 2022-09-01
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 電機工程學研究所 (Graduate Institute of Electrical Engineering) [zh_TW]
dc.date.embargo-lift: 2022-09-05
Appears in Collections: 電機工程學系 (Department of Electrical Engineering)

Files in This Item:
File / Size / Format
U0001-2207202214550400.pdf
Access restricted to NTU campus IPs (use the library's VPN service for off-campus access)
516.19 kB, Adobe PDF


All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.
