NTU Theses and Dissertations Repository › College of Electrical Engineering and Computer Science › Department of Electrical Engineering
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84784
Full metadata record
DC Field / Value / Language
dc.contributor.advisor: 陳銘憲 (Ming-Syan Chen)
dc.contributor.author: Kuan-Wei Tsai [en]
dc.contributor.author: 蔡冠偉 [zh_TW]
dc.date.accessioned: 2023-03-19T22:25:29Z
dc.date.copyright: 2022-09-05
dc.date.issued: 2022
dc.date.submitted: 2022-08-31
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84784
dc.description.abstract: In recent years, convolutional neural networks have achieved strong results in many fields, including image classification, natural language processing, and object detection. However, as accuracy rises, models grow ever larger and more complex; to deploy them on resource-constrained platforms such as edge devices, we must compress these large models or accelerate their inference. Mixed-precision quantization is one effective compression method: it assigns each layer a bit-width according to its importance, where a larger bit-width preserves more information, so a suitable per-layer bit-width configuration can compress the model further while improving accuracy. However, earlier search-based mixed-precision methods spend a great deal of time on training, and most prior work either gives activations the same bit-width as the weights or quantizes activations at a single fixed precision, which leaves substantial room for optimization. To address these problems, we propose using the magnitude of each layer's gradients during training as the basis for its quantization sensitivity, and we design a training method around this sensitivity to train more efficiently. We also exploit the properties of the ReLU activation function to assign different bit-widths to the activations of different layers, achieving better performance. Moreover, all the information we use is simple to obtain, adding negligible overhead to the training process. Experiments on CIFAR-10, Tiny ImageNet, and multiple models show that our method indeed outperforms prior work. [zh_TW]
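The bit-width trade-off the abstract describes (a larger bit-width preserves more information) can be made concrete with a standard uniform quantize-dequantize step. This is a generic sketch of uniform quantization, not the thesis's actual implementation:

```python
import numpy as np

def fake_quantize(x, bits):
    """Uniform quantize-dequantize: map x onto 2**bits levels spanning
    its own range, then map back to floats. Larger bit-widths leave a
    smaller reconstruction error."""
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((x - lo) / scale)          # integer grid in [0, levels]
    return q * scale + lo                    # back to float range

x = np.linspace(-1.0, 1.0, 101)
err8 = np.abs(fake_quantize(x, 8) - x).max()  # 8-bit: fine grid
err2 = np.abs(fake_quantize(x, 2) - x).max()  # 2-bit: only 4 levels
```

With 8 bits the worst-case error is about half a step of a 255-level grid, while at 2 bits the 4-level grid leaves errors orders of magnitude larger, which is why sensitive layers deserve more bits.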
dc.description.abstract: Recently, convolutional neural networks have shown promising results in image classification, natural language processing, and object detection. With increasing accuracy, however, models have become larger and more complex; to deploy them on platforms with limited hardware resources, it is necessary to compress these huge models or speed up inference. Mixed-precision quantization is one of the most effective compression methods. Compared with fixed-precision quantization, it assigns each layer a bit-width according to that layer's importance, so an appropriate per-layer bit-width configuration can compress the model further while improving accuracy. Among previous mixed-precision quantization methods, search-based approaches take a great deal of time to train, and many prior works either give activations the same bit-width as the weights or quantize activations at a single fixed precision, even though activations also have an optimal bit-width that differs from the weights'. To solve these problems, we show that gradients can serve as sensitivity indicators for a layer, and we design a training method that derives each layer's weight bit-width from this sensitivity. By exploiting the characteristics of the ReLU activation function, the activations of different layers are assigned different bit-widths to achieve better performance. Additionally, the information we use is readily available, adding little extra burden to the training process. Our experiments on CIFAR-10 and Tiny ImageNet show that our method achieves higher performance than prior works. [en]
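The abstract names two signals: per-layer gradient magnitude as a sensitivity proxy for the weight bit-width, and the ReLU zero-activation ratio for the activation bit-width. The sketch below is an illustrative reconstruction of those two ideas only; the thresholds and the linear bits-vs-ratio mapping are invented for the example and are not the thesis's GBMTQ procedure:

```python
import numpy as np

def weight_bitwidth(grad, candidate_bits=(2, 4, 8), thresholds=(0.01, 0.1)):
    """Map a layer's mean absolute gradient (its 'sensitivity') to a
    bit-width: more sensitive layers keep more bits. The thresholds
    here are illustrative, not taken from the thesis."""
    sensitivity = np.mean(np.abs(grad))
    if sensitivity < thresholds[0]:
        return candidate_bits[0]
    if sensitivity < thresholds[1]:
        return candidate_bits[1]
    return candidate_bits[2]

def activation_bitwidth(pre_activation, max_bits=8, min_bits=2):
    """After ReLU, zeros carry no information; a layer whose outputs
    are mostly zero tolerates a smaller bit-width. Scale the bits by
    the fraction of non-zero activations (an assumed linear rule)."""
    relu_out = np.maximum(pre_activation, 0.0)
    nonzero_ratio = np.count_nonzero(relu_out) / relu_out.size
    bits = int(round(min_bits + (max_bits - min_bits) * nonzero_ratio))
    return max(min_bits, min(max_bits, bits))

# Example layer: tiny gradients and mostly-negative pre-activations,
# so both rules pick small bit-widths.
rng = np.random.default_rng(0)
grads = rng.normal(0.0, 0.005, size=(64, 64))
pre_act = rng.normal(-1.0, 0.5, size=(1, 64))
wb = weight_bitwidth(grads)
ab = activation_bitwidth(pre_act)
```

Both quantities are byproducts of ordinary training (gradients from backprop, activations from the forward pass), which matches the abstract's claim that the signals add negligible overhead.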
dc.description.provenance: Made available in DSpace on 2023-03-19T22:25:29Z (GMT). No. of bitstreams: 1; U0001-2207202214550400.pdf: 528575 bytes, checksum: 0bbdbb77a9cbad1be6e18461f0c11c2e (MD5); Previous issue date: 2022 [en]
dc.description.tableofcontents:
Acknowledgements (誌謝) i
Chinese Abstract (摘要) ii
Abstract iii
Contents v
List of Figures vii
List of Tables viii
1 Introduction 1
2 Background 4
  2.1 Quantization 4
  2.2 PTQ and QAT 6
  2.3 ReLU Activation Function 8
3 Related Work 9
  3.1 Fixed-Precision Quantization 9
  3.2 Mixed-Precision Quantization 9
4 Methodology 11
  4.1 Weight 11
  4.2 Activation 13
  4.3 GBMTQ Framework 14
    4.3.1 Overview 14
    4.3.2 Training 14
      4.3.2.1 Training Diagram 14
      4.3.2.2 Training Details 16
5 Experiments 18
  5.1 Datasets 18
  5.2 Overall Comparisons 18
    5.2.1 Comparison to Other Methods 18
    5.2.2 Implementations 19
    5.2.3 Results 19
  5.3 Ablation Study 20
    5.3.1 AlexNet on CIFAR-10 20
    5.3.2 ResNet20 on CIFAR-10 21
    5.3.3 ResNet50 on Tiny ImageNet 21
6 Conclusion 23
References 24
Appendix A — Supplementary Material 29
  A.1 Training Cost Comparison 29
dc.language.iso: en
dc.subject: 模型壓縮 (model compression) [zh_TW]
dc.subject: 加速推理 (inference acceleration) [zh_TW]
dc.subject: 混精度量化 (mixed-precision quantization) [zh_TW]
dc.subject: 卷積神經網絡 (convolutional neural network) [zh_TW]
dc.subject: 模型量化 (model quantization) [zh_TW]
dc.subject: inference acceleration [en]
dc.subject: convolutional neural network [en]
dc.subject: quantization of convolutional neural networks [en]
dc.subject: mixed-precision quantization [en]
dc.subject: model compression [en]
dc.title: 基於梯度與零激活值比率的混合精度量化 [zh_TW]
dc.title: Gradient-Based Mixed-Precision Quantization with Ratios of Zero Activations [en]
dc.type: Thesis
dc.date.schoolyear: 110-2
dc.description.degree: Master's (碩士)
dc.contributor.oralexamcommittee: 黃彥男 (Yen-Nun Huang), 賴冠廷 (Kuan-Ting Lai), 戴志華 (Chih-Hua Tai)
dc.subject.keyword: 卷積神經網絡, 模型量化, 混精度量化, 模型壓縮, 加速推理 [zh_TW]
dc.subject.keyword: convolutional neural network, quantization of convolutional neural networks, mixed-precision quantization, model compression, inference acceleration [en]
dc.relation.page: 29
dc.identifier.doi: 10.6342/NTU202201640
dc.rights.note: Access granted (restricted to on-campus use)
dc.date.accepted: 2022-09-01
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 電機工程學研究所 (Graduate Institute of Electrical Engineering) [zh_TW]
dc.date.embargo-lift: 2022-09-05
Appears in Collections: 電機工程學系 (Department of Electrical Engineering)

Files in This Item:
File / Size / Format
U0001-2207202214550400.pdf
Access restricted to NTU campus IPs (use the library's VPN service for off-campus access)
516.19 kB, Adobe PDF


All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.
