Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94359
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 吳家麟 | zh_TW |
dc.contributor.advisor | Ja-Ling Wu | en |
dc.contributor.author | 許興宇 | zh_TW |
dc.contributor.author | Xingyu Xu | en |
dc.date.accessioned | 2024-08-15T17:02:11Z | - |
dc.date.available | 2024-08-16 | - |
dc.date.copyright | 2024-08-15 | - |
dc.date.issued | 2024 | - |
dc.date.submitted | 2024-07-23 | - |
dc.identifier.citation | William B. Pennebaker and Joan L. Mitchell. JPEG Still Image Data Compression Standard. Kluwer Academic Publishers, Norwell, MA, USA, 1992.
Majid Rabbani and Rajan Joshi. An overview of the JPEG 2000 still image compression standard. Signal Processing: Image Communication, 17(1):3–48, 2002.
Fabrice Bellard. BPG image format. https://bellard.org/bpg, 2014.
Benjamin Bross, Ye-Kui Wang, Yan Ye, Shan Liu, Jianle Chen, Gary J. Sullivan, and Jens-Rainer Ohm. Overview of the Versatile Video Coding (VVC) standard and its applications. IEEE Transactions on Circuits and Systems for Video Technology, 31(10):3736–3764, 2021.
Claude E. Shannon. Coding theorems for a discrete source with a fidelity criterion. Institute of Radio Engineers, International Convention Record, vol. 7, pages 325–350, 1959.
George Toderici, Sean M. O'Malley, Sung Jin Hwang, Damien Vincent, David Minnen, Shumeet Baluja, Michele Covell, and Rahul Sukthankar. Variable rate image compression with recurrent neural networks. arXiv preprint arXiv:1511.06085, 2015.
George Toderici, Damien Vincent, Nick Johnston, Sung Jin Hwang, David Minnen, Joel Shor, and Michele Covell. Full resolution image compression with recurrent neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5306–5314, 2017.
Yoojin Choi, Mostafa El-Khamy, and Jungwon Lee. Variable rate deep image compression with a conditional autoencoder. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3146–3154, 2019.
Zhenhong Sun, Zhiyu Tan, Xiuyu Sun, Fangyi Zhang, Yichen Qian, Dongyang Li, and Hao Li. Interpolation variable rate image compression. In Proceedings of the 29th ACM International Conference on Multimedia, pages 5574–5582, 2021.
Lucas Theis, Wenzhe Shi, Andrew Cunningham, and Ferenc Huszár. Lossy image compression with compressive autoencoders. In International Conference on Learning Representations, 2017.
Mohammad Akbari, Jie Liang, Jingning Han, and Chengjie Tu. Learned variable-rate image compression with residual divisive normalization. In 2020 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6, 2020.
Ze Cui, Jing Wang, Shang Gao, Tiansheng Guo, Yihui Feng, and Bin Bai. Asymmetric gained deep image compression with continuous rate adaptation. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10527–10536, 2021.
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
David Minnen, George Toderici, Michele Covell, Troy Chinen, Nick Johnston, Joel Shor, Sung Jin Hwang, Damien Vincent, and Saurabh Singh. Spatially adaptive image compression using a tiled deep network. In 2017 IEEE International Conference on Image Processing (ICIP), pages 2796–2800. IEEE, 2017.
Johannes Ballé, Valero Laparra, and Eero P. Simoncelli. End-to-end optimized image compression. arXiv preprint arXiv:1611.01704, 2016.
Mu Li, Wangmeng Zuo, Shuhang Gu, Debin Zhao, and David Zhang. Learning convolutional networks for content-weighted image compression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3214–3223, 2018.
Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston. Variational image compression with a scale hyperprior. arXiv preprint arXiv:1802.01436, 2018.
Fabian Mentzer, Eirikur Agustsson, Michael Tschannen, Radu Timofte, and Luc Van Gool. Conditional probability models for deep image compression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4394–4402, 2018.
David Minnen, Johannes Ballé, and George D. Toderici. Joint autoregressive and hierarchical priors for learned image compression. Advances in Neural Information Processing Systems, 31, 2018.
Jooyoung Lee, Seunghyun Cho, and Seung-Kwon Beack. Context-adaptive entropy model for end-to-end optimized image compression. arXiv preprint arXiv:1809.10452, 2018.
Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7794–7803, 2018.
Haojie Liu, Tong Chen, Peiyao Guo, Qiu Shen, Xun Cao, Yao Wang, and Zhan Ma. Non-local attention optimized deep image compression. arXiv preprint arXiv:1904.09757, 2019.
Zhengxue Cheng, Heming Sun, Masaru Takeuchi, and Jiro Katto. Learned image compression with discretized Gaussian mixture likelihoods and attention modules. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7939–7948, 2020.
Renjie Zou, Chunfeng Song, and Zhaoxiang Zhang. The devil is in the details: Window-based attention for image compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17492–17501, 2022.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I, pages 213–229. Springer-Verlag, Berlin, Heidelberg, 2020.
Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Chao Xu, and Wen Gao. Pre-trained image processing transformer. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12294–12305, 2021.
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021.
David Minnen and Saurabh Singh. Channel-wise autoregressive entropy models for learned image compression. In 2020 IEEE International Conference on Image Processing (ICIP), pages 3339–3343. IEEE, 2020.
Dailan He, Yaoyan Zheng, Baocheng Sun, Yan Wang, and Hongwei Qin. Checkerboard context model for efficient learned image compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14771–14780, 2021.
Myungseo Song, Jinyoung Choi, and Bohyung Han. Variable-rate deep image compression through spatially-adaptive feature transform. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2380–2389, 2021.
Shiyu Qin, Yimin Zhou, Jinpeng Wang, Bin Chen, Baoyi An, Tao Dai, and Shu-Tao Xia. Progressive learning with visual prompt tuning for variable-rate image compression. arXiv preprint arXiv:2311.13846, 2023.
Jean Bégaint, Fabien Racapé, Simon Feltman, and Akshay Pushparaja. CompressAI: a PyTorch library and evaluation platform for end-to-end compression research. arXiv preprint arXiv:2011.03029, 2020.
Ivan Krasin, Tom Duerig, Neil Alldrin, Vittorio Ferrari, Sami Abu-El-Haija, Alina Kuznetsova, Hassan Rom, Jasper Uijlings, Stefan Popov, Andreas Veit, et al. OpenImages: A public dataset for large-scale multi-label and multi-class image classification. Dataset available from https://github.com/openimages, 2017.
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
Rich Franzen. Kodak lossless true color image suite. http://r0k.us/graphics/kodak, 1999.
George Toderici, Lucas Theis, Nick Johnston, Eirikur Agustsson, Fabian Mentzer, Johannes Ballé, Wenzhe Shi, and Radu Timofte. CLIC 2020: Challenge on Learned Image Compression, 2020.
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016. | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94359 | - |
dc.description.abstract | 在數位時代,影像壓縮在眾多領域扮演著關鍵的角色,從網路媒體到串流服務,再到高解析度醫學影像和車聯網等,都有助於實現資料的有效儲存和傳輸。隨著對高品質圖像通訊的需求不斷增加,對先進壓縮技術的需求變得日益迫切。近年來,已提出了一些學習型影像壓縮方法,並在傳統標準下取得了令人信服的成果。然而,可變率影像壓縮仍然是一個待解決的問題。一些學習型影像壓縮方法利用多個網路實現不同壓縮率,而其他方法則使用單一模型,但這可能會增加計算複雜度並降低性能。在本文中,我們透過漸進式學習實現了一種基於參數高效微調方法 Low-Rank Adaptation(LoRA)的可變壓縮率影像壓縮方法。由於 LoRA 的參數化合併,我們所提出的方法在推論時並不會增加任何的計算複雜度,並且在完整的實驗中表明,與基於多個模型的方法相比,該方法在性能相近的狀況下,在參數量上減少百分之九十九,在數據集上減少百分之九十,在訓練步驟上減少百分之九十七。 | zh_TW |
dc.description.abstract | In the digital age, image compression is crucial for numerous applications, including web media, streaming services, high-resolution medical imaging, and connected vehicle networks, enabling efficient data storage and transmission. With the growing demand for high-quality image communication, advanced compression techniques become increasingly critical. Numerous learned image compression methods have recently been introduced, achieving impressive performance relative to traditional standards. However, variable rate image compression remains an open problem. Some learned image compression methods deploy multiple networks to attain different compression rates, whereas others use a single model, which often increases computational complexity and reduces performance. In this thesis, we propose a progressive learning approach for variable rate image compression based on the parameter-efficient fine-tuning method Low-Rank Adaptation (LoRA). Because the LoRA updates can be merged into the base weights by re-parameterization, our method introduces no additional computational complexity during inference. Comprehensive experiments demonstrate that, compared to methods using multiple models, our approach achieves similar performance while reducing parameter storage by 99%, training data by 90%, and training steps by 97%. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-15T17:02:11Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2024-08-15T17:02:11Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i
Acknowledgements iii
摘要 v
Abstract vii
Contents ix
List of Figures xi
List of Tables xiii
Chapter 1 Introduction 1
Chapter 2 Related Work 5
2.1 Deep Image Compression 5
2.2 Variable Rate Deep Image Compression 7
2.3 Low-Rank Adaptation 8
Chapter 3 Proposed Method 9
3.1 Overview 9
3.2 LoRA Rate-Adaptive Module 12
3.3 Window Attention CNN with LoRA 13
3.4 Swin-Transformer Block with LoRA 14
3.5 Layer Selection 15
3.6 Variable Rate-Distortion Loss 16
Chapter 4 Experiments 19
4.1 Experimental Setup 19
4.2 Efficiency Comparison 20
4.3 Rate-Distortion Performance 21
4.4 Classification Task Influence 22
4.5 Different Rank r for LoRA 24
Chapter 5 Conclusion 25
References 27 | - |
dc.language.iso | en | - |
dc.title | 透過漸進式學習實現基於LoRA的可變壓縮率深度影像壓縮 | zh_TW |
dc.title | Variable-Rate Deep Image Compression based on Low-Rank Adaptation by Progressive Learning | en |
dc.type | Thesis | - |
dc.date.schoolyear | 112-2 | - |
dc.description.degree | Master's | - |
dc.contributor.oralexamcommittee | 陳文進;許永真;胡敏君;陳駿丞 | zh_TW |
dc.contributor.oralexamcommittee | Wen-Chin Chen;Yung-Jen Hsu;Min-Chun Hu;Jun-Cheng Chen | en |
dc.subject.keyword | 深度影像壓縮,可變率影像壓縮,低秩適應,漸進式學習 | zh_TW |
dc.subject.keyword | Deep image compression, Variable rate image compression, Low rank adaptation, Progressive learning | en |
dc.relation.page | 32 | - |
dc.identifier.doi | 10.6342/NTU202402012 | - |
dc.rights.note | Authorization granted (campus-only access) | - |
dc.date.accepted | 2024-07-23 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Graduate Institute of Networking and Multimedia | - |
dc.date.embargo-lift | 2029-07-21 | - |
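The abstract above states that, thanks to LoRA's re-parameterized merging, the proposed variable-rate method adds no computational cost at inference. A minimal numpy sketch of that merging step (illustrative shapes and variable names, not the thesis implementation) shows why: the trained low-rank update B·A can be folded into the frozen weight once, so the deployed layer is a single matrix multiply, identical in cost to the original.

```python
import numpy as np

# Sketch of LoRA re-parameterized merging (illustrative, not the thesis code).
# A rank-r update B @ A is trained alongside a frozen weight W; before
# deployment it is folded into W, leaving a single ordinary linear layer.
rng = np.random.default_rng(0)
d_out, d_in, r = 8, 16, 2                 # rank r << min(d_out, d_in)

W = rng.standard_normal((d_out, d_in))    # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01 # trainable low-rank factors
B = rng.standard_normal((d_out, r)) * 0.01

x = rng.standard_normal(d_in)             # an input vector

# Training-time forward pass: base path plus the low-rank side path.
y_lora = W @ x + B @ (A @ x)

# Inference-time merge: fold the update into one matrix, once.
W_merged = W + B @ A
y_merged = W_merged @ x

# The merged layer reproduces the LoRA output with no extra cost.
assert np.allclose(y_lora, y_merged)
```

The same fold applies per target bitrate: keeping one small (A, B) pair per rate and merging the chosen pair on demand is what lets a single base model serve multiple rates without inference overhead.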
Appears in Collections: | Graduate Institute of Networking and Multimedia
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-112-2.pdf (currently not authorized for public access) | 1.16 MB | Adobe PDF | View/Open |
Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.