Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95676

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 郭斯彥 | zh_TW |
| dc.contributor.advisor | Sy-Yen Kuo | en |
| dc.contributor.author | 蔣沅均 | zh_TW |
| dc.contributor.author | Yuan-Chun Chiang | en |
| dc.date.accessioned | 2024-09-15T16:44:57Z | - |
| dc.date.available | 2024-09-16 | - |
| dc.date.copyright | 2024-09-14 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-08-12 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95676 | - |
| dc.description.abstract | 近年來,生成式擴散模型因其優越的分布擬合能力,能夠學習自然影像的分布並生成真實且富有細節的高品質影像,因此在電腦視覺領域引起了廣泛關注。許多影像增強模型也成功利用生成式模型將低品質影像還原為更清晰真實的高品質影像。然而,由於深度學習模型在眾多電腦視覺任務中取得了巨大成功並逐漸應用於實際場景,影像增強模型不僅需提升視覺品質,還需關注下游任務在面對受損影像時的魯棒性。現有影像增強模型通常僅關注人類感知品質或機器識別品質中的單一方面,需要在兩者間進行取捨,難以在整合兩者能力的單一模型中同時達到最佳效果。為解決此問題,本論文提出了一種通用的多目標影像增強模型,該模型利用預訓練生成式擴散模型的先驗知識與多任務學習,同時提升人類感知品質與機器識別品質。首先利用我們提出的適應性特徵還原模組,將不同劣化類型的低品質影像特徵還原為高品質特徵;再進一步通過在大規模高品質影像資料集上完成預訓練的生成式擴散模型,還原出富含高品質自然影像語義資訊的特徵,並作為影像增強的先驗;最後使用多專家模組針對人類感知品質或機器識別品質分別訓練不同的專家,使模型能夠有效利用這些先驗知識以同時生成利於人類感知與機器識別的高品質影像。我們提出的通用架構能夠處理多種劣化類型的低品質影像,並針對不同任務需求生成具有不同特性的高品質影像。實驗結果表明,與現有影像還原與增強模型相比,我們的方法在面向人類感知的影像還原、面向分類的影像增強以及面向語義分割的影像增強三項任務中,能顯著改善人類感知品質與機器識別品質,為實際影像增強任務提供了更通用的解決方案。 | zh_TW |
| dc.description.abstract | In recent years, generative diffusion models have garnered widespread attention in the field of computer vision due to their superior distribution fitting capabilities, allowing them to learn the distribution of natural images and generate high-quality images that are realistic and rich in detail. Many image enhancement models have successfully utilized generative diffusion models to restore low-quality images to clearer, more realistic high-quality versions. However, as deep learning models have achieved significant success across various computer vision tasks and have gradually been applied to practical scenarios, image enhancement models need not only to improve visual quality but also to enhance the robustness of downstream tasks when dealing with degraded images. Existing image enhancement models typically focus on either human perceptual quality or machine recognition quality, necessitating a trade-off between the two, and it is challenging to achieve optimal performance in a unified framework that integrates both capabilities. To tackle this challenge, we introduce a universal multi-objective image enhancement model that leverages the prior knowledge of pre-trained generative diffusion models to simultaneously improve human perceptual quality and machine recognition quality. Firstly, our proposed adaptive feature restoration module restores the features of low-quality images degraded by various types of corruption to high-quality features. Furthermore, by utilizing the generative diffusion model pre-trained on large-scale, high-quality image datasets, we can restore features rich in high-quality natural image semantics, serving as priors for task-specific image enhancement. Finally, we employ a multi-expert module to train different experts for human perceptual quality or machine recognition quality, enabling the model to effectively utilize these priors to generate high-quality images beneficial for both human perception and machine recognition. Our proposed universal framework can handle various types of degraded low-quality images and generate high-quality images with different characteristics tailored to different task requirements. Experimental results demonstrate that, compared to existing image restoration and enhancement models, our method significantly improves both human perceptual quality and machine recognition quality across three tasks: image restoration for human perception, image enhancement for classification, and image enhancement for semantic segmentation. Our approach provides a more general solution for practical image enhancement tasks. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-09-15T16:44:57Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-09-15T16:44:57Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i
Acknowledgements ii
摘要 iii
Abstract v
Contents vii
List of Figures ix
List of Tables x
Chapter 1 Introduction 1
Chapter 2 Related Work 4
2.1 Image Restoration 4
2.2 Visual Recognition in the Wild 6
2.3 Generative Model Prior 7
Chapter 3 Method 10
3.1 Latent Restoration 11
3.2 Task-Specific Enhancement 14
Chapter 4 Experiments 17
4.1 Implementation 17
4.1.1 Datasets 17
4.1.2 Training Details 18
4.2 Quantitative Comparisons 19
4.2.1 Image Restoration 20
4.2.2 Classification 20
4.2.3 Semantic Segmentation 21
4.3 Qualitative Comparisons 22
4.4 Ablation Study 23
Chapter 5 Conclusion 26
References 28 | - |
| dc.language.iso | en | - |
| dc.subject | 機器視覺 | zh_TW |
| dc.subject | 擴散式模型 | zh_TW |
| dc.subject | 影像增強 | zh_TW |
| dc.subject | Diffusion Model | en |
| dc.subject | Image Enhancement | en |
| dc.subject | Machine Vision | en |
| dc.title | 基於擴散模型的感知及機器視覺通用影像增強 | zh_TW |
| dc.title | Universal Image Enhancement for Perception and Machine Vision Based on Diffusion Model | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 雷欽隆;顏嗣鈞;游家牧;陳英一 | zh_TW |
| dc.contributor.oralexamcommittee | Chin-Laung Lei;Hsu-chun Yen;Chia-Mu Yu;Ying-i Chen | en |
| dc.subject.keyword | 影像增強,機器視覺,擴散式模型 | zh_TW |
| dc.subject.keyword | Image Enhancement, Machine Vision, Diffusion Model | en |
| dc.relation.page | 39 | - |
| dc.identifier.doi | 10.6342/NTU202404102 | - |
| dc.rights.note | 未授權 | - |
| dc.date.accepted | 2024-08-13 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 電機工程學系 | - |
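For orientation, the abstract above describes a three-stage pipeline: an adaptive feature restoration module that maps degraded latent features toward clean ones, a generative diffusion model pre-trained on large-scale high-quality image data that supplies natural-image priors, and a multi-expert module with one expert per target objective. Below is a minimal sketch of how such a pipeline could be wired; every module name, shape, and interface is a hypothetical illustration (the thesis PDF is under restricted access), not the authors' implementation.

```python
import torch
import torch.nn as nn


class AdaptiveFeatureRestoration(nn.Module):
    """Hypothetical stand-in: residually refines degraded latent features
    toward high-quality features, shared across degradation types."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
        )

    def forward(self, lq_feat: torch.Tensor) -> torch.Tensor:
        # Residual restoration: predict a correction on top of the input.
        return lq_feat + self.refine(lq_feat)


class MultiExpertDecoder(nn.Module):
    """Hypothetical stand-in: one lightweight decoder expert per objective."""

    def __init__(self, dim: int = 256,
                 tasks=("perception", "classification", "segmentation")):
        super().__init__()
        self.experts = nn.ModuleDict(
            {t: nn.Conv2d(dim, 3, kernel_size=3, padding=1) for t in tasks}
        )

    def forward(self, feat: torch.Tensor, task: str) -> torch.Tensor:
        # Decode with the expert matched to the requested objective.
        return self.experts[task](feat)


def enhance(lq_latent: torch.Tensor, restorer: nn.Module,
            diffusion_prior: nn.Module, decoder: MultiExpertDecoder,
            task: str = "perception") -> torch.Tensor:
    """Restore latents, enrich them with a frozen diffusion prior,
    then decode with the expert for the requested objective."""
    restored = restorer(lq_latent)
    with torch.no_grad():  # the pre-trained diffusion prior stays frozen
        prior_feat = diffusion_prior(restored)
    return decoder(prior_feat, task)
```

In this reading, a caller would select the expert to match the downstream consumer, e.g. `enhance(z, restorer, prior, decoder, task="segmentation")` before feeding the output to a segmentation network.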
Appears in Collections: 電機工程學系
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-112-2.pdf (Restricted Access) | 22.11 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
