Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95676

| Title: | Universal Image Enhancement for Perception and Machine Vision Based on Diffusion Model |
| Author: | Yuan-Chun Chiang (蔣沅均) |
| Advisor: | Sy-Yen Kuo (郭斯彥) |
| Keywords: | Image Enhancement, Machine Vision, Diffusion Model |
| Year of Publication: | 2024 |
| Degree: | Master's |
| Abstract: | In recent years, generative diffusion models have garnered widespread attention in the field of computer vision due to their superior distribution-fitting capabilities, which allow them to learn the distribution of natural images and generate realistic, richly detailed high-quality images. Many image enhancement models have successfully used generative diffusion models to restore low-quality images to clearer, more realistic high-quality versions. However, as deep learning models have achieved significant success across computer vision tasks and are increasingly deployed in practical scenarios, image enhancement models must not only improve visual quality but also strengthen the robustness of downstream tasks when dealing with degraded images. Existing image enhancement models typically focus on either human perceptual quality or machine recognition quality, forcing a trade-off between the two, and achieving optimal performance for both within a single unified framework remains challenging. To tackle this challenge, we introduce a universal multi-objective image enhancement model that leverages the prior knowledge of pre-trained generative diffusion models and multi-task learning to simultaneously improve human perceptual quality and machine recognition quality. First, our proposed adaptive feature restoration module restores the features of low-quality images degraded by various types of corruption to high-quality features. Next, a generative diffusion model pre-trained on large-scale, high-quality image datasets recovers features rich in natural-image semantics, which serve as priors for task-specific image enhancement. Finally, a multi-expert module trains separate experts for human perceptual quality and machine recognition quality, enabling the model to exploit these priors and generate high-quality images beneficial for both human perception and machine recognition. Our universal framework handles low-quality images with various degradation types and generates high-quality images with characteristics tailored to different task requirements. Experimental results show that, compared to existing image restoration and enhancement models, our method significantly improves both human perceptual quality and machine recognition quality across three tasks: image restoration for human perception, image enhancement for classification, and image enhancement for semantic segmentation, providing a more general solution for practical image enhancement. (A minimal illustrative sketch of this pipeline follows the metadata fields below.) |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95676 |
| DOI: | 10.6342/NTU202404102 |
| Full-Text License: | Not authorized for public access |
| Appears in Collections: | Department of Electrical Engineering |
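
The following is a minimal, hypothetical sketch of the three-stage pipeline described in the abstract: a degraded-image encoder, the adaptive feature restoration module, a frozen pre-trained diffusion prior, and per-task expert decoders. It assumes a PyTorch-style interface; all class names (`AdaptiveFeatureRestoration`, `ExpertDecoder`, `UniversalEnhancer`) and layer choices are illustrative assumptions, not the thesis implementation.

```python
import torch
import torch.nn as nn

class AdaptiveFeatureRestoration(nn.Module):
    """Restores features of variously degraded inputs toward high-quality feature statistics."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return feat + self.body(feat)  # residual refinement of degraded features

class ExpertDecoder(nn.Module):
    """One expert head; separate experts serve human perception vs. machine recognition."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.fuse = nn.Conv2d(2 * ch, ch, 1)          # fuse restored features with the diffusion prior
        self.to_img = nn.Conv2d(ch, 3, 3, padding=1)  # project back to an RGB image

    def forward(self, feat: torch.Tensor, prior_feat: torch.Tensor) -> torch.Tensor:
        return self.to_img(self.fuse(torch.cat([feat, prior_feat], dim=1)))

class UniversalEnhancer(nn.Module):
    """Encoder -> adaptive feature restoration -> frozen diffusion prior -> per-task expert decoders."""
    def __init__(self, diffusion_prior: nn.Module, ch: int = 64):
        super().__init__()
        self.encoder = nn.Conv2d(3, ch, 3, padding=1)  # stand-in for the degraded-image encoder
        self.restore = AdaptiveFeatureRestoration(ch)
        self.prior = diffusion_prior                   # pre-trained generative diffusion model (frozen)
        self.experts = nn.ModuleDict({
            "perception": ExpertDecoder(ch),           # expert for human perceptual quality
            "recognition": ExpertDecoder(ch),          # expert for classification / segmentation
        })

    def forward(self, lq_image: torch.Tensor, task: str = "perception") -> torch.Tensor:
        feat = self.restore(self.encoder(lq_image))    # restore low-quality features
        prior_feat = self.prior(feat)                  # semantic prior from the natural-image distribution
        return self.experts[task](feat, prior_feat)    # task-specific high-quality output

# Example usage with an identity stand-in for the diffusion prior:
# enhancer = UniversalEnhancer(diffusion_prior=nn.Identity())
# hq = enhancer(torch.randn(1, 3, 256, 256), task="recognition")
```

In the thesis itself, the diffusion prior would be a pre-trained generative diffusion model rather than an identity stand-in, and each expert would be trained with its own objective (perceptual quality vs. downstream-task performance); the sketch above only outlines how the pieces connect.
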
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-112-2.pdf (restricted; not authorized for public access) | 22.11 MB | Adobe PDF |
Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
