Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96138

| Title: | 基於擴散模型引導之特徵增強於領域泛化語意分割 Diffusion-Guided Feature Enhancement for Domain Generalized Semantic Segmentation |
| Author: | 張華恩 Hua-En Chang |
| Advisor: | 郭斯彥 Sy-Yen Kuo |
| Keywords: | Semantic Segmentation, Domain Generalization, Diffusion Model |
| Publication Year: | 2024 |
| Degree: | Master's |
| Abstract: | Deep neural networks have advanced remarkably in recent years, including on the computer vision task of semantic segmentation. However, in domain generalization scenarios, where a model trained on a specific source domain is deployed in unseen domains, performance degrades substantially. Recent research has focused primarily on improving model robustness by learning domain-invariant features and by applying data augmentation. Nevertheless, the potential of diffusion models to provide valuable prior knowledge for domain generalized semantic segmentation (DGSS) remains largely unexplored. In this thesis, we present a framework that addresses the DGSS problem by enhancing extracted features with an estimated prior, and that is compatible with any detection network architecture. Our framework comprises three core modules: the Prior Extraction Network (PEN), the Prior Fusion Network (PFN), and a diffusion model. Specifically, the framework adopts a two-stage training approach. In the first stage, PEN combines augmented image features with their corresponding source domain counterparts to derive a prior vector, while PFN performs feature fusion based on this prior vector. In the second stage, we train the diffusion model to estimate the prior obtained in the first stage solely from the augmented features. We verify the effectiveness of our approach in improving the robustness of existing semantic segmentation networks through experiments on widely used urban scene datasets (i.e., Cityscapes, Mapillary, BDDS, GTAV, and SYNTHIA). |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96138 |
| DOI: | 10.6342/NTU202403250 |
| Full-Text Authorization: | Not authorized |
| Appears in Collections: | Department of Electrical Engineering |
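
The two-stage procedure described in the abstract can be illustrated with a toy data-flow sketch. This is not the thesis implementation: the layer shapes, the linear stand-ins for PEN, PFN, and the denoiser, the scale-and-shift fusion, the noise schedule, and the mean-squared-error objective are all assumptions chosen for brevity, and every function name below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, Wd, D, T = 8, 16, 16, 4, 10  # illustrative sizes, not taken from the thesis

# Randomly initialised stand-in weights; a real system would learn these.
W_pen = rng.normal(scale=0.1, size=(D, 2 * C))
W_scale = rng.normal(scale=0.1, size=(C, D))
W_shift = rng.normal(scale=0.1, size=(C, D))
W_den = rng.normal(scale=0.1, size=(D, D + C + 1))

def pen(feat_src, feat_aug):
    """Hypothetical PEN: pool both feature maps and project them to a prior vector."""
    pooled = np.concatenate([feat_src.mean(axis=(1, 2)), feat_aug.mean(axis=(1, 2))])
    return np.tanh(W_pen @ pooled)

def pfn(feat, prior):
    """Hypothetical PFN: channel-wise scale-and-shift of the features by the prior."""
    scale = (W_scale @ prior)[:, None, None]
    shift = (W_shift @ prior)[:, None, None]
    return feat * (1.0 + scale) + shift

# Standard DDPM-style linear noise schedule (an assumption; the abstract gives none).
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def q_sample(prior, t, noise):
    """Forward diffusion: noise the prior vector to step t."""
    return np.sqrt(alpha_bar[t]) * prior + np.sqrt(1.0 - alpha_bar[t]) * noise

def denoiser(noisy_prior, feat_aug, t):
    """Hypothetical denoiser: sees only augmented features, never source features."""
    cond = feat_aug.mean(axis=(1, 2))
    x = np.concatenate([noisy_prior, cond, [t / T]])
    return W_den @ x

# Toy features: the "augmented" map is a perturbed copy of the source map.
feat_src = rng.normal(size=(C, H, Wd))
feat_aug = feat_src + 0.5 * rng.normal(size=(C, H, Wd))

# Stage 1: PEN derives the prior from both feature maps; PFN fuses it back in.
prior = pen(feat_src, feat_aug)
enhanced = pfn(feat_aug, prior)

# Stage 2: the diffusion model learns to recover the stage-1 prior
# from a noised copy, conditioned on the augmented features alone.
t = T - 1
noisy = q_sample(prior, t, rng.normal(size=D))
prior_hat = denoiser(noisy, feat_aug, t)
stage2_loss = float(np.mean((prior_hat - prior) ** 2))
```

The point of the sketch is the dependency structure: `pen` needs both source and augmented features, whereas `denoiser` is conditioned on augmented features only, which matches the abstract's statement that the diffusion model estimates the prior solely from augmented features.
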
Files in this item:
| File | Size | Format | |
|---|---|---|---|
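| ntu-113-1.pdf Not authorized for public access | 4.52 MB | Adobe PDF |
Unless their copyright terms are specifically stated, all items in the system are protected by copyright, with all rights reserved.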
