Enhancing Domain Adaptive Object Detection with Diffusion-based Image Editing Model

Chen-Che Huang (黃振哲)

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97135

Title:	Enhancing Domain Adaptive Object Detection with Diffusion-based Image Editing Model
Author:	Chen-Che Huang (黃振哲)
Advisor:	Jun-Cheng Chen (陳駿丞)
Co-advisor:	Chu-Song Chen (陳祝嵩)
Keywords:	domain adaptive object detection, diffusion-based image editing model, learnable prompt, perceptual loss
Publication year:	2025
Degree:	Master's
Abstract:	Domain adaptive object detection seeks to minimize the performance degradation that occurs when a detector trained on a labeled source domain is applied to an unlabeled target domain. Recent methods employ a teacher-student framework to address the domain gap. To mitigate the low-quality pseudo-labels that the teacher model produces because of the domain discrepancy, an off-the-shelf diffusion-based image editing model can be used to edit source images into target-like images with manually defined instructions. These target-like images can then be used for supervised training. However, their style may not perfectly match that of the target images, leading to suboptimal improvement from supervised training. In this work, we combine a learnable prompt with perceptual similarity to better capture the target domain style. Furthermore, we reduce the false positive ratio of pseudo-labels by augmenting the target images with objects cropped from the target-like images. Experiments demonstrate that the proposed method significantly improves upon the baseline model and outperforms existing methods. For example, we achieve 53.2% mAP on Foggy Cityscapes in the Cityscapes to Foggy Cityscapes setting, surpassing the 52.5% mAP attained by the previous state-of-the-art approach.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97135
DOI:	10.6342/NTU202500631
Full-text authorization:	Authorized (open to the public worldwide)
Full-text release date:	2025-02-28
Appears in collections:	Data Science Degree Program
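
The object-cropping augmentation mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `paste_objects`, the nested-list image representation, and the choice to paste each object at its original coordinates are all assumptions made for illustration. The sketch also assumes the boxes come from the source annotations, which carry over because target-like images are edited source images.

```python
from typing import List, Tuple

Image = List[List[int]]          # H x W grid of pixel values (grayscale for brevity)
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), with x2 and y2 exclusive


def paste_objects(target_like: Image, boxes: List[Box], target: Image) -> Image:
    """Copy each boxed region of `target_like` onto a copy of `target`.

    Hypothetical helper: pasting at the same coordinates is an assumption,
    not a detail confirmed by the record.
    """
    # Only the area covered by both images can be copied.
    height = min(len(target), len(target_like))
    width = min(len(target[0]), len(target_like[0]))
    augmented = [row[:] for row in target]  # leave the input image untouched
    for x1, y1, x2, y2 in boxes:
        # Clip the box to the shared image bounds.
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(width, x2), min(height, y2)
        if x1 >= x2 or y1 >= y2:
            continue  # box lies entirely outside the shared area
        for y in range(y1, y2):
            augmented[y][x1:x2] = target_like[y][x1:x2]
    return augmented


# Toy example: a 4 x 4 "target-like" image of 9s and a "target" image of 0s.
target_like = [[9] * 4 for _ in range(4)]
target = [[0] * 4 for _ in range(4)]
augmented = paste_objects(target_like, [(1, 1, 3, 3)], target)
```

In this toy example the central 2 x 2 region of `augmented` holds pixels from the target-like image, and that region has a known box label. Presumably this is what lets the augmented target image provide reliable supervision alongside the pseudo-labels.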

Files in this item:

File	Size	Format
ntu-113-1.pdf	23.18 MB	Adobe PDF


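Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.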
