Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97135
Title: 利用基於擴散的圖像編輯模型增強物件偵測的領域自適應能力
Enhancing Domain Adaptive Object Detection with Diffusion-based Image Editing Model
Authors: 黃振哲
Chen-Che Huang
Advisor: 陳駿丞
Jun-Cheng Chen
Co-Advisor: 陳祝嵩
Chu-Song Chen
Keyword: domain adaptive object detection, diffusion-based image editing model, learnable prompt, perceptual loss
Publication Year: 2025
Degree: Master's
Abstract: Domain adaptive object detection aims to reduce the performance degradation that occurs when a detector trained on a labeled source domain is applied to an unlabeled target domain. Recent methods adopt a teacher-student framework to bridge the domain gap, but the domain discrepancy causes the teacher model to produce low-quality pseudo-labels. To mitigate this, an off-the-shelf diffusion-based image editing model can edit source images into target-like images using manually defined instructions, and these target-like images can then be used for supervised training. However, the style of the target-like images may not fully match that of the target images, which limits the gain from supervised training. In this work, we combine a learnable prompt with perceptual similarity to better capture the target domain style. Furthermore, we reduce the false positive rate of pseudo-labels by augmenting the target images with objects cropped from the target-like images. Experiments show that the proposed method clearly improves upon the baseline model and outperforms existing methods. For example, in the Cityscapes to Foggy Cityscapes setting, we achieve 53.2% mAP on Foggy Cityscapes, surpassing the 52.5% mAP of the previous state-of-the-art approach.
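The object-level augmentation mentioned in the abstract (cropping objects from target-like images and using them to augment target images) might look roughly like the sketch below. This is only an illustration under stated assumptions, not the thesis implementation: the abstract does not say where objects are pasted or how images are represented, so the sketch assumes equally sized grayscale images stored as nested lists and pastes each object at its original coordinates. The name `paste_objects` and the box format are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), with x2 and y2 exclusive


def paste_objects(
    target: Image, target_like: Image, boxes: List[Box]
) -> Tuple[Image, List[Box]]:
    """Copy each labeled box region from `target_like` into a copy of `target`.

    Assumption (not from the abstract): both images have the same size and
    each object is pasted at its original position, so the source-domain
    box can be reused as a label on the augmented target image.
    """
    height, width = len(target), len(target[0])
    if len(target_like) != height or len(target_like[0]) != width:
        raise ValueError("images must have the same size")

    augmented = [row[:] for row in target]  # leave the input image untouched
    kept: List[Box] = []
    for x1, y1, x2, y2 in boxes:
        # Clip the box to the image bounds and skip empty boxes.
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(width, x2), min(height, y2)
        if x1 >= x2 or y1 >= y2:
            continue
        for y in range(y1, y2):
            augmented[y][x1:x2] = target_like[y][x1:x2]
        kept.append((x1, y1, x2, y2))
    return augmented, kept


# Toy usage: a 3x4 "target" image of zeros and a "target-like" image of nines.
target = [[0] * 4 for _ in range(3)]
target_like = [[9] * 4 for _ in range(3)]
augmented, labels = paste_objects(target, target_like, [(1, 0, 3, 2)])
```

The pasted regions carry known source-domain labels, which is one plausible reading of how such augmentation could help suppress false positives in pseudo-labels; the actual procedure is described in the thesis full text.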
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97135
DOI: 10.6342/NTU202500631
Fulltext Rights: Authorization granted (publicly available worldwide)
Embargo Lift Date: 2025-02-28
Appears in Collections: 資料科學學位學程 (Data Science Degree Program)

Files in This Item:
File            Size       Format
ntu-113-1.pdf   23.18 MB   Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
