Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97135

Full metadata record

DC Field | Value | Language
dc.contributor.advisor | 陳駿丞 | zh_TW
dc.contributor.advisor | Jun-Cheng Chen | en
dc.contributor.author | 黃振哲 | zh_TW
dc.contributor.author | Chen-Che Huang | en
dc.date.accessioned | 2025-02-27T16:21:23Z | -
dc.date.available | 2025-02-28 | -
dc.date.copyright | 2025-02-27 | -
dc.date.issued | 2025 | -
dc.date.submitted | 2025-02-13 | -
dc.identifier.citation:
[1] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-net: Convolutional networks for biomedical image segmentation,” MICCAI, 2015.
[2] Karen Simonyan and Andrew Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[3] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” CVPR, 2018.
[4] Yu-Jhe Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen Wu, Zijian He, Kris Kitani, and Peter Vajda, “Cross-domain adaptive teacher for object detection,” CVPR, 2022.
[5] Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, and Yu-Xiong Wang, “Contrastive mean teacher for domain adaptive object detectors,” CVPR, 2023.
[6] Mikhail Kennerley, Jian-Gang Wang, Bharadwaj Veeravalli, and Robby T. Tan, “CAT: Exploiting inter-class dynamics for domain adaptive object detection,” CVPR, 2024.
[7] Jinhong Deng, Wen Li, Yuhua Chen, and Lixin Duan, “Unbiased mean teacher for cross-domain object detection,” CVPR, 2021.
[8] Antti Tarvainen and Harri Valpola, “Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results,” NIPS, 2017.
[9] Meilin Chen, Weijie Chen, Shicai Yang, Jie Song, Xinhao Wang, Lei Zhang, Yunfeng Yan, Donglian Qi, Yueting Zhuang, Di Xie, and Shiliang Pu, “Learning domain adaptive object detection with probabilistic teacher,” ICML, 2022.
[10] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” ICCV, 2017.
[11] Han-Kai Hsu, Chun-Han Yao, Yi-Hsuan Tsai, Wei-Chih Hung, Hung-Yu Tseng, Maneesh Singh, and Ming-Hsuan Yang, “Progressive domain adaptation for object detection,” WACV, 2020.
[12] Ziqiang Zheng, Yang Wu, Xinran Han, and Jianbo Shi, “Forkgan: Seeing into the rainy night,” ECCV, 2020.
[13] Tim Brooks, Aleksander Holynski, and Alexei A Efros, “Instructpix2pix: Learning to follow image editing instructions,” CVPR, 2023.
[14] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele, “The cityscapes dataset for semantic urban scene understanding,” CVPR, 2016.
[15] Christos Sakaridis, Dengxin Dai, and Luc Van Gool, “Semantic foggy scene understanding with synthetic data,” IJCV, 2018.
[16] Chaoqi Chen, Zebiao Zheng, Xinghao Ding, Yue Huang, and Qi Dou, “Harmonizing transferability and discriminability for adapting object detectors,” CVPR, 2020.
[17] Naoto Inoue, Ryosuke Furuta, Toshihiko Yamasaki, and Kiyoharu Aizawa, “Cross-domain weakly-supervised object detection through progressive domain adaptation,” CVPR, 2018.
[18] Yuhua Chen, Wen Li, Christos Sakaridis, Dengxin Dai, and Luc Van Gool, “Domain adaptive faster r-cnn for object detection in the wild,” CVPR, 2018.
[19] Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, and Kate Saenko, “Strong-weak distribution alignment for adaptive object detection,” CVPR, 2019.
[20] Vibashan VS, Vikram Gupta, Poojan Oza, Vishwanath A. Sindagi, and Vishal M. Patel, “Mega-cda: Memory guided attention for category-aware unsupervised domain adaptive object detection,” CVPR, 2021.
[21] Chang-Dong Xu, Xingjie Zhao, Xin Jin, and Xiu-Shen Wei, “Exploring categorical regularization for domain adaptive object detection,” CVPR, 2020.
[22] Minghao Xu, Hang Wang, Bingbing Ni, Qi Tian, and Wenjun Zhang, “Cross-domain detection via graph-induced prototype alignment,” CVPR, 2020.
[23] Liang Zhao and Limin Wang, “Task-specific inconsistency alignment for domain adaptive object detection,” CVPR, 2022.
[24] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick, “Momentum contrast for unsupervised visual representation learning,” CVPR, 2020.
[25] Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli, “Deep unsupervised learning using nonequilibrium thermodynamics,” arXiv preprint arXiv:1503.03585, 2015.
[26] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer, “High-resolution image synthesis with latent diffusion models,” CVPR, 2022.
[27] Diederik P. Kingma and Max Welling, “Auto-encoding variational bayes,” ICLR, 2014.
[28] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever, “Learning transferable visual models from natural language supervision,” ICML, 2021.
[29] Dan Hendrycks and Thomas Dietterich, “Benchmarking neural network robustness to common corruptions and perturbations,” ICLR, 2019.
[30] Claudio Michaelis, Benjamin Mitzkus, Robert Geirhos, Evgenia Rusak, Oliver Bringmann, Alexander S. Ecker, Matthias Bethge, and Wieland Brendel, “Benchmarking robustness in object detection: Autonomous driving when winter is coming,” arXiv preprint arXiv:1907.07484, 2019.
[31] Ilya Loshchilov and Frank Hutter, “Decoupled weight decay regularization,” ICLR, 2019.
[32] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” NIPS, 2015.
[33] Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick, “Detectron2,” https://github.com/facebookresearch/detectron2, 2019.
[34] Mengzhe He, Yali Wang, Jiaxi Wu, Yiru Wang, Hanqing Li, Bo Li, Weihao Gan, Wei Wu, and Yu Qiao, “Cross domain object detection by target-perceived dual branch distillation,” CVPR, 2022.
[35] Onkar Krishna, Hiroki Ohashi, and Saptarshi Sinha, “Mila: Memory-based instance-level adaptation for cross-domain object detection,” CVPR, 2023.
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97135 | -
dc.description.abstract | 領域自適應物件偵測致力於減輕當在有標註的源域上訓練的檢測器被應用於無標註的目標域時發生模型性能退化之問題。近期的方法採用了教師-學生框架來克服領域間的差距問題。為了緩解因領域差異導致教師模型生成品質較低的偽標籤,可利用現成基於擴散的圖像編輯模型,通過手動定義的指令將源圖像編輯成類似目標域的圖像,這些類似目標域的圖像隨後可以用於監督訓練。然而,類似目標域的圖像風格可能與目標域的風格不完全相符,導致監督學習的改進效果不佳。我們研究出可利用結合可學習提示和感知相似度來更好地捕捉目標域的風格。此外,通過裁剪類似目標域圖像中的物體並用以增強目標域圖像,來降低偽標籤的偽陽性率。實驗結果證明,我們提出的方法相較於基線模型有明顯提升,並超過既有的方法。舉例來說,在 Cityscapes 到 Foggy Cityscapes 的場景中,我們在 Foggy Cityscapes 上達到 53.2% mAP,超過之前的最先進方法所達到的 52.5% mAP。 | zh_TW
dc.description.abstract | Domain adaptive object detection seeks to minimize performance degradation when a detector trained on a labeled source domain is applied to an unlabeled target domain. Recent methods employ a teacher-student framework to address the domain gap issue. To mitigate the issue of low-quality pseudo-labels produced by a teacher model due to the domain discrepancy, an off-the-shelf, diffusion-based image editing model can be utilized to edit source images and synthesize target-like images with manually defined instructions. These target-like images can then be utilized for supervised training. However, the style of the target-like images may not perfectly match that of the target images, leading to suboptimal improvement in supervised training. In this work, we combine a learnable prompt with perceptual similarity to better capture the target domain style. Furthermore, the false positive ratio of pseudo-labels can be reduced by augmenting the target images with the cropped objects from the target-like images. Experiments demonstrate that our proposed method significantly improves upon the baseline model and outperforms existing methods. For example, we achieve an mAP of 53.2% on Foggy Cityscapes in the Cityscapes to Foggy Cityscapes setting, surpassing the 52.5% mAP attained by the previous state-of-the-art approach. | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-02-27T16:21:22Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2025-02-27T16:21:23Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents:
口試委員會審定書 (Oral Defense Committee Certification) i
Acknowledgements ii
摘要 (Chinese Abstract) iii
Abstract iv
Contents vi
List of Figures viii
List of Tables x
Chapter 1 Introduction 1
Chapter 2 Related Works 5
  2.1 Unsupervised Domain Adaptation for Object Detection 5
Chapter 3 Preliminary 7
  3.1 Problem Definition 7
  3.2 Diffusion-based Image Editing Model 7
  3.3 Perceptual Similarity 8
  3.4 Mean Teacher 9
Chapter 4 Proposed Method 11
  4.1 Target Domain Style Capturing 11
  4.2 Image Transformation with Target-like Images 13
    4.2.1 Transform Paired Source and Target-like Images 14
    4.2.2 Reducing False Positive Ratio of Pseudo-Labels 15
Chapter 5 Experiments 17
  5.1 Datasets 17
  5.2 Implementation Details 17
  5.3 Main Results and Comparisons 19
  5.4 Image Editing Results 21
  5.5 Ablation Studies 24
Chapter 6 Conclusion 26
References 27
dc.language.iso | en | -
dc.title | 利用基於擴散的圖像編輯模型增強物件偵測的領域自適應能力 | zh_TW
dc.title | Enhancing Domain Adaptive Object Detection with Diffusion-based Image Editing Model | en
dc.type | Thesis | -
dc.date.schoolyear | 113-1 | -
dc.description.degree | 碩士 | -
dc.contributor.coadvisor | 陳祝嵩 | zh_TW
dc.contributor.coadvisor | Chu-Song Chen | en
dc.contributor.oralexamcommittee | 莊永裕;王鈺強 | zh_TW
dc.contributor.oralexamcommittee | Yung-Yu Chuang;Yu-Chiang Frank Wang | en
dc.subject.keyword | 領域自適應物件偵測,擴散的圖像編輯模型,可學習提示,感知損失 | zh_TW
dc.subject.keyword | domain adaptive object detection,diffusion-based image editing model,learnable prompt,perceptual loss | en
dc.relation.page | 31 | -
dc.identifier.doi | 10.6342/NTU202500631 | -
dc.rights.note | 同意授權(全球公開) | -
dc.date.accepted | 2025-02-13 | -
dc.contributor.author-college | 電機資訊學院 | -
dc.contributor.author-dept | 資料科學學位學程 | -
dc.date.embargo-lift | 2025-02-28 | -
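The abstract above describes a mean-teacher pipeline in which an EMA teacher produces pseudo-labels for the unlabeled target domain, with augmentation from diffusion-edited target-like images. As a minimal sketch of the two generic ingredients it names (EMA weight averaging and confidence-based pseudo-label filtering), the following toy code uses illustrative names and thresholds that are not taken from the thesis:

```python
# Hedged sketch: EMA teacher update and pseudo-label filtering, the two
# generic components of the mean-teacher scheme described in the abstract.
# Function names, the alpha value, and the confidence threshold are
# illustrative assumptions, not the thesis's actual implementation.

def ema_update(teacher, student, alpha=0.999):
    """Move each teacher weight toward the student weight by EMA:
    w_t <- alpha * w_t + (1 - alpha) * w_s."""
    return {k: alpha * teacher[k] + (1 - alpha) * student[k] for k in teacher}

def filter_pseudo_labels(detections, conf_thresh=0.8):
    """Keep only high-confidence teacher detections as pseudo-labels,
    which lowers the false-positive ratio of the supervision signal."""
    return [d for d in detections if d["score"] >= conf_thresh]

# Toy weights stand in for detector parameters.
teacher_w = {"conv1": 1.0}
student_w = {"conv1": 0.0}
teacher_w = ema_update(teacher_w, student_w, alpha=0.9)
print(teacher_w["conv1"])  # 0.9

# Teacher detections on a target-domain image; only the confident box survives.
dets = [
    {"box": (0, 0, 10, 10), "score": 0.95},
    {"box": (5, 5, 8, 8), "score": 0.40},
]
print(len(filter_pseudo_labels(dets)))  # 1
```

In the thesis's full method, the surviving pseudo-labels supervise the student on target images that are additionally augmented with objects cropped from the diffusion-edited target-like images; that augmentation step is omitted here for brevity.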
Appears in collections: 資料科學學位學程 (Data Science Degree Program)

Files in this item:
File | Size | Format
ntu-113-1.pdf | 23.18 MB | Adobe PDF