NTU Theses and Dissertations Repository > 電機資訊學院 > 電機工程學系
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96138
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 郭斯彥 | zh_TW
dc.contributor.advisor | Sy-Yen Kuo | en
dc.contributor.author | 張華恩 | zh_TW
dc.contributor.author | Hua-En Chang | en
dc.date.accessioned | 2024-11-15T16:07:34Z | -
dc.date.available | 2024-11-16 | -
dc.date.copyright | 2024-11-15 | -
dc.date.issued | 2024 | -
dc.date.submitted | 2024-08-07 | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96138 | -
dc.description.abstract | 近年來,深度神經網路技術的發展有顯著的進步,其中也包含了在電腦視覺領域中的語義分割任務。然而,當應用於域泛化場景時,即在未見過的領域中部署在特定源域上訓練的模型時,會發生明顯的性能下降。最近的研究工作主要集中在通過獲取域不變特徵和採用數據增強技術來增強模型的穩健性。然而,擴散模型在語義分割的域泛化(Domain Generalized Semantic Segmentation, DGSS)中提供有價值的先驗知識的潛力仍然大部分未被探索。本文提出了一個框架,旨在通過估計先驗來增強提取的特徵,與任何檢測網路架構兼容,從而解決 DGSS 問題。我們的框架包括三個核心模組:先驗提取網路(PEN)、先驗融合網路(PFN)和擴散模型。具體而言,我們的框架採用了兩階段訓練方法。在初始階段,PEN 將增強圖像特徵與其相應的源域對應部分融合,以得出一個先驗向量,而 PFN 則基於此先驗向量進行特徵融合。隨後,在第二階段,我們訓練擴散模型,僅從增強特徵中估計第一階段獲取的先驗。我們在廣泛使用的城市場景數據集(例如 Cityscapes、Mapillary、BDDS、GTAV 和 SYNTHIA)上驗證了我們的方法對於提升深度學習模型穩健性的效果。 | zh_TW
dc.description.abstract | Deep neural networks have demonstrated remarkable advancements in the task of semantic segmentation. However, when applied in domain generalization scenarios, where models trained on a specific source domain are deployed in unseen domains, a substantial performance degradation is observed. Recent research efforts have primarily focused on enhancing model robustness through the acquisition of domain-invariant features and data augmentation techniques. Nevertheless, the potential of diffusion models to offer valuable prior knowledge for domain generalized semantic segmentation (DGSS) remains largely unexplored. In this thesis, we present a framework that addresses the DGSS problem by enhancing extracted features with an estimated prior, offering compatibility with any segmentation network architecture. Our framework comprises three core modules: the Prior Extraction Network (PEN), the Prior Fusion Network (PFN), and a diffusion model. Specifically, our framework adopts a two-stage training approach. In the initial stage, PEN amalgamates augmented image features with their corresponding source-domain counterparts to derive a prior vector, while PFN conducts feature fusion based on this prior vector. Subsequently, we train the diffusion model to predict the prior obtained in the first stage solely from the augmented features. The effectiveness of our approach in improving the robustness of existing semantic segmentation networks is verified through experiments on urban-scene datasets (i.e., Cityscapes, Mapillary, BDDS, GTAV, and SYNTHIA). | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-11-15T16:07:34Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2024-11-15T16:07:34Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Acknowledgement ... iii
摘要 ... v
Abstract ... vii
Contents ... ix
List of Figures ... xi
List of Tables ... xiii
Chapter 1 Introduction ... 1
1.1 Motivation and Overview of the Thesis ... 1
1.2 Organization of the Dissertation ... 3
Chapter 2 Related Work ... 5
2.1 Domain Generalization ... 5
2.2 Diffusion Models ... 6
Chapter 3 Method ... 7
3.1 Stage 1 ... 7
3.2 Stage 2 ... 11
3.2.1 Inference ... 12
Chapter 4 Experiments ... 13
4.1 Experiment setup ... 13
4.2 Datasets ... 13
4.2.1 Real-World Datasets ... 13
4.2.2 Synthetic Datasets ... 14
4.3 Implementation Details ... 14
4.4 Quantitative results ... 15
4.4.1 Real-World Source Domain ... 15
4.4.2 Synthetic Source Domain ... 17
4.4.3 Clear to Corruptions ... 18
4.5 Qualitative Results ... 19
4.6 Ablation Studies ... 19
4.6.1 Number of prior channel ... 20
4.6.2 Feature Enhancement Layer ... 20
Chapter 5 Conclusion ... 21
References ... 23
dc.language.iso | en | -
dc.subject | 領域泛化 | zh_TW
dc.subject | 擴散模型 | zh_TW
dc.subject | 語意分割 | zh_TW
dc.subject | Diffusion Model | en
dc.subject | Semantic Segmentation | en
dc.subject | Domain Generalization | en
dc.title | 基於擴散模型引導之特徵增強於領域泛化語意分割 | zh_TW
dc.title | Diffusion-Guided Feature Enhancement for Domain Generalized Semantic Segmentation | en
dc.type | Thesis | -
dc.date.schoolyear | 113-1 | -
dc.description.degree | 碩士 | -
dc.contributor.oralexamcommittee | 雷欽隆;游家牧;陳英一;顏嗣鈞 | zh_TW
dc.contributor.oralexamcommittee | Chin-Laung Lei;Chia-Mu Yu;Ing-Yi Chen;Hsu-Chun Yen | en
dc.subject.keyword | 語意分割,領域泛化,擴散模型 | zh_TW
dc.subject.keyword | Semantic Segmentation,Domain Generalization,Diffusion Model | en
dc.relation.page | 30 | -
dc.identifier.doi | 10.6342/NTU202403250 | -
dc.rights.note | 未授權 | -
dc.date.accepted | 2024-08-10 | -
dc.contributor.author-college | 電機資訊學院 | -
dc.contributor.author-dept | 電機工程學系 | -
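The two-stage pipeline summarized in the abstract can be sketched roughly as follows. This is a toy NumPy sketch under loud assumptions: the actual PEN, PFN, and diffusion model are deep networks operating on spatial feature maps, and the dimensions, the linear maps, and the least-squares regressor standing in for the diffusion model are all illustrative placeholders, not the thesis's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not specified in the abstract).
FEAT_DIM, PRIOR_DIM = 64, 16

# --- Stage 1 ---
# PEN: derives a prior vector from augmented features and their
# source-domain counterparts, sketched as a linear map over the concatenation.
W_pen = rng.normal(0, 0.1, size=(2 * FEAT_DIM, PRIOR_DIM))

def pen(aug_feat, src_feat):
    """Amalgamate augmented and source features into a prior vector."""
    return np.concatenate([aug_feat, src_feat]) @ W_pen

# PFN: enhances the augmented features conditioned on the prior,
# sketched as a prior-conditioned residual correction.
W_pfn = rng.normal(0, 0.1, size=(PRIOR_DIM, FEAT_DIM))

def pfn(aug_feat, prior):
    """Fuse the prior back into the features (feature enhancement)."""
    return aug_feat + prior @ W_pfn

# --- Stage 2 ---
# The diffusion model learns to estimate the Stage-1 prior from the
# augmented features alone; here a least-squares fit plays that role.
aug_batch = rng.normal(size=(256, FEAT_DIM))
src_batch = rng.normal(size=(256, FEAT_DIM))
targets = np.stack([pen(a, s) for a, s in zip(aug_batch, src_batch)])
W_est, *_ = np.linalg.lstsq(aug_batch, targets, rcond=None)

def estimate_prior(aug_feat):
    """Inference-time prior estimate (no source features available)."""
    return aug_feat @ W_est

# Inference on an unseen domain: enhance features with the estimated prior.
x = rng.normal(size=FEAT_DIM)
enhanced = pfn(x, estimate_prior(x))
```

The key design point mirrored here is that source-domain features are needed only in Stage 1 to define the prior; at inference time the prior is regenerated from the test features themselves, which is what makes the enhancement usable in unseen domains.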
Appears in Collections: 電機工程學系

Files in This Item:
File | Size | Format
ntu-113-1.pdf (restricted; not authorized for public access) | 4.52 MB | Adobe PDF