Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94638

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 吳家麟 | zh_TW |
| dc.contributor.advisor | Ja-Lin Wu | en |
| dc.contributor.author | 鄭廷瑋 | zh_TW |
| dc.contributor.author | Ting-Wei Cheng | en |
| dc.date.accessioned | 2024-08-16T17:14:31Z | - |
| dc.date.available | 2024-08-17 | - |
| dc.date.copyright | 2024-08-16 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-08-13 | - |
| dc.identifier.citation | [1] M. Brack, F. Friedrich, D. Hintersdorf, L. Struppek, P. Schramowski, and K. Kersting. SEGA: Instructing text-to-image models using semantic guidance. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 25365–25389. Curran Associates, Inc., 2023.
[2] P. Dhariwal and A. Nichol. Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34:8780–8794, 2021.
[3] M. D'Incà, E. Peruzzo, M. Mancini, D. Xu, V. Goel, X. Xu, Z. Wang, H. Shi, and N. Sebe. OpenBias: Open-set bias detection in text-to-image generative models. ArXiv, abs/2404.07990, 2024.
[4] R. Gandikota, J. Materzynska, T. Zhou, A. Torralba, and D. Bau. Concept sliders: LoRA adaptors for precise control in diffusion models, 2023.
[5] E. Härkönen, A. Hertzmann, J. Lehtinen, and S. Paris. GANSpace: Discovering interpretable GAN controls. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 9841–9850. Curran Associates, Inc., 2020.
[6] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen. LoRA: Low-rank adaptation of large language models, 2021.
[7] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen. DALL·E 2: A modern text-to-image generation model. arXiv preprint arXiv:2204.06125, 2022.
[8] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
[9] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
[10] Q. Wu, Y. Liu, H. Zhao, A. Kale, T. M. Bui, T. Yu, Z. Lin, Y. Zhang, and S. Chang. Uncovering the disentanglement capability in text-to-image diffusion models. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1900–1910, 2023.
[11] L. Zhang, A. Rao, and M. Agrawala. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3836–3847, 2023. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94638 | - |
| dc.description.abstract | 本論文研究了 CLIP 文本編碼器在條件擴散模型中的適應,以解決語義編輯和去偏見相關的挑戰。我們探討了透過適應 (Adaptation) 在增強生成圖像語義屬性控制方面的有效性,同時減少內在偏見。研究利用了各種解耦策略,並對文字編碼器進行修改,以評估緩解與性別和種族相關偏見的潛力。我們的研究結果表明,通過針對性適應微調文本編碼器可以顯著提高語義控制的精確性和去偏見的有效性。本研究為圖像合成領域的更公平和可控的生成模型的發展做出了貢獻。 | zh_TW |
| dc.description.abstract | This thesis investigates the adaptation of the CLIP text encoder for use in conditional diffusion models to address challenges related to semantic editing and debiasing. We explore the effectiveness of low-rank adaptations in enhancing control over the semantic attributes of generated images while simultaneously reducing inherent biases. The study utilizes various disentanglement strategies and introduces modifications to the text encoder to evaluate the potential for mitigating biases related to gender and ethnicity. Our findings indicate that fine-tuning the text encoder with targeted adaptations can significantly improve the precision of semantic control and the effectiveness of debiasing. This work contributes to the development of fairer and more controllable generative models in the field of image synthesis. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-16T17:14:31Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-08-16T17:14:31Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements I
摘要 III
Abstract V
Contents VII
List of Figures IX
List of Tables XI
Denotation XIII
Chapter 1 Introduction 1
1.1 Research Objective and Questions 1
1.2 Methodology Overview 2
Chapter 2 Related Works 3
2.1 Disentanglement in Image Generative Models 3
2.2 Bias in Diffusion Models 4
2.3 Guidance Based Methods 4
2.4 Semantic Control 5
Chapter 3 Background 7
3.1 Diffusion Models 7
3.2 Low-Rank Adaptation 8
Chapter 4 Method 9
4.1 Concepts Editing 10
4.2 Debiasing 11
4.2.1 Gender Fairness 12
4.2.2 Ethnic Fairness 13
Chapter 5 Experiments 15
5.1 Concept Adjustment 15
5.1.1 Linearly Adjustable 15
5.1.2 Multiple Concepts Adjustable 16
5.1.3 Parameter Efficiency 16
5.2 Debiasing 17
5.2.1 Gender Equality 18
5.2.2 Ethnic Diversity 19
5.2.3 Training Cost 19
Chapter 6 Conclusion and Discussion 21
6.1 Conclusion 21
6.2 Discussion 21
6.3 Future Work 22
References 25 | - |
| dc.language.iso | en | - |
| dc.subject | 語意編輯 | zh_TW |
| dc.subject | 去偏見 | zh_TW |
| dc.subject | 圖片生成 | zh_TW |
| dc.subject | LoRA | zh_TW |
| dc.subject | Image Generative | en |
| dc.subject | Semantic Editing | en |
| dc.subject | LoRA | en |
| dc.subject | Debiasing | en |
| dc.title | 透過 CLIP 文字編碼器適應進行條件擴散模型中的語義編輯和去偏見 | zh_TW |
| dc.title | Semantic Editing and Debiasing in Conditional Diffusion Models by CLIP Text-Encoder Adaptation | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 許永真;胡敏君;陳駿丞;陳文進 | zh_TW |
| dc.contributor.oralexamcommittee | Yung-Jen Hsu;Min-Chun Hu;Jun-Cheng Chen;Wen-Chin Chen | en |
| dc.subject.keyword | 語意編輯,去偏見,圖片生成,LoRA | zh_TW |
| dc.subject.keyword | Semantic Editing, Debiasing, Image Generative, LoRA | en |
| dc.relation.page | 26 | - |
| dc.identifier.doi | 10.6342/NTU202403591 | - |
| dc.rights.note | 同意授權(全球公開) | - |
| dc.date.accepted | 2024-08-14 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊工程學系 | - |
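The abstract above describes attaching low-rank adaptations to the CLIP text encoder so that a semantic edit can be dialed up or down at inference time. As a rough illustration of the underlying LoRA mechanism (not the thesis's actual implementation), the sketch below applies the generic update W + (alpha/r)·BA to a stand-in weight matrix; every name here (`d_in`, `adapted_forward`, `scale`) is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r, alpha = 8, 8, 2, 4.0

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection, rank r
B = np.zeros((d_out, r))                    # trainable up-projection, zero-initialised

def adapted_forward(x, scale=1.0):
    """Forward pass through the LoRA-adapted linear layer.

    The low-rank update B @ A enters additively and is scaled by
    alpha/r times `scale`. Because B starts at zero, scale has no
    effect until the adapter is trained; afterwards, varying `scale`
    at inference time interpolates between the frozen model (scale=0)
    and the fully applied semantic edit (scale=1), which is the kind
    of linear adjustability the abstract alludes to.
    """
    delta = (alpha / r) * (B @ A)           # (d_out, d_in) low-rank update
    return x @ (W + scale * delta).T

x = rng.standard_normal((1, d_in))
# With B still all-zero, the adapter is inactive at any scale.
assert np.allclose(adapted_forward(x, scale=1.0), x @ W.T)
```

Only A and B (here 2·8 + 8·2 = 32 values) would be trained, versus 64 for the full matrix; the parameter saving grows with the size of the adapted projections.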
| Appears in Collections: | 資訊工程學系 | |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-112-2.pdf | 5.69 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
