Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101265

Full metadata record

DC Field | Value | Language
dc.contributor.advisor: 王鈺強 (zh_TW)
dc.contributor.advisor: Yu-Chiang Frank Wang (en)
dc.contributor.author: 廖宇謙 (zh_TW)
dc.contributor.author: Yu-Chien Liao (en)
dc.date.accessioned: 2026-01-13T16:08:43Z
dc.date.available: 2026-01-14
dc.date.copyright: 2026-01-13
dc.date.issued: 2025
dc.date.submitted: 2025-12-17
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101265
dc.description.abstract: 在實際應用中,如何有效率地不斷更新擴散模型是個難題。因此,我們提出了一個創新的學習策略『萃取概念神經元』(Concept Neuron Selection)。這個簡單且有效的方法,可以在連續學習的條件下對擴散模型實現客製化。CNS 在擴散模型中辨認和目標概念相關的神經元。為了同時避免災難性遺忘以及保留零樣本文字轉圖像的生成能力,CNS 以增量方式微調概念神經元,並共同保留先前學到的概念知識。在現實世界資料集的評估下顯示,CNS 在只需最少參數調整的情況下達到最高的效能,並且無論在單一或多概念客製化任務中皆優於以往方法。CNS 同時實現無融合操作,有效降低持續客製化所需的記憶體儲存與處理時間。 (zh_TW)
dc.description.abstract: Updating diffusion models in an incremental setting is practical for real-world applications yet computationally challenging. We present a novel learning strategy, Concept Neuron Selection (CNS), a simple yet effective approach to personalization in a continual learning scheme. CNS identifies the neurons in a diffusion model that are closely related to the target concepts. To mitigate catastrophic forgetting while preserving zero-shot text-to-image generation ability, CNS finetunes these concept neurons incrementally and jointly preserves knowledge learned from previous concepts. Evaluation on real-world datasets demonstrates that CNS achieves state-of-the-art performance with minimal parameter adjustments, outperforming previous methods on both single- and multi-concept personalization tasks. CNS also operates fusion-free, reducing memory storage and processing time for continual personalization. (en)
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-01-13T16:08:43Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2026-01-13T16:08:43Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents: Verification Letter from the Oral Examination Committee i
Acknowledgements ii
摘要 iii
Abstract iv
Contents v
List of Figures vii
List of Tables ix
Chapter 1 Introduction 1
Chapter 2 Related Work 4
2.1 Diffusion model personalization 4
2.2 Neuron selection 6
Chapter 3 Preliminary 8
Chapter 4 Method 10
4.1 Problem formulation and framework overview 10
4.2 Learning of concept neurons 11
4.3 Continual personalization with concept neurons 14
Chapter 5 Experiment 17
5.1 Experimental setup 17
5.2 Qualitative comparisons 18
5.3 Quantitative comparisons 19
5.4 Ablation Study 21
Chapter 6 Conclusion 23
References 24
Appendix A — Implementation Details and More Experiments 31
A.1 Implementation Details 31
A.1.1 Datasets 31
A.1.2 Baseline methods 33
A.1.3 Threshold of the neuron selection 34
A.2 Additional Experimental Results 35
A.2.1 Use of similarity scores 35
A.2.2 Percentage of the updated neurons 35
A.2.3 Percentage of the overlapping between concept neurons 36
A.2.4 Empirical experiment of general neurons 36
A.2.5 Visualization of ablation study 36
A.2.6 Region control 37
dc.language.iso: en
dc.subject: 機器學習
dc.subject: 文生圖模型
dc.subject: 擴散模型
dc.subject: 連續學習
dc.subject: 客製化
dc.subject: Machine Learning
dc.subject: Text-to-image Model
dc.subject: Diffusion Model
dc.subject: Continual Learning
dc.subject: Personalization
dc.title: 擴散模型之連續客製化 (zh_TW)
dc.title: Continual Personalization for Diffusion Models (en)
dc.type: Thesis
dc.date.schoolyear: 114-1
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 陳祝嵩; 楊福恩 (zh_TW)
dc.contributor.oralexamcommittee: Chu-Song Chen; Fu-En Yang (en)
dc.subject.keyword: 機器學習, 文生圖模型, 擴散模型, 連續學習, 客製化 (zh_TW)
dc.subject.keyword: Machine Learning, Text-to-image Model, Diffusion Model, Continual Learning, Personalization (en)
dc.relation.page: 39
dc.identifier.doi: 10.6342/NTU202500989
dc.rights.note: 同意授權 (open access worldwide)
dc.date.accepted: 2025-12-18
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering)
dc.date.embargo-lift: 2026-01-14
Appears in Collections: Graduate Institute of Communication Engineering (電信工程學研究所)

Files in This Item:
File | Size | Format
ntu-114-1.pdf | 64.97 MB | Adobe PDF


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
