Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88295
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 莊永裕 | zh_TW |
dc.contributor.advisor | Yung-Yu Chuang | en |
dc.contributor.author | 朱健愷 | zh_TW |
dc.contributor.author | Jian-Kai Zhu | en |
dc.date.accessioned | 2023-08-09T16:24:51Z | - |
dc.date.available | 2023-11-09 | - |
dc.date.copyright | 2023-08-09 | - |
dc.date.issued | 2023 | - |
dc.date.submitted | 2023-07-25 | - |
dc.identifier.citation | [1] R. Abdal, Y. Qin, and P. Wonka. Image2StyleGAN++: How to edit the embedded images?, 2019. [2] R. Abdal, Y. Qin, and P. Wonka. Image2StyleGAN: How to embed images into the StyleGAN latent space? In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 4431–4440, 2019. [3] R. Abdal, P. Zhu, J. Femiani, N. J. Mitra, and P. Wonka. CLIP2StyleGAN: Unsupervised extraction of StyleGAN edit directions. CoRR, abs/2112.05219, 2021. [4] Q. Bai, Y. Xu, J. Zhu, W. Xia, Y. Yang, and Y. Shen. High-fidelity GAN inversion with padding space, 2022. [5] P. Cao, L. Yang, D. Liu, Z. Liu, S. Li, and Q. Song. What decreases editing capability? Domain-specific hybrid refinement for improved GAN inversion, 2023. [6] J. Deng, J. Guo, J. Yang, N. Xue, I. Kotsia, and S. Zafeiriou. ArcFace: Additive angular margin loss for deep face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10):5962–5979, Oct. 2022. [7] Q. Feng, V. Shah, R. Gadde, P. Perona, and A. Martinez. Near perfect GAN inversion, 2022. [8] J. Fu, S. Li, Y. Jiang, K.-Y. Lin, C. Qian, C. C. Loy, W. Wu, and Z. Liu. StyleGAN-Human: A data-centric odyssey of human generation. arXiv preprint arXiv:2204.11823, 2022. [9] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial networks, 2014. [10] J. He, W. Shi, K. Chen, L. Fu, and C. Dong. GCFSR: A generative and controllable face super-resolution method without facial and GAN priors, 2022. [11] X. Huang and S. Belongie. Arbitrary style transfer in real-time with adaptive instance normalization, 2017. [12] K. Kang, S. Kim, and S. Cho. GAN inversion for out-of-range images with geometric transformations, 2021. [13] T. Karras, S. Laine, and T. Aila. A style-based generator architecture for generative adversarial networks, 2018. [14] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8110–8119, 2020. [15] H. Kim, Y. Choi, J. Kim, S. Yoo, and Y. Uh. Exploiting spatial dimensions of latent in GAN for real-time image editing, 2021. [16] H. Li, J. Liu, X. Zhang, Y. Bai, H. Wang, and K. Mueller. Transforming the latent space of StyleGAN for real face editing, 2021. [17] N. Meister, D. Zhao, A. Wang, V. V. Ramaswamy, R. Fong, and O. Russakovsky. Gender artifacts in visual datasets, 2022. [18] S. Moon and G.-M. Park. IntereStyle: Encoding an interest region for robust StyleGAN inversion, 2022. [19] T. Park, J.-Y. Zhu, O. Wang, J. Lu, E. Shechtman, A. A. Efros, and R. Zhang. Swapping autoencoder for deep image manipulation. In Advances in Neural Information Processing Systems, 2020. [20] G. Parmar, Y. Li, J. Lu, R. Zhang, J.-Y. Zhu, and K. K. Singh. Spatially-adaptive multilayer selection for GAN inversion and editing, 2022. [21] O. Patashnik, Z. Wu, E. Shechtman, D. Cohen-Or, and D. Lischinski. StyleCLIP: Text-driven manipulation of StyleGAN imagery. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 2085–2094, Oct. 2021. [22] Y. Poirier-Ginter, A. Lessard, R. Smith, and J.-F. Lalonde. Overparameterization improves StyleGAN inversion, 2022. [23] K. Preechakul, N. Chatthee, S. Wizadwongsa, and S. Suwajanakorn. Diffusion autoencoders: Toward a meaningful and decodable representation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022. [24] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever. Learning transferable visual models from natural language supervision, 2021. [25] D. Roich, R. Mokady, A. H. Bermano, and D. Cohen-Or. Pivotal tuning for latent-based editing of real images, 2021. [26] Y. Shen, J. Gu, X. Tang, and B. Zhou. Interpreting the latent space of GANs for semantic face editing. In CVPR, 2020. [27] Y. Shen and B. Zhou. Closed-form factorization of latent semantics in GANs. In CVPR, 2021. [28] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(56):1929–1958, 2014. [29] A. Tewari, M. Elgharib, G. Bharaj, F. Bernard, H.-P. Seidel, P. Pérez, M. Zollhöfer, and C. Theobalt. StyleRig: Rigging StyleGAN for 3D control over portrait images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, June 2020. [30] O. Tov, Y. Alaluf, Y. Nitzan, O. Patashnik, and D. Cohen-Or. Designing an encoder for StyleGAN image manipulation, 2021. [31] O. Tov, Y. Alaluf, Y. Nitzan, O. Patashnik, and D. Cohen-Or. Designing an encoder for StyleGAN image manipulation. arXiv preprint arXiv:2102.02766, 2021. [32] T. Wang, Y. Zhang, Y. Fan, J. Wang, and Q. Chen. High-fidelity GAN inversion for image attribute editing, 2021. [33] T. Wei, D. Chen, W. Zhou, J. Liao, W. Zhang, L. Yuan, G. Hua, and N. Yu. E2Style: Improve the efficiency and effectiveness of StyleGAN inversion. IEEE Transactions on Image Processing, 2022. [34] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon. CBAM: Convolutional block attention module, 2018. [35] Z. Wu, D. Lischinski, and E. Shechtman. StyleSpace analysis: Disentangled controls for StyleGAN image generation, 2020. [36] X. Yang, X. Xu, and Y. Chen. Photo-realistic out-of-domain GAN inversion via invertibility decomposition, 2022. [37] X. Yao, A. Newson, Y. Gousseau, and P. Hellier. Feature-style encoder for style-based GAN inversion, 2022. [38] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10):1499–1503, Oct. 2016. [39] F. Zhu, J. Zhu, W. Chu, X. Zhang, X. Ji, C. Wang, and Y. Tai. Blind face restoration via integrating face shape and generative priors. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7652–7661, 2022. [40] P. Zhu, R. Abdal, Y. Qin, J. Femiani, and P. Wonka. Improved StyleGAN embedding: Where are the good latents?, 2020. | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88295 | - |
dc.description.abstract | 近年來,隨著生成式模型的發展漸趨成熟,將生成式模型投入現實應用也成為重要的研究趨勢。
其中圖像式生成模型StyleGAN2因其生成的圖像具有高擬真品質以及豐富的生成多樣性而備受關注。 利用其建模出的潛在空間中的分布,我們得以任意在其中取樣潛在向量作為描述圖片的低維屬性,並據此生成高維的高擬真圖像。 相反地,若我們能將高維圖像逆映射成低維潛在向量,則可透過編輯潛在向量並生成圖像以達到高擬真圖像的編輯。 此篇論文探討圖像式生成模型用於全身式圖像的逆映射與編輯的困難之處,並提出一種新的損失函數設計以制約編碼器的訓練過程,使其逆映射出兼具編輯性、重建品質與相似度的潛在向量,以克服現實世界中的圖像逆映射進生成式模型之潛在空間時的常見障礙。 同時,我們提出一個新的編輯方法以增強我們逆映射出之潛在向量的可編輯性,使我們可編輯更多的屬性。 在後續的實驗中我們展示我們提出的方法相比以往的方法,能更好地權衡逆映射潛在向量的編輯性與重建品質,並展示我們的方法具體可對哪些屬性作編輯。 | zh_TW |
dc.description.abstract | Recently, as generative models have matured, applying them to real-world applications has become an important research trend.
One such model, StyleGAN2, has received attention for its high-fidelity image generation and rich generation diversity. Using the distribution it learns in latent space, we can sample an arbitrary latent vector as a low-dimensional description of an image and generate the corresponding high-fidelity, high-dimensional image by feeding the sampled vector into the generator. Conversely, if we can invert a high-dimensional image into a low-dimensional latent vector, we can edit the inverted latent vector and feed it to StyleGAN2 to achieve high-fidelity image editing. This thesis examines the difficulties of inverting and editing full-body images with an image generative model. We propose a new loss function design that regularizes the training of our encoder, forcing it to produce latent vectors that combine editability, reconstruction quality, and similarity, thereby overcoming common obstacles in inverting real-world full-body images into the latent space of generative models. We also propose a new editing method that enhances the editability of the inverted latent vectors, enabling more attributes to be edited. In the subsequent experiments, we demonstrate that our method balances the trade-off between editability and reconstruction quality of the inverted latent vectors better than previous methods, and we show which attributes our method can edit. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-09T16:24:50Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2023-08-09T16:24:51Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i
Acknowledgements ii
摘要 iii
Abstract iv
Contents vi
List of Figures viii
List of Tables x
Chapter 1 Introduction 1
Chapter 2 Related Work 3
2.1 Generative Adversarial Network 3
2.2 Inversion 4
2.3 Editing 7
Chapter 3 Method 9
3.1 Preliminary 9
3.1.1 StyleGAN2 Network 9
3.1.2 Feature-Style Encoder 10
3.2 Preliminary Study 12
3.2.1 Encoder Enhancement 13
3.2.1.1 Architecture refined 13
3.2.1.2 Loss function revisited 13
3.2.1.3 Editing 14
3.3 Loss function 15
3.3.1 Reconstruction Loss 15
3.3.1.1 Pixelwise reconstruction loss 15
3.3.1.2 Multiscale Perceptual Loss 16
3.3.2 Style Mixing Loss 17
3.3.3 Identity Loss 19
3.3.4 Feature Reconstruction Loss 19
3.4 Training 20
3.5 Editing 21
Chapter 4 Experiments 24
4.1 Implementation Details 24
4.2 Inversion 25
4.2.1 Qualitative Results 25
4.2.2 Quantitative Evaluation 27
4.3 Style Mixing 30
4.4 Editing 34
4.5 Ablation Study 36
Chapter 5 Conclusion 45
References 46 | - |
dc.language.iso | en | - |
dc.title | 全身影像逆映射於生成對抗網路之潛在空間使用改進的訓練方法 | zh_TW |
dc.title | Full-body Image Inversion on GAN Latent Space with Improved Training Procedure | en |
dc.type | Thesis | - |
dc.date.schoolyear | 111-2 | - |
dc.description.degree | Master (碩士) | - |
dc.contributor.oralexamcommittee | 吳賦哲;葉正聖 | zh_TW |
dc.contributor.oralexamcommittee | Fu-Che Wu;Jeng-Sheng Yeh | en |
dc.subject.keyword | 生成對抗網路,圖像編輯,全身影像,GAN逆映射,編碼器, | zh_TW |
dc.subject.keyword | GAN,Image editing,Full-body image,GAN Inversion,Encoder, | en |
dc.relation.page | 50 | - |
dc.identifier.doi | 10.6342/NTU202301997 | - |
dc.rights.note | Access authorized (restricted to campus) | - |
dc.date.accepted | 2023-07-27 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Department of Computer Science and Information Engineering | - |
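The abstract above describes an invert-edit-regenerate loop: encode a real image into a latent vector, shift that vector along a semantic direction, and feed it back to the generator. As a purely illustrative sketch (not the thesis's actual StyleGAN2 encoder), the loop can be mimicked with a fixed random linear map standing in for the generator and its pseudo-inverse standing in for a trained encoder; all names here are hypothetical.

```python
# Minimal, hypothetical sketch of the invert -> edit -> regenerate pipeline.
# A random linear map G plays the "generator"; its pseudo-inverse E plays the
# "encoder". The real method uses StyleGAN2 and a learned encoder instead.
import numpy as np

rng = np.random.default_rng(0)
latent_dim, image_dim = 8, 32

G = rng.standard_normal((image_dim, latent_dim))  # stand-in generator
E = np.linalg.pinv(G)                             # stand-in encoder

def generate(w):
    """Map a low-dimensional latent to a high-dimensional 'image'."""
    return G @ w

def invert(x):
    """Map an 'image' back to a latent vector (GAN inversion)."""
    return E @ x

# 1) An image that lies on the generator's manifold.
w_true = rng.standard_normal(latent_dim)
image = generate(w_true)

# 2) Inversion recovers the latent (exactly here, since G is linear
#    with full column rank; a trained encoder only approximates this).
w_inv = invert(image)

# 3) Editing: shift the latent along a (hypothetical) semantic
#    direction, then regenerate the edited image.
direction = np.zeros(latent_dim)
direction[0] = 1.0
edited_image = generate(w_inv + 2.0 * direction)
```

In the linear toy case the reconstruction is exact and the edit changes the output; in the thesis the same loop is constrained by the proposed loss terms so that the inverted latent stays both faithful and editable.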
Appears in Collections: | Department of Computer Science and Information Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-111-2.pdf (currently not authorized for public access) | 22.57 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.