Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88295
Title: | Full-body Image Inversion on GAN Latent Space with Improved Training Procedure |
Authors: | Jian-Kai Zhu |
Advisor: | Yung-Yu Chuang |
Keyword: | GAN, Image editing, Full-body image, GAN Inversion, Encoder |
Publication Year: | 2023 |
Degree: | Master's |
Abstract: | Recently, as the development of generative models has matured, applying them to real-world applications has become an important research trend. Among image generative models, StyleGAN2 has received particular attention for its high-fidelity image generation and abundant generation diversity. With the distribution it learns in latent space, we can arbitrarily sample a latent vector as a set of low-dimensional attributes describing an image, and feed it to the model to generate the corresponding high-fidelity, high-dimensional image. Conversely, if we can invert high-dimensional images into low-dimensional latent vectors, we can edit the inverted latent vectors and feed them to StyleGAN2 to achieve high-fidelity image editing. This thesis discusses the difficulties of inverting and editing full-body images with an image generative model. We propose a new loss function design that regularizes the training procedure of our encoder, forcing it to invert latent vectors that combine editability, reconstruction quality, and similarity, thereby overcoming common obstacles in inverting real-world full-body images into the latent space of generative models. We also propose a new editing method that enhances the editability of our inverted latent vectors, enabling more attributes to be edited. In the subsequent experiments, we demonstrate that our proposed method strikes a better trade-off between editability and reconstruction quality of the inverted latent vectors than previous methods, and we show concretely which attributes our method can edit. |
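The invert → edit → re-generate workflow the abstract describes can be sketched with a toy stand-in. Everything below is illustrative, not the thesis's method: a fixed random linear map plays the role of StyleGAN2's generator, direct gradient descent on a reconstruction loss plays the role of the thesis's encoder training (which additionally uses editability and similarity losses), and the attribute direction is a made-up axis.

```python
# Toy sketch of GAN inversion and latent-space editing.
# A random linear map stands in for StyleGAN2; gradient descent on a
# squared reconstruction error stands in for encoder-based inversion.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "generator" G: 4-D latent vector -> 16-D "image".
G = rng.normal(size=(16, 4))

def generate(w):
    """Decode a latent vector into image space."""
    return G @ w

def invert(x, steps=3000, lr=0.005):
    """Find a latent w minimizing the reconstruction loss ||G(w) - x||^2."""
    w = np.zeros(G.shape[1])
    for _ in range(steps):
        grad = 2.0 * G.T @ (G @ w - x)  # gradient of the squared error
        w -= lr * grad
    return w

# Invert a "real" image, then edit one latent coordinate and re-generate.
w_true = rng.normal(size=4)
x_real = generate(w_true)
w_inv = invert(x_real)

edit_direction = np.array([1.0, 0.0, 0.0, 0.0])  # hypothetical attribute axis
x_edited = generate(w_inv + 0.5 * edit_direction)
```

The trade-off the abstract emphasizes appears even here: an inversion that reproduces `x_real` exactly may land in a region of latent space where moving along `edit_direction` produces implausible outputs, which is why the thesis constrains the encoder rather than optimizing reconstruction alone.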
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88295 |
DOI: | 10.6342/NTU202301997 |
Fulltext Rights: | Authorized (campus-only access) |
Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
File | Size | Format | Access
---|---|---|---
ntu-111-2.pdf | 22.57 MB | Adobe PDF | Restricted