NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88295
Full metadata record (format: DC field: value [language])
dc.contributor.advisor: 莊永裕 [zh_TW]
dc.contributor.advisor: Yung-Yu Chuang [en]
dc.contributor.author: 朱健愷 [zh_TW]
dc.contributor.author: Jian-Kai Zhu [en]
dc.date.accessioned: 2023-08-09T16:24:51Z
dc.date.available: 2023-11-09
dc.date.copyright: 2023-08-09
dc.date.issued: 2023
dc.date.submitted: 2023-07-25
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88295
dc.description.abstract: In recent years, as generative models have matured, putting them to practical use has become an important research trend.
Among image generative models, StyleGAN2 has drawn particular attention for the high fidelity and rich diversity of the images it generates.
Using the distribution it models in its latent space, we can sample latent vectors as low-dimensional attributes describing an image and generate the corresponding high-dimensional, high-fidelity image from them.
Conversely, if we can invert a high-dimensional image into a low-dimensional latent vector, we can edit that vector and regenerate the image, achieving high-fidelity image editing.
This thesis examines the difficulties of inverting and editing full-body images with an image generative model, and proposes a new loss function design that constrains the encoder's training so that it inverts images into latent vectors combining editability, reconstruction quality, and similarity, overcoming common obstacles encountered when inverting real-world images into a generative model's latent space.
We also propose a new editing method that enhances the editability of the inverted latent vectors, allowing more attributes to be edited.
In subsequent experiments we show that, compared with previous methods, our approach better balances the editability and reconstruction quality of the inverted latent vector, and we show concretely which attributes it can edit.
[zh_TW]
dc.description.abstract: Recently, as generative models have matured, applying them to real-world applications has become an important research trend.
Among generative models, StyleGAN2 has received attention for its high-fidelity image generation and rich generation diversity.
With the distribution it learns in latent space, we can sample a latent vector as a low-dimensional attribute description of an image and, by feeding the sampled vector into the generator, produce the corresponding high-fidelity, high-dimensional image.
Conversely, if we can invert a high-dimensional image into a low-dimensional latent vector, we can edit the inverted latent vector and feed it to StyleGAN2 to achieve high-fidelity image editing.
This thesis discusses the difficulties of full-body image inversion and editing with an image generative model.
We propose a new loss function design to regularize the training procedure of our encoder, forcing it to invert images into latent vectors that combine editability, reconstruction quality, and similarity, which overcomes the obstacles of inverting real-world full-body images into the latent space of generative models.
We also propose a new editing method to enhance the editability of our inverted latent vectors, enabling more attributes to be edited.
In our experiments, we demonstrate that our proposed method achieves a better trade-off between the editability and reconstruction quality of the inverted latent vector than previous methods.
We also show which attributes can be edited using our method.
[en]
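The pipeline the abstract describes — invert an image to a latent vector, shift the vector, regenerate — can be summarized in a short sketch. This is not the thesis's implementation: the tiny Generator and Encoder below are placeholders for StyleGAN2 and the thesis's encoder (real use would load pretrained weights), and edit_direction is a hypothetical attribute direction, e.g. one found by InterFaceGAN-style analysis.

```python
# Minimal sketch of encoder-based GAN inversion and latent editing.
# The networks are toy stand-ins so the sketch runs end to end.
import torch
import torch.nn as nn

LATENT_DIM, IMG_SIZE = 512, 64  # assumed sizes for this sketch

class Generator(nn.Module):      # stand-in for StyleGAN2's generator G
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, IMG_SIZE * IMG_SIZE * 3), nn.Tanh())
    def forward(self, w):        # w: (B, LATENT_DIM) latent vector
        return self.net(w).view(-1, 3, IMG_SIZE, IMG_SIZE)

class Encoder(nn.Module):        # stand-in for the inversion encoder E
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(), nn.Linear(IMG_SIZE * IMG_SIZE * 3, LATENT_DIM))
    def forward(self, x):        # x: (B, 3, H, W) real image
        return self.net(x)

G, E = Generator(), Encoder()
x = torch.randn(1, 3, IMG_SIZE, IMG_SIZE)   # a stand-in "real" image
w = E(x)                                    # invert image -> latent vector
edit_direction = torch.randn(LATENT_DIM)    # hypothetical attribute direction
alpha = 1.5                                 # editing strength
x_edited = G(w + alpha * edit_direction)    # regenerate the edited image
print(x_edited.shape)                       # torch.Size([1, 3, 64, 64])
```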
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-09T16:24:50Z. No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2023-08-09T16:24:51Z (GMT). No. of bitstreams: 0 [en]
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee i
Acknowledgements ii
摘要 (Chinese abstract) iii
Abstract iv
Contents vi
List of Figures viii
List of Tables x
Chapter 1 Introduction 1
Chapter 2 Related Work 3
2.1 Generative Adversarial Network 3
2.2 Inversion 4
2.3 Editing 7
Chapter 3 Method 9
3.1 Preliminary 9
3.1.1 StyleGAN2 Network 9
3.1.2 Feature-Style Encoder 10
3.2 Preliminary Study 12
3.2.1 Encoder Enhancement 13
3.2.1.1 Architecture refined 13
3.2.1.2 Loss function revisited 13
3.2.1.3 Editing 14
3.3 Loss function 15
3.3.1 Reconstruction Loss 15
3.3.1.1 Pixel-wise reconstruction loss 15
3.3.1.2 Multi-scale Perceptual Loss 16
3.3.2 Style Mixing Loss 17
3.3.3 Identity Loss 19
3.3.4 Feature Reconstruction Loss 19
3.4 Training 20
3.5 Editing 21
Chapter 4 Experiments 24
4.1 Implementation Details 24
4.2 Inversion 25
4.2.1 Qualitative Results 25
4.2.2 Quantitative Evaluation 27
4.3 Style Mixing 30
4.4 Editing 34
4.5 Ablation Study 36
Chapter 5 Conclusion 45
References 46
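To make the loss terms named in Section 3.3 of the table of contents concrete, here is a minimal sketch, under stated assumptions, of how such a combined objective might be assembled. It is not the thesis's formulation: the perceptual and identity terms use simple stand-ins (a real version would use e.g. VGG/LPIPS features and an ArcFace embedding), the style mixing loss is omitted, and the weights w_* are illustrative.

```python
# Hedged sketch of a combined encoder-training objective using the loss
# names from the table of contents. Stand-ins keep the sketch runnable.
import torch
import torch.nn.functional as F

def multiscale_perceptual(x, y, scales=(1, 2, 4)):
    # Compare images at several resolutions (stand-in for deep features).
    return sum(F.l1_loss(F.avg_pool2d(x, s), F.avg_pool2d(y, s))
               for s in scales)

def identity_loss(x, y, embed):
    # 1 - cosine similarity between identity embeddings (embed is a
    # placeholder here; a real version would be a pretrained face net).
    return 1 - F.cosine_similarity(embed(x), embed(y)).mean()

def total_loss(x, x_rec, feat, feat_rec, embed,
               w_pix=1.0, w_perc=0.8, w_id=0.1, w_feat=1.0):
    l_pix = F.mse_loss(x_rec, x)              # pixel-wise reconstruction
    l_perc = multiscale_perceptual(x_rec, x)  # multi-scale perceptual
    l_id = identity_loss(x_rec, x, embed)     # identity preservation
    l_feat = F.mse_loss(feat_rec, feat)       # feature reconstruction
    return w_pix * l_pix + w_perc * l_perc + w_id * l_id + w_feat * l_feat

# Toy usage: random tensors stand in for images and generator features.
embed = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 64 * 64, 128))
x, x_rec = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
feat, feat_rec = torch.rand(2, 512), torch.rand(2, 512)
print(total_loss(x, x_rec, feat, feat_rec, embed).item())
```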
dc.language.iso: en
dc.subject: GAN逆映射 [zh_TW]
dc.subject: 生成對抗網路 [zh_TW]
dc.subject: 圖像編輯 [zh_TW]
dc.subject: 編碼器 [zh_TW]
dc.subject: 全身影像 [zh_TW]
dc.subject: GAN Inversion [en]
dc.subject: Full-body image [en]
dc.subject: GAN [en]
dc.subject: Image editing [en]
dc.subject: Encoder [en]
dc.title: 全身影像逆映射於生成對抗網路之潛在空間使用改進的訓練方法 [zh_TW]
dc.title: Full-body Image Inversion on GAN Latent Space with Improved Training Procedure [en]
dc.type: Thesis
dc.date.schoolyear: 111-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 吳賦哲;葉正聖 [zh_TW]
dc.contributor.oralexamcommittee: Fu-Che Wu;Jeng-Sheng Yeh [en]
dc.subject.keyword: 生成對抗網路,圖像編輯,全身影像,GAN逆映射,編碼器 [zh_TW]
dc.subject.keyword: GAN, Image editing, Full-body image, GAN Inversion, Encoder [en]
dc.relation.page: 50
dc.identifier.doi: 10.6342/NTU202301997
dc.rights.note: 同意授權(限校園內公開) (authorization granted; access restricted to campus)
dc.date.accepted: 2023-07-27
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資訊工程學系 (Department of Computer Science and Information Engineering)
dc.date.embargo-lift: 2024-12-31
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
File: ntu-111-2.pdf (access restricted to NTU campus IPs; use the library's VPN service from off campus)
Size: 22.57 MB
Format: Adobe PDF


Except where otherwise noted in their copyright terms, all items in this repository are protected by copyright, with all rights reserved.
