Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89711
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 莊永裕 | zh_TW |
dc.contributor.advisor | Yung-Yu Chuang | en |
dc.contributor.author | 潘奕廷 | zh_TW |
dc.contributor.author | Yi-Ting Pan | en |
dc.date.accessioned | 2023-09-18T16:05:55Z | - |
dc.date.available | 2023-10-31 | - |
dc.date.copyright | 2022-10-14 | - |
dc.date.issued | 2022 | - |
dc.date.submitted | 2002-01-01 | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89711 | - |
dc.description.abstract | 3D 模型最普遍的表示法由幾何 (geometry) 與紋理 (texture) 兩個資訊所組成,其中紋理資訊包含了多種不同的材質 (materials) 貼圖,透過改變材質的屬性資訊,相同的 3D 模型能呈現出不同的視覺感受。過去為 3D 模型製作一張貼圖需要專業人員耗費大量時間才能完成,近年來,隨著深度學習的興起,生成對抗網路(GAN)在電腦視覺與電腦圖學等領域中取得了巨大的進展,其中 StyleGAN2 因具備生成高解析度逼真合成影像的能力以及其架構對於影像潛在空間有著良好的操控潛力而備受關注。許多研究中已透過調整不同尺度下的風格碼輸入,達到控制合成影像不同屬性的生成結果。然而,儘管在 2D 合成影像上取得了驚人的結果,為 3D 模型生成逼真的紋理貼圖仍然是個困難的任務。在這篇論文中,我們使用 StyleGAN2 作為紋理貼圖生成器,對於使用者輸入的 3D 模型,我們透過六面圖投影的方式,將模型以 2D 圖片表示 3D 的紋理與形狀,利用 StyleGAN2 生成出高解析度逼真的紋理貼圖。此外,我們提出了一個基於語意資訊的 StyleGAN2編碼器,透過使用者對六面圖進行少量的語意標註,編碼器能根據輸入的語意資訊生成更加符合模型幾何結構的風格碼,使得 StyleGAN2 生成出更加貼合模型的紋理貼圖。我們也結合了 StyleGAN2 的風格編碼器,使用者能輸入指定的風格圖片來決定紋理貼圖的生成風格。我們的訓練流程中全程使用 2D 影像作為訓練資料,過程中並未使用到任何真實世界的 3D 資料,減少了訓練資料收集的難易度。經過我們的實驗證實,我們的方法相較於前人的方法有更好的生成結果。 | zh_TW |
dc.description.abstract | The representation of a 3D model consists of geometry and texture information, where the texture information includes different types of materials. The same 3D model can exhibit different visual effects when the attribute values of its materials are changed. However, creating a texture is time-consuming and requires professional skills. In recent years, generative adversarial networks (GANs) have demonstrated compelling image synthesis results. Among them, StyleGAN2 has attracted much attention for its ability to generate photo-realistic images and its disentangled latent space. Many studies have proposed diverse approaches to control the generated results by manipulating latent codes. However, despite the impressive results on 2D images, generating photo-realistic textures for 3D models remains challenging. In this thesis, we use StyleGAN2 as a texture generator. Given a 3D model, we first parameterize it to a 2D domain using view-based texture projection, and then generate the texture with a pretrained StyleGAN2 generator. We propose a segmentation-based StyleGAN2 encoder, which encodes input segmentation maps into style codes that control the geometry of the generated results. We also employ another StyleGAN2 encoder that encodes style images into style codes controlling the style of the generated results. Our training process uses only 2D images as training data, reducing the difficulty of data preparation. In our experiments, we demonstrate that our method generates better results than previous approaches. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-18T16:05:55Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2023-09-18T16:05:55Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i Acknowledgements ii 摘要 iii Abstract v Contents vii List of Figures x List of Tables xii Chapter 1 Introduction 1 Chapter 2 Related Work 5 2.1 Image Synthesis 5 2.2 Texture Generation 6 2.3 Latent Space Manipulation 7 Chapter 3 Method 8 3.1 Preliminaries 8 3.1.1 StyleGAN2 Network 8 3.2 Texture Parameterization 9 3.3 Observation on LTG 10 3.3.1 Lack of Geometry Details 10 3.3.2 View and Style Inconsistency 11 3.4 Overview of Our Architecture 12 3.5 Texture Generator 14 3.6 StyleGAN2 Encoder 14 3.6.1 Encode Texture Styles to Style Codes 15 3.6.2 Encode 3D Models to Style Codes 16 3.7 Texture Mapping 19 3.8 Loss Functions 19 Chapter 4 Experiments and Results 21 4.1 Datasets 21 4.1.1 2D Images Dataset 21 4.1.2 Segmentation Maps with Pseudo Labeling 21 4.1.3 3D Shapes Dataset 22 4.2 Implementation Details 24 4.3 Performance 24 4.3.1 Comparison Methods Description 24 4.4 Quantitative Comparison 27 4.5 Qualitative Comparison 27 4.6 Ablation Studies 28 4.6.1 Style Codes Manipulation 28 4.6.2 The Architecture of Geometry Encoder 38 4.6.3 The Class Number of Segmentation Labels 38 Chapter 5 Limitations and Future Work 41 5.1 Limitations 41 5.1.1 Self-Occlusion Issues of Non-Convex Shapes 41 5.1.2 Seams Problem 41 5.2 Future Work 42 Chapter 6 Conclusion 44 References 45 Appendix A — Fine-tune Our Architecture 51 A.1 Three Steps Training 51 A.2 Train with 3D Models 54 | - |
dc.language.iso | zh_TW | - |
dc.title | CTGAN: 基於 StyleGAN2 編碼器之條件式紋理貼圖生成器 | zh_TW |
dc.title | CTGAN: Learning Conditional Texture Generator via StyleGAN2 Encoder | en |
dc.type | Thesis | - |
dc.date.schoolyear | 111-1 | - |
dc.description.degree | Master's | - |
dc.contributor.oralexamcommittee | 朱宏國;胡敏君;姚智原 | zh_TW |
dc.contributor.oralexamcommittee | Hung-Kuo Chu;Min-Chun Hu;Chih-Yuan Yao | en |
dc.subject.keyword | 紋理貼圖生成,條件對抗式網路,編碼器,電腦視覺,深度學習 | zh_TW |
dc.subject.keyword | Texture Generation, Conditional GANs, Encoder, Computer Vision, Deep Learning | en |
dc.relation.page | 55 | - |
dc.identifier.doi | 10.6342/NTU202204204 | - |
dc.rights.note | Authorization granted (access restricted to campus) | - |
dc.date.accepted | 2022-10-06 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Department of Computer Science and Information Engineering | - |
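The abstract above describes a conditional pipeline: a segmentation-based encoder turns the six-view projection of a 3D model into style codes that control geometry, a second encoder turns a reference image into style codes that control appearance, and the combined codes drive a pretrained StyleGAN2 generator. The sketch below illustrates only that code-mixing idea in StyleGAN2's layer-wise W+ space; the encoder internals, the layer split, and all names and sizes are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

# Hypothetical sketch of the pipeline described in the abstract. The two
# encoders are stand-ins (seeded random projections), not trained networks.

NUM_LAYERS = 14      # StyleGAN2 at 256x256 takes 14 per-layer codes (W+ space)
W_DIM = 512          # dimensionality of each style code
GEOMETRY_LAYERS = 7  # assumed split: early (coarse) layers come from geometry

def encode_geometry(segmentation_map: np.ndarray) -> np.ndarray:
    """Stand-in for the segmentation-based encoder: one code per layer."""
    rng = np.random.default_rng(int(segmentation_map.sum()) % (2**32))
    return rng.standard_normal((NUM_LAYERS, W_DIM))

def encode_style(style_image: np.ndarray) -> np.ndarray:
    """Stand-in for the style-image encoder."""
    rng = np.random.default_rng(int(style_image.sum()) % (2**32))
    return rng.standard_normal((NUM_LAYERS, W_DIM))

def mix_codes(geo: np.ndarray, sty: np.ndarray) -> np.ndarray:
    """Coarse layers take geometry codes; fine layers take style codes."""
    mixed = sty.copy()
    mixed[:GEOMETRY_LAYERS] = geo[:GEOMETRY_LAYERS]
    return mixed

seg = np.ones((256, 256), dtype=np.int64)  # fake six-view segmentation map
ref = np.full((256, 256, 3), 0.5)          # fake style reference image
w_plus = mix_codes(encode_geometry(seg), encode_style(ref))
print(w_plus.shape)  # (14, 512): would be fed to the StyleGAN2 generator
```

The split point controls the geometry/style trade-off: moving it later gives the segmentation-derived codes influence over finer details, mirroring the coarse-to-fine behavior of StyleGAN2's style inputs.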
Appears in Collections: | Department of Computer Science and Information Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-111-1.pdf Access restricted to NTU campus IP addresses (use the VPN service for off-campus access) | 22.86 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.