Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94106

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 洪一平 | zh_TW |
| dc.contributor.advisor | Yi-Ping Hung | en |
| dc.contributor.author | 洪世彬 | zh_TW |
| dc.contributor.author | Shih-Ping Hung | en |
| dc.date.accessioned | 2024-08-14T16:43:17Z | - |
| dc.date.available | 2024-08-15 | - |
| dc.date.copyright | 2024-08-14 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-08-07 | - |
| dc.identifier.citation | [1] M. A. Aliari, A. Beauchamp, T. Popa, and E. Paquette. Face editing using part-based optimization of the latent space. In Computer Graphics Forum, volume 42, pages 269–279. Wiley Online Library, 2023.
[2] T. Aumentado-Armstrong, S. Tsogkas, A. Jepson, and S. Dickinson. Geometric disentanglement for generative latent shape models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8181–8190, 2019.
[3] T. Bagautdinov, C. Wu, J. Saragih, P. Fua, and Y. Sheikh. Modeling facial geometry using compositional VAEs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3877–3886, 2018.
[4] V. Blanz and T. Vetter. A morphable model for the synthesis of 3D faces. In Seminal Graphics Papers: Pushing the Boundaries, Volume 2, pages 157–164. 2023.
[5] J. Booth, A. Roussos, A. Ponniah, D. Dunaway, and S. Zafeiriou. Large scale 3D morphable models. International Journal of Computer Vision, 126(2):233–254, 2018.
[6] G. Bouritsas, S. Bokhnyak, S. Ploumpis, M. Bronstein, and S. Zafeiriou. Neural 3D morphable models: Spiral convolutional networks for 3D shape representation learning and generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7213–7222, 2019.
[7] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
[8] C. Cao, Y. Weng, S. Zhou, Y. Tong, and K. Zhou. FaceWarehouse: A 3D facial expression database for visual computing. IEEE Transactions on Visualization and Computer Graphics, 20(3):413–425, 2013.
[9] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, pages 1597–1607. PMLR, 2020.
[10] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[11] S. Foti, B. Koo, D. Stoyanov, and M. J. Clarkson. 3D shape variational autoencoder latent disentanglement via mini-batch feature swapping for bodies and faces. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18730–18739, 2022.
[12] S. Foti, B. Koo, D. Stoyanov, and M. J. Clarkson. 3D generative model latent disentanglement via local eigenprojection. In Computer Graphics Forum, volume 42, page e14793. Wiley Online Library, 2023.
[13] D. Ghafourzadeh, S. Fallahdoust, C. Rahgoshay, A. Beauchamp, A. Aubame, T. Popa, and E. Paquette. Local control editing paradigms for part-based 3D face morphable models. Computer Animation and Virtual Worlds, 32(6):e2028, 2021.
[14] D. Ghafourzadeh, C. Rahgoshay, S. Fallahdoust, A. Aubame, A. Beauchamp, T. Popa, and E. Paquette. Part-based 3D face morphable model with anthropometric local control. In Graphics Interface 2020, 2019.
[15] J.-B. Grill, F. Strub, F. Altché, C. Tallec, P. Richemond, E. Buchatskaya, C. Doersch, B. Avila Pires, Z. Guo, M. Gheshlaghi Azar, et al. Bootstrap your own latent: A new approach to self-supervised learning. Advances in Neural Information Processing Systems, 33:21271–21284, 2020.
[16] K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16000–16009, 2022.
[17] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9729–9738, 2020.
[18] Y. Jung, W. Jang, S. Kim, J. Yang, X. Tong, and S. Lee. Deep deformable 3D caricatures with learned shape control. In ACM SIGGRAPH 2022 Conference Proceedings, pages 1–9, 2022.
[19] D. P. Kingma and M. Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
[20] T. Li, T. Bolkart, M. J. Black, H. Li, and J. Romero. Learning a model of facial shape and expression from 4D scans. ACM Transactions on Graphics, 36(6), Article 194, 2017.
[21] Y. Liang, S. Zhao, B. Yu, J. Zhang, and F. He. MeshMAE: Masked autoencoders for 3D mesh data analysis. In European Conference on Computer Vision, pages 37–54. Springer, 2022.
[22] Maxon, Inc. ZBrush, 2024. Accessed: 2024-07-15.
[23] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, et al. Improving language understanding by generative pre-training. 2018.
[24] T. W. Sederberg and S. R. Parry. Free-form deformation of solid geometric models. In Proceedings of the 13th Annual Conference on Computer Graphics and Interactive Techniques, pages 151–160, 1986.
[25] H. Su, X. Liu, J. Niu, J. Wan, and X. Wu. 3Deformer: A common framework for image-guided mesh deformation. arXiv preprint arXiv:2307.09892, 2023.
[26] Q. Tan, L. Gao, Y.-K. Lai, and S. Xia. Variational autoencoders for deforming 3D mesh models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5841–5850, 2018.
[27] M. Tarasiou, R. A. Potamias, E. O'Sullivan, S. Ploumpis, and S. Zafeiriou. Locally adaptive neural 3D morphable models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1867–1876, 2024.
[28] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94106 | - |
| dc.description.abstract | 在面部編輯領域中,現有方法多採用基於最佳化的技術來調整編輯結果。雖然這類方法能達到不錯的編輯效果,但在效率上卻遠不及直接推論。然而,由於缺乏合適的訓練資料,使得面部編輯任務難以進行監督式學習,也就難以做到直接推論。我們提出了一種創新的兩階段自監督式深度學習方法,成功克服面部編輯領域中無法直接推論的困難。
為了有效處理面部網格數據,我們提出了一種隨機聚類方法,將整個面部網格劃分為數個相同面數的區塊。這種細粒度的劃分使模型能夠捕捉更豐富的面部特徵信息,同時減少人為偏見,提高模型的表達能力和適應性。我們的模型架構採用了編碼器-解碼器結構,在預訓練過程中,模型隨機遮罩部分面部區塊,編碼器學習未遮罩區域的特徵,而解碼器則嘗試重建被遮罩的部分。這種方法使得模型能夠捕捉面部區塊之間的關聯性,而非僅僅記憶面部區塊內部的特徵。在微調階段,模型引入了控制點的概念,使用兩組不同的 3D 面部網格進行訓練。通過凍結預訓練的編碼器,並結合控制點特徵,模型學會了根據給定的控制點生成精確的面部編輯結果。為了優化模型性能,我們設計了新的損失函數,確保編輯結果的準確性和結構完整性。實驗結果表明,我們的模型能夠在約 0.01 秒內生成高品質的編輯後面部網格,大幅提升了面部編輯的效率和互動性。我們提出的自監督式深度學習方法在編輯效果和操作便利性方面與過去的方法相比取得顯著進步,為實時面部編輯應用提供了新的可能性。 | zh_TW |
| dc.description.abstract | In the field of facial editing, existing methods often rely on optimization-based techniques to adjust the editing results. While these methods can achieve good editing effects, they are far less efficient than direct inference. However, the lack of suitable training data makes supervised learning difficult for facial editing tasks, which in turn makes direct inference challenging. We propose an innovative two-stage self-supervised deep learning method that successfully overcomes the difficulty of direct inference in the field of facial editing.
To effectively handle facial mesh data, we propose a random clustering method that divides the entire facial mesh into several patches with the same number of faces. This fine-grained division allows the model to capture richer facial feature information while reducing human bias, improving the model's expressiveness and adaptability. Our model adopts an encoder-decoder architecture. During pre-training, the model randomly masks some of the facial patches: the encoder learns features of the unmasked regions, while the decoder attempts to reconstruct the masked parts. This approach enables the model to capture the relationships between facial patches, rather than merely memorizing the features within each patch. In the fine-tuning stage, the model introduces the concept of control points and trains on two different sets of 3D facial meshes. By freezing the pre-trained encoder and combining the control-point features, the model learns to generate precise facial editing results from the given control points. To optimize the model's performance, we designed a new loss function that ensures the accuracy and structural integrity of the editing results. Experimental results show that our model can generate a high-quality edited facial mesh in approximately 0.01 seconds, significantly improving the efficiency and interactivity of facial editing. Compared with previous methods, our self-supervised deep learning approach makes significant progress in editing quality and ease of operation, opening new possibilities for real-time facial editing applications. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-14T16:43:17Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-08-14T16:43:17Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i
Acknowledgements iii
摘要 v
Abstract vii
Contents ix
List of Figures xiii
List of Tables xv
Chapter 1 Introduction 1
Chapter 2 Related Work 5
2.1 Self-Supervised Learning 6
2.2 Facial Editing 7
2.2.1 3D Morphable Models 7
2.2.2 Deep Learning 8
Chapter 3 Proposed Method 11
3.1 Face Patches 11
3.2 Control Points 14
3.3 Network Architecture 15
3.3.1 Inputs & Outputs 16
3.3.2 Tokenization 17
3.3.3 Encoder & Decoder 17
3.4 Training 18
3.4.1 Pre-training 18
3.4.2 Fine-tuning 19
3.5 Loss Function 20
Chapter 4 Experiments 25
4.1 Implementation Details 25
4.1.1 Dataset 25
4.1.2 Data Augmentation 25
4.1.3 Data Preprocessing 26
4.1.4 Training 26
4.2 Experimental Results 27
4.2.1 Face Reconstruction 27
4.2.2 Facial Editing 29
4.2.3 Facial Expression Editing 34
4.3 Ablation Studies 35
4.4 Facial Editing Application 42
Chapter 5 Conclusion & Future Works 45
5.1 Limitation 45
5.2 Conclusion 46
5.3 Future Works 47
References 49 | - |
| dc.language.iso | en | - |
| dc.subject | 面部編輯 | zh_TW |
| dc.subject | 3D 面部模型 | zh_TW |
| dc.subject | 自監督式學習 | zh_TW |
| dc.subject | 遮罩式自動編碼器 | zh_TW |
| dc.subject | Facial Editing | en |
| dc.subject | 3D face model | en |
| dc.subject | Self Supervised Learning | en |
| dc.subject | Masked Autoencoder | en |
| dc.title | 基於自監督式學習之即時臉部編輯 | zh_TW |
| dc.title | Real-Time Facial Editing through Self-Supervised Learning | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 李明穗;林維暘;蘇柏齊 | zh_TW |
| dc.contributor.oralexamcommittee | Ming-Sui Lee;Wei-Yang Lin;Po-Chyi Su | en |
| dc.subject.keyword | 面部編輯,3D 面部模型,自監督式學習,遮罩式自動編碼器 | zh_TW |
| dc.subject.keyword | Facial Editing, 3D face model, Self Supervised Learning, Masked Autoencoder | en |
| dc.relation.page | 52 | - |
| dc.identifier.doi | 10.6342/NTU202403712 | - |
| dc.rights.note | 未授權 | - |
| dc.date.accepted | 2024-08-10 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊工程學系 | - |
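The abstracts above describe the method only in prose: equal-size face patches, MAE-style masked pre-training of an encoder-decoder, then fine-tuning with a frozen encoder conditioned on control points. Since the thesis itself is not publicly available (see the rights note above), the following is a minimal, hypothetical PyTorch sketch of that kind of pipeline. Every name and hyperparameter here (`PatchAutoencoder`, the 48-vertex patch size, the transformer depths, the 50% mask ratio, the plain L2 losses) is an illustrative assumption, not the author's actual architecture or loss.

```python
# Hypothetical sketch only: an MAE-style autoencoder over equal-size face
# patches, in the spirit of the abstract. Not the thesis's implementation.
import torch
import torch.nn as nn


class PatchAutoencoder(nn.Module):
    """Encoder-decoder over fixed-size face patches (MAE-style)."""

    def __init__(self, patch_dim, embed_dim=256, num_patches=64):
        super().__init__()
        self.embed = nn.Linear(patch_dim, embed_dim)   # tokenize each patch
        self.pos = nn.Parameter(torch.zeros(num_patches, embed_dim))
        self.mask_token = nn.Parameter(torch.zeros(embed_dim))
        enc = nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True)
        dec = nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=6)
        self.decoder = nn.TransformerEncoder(dec, num_layers=2)
        self.head = nn.Linear(embed_dim, patch_dim)    # back to vertex coords

    def forward(self, patches, mask):
        # patches: (B, P, patch_dim) flattened vertex coordinates per patch
        # mask:    (B, P) bool, True = hidden from the encoder; assumes the
        #          same number of masked patches in every sample
        tok = self.embed(patches) + self.pos
        B, P, E = tok.shape
        visible = tok[~mask].view(B, -1, E)            # encoder sees unmasked only
        latent = self.encoder(visible)
        full = self.mask_token.expand(B, P, E) + self.pos
        full[~mask] = latent.reshape(-1, E)            # re-insert encoded patches
        return self.head(self.decoder(full))           # reconstruct every patch


# Pre-training step: the loss is taken on the masked patches only.
B, P, D = 2, 64, 3 * 48                                # 48 vertices/patch (assumed)
patches = torch.randn(B, P, D)
mask = torch.zeros(B, P, dtype=torch.bool)
mask[:, torch.randperm(P)[: P // 2]] = True            # hide half the patches
model = PatchAutoencoder(D)
recon = model(patches, mask)
loss = nn.functional.mse_loss(recon[mask], patches[mask])

# Fine-tuning stage: freeze the pre-trained encoder before training the
# control-point-conditioned editing head (loss terms omitted; the thesis
# designs its own to preserve accuracy and structural integrity).
for p in model.encoder.parameters():
    p.requires_grad = False
```

Masking whole patches rather than individual vertices matches the abstract's stated goal: forcing the model to learn relationships between facial regions instead of memorizing per-patch detail.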
| Appears in Collections: | 資訊工程學系 | |
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-112-2.pdf (restricted; not available for public access) | 13.37 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
