Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85110

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 傅立成(Li-Chen Fu) | |
| dc.contributor.author | Febrina Wijaya | en |
| dc.contributor.author | 黃慈均 | zh_TW |
| dc.date.accessioned | 2023-03-19T22:44:16Z | - |
| dc.date.copyright | 2022-08-23 | |
| dc.date.issued | 2022 | |
| dc.date.submitted | 2022-08-11 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85110 | - |
| dc.description.abstract | Augmented reality (AR) has attracted widespread attention in recent years. One of its applications is AR teleconferencing. In such presence-centered applications, users expect to see realistic avatars and to observe changes in one another's facial expressions during interaction. However, most current multi-user AR systems represent each user with an avatar that has a static facial expression or only plays predefined facial animations. In computer vision, many deep-learning-based methods have been proposed to reconstruct a 3D face model from a single image, but no prior work has addressed how to generate a full-body avatar from the reconstructed face model or how to deploy it in real-time AR/VR applications. In this thesis, we propose a complete system architecture that generates realistic avatars and continuously updates their facial expressions with minimal latency. First, we adopt a state-of-the-art face reconstruction technique, DECA, to initialize and reconstruct a realistic head model from a single image of the user's face. By integrating a deep-learning-based gender classifier, the system builds a 3D body model for a better avatar appearance, and the head model is then merged with the body model to produce the user's full-body avatar. During an AR teleconference, a laptop webcam captures live facial images, and the avatar's facial expression and pose are updated in real time, so users can observe each other's expression changes as they happen. We implemented the real-time avatar reconstruction system in an AR teleconferencing scenario, and the results show good visual quality. In addition, we recruited 7 participants to experience different avatars in AR teleconferencing and evaluated the system with questionnaires. The results show that our system effectively enhances social presence in AR collaboration and provides a better user experience. | zh_TW |
| dc.description.abstract | Augmented reality (AR) technologies have attracted a lot of attention in recent years. One of their applications is AR teleconferencing. In this kind of multi-user AR system, where social presence plays an important role, we prefer to see realistic 3D avatars and to observe the facial expressions of the other remote users during the interaction. However, in most existing systems, the avatars are predefined 3D virtual characters with either static faces or dynamic faces driven by simple predefined animations. Many deep-learning-based methods have been proposed to reconstruct a 3D face from a single RGB image, but they do not investigate how to integrate the reconstructed face into a full-body avatar, or how to apply the model efficiently in real-time AR/VR applications. In this thesis, we present a complete framework that generates realistic avatars of users and efficiently updates the avatars' facial expressions with minimal latency. We adopt a state-of-the-art face reconstruction model, DECA, to first reconstruct a realistic head model from a profile picture. To enhance the appearance of the avatars, we integrate a gender classifier module to generate the body mesh. The head model is then merged with the body model to provide a full-body avatar representation. During the AR teleconference, we update the facial expression and pose of each user through images captured from a webcam, which allows users to be aware of each other's facial expressions in real time. To validate the effectiveness of our system, we conducted a user study evaluating different types of avatar representations in an AR teleconferencing scenario. The results show that our approach significantly improves social presence in AR collaboration, and that the real-time avatar update framework runs at 10 fps. (Illustrative code sketches of the head-body merging step and the real-time update loop appear after the metadata table below.) | en |
| dc.description.provenance | Made available in DSpace on 2023-03-19T22:44:16Z (GMT). No. of bitstreams: 1 U0001-0206202210483800.pdf: 5610045 bytes, checksum: d177cbe040dbd70bba56fd9dbd42483c (MD5) Previous issue date: 2022 | en |
| dc.description.tableofcontents | Thesis Committee Certification i Acknowledgements ii Chinese Abstract iii ABSTRACT iv CONTENTS v LIST OF FIGURES viii LIST OF TABLES xi Chapter 1 Introduction 1 1.1 Background 1 1.2 Motivation 3 1.3 Related work 5 1.3.1 Monocular 3D Face Reconstruction 5 1.3.2 Avatar Representation in AR/VR Remote Collaboration Systems 7 1.3.3 Realistic Full-Body Avatar Generation 8 1.4 Contribution 10 1.5 Thesis organization 11 Chapter 2 Preliminaries 13 2.1 FLAME Face Model 13 2.2 Convolutional Neural Network 14 2.2.1 Basic Components 15 2.2.2 ResNet 20 2.3 3D Face Reconstruction Frameworks 22 2.3.1 DECA 22 2.4 3D Model Representation 25 2.4.1 Texture Mapping 26 2.4.2 OBJ File Format 27 2.5 SMPL-X Body Model 29 2.6 Gender Classification Network 30 Chapter 3 Methodology 33 3.1 System Overview 33 3.2 Module Architecture 35 3.2.1 Offline Reconstruction Module 35 3.2.2 Online Reconstruction Module 37 3.2.3 Head and Body Model Merging 39 3.3 Real-Time Implementation in AR Teleconference Application 46 3.3.1 Server-Client Communication 46 3.3.2 AR Application Development 49 Chapter 4 Experiments 54 4.1 User Study 54 4.1.1 Experimental Setup 54 4.1.2 Participants 59 4.1.3 Procedure 59 4.1.4 Results and Discussion 61 4.2 Qualitative Results 65 4.2.1 Gender Classification Results 65 4.2.2 Body Skin Tone Adjustment Results 66 4.2.3 Adding Hair Model to Avatar 68 4.2.4 Robustness to Occlusions 69 4.3 Runtime Performance Evaluation 71 4.3.1 Experimental Setup 72 4.3.2 Results and Discussion 73 Chapter 5 Conclusions 75 REFERENCE 77 | |
| dc.language.iso | en | |
| dc.subject | Augmented Reality | zh_TW |
| dc.subject | 3D Face Reconstruction | zh_TW |
| dc.subject | Avatar | zh_TW |
| dc.subject | Realistic Avatar | en |
| dc.subject | 3D Face Reconstruction | en |
| dc.subject | Augmented Reality | en |
| dc.title | A 3D Avatar Reconstruction System with Real-Time Facial Expressions for Augmented Reality Teleconferencing | zh_TW |
| dc.title | Comprehensive 3D Avatar Reconstruction System with Real-Time Expression for Teleconference Application in Augmented Reality | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 110-2 | |
| dc.description.degree | Master |
| dc.contributor.oralexamcommittee | 歐陽明(Ming Ouhyoung), 洪一平(Yi-Ping Hung), 莊永裕(Yung-Yu Chuang) |
| dc.subject.keyword | 3D Face Reconstruction, Avatar, Augmented Reality | zh_TW |
| dc.subject.keyword | 3D Face Reconstruction, Realistic Avatar, Augmented Reality | en |
| dc.relation.page | 84 | |
| dc.identifier.doi | 10.6342/NTU202200859 | |
| dc.rights.note | Authorized for release (on-campus access only) |
| dc.date.accepted | 2022-08-12 | |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |
| dc.date.embargo-lift | 2025-08-10 | - |
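
The abstract above describes merging a DECA-reconstructed FLAME head with a gender-matched SMPL-X body mesh into one full-body avatar (Section 3.2.3 in the table of contents). Below is a minimal sketch of the geometric core of such a merge, assuming meshes are plain vertex/face arrays; the function name `merge_head_and_body`, the rigid-translation alignment, and the neck-target constant are illustrative assumptions, not the thesis's actual procedure, which may additionally blend seams and adjust skin tone.

```python
# Hypothetical sketch: stitch a head mesh onto a body mesh by rigid
# translation plus concatenation. The vertex/face array layout and the
# neck-target constant are assumptions for illustration only.
import numpy as np

def merge_head_and_body(head_v, head_f, body_v, body_f, neck_target):
    """Move the head so its centroid sits at the body's neck target,
    then concatenate both meshes into a single vertex/face list."""
    head_v = head_v + (neck_target - head_v.mean(axis=0))  # rigid translation only
    verts = np.vstack([body_v, head_v])
    faces = np.vstack([body_f, head_f + len(body_v)])      # re-index head faces
    return verts, faces

# Toy usage: two unit tetrahedra stand in for the SMPL-X body and FLAME head.
tet_v = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
tet_f = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
verts, faces = merge_head_and_body(tet_v, tet_f, tet_v, tet_f,
                                   neck_target=np.array([0.0, 1.6, 0.0]))
print(verts.shape, faces.shape)  # (8, 3) (8, 3)
```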
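The abstract also describes continuously updating avatar expressions from webcam images with minimal latency, at roughly 10 fps. The sketch below shows one plausible shape of such an online loop, assuming per-frame FLAME coefficients are streamed to the AR client over UDP: `estimate_flame_params` is a stand-in for the thesis's DECA-based online module, and the packet layout, host, and port are invented for illustration (the table of contents indicates a UDP/TCP server-client design, but the exact protocol is not given here).

```python
# Hedged sketch of an online expression-streaming loop. All names, the
# 53-coefficient layout (50 expression + 3 jaw pose), and the wire format
# are assumptions for illustration, not the thesis's actual implementation.
import socket
import struct
import time
import numpy as np

def estimate_flame_params(frame: np.ndarray) -> np.ndarray:
    # Placeholder for the DECA-based encoder run on a webcam frame.
    return np.zeros(53, dtype=np.float32)

def stream_expressions(host: str = "127.0.0.1", port: int = 9000, fps: int = 10):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    frame = np.zeros((224, 224, 3), dtype=np.uint8)  # stand-in for a captured frame
    while True:
        params = estimate_flame_params(frame)
        # Small fixed-size datagram: a timestamp followed by the coefficients,
        # so the client can simply drop any packet that arrives out of order.
        packet = struct.pack("<d", time.time()) + params.tobytes()
        sock.sendto(packet, (host, port))
        time.sleep(1.0 / fps)  # abstract reports the full pipeline at ~10 fps
```

UDP fits this step because a lost expression packet is harmless: the next frame's coefficients supersede it, which keeps perceived latency low.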
| Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
| File | Size | Format |
|---|---|---|
| U0001-0206202210483800.pdf (access restricted to NTU campus IPs; use the VPN service for off-campus access) | 5.48 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.