Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94488
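Beyond citation, the persistent Handle URI above can also be resolved programmatically. Below is a minimal sketch using Python's third-party requests library, assuming network access to the repository; nothing here is specific to this record beyond the URI itself.

```python
# Minimal sketch: resolve the item's persistent Handle URI over HTTP.
# Requires the third-party "requests" package; network access is assumed.
import requests

HANDLE_URI = "http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94488"

resp = requests.get(HANDLE_URI, timeout=10)
resp.raise_for_status()  # raise if the repository returns an HTTP error
print(resp.status_code)  # 200 on success; resp.text holds the item page HTML
```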
Full metadata record
DC Field: Value (Language)
dc.contributor.advisor: 楊奕軒 (zh_TW)
dc.contributor.advisor: Yi-Hsuan Yang (en)
dc.contributor.author: 吳軍霆 (zh_TW)
dc.contributor.author: Chun-Tin Wu (en)
dc.date.accessioned: 2024-08-16T16:19:41Z
dc.date.available: 2024-08-17
dc.date.copyright: 2024-08-16
dc.date.issued: 2024
dc.date.submitted: 2024-08-14
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94488
dc.description.abstract (zh_TW):
新視角合成在3D視覺應用中扮演著關鍵角色,如虛擬現實(VR)、增強現實(AR)和電影製作,能夠生成從場景中任意視點拍攝的圖像。這一任務對於動態場景至關重要,但由於複雜的運動和稀疏的數據,面臨諸多挑戰。為了解決這一任務並推動高質量渲染,神經輻射場(NeRF)被提出來通過使用神經隱式函數來表示場景。此外,其繼任者3D高斯濺射(3D-GS)通過採用高效的3D高斯投影進一步加速了渲染速度達到實時水平。然而,由於3D-GS的顯式特性,在使用4D數據渲染動態場景時,即使通過考慮高斯變形場來擴展3D-GS至4D高斯濺射(4D-GS)以實現準確和高效的渲染,其訓練和渲染成本仍然昂貴。
為了解決這些問題,本論文引入了基於網格的4D-GS方法,通過網格表示來進一步提高4D-GS在動態場景渲染中的效率。通過整合網格,我們的框架優化了計算複雜度,加速了處理時間,顯著減少了內存使用,同時保持了高渲染質量。這些在實時渲染能力方面的進步為未來在動態場景操控、重建和下游任務中的應用鋪平了道路。
dc.description.abstract (en):
Novel view synthesis plays a pivotal role in 3D vision applications such as virtual reality (VR), augmented reality (AR), and movie production, enabling the generation of images from arbitrary viewpoints within a scene. The task is essential for dynamic scenes, but faces challenges due to complex motion and sparse data. Neural Radiance Field (NeRF) was proposed to tackle this task and advanced the field toward high-quality rendering by representing scenes with neural implicit functions. Its successor, 3D Gaussian Splatting (3D-GS), has further accelerated rendering to real-time speeds by employing efficient 3D Gaussian projections. However, due to the explicit nature of 3D-GS, training and rendering costs for dynamic scenes with 4D data remain expensive, even after extending 3D-GS to 4D Gaussian Splatting (4D-GS), which models temporal dynamics through Gaussian deformation fields for accurate and efficient rendering.
To address these issues, this thesis introduces Voxels to 4D-GS, enhancing 4D-GS with voxel-based representations for even greater efficiency in dynamic scene rendering. By integrating voxels, our framework reduces computational complexity, accelerates processing, and significantly lowers memory usage while maintaining high rendering quality. These advances in real-time rendering capability pave the way for future applications in dynamic scene manipulation, reconstruction, and downstream tasks.
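To make the two ideas named in the abstract concrete, the following is a minimal illustrative sketch, not the thesis's actual implementation: a coarse voxel assignment of canonical 3D Gaussians and a small MLP deformation field that offsets Gaussian centers at a query time t. All names here (GaussianDeformation, voxel_size, hidden_dim) are hypothetical.

```python
# Illustrative sketch only: canonical 3D Gaussians, a coarse voxel assignment,
# and an MLP deformation field, per the high-level description in the abstract.
import torch
import torch.nn as nn

class GaussianDeformation(nn.Module):
    """Hypothetical deformation field: maps (center, t) to a position offset."""
    def __init__(self, hidden_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4, hidden_dim),   # input: (x, y, z, t)
            nn.ReLU(),
            nn.Linear(hidden_dim, 3),   # output: offset (dx, dy, dz)
        )

    def forward(self, centers: torch.Tensor, t: float) -> torch.Tensor:
        # centers: (N, 3) canonical Gaussian centers; broadcast t to every Gaussian.
        time = torch.full((centers.shape[0], 1), t, device=centers.device)
        return centers + self.mlp(torch.cat([centers, time], dim=-1))

canonical = torch.randn(1000, 3)                        # toy canonical centers
voxel_size = 0.1
voxel_idx = torch.floor(canonical / voxel_size).long()  # coarse voxel assignment
deformed = GaussianDeformation()(canonical, t=0.5)      # centers at time t = 0.5
print(voxel_idx.shape, deformed.shape)                  # both torch.Size([1000, 3])
```

Per the table of contents below, the full method also includes a spatial-temporal encoder and a Gaussian deformation decoder that would additionally adjust rotation and scale; this sketch covers only a positional pathway.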
dc.description.provenance (en): Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-16T16:19:40Z. No. of bitstreams: 0
dc.description.provenance (en): Made available in DSpace on 2024-08-16T16:19:41Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Acknowledgments i
Abstract ii
摘要 iii
Contents iv
List of Tables vi
List of Figures vii
1 Introduction 1
2 Related Works 3
2.1 NeRF Based Dynamic Scene Rendering 3
2.2 4D K-Planes 4
2.3 4D Gaussian Splatting 5
3 Method 6
3.1 Model Overview 7
3.2 Preliminaries 7
3.2.1 3D Gaussian Splatting 7
3.2.2 Grid-based Gaussian Splatting 8
3.2.3 Gaussian Deformation Fields 9
3.3 Scene Voxelization 10
3.4 Neural Gaussian Generation 11
3.5 Deformation Fields 12
3.5.1 Spatial-Temporal Encoder 12
3.5.2 Gaussian Deformation Decoder 13
3.6 Optimization Process 13
3.6.1 First Phase: Coarse Training Stage 13
3.6.2 Second Phase: Fine Training Stage 14
4 Experiment and Discussions 15
4.1 Experimental Settings 15
4.1.1 Real-world Datasets 15
4.2 Results 15
4.2.1 Quantitative Results 15
4.2.2 Qualitative Results 17
4.3 Ablation Study 20
4.3.1 Volume regularization 20
4.4 Discussion 22
4.4.1 Training Iterations 22
4.4.2 Future Work 23
5 Conclusions 25
Bibliography 26
dc.language.iso: en
dc.subject: 三維電腦視覺 (zh_TW)
dc.subject: 動態場景渲染 (zh_TW)
dc.subject: 三維高斯潑濺 (zh_TW)
dc.subject: 深度學習 (zh_TW)
dc.subject: Deep Learning (en)
dc.subject: Dynamic Scene Rendering (en)
dc.subject: 3D Gaussian Splatting (en)
dc.subject: 3D Computer Vision (en)
dc.title: 網格用於高斯潑濺的即時動態場景渲染 (zh_TW)
dc.title: Real-Time Dynamic Scene Rendering with Voxelized Gaussian Splatting (en)
dc.type: Thesis
dc.date.schoolyear: 112-2
dc.description.degree: 碩士 (Master's)
dc.contributor.coadvisor: 陳駿丞 (zh_TW)
dc.contributor.coadvisor: Jun-Cheng Chen (en)
dc.contributor.oralexamcommittee: 王鈺強;陳祝嵩 (zh_TW)
dc.contributor.oralexamcommittee: Yu-Chiang Wang;Chu-Song Chen (en)
dc.subject.keyword: 深度學習,三維電腦視覺,三維高斯潑濺,動態場景渲染 (zh_TW)
dc.subject.keyword: Deep Learning,3D Computer Vision,3D Gaussian Splatting,Dynamic Scene Rendering (en)
dc.relation.page: 29
dc.identifier.doi: 10.6342/NTU202404161
dc.rights.note: 未授權 (Not authorized)
dc.date.accepted: 2024-08-14
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering)
Appears in Collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in This Item:
File: ntu-112-2.pdf (restricted; not authorized for public access), Size: 14.68 MB, Format: Adobe PDF


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
