Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98437
Full metadata record (DC field: value (language))
dc.contributor.advisor: 簡韶逸 (zh_TW)
dc.contributor.advisor: Shao-Yi Chien (en)
dc.contributor.author: 周奕節 (zh_TW)
dc.contributor.author: Yi-Chieh Chou (en)
dc.date.accessioned: 2025-08-14T16:06:55Z
dc.date.available: 2025-08-15
dc.date.copyright: 2025-08-14
dc.date.issued: 2025
dc.date.submitted: 2025-07-31
dc.identifier.citation:
B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “NeRF: Representing scenes as neural radiance fields for view synthesis,” in ECCV, 2020.
B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, “3D Gaussian splatting for real-time radiance field rendering,” ACM Transactions on Graphics, vol. 42, no. 4, July 2023. [Online]. Available: https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/
D. Charatan, S. Li, A. Tagliasacchi, and V. Sitzmann, “pixelSplat: 3D Gaussian splats from image pairs for scalable generalizable 3D reconstruction,” in CVPR, 2024.
Z. Liu, H. Ouyang, Q. Wang, K. L. Cheng, J. Xiao, K. Zhu, N. Xue, Y. Liu, Y. Shen, and Y. Cao, “InFusion: Inpainting 3D Gaussians via learning depth completion from diffusion prior,” arXiv preprint arXiv:2404.11613, 2024.
L. Liu, X. Wang, J. Qiu, T. Lin, X. Zhou, and Z. Su, “Gaussian Object Carver: Object-compositional Gaussian splatting with surface completion,” arXiv preprint arXiv:2412.02075, 2024.
Y. Chen, H. Xu, C. Zheng, B. Zhuang, M. Pollefeys, A. Geiger, T.-J. Cham, and J. Cai, “MVSplat: Efficient 3D Gaussian splatting from sparse multi-view images,” arXiv preprint arXiv:2403.14627, 2024.
P. Wang, L. Liu, Y. Liu, C. Theobalt, T. Komura, and W. Wang, “NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction,” arXiv preprint arXiv:2106.10689, 2021.
L. Yariv, J. Gu, Y. Kasten, and Y. Lipman, “Volume rendering of neural implicit surfaces,” in NeurIPS, 2021.
A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, P. Dollár, and R. Girshick, “Segment Anything,” arXiv preprint arXiv:2304.02643, 2023.
W. Abdulla, “Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow,” https://github.com/matterport/Mask_RCNN, 2017.
C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “PointNet: Deep learning on point sets for 3D classification and segmentation,” arXiv preprint arXiv:1612.00593, 2016.
A. Dai and M. Nießner, “3DMV: Joint 3D-multi-view prediction for 3D semantic scene segmentation,” in ECCV, 2018.
M. Ye, M. Danelljan, F. Yu, and L. Ke, “Gaussian Grouping: Segment and edit anything in 3D scenes,” in ECCV, 2024.
I. Vizzo, X. Chen, N. Chebrolu, J. Behley, and C. Stachniss, “Poisson surface reconstruction for LiDAR odometry and mapping,” in ICRA, 2021.
W. Yuan, T. Khot, D. Held, C. Mertz, and M. Hebert, “PCN: Point Completion Network,” in 3DV, 2018.
T. Groueix, M. Fisher, V. G. Kim, B. Russell, and M. Aubry, “AtlasNet: A papier-mâché approach to learning 3D surface generation,” in CVPR, 2018.
X. Yu, Y. Rao, Z. Wang, Z. Liu, J. Lu, and J. Zhou, “PoinTr: Diverse point cloud completion with geometry-aware transformers,” in ICCV, 2021.
X. Yan, L. Lin, N. J. Mitra, D. Lischinski, D. Cohen-Or, and H. Huang, “ShapeFormer: Transformer-based shape completion via sparse representation,” 2022.
K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, “Masked autoencoders are scalable vision learners,” arXiv preprint arXiv:2111.06377, 2021.
Y. Pang, W. Wang, F. E. Tay, W. Liu, Y. Tian, and L. Yuan, “Masked autoencoders for point cloud self-supervised learning,” in ECCV, 2022, pp. 604–621.
L. Yang, B. Kang, Z. Huang, X. Xu, J. Feng, and H. Zhao, “Depth Anything: Unleashing the power of large-scale unlabeled data,” in CVPR, 2024.
A. Eftekhar, A. Sax, J. Malik, and A. Zamir, “Omnidata: A scalable pipeline for making multi-task mid-level vision datasets from 3D scans,” in ICCV, 2021, pp. 10786–10796.
A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su et al., “ShapeNet: An information-rich 3D model repository,” arXiv preprint arXiv:1512.03012, 2015.
Q. Ma, Y. Li, B. Ren, N. Sebe, E. Konukoglu, T. Gevers, L. V. Gool, and D. P. Paudel, “ShapeSplat: A large-scale dataset of Gaussian splats and their self-supervised pretraining,” 2024. [Online]. Available: https://arxiv.org/abs/2408.10906
J. Straub, T. Whelan, L. Ma, Y. Chen, E. Wijmans, S. Green, J. J. Engel, R. Mur-Artal, C. Ren, S. Verma, A. Clarkson, M. Yan, B. Budge, Y. Yan, X. Pan, J. Yon, Y. Zou, K. Leon, N. Carter, J. Briales, T. Gillingham, E. Mueggler, L. Pesqueira, M. Savva, D. Batra, H. M. Strasdat, R. D. Nardi, M. Goesele, S. Lovegrove, and R. Newcombe, “The Replica dataset: A digital replica of indoor spaces,” arXiv preprint arXiv:1906.05797, 2019.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98437
dc.description.abstract (zh_TW, translated):
3D reconstruction of indoor scenes involves creating three-dimensional representations of physical spaces from captured data such as images or LiDAR. This process is essential for applications such as digital twin creation, AR/VR scene generation, and indoor navigation. Although recent neural representations such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) achieve photorealistic novel view synthesis, and sparse-view methods lower the viewpoint requirements, real-world captures still suffer from occlusions and limited viewpoints, yielding incomplete observations that leave holes and artifacts in the reconstructed scene.
This thesis proposes a three-stage method for indoor 3D scene reconstruction and completion. First, we perform initial scene reconstruction with multi-cue supervision, combining depth and normal supervision to ensure accurate surface alignment and to reduce artifacts in the 3DGS representation. Second, we decompose the complex scene into individual objects via segmentation and tracking, making object-level completion tractable. Finally, we perform object-level completion with a geometry-first approach that decouples geometric reconstruction from appearance synthesis: a shape completion model rebuilds the missing geometry from the partial point cloud, after which a shape-aware Gaussian masked autoencoder with a modified masking strategy selectively masks the newly completed regions while treating the originally visible regions as unmasked input. Its decoder then reconstructs full 3D Gaussian attributes for these newly added regions.
Experimental results show that our method outperforms existing approaches. Unlike current 3DGS inpainting techniques, which focus on local hole filling or object removal, and unlike hybrid methods that only complete mesh representations without producing a full 3DGS output, our method yields complete 3DGS representations of indoor scenes.
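The Stage 1 objective described above can be pictured as a weighted sum of photometric, depth, and normal terms. The following PyTorch sketch is purely illustrative: the function name multi_cue_loss, the weights w_depth and w_normal, and the median-based depth normalization are assumptions, not the formulation used in the thesis.

```python
# Illustrative sketch (not the thesis implementation) of multi-cue
# supervision for 3DGS fitting: an RGB term plus depth and normal terms.
import torch
import torch.nn.functional as F

def multi_cue_loss(rendered_rgb, rendered_depth, rendered_normal,
                   gt_rgb, mono_depth, mono_normal,
                   w_depth=0.1, w_normal=0.05):
    """Combine photometric, depth, and normal supervision (assumed weights)."""
    # Standard photometric term used by 3DGS (L1; the SSIM term is omitted).
    loss_rgb = F.l1_loss(rendered_rgb, gt_rgb)

    # Monocular depth predictions are scale-ambiguous, so compare after a
    # per-image normalization (a common practice; the exact scheme here is
    # an assumption).
    def norm(d):
        return (d - d.median()) / (d.abs().mean() + 1e-6)

    loss_depth = F.l1_loss(norm(rendered_depth), norm(mono_depth))

    # Normal supervision: penalize angular deviation via cosine similarity
    # between rendered and monocular normal maps (last dim holds x, y, z).
    loss_normal = (1.0 - F.cosine_similarity(
        rendered_normal, mono_normal, dim=-1)).mean()

    return loss_rgb + w_depth * loss_depth + w_normal * loss_normal
```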
dc.description.abstract (en):
3D reconstruction of indoor scenes involves creating three-dimensional digital representations of physical spaces from captured data such as images or LiDAR. This process is crucial for applications in digital twin creation, AR/VR scene generation, and indoor navigation. While recent neural representations like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) enable photorealistic novel view synthesis, and sparse-view methods like pixelSplat reduce viewpoint requirements, real-world capture scenarios still suffer from occlusions and limited viewpoints, resulting in incomplete observations that cause holes and artifacts in reconstructed scenes.
This thesis presents a unified approach to reconstruction and completion of indoor 3D scenes through a novel three-stage pipeline. First, we perform initial scene reconstruction with multi-cue supervision, incorporating depth and normal supervision to ensure precise surface alignment and to reduce artifacts in the 3DGS representation. Second, we decompose complex scenes into individual objects through segmentation and tracking, making completion tractable at the object level. Third, we perform object-level completion using a geometry-first approach that separates geometric reconstruction from appearance synthesis: shape completion models reconstruct missing geometry from partial point clouds, and a Shape-Aware Gaussian-MAE with a modified masking strategy selectively masks only the newly completed regions while treating the originally visible areas as unmasked input. The decoder then reconstructs full 3D Gaussian attributes for these newly added regions.
Experimental results demonstrate that our approach outperforms existing methods. Unlike current 3DGS inpainting techniques, which focus on localized hole filling or object removal, or hybrid methods that only complete mesh representations without generating full 3DGS outputs, our method produces complete 3DGS representations of indoor scenes.
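To make the shape-aware masking strategy concrete, below is a minimal PyTorch sketch of an MAE-style model in which patches from newly completed geometry form the masked set and originally observed patches are always visible. The class name ShapeAwareMAE, all dimensions, the equal-visible-count assumption, and the 59-dimensional attribute head (3 position + 3 scale + 4 rotation + 1 opacity + 48 SH color coefficients) are illustrative assumptions, not the thesis's Shape-Aware Gaussian-MAE implementation.

```python
# Minimal sketch of shape-aware masking: instead of random masking as in
# a standard MAE, completed-geometry patches are always masked and
# originally observed patches are always visible. Names and shapes are
# illustrative assumptions.
import torch
import torch.nn as nn

class ShapeAwareMAE(nn.Module):
    def __init__(self, dim=384, enc_layers=6, dec_layers=2, attr_dim=59):
        super().__init__()
        layer = lambda: nn.TransformerEncoderLayer(dim, nhead=6,
                                                   batch_first=True)
        self.encoder = nn.TransformerEncoder(layer(), enc_layers)
        self.decoder = nn.TransformerEncoder(layer(), dec_layers)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        # Predict per-patch Gaussian attributes; 59 is an assumed total.
        self.head = nn.Linear(dim, attr_dim)

    def forward(self, tokens, completed_mask):
        # tokens: (B, N, dim) patch embeddings; completed_mask: (B, N) bool,
        # True where a patch comes from shape-completed geometry.
        B, N, D = tokens.shape
        # The encoder attends only over originally visible patches. For
        # simplicity this sketch assumes every sample in the batch has the
        # same number of visible patches.
        visible = tokens[~completed_mask].view(B, -1, D)
        encoded = self.encoder(visible)
        # Reassemble the full sequence: learned mask tokens at completed
        # positions, encoder output at visible positions.
        full = self.mask_token.expand(B, N, D).clone()
        full[~completed_mask] = encoded.reshape(-1, D)
        decoded = self.decoder(full)
        # Gaussian attributes are predicted only for the completed regions.
        return self.head(decoded[completed_mask])
```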
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-14T16:06:54Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2025-08-14T16:06:55Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Master's Thesis Acceptance Certificate i
Acknowledgements (致謝) iii
Abstract in Chinese (中文摘要) v
Abstract vii
Contents ix
List of Figures xiii
List of Tables xv
1 Introduction 1
1.1 3D Reconstruction 1
1.2 3D Gaussian Splatting 2
1.3 Challenges 4
1.4 Contributions 5
1.5 Thesis Organization 6
2 Related Work 9
2.1 Neural 3D Scene Representations 9
2.1.1 Sparse-View 3D Gaussian Splatting 10
2.2 Scene Decomposition and Segmentation 11
2.3 3D Shape Completion 12
2.4 3D Gaussian Splatting Completion and Editing 13
2.5 Masked Autoencoders and Self-Supervised Learning 14
3 Proposed Method 17
3.1 Pipeline Overview 17
3.2 Stage 1 - Scene Reconstruction with Multi-Cue Supervision 18
3.2.1 Preprocessing 18
3.2.2 Multi-Cue 3DGS Optimization 23
3.3 Stage 2 - Scene Decomposition 25
3.3.1 Object-Level Extraction Process 25
3.3.2 Spatial Coherence and Boundary Refinement 26
3.3.3 Benefits of Object-Level Decomposition 26
3.4 Stage 3 - Object-Level Completion 27
3.4.1 Geometry Completion (ShapeFormer) 27
3.4.2 3D Gaussian Completion (Shape-Aware G-MAE) 28
3.5 Post-processing 36
3.5.1 Addressing Domain Gap Issues 36
3.5.2 Appearance Refinement Strategies 36
3.5.3 Integration and Final Output 36
4 Experiments 39
4.1 Datasets 39
4.1.1 ShapeNet Dataset 39
4.1.2 ShapeSplat Dataset 39
4.1.3 Replica Dataset 40
4.2 Implementation Details 41
4.2.1 3D Gaussian Splatting Configuration 41
4.2.2 ShapeFormer Configuration 41
4.2.3 Gaussian-MAE Configuration 42
4.3 Results 43
4.3.1 ShapeSplat Objects Results 43
4.3.2 Replica Scenes Results 45
4.3.3 Computational Performance Analysis 52
4.4 Ablation Studies 52
4.5 Comparison with Related Methods 54
4.5.1 Comparison with Gaussian Object Carver (GOC) 54
4.5.2 Comparison with Sparse-View 3D Gaussian Splatting 54
5 Conclusion 57
5.1 Conclusion 57
5.2 Limitations and Future Work 58
5.2.1 Limitations 58
5.2.2 Future Work 58
References 61
dc.language.iso: en
dc.subject: 三維高斯散射 [3D Gaussian Splatting] (zh_TW)
dc.subject: 室內場景重建 [Indoor Scene Reconstruction] (zh_TW)
dc.subject: 遮罩自編碼器 [Masked Autoencoder] (zh_TW)
dc.subject: 物體層級分解 [Object-Level Decomposition] (zh_TW)
dc.subject: 場景補全 [Scene Completion] (zh_TW)
dc.subject: Scene Completion (en)
dc.subject: Object-Level Decomposition (en)
dc.subject: Masked Autoencoder (en)
dc.subject: 3D Gaussian Splatting (en)
dc.subject: Indoor Scene Reconstruction (en)
dc.title: 基於形狀感知的室內場景三維高斯散射重建與補全方法 [A Shape-Aware Method for Reconstruction and Completion of Indoor Scenes with 3D Gaussian Splatting] (zh_TW)
dc.title: Filling the Gaps: Shape-Aware Completion of 3D Gaussian Splatting for Indoor Scenes (en)
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: 碩士 [Master's]
dc.contributor.oralexamcommittee: 陳駿丞;陳永耀;盧奕璋 (zh_TW)
dc.contributor.oralexamcommittee: Jun-Cheng Chen; Yung-Yao Chen; Yi-Chang Lu (en)
dc.subject.keyword: 室內場景重建,三維高斯散射,場景補全,物體層級分解,遮罩自編碼器 [Indoor Scene Reconstruction, 3D Gaussian Splatting, Scene Completion, Object-Level Decomposition, Masked Autoencoder] (zh_TW)
dc.subject.keyword: Indoor Scene Reconstruction, 3D Gaussian Splatting, Scene Completion, Object-Level Decomposition, Masked Autoencoder (en)
dc.relation.page: 64
dc.identifier.doi: 10.6342/NTU202503130
dc.rights.note: 未授權 [Not authorized for public access]
dc.date.accepted: 2025-08-05
dc.contributor.author-college: 電機資訊學院 [College of Electrical Engineering and Computer Science]
dc.contributor.author-dept: 電子工程學研究所 [Graduate Institute of Electronics Engineering]
dc.date.embargo-lift: N/A
Appears in collections: 電子工程學研究所 [Graduate Institute of Electronics Engineering]

Files in this item:
ntu-113-2.pdf (24.45 MB, Adobe PDF): restricted, not publicly available (未授權公開取用)

