Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101440
Title: 邁向複雜3D視覺建構: 從物件層級補全與生成,到場景層級跨模態定位與修復
Towards Complex 3D Visual Modeling: From Object-Level Completion and Generation to Scene-Level Cross-Modal Localization and Inpainting
Authors: 黃聖喻
Sheng-Yu Huang
Advisor: 王鈺強
Yu-Chiang Frank Wang
Keyword: 深度學習, 電腦視覺, 3D電腦視覺, 點雲, 3D生成, 3D視覺定位, 3D高斯潑濺, 場景修補
deep learning, computer vision, 3D computer vision, point cloud, 3D generation, 3D visual grounding, Gaussian Splatting, 3D scene inpainting
Publication Year : 2025
Degree: 博士 (Doctoral)
Abstract: 近年來,3D電腦視覺的進展顯著提升了我們對複雜真實場景的理解能力。然而,真實世界的3D資料常因視角差異而變得不完整或存在跨視角的歧義,為穩健的3D建模帶來巨大挑戰。
本論文構建了一個統一的複雜3D視覺建模框架,研究分為物件層級與場景層級兩大任務。在物件層級中,首先解決了部分點雲的語義感知補全,以重建完整的幾何結構與語義標註;接著擴展單一模態生成至跨模態合成,將文字描述生成為3D模型。在場景層級中,我們示範了透過自然語言輸入精確定位3D場景中的目標物件,並在此基礎上執行場景級修補,移除指定物件後以多視角一致性技術填補缺失區域。這四大主題的系統整合顯著提升了建模的穩定性、生成品質、定位準確度與跨視角一致性,為沉浸式計算與機器人應用奠定了堅實基礎。
Recent advancements in 3D computer vision have significantly enhanced our capability to interpret complex real-world scenes. However, real-world 3D data often exhibit incompleteness and ambiguity across viewpoints, presenting considerable challenges for robust 3D modeling. This thesis constructs a unified framework for complex 3D visual modeling, organized into two main categories: Object-Level and Scene-Level. In the Object-Level sections, we first explore 3D Modeling via Semantic-Aware Completion, reconstructing 3D objects with complete geometry and semantic labels from partial point clouds and semantic information; we then examine 3D Modeling via Cross-Modal Generation, extending single-modal point-cloud-to-point-cloud transformations to text-driven 3D content synthesis. In the Scene-Level sections, we begin by introducing 3D Modeling via Scene-Level Visual Grounding, demonstrating how natural language prompts can precisely identify target objects within 3D scenes; finally, building on the ability to locate any object in a given 3D scene, we present 3D Modeling via Scene-Level Inpainting, showcasing methods to remove specified objects and coherently fill the missing regions using multi-view-consistent techniques. Through the systematic integration of these four interconnected themes, our research demonstrates significant advancements in modeling robustness, generative fidelity, localization precision, and cross-view consistency, establishing a solid foundation for future applications in immersive computing, robotics, and beyond.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101440
DOI: 10.6342/NTU202600037
Fulltext Rights: Authorized (access restricted to campus)
Embargo Lift Date: 2026-02-04
Appears in Collections: Graduate Institute of Communication Engineering (電信工程學研究所)

Files in This Item:
ntu-114-1.pdf, 28.67 MB, Adobe PDF (access limited to the NTU IP range)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
