MAC-RAD: Modular Avatar Composition via Reusable Assets Disentanglement from RGB Video

余承諺; Cheng-Yen Yu

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99611

Title:	MAC-RAD: Modular Avatar Composition via Reusable Assets Disentanglement from RGB Video
Author:	余承諺 Cheng-Yen Yu
Advisor:	傅立成 Li-Chen Fu
Keywords:	Virtual Avatar, 3D Human Reconstruction, Modular Avatar Composition, Garment Disentanglement, Gaussian Splatting, Diffusion Model
Year of publication:	2025
Degree:	Master's
Abstract:	In this thesis, we present a two-stage modular framework for avatar reconstruction from monocular RGB videos, designed to support garment-level control and avatar composition. The framework consists of a Disentanglement Stage and a Composition Stage. In the Disentanglement Stage, our system decomposes the input video into semantically meaningful components, including a skin texture and a set of textured clothing meshes, each aligned to a canonical pose and guided by user-provided textual prompts. This stage leverages parametric models (e.g., SMPL-X) for pose and shape estimation, ensuring consistency across frames. In the Composition Stage, the disentangled components are reassembled into a unified, animatable mesh based on user-defined settings, such as body shape and clothing selection. The resulting avatar supports motion retargeting and rendering while enabling flexible garment recombination. Our framework emphasizes modularity, reusability, and speed, allowing rapid avatar creation with only minor quality trade-offs. Experimental results demonstrate high visual coherence, garment integrity, and support for composition. As a future direction, we envision extending the system with a CLIP-guided garment retrieval module, which would enable intuitive, text-based avatar editing through natural language descriptions.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99611
DOI:	10.6342/NTU202503094
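Full-text authorization:	Authorized (open access worldwide)
Full-text release date:	2028-08-01
Appears in collections:	Department of Computer Science and Information Engineering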

Files in this item:

File	Size	Format
ntu-113-2.pdf (available online after 2028-08-01)	37.44 MB	Adobe PDF


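Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.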
