NTU Theses and Dissertations Repository › College of Electrical Engineering and Computer Science › Graduate Institute of Communication Engineering
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96805
Title: TPA3D: Triplane Attention for Fast Text-to-3D Generation
Authors: Bin-Shih Wu (吳彬世)
Advisor: Yu-Chiang Frank Wang (王鈺強)
Keywords: computer vision, 3D vision, generative AI, text-to-3D generation, attention mechanism
Publication Year: 2025
Degree: Master's
Abstract: Due to the lack of large-scale text-3D correspondence data, recent text-to-3D generation works mainly rely on 2D diffusion models for synthesizing 3D data. Since diffusion-based methods typically require significant optimization time for both training and inference, GAN-based models remain attractive for fast 3D generation. In this work, we propose Triplane Attention for text-guided 3D generation (TPA3D), an end-to-end trainable GAN-based deep learning model for fast text-to-3D generation. With only 3D shape data and their rendered 2D images observed during training, TPA3D retrieves detailed visual descriptions to synthesize the corresponding 3D mesh data. This is achieved by the proposed attention mechanisms on the extracted sentence- and word-level text features. In our experiments, we show that TPA3D generates high-quality 3D textured shapes aligned with fine-grained descriptions while demonstrating impressive computational efficiency.
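The abstract does not detail the attention mechanism itself. As an illustrative sketch only, the following NumPy snippet shows one common form such a mechanism could take: each spatial token of the three feature planes (XY, XZ, YZ) cross-attends over word-level text embeddings and is refined with a residual update. All names, shapes, and projection matrices here are assumptions for illustration, not the thesis's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def word_level_cross_attention(triplane, word_feats, Wq, Wk, Wv):
    """Each spatial token of each plane attends over word-level text features.

    triplane:   (3, H*W, d)   -- three flattened feature planes (XY, XZ, YZ)
    word_feats: (T, d_text)   -- per-word embeddings from a text encoder
    Wq, Wk, Wv: projection matrices mapping into a shared attention dim d
    """
    q = triplane @ Wq                           # queries:  (3, HW, d)
    k = word_feats @ Wk                         # keys:     (T, d)
    v = word_feats @ Wv                         # values:   (T, d)
    scores = q @ k.T / np.sqrt(k.shape[-1])     # (3, HW, T) similarity
    attn = softmax(scores, axis=-1)             # attention over words
    return triplane + attn @ v                  # residual refinement

# Toy shapes for demonstration only.
rng = np.random.default_rng(0)
d, d_text, T, HW = 8, 8, 5, 16
planes = rng.standard_normal((3, HW, d))
words = rng.standard_normal((T, d_text))
Wq = rng.standard_normal((d, d))
Wk = rng.standard_normal((d_text, d))
Wv = rng.standard_normal((d_text, d))
out = word_level_cross_attention(planes, words, Wq, Wk, Wv)
print(out.shape)  # (3, 16, 8)
```

The residual form keeps the original geometric features intact while injecting text-conditioned detail, which is why cross-attention updates of this shape are a common choice for conditioning spatial feature maps on text.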
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96805
DOI: 10.6342/NTU202401418
Fulltext Rights: Authorized (access limited to campus network)
Embargo Lift Date: 2025-02-22
Appears in Collections: Graduate Institute of Communication Engineering

Files in This Item:
ntu-113-1.pdf (Adobe PDF, 24.63 MB; access limited to NTU IP range)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
