Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99272

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 莊永裕 | zh_TW |
| dc.contributor.advisor | Yung-Yu Chuang | en |
| dc.contributor.author | 陳瑾瑭 | zh_TW |
| dc.contributor.author | Chin-Tang Chen | en |
| dc.date.accessioned | 2025-08-21T17:04:16Z | - |
| dc.date.available | 2025-08-22 | - |
| dc.date.copyright | 2025-08-21 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-08-01 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99272 | - |
| dc.description.abstract | 本研究提出一套具備透視感知能力的室內家具影像客製化方法,旨在提升物件插入結果的幾何一致性與視覺真實感。我們擴展 AnyDoor 框架,加入「Perspective BBox Branch」,以消失點對齊的 3D 邊界框投影作為條件輸入,引導 diffusion 模型生成符合場景透視的合成結果。同時,我們設計了互動式使用者介面,使用戶能標註消失點、生成一致的 3D BBox,並繪製參考物件遮罩,實現可控且直覺的物件放置流程。 我們使用 ScanNet++ 室內資料集進行模型訓練與驗證,實驗結果顯示,本方法在不犧牲視覺品質的情況下,有效提升透視一致性。量化指標(CLIP 與 DINO 分數)與質化視覺結果皆支持本方法的有效性。此外,透過對於未見資料(out-of-distribution)場景的測試,我們證實模型具備良好泛化能力。本研究展示了幾何感知引導在生成式模型中的潛力,為未來具備控制性與結構意識的影像合成方法奠定基礎。 | zh_TW |
| dc.description.abstract | We propose a perspective-aware framework for object-level image customization in indoor scenes, aiming to improve geometric consistency and visual realism in object insertion tasks. Building upon the AnyDoor architecture, we introduce the Perspective BBox Branch, which encodes vanishing-point-aligned 3D bounding box projections as conditional guidance for the diffusion model. To support controllable and user-friendly annotation, we also develop an interactive user interface that allows users to mark vanishing points, generate perspective-aligned 3D BBoxes, and draw reference object masks. We train and evaluate our model on the ScanNet++ dataset, and experimental results demonstrate that our method significantly improves perspective alignment without compromising visual fidelity. Quantitative metrics (CLIP and DINO scores) and qualitative visualizations both confirm the effectiveness of our approach. Furthermore, tests on out-of-distribution indoor images show that the model generalizes well to real-world scenarios. Our work highlights the potential of geometry-aware conditioning in generative models and offers a practical foundation for future research on controllable indoor image synthesis. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-21T17:04:16Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-08-21T17:04:16Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Oral Examination Committee Approval i Acknowledgements ii Abstract (Chinese) iii Abstract iv Contents v List of Figures vii List of Tables viii Chapter 1 Introduction 1 Chapter 2 Related Work 3 2.1 Image Customization with Diffusion Models 3 2.2 Geometry-Aware and Perspective-Controlled Synthesis 3 2.3 Controllable UI Tools for Composition 3 Chapter 3 Method 5 3.1 Architecture 5 3.1.1 UI – BBox Annotator and Mask Editor 5 3.1.2 Identity Token Branch 5 3.1.3 Detail Collage Branch 6 3.1.4 Perspective BBox Branch 6 3.2 UI Design for BBox Annotation 7 3.2.1 Vanishing Points Estimation 7 3.2.2 BBox Translation and Scaling 9 3.2.3 Selection of Visible Plane Chirality 12 3.2.4 Mask for the Reference Object Image 12 3.3 Training 13 3.3.1 Image Pair from Video 13 3.3.2 ScanNet++ Dataset 14 3.3.3 Augmentation 15 3.3.4 Training Details 15 Chapter 4 Experiments 16 4.1 Grid Resolution Study 16 4.2 Perspective BBox Branch Analysis 20 4.3 Visualization and Evaluation of Test Results 20 4.3.1 Image Composition Results 21 4.3.2 Image Editing Results 22 4.3.3 User Study 23 Chapter 5 Conclusion 25 References 26 | - |
| dc.language.iso | en | - |
| dc.subject | 3D 邊界框 | zh_TW |
| dc.subject | 影像客製化 | zh_TW |
| dc.subject | 室內場景 | zh_TW |
| dc.subject | 擴散模型 | zh_TW |
| dc.subject | 透視感知 | zh_TW |
| dc.subject | 3D bounding box | en |
| dc.subject | indoor scene | en |
| dc.subject | diffusion model | en |
| dc.subject | image customization | en |
| dc.subject | perspective awareness | en |
| dc.title | 具有透視感知的室內家具物件層級影像客製化 | zh_TW |
| dc.title | Object-Level Image Customization for Indoor Furnishing with Perspective Awareness | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | Master | - |
| dc.contributor.oralexamcommittee | 葉正聖;吳賦哲 | zh_TW |
| dc.contributor.oralexamcommittee | Jeng-Sheng Yeh;Fu-Che Wu | en |
| dc.subject.keyword | 透視感知, 影像客製化, 3D 邊界框, 擴散模型, 室內場景 | zh_TW |
| dc.subject.keyword | perspective awareness, image customization, 3D bounding box, diffusion model, indoor scene | en |
| dc.relation.page | 29 | - |
| dc.identifier.doi | 10.6342/NTU202501818 | - |
| dc.rights.note | Authorization granted (access restricted to campus) | - |
| dc.date.accepted | 2025-08-06 | - |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Department of Computer Science and Information Engineering | - |
| dc.date.embargo-lift | 2025-08-22 | - |
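
The abstract above describes conditioning the diffusion model on a vanishing-point-aligned 3D bounding box projected into the image plane (the Perspective BBox Branch). As a minimal illustrative sketch of such a conditioning signal, the Python snippet below rasterizes a pinhole-projected box wireframe into a single-channel map; the projection model, corner layout, and all names are assumptions for illustration, not the thesis's actual implementation.

```python
# Hypothetical sketch: rasterize a projected 3D bounding box into a
# conditioning map, in the spirit of the Perspective BBox Branch described
# in the abstract. Projection model and names are illustrative assumptions.
import numpy as np
from PIL import Image, ImageDraw

def box_corners(center, size):
    """Return the 8 corners of an axis-aligned 3D box in camera coordinates."""
    cx, cy, cz = center
    sx, sy, sz = (s / 2.0 for s in size)
    return np.array([[cx + dx * sx, cy + dy * sy, cz + dz * sz]
                     for dx in (-1, 1) for dy in (-1, 1) for dz in (-1, 1)])

def project(points, focal, principal):
    """Pinhole projection of 3D camera-frame points to pixel coordinates."""
    px, py = principal
    u = focal * points[:, 0] / points[:, 2] + px
    v = focal * points[:, 1] / points[:, 2] + py
    return np.stack([u, v], axis=1)

# The 12 box edges, as index pairs into the corner array (paired indices
# differ in exactly one of the x/y/z sign choices above).
EDGES = [(0, 1), (0, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (5, 7), (6, 7),
         (0, 4), (1, 5), (2, 6), (3, 7)]

def bbox_condition_map(center, size, focal, image_size=(512, 512)):
    """Draw the projected box wireframe into a single-channel [0, 1] map
    that could serve as spatial conditioning for a diffusion model."""
    canvas = Image.new("L", image_size, 0)
    draw = ImageDraw.Draw(canvas)
    pts = project(box_corners(center, size), focal,
                  (image_size[0] / 2.0, image_size[1] / 2.0))
    for a, b in EDGES:
        draw.line([tuple(map(float, pts[a])), tuple(map(float, pts[b]))],
                  fill=255, width=2)
    return np.asarray(canvas, dtype=np.float32) / 255.0

# Example: a 1 m cube placed 3 m in front of the camera.
cond = bbox_condition_map(center=(0.0, 0.0, 3.0), size=(1.0, 1.0, 1.0), focal=500.0)
```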
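The abstract also cites CLIP and DINO scores as quantitative metrics. Below is a hedged sketch of how such image-to-image similarity scores are commonly computed with off-the-shelf backbones; the checkpoints ("openai/clip-vit-base-patch32", "facebook/dinov2-base") and CLS-token pooling are assumptions, not necessarily the thesis's exact evaluation protocol.

```python
# Hypothetical sketch: CLIP/DINO image-similarity scoring with Hugging Face
# backbones. Checkpoint names and pooling choices are assumptions.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel, CLIPModel, CLIPProcessor

def clip_image_score(img_a: Image.Image, img_b: Image.Image) -> float:
    """Cosine similarity between CLIP image embeddings of two images."""
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    inputs = processor(images=[img_a, img_b], return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = torch.nn.functional.normalize(feats, dim=-1)
    return float(feats[0] @ feats[1])

def dino_image_score(img_a: Image.Image, img_b: Image.Image) -> float:
    """Cosine similarity between DINOv2 CLS-token features of two images."""
    processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
    model = AutoModel.from_pretrained("facebook/dinov2-base")
    inputs = processor(images=[img_a, img_b], return_tensors="pt")
    with torch.no_grad():
        cls = model(**inputs).last_hidden_state[:, 0]  # CLS token per image
    cls = torch.nn.functional.normalize(cls, dim=-1)
    return float(cls[0] @ cls[1])
```
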
Appears in Collections: Department of Computer Science and Information Engineering
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-113-2.pdf (access restricted to NTU campus IPs; use the VPN service for off-campus access) | 15.23 MB | Adobe PDF |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.
