Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100916
Title: AGTree: Attention Guided Visual Concept Decomposition
Authors: Wei-Jie Chen (陳韋傑)
Advisor: Jane Yung-Jen Hsu (許永真)
Co-Advisor: Wen-Huang Cheng (鄭文皇)
Keyword: Attention Mechanism, Diffusion Model, Generative AI, Visual Concept, Concept Decomposition
Publication Year: 2025
Degree: Master's
Abstract: When designing novel visual concepts, designers often draw inspiration from existing ideas, recombining elements to create something unique and original. With the rapid advancement of text-to-image (T2I) models, machines can now assist in this creative process, particularly in decomposing complex visual concepts and recombining them with existing ones. However, current visual concept decomposition methods typically rely on diverse input images; when the inputs are visually similar, these methods struggle to isolate objects from prominent backgrounds, often producing outputs that are hard to interpret or apply.

In this work, we reveal a strong correlation between decomposed visual subconcepts and the cross-attention maps within the diffusion U-Net. Building on this insight, we propose AGTree, a novel method that uses cross-attention maps as intrinsic masks to suppress background noise and incorporates random dropout during training to further enhance semantic relevance. Additionally, we extend an existing evaluation metric to provide a more comprehensive assessment of model performance. Quantitative results show that our method achieves an 8.62% improvement on a CLIP-based metric, while qualitative analyses demonstrate its effectiveness in disentangling background information from the learned representations. Code is available at: https://github.com/JackChen890311/AGTree.
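
To make the method description concrete, here is a minimal, self-contained PyTorch sketch of the general technique the abstract describes: deriving a soft spatial mask from cross-attention between image features and a text token, and randomly dropping that mask during training. Every function name, tensor shape, and the choice of what the random drop applies to is an assumption made for illustration; the authors' actual implementation is the AGTree repository linked above.

# Hypothetical sketch (not the AGTree code): derive a soft spatial mask from
# cross-attention between flattened U-Net image features and text-token
# embeddings, then use it to weight a reconstruction loss.
import torch
import torch.nn.functional as F

def cross_attention_mask(image_feats, token_embeds, concept_idx):
    # image_feats:  (hw, d) flattened feature map from one U-Net layer
    # token_embeds: (t, d)  text-encoder embeddings, one row per token
    # concept_idx:  index of the token for the sub-concept of interest
    d = image_feats.shape[-1]
    # Scaled dot-product cross-attention: queries from image positions,
    # keys from text tokens.
    scores = image_feats @ token_embeds.T / d ** 0.5   # (hw, t)
    attn = scores.softmax(dim=-1)                      # (hw, t)
    # The chosen token's attention column is a spatial relevance map;
    # rescale it to [0, 1] so it can act as a soft foreground mask.
    mask = attn[:, concept_idx]
    return (mask - mask.min()) / (mask.max() - mask.min() + 1e-8)

def masked_loss(pred, target, mask, drop_p=0.1):
    # With probability drop_p, ignore the mask for this step. This is one
    # guess at the "random dropout" mechanism; the thesis may drop
    # something else.
    if torch.rand(()).item() < drop_p:
        mask = torch.ones_like(mask)
    per_pos = F.mse_loss(pred, target, reduction="none").mean(dim=-1)  # (hw,)
    return (per_pos * mask).sum() / mask.sum().clamp_min(1e-8)

Reshaping the returned mask to the feature map's height and width gives a heat map of where the model attends to the chosen token, which is the signal the abstract says correlates with the decomposed subconcepts.

The quantitative claim refers to a CLIP-based metric. The thesis extends an existing metric whose exact form is not given here, so the sketch below shows only the standard image-text similarity that such metrics build on, using Hugging Face transformers' public CLIP checkpoint (an assumption, not necessarily the model used in the thesis).

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_score(image: Image.Image, prompt: str) -> float:
    inputs = processor(text=[prompt], images=image,
                       return_tensors="pt", padding=True)
    # logits_per_image is the scaled cosine similarity between the image
    # and text embeddings; higher means a closer match.
    return model(**inputs).logits_per_image.item()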
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100916
DOI: 10.6342/NTU202502288
Fulltext Rights: Authorized (access restricted to campus)
Embargo Lift Date: 2030-07-22
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
ntu-113-2.pdf (Restricted Access), 30.24 MB, Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
