Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98694
Title: TP2-DETR: Unlocking Deformable DETR for Zero-Shot Temporal Action Proposal Generation with Temporal Feature Pyramids
Authors: 鄭雅勻 (Ya-Yun Cheng)
Advisor: 許永真 (Yung-Jen Hsu)
Co-Advisor: 鄭文皇 (Wen-Huang Cheng)
Keyword: Temporal Action Proposal Generation, Zero-Shot Learning, Deformable DETR, Feature Pyramid Network, Short Action Localization
Publication Year: 2025
Degree: Master's
Abstract: In temporal action localization, video content changes slowly from frame to frame, so standard transformer attention often over-smooths the features. A promising remedy is the deformable attention of Deformable DETR. However, existing methods lack an intuitive temporal feature pyramid, especially in zero-shot settings where features are extracted from vision-language models, and therefore underutilize Deformable DETR's ability to detect short actions, the temporal counterpart of its advantage in detecting small objects in images.
To address this limitation, we introduce TP2-DETR, a novel end-to-end framework that integrates a dedicated Temporal Feature Pyramid Network (FPN) to unlock the full potential of Deformable DETR for Zero-Shot Temporal Action Proposal Generation (ZS-TAPG). We explore different FPN variants to better exploit the capabilities of Deformable DETR. To improve the efficiency and training stability of the end-to-end system, we design a shared, lightweight, multi-scale-aware saliency head for early supervision, complemented by multi-layer auxiliary proposal prediction heads for deep supervision.
We conducted experiments on the THUMOS14 and ActivityNet1.3 datasets, where TP2-DETR achieves state-of-the-art performance in most zero-shot split settings. The gains are largest on THUMOS14, which contains a high proportion of short actions: average mAP improves by 5.14% and 10.27% under the two common zero-shot split settings. These results indicate that the proposed design effectively harnesses Deformable DETR for ZS-TAPG.
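As a rough illustration of the temporal feature pyramid idea mentioned in the abstract, the following minimal Python sketch builds a multi-scale pyramid from a sequence of per-frame feature vectors by repeated stride-2 average pooling. It is not the thesis's FPN design: the abstract does not specify the FPN variants it explores, so the pooling operator and the names `avg_pool_stride2` and `build_temporal_pyramid` are assumptions chosen only for illustration.

```python
from typing import List

Feature = List[float]  # one feature vector per time step (hypothetical layout)


def avg_pool_stride2(seq: List[Feature]) -> List[Feature]:
    """Halve the temporal length by averaging adjacent time steps.

    A trailing unpaired time step is kept unchanged.
    """
    pooled = []
    for i in range(0, len(seq), 2):
        window = seq[i:i + 2]
        dim = len(window[0])
        pooled.append([sum(f[d] for f in window) / len(window) for d in range(dim)])
    return pooled


def build_temporal_pyramid(seq: List[Feature], num_levels: int) -> List[List[Feature]]:
    """Return [level 0, level 1, ...]; each level is roughly half as long as the last.

    Level 0 is the input sequence (finest temporal resolution). Construction
    stops early once a level contains a single time step.
    """
    if num_levels < 1:
        raise ValueError("num_levels must be at least 1")
    levels = [seq]
    while len(levels) < num_levels and len(levels[-1]) > 1:
        levels.append(avg_pool_stride2(levels[-1]))
    return levels


if __name__ == "__main__":
    # Five time steps with 1-dimensional features (toy data).
    frames = [[0.0], [2.0], [4.0], [6.0], [8.0]]
    for level, feats in enumerate(build_temporal_pyramid(frames, num_levels=3)):
        print(level, len(feats), feats)
```

In a sketch like this, short actions remain distinguishable at the fine levels, while the coarse levels summarize longer temporal context; a multi-scale attention mechanism could then sample from all levels.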
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98694
DOI: 10.6342/NTU202503736
Fulltext Rights: Authorized (open access worldwide)
Embargo Lift Date: 2025-08-19
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in This Item:
File            Size      Format
ntu-113-2.pdf   2.36 MB   Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
