Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97461
| Title: | 影片事件之間跨時間理解的基準測試集 ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos |
| Author: | 陳志臻 Jr-Jen Chen |
| Advisor: | 王鈺強 Yu-Chiang Frank Wang |
| Keywords: | Video Understanding, Reasoning Across Time, Multimodal Large Language Model, Benchmark, Deep Learning |
| Publication Year: | 2025 |
| Degree: | Master's |
| Abstract: | We introduce ReXTime, a benchmark designed to rigorously test AI models' ability to perform temporal reasoning within video events. Specifically, ReXTime focuses on reasoning across time, i.e., human-like understanding when the question and its corresponding answer occur in different video segments. This form of reasoning requires an advanced understanding of cause-and-effect relationships across video segments and poses significant challenges even to frontier multimodal large language models. To facilitate this evaluation, we develop an automated pipeline for generating temporal reasoning question-answer pairs, significantly reducing the need for labor-intensive manual annotation. Our benchmark includes 921 carefully vetted validation samples and 2,143 test samples, each manually curated for accuracy and relevance. Evaluation results show that while frontier large language models outperform academic models, they still lag behind human performance by a significant 14.3% accuracy gap. Additionally, our pipeline creates a training dataset of 9,695 machine-generated samples without manual effort, which empirical studies suggest can enhance across-time reasoning via fine-tuning. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97461 |
| DOI: | 10.6342/NTU202500991 |
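| Full-Text License: | Authorized (open access worldwide) |
| Full-Text Release Date: | 2025-06-19 |
| Appears in Collections: | Graduate Institute of Communication Engineering |
Files in This Item:
| File | Size | Format | |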
|---|---|---|---|
| ntu-113-2.pdf | 11.27 MB | Adobe PDF | View/Open |
All items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
