NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101167
Title: 針對潛在不可行指令之視覺語言導航之基準與方法設計
NAV-NF: A Benchmark and Framework for Vision-Language Navigation under Infeasible Instructions
Authors: 王廷郡
Ting-Jun Wang
Advisor: 徐宏民
Winston H. Hsu
Keyword: VLN, Vision-Language Navigation, Vision-Language Model
Publication Year: 2025
Degree: Master's
Abstract: Existing Vision-Language Navigation (VLN) work largely assumes that user instructions are always achievable, overlooking the fact that in reality humans often misremember and request objects that do not exist, causing robots to search indefinitely or stop prematurely. To enable systems to handle such unreliable instructions, this study proposes a new task, Navigation Not Found (NAV-NF), which requires the robot, after reaching the target room, to output NOT-FOUND once it confirms the target object is absent.
We design a data generation pipeline centered on a Large Language Model (LLM) that rewrites instructions and verifies object absence with open-vocabulary object recognition, producing instructions that are semantically natural but factually incorrect; manual inspection shows an error rate below 2%. We further propose new evaluation metrics, Reach & Found SR and Reach & Found SPL, to quantify a model's exploration efficiency and decision quality under uncertainty.
Experiments show that existing supervised and LLM-based VLN models perform poorly on NAV-NF, with Reach & Found SR of only 5.4%–34.1%. We therefore propose ROAM (Room-Object Aware Movement), a coarse-to-fine two-stage framework that first localizes the target room with a supervised model and then explores within the room using a VLM/LLM. ROAM achieves the best results on all metrics, raising Reach & Found SR to 41.4%.
This study provides the first VLN benchmark dataset and a robust model for handling infeasible instructions, advancing VLN robotics toward greater reliability and error-recognition capability.
Conventional Vision-and-Language Navigation agents assume that tasks are always feasible and lack mechanisms to handle cases where the target object cannot be found. To address this limitation, we propose a novel task, Navigation Not Found (NAV-NF), which introduces unreliable instructions—scenarios where the target object does not exist—reflecting real-world situations where humans may provide erroneous instructions. We develop a data generation pipeline that leverages Large Language Models to revise existing instructions and verify their correctness. Experimental results demonstrate that state-of-the-art models, whether supervised or LLM-based, struggle with exploration and often hallucinate or terminate prematurely. To mitigate this, we introduce a hybrid framework, Room-Object Aware Movement (ROAM), which achieves state-of-the-art performance across all evaluation metrics.
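The abstract's Reach & Found SPL metric is not defined in detail here; the sketch below is a hypothetical reading that assumes it follows the standard SPL definition (success weighted by inverse normalized path length), with "success" requiring both reaching the goal room and making the correct found / NOT-FOUND decision. All field names (`reached_room`, `decision_correct`, etc.) are illustrative assumptions, not the thesis's actual data format.

```python
def reach_and_found_spl(episodes):
    """Hypothetical Reach & Found SPL: average over episodes of
    S_i * L_i / max(P_i, L_i), where S_i = 1 only when the agent both
    reached the target room and made the correct found/NOT-FOUND call,
    L_i is the shortest-path length, and P_i is the path actually taken."""
    total = 0.0
    for ep in episodes:
        # Success demands both sub-goals; a correct NOT-FOUND on an
        # infeasible instruction counts the same as finding the object.
        success = ep["reached_room"] and ep["decision_correct"]
        if success:
            shortest = ep["shortest_path_len"]
            taken = ep["path_len_taken"]
            total += shortest / max(taken, shortest)
    return total / len(episodes)
```

Under this reading, an agent that reaches the right room but hallucinates the object (or gives up early) scores zero for that episode, so the metric penalizes both wasted exploration and wrong judgments.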
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101167
DOI: 10.6342/NTU202504771
Fulltext Rights: Authorized (open access worldwide)
metadata.dc.date.embargo-lift: 2026-01-01
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File: ntu-114-1.pdf (9.51 MB, Adobe PDF)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
