Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97355
| Title: | Designing a Task Action History Retrieval Mechanism to Enhance the Performance of LLM-Based Web Navigation Agents (設計任務操作歷史檢索機制以增強基於大型語言模型的網頁瀏覽代理人的效能) |
| Author: | Ching-Hsiang Lin (林敬翔) |
| Advisor: | Chien-Kang Huang (黃乾綱) |
| Keywords: | Large Language Model, Web Navigation, Agent |
| Publication year: | 2025 |
| Degree: | Master's |
| Abstract: | With the rapid advancement of large language models (LLMs), LLM-based AI agents have shown considerable potential in web navigation tasks. However, effectively integrating prior experience to improve agents' decision-making and generalization remains an open challenge.
Among prior approaches, Synapse proposed the Trajectory-as-Exemplar (TaE) mechanism, which uses semantic similarity to retrieve exemplar trajectories whose task descriptions resemble the current task and supplies them as prompts. Agent Workflow Memory (AWM) further incorporates common workflows into the prompt. However, AWM retrieves information based solely on the environment in which the task takes place, so it often retrieves information irrelevant to the current task, which limits its performance gains.
To address this issue, this study proposes the Memory-Integrated Mechanism with Similarity (MIMS), which extends the AWM architecture with action objective prediction and semantic-similarity retrieval of workflows. During task execution, MIMS first predicts the current action objective from the task description, the trajectory history, and the current observation. It then retrieves from a vector database the workflows most similar to that objective and adds them to the LLM's prompt context to support action prediction and task execution.
Experiments on the Mind2Web dataset show that MIMS significantly outperforms Synapse and AWM in the Cross-Task setting, achieving 40.6 in Element Accuracy (EA), 37.0 in Step Success Rate (SSR), and 5.1 in Success Rate (SR). Although its performance in the Cross-Website and Cross-Domain settings is comparable to existing methods, the results highlight the potential of semantic-similarity-based workflow retrieval for web navigation tasks. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97355 |
| DOI: | 10.6342/NTU202500827 |
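| Full-text license: | Authorized (open access worldwide) |
| Electronic full-text release date: | 2025-05-08 |
| Appears in department: | Department of Engineering Science and Ocean Engineering |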
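
The retrieval step described in the abstract (predict an action objective, then fetch the workflows most similar to it and add them to the prompt) can be illustrated with a minimal sketch. This is not the thesis implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model and vector database, the objective is passed in as a plain string where MIMS would have an LLM predict it, and every name and example workflow below is hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_workflows(objective: str, workflows: list[str], k: int = 1) -> list[str]:
    """Return the k stored workflows most similar to the action objective."""
    query = embed(objective)
    ranked = sorted(workflows, key=lambda w: cosine(query, embed(w)), reverse=True)
    return ranked[:k]


def build_prompt(task: str, history: list[str], observation: str,
                 objective: str, retrieved: list[str]) -> str:
    """Assemble a prompt context that includes the retrieved workflows."""
    lines = [f"Task: {task}", "History:"]
    lines += [f"  - {step}" for step in history]
    lines += [f"Observation: {observation}",
              f"Predicted action objective: {objective}",
              "Relevant workflows:"]
    lines += [f"  - {w}" for w in retrieved]
    lines.append("Next action:")
    return "\n".join(lines)


# Hypothetical workflow memory; a real system would store embeddings
# of induced workflows in a vector database.
WORKFLOWS = [
    "search for a product: click the search box, type the query, press enter",
    "log in to an account: type username, type password, click sign in",
    "book a flight: select origin, select destination, pick dates, click search flights",
]

# Assumption: in MIMS this objective would be predicted by an LLM.
objective = "type the product name into the search box"
top = retrieve_workflows(objective, WORKFLOWS, k=1)
prompt = build_prompt(
    task="Buy a pair of running shoes",
    history=["open the store home page"],
    observation="home page with a search box",
    objective=objective,
    retrieved=top,
)
```

Querying by the predicted objective instead of by the website or environment alone is what lets the retrieved workflow change from step to step within the same task.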
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-2.pdf | 4.67 MB | Adobe PDF | View/Open |
Unless their copyright terms specify otherwise, all items in this system are protected by copyright, with all rights reserved.
