  1. NTU Theses and Dissertations Repository
  2. College of Engineering
  3. Department of Engineering Science and Ocean Engineering
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97355
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 黃乾綱 | zh_TW
dc.contributor.advisor | Chien-Kang Huang | en
dc.contributor.author | 林敬翔 | zh_TW
dc.contributor.author | Ching-Hsiang Lin | en
dc.date.accessioned | 2025-05-07T16:09:46Z | -
dc.date.available | 2025-05-08 | -
dc.date.copyright | 2025-05-07 | -
dc.date.issued | 2025 | -
dc.date.submitted | 2025-04-28 | -
dc.identifier.citation | [1] Y. Li, H. Wen, W. Wang, X. Li, Y. Yuan, G. Liu, J. Liu, W. Xu, X. Wang, Y. Sun, R. Kong, Y. Wang, H. Geng, J. Luan, X. Jin, Z. Ye, G. Xiong, F. Zhang, X. Li, M. Xu, Z. Li, P. Li, Y. Liu, Y.-Q. Zhang, and Y. Liu, “Personal llm agents: Insights and survey about the capability, efficiency and security,” arXiv preprint arXiv:2401.05459, 2024.
[2] X. Deng, Y. Gu, B. Zheng, S. Chen, S. Stevens, B. Wang, H. Sun, and Y. Su, “Mind2Web: Towards a generalist agent for the web,” Advances in Neural Information Processing Systems, vol. 36, pp. 28091–28114, 2023.
[3] L. Zheng, R. Wang, X. Wang, and B. An, “Synapse: Trajectory-as-exemplar prompting with memory for computer control,” in The Twelfth International Conference on Learning Representations, 2024.
[4] Z. Z. Wang, J. Mao, D. Fried, and G. Neubig, “Agent workflow memory,” 2024. [Online]. Available: https://arxiv.org/abs/2409.07429
[5] OpenAI, “Introducing chatgpt,” November 2022, accessed: 2025-02-10. [Online]. Available: https://openai.com/index/chatgpt/
[6] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
[7] B. Roziere, J. Gehring, F. Gloeckle, S. Sootla, I. Gat, X. E. Tan, Y. Adi, J. Liu, R. Sauvestre, T. Remez et al., “Code llama: Open foundation models for code,” arXiv preprint arXiv:2308.12950, 2023.
[8] Z. Xi, W. Chen, X. Guo, W. He, Y. Ding, B. Hong, M. Zhang, J. Wang, S. Jin, E. Zhou et al., “The rise and potential of large language model based agents: A survey,” Science China Information Sciences, vol. 68, no. 2, p. 121101, 2025.
[9] Apple Inc., “Siri – Apple,” 2025, accessed: 2025-02-18. [Online]. Available: https://www.apple.com/siri/
[10] Amazon.com, Inc., “Alexa, the Voice Assistant,” 2025, accessed: 2025-02-18. [Online]. Available: https://www.alexa.com/
[11] Microsoft, “Cortana support,” 2025, accessed: 2025-02-18. [Online]. Available: https://support.microsoft.com/en-us/cortana
[12] Google Inc., “Google Assistant SDK RPC Reference,” 2025, accessed: 2025-02-18. [Online]. Available: https://developers.google.com/assistant/sdk/reference/rpc?hl=zh-tw
[13] Apple Inc., “SiriKit Documentation,” 2025, accessed: 2025-02-18. [Online]. Available: https://developer.apple.com/documentation/sirikit/
[14] J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei, “Scaling laws for neural language models,” arXiv preprint arXiv:2001.08361, 2020.
[15] Anthropic, “Claude (Oct 8 version),” [Large language model], 2023. [Online]. Available: https://www.anthropic.com/
[16] S. Zhou, F. F. Xu, H. Zhu, X. Zhou, R. Lo, A. Sridhar, X. Cheng, Y. Bisk, D. Fried, U. Alon et al., “Webarena: A realistic web environment for building autonomous agents,” arXiv preprint arXiv:2307.13854, 2023. [Online]. Available: https://webarena.dev
[17] S. Yao, H. Chen, J. Yang, and K. Narasimhan, “WebShop: Towards scalable real-world web interaction with grounded language agents,” arXiv preprint, 2022.
[18] E. Z. Liu, K. Guu, P. Pasupat, T. Shi, and P. Liang, “Reinforcement learning on web interfaces using workflow-guided exploration,” ArXiv, vol. abs/1802.08802, 2018. [Online]. Available: https://api.semanticscholar.org/CorpusID:3530344
[19] P. He, X. Liu, J. Gao, and W. Chen, “DeBERTa: Decoding-enhanced BERT with disentangled attention,” in International Conference on Learning Representations, 2021. [Online]. Available: https://openreview.net/forum?id=XPZIaotutsD
[20] H. W. Chung, L. Hou, S. Longpre, B. Zoph, Y. Tay, W. Fedus, Y. Li, X. Wang, M. Dehghani, S. Brahma et al., “Scaling instruction-finetuned language models,” Journal of Machine Learning Research, vol. 25, no. 70, pp. 1–53, 2024.
[21] OpenAI. (2024, 7) Gpt-4o mini: Advancing cost-efficient intelligence. Accessed: 2025-04-07. [Online]. Available: https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/
[22] ——. (2024, 5) Hello gpt-4o. Accessed: 2025-04-07. [Online]. Available: https://openai.com/index/hello-gpt-4o/
[23] ——. (2023, 3) GPT-4. Accessed: 2025-04-07. [Online]. Available: https://openai.com/index/gpt-4/
[24] ——. (2023, 12) New and improved embedding model. Accessed: 2025-04-07. [Online]. Available: https://openai.com/index/new-and-improved-embedding-model/
[25] M. Douze, A. Guzhva, C. Deng, J. Johnson, G. Szilvasy, P.-E. Mazaré, M. Lomeli, L. Hosseini, and H. Jégou, “The faiss library,” arXiv preprint arXiv:2401.08281, 2024.
[26] M. Levy, A. Jacoby, and Y. Goldberg, “Same task, more tokens: the impact of input length on the reasoning performance of large language models,” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024, pp. 15339–15353.
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97355 | -
dc.description.abstract | With the rapid advancement of large language models (LLMs), LLM-based AI agents have demonstrated significant potential in web navigation tasks. However, effectively leveraging prior experience to enhance agents' decision-making and generalization capabilities remains an open challenge.
Among prior approaches, Synapse proposed the Trajectory-as-Exemplar (TaE) mechanism, which retrieves semantically similar trajectories as prompts based on task descriptions. The Agent Workflow Memory (AWM) framework further incorporates common workflows as prompt components. Nevertheless, AWM relies solely on environment-based retrieval, which often results in selecting contextually irrelevant information, thereby limiting performance gains.
To address this issue, we propose a Memory-Integrated Mechanism with Similarity (MIMS), which extends AWM by incorporating action objective prediction and semantically guided workflow retrieval. During task execution, MIMS first predicts the current action objective based on the task description, trajectory history, and current observation. It then retrieves the most semantically relevant workflows from a vector database and incorporates them into the LLM’s prompt context to support action prediction and task completion.
Experiments conducted on the Mind2Web dataset show that MIMS significantly outperforms Synapse and AWM in Cross-Task settings, achieving 40.6 in Element Accuracy (EA), 37.0 in Step Success Rate (SSR), and 5.1 in overall Success Rate (SR). While the performance in Cross-Website and Cross-Domain settings is comparable to existing methods, our results highlight the potential of semantically driven workflow retrieval in enhancing LLM-based agents for web navigation tasks.
en
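The semantically guided retrieval step described in the abstract can be sketched in a few lines. This is an illustrative sketch only, not the thesis implementation: the toy `embed` function (hashing tokens into a fixed-size vector with `crc32`) stands in for a real embedding model such as an OpenAI text-embedding model [24], and the example workflows and objective are invented for the demonstration.

```python
import numpy as np
from zlib import crc32

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy bag-of-words embedding: hash each token into a bucket, then L2-normalize."""
    v = np.zeros(dim, dtype="float32")
    for token in text.lower().split():
        v[crc32(token.encode()) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Hypothetical workflow memory (in MIMS this would be mined from past trajectories).
workflows = [
    "search for a product and add it to the shopping cart",
    "log in to an account with a username and password",
    "filter search results by a price range",
]
workflow_matrix = np.stack([embed(w) for w in workflows])

# A predicted action objective for the current step (also hypothetical).
objective = "add the selected item to the cart"

# Cosine similarity (vectors are unit-length, so a dot product suffices);
# keep the top-k workflows to place into the LLM's prompt context.
scores = workflow_matrix @ embed(objective)
top_k = [workflows[i] for i in np.argsort(-scores)[:2]]
```

At scale, a vector index such as FAISS [25] would replace the brute-force matrix product, but the lookup semantics are the same: embed the predicted objective, return the nearest stored workflows.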
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-05-07T16:09:46Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2025-05-07T16:09:46Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Oral Defense Committee Approval i
Acknowledgements iii
Chinese Abstract v
Abstract vii
Table of Contents ix
List of Figures xiii
List of Tables xv
Glossary of Terms and Abbreviations xvii
Chapter 1 Introduction 1
1.1 Background and Research Objectives of LLMs in Web Navigation Agent Tasks 1
1.2 Method Overview 2
1.3 Research Contributions 2
1.4 Thesis Organization 3
Chapter 2 Related Work 5
2.1 Artificial Intelligence Agents (AI Agents) 5
2.2 Large Language Model-Based Agents (LLM-Based Agents) 7
2.2.1 Applications of LLM-Based Agents in Web Navigation Tasks 7
2.3 Web Navigation Task Datasets 7
2.3.1 Types of Web Navigation Tasks 8
2.3.2 Common Web Navigation Datasets 8
2.3.3 Mind2Web: A Dataset for Evaluating Generalization 9
2.3.3.1 Subset Categories 9
2.3.3.2 Dataset Contents 10
2.3.3.3 Task Procedure 11
2.4 Research and Development of LLM-Based Agents for Web Navigation 12
2.4.1 MindAct: A Web Navigation Agent with an Information Filtering Mechanism 12
2.4.2 Synapse: Semantic Retrieval with Trajectories as Exemplars 14
2.4.3 Agent Workflow Memory: Prompting with Common Workflows 16
Chapter 3 Methodology 19
3.1 Analysis of Existing Systems: Synapse and AWM 19
3.1.1 Experimental Setup and Evaluation Methods 19
3.1.1.1 Experimental Environment 20
3.1.1.2 Datasets 21
3.1.1.3 Evaluation Metrics 21
3.1.2 Analysis of the TaE Retrieval Mechanism 23
3.1.3 Analysis of the Workflow Retrieval Mechanism 24
3.2 MIMS System Architecture 24
3.2.1 HTML Information Filtering 25
3.2.2 Action Objective Prediction 25
3.2.3 Workflow Retrieval 27
3.2.3.1 Data Preprocessing 27
3.2.3.2 Test-Time Procedure 29
3.2.4 TaE Retrieval 29
3.2.5 Action Prediction 30
Chapter 4 Experimental Results and Discussion 33
4.1 Effect of Workflow Count and TaE Combinations on SSR 33
4.1.1 Experimental Setup and Evaluation Methods 33
4.1.2 SSR Analysis in the Cross-Task Setting 35
4.1.3 SSR Analysis in the Cross-Website Setting 36
4.1.4 SSR Analysis in the Cross-Domain Setting 37
4.1.5 Effect of Workflow Count on Step Success Rate Across Task Settings 38
4.2 Comparative Analysis of MIMS and Existing Methods 39
4.2.1 Experimental Setup and Evaluation Methods 39
4.2.2 Experimental Results and Analysis 40
Chapter 5 Conclusion and Future Work 41
5.1 Conclusion 41
5.2 Future Work 41
References 43
dc.language.iso | zh_TW | -
dc.subject | 網頁瀏覽 | zh_TW
dc.subject | 代理人 | zh_TW
dc.subject | 大型語言模型 | zh_TW
dc.subject | Agent | en
dc.subject | Large Language Model | en
dc.subject | Web Navigation | en
dc.title | 設計任務操作歷史檢索機制以增強基於大型語言模型的網頁瀏覽代理人的效能 | zh_TW
dc.title | Designing a Task Action History Retrieval Mechanism to Enhance the Performance of LLM-Based Web Navigation Agents | en
dc.type | Thesis | -
dc.date.schoolyear | 113-2 | -
dc.description.degree | Master | -
dc.contributor.oralexamcommittee | 張瑞益;馬尚彬 | zh_TW
dc.contributor.oralexamcommittee | Ray-I Chang;Shang-Pin Ma | en
dc.subject.keyword | 大型語言模型,網頁瀏覽,代理人 | zh_TW
dc.subject.keyword | Large Language Model,Web Navigation,Agent | en
dc.relation.page | 45 | -
dc.identifier.doi | 10.6342/NTU202500827 | -
dc.rights.note | Authorization granted (open access worldwide) | -
dc.date.accepted | 2025-04-28 | -
dc.contributor.author-college | College of Engineering | -
dc.contributor.author-dept | Department of Engineering Science and Ocean Engineering | -
dc.date.embargo-lift | 2025-05-08 | -
Appears in Collections: Department of Engineering Science and Ocean Engineering

Files in This Item:
File | Size | Format
ntu-113-2.pdf | 4.67 MB | Adobe PDF