Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91745
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 黃乾綱 | zh_TW |
dc.contributor.advisor | Chien-Kang Huang | en |
dc.contributor.author | 黃仙惠 | zh_TW |
dc.contributor.author | Hsien-Hui Huang | en |
dc.date.accessioned | 2024-02-22T16:31:34Z | - |
dc.date.available | 2024-02-23 | - |
dc.date.copyright | 2024-02-22 | - |
dc.date.issued | 2024 | - |
dc.date.submitted | 2024-02-03 | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91745 | - |
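dc.description.abstract | The slot-filling task-oriented dialogue framework (SF-TOD framework) predefines task-specific slots and handles user requests through a task-specific slots application programming interface (task-specific slots API), but real-world applications often lack such an API. To address this problem, prior research has proposed a task-oriented dialogue system based on graphical user interfaces (GUI-TOD system), which carries out tasks on an application's GUI by learning GUI actions (such as tapping, swiping, and entering text) and does not require a task-specific slots API. However, current GUI-TOD system research focuses mainly on the GUIs of mobile applications, and no GUI-TOD system yet exists for web GUIs.
This study uses large language models to develop a task-oriented dialogue system based on web interface interaction (Task Oriented Dialogue system with Web UI Interaction using Large Language Models, abbreviated TOD-WebUII-LLM). TOD-WebUII-LLM combines large language models with a web test automation framework to realize a GUI-TOD system with web interface interaction. In the implementation, LLMs act as intelligent agents that understand and operate the GUI, while the web test automation framework Playwright executes the web GUI operations. The main contribution of this study is a dialogue system in which specified tasks can be completed automatically on the Web through natural-language conversation; the study also defines evaluation criteria for the system and collects 14 test cases as a basis for discussion. | zh_TW |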
dc.description.abstract | The Slot-Filling Task-Oriented Dialogue (SF-TOD) framework predefines task-specific slots and handles user requests through a task-specific slots API. In practice, however, such an API is often unavailable. Previous research has introduced the Graphical User Interface Task-Oriented Dialogue system (GUI-TOD system), which performs tasks on an application's GUI by learning GUI actions such as clicks, scrolls, and text input, without relying on a task-specific slots API. Existing GUI-TOD system research has focused primarily on mobile application GUIs, and GUI-TOD systems for web interfaces remain unexplored.
This study develops a Task-Oriented Dialogue system with Web UI Interaction using Large Language Models (TOD-WebUII-LLM). TOD-WebUII-LLM combines large language models with a web test automation framework to implement a GUI-TOD system with web interface interaction. In the implementation, large language models serve as intelligent agents that comprehend and manipulate the GUI, while the Playwright web test automation framework executes the web GUI operations. The primary contribution of this research is a dialogue system that automates specified tasks on the web through natural language conversation. The study also establishes evaluation criteria for the system and collects 14 test cases for discussion and analysis. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-02-22T16:31:34Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2024-02-22T16:31:34Z (GMT). No. of bitstreams: 0 | en |
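dc.description.tableofcontents | Acknowledgements; Abstract (Chinese); Abstract (English); Table of Contents; List of Figures; List of Tables
Chapter 1 Introduction: 1.1 Research Background; 1.2 Research Motivation and Objectives; 1.3 Research Contributions
Chapter 2 Background and Related Work: 2.1 META-GUI; 2.2 WebTOD; 2.3 Large Language Models as Intelligent Agents; 2.3.1 Conversational Agents on GUIs; 2.3.2 Reliable GUI Task Automation Agents; 2.4 Literature Summary
Chapter 3 System Analysis and Implementation: 3.1 Use Cases and Requirements Analysis; 3.2 Overall System Architecture; 3.2.1 System Module Overview; 3.2.2 System Task Execution Flow; 3.3 Web Interactor Module; 3.3.1 Problem Definition; 3.3.2 natbot Error Analysis; 3.3.3 Methodology; 3.4 LLM Chain Module; 3.4.1 Problem Definition; 3.4.2 Methodology; 3.4.2.1 Langchain Framework Error Analysis; 3.4.2.2 LLM Chain Inputs and Outputs; 3.4.2.3 Custom LLM; 3.4.2.4 Overall LLM Chain Architecture; 3.4.2.5 Prompt Structure; 3.4.3 Four LLM Chain Modules; 3.4.3.1 Module 1: Task Step Generation; 3.4.3.2 Module 2: Task Completion Judgment; 3.4.3.3 Module 3: Web Command Generation; 3.4.3.4 Module 4: Web Information Provision; 3.5 Task State Tracking Module; 3.6 Dialogue Interface Module
Chapter 4 Experimental Results and Discussion of Application Cases: 4.1 Evaluation Criteria and Experimental Setup; 4.1.1 End-to-End System Task Evaluation Criteria; 4.1.2 LLM Chain Module Evaluation Method; 4.1.3 Experimental Setup; 4.2 Evaluation Process; 4.2.1 Test Cases and Evaluation Dataset; 4.2.1.1 Collecting Test Cases; 4.2.1.2 Collecting the Evaluation Dataset; 4.2.2 Automated Evaluation Results and Manually Corrected Evaluation Results; 4.3 Evaluation Results; 4.3.1 End-to-End System Task Performance; 4.3.2 LLM Chain Task Performance; 4.3.3 Comparison of System Task Performance Across Different Large Language Models; 4.4 Test Case Results; 4.4.1 Test Case Discussion and Error Analysis; 4.4.2 User Feedback; 4.5 Current Challenges and Discussion; 4.5.1 Discussion and Challenges of LLM Application Development; 4.5.2 The Web GUI Does Not Exist or Cannot Complete the Task; 4.5.3 Website Bot-Blocking Mechanisms and Data Privacy Issues; 4.5.4 Tracking and Confirming User Intent During Dialogue
Chapter 5 Conclusion and Contributions
References
Appendix A: YAML Files for the PromptTemplate and Output Parser Prompt Structures of the Four LLM Chain Model Tasks
Appendix B: Details of the Manually Corrected Evaluation Results | - |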
dc.language.iso | zh_TW | - |
dc.title | 使用大型語言模型開發基於網頁介面互動的任務導向對話系統 | zh_TW |
dc.title | Developing a task oriented dialogue system with web UI interaction using large language models | en |
dc.type | Thesis | - |
dc.date.schoolyear | 112-1 | - |
dc.description.degree | Master's | - |
dc.contributor.oralexamcommittee | 張瑞益;馬尚彬;李允中 | zh_TW |
dc.contributor.oralexamcommittee | Ray-I Chang;Shang-Pin Ma;Jonathan Lee | en |
dc.subject.keyword | large language models, web test automation framework, dialogue system with GUI interaction | zh_TW |
dc.subject.keyword | Large Language Models, Web Test Automation, GUI-TOD system | en |
dc.relation.page | 78 | - |
dc.identifier.doi | 10.6342/NTU202400041 | - |
dc.rights.note | Not authorized | - |
dc.date.accepted | 2024-02-05 | - |
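dc.contributor.author-college | College of Engineering | - |
dc.contributor.author-dept | Department of Engineering Science and Ocean Engineering | - |
Appears in Collections: | Department of Engineering Science and Ocean Engineering |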
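The abstract describes a division of labor in which a language model proposes web GUI actions and Playwright carries them out. The sketch below illustrates one way such a hand-off could look. It is not the thesis's implementation: the plain-text command format (`goto`, `click`, `type selector | text`), the `WebCommand` class, and the `parse_command` and `execute` helpers are all hypothetical names invented for illustration. The only Playwright-specific assumption is that the page object exposes `goto`, `click`, and `fill`, as the synchronous Playwright `Page` does.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class WebCommand:
    """One GUI action proposed by the language model (hypothetical format)."""
    action: str                 # "goto", "click", or "type"
    target: str                 # URL for "goto", selector otherwise
    text: Optional[str] = None  # text to enter, only used by "type"


def parse_command(line: str) -> WebCommand:
    """Parse an assumed plain-text command such as 'type #q | hello'."""
    action, _, rest = line.strip().partition(" ")
    action = action.lower()
    rest = rest.strip()
    if action in ("goto", "click"):
        if not rest:
            raise ValueError(f"{action} needs a target")
        return WebCommand(action, rest)
    if action == "type":
        target, sep, text = rest.partition("|")
        if not sep or not target.strip():
            raise ValueError("type needs 'selector | text'")
        return WebCommand("type", target.strip(), text.strip())
    raise ValueError(f"unsupported action: {action!r}")


def execute(page: Any, command: WebCommand) -> None:
    """Run one command on a Playwright-style page (goto, click, fill)."""
    if command.action == "goto":
        page.goto(command.target)
    elif command.action == "click":
        page.click(command.target)
    elif command.action == "type":
        page.fill(command.target, command.text or "")
    else:
        raise ValueError(f"unsupported action: {command.action!r}")
```

With Playwright installed, `page` could come from `sync_playwright()` via `browser.new_page()`. Because `execute` depends only on three method names, a small stub object is enough to exercise the parsing and dispatch logic without launching a browser.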
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-112-1.pdf (not currently authorized for public access) | 7.58 MB | Adobe PDF |
Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.