Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93239

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 曹承礎 | zh_TW |
| dc.contributor.advisor | Seng-Cho Chou | en |
| dc.contributor.author | 魏冠宇 | zh_TW |
| dc.contributor.author | Kuan-Yu Wei | en |
| dc.date.accessioned | 2024-07-23T16:27:07Z | - |
| dc.date.available | 2024-07-24 | - |
| dc.date.copyright | 2024-07-23 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-06-27 | - |
| dc.identifier.citation | [1] A. Asai, Z. Wu, Y. Wang, A. Sil, and H. Hajishirzi. Self-RAG: Learning to retrieve, generate, and critique through self-reflection, 2023.
[2] J. Chen, H. Lin, X. Han, and L. Sun. Benchmarking large language models in retrieval-augmented generation, 2023.
[3] S. Es, J. James, L. Espinosa-Anke, and S. Schockaert. RAGAS: Automated evaluation of retrieval augmented generation, 2023.
[4] Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, Y. Dai, J. Sun, and H. Wang. Retrieval-augmented generation for large language models: A survey, 2023.
[5] Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y. Xu, E. Ishii, Y. Bang, W. Dai, A. Madotto, and P. Fung. Survey of hallucination in natural language generation, 2022.
[6] M. Lee. A mathematical investigation of hallucination and creativity in GPT models, 2023.
[7] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, S. Riedel, and D. Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks, 2020.
[8] J. Menick, M. Trebacz, V. Mikulik, J. Aslanides, F. Song, M. Chadwick, M. Glaese, S. Young, L. Campbell-Gillingham, G. Irving, and N. McAleese. Teaching language models to support answers with verified quotes, 2022.
[9] A. Neelakantan, T. Xu, R. Puri, A. Radford, J. M. Han, J. Tworek, Q. Yuan, N. Tezak, J. W. Kim, C. Hallacy, J. Heidecke, P. Shyam, B. Power, T. E. Nekoul, G. Sastry, G. Krueger, D. Schnurr, F. P. Such, K. Hsu, M. Thompson, T. Khan, T. Sherbakov, J. Jang, P. Welinder, and L. Weng. Text and code embeddings by contrastive pre-training, 2022.
[10] Qdrant. Cloud platforms FAQ dataset.
[11] P. Rajpurkar, R. Jia, and P. Liang. Know what you don't know: Unanswerable questions for SQuAD, 2018.
[12] S. J. Semnani, V. Z. Yao, H. C. Zhang, and M. S. Lam. WikiChat: Stopping the hallucination of large language model chatbots by few-shot grounding on Wikipedia, 2023.
[13] J. Wei, Y. Tay, R. Bommasani, C. Raffel, B. Zoph, S. Borgeaud, D. Yogatama, M. Bosma, D. Zhou, D. Metzler, E. H. Chi, T. Hashimoto, O. Vinyals, P. Liang, J. Dean, and W. Fedus. Emergent abilities of large language models, 2022.
[14] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, and D. Zhou. Chain-of-thought prompting elicits reasoning in large language models, 2022.
[15] O. Weller, M. Marone, N. Weir, D. Lawrie, D. Khashabi, and B. Van Durme. "According to ..." prompting language models improves quoting from pre-training data, 2023.
[16] S. Wiegreffe, J. Hessel, S. Swayamdipta, M. Riedl, and Y. Choi. Reframing human-AI collaboration for generating free-text explanations, 2021.
[17] T. Zhao, M. Wei, J. S. Preston, and H. Poon. Automatic calibration and error correction for generative large language models via Pareto optimal self-supervision, 2023. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93239 | - |
| dc.description.abstract | 大型語言模型(LLM, large language model)持續快速發展,為彌補模型的不足與因應不同情境下的使用需求,檢索增強生成(RAG, Retrieval-Augmented Generation)等模型已被成熟運用。然而,大型語言模型如何處理「檢索文檔」(Retrieved Documents)尚屬於一個「黑盒子」,其決策過程封閉且不透明,限制了解釋性與可追蹤性。
本論文提出了檢索增強生成適應性指標(RAG Adaptability Metric),藉由提示工程(Prompt Engineering)使生成模型(Generation Model)在生成回答時同時輸出支持文檔(Supporting Documents),讓模型同時指出其認為的產生回答的依據。 研究中也發現,生成模型在發現檢索文檔不足以支持生成回應時,會轉而採用生成模型訓練過程學習的知識,即記憶化參數來生成回應內容,在部分情境中,記憶化參數生成的回應是可接受的,但同時存在生成幻覺(Generative Hallucination)的風險,因此,本研究藉由取得支持文檔與提問、檢索文檔、回答間的內容相關性產生檢索增強生成適應性指標,賦予檢索增強生成模型,生成過程的解釋性與可追蹤性。 研究結果顯示,檢索增強生成適應性指標適用於多種檢索方法與不同的生成模型,在識別潛在有風險的生成結果上表現良好,並且可以協助辨別大型語言模型是否依照檢索文檔生成回答,還是發生「拒絕回答」或「自行產生回答」的情境,提供調適模型或訓練模型的參考依據。 | zh_TW |
| dc.description.abstract | Large Language Models (LLMs) are advancing at a rapid pace. To address the shortcomings of these models and cater to various contextual usage needs, architectures like Retrieval-Augmented Generation (RAG) have been widely adopted. However, how LLMs handle 'retrieved documents' remains a 'black box', with a decision-making process that is closed and opaque, limiting explainability and traceability.
This thesis proposes the RAG Adaptability Metric, which uses prompt engineering to make the generation model output its supporting documents alongside each response, so that the model indicates the basis for its answer. The study found that when the retrieved documents are insufficient to support the generated response, the model tends to fall back on knowledge learned during training, i.e., memorized parameters, to generate the response content. In some scenarios, responses generated from memorized parameters are acceptable, but they carry a risk of generative hallucination. Therefore, this study constructs the RAG Adaptability Metric from the content relevance between the supporting documents and the query, the retrieved documents, and the response, giving the RAG process explainability and traceability. The results show that the RAG Adaptability Metric is applicable to various retrieval methods and different generation models and performs well in identifying potentially risky generated responses. It can help distinguish whether the LLM generates responses based on the retrieved documents, or whether a 'refusal to respond' or 'self-generated response' situation has occurred, providing a reference for adapting or training the models. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-07-23T16:27:07Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-07-23T16:27:07Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Abstract
Contents
目次
第一章 Introduction
1.1 研究背景
1.2 研究動機
1.3 研究預期產出
1.4 研究假設
第二章 Related Work
2.1 Retrieval Augmented Generation
2.2 Self-RAG
2.3 Model Hallucination
2.4 Prompt Engineering
2.5 Refusal to Answer and L2R
2.6 Embeddings
2.7 Retrieval Augmented Generation Assessment
第三章 Methodology
3.1 模型設計
3.1.1 相關性閥值(Relevance Threshold)
3.1.2 檢索增強生成適應性指標
3.1.3 主要指標(Primary Metric)
3.1.3.1 (1)支持文檔與提問的相關性指標
3.1.3.2 (2)支持文檔與檢索文檔的相關性指標
3.1.3.3 (3)支持文檔與回答的相關性指標
3.1.4 輔助判斷指標(Auxiliary Assessment Metric)
3.1.4.1 (4)提問與檢索文檔相關性
3.1.4.2 (5)提問與回答相關性
3.1.5 預期實驗設計
3.2 實驗模型設計
3.2.1 指標數值制定
3.2.2 結果評估
3.2.3 指標性能實驗
3.2.4 生成器實驗
3.3 實驗資料集
3.3.1 Cloud Platform FAQ Dataset
3.3.2 Stanford Question Answering Dataset (SQuAD)
第四章 Results
4.1 檢索增強生成適應性主要指標性能
4.1.1 ROC Curve
4.1.2 Analysis of F1-Score and F2-Score
4.1.3 其他生成模型
4.2 輔助判斷指標性能
4.3 區分回答依據
第五章 Discussion
5.1 實驗結果與應用
5.2 大型語言模型性能影響
第六章 Conclusion
6.1 Limitation
6.1.1 相關性分數與閥值的限制
6.1.2 僅適用於大型語言模型生成器
6.1.3 實驗過程限制
6.2 Future work
6.2.1 探討提示詞的影響
6.2.2 與其他檢索增強生成架構互動的能力
6.2.3 相關性閥值與指標權重的調整
參考文獻 | - |
| dc.language.iso | zh_TW | - |
| dc.subject | 檢索增強生成適應性指標 | zh_TW |
| dc.subject | 檢索增強生成 | zh_TW |
| dc.subject | 檢索增強生成提示工程 | zh_TW |
| dc.subject | RAG Prompt Engineering | en |
| dc.subject | RAG | en |
| dc.subject | RAG Adaptability Metric | en |
| dc.title | 檢索增強生成適應性指標研究 | zh_TW |
| dc.title | RAG Adaptability Metric Study | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 陳建錦;杜志挺 | zh_TW |
| dc.contributor.oralexamcommittee | Chien-Chin Chen;Timon Du | en |
| dc.subject.keyword | 檢索增強生成,檢索增強生成適應性指標,檢索增強生成提示工程 | zh_TW |
| dc.subject.keyword | RAG,RAG Adaptability Metric,RAG Prompt Engineering | en |
| dc.relation.page | 44 | - |
| dc.identifier.doi | 10.6342/NTU202401285 | - |
| dc.rights.note | 同意授權(全球公開) | - |
| dc.date.accepted | 2024-06-28 | - |
| dc.contributor.author-college | 管理學院 | - |
| dc.contributor.author-dept | 資訊管理學系 | - |
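The abstract's core idea can be sketched in code: score the relevance between the model's self-reported supporting documents and the query, the retrieved documents, and the answer, and flag low retrieval relevance as a hallucination risk. This is a minimal illustrative sketch, not the thesis's implementation: the token-overlap cosine stands in for the embedding-based relevance the thesis uses, and the equal weighting and 0.3 threshold are assumptions for demonstration only.

```python
# Hypothetical sketch of the RAG Adaptability Metric described in the abstract.
# Real systems would compute relevance with neural embeddings; simple
# token-count cosine similarity is used here as a self-contained stand-in.
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Cosine similarity over lowercase whitespace tokens."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def adaptability_metric(query, retrieved_docs, supporting_docs, answer,
                        threshold=0.3):
    """Combine the three primary relevance signals from the abstract:
    (1) supporting docs vs. query, (2) supporting docs vs. retrieved docs,
    (3) supporting docs vs. answer. Equal weights and the 0.3 threshold
    are illustrative assumptions."""
    sup = " ".join(supporting_docs)
    ret = " ".join(retrieved_docs)
    s_q = cosine(sup, query)       # (1) supporting docs <-> query
    s_r = cosine(sup, ret)         # (2) supporting docs <-> retrieved docs
    s_a = cosine(sup, answer)      # (3) supporting docs <-> answer
    score = (s_q + s_r + s_a) / 3
    # Low relevance to the retrieved documents suggests the answer was
    # produced from memorized parameters rather than retrieval -- the
    # generative-hallucination risk the thesis aims to surface.
    risky = s_r < threshold
    return score, risky
```

When the supporting documents closely match the retrieved documents, the response is classified as grounded in retrieval; when they diverge, the metric flags a potential "self-generated response" for review.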
Appears in Collections: 資訊管理學系
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-112-2.pdf | 7.17 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
