Enhancing RAG Models via Clause-Graphs for the Insurance Industry

李書成; Shu-Chen Lee

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99210

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	曹承礎	zh_TW
dc.contributor.advisor	Seng-Cho Chou	en
dc.contributor.author	李書成	zh_TW
dc.contributor.author	Shu-Chen Lee	en
dc.date.accessioned	2025-08-21T16:49:24Z	-
dc.date.available	2025-08-22	-
dc.date.copyright	2025-08-21	-
dc.date.issued	2025	-
dc.date.submitted	2025-08-05	-
dc.identifier.citation	[1] A. Asai, Z. Wu, Y. Wang, A. Sil, and H. Hajishirzi. Self-rag: Learning to retrieve, generate, and critique through self-reflection. arXiv preprint arXiv:2310.11511, 2023. [2] D. Beauchemin. Quebec automobile insurance question-answering with retrieval-augmented generation. arXiv preprint arXiv:2410.09623, 2024. [3] L. Cao. Learn to refuse: Making large language models more controllable and reliable through knowledge scope limitation and refusal mechanism. arXiv preprint arXiv:2311.01041, 2023. [4] D. Edge, N. C. Ha Trinh, J. Bradley, A. M. Alex Chao, S. Truitt, D. Metropolitansky, R. O. Ness, and J. Larson. From local to global: A graph rag approach to query-focused summarization. arXiv:2404.16130, 2024. [5] S. Es, J. James, L. Espinosa-Anke, and S. Schockaert. Ragas: Automated evaluation of retrieval augmented generation. arXiv preprint arXiv:2309.15217, 2023. [6] Z. Guo, L. Xia, Y. Yu, T. Ao, and C. Huang. Lightrag: Simple and fast retrieval-augmented generation. arXiv:2410.05779, 2024. [7] Z. Ji. Towards mitigating llm hallucination via self reflection. In Findings of the EMNLP Conference, 2023. [8] Z. Ji, Z. Liu, N. Lee, T. Yu, B. Wilie, M. Zeng, and P. Fung. Rho (ρ): Reducing hallucination in open-domain dialogues with knowledge grounding. arXiv:2212.01588v2, 2023. [9] J. Li. Banishing llm hallucinations requires rethinking generalization. arXiv preprint arXiv:2406.17642, 2024. [10] B. Meskó. Prompt engineering as an important emerging skill for medical professionals: Tutorial. doi:10.2196/50638, 2023. [11] A. Neelakantan, T. Xu, R. Puri, A. Radford, J. M. Han, J. Tworek, Q. Yuan, N. Tezak, J. W. Kim, C. Hallacy, J. Heidecke, P. Shyam, B. Power, T. E. Nekoul, G. Sastry, G. Krueger, D. Schnurr, F. P. Such, K. Hsu, M. Thompson, T. Khan, T. Sherbakov, J. Jang, P. Welinder, and L. Weng. Text and code embeddings by contrastive pre-training. arXiv preprint arXiv:2201.10005, 2022. [12] H. Orgad. Llms know more than they show: On the intrinsic representation of llm hallucinations. arXiv preprint arXiv:2410.02707, 2024. [13] A. Salemi and H. Zamani. Evaluating retrieval quality in retrieval-augmented generation. In Proceedings of the ACM SIGIR Conference, 2024. [14] J. Singh. How rag models are revolutionizing question-answering systems: Advancing healthcare, legal, and customer support domains. Data and AI Research Journal, 2024. [15] S. T. I. Tonmoy, S. M. M. Zaman, V. Jain, A. Rani, V. Rawte, A. Chadha, and A. Das. A comprehensive survey of hallucination mitigation techniques in large language models. arXiv:2401.01313v1, 2024. [16] J. White, Q. Fu, S. Hays, M. Sandborn, C. Olea, H. Gilbert, A. Elnashar, J. Spencer-Smith, and D. C. Schmidt. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382, 2023. [17] H. Xu, Z. Zhu, S. Zhang, D. Ma, S. Fan, L. Chen, and K. Yu. Rejection improves reliability: Training llms to refuse unknown questions using rl from knowledge feedback. arXiv preprint arXiv:2403.18349, 2024.	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99210	-
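dc.description.abstract	Although Retrieval-Augmented Generation (RAG) effectively improves the ability of large language models to handle domain-specific knowledge, conventional flat retrieval often yields insufficient contextual understanding, and therefore fragmented answers, when applied to professional documents with strong internal cross-references such as insurance policy clauses. To address this problem, this study proposes and implements a new model framework named Clause-RAG. The core of the framework is to parse the logical relationships among the clauses of a policy and build a graph of those clauses. The graph captures and expresses the structured relationships among clauses, so that the model can integrate more complete contextual information at the retrieval stage and, at the answer generation stage, draw on a deeper contextual understanding to produce more comprehensive and precise answers. Through experiments on real life insurance policy documents, this study rigorously compares the Clause-RAG model with a standard RAG model. We designed two question sets of different difficulty, "single-point fact retrieval" and "multi-point logical reasoning", and had human experts evaluate the answers on three dimensions: comprehensiveness, diversity, and empowerment. The results clearly show that the Clause-RAG model significantly outperforms the standard RAG model overall. The advantage is especially pronounced on complex reasoning questions that require integrating information from multiple clauses, and in particular in the "diversity" of the answers, where Clause-RAG performs clearly better than the standard RAG model. The study also finds that Clause-RAG provides higher-quality answers even for basic single-point fact queries. Finally, the study sets out the limitations of the Clause-RAG architecture and directions for future research, with the aim of taking the work further. The main contribution of this study is to validate that enhancing contextual understanding with a graph structure is an effective approach to complex question answering in professional domains, providing concrete empirical evidence and direction for application development in the life insurance industry and related fields.	zh_TW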
dc.description.abstract	While Retrieval-Augmented Generation (RAG) effectively enhances the ability of large language models to process domain-specific knowledge, traditional flat retrieval methods often lead to insufficient contextual understanding and fragmented answers when applied to highly interrelated professional documents such as insurance policies. To address this issue, this study proposes and implements a novel model framework named Clause-RAG. The core of this framework lies in parsing the logical relationships between policy clauses and constructing a knowledge graph of these clauses. This graph captures and expresses the structured relationships between clauses, enabling the model to integrate more complete contextual information during the retrieval phase. Consequently, during the answer generation phase, the model has a deeper contextual understanding and can produce more comprehensive and precise answers. Through experiments conducted on authentic life insurance policy documents, this study performs a rigorous comparison between the Clause-RAG model and a standard RAG model. We designed two question sets of varying difficulty: "single-fact retrieval" and "multi-hop logical reasoning." The models were evaluated by human experts across three dimensions: comprehensiveness, diversity, and empowerment. The experimental results clearly indicate that the Clause-RAG model significantly outperforms the standard RAG model in overall performance. The advantage is most pronounced on complex reasoning problems that require integrating information from multiple clauses, and especially in the "diversity" of the answers, where Clause-RAG performed markedly better than the standard RAG model. The study also found that even for basic single-fact queries, Clause-RAG can provide higher-quality answers. The main contribution of this study is the validation that enhancing contextual understanding through a graph structure is an effective approach for solving complex question-answering tasks in professional domains. This provides concrete empirical evidence and a clear direction for the development of next-generation intelligent question-answering systems.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-21T16:49:24Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2025-08-21T16:49:24Z (GMT). No. of bitstreams: 0	en
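dc.description.tableofcontents	Acknowledgements i; Abstract (Chinese) iii; Abstract v; Contents vii; List of Figures x; List of Tables xi; Chapter 1 Introduction 1; 1.1 Research Background 1; 1.2 Research Motivation 5; 1.3 Expected Research Outcomes 7; Chapter 2 Related Work 10; 2.1 Retrieval Augmented Generation 10; 2.2 Self-RAG 11; 2.3 Large Language Model Hallucination 12; 2.4 Prompt Engineering 13; 2.5 Embeddings 14; 2.6 Refuse to Answer 15; 2.7 Learn to Refuse (L2R) 16; 2.8 Retrieval-Augmented Generation Evaluation 16; 2.9 RAG with Graph 17; Chapter 3 Methodology 20; 3.1 Dataset 20; 3.1.1 Description of the Policy Clause Data 20; 3.1.2 Preprocessing of the Policy Clause Data 20; 3.1.2.1 Data Cleaning 21; 3.1.2.2 Structuring 21; 3.1.2.3 Data Storage and Management 22; 3.1.2.4 Graph Construction from Policy Clauses 22; 3.1.3 Question Set Design 23; 3.1.3.1 Question Generation 23; 3.1.3.2 Answer Annotation for the Question Sets 24; 3.2 Model Design 24; 3.2.1 Offline Preprocessing 24; 3.2.2 Online Query Processing 26; 3.3 Experimental Design 28; 3.3.1 Research Objectives and Hypotheses 29; 3.3.2 Experimental Models 29; 3.3.3 Evaluation of Experimental Results 30; Chapter 4 Result 31; 4.1 Experimental Setup 31; 4.1.1 Evaluation Criteria 31; 4.1.2 Model Configuration 32; 4.2 Performance Comparison of the Standard RAG and Clause-RAG Models 33; 4.2.1 Evaluation Results on the v3 Question Set 33; 4.2.2 Evaluation Results on the v4 Question Set 34; 4.3 Overall Analysis 36; Chapter 5 Discussion 37; 5.1 Research Limitations 37; 5.1.1 Domain Limitations 37; 5.1.2 Data Scale Limitations 38; 5.1.3 Evaluation Method Limitations 38; 5.1.4 Model and Technical Limitations 38; 5.2 Future Research Directions 39; 5.2.1 Extending to Other Application Domains 39; 5.2.2 Automated Evaluation with Large Language Models 39; 5.2.3 Optimizing the Retrieval Stage 41; Chapter 6 Conclusion 43; References 45	-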
dc.language.iso	zh_TW	-
dc.subject	檢索增強生成模型評估指標	zh_TW
dc.subject	壽險產業	zh_TW
dc.subject	條款圖譜	zh_TW
dc.subject	檢索增強生成	zh_TW
dc.subject	人工智慧應用	zh_TW
dc.subject	Clause-Graph	en
dc.subject	Life Insurance Industry	en
dc.subject	Artificial Intelligence Applications	en
dc.subject	Evaluation Metrics for RAG Models	en
dc.subject	Retrieval-Augmented Generation	en
dc.title	條款圖譜優化檢索增強生成模型：以壽險產業為例	zh_TW
dc.title	Enhancing RAG Models via Clause-Graphs for the Insurance Industry	en
dc.type	Thesis	-
dc.date.schoolyear	113-2	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	陳建錦;杜志挺	zh_TW
dc.contributor.oralexamcommittee	Chien-Chin Chen;Chih-Ting Du	en
dc.subject.keyword	檢索增強生成,檢索增強生成模型評估指標,人工智慧應用,壽險產業,條款圖譜	zh_TW
dc.subject.keyword	Retrieval-Augmented Generation,Evaluation Metrics for RAG Models,Artificial Intelligence Applications,Life Insurance Industry,Clause-Graph	en
dc.relation.page	47	-
dc.identifier.doi	10.6342/NTU202503232	-
dc.rights.note	Authorization granted (access restricted to campus)	-
dc.date.accepted	2025-08-07	-
dc.contributor.author-college	College of Management	-
dc.contributor.author-dept	Department of Information Management	-
dc.date.embargo-lift	2025-08-22	-
Appears in Collections:	Department of Information Management

Files in This Item:

File	Size	Format
ntu-113-2.pdf Access restricted to NTU campus IP addresses (from off campus, please use the VPN remote access service)	2.35 MB	Adobe PDF

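Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.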