Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101448

| Field | Value |
|---|---|
| Title: | 透過雙重事件為中心的資訊增強與段落檢索提升檢索增強生成中大型語言模型的事實查核能力 (Improving LLM Fact-Checking in RAG via Dual Event-Centric Information Augmentation and Passage Retrieval) |
| Authors: | 吳昇陽 Sheng-Yang Wu |
| Advisor: | 鄭卜壬 Pu-Jen Cheng |
| Keyword: | fact-checking, RAG, LLM, NLI, information augmentation, document rewriting, event-centric |
| Publication Year : | 2025 |
| Degree: | Master's (碩士) |
| Abstract: | Large Language Models (LLMs) hallucinate: even with Retrieval-Augmented Generation (RAG), a model may output statements that contradict its own retrieved references, which underscores the importance of fact-checking. We propose an information-augmentation method that rewrites both the LLM answer and the evidence documents into complete-sentence (CS) form: self-contained, event-centric sentences that make explicit both the stated and the implied information of the original context. Across multiple scenarios, we show that CS helps in two ways: (1) as a richer claim unit, it surfaces errors that other claim granularities (such as atomic facts) miss; and (2) as a one-time document rewrite, it improves the precision of passage retrieval and verification. On a 500-sample synthetic RAG fact-checking dataset, built with automated labels and human checks, CS yields consistent gains: it strengthens GPT-4o mini's fact-checking performance and lets a smaller DeBERTa-based NLI model approach GPT-level accuracy at lower cost. Whereas prior work has emphasized claim-side rewriting, our results show that rewriting either the claim to be checked or the document side improves fact-checking accuracy, and that processing the document side can substantially enhance end-to-end factuality evaluation. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101448 |
| DOI: | 10.6342/NTU202600226 |
| Fulltext Rights: | Not authorized (未授權) |
| Embargo Lift Date: | N/A |
| Appears in Collections: | 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia) |
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-114-1.pdf (Restricted Access) | 3.25 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
