Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101448

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 鄭卜壬 | zh_TW |
| dc.contributor.advisor | Pu-Jen Cheng | en |
| dc.contributor.author | 吳昇陽 | zh_TW |
| dc.contributor.author | Sheng-Yang Wu | en |
| dc.date.accessioned | 2026-02-03T16:22:04Z | - |
| dc.date.available | 2026-02-04 | - |
| dc.date.copyright | 2026-02-03 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2026-01-26 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101448 | - |
| dc.description.abstract | 大型語言模型 (LLM) 會產生幻覺,即便採用檢索增強生成 (RAG),模型仍可能輸出與其參考資料矛盾的敘述,這凸顯事實查核的重要性。本文提出一種資訊增強方法,能同時將模型回答與證據文件重寫為「完整句子」:以事件為中心、可單獨理解、並包含原本上下文中明示與暗示資訊的句子。我們在多種情境下驗證完整句子的兩大效益:(1)作為更豐富的查證單位,能發現其他粒度層級可能遺漏的錯誤;(2)重寫文件後,可提升文件段落檢索與驗證的精確度。我們生成了 500 筆基於 RAG 的事實查核資料集,並觀察到使用完整句子帶來的效能提升:強化了 GPT-4o mini 的事實查核表現,也使較小的 NLI 類模型能以更低成本逼近 GPT 的準確度。我們的研究結果證實,不論是使用完整句子改寫要檢查的主張,或是處理文件端,都能增進事實查核的準確性。 | zh_TW |
| dc.description.abstract | Large Language Models (LLMs) hallucinate, and even Retrieval-Augmented Generation (RAG) can yield statements that contradict the very references it retrieves. We propose an information-augmentation method that rewrites both the LLM answer and the evidence documents into complete-sentence (CS) form: self-contained, event-centric sentences that make explicit the information left implicit in the original context. Across multiple scenarios, CS helps in two ways: (1) as a richer claim unit that surfaces errors missed by atomic facts, and (2) as a one-time document rewrite that improves retrieval and verification accuracy. On a 500-sample synthetic RAG dataset with automated labels plus human checks, we observe consistent gains: CS boosts GPT-4o mini's fact-checking and lets a smaller DeBERTa-based NLI model approach GPT accuracy at lower cost. Whereas prior work emphasized claim-side fixes, our results show that processing the document side can also substantially enhance end-to-end factuality evaluation. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-02-03T16:22:04Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2026-02-03T16:22:04Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i
Acknowledgements iii
摘要 v
Abstract vii
Contents ix
List of Figures xiii
List of Tables xv
Chapter 1 Introduction 1
1.1 Fact-checking Scenarios 1
1.1.1 Fact-checking with evidence retrieval 2
1.1.2 Fact-checking RAG response against reference 2
1.2 Fact-checking Methods 2
1.2.1 Natural Language Inference (NLI) based 3
1.2.2 Large Language Model (LLM) based 3
1.3 Motivation and Contribution 4
Chapter 2 Related Works 5
2.1 Natural Language Inference (NLI) Models with Fact-checking 5
2.2 Large Language Models with Fact-checking 6
2.2.1 LLM Fact-checker 6
2.2.2 Enhancing LLM fact-checking 9
2.3 Fact-checking Dataset 10
Chapter 3 Methodology 13
3.1 Problem Formulation 13
3.2 Information Augmentation with Complete Sentence 16
3.2.1 Self-contained Complete Sentence 17
3.2.2 Information Augmentation in Fact-checking 18
3.2.2.1 Complete Sentence as a Claim 19
3.2.2.2 Complete Sentence for Document Paraphrasing 20
Chapter 4 Experiments 25
4.1 Dataset Construction 25
4.1.1 RAG Response Generation 26
4.1.2 Annotation Method 27
4.1.3 Human Evaluation 28
4.2 Experiment Setting 29
4.3 Main Result 32
4.4 Discussion 33
4.5 Additional Experiments 37
4.5.1 Hyperparameter Analysis: Study on Top-k 38
4.5.2 Error Analysis: Study on False Negatives 40
4.5.3 Ablation Study: Document Side Augmentation Only 41
Chapter 5 Conclusion 51
5.1 Limitations 51
5.2 Future Work 51
5.3 Conclusion 52
References 53 | - |
| dc.language.iso | en | - |
| dc.subject | 事實查核 | - |
| dc.subject | 檢索增強生成 | - |
| dc.subject | 大型語言模型 | - |
| dc.subject | 自然語言推論 | - |
| dc.subject | 資訊增強 | - |
| dc.subject | 文件改寫 | - |
| dc.subject | 事件中心 | - |
| dc.subject | fact-checking | - |
| dc.subject | RAG | - |
| dc.subject | LLM | - |
| dc.subject | NLI | - |
| dc.subject | information augmentation | - |
| dc.subject | document rewriting | - |
| dc.subject | event-centric | - |
| dc.title | 透過雙重事件為中心的資訊增強與段落檢索提升檢索增強生成中大型語言模型的事實查核能力 | zh_TW |
| dc.title | Improving LLM Fact-Checking in RAG via Dual Event-Centric Information Augmentation and Passage Retrieval | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 114-1 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 馬偉雲;蔡銘峰;劉冠廷 | zh_TW |
| dc.contributor.oralexamcommittee | Wei-Yun Ma;Ming-Feng Tsai;Guan-Ting Liu | en |
| dc.subject.keyword | 事實查核,檢索增強生成,大型語言模型,自然語言推論,資訊增強,文件改寫,事件中心 | zh_TW |
| dc.subject.keyword | fact-checking,RAG,LLM,NLI,information augmentation,document rewriting,event-centric | en |
| dc.relation.page | 60 | - |
| dc.identifier.doi | 10.6342/NTU202600226 | - |
| dc.rights.note | 未授權 | - |
| dc.date.accepted | 2026-01-27 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊網路與多媒體研究所 | - |
| dc.date.embargo-lift | N/A | - |
| Appears in Collections: | 資訊網路與多媒體研究所 |
Files in This Item:

| File | Size | Format |
|---|---|---|
| ntu-114-1.pdf (Restricted Access) | 3.25 MB | Adobe PDF |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.
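The abstract describes a pipeline of rewriting claims into complete sentences, retrieving top-k evidence passages, and verifying each claim against them. The following is a minimal illustrative sketch of that loop, not the thesis's implementation: the token-overlap retriever and the subset-based `verify` stub are hypothetical stand-ins for the actual dense retriever and NLI model, and all function names are assumptions.

```python
# Hypothetical sketch of a retrieve-then-verify fact-checking loop.
# The thesis's real components (LLM rewriting, dense retrieval, NLI
# verification) are replaced with self-contained stand-ins.

from collections import Counter


def top_k_passages(claim: str, passages: list[str], k: int = 2) -> list[str]:
    """Rank evidence passages by token overlap with the claim.

    A real system would use BM25 or a dense retriever; token overlap
    only keeps the sketch dependency-free.
    """
    claim_tokens = Counter(claim.lower().split())

    def overlap(passage: str) -> int:
        # Multiset intersection size between claim and passage tokens.
        return sum((claim_tokens & Counter(passage.lower().split())).values())

    return sorted(passages, key=overlap, reverse=True)[:k]


def verify(claim: str, evidence: list[str]) -> str:
    """Placeholder verdict: 'supported' if every claim token appears in
    the retrieved evidence, else 'unverified'. An actual implementation
    would score each (evidence, claim) pair with an NLI model
    (entail / contradict / neutral) and aggregate over the top-k.
    """
    evidence_tokens = set(" ".join(evidence).lower().split())
    claim_tokens = set(claim.lower().split())
    return "supported" if claim_tokens <= evidence_tokens else "unverified"


if __name__ == "__main__":
    passages = [
        "mount fuji is the tallest mountain in japan",
        "the eiffel tower is located in paris",
    ]
    claim = "the eiffel tower is located in paris"
    evidence = top_k_passages(claim, passages, k=1)
    print(evidence[0])
    print(verify(claim, evidence))
```

In the thesis's setting, both the claim and the passages would first be rewritten into complete-sentence form by an LLM, which is why even this simple per-claim verification loop benefits: each unit carries its own context instead of depending on surrounding sentences.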
