Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101846

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 許永真 | zh_TW |
| dc.contributor.advisor | Yung-Jen Hsu | en |
| dc.contributor.author | 李泓賢 | zh_TW |
| dc.contributor.author | Hong-Sian Li | en |
| dc.date.accessioned | 2026-03-05T16:07:38Z | - |
| dc.date.available | 2026-03-06 | - |
| dc.date.copyright | 2026-03-05 | - |
| dc.date.issued | 2026 | - |
| dc.date.submitted | 2026-02-03 | - |
| dc.identifier.citation | [1] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V. Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022.
[2] Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems, 36:11809–11822, 2023.
[3] Tao Lei, Regina Barzilay, and Tommi Jaakkola. Rationalizing neural predictions. arXiv preprint arXiv:1606.04155, 2016.
[4] Mahdi Dhaini, Juraj Vladika, Ege Erdogan, Zineb Attaoui, and Gjergji Kasneci. Can LLM-generated textual explanations enhance model classification performance? An empirical study. In International Conference on Artificial Neural Networks, pages 192–204. Springer, 2025.
[5] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016.
[6] Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong, and Richard Socher. Explain yourself! Leveraging language models for commonsense reasoning. arXiv preprint arXiv:1906.02361, 2019.
[7] Faeze Brahman, Vered Shwartz, Rachel Rudinger, and Yejin Choi. Learning to rationalize for nonmonotonic reasoning with distant supervision. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 12592–12601, 2021.
[8] Sarah Wiegreffe and Ana Marasović. Teach me to explain: A review of datasets for explainable natural language processing. arXiv preprint arXiv:2102.12060, 2021.
[9] Yonatan Belinkov and James Glass. Analysis methods in neural language processing: A survey. Transactions of the Association for Computational Linguistics, 7:49–72, 2019.
[10] Sawan Kumar and Partha Talukdar. NILE: Natural language inference with faithful natural language explanations. arXiv preprint arXiv:2005.12116, 2020.
[11] Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 610–623, 2021.
[12] Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, and Phil Blunsom. e-SNLI: Natural language inference with natural language explanations. Advances in Neural Information Processing Systems, 31, 2018.
[13] Miruna Clinciu, Arash Eshghi, and Helen Hastie. A study of automatic metrics for the evaluation of natural language explanations. arXiv preprint arXiv:2103.08545, 2021.
[14] Hanjie Chen, Faeze Brahman, Xiang Ren, Yangfeng Ji, Yejin Choi, and Swabha Swayamdipta. REV: Information-theoretic evaluation of free-text rationales. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2007–2030, 2023.
[15] Ivan Aslanov and Ernesto Guerra. Tautological formal explanations: Does prior knowledge affect their satisfiability? Frontiers in Psychology, 14:1258985, 2023.
[16] Peter Hase, Shiyue Zhang, Harry Xie, and Mohit Bansal. Leakage-adjusted simulatability: Can models generate non-trivial explanations of their behavior in natural language? arXiv preprint arXiv:2010.04119, 2020.
[17] Zheng Ping Jiang, Yining Lu, Hanjie Chen, Daniel Khashabi, Benjamin Van Durme, and Anqi Liu. RORA: Robust free-text rationale evaluation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1070–1087, 2024.
[18] Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36:46595–46623, 2023.
[19] Ziang Li, Manasi Ganti, Zixian Ma, Helena Vasconcelos, Qijia He, and Ranjay Krishna. Rethinking human preference evaluation of LLM rationales. arXiv preprint arXiv:2509.11026, 2025.
[20] Olga Golovneva, Moya Chen, Spencer Poff, Martin Corredor, Luke Zettlemoyer, Maryam Fazel-Zarandi, and Asli Celikyilmaz. ROSCOE: A suite of metrics for scoring step-by-step reasoning. arXiv preprint arXiv:2212.07919, 2022.
[21] Archiki Prasad, Swarnadeep Saha, Xiang Zhou, and Mohit Bansal. ReCEval: Evaluating reasoning chains via correctness and informativeness. arXiv preprint arXiv:2304.10703, 2023.
[22] Sahana Ramnath, Brihi Joshi, Skyler Hallinan, Ximing Lu, Liunian Harold Li, Aaron Chan, Jack Hessel, Yejin Choi, and Xiang Ren. Tailoring self-rationalizers with multi-reward distillation. arXiv preprint arXiv:2311.02805, 2023.
[23] Sarah Wiegreffe, Ana Marasović, and Noah A. Smith. Measuring association between labels and free-text rationales. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 10266–10284, 2021.
[24] Kawin Ethayarajh, Yejin Choi, and Swabha Swayamdipta. Understanding dataset difficulty with V-usable information. In International Conference on Machine Learning, pages 5988–6008. PMLR, 2022.
[25] Yilun Xu, Shengjia Zhao, Jiaming Song, Russell Stewart, and Stefano Ermon. A theory of usable information under computational constraints. arXiv preprint arXiv:2002.10689, 2020.
[26] John Hewitt, Kawin Ethayarajh, Percy Liang, and Christopher D. Manning. Conditional probing: Measuring usable information beyond a baseline. arXiv preprint arXiv:2109.09234, 2021.
[27] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
[28] Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. Invariant risk minimization. arXiv preprint arXiv:1907.02893, 2019.
[29] Dorottya Demszky, Kelvin Guu, and Percy Liang. Transforming question answering datasets into natural language inference datasets. arXiv preprint arXiv:1809.02922, 2018.
[30] Jifan Chen, Eunsol Choi, and Greg Durrett. Can NLI models verify QA systems' predictions? arXiv preprint arXiv:2104.08731, 2021.
[31] Shourya Aggarwal, Divyanshu Mandowara, Vishwajeet Agrawal, Dinesh Khandelwal, Parag Singla, and Dinesh Garg. Explanations for CommonsenseQA: New dataset and models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3050–3065, 2021.
[32] Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. CommonsenseQA: A question answering challenge targeting commonsense knowledge. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4149–4158, 2019.
[33] Samuel Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 632–642, 2015. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101846 | - |
| dc.description.abstract | 隨著大型語言模型在各類推理任務中廣泛採用,模型所產生之自然語言推理(rationale)已成為輔助決策與提升可解釋性的重要工具。然而,如何客觀評估推理文本本身是否真正提供額外、且與標籤相關的有用資訊,仍是一項具挑戰性的研究問題。近期研究提出以條件 V-資訊(Conditional V-information)為基礎的推理評估方法,在給定基準輸入(baseline input)的條件下,比較模型在有無推理文本時的預測行為,以估計推理相對於基準所帶來的可用資訊量。然而,在實務資料集中,此類基準輸入本身往往已包含高度的標籤洩漏(label leakage),使得評估模型可能過度依賴基準訊號,進而錯估推理品質,降低評估結果的可靠性。
本論文提出 LAREV(Leakage-Aware Rationale Evaluation with Conditional V-information),一個具洩漏感知能力的推理評估框架,旨在提升條件式推理評估在基準輸入有洩漏的情境下之穩健性。LAREV 在不改變原有評估指標形式的前提下,透過兩項關鍵設計約束評估模型的學習行為:其一,利用不變風險最小化(Invariant Risk Minimization, IRM)降低模型對基準輸入中特定線索的依賴;其二,引入洩漏探測模型以量化並懲罰評估模型從基準輸入中提取標籤相關資訊的能力,從而引導模型將預測改善歸因於推理文本本身。 實驗結果顯示,在 ECQA 與 e-SNLI 等具代表性的推理資料集中,LAREV 能有效拉大高品質推理與低品質推理之間的評估差距,並展現出較既有方法更穩定且具辨識力的排序行為。此外,分析結果亦顯示,LAREV 在降低由基準輸入中洩漏的依賴性之同時,仍能維持具競爭力的預測表現,驗證其作為一種可靠推理評估框架的實用性。 | zh_TW |
| dc.description.abstract | With the widespread adoption of large language models in reasoning tasks, natural language rationales generated by models have become an important tool for supporting decision-making and improving interpretability. However, objectively evaluating whether a rationale truly provides additional, label-relevant information remains a challenging research problem. Recent work has proposed rationale evaluation methods based on Conditional V-information, which estimate the usable information contributed by a rationale by comparing model predictions with and without access to the rationale, conditioned on a given baseline input. In practice, however, such baseline inputs often already contain substantial label leakage, causing evaluation models to over-rely on baseline signals and systematically misestimate rationale quality, thereby undermining the reliability of evaluation results.
In this thesis, we propose LAREV (Leakage-Aware Rationale Evaluation with Conditional V-information), a leakage-aware rationale evaluation framework designed to improve the robustness of baseline-conditioned evaluation under baseline leakage. Without altering the original evaluation metric, LAREV introduces two key training constraints on the evaluator model. First, it applies Invariant Risk Minimization (IRM) to reduce the model's reliance on spurious, baseline-specific cues. Second, it incorporates a leakage probing model to quantify and penalize the evaluator's ability to extract label-relevant information from the baseline input, thereby encouraging the model to attribute predictive improvements to the rationale itself. Experimental results on representative reasoning benchmarks, including ECQA and e-SNLI, demonstrate that LAREV effectively enlarges the evaluation gap between high-quality and low-quality rationales, and exhibits more stable and discriminative ranking behavior than existing methods. Further analysis shows that LAREV reduces dependence on leaked baseline information while maintaining competitive predictive performance, validating its effectiveness as a reliable framework for rationale evaluation. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-03-05T16:07:38Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2026-03-05T16:07:38Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 口試委員審定書 i
Acknowledgements ii
摘要 iii
Abstract iv
Contents vi
List of Figures ix
List of Tables x
Denotation xii
Chapter 1 Introduction 1
1.1 Background 1
1.2 Motivation 3
1.3 Proposed Method 4
1.4 Outline of Thesis 5
Chapter 2 Related Work 7
2.1 Preference-Based Evaluation and LLM-as-a-Judge 7
2.2 Rationale-Specific Automatic Metrics for Reasoning Chains 8
2.3 Behavior-Based Rationale Evaluation 9
2.4 Information-Theoretic Rationale Evaluation 9
Chapter 3 Problem Definition 11
3.1 Baseline-Conditioned Evaluation via Conditional V-information 11
3.2 Formal Setup (REV-Specific) 12
3.3 Failure Mode: Label Leakage in Baseline Inputs 13
3.4 Problem Statement 14
Chapter 4 Methodology 15
4.1 Overview of the LAREV Framework 15
4.2 Background and Preliminaries 16
4.2.1 REV 16
4.2.2 Integrated Gradients 17
4.2.3 Invariant Risk Minimization 18
4.3 Design Intuition 19
4.4 Baseline Input Construction 19
4.5 Leakage-Aware Training of the Regular Model 21
4.5.1 Model Roles and Training Setup 21
4.5.2 IRM-Based Control of Baseline Leakage 22
4.5.3 Leakage Probe for Residual Baseline Leakage 25
4.5.4 Leakage Probe Objective 28
4.6 Combined Leakage-Aware Training Objective 29
4.6.1 Unified Training Objective 29
4.6.2 Complementary Roles of IRM and Leakage Probing 30
Chapter 5 Experiments 32
5.1 Datasets 32
5.1.1 Open-Label Dataset: ECQA 32
5.1.2 Fixed-Label Dataset: e-SNLI 33
5.1.3 Dataset Splits and Statistics 33
5.2 Evaluation Design 34
5.2.1 Rationale Variants 35
5.3 Experimental Setup 36
5.3.1 Model Roles and Training Pipeline 37
5.3.2 IRM Environment Instantiation 38
5.3.3 Implementation Details and Hyperparameters 39
5.4 Results 40
5.4.1 Rationale Evaluation on ECQA 40
5.4.2 Component-wise Analysis of Regularization Effects 43
5.4.3 Generalization of the Leakage-Aware Evaluator 44
5.4.4 Hyperparameter Sensitivity 46
5.4.5 Robustness across Task Models 47
5.4.6 Rationale Evaluation on e-SNLI 48
Chapter 6 Conclusion 51
6.1 Summary of Contributions 51
6.2 Limitations 52
6.3 Future Work 53
References 54
Appendix A — Experimental Pipeline 59
Appendix B — Results with BART-large Backbone 61
B.1 ECQA Results 61
B.2 e-SNLI Results 63
Appendix C — Additional Illustrative Examples 65
C.1 Rationale Variants in e-SNLI 65
C.2 LLM Prompt for Antonym Environment Instantiation 66
C.3 Prompt for Rationale Generation from Task Models 66 | - |
| dc.language.iso | en | - |
| dc.subject | 推理評估 | - |
| dc.subject | 標籤洩漏 | - |
| dc.subject | 洩漏感知 | - |
| dc.subject | 資訊理論 | - |
| dc.subject | 條件 V-資訊 | - |
| dc.subject | 大型語言模型 | - |
| dc.subject | Rationale Evaluation | - |
| dc.subject | Label Leakage | - |
| dc.subject | Leakage-Aware Evaluation | - |
| dc.subject | Information Theory | - |
| dc.subject | Conditional V-Information | - |
| dc.subject | Large Language Models | - |
| dc.title | LAREV:具洩漏感知的推理評估方法——以條件 V-資訊為基礎 | zh_TW |
| dc.title | LAREV: Leakage-Aware Rationale Evaluation with Conditional V-information | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 114-1 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.coadvisor | 陳縕儂 | zh_TW |
| dc.contributor.coadvisor | Yun-Nung Chen | en |
| dc.contributor.oralexamcommittee | 陳信希;林英嘉 | zh_TW |
| dc.contributor.oralexamcommittee | Hsin-Hsi Chen;Ying-Jia Lin | en |
| dc.subject.keyword | 推理評估,標籤洩漏,洩漏感知,資訊理論,條件 V-資訊,大型語言模型 | zh_TW |
| dc.subject.keyword | Rationale Evaluation,Label Leakage,Leakage-Aware Evaluation,Information Theory,Conditional V-Information,Large Language Models | en |
| dc.relation.page | 66 | - |
| dc.identifier.doi | 10.6342/NTU202600527 | - |
| dc.rights.note | 同意授權(全球公開) | - |
| dc.date.accepted | 2026-02-05 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊網路與多媒體研究所 | - |
| dc.date.embargo-lift | 2026-03-06 | - |
| Appears in Collections: | 資訊網路與多媒體研究所 | |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-114-1.pdf | 1.52 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
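The abstract above scores a rationale by the conditional V-information it contributes beyond a baseline input: the evaluator's log-probability gain on the gold label once the rationale is supplied alongside the baseline. The sketch below illustrates only that pointwise score and the baseline-leakage failure mode the thesis targets; it is a minimal illustration under assumptions, not the thesis implementation, and the function name `rev_score` and the use of raw label probabilities are invented for this example.

```python
import math

def rev_score(p_with_rationale: float, p_baseline_only: float) -> float:
    """Pointwise conditional V-information estimate in nats (REV-style):
    the extra log-probability the evaluator assigns to the gold label
    when given baseline + rationale versus the baseline input alone.
    Both arguments are the evaluator's probability of the gold label."""
    return math.log(p_with_rationale) - math.log(p_baseline_only)

# A vacuous rationale leaves the evaluator's belief unchanged: score ~ 0.
vacuous = rev_score(0.30, 0.30)

# An informative rationale raises the gold-label probability: score > 0.
informative = rev_score(0.90, 0.30)

# Baseline-leakage failure mode: when the baseline already leaks the label
# (p_baseline_only is high), even a genuinely good rationale scores near 0.
# This misestimation is what LAREV's IRM and leakage-probe constraints on
# the evaluator are designed to counteract.
leaky_baseline = rev_score(0.95, 0.90)
```

Under this reading, LAREV does not change the score's form; it constrains how the evaluator is trained, so that `p_baseline_only` is not inflated by leaked label cues and improvements are attributed to the rationale itself.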
