Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97913
Title: Self-Reference-Guided Calibration for Enhancing the Correlation between Generation and Judgment Capabilities in Large Language Models (基於自我參照引導校準方法提升大型語言模型生成與判斷能力關聯性之研究)
Authors: Wei-Hsiang Lin (林緯翔)
Advisor: Hsin-Hsi Chen (陳信希)
Keyword: Large Language Model, LLM-as-Judge
Publication Year: 2025
Degree: Master's
Abstract: LLM-as-Judge frameworks are increasingly popular for AI evaluation, yet research findings on the relationship between models' generation and judgment abilities remain inconsistent. We investigate this relationship through systematic dataset-level and instance-level analyses across 11 models and 21 diverse tasks. Although both capabilities rely on the same underlying knowledge, our analyses reveal that they are only weakly correlated, primarily because LLMs are sensitive to the responses being judged. To address this, we propose a self-reference-guided evaluation strategy that uses a model's own answers as references. This approach significantly strengthens the correlation between generation and judgment abilities, offering a practical way to align the two skills and, as a result, a reliable proxy for model selection in evaluation tasks.
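The two ideas in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation, since the full text is under restricted access. The prompt wording, the yes/no verdict protocol, and the names `pearson`, `build_self_reference_prompt`, and `self_reference_judge` are all assumptions made for illustration. The sketch shows (1) a dataset-level correlation between per-model generation scores and judgment scores, and (2) a judge that first answers the question itself and then uses that answer as the reference.

```python
from math import sqrt
from typing import Callable, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def build_self_reference_prompt(question: str, own_answer: str, candidate: str) -> str:
    """Hypothetical judge prompt that includes the judge's own answer as a reference."""
    return (
        f"Question: {question}\n"
        f"Reference answer (your own): {own_answer}\n"
        f"Candidate answer: {candidate}\n"
        "Is the candidate answer correct? Reply 'yes' or 'no'."
    )


def self_reference_judge(
    model: Callable[[str], str], question: str, candidate: str
) -> bool:
    """Judge a candidate answer using the model's own answer as the reference.

    `model` is any callable mapping a prompt string to a response string
    (an assumption; a real setup would wrap an LLM API call here).
    """
    own_answer = model(question)  # step 1: the model generates its own answer
    verdict = model(build_self_reference_prompt(question, own_answer, candidate))
    return verdict.strip().lower().startswith("yes")  # step 2: parse the verdict
```

Under these assumptions, passing one generation accuracy and one judgment accuracy per model to `pearson` gives a dataset-level correlation of the kind the abstract refers to. The abstract reports that this correlation is stronger when judgments are produced with the model's own answer as the reference.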
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97913
DOI: 10.6342/NTU202501902
Fulltext Rights: Authorized (open access on campus only)
Embargo Lift Date: 2030-07-15
Appears in Collections: Department of Computer Science and Information Engineering (資訊工程學系)

Files in This Item:
File: ntu-113-2.pdf (Restricted Access)
Size: 986.16 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
