NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98332
Title: 基於知識診斷之動態大型語言模型編輯框架
An Adaptive Editing Framework for Large Language Models based on Knowledge Diagnostics
Authors: 潘淙軒
Tsung-Hsuan Pan
Advisor: 陳信希
Hsin-Hsi Chen
Keyword: Large Language Model, Model Editing
Publication Year: 2025
Degree: Master's
Abstract: Training large language models (LLMs) from scratch is an expensive endeavor, particularly as world knowledge continually evolves. To maintain the relevance and accuracy of LLMs, model editing has emerged as a pivotal research area. While these methods hold promise, they can also produce unintended side effects, and the influence of the target knowledge's characteristics on editing outcomes remains largely underexplored.

This thesis delves into these critical factors by systematically analyzing how the nature of knowledge (specifically its popularity, the model's prior familiarity with it, and the question type used to elicit it) affects the success and stability of model editing. Using the RealTimeQA dataset with representative editing methods on LLaMA-3 models, our experiments reveal several key findings. We demonstrate that edits involving popular or previously unknown knowledge are significantly more successful, and that question type has a profound impact on both editing efficacy and the severity of side effects, with "Why" and "Which" questions in particular exhibiting sharply different editing difficulty.
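As a rough illustration of the familiarity dimension above, the "known vs. unknown" distinction can be approximated by probing the model before editing. The Python sketch below is a minimal illustration, not the thesis's protocol: the model checkpoint and the substring-match heuristic are assumptions.

    # Sketch: probe whether the pre-edit model already "knows" a fact, as a proxy
    # for the familiarity dimension analyzed above. The probe heuristic (substring
    # match on a greedy completion) and the model checkpoint are assumptions.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def is_familiar(model, tokenizer, question: str, old_answer: str) -> bool:
        """Treat the fact as 'known' if the unedited model already emits the old answer."""
        inputs = tokenizer(question, return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
        new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
        completion = tokenizer.decode(new_tokens, skip_special_tokens=True)
        return old_answer.lower() in completion.lower()

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
    print(is_familiar(model, tokenizer, "The capital of France is", "Paris"))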

Furthermore, while state-of-the-art methods like AlphaEdit exhibit perfect stability on standard question-answering benchmarks, we identify a novel and critical failure mode in more complex, long-form generative tasks, which we term "Generative Aphasia". This phenomenon, characterized by disruptions in coherence and fluency despite accurate factual recall, reveals that conventional evaluation metrics are insufficient.
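Because QA accuracy alone cannot surface this failure, a long-form check is needed. The sketch below is one illustrative proxy, not the thesis's evaluation protocol: it scores generations for degenerate n-gram repetition, one symptom of the incoherent or repetitive output described above.

    # Sketch: a crude repetition score for long-form generations. A sharp rise in
    # this score after editing is one symptom of "Generative Aphasia"; the metric
    # is an illustrative assumption, not the thesis's evaluation protocol.
    def ngram_repetition_rate(text: str, n: int = 3) -> float:
        """Fraction of duplicated n-grams; higher means more repetitive text."""
        tokens = text.split()
        if len(tokens) < n + 1:
            return 0.0
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        return 1.0 - len(set(ngrams)) / len(ngrams)

    fluent = "The edited model explains the new fact, then moves on to fresh details."
    aphasic = "The new fact is the new fact is the new fact is the new fact."
    print(ngram_repetition_rate(fluent))   # low: varied n-grams
    print(ngram_repetition_rate(aphasic))  # high: collapsed, repetitive n-grams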

Based on our findings, we propose and validate a Knowledge-Diagnostic framework that adapts the editing strategy to the difficulty of the target knowledge, significantly improving performance in challenging cases. Our work underscores the necessity of a more nuanced, context-aware approach to model editing and calls for the adoption of more comprehensive evaluation protocols that include generative tasks to ensure truly robust and reliable knowledge integration.
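Read as pseudocode, such a framework is a diagnose-then-route loop: classify the target fact along the dimensions above, then pick an editing strategy matched to its predicted difficulty. The sketch below is a minimal illustration under assumed names; the strategy labels and the routing rule are placeholders, not the thesis's implementation.

    # Sketch of a knowledge-diagnostic editing loop: diagnose the target fact along
    # the dimensions studied above, then route it to an editing strategy. All
    # strategy names and the routing rule are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class EditRequest:
        question: str     # prompt eliciting the fact, e.g. "Why did ...?"
        new_answer: str   # updated fact to inject
        popular: bool     # external popularity signal (e.g. page-view counts)
        familiar: bool    # pre-edit probe result (see the familiarity sketch above)

    def question_type(question: str) -> str:
        """Crude wh-word classifier for the question-type dimension."""
        first = question.strip().split()[0].lower()
        wh = {"who", "what", "when", "where", "why", "which", "how"}
        return first if first in wh else "other"

    def choose_strategy(req: EditRequest) -> str:
        """Send predicted-hard cases (unpopular, already-known facts, 'why'
        questions) to a heavier pipeline; easy cases get a single-pass edit."""
        hard = (not req.popular) or req.familiar or question_type(req.question) == "why"
        return "iterative_edit_with_verification" if hard else "single_pass_edit"

    req = EditRequest("Why was the launch delayed?", "supply issues",
                      popular=False, familiar=True)
    print(choose_strategy(req))  # -> iterative_edit_with_verification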
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98332
DOI: 10.6342/NTU202502519
Fulltext Rights: Not authorized (未授權)
Embargo Lift Date: N/A
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in This Item:
File: ntu-113-2.pdf (Restricted Access)
Size: 1.76 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
