
DSpace

The institutional repository system DSpace is dedicated to preserving digital materials of all kinds (e.g., text, images, PDF) and making them easy to access.

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101907
Full metadata record
DC field | Value | Language
dc.contributor.advisor | 莊永裕 | zh_TW
dc.contributor.advisor | Yung-Yu Chuang | en
dc.contributor.author | 許家銓 | zh_TW
dc.contributor.author | Chia-Chuan Hsu | en
dc.date.accessioned | 2026-03-05T16:39:13Z | -
dc.date.available | 2026-03-06 | -
dc.date.copyright | 2026-03-05 | -
dc.date.issued | 2026 | -
dc.date.submitted | 2026-02-05 | -
dc.identifier.citation | Q. Chen, T. Zhang, C. Wang, X. He, D. Wang, and T. Liu. Attribution analysis meets model editing: Advancing knowledge correction in vision language models with VisEdit. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 2168–2176, 2025.
S. Cheng, B. Tian, Q. Liu, X. Chen, Y. Wang, H. Chen, and N. Zhang. Can we edit multimodal large language models? In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Y. Du, K. Jiang, Z. Gao, C. Shi, Z. Zheng, S. Qi, and Q. Li. MMKE-bench: A multimodal editing benchmark for diverse visual knowledge. In International Conference on Learning Representations (ICLR), 2025.
Gemini Team. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context, 2024.
A. Gupta, S. Baskaran, and G. Anumanchipalli. Rebuilding ROME: Resolving model collapse during sequential model editing. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 21738–21744, Miami, Florida, USA, Nov. 2024. Association for Computational Linguistics.
H. Huang, H. Zhong, T. Yu, Q. Liu, S. Wu, L. Wang, and T. Tan. VLKEB: A large vision-language model knowledge editing benchmark. Advances in Neural Information Processing Systems, 37:9257–9280, 2024.
Z. Jiang, J. Chen, B. Zhu, T. Luo, Y. Shen, and X. Yang. Devils in middle layers of large vision-language models: Interpreting, detecting and mitigating object hallucinations via attention lens. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 25004–25014, 2025.
K. Kuckreja, M. S. Danish, M. Naseer, A. Das, S. Khan, and F. S. Khan. GeoChat: Grounded large vision-language model for remote sensing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27831–27840, 2024.
Y. Li, L. Wang, T. Wang, X. Yang, J. Luo, Q. Wang, Y. Deng, W. Wang, X. Sun, H. Li, et al. STAR: A first-ever dataset and a large-scale benchmark for scene graph generation in large-size satellite imagery. IEEE Trans. Pattern Anal. Mach. Intell., 47(3):1832–1849, 2025.
H. Liu, C. Li, Y. Li, and Y. J. Lee. Improved baselines with visual instruction tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 26296–26306, June 2024.
J. Luo, Z. Pang, Y. Zhang, T. Wang, L. Wang, B. Dang, J. Lao, J. Wang, J. Chen, Y. Tan, et al. SkySenseGPT: A fine-grained instruction tuning dataset and model for remote sensing vision-language understanding. arXiv preprint arXiv:2406.10100, 2024.
K. Meng, D. Bau, A. Andonian, and Y. Belinkov. Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems, 35:17359–17372, 2022.
K. Meng, A. S. Sharma, A. Andonian, Y. Belinkov, and D. Bau. Mass-editing memory in a transformer. arXiv preprint arXiv:2210.07229, 2022.
C. Pang, J. Wu, J. Li, Y. Liu, J. Sun, W. Li, X. Weng, S. Wang, L. Feng, G.-S. Xia, et al. H2RSVLM: Towards helpful and honest remote sensing large vision language model. CoRR, 2024.
Z. Shi, B. Wang, C. Si, Y. Wu, J. Kim, and H. Pfister. DualEdit: Dual editing for knowledge updating in vision-language models. In Conference on Language Modeling (COLM), 2025.
Z. Zeng, L. Gu, X. Yang, Z. Duan, Z. Shi, and M. Wang. Visual-oriented fine-grained knowledge editing for multimodal large language models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2491–2500, 2025.
Y. Zhan, Z. Xiong, and Y. Yuan. SkyEyeGPT: Unifying remote sensing vision-language tasks via instruction tuning with large language model. ISPRS Journal of Photogrammetry and Remote Sensing, 221:64–77, 2025.
J. Zhang, M. Khayatkhoei, P. Chhikara, and F. Ilievski. MLLMs know where to look: Training-free perception of small visual details with multimodal LLMs. In International Conference on Learning Representations (ICLR), 2025.
W. Zhang, M. Cai, T. Zhang, Y. Zhuang, and X. Mao. EarthGPT: A universal multimodal large language model for multisensor image comprehension in remote sensing domain. IEEE Transactions on Geoscience and Remote Sensing, 62:1–20, 2024.
-
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101907 | -
dc.description.abstract | 本論文探討在領域專屬遙測視覺問答(VQA)中,不經重新訓練的關係知識編輯。透過在 STAR 子集上的 attention-ratio 診斷分析,我們指出存在定位—推理的解耦:模型即使能關注到正確區域,仍可能產生帶偏誤的關係標籤。接著,我們將關係推理改寫為純文字情境任務並套用 ROME 式局部更新,揭示共軛干擾(conjugate interference)以及多次編輯順序對穩定性的高度敏感。最後的遷移測試顯示,語言端的語意修正難以可靠轉移到多模態 VQA 推理中,突顯跨模態泛化的關鍵限制。 | zh_TW
dc.description.abstract | This thesis studies relationship knowledge editing without retraining for domain-specific remote-sensing VQA. Using a curated STAR subset and attention-ratio diagnostics, we show a grounding–reasoning decoupling: models often attend to the correct regions yet still produce biased relationship labels. We then cast relationship reasoning as a text-only scenario task and apply ROME-style localized updates, revealing conjugate interference and strong sensitivity to multi-edit order. Finally, transfer tests indicate that language-side semantic edits do not reliably carry over to multimodal VQA inference, highlighting key limits of cross-modal generalization. | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-03-05T16:39:12Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2026-03-05T16:39:13Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Verification Letter from Oral Examination Committee i
Acknowledgements iii
摘要 v
Abstract vii
Contents ix
List of Figures xiii
List of Tables xv
Chapter 1 Introduction 1
1.1 Background and Motivation 1
1.2 Problem Statement 2
1.3 Research Approach 3
1.4 Thesis Contributions 4
1.5 Thesis Organization 5
Chapter 2 Related Work 7
2.1 Remote Sensing Datasets and Relation-Aware RS-VLMs 8
2.2 Grounding Versus Reasoning Decoupling in Multimodal LLMs 9
2.3 Model Editing and Sequential Stability 10
2.4 Multimodal Editing and Cross-Modal Transfer 11
2.5 Positioning of This Thesis 12
Chapter 3 Datasets 15
3.1 Overview of Datasets 15
3.2 Limitations of Directly Using FIT-RS and FIT-RSRC for Diagnosis 16
3.3 Curated STAR Subset for Controlled Analysis 17
Chapter 4 Diagnosis: Grounding vs. Semantics 19
4.1 Diagnostic Framing: Grounding vs. Relationship Semantics 19
4.2 Grounding Analysis via Attention Ratio 20
4.2.1 Formal Definition of Relative Attention 21
4.2.2 Calculation of Entity-Specific Attention Ratio 21
4.2.3 Final Relationship Grounding Metric 22
4.3 Interpretation of Grounding Results 22
4.4 Relationship Accuracy and Bias Analysis 24
4.5 Diagnostic Conclusion 24
Chapter 5 Method 27
5.1 Problem Formulation: Text-Only Relationship Scenarios 27
5.1.1 Text-Only Scenario Construction 27
5.1.2 Query, Answer Space, and Relationship Families 28
5.2 Target Representation and Editing Location 29
5.2.1 Validation via Causal Tracing 30
5.2.2 Target Token and Layer Selection 32
5.3 ROME-Based Editing Procedure 32
5.3.1 Editing as a Rank-One Update 33
5.3.2 Key Representation for Relationship Editing 34
5.3.3 Value Specification via Target Relationship Labels 34
5.3.4 Editing Regime and Scope Clarification 34
5.4 Multi-Case Editing Strategy 35
5.4.1 Sequential Editing per Relationship Family 36
5.4.2 Pre-Edit Evaluation and Conditional Skipping 36
5.4.3 Constraint Recalculation Across Edits 37
5.5 Scope and Methodological Boundaries 38
5.6 Summary 38
Chapter 6 Experiments 39
6.1 Experimental Setup for Text-only Evaluation 39
6.2 Baseline Bias in Text-only Relationship Reasoning 40
6.3 Method Evolution for Multi-case Editing 41
6.4 Single-sided Steering and Conjugate Interference 42
6.5 Editing Order Sensitivity and Interaction-dependent Stability 44
6.6 Replication Across Relationship Families 46
Chapter 7 Transfer: VQA Evaluation 49
7.1 Evaluation Setup for Transfer Analysis 49
7.2 Transfer Results on the Curated Relationship Subset 50
7.3 Transfer Results on FIT-RSRC 52
7.4 Interpretation and Discussion 53
Chapter 8 Conclusion 55
8.1 Conclusion 55
8.2 Limitations and Future Work 56
8.2.1 From Binary Conjugates to Complex Relational Spaces 57
8.2.2 Overcoming Multimodal Inference Inertia 57
8.2.3 Enhancing Sequential Editing Stability 57
References 59
Appendix A Causal Tracing Results of Other Relationship Families 63
-
dc.language.iso | en | -
dc.subject | 模型編輯 | -
dc.subject | 多模態學習 | -
dc.subject | 遙測影像 | -
dc.subject | 視覺問答 | -
dc.subject | 跨模態遷移 | -
dc.subject | Model Editing | -
dc.subject | Multimodal Learning | -
dc.subject | Remote Sensing | -
dc.subject | Visual Question Answering | -
dc.subject | Cross-Modal Transferability | -
dc.title | 不經重新訓練的關係知識編輯方法:以FIT-RSRC領域專屬視覺問答為例 | zh_TW
dc.title | Editing Relationship Knowledge Without Retraining: A Case Study on Domain-Specific VQA (FIT-RSRC) | en
dc.type | Thesis | -
dc.date.schoolyear | 114-1 | -
dc.description.degree | 碩士 | -
dc.contributor.oralexamcommittee | 吳賦哲;葉正聖 | zh_TW
dc.contributor.oralexamcommittee | Fu-Che Wu;Jeng-Sheng Yeh | en
dc.subject.keyword | 模型編輯,多模態學習,遙測影像,視覺問答,跨模態遷移 | zh_TW
dc.subject.keyword | Model Editing,Multimodal Learning,Remote Sensing,Visual Question Answering,Cross-Modal Transferability | en
dc.relation.page | 67 | -
dc.identifier.doi | 10.6342/NTU202600393 | -
dc.rights.note | 未授權 | -
dc.date.accepted | 2026-02-08 | -
dc.contributor.author-college | 電機資訊學院 | -
dc.contributor.author-dept | 資訊工程學系 | -
dc.date.embargo-lift | N/A | -
Appears in Collections: 資訊工程學系

Files in this item:
File | Size | Format
ntu-114-1.pdf (access restricted; not publicly available) | 1.73 MB | Adobe PDF


Except where otherwise noted in their licensing terms, all items in this system are protected by copyright, with all rights reserved.

Contact Information
No. 1, Sec. 4, Roosevelt Rd., Da'an Dist., Taipei 10617, Taiwan (R.O.C.)
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
© NTU Library All Rights Reserved