Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101907

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 莊永裕 | zh_TW |
| dc.contributor.advisor | Yung-Yu Chuang | en |
| dc.contributor.author | 許家銓 | zh_TW |
| dc.contributor.author | Chia-Chuan Hsu | en |
| dc.date.accessioned | 2026-03-05T16:39:13Z | - |
| dc.date.available | 2026-03-06 | - |
| dc.date.copyright | 2026-03-05 | - |
| dc.date.issued | 2026 | - |
| dc.date.submitted | 2026-02-05 | - |
| dc.identifier.citation | Q. Chen, T. Zhang, C. Wang, X. He, D. Wang, and T. Liu. Attribution analysis meets model editing: Advancing knowledge correction in vision language models with VisEdit. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 2168–2176, 2025.
S. Cheng, B. Tian, Q. Liu, X. Chen, Y. Wang, H. Chen, and N. Zhang. Can we edit multimodal large language models? In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Y. Du, K. Jiang, Z. Gao, C. Shi, Z. Zheng, S. Qi, and Q. Li. MMKE-Bench: A multimodal editing benchmark for diverse visual knowledge. In International Conference on Learning Representations (ICLR), 2025.
Gemini Team. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context, 2024.
A. Gupta, S. Baskaran, and G. Anumanchipalli. Rebuilding ROME: Resolving model collapse during sequential model editing. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 21738–21744, Miami, Florida, USA, Nov. 2024. Association for Computational Linguistics.
H. Huang, H. Zhong, T. Yu, Q. Liu, S. Wu, L. Wang, and T. Tan. VLKEB: A large vision-language model knowledge editing benchmark. Advances in Neural Information Processing Systems, 37:9257–9280, 2024.
Z. Jiang, J. Chen, B. Zhu, T. Luo, Y. Shen, and X. Yang. Devils in middle layers of large vision-language models: Interpreting, detecting and mitigating object hallucinations via attention lens. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 25004–25014, 2025.
K. Kuckreja, M. S. Danish, M. Naseer, A. Das, S. Khan, and F. S. Khan. GeoChat: Grounded large vision-language model for remote sensing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27831–27840, 2024.
Y. Li, L. Wang, T. Wang, X. Yang, J. Luo, Q. Wang, Y. Deng, W. Wang, X. Sun, H. Li, et al. STAR: A first-ever dataset and a large-scale benchmark for scene graph generation in large-size satellite imagery. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(3):1832–1849, 2025.
H. Liu, C. Li, Y. Li, and Y. J. Lee. Improved baselines with visual instruction tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 26296–26306, June 2024.
J. Luo, Z. Pang, Y. Zhang, T. Wang, L. Wang, B. Dang, J. Lao, J. Wang, J. Chen, Y. Tan, et al. SkySenseGPT: A fine-grained instruction tuning dataset and model for remote sensing vision-language understanding. arXiv preprint arXiv:2406.10100, 2024.
K. Meng, D. Bau, A. Andonian, and Y. Belinkov. Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems, 35:17359–17372, 2022.
K. Meng, A. S. Sharma, A. Andonian, Y. Belinkov, and D. Bau. Mass-editing memory in a transformer. arXiv preprint arXiv:2210.07229, 2022.
C. Pang, J. Wu, J. Li, Y. Liu, J. Sun, W. Li, X. Weng, S. Wang, L. Feng, G.-S. Xia, et al. H2RSVLM: Towards helpful and honest remote sensing large vision language model. CoRR, 2024.
Z. Shi, B. Wang, C. Si, Y. Wu, J. Kim, and H. Pfister. DualEdit: Dual editing for knowledge updating in vision-language models. In Conference on Language Modeling (COLM), 2025.
Z. Zeng, L. Gu, X. Yang, Z. Duan, Z. Shi, and M. Wang. Visual-oriented fine-grained knowledge editing for multimodal large language models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2491–2500, 2025.
Y. Zhan, Z. Xiong, and Y. Yuan. SkyEyeGPT: Unifying remote sensing vision-language tasks via instruction tuning with large language model. ISPRS Journal of Photogrammetry and Remote Sensing, 221:64–77, 2025.
J. Zhang, M. Khayatkhoei, P. Chhikara, and F. Ilievski. MLLMs know where to look: Training-free perception of small visual details with multimodal LLMs. In International Conference on Learning Representations (ICLR), 2025.
W. Zhang, M. Cai, T. Zhang, Y. Zhuang, and X. Mao. EarthGPT: A universal multimodal large language model for multisensor image comprehension in remote sensing domain. IEEE Transactions on Geoscience and Remote Sensing, 62:1–20, 2024. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101907 | - |
| dc.description.abstract | 本論文探討在領域專屬遙測視覺問答(VQA)中,不經重新訓練的關係知識編輯。透過在 STAR 子集上的 attention-ratio 診斷分析,我們指出存在定位—推理的解耦:模型即使能關注到正確區域,仍可能產生帶偏誤的關係標籤。接著,我們將關係推理改寫為純文字情境任務並套用 ROME 式局部更新,揭示共軛干擾(conjugate interference)以及多次編輯順序對穩定性的高度敏感。最後的遷移測試顯示,語言端的語意修正難以可靠轉移到多模態 VQA 推理中,突顯跨模態泛化的關鍵限制。 | zh_TW |
| dc.description.abstract | This thesis studies relationship knowledge editing without retraining for domain-specific remote-sensing VQA. Using a curated STAR subset and attention-ratio diagnostics, we show a grounding–reasoning decoupling: models often attend to the correct regions yet still produce biased relationship labels. We then cast relationship reasoning as a text-only scenario task and apply ROME-style localized updates, revealing conjugate interference and strong sensitivity to multi-edit order. Finally, transfer tests indicate that language-side semantic edits do not reliably carry over to multimodal VQA inference, highlighting key limits of cross-modal generalization. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-03-05T16:39:12Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2026-03-05T16:39:13Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Verification Letter from Oral Examination Committee i
Acknowledgements iii
摘要 v
Abstract vii
Contents ix
List of Figures xiii
List of Tables xv
Chapter 1 Introduction 1
1.1 Background and Motivation 1
1.2 Problem Statement 2
1.3 Research Approach 3
1.4 Thesis Contributions 4
1.5 Thesis Organization 5
Chapter 2 Related Work 7
2.1 Remote Sensing Datasets and Relation-Aware RS-VLMs 8
2.2 Grounding Versus Reasoning Decoupling in Multimodal LLMs 9
2.3 Model Editing and Sequential Stability 10
2.4 Multimodal Editing and Cross-Modal Transfer 11
2.5 Positioning of This Thesis 12
Chapter 3 Datasets 15
3.1 Overview of Datasets 15
3.2 Limitations of Directly Using FIT-RS and FIT-RSRC for Diagnosis 16
3.3 Curated STAR Subset for Controlled Analysis 17
Chapter 4 Diagnosis: Grounding vs. Semantics 19
4.1 Diagnostic Framing: Grounding vs. Relationship Semantics 19
4.2 Grounding Analysis via Attention Ratio 20
4.2.1 Formal Definition of Relative Attention 21
4.2.2 Calculation of Entity-Specific Attention Ratio 21
4.2.3 Final Relationship Grounding Metric 22
4.3 Interpretation of Grounding Results 22
4.4 Relationship Accuracy and Bias Analysis 24
4.5 Diagnostic Conclusion 24
Chapter 5 Method 27
5.1 Problem Formulation: Text-Only Relationship Scenarios 27
5.1.1 Text-Only Scenario Construction 27
5.1.2 Query, Answer Space, and Relationship Families 28
5.2 Target Representation and Editing Location 29
5.2.1 Validation via Causal Tracing 30
5.2.2 Target Token and Layer Selection 32
5.3 ROME-Based Editing Procedure 32
5.3.1 Editing as a Rank-One Update 33
5.3.2 Key Representation for Relationship Editing 34
5.3.3 Value Specification via Target Relationship Labels 34
5.3.4 Editing Regime and Scope Clarification 34
5.4 Multi-Case Editing Strategy 35
5.4.1 Sequential Editing per Relationship Family 36
5.4.2 Pre-Edit Evaluation and Conditional Skipping 36
5.4.3 Constraint Recalculation Across Edits 37
5.5 Scope and Methodological Boundaries 38
5.6 Summary 38
Chapter 6 Experiments 39
6.1 Experimental Setup for Text-only Evaluation 39
6.2 Baseline Bias in Text-only Relationship Reasoning 40
6.3 Method Evolution for Multi-case Editing 41
6.4 Single-sided Steering and Conjugate Interference 42
6.5 Editing Order Sensitivity and Interaction-dependent Stability 44
6.6 Replication Across Relationship Families 46
Chapter 7 Transfer: VQA Evaluation 49
7.1 Evaluation Setup for Transfer Analysis 49
7.2 Transfer Results on the Curated Relationship Subset 50
7.3 Transfer Results on FIT-RSRC 52
7.4 Interpretation and Discussion 53
Chapter 8 Conclusion 55
8.1 Conclusion 55
8.2 Limitations and Future Work 56
8.2.1 From Binary Conjugates to Complex Relational Spaces 57
8.2.2 Overcoming Multimodal Inference Inertia 57
8.2.3 Enhancing Sequential Editing Stability 57
References 59
Appendix A: Causal Tracing Results of Other Relationship Families 63 | - |
| dc.language.iso | en | - |
| dc.subject | 模型編輯 | - |
| dc.subject | 多模態學習 | - |
| dc.subject | 遙測影像 | - |
| dc.subject | 視覺問答 | - |
| dc.subject | 跨模態遷移 | - |
| dc.subject | Model Editing | - |
| dc.subject | Multimodal Learning | - |
| dc.subject | Remote Sensing | - |
| dc.subject | Visual Question Answering | - |
| dc.subject | Cross-Modal Transferability | - |
| dc.title | 不經重新訓練的關係知識編輯方法:以FIT-RSRC領域專屬視覺問答為例 | zh_TW |
| dc.title | Editing Relationship Knowledge Without Retraining: A Case Study on Domain-Specific VQA (FIT-RSRC) | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 114-1 | - |
| dc.description.degree | 碩士 (Master) | - |
| dc.contributor.oralexamcommittee | 吳賦哲;葉正聖 | zh_TW |
| dc.contributor.oralexamcommittee | Fu-Che Wu;Jeng-Sheng Yeh | en |
| dc.subject.keyword | 模型編輯,多模態學習,遙測影像,視覺問答,跨模態遷移 | zh_TW |
| dc.subject.keyword | Model Editing,Multimodal Learning,Remote Sensing,Visual Question Answering,Cross-Modal Transferability | en |
| dc.relation.page | 67 | - |
| dc.identifier.doi | 10.6342/NTU202600393 | - |
| dc.rights.note | 未授權 (Not authorized) | - |
| dc.date.accepted | 2026-02-08 | - |
| dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | - |
| dc.contributor.author-dept | 資訊工程學系 (Department of Computer Science and Information Engineering) | - |
| dc.date.embargo-lift | N/A | - |
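
Two techniques named in the abstract, attention-ratio grounding diagnostics (thesis Sections 4.2.1–4.2.3) and ROME-style rank-one updates (Section 5.3.1), are not spelled out in this record. The LaTeX sketch below gives commonly used forms of both for orientation only; the symbols A, R_e, I, W, k_*, v_*, and C are assumptions of this sketch, and the thesis's exact definitions may differ.

```latex
% Illustrative sketch only; not the thesis's exact definitions.
% Attention-ratio grounding (a generic form): A(t,i) is the attention
% weight from answer token t to image token i, averaged over heads at a
% chosen layer; R_e is the set of image tokens inside entity e's region;
% I is the set of all image tokens.
\[
  \mathrm{ratio}(t, e) \;=\;
    \frac{\sum_{i \in R_e} A(t, i)}{\sum_{i \in I} A(t, i)}
\]
% ROME rank-one update (Meng et al., 2022, cited above): W is the MLP
% down-projection at the edited layer, k_* the key computed from the
% edit prompt, v_* the value optimized to make the model emit the target
% relationship label, and C an estimate of E[k k^T] over covering text.
\[
  \hat{W} \;=\; W \;+\; \Lambda\,(C^{-1} k_{*})^{\top},
  \qquad
  \Lambda \;=\; \frac{v_{*} - W k_{*}}{(C^{-1} k_{*})^{\top} k_{*}}
\]
```

The ROME form shown is the one the citation list attributes to Meng et al. (2022); whether the thesis modifies it (for example, via the constraint recalculation of Section 5.4.3) cannot be determined from this record alone.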
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-114-1.pdf (restricted, not publicly available) | 1.73 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
