Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101472

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 丁建均 | zh_TW |
| dc.contributor.advisor | Jian-Jiun Ding | en |
| dc.contributor.author | 楊旻勳 | zh_TW |
| dc.contributor.author | Min-Shiun Yang | en |
| dc.date.accessioned | 2026-02-03T16:32:51Z | - |
| dc.date.available | 2026-02-04 | - |
| dc.date.copyright | 2026-02-03 | - |
| dc.date.issued | 2026 | - |
| dc.date.submitted | 2026-01-23 | - |
| dc.identifier.citation | [1] T. Brooks, A. Holynski, and A. A. Efros, “InstructPix2Pix: Learning to follow image editing instructions,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, 2023, pp. 18392–18402. doi:10.1109/CVPR52688.2023.01819.
[2] R. Hu, J. Zhang, T. Xu, J. Li, and T. Zhang, “Robust-Wide: Robust watermarking against instruction-driven image editing,” in European Conference on Computer Vision (ECCV), Milan, Italy, 2024, pp. 20–37. doi:10.1007/978-3-031-29262-0_2.
[3] J. Zhu, R. Kaplan, J. Johnson, and F. F. Li, “HiDDeN: Hiding data with deep networks,” in European Conference on Computer Vision (ECCV), Munich, Germany, 2018, pp. 657–672. doi:10.1007/978-3-030-01225-0_37.
[4] M. Tancik, B. Mildenhall, and R. Ng, “StegaStamp: Invisible hyperlinks in physical photographs,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2020, pp. 2117–2126. doi:10.1109/CVPR42600.2020.00222.
[5] H. Mareen, L. Antchougov, G. van Wallendael, and P. Lambert, “Blind deep-learning-based image watermarking robust against geometric transformations,” in IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 2024, pp. 1–2. doi:10.1109/ICCE59016.2024.10444317.
[6] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 2015, pp. 234–241. doi:10.1007/978-3-319-24574-4_28.
[7] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 770–778. doi:10.1109/CVPR.2016.90.
[8] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” in International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, 2016.
[9] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” in International Conference on Learning Representations (ICLR), Banff, Canada, 2014.
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems (NeurIPS), Lake Tahoe, NV, USA, 2012, vol. 25, pp. 1097–1105.
[11] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, “Optuna: A next-generation hyperparameter optimization framework,” in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA, 2019, pp. 2623–2631. doi:10.1145/3292500.3330701.
[12] S. Lu, Z. Zhou, J. Lu, Y. Zhu, and A. W. K. Kong, “Robust watermarking using generative priors against image editing: From benchmarking to advances,” in International Conference on Learning Representations (ICLR), 2025. (arXiv preprint: arXiv:2410.18775).
[13] H. Fu, Y. Luo, K. Qiao, M. Zhang, and J. Zhu, “ShallowDiffuse: Robust and invisible watermarking through low-dimensional subspaces in diffusion models,” in Advances in Neural Information Processing Systems (NeurIPS), 2025.
[14] X. Zhang, R. Li, J. Yu, Y. Xu, W. Li, and J. Zhang, “EditGuard: Versatile image watermarking for tamper localization and copyright protection,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2024, pp. 11964–11974. doi:10.1109/CVPR56392.2024.01178.
[15] J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, “Algorithms for hyper-parameter optimization,” in Advances in Neural Information Processing Systems (NeurIPS), Granada, Spain, 2011, vol. 24, pp. 2546–2554.
[16] I. Loshchilov and F. Hutter, “SGDR: Stochastic gradient descent with warm restarts,” in Proc. International Conference on Learning Representations (ICLR), Toulon, France, 2017.
[17] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004. doi:10.1109/TIP.2003.819861.
[18] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 2018, pp. 586–595. doi:10.1109/CVPR.2018.00068.
[19] A. Hore and D. Ziou, “Image quality metrics: PSNR vs. SSIM,” in International Conference on Pattern Recognition (ICPR), Istanbul, Turkey, 2010, pp. 2366–2369. doi:10.1109/ICPR.2010.579.
[20] K. Yang, J. Y. Koh, W. Yu, D. Fried, R. Krishna, and D. Klein, “MagicBrush: A manually annotated dataset for instruction-guided image editing,” in Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 2023.
[21] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1355, Oct. 2010. doi:10.1109/TKDE.2009.191.
[22] Z. Jia, H. Fang, and W. Zhang, “MBRS: Enhancing robustness of DNN-based watermarking by mini-batch of real and simulated JPEG compression,” in Proc. ACM MM, 2021.
[23] M. Buehler, K. Peterson, et al., “RoSteALS: Robust steganography using autoencoder latent space,” in CVPR Workshops, 2023.
[24] P. Fernandez et al., “The Stable Signature: Rooting watermarks in latent diffusion models,” in ICCV, 2023.
[25] R. G. van Schyndel, A. Z. Tirkel, and C. F. Osborne, “A digital watermark,” in Proc. ICIP, 1994. (LSB)
[26] I. J. Cox, J. Kilian, F. T. Leighton, and T. Shamoon, “Secure spread spectrum watermarking for multimedia,” IEEE Transactions on Image Processing, vol. 6, no. 12, 1997. (Spread spectrum, DCT)
[27] X.-G. Xia, C. G. Boncelet, and G. R. Arce, “Wavelet transform based watermark for digital images,” Optics Express, vol. 2, no. 12, 1997. (DWT)
[28] J. J. K. O. Ruanaidh and T. Pun, “Rotation, scale and translation invariant spread spectrum digital image watermarking,” Signal Processing, vol. 66, no. 3, 1998. (DFT/geometric)
[29] R. Liu and T. Tan, “An SVD-based watermarking scheme for protecting rightful ownership,” IEEE Transactions on Multimedia, 2002. (SVD/DWT hybrid)
[30] J. Jing et al., “HiNet: Deep image hiding by invertible network,” in ICCV, 2021. (INN)
[31] X. Guan et al., “DeepMIH: Deep invertible network for multiple image hiding,” in TPAMI, 2022. (INN)
[32] J. Zhu et al., “Learning to watermarking with reinforcement learning,” in ACM Multimedia, 2019. (CIN, RL optimization)
[33] Y. Wen et al., “Tree-Ring Watermarks: Fingerprints for diffusion images that are invisible and robust,” in NeurIPS, 2023. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101472 | - |
| dc.description.abstract | With the rapid development of generative artificial intelligence, deep learning models have achieved breakthroughs in image generation and editing. The spread of these techniques, however, brings new challenges, particularly for copyright protection and authenticity verification of digital content. Traditional watermarking techniques often cannot withstand deep-learning-based image editing attacks, so more robust watermark embedding and detection methods are urgently needed.
This thesis proposes a deep-learning-based adversarial watermarking technique that embeds an invisible watermark into an image and can still detect and extract it after various attacks. The method improves on existing work while retaining the core idea of the PIDSG (Perceptual Image Distortion and Semantic Guidance) module, which balances image quality against robustness, particularly against instruction-driven image editing. The core innovations are as follows. First, we adopt a two-stage adversarial training strategy: the first stage focuses on the visual quality of watermark embedding to ensure invisibility, while the second stage introduces diverse simulated noise attacks to strengthen robustness. This staged approach balances imperceptibility and attack resistance; the entire pipeline is fully differentiable and supports end-to-end gradient optimization, so embedding and detection are trained jointly. Second, we use an Optuna_adversarial_search procedure based on Bayesian optimization (Optuna) to automatically search for the best hyperparameter combination. It adopts a multi-objective strategy that jointly considers robustness, visual quality, and detection accuracy, and integrates simulations of JPEG compression, Gaussian noise, rotation, cropping, blurring, and other attacks, effectively balancing robustness across spatial-domain attacks. Experimental results show that the method resists multiple attacks while preserving high visual quality, keeping the average bit error rate below 15%. In particular, it exhibits excellent robustness against instruction-driven image editing attacks, providing an effective technical solution for copyright protection of digital content. The main contributions are: (1) a novel two-stage adversarial training strategy that balances watermark invisibility and robustness; (2) a comprehensive attack-resistance mechanism covering both modern AI image editing and traditional spatial-domain attacks; (3) a hyperparameter optimization system that improves practicality and scalability; and (4) retention of the core advantages of the PIDSG module, with substantial technical improvements built on top of it. This work offers a significant advance for digital content protection, with broad applications in digital copyright protection, content authentication, and provenance tracing. Future work will extend to video watermarking, 3D content protection, and other scenarios. | zh_TW |
| dc.description.abstract | With the rapid development of generative artificial intelligence, instruction-driven image editing based on diffusion models has revolutionized content creation. While offering unprecedented flexibility, this advancement complicates copyright protection, making digital watermarking an essential tool for content authentication. In the past, watermarking schemes were primarily designed to resist traditional geometric distortions such as rotation and scaling. Recently, learning-based methods have emerged to tackle semantic-level transformations introduced by AI editing. However, existing approaches typically excel in only one domain, failing to maintain robustness against both semantic edits and geometric attacks simultaneously. We propose a novel watermarking framework that bridges this gap, achieving dual robustness against both attack types while outperforming current standards.
First, we introduce a two-stage training strategy to address the inherent trade-off between image quality and watermark robustness. Unlike conventional single-stage methods, we decouple the learning process: the first stage focuses exclusively on high-fidelity watermark embedding to establish stable patterns, while the second stage incorporates a configurable noise layer to simulate diverse attacks. This separation allows the model to learn resilient features without compromising visual quality, effectively solving the optimization conflicts found in previous works. Second, the overall model architecture integrates a U-Net-based encoder and a ResNet-based decoder, optimized through a unique correlation-guided optimization framework. To manage the computational complexity of training against numerous diverse attacks, we propose an attack correlation analysis matrix. This matrix quantifies the relationships between a small subset of training attacks and a broader range of evaluation attacks. By leveraging these correlations within a Bayesian optimization loop, we enable efficient knowledge transfer, allowing the model to generalize well to unseen distortions without exhaustive training on every specific attack type. Finally, experimental evaluations on the MagicBrush dataset demonstrate the effectiveness of our approach. Our model maintains a Bit Error Rate (BER) below 0.07 across all test scenarios, effectively handling both instruction-driven editing and traditional geometric perturbations. In comparison to existing state-of-the-art methods, our approach achieves a significantly lower BER standard deviation, providing a more balanced and reliable robustness profile for practical watermarking applications. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-02-03T16:32:51Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2026-02-03T16:32:51Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements i
Chinese Abstract ii
ABSTRACT iv
CONTENTS vi
LIST OF FIGURES ix
LIST OF TABLES x
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Primary Contributions 2
1.2.1 Two-Stage Training Strategy 3
1.2.2 Correlation-Guided Optimization Framework 3
1.2.3 Dual Robustness Achievement 4
1.2.4 Balanced Robustness 4
Chapter 2 Related Works 5
2.1 Conventional Watermarking Methods 5
2.2 Deep Learning-Based Watermarking 6
2.3 Watermarking Against Instruction-Driven Image Editing 8
2.4 Hyperparameter Optimization in Watermarking 9
2.5 Other Related Methods 9
Chapter 3 Methodology 11
3.1 Network Architecture 12
3.2 Configurable Noise Layer Design 14
3.3 Two-Stage Training Strategy 16
3.4 Loss Function Design 17
3.5 Correlation-Guided Hyperparameter Optimization 20
3.5.1 Bayesian Optimization with TPE Sampler 22
3.5.2 Attack Correlation Analysis 22
3.5.3 Objective Function Formulation 23
3.5.4 Efficiency Mechanisms 24
Chapter 4 Experiments and Comparison 25
4.1 Experimental Setup 25
4.1.1 Datasets and Generalization Protocol 25
4.1.2 Attack Simulations 26
4.1.3 Evaluation Metrics 27
4.2 Performance Comparison 28
4.2.1 Baselines 28
4.2.2 Robustness Analysis 29
4.2.3 Visual Quality Analysis 31
4.2.4 Summary 32
4.3 Ablation Study 33
4.3.1 Baseline: Single-Stage Training with 256-bit Capacity 34
4.3.2 Introducing Noise Layers for Spatial Robustness 34
4.3.3 Two-Stage Training for Quality–Robustness Decoupling 35
4.3.4 Penalty Loss for Artifact Suppression 35
4.3.5 Final Model with Correlation-Guided Optimization 36
Chapter 5 Conclusion and Future Work 39
5.1 Contributions and Key Findings 39
5.2 Limitations 41
5.3 Future Work 41
REFERENCE 43 | - |
| dc.language.iso | en | - |
| dc.subject | 數位水印 | - |
| dc.subject | 深度學習 | - |
| dc.subject | 對抗訓練 | - |
| dc.subject | 圖像編輯 | - |
| dc.subject | 版權保護 | - |
| dc.subject | 強健性 | - |
| dc.subject | Digital Watermarking | - |
| dc.subject | Dual Robustness | - |
| dc.subject | Two-Stage Training | - |
| dc.subject | Correlation-Guided Optimization | - |
| dc.subject | Instruction-Driven Image Editing | - |
| dc.subject | Balanced Robustness | - |
| dc.title | DualRobust:針對指令驅動影像編輯的雙重穩健浮水印技術 | zh_TW |
| dc.title | DualRobust: Dual-Robust Watermarking Against Instruction-Driven Image Editing | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 114-1 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 盧奕璋;曾易聰;蘇柏齊 | zh_TW |
| dc.contributor.oralexamcommittee | Yi-Chang Lu;Yi-Chong Zeng;Po-Chyi Su | en |
| dc.subject.keyword | 數位水印,深度學習,對抗訓練,圖像編輯,版權保護,強健性 | zh_TW |
| dc.subject.keyword | Digital Watermarking,Dual Robustness,Two-Stage Training,Correlation-Guided Optimization,Instruction-Driven Image Editing,Balanced Robustness | en |
| dc.relation.page | 47 | - |
| dc.identifier.doi | 10.6342/NTU202600177 | - |
| dc.rights.note | 未授權 | - |
| dc.date.accepted | 2026-01-23 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 電信工程學研究所 | - |
| dc.date.embargo-lift | N/A | - |
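The abstracts above report robustness as a Bit Error Rate (BER), e.g. "below 0.07 across all test scenarios." As a minimal, self-contained illustration (hypothetical bit strings, not the thesis implementation), BER is simply the fraction of watermark bits that differ between the embedded message and the decoded message:

```python
# Illustrative sketch of the BER metric named in the abstracts.
# The 256-bit message and the 5% bit-flip "attack" are hypothetical.
import random

def bit_error_rate(embedded, decoded):
    """Fraction of positions where the two bit strings disagree."""
    if len(embedded) != len(decoded):
        raise ValueError("bit strings must have equal length")
    errors = sum(a != b for a, b in zip(embedded, decoded))
    return errors / len(embedded)

random.seed(0)
message = [random.randint(0, 1) for _ in range(256)]  # 256-bit watermark

# Simulate a mild attack on the decoder output: flip 5% of the bits.
decoded = message.copy()
for i in random.sample(range(256), k=round(0.05 * 256)):
    decoded[i] ^= 1

ber = bit_error_rate(message, decoded)
print(f"BER = {ber:.4f}")  # 13 of 256 bits flipped -> 0.0508, under 0.07
```

A decoder that keeps this fraction below 0.07 after every simulated attack would meet the robustness target quoted in the English abstract.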
| Appears in Collections: | 電信工程學研究所 | |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-114-1.pdf (Restricted Access) | 1.79 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
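The English abstract describes an attack correlation analysis matrix that relates a small set of training attacks to a broader set of evaluation attacks, so that robustness gained on a trained attack can be expected to transfer to its strongly correlated evaluation attacks. A toy sketch of that idea (all BER numbers and attack names below are hypothetical, not the thesis data), using Pearson correlation between per-checkpoint BER vectors:

```python
# Toy attack-correlation matrix: correlate BERs of several model
# checkpoints under training attacks against BERs under evaluation
# attacks. High correlation suggests robustness transfers between them.
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical BERs of five checkpoints under each attack.
train_attacks = {
    "jpeg":     [0.02, 0.05, 0.09, 0.12, 0.15],
    "rotation": [0.10, 0.09, 0.07, 0.05, 0.03],
}
eval_attacks = {
    "webp":   [0.03, 0.06, 0.08, 0.13, 0.14],  # compression-like
    "affine": [0.11, 0.08, 0.06, 0.06, 0.02],  # geometry-like
}

# Rows = training attacks, columns = evaluation attacks.
matrix = {
    (t, e): pearson(tv, ev)
    for t, tv in train_attacks.items()
    for e, ev in eval_attacks.items()
}
for (t, e), r in sorted(matrix.items()):
    print(f"{t:9s} vs {e:7s}: r = {r:+.2f}")
```

In this toy data the compression-like pair and the geometry-like pair correlate strongly, while cross pairs do not, which is the signal a Bayesian optimization loop could use to prioritize which attacks to train against.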
