Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88415
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 吳家麟 | zh_TW |
dc.contributor.advisor | Ja-Ling Wu | en |
dc.contributor.author | 劉旭庭 | zh_TW |
dc.contributor.author | Hsu-Ting Liu | en |
dc.date.accessioned | 2023-08-15T16:11:41Z | - |
dc.date.available | 2023-11-09 | - |
dc.date.copyright | 2023-08-15 | - |
dc.date.issued | 2023 | - |
dc.date.submitted | 2023-07-26 | - |
dc.identifier.citation | [1] T. Brooks, A. Holynski, and A. A. Efros. Instructpix2pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18392–18402, 2023.
[2] Y. Choi, M. Choi, M. Kim, J. Ha, S. Kim, and J. Choo. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 8789–8797. Computer Vision Foundation / IEEE Computer Society, 2018.
[3] P. Fernandez, G. Couairon, H. Jégou, M. Douze, and T. Furon. The stable signature: Rooting watermarks in latent diffusion models. CoRR, abs/2303.15435, 2023.
[4] P. Fernandez, A. Sablayrolles, T. Furon, H. Jégou, and M. Douze. Watermarking images in self-supervised latent spaces. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, pages 3054–3058. IEEE, 2022.
[5] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pages 2672–2680, 2014.
[6] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 6626–6637, 2017.
[7] J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
[8] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila. Analyzing and improving the image quality of stylegan. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pages 8107–8116. Computer Vision Foundation / IEEE, 2020.
[9] D. P. Kingma and M. Welling. Auto-encoding variational bayes. In Y. Bengio and Y. LeCun, editors, 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014.
[10] T. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: common objects in context. In D. J. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, editors, Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V, volume 8693 of Lecture Notes in Computer Science, pages 740–755. Springer, 2014.
[11] X. Luo, R. Zhan, H. Chang, F. Yang, and P. Milanfar. Distortion agnostic deep watermarking. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pages 13545–13554. Computer Vision Foundation / IEEE, 2020.
[12] S. Marcel and Y. Rodriguez. Torchvision the machine-vision package of torch. In A. D. Bimbo, S. Chang, and A. W. M. Smeulders, editors, Proceedings of the 18th International Conference on Multimedia 2010, Firenze, Italy, October 25-29, 2010, pages 1485–1488. ACM, 2010.
[13] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever. Learning transferable visual models from natural language supervision. In M. Meila and T. Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 8748–8763. PMLR, 2021.
[14] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen. Hierarchical text-conditional image generation with CLIP latents. CoRR, abs/2204.06125, 2022.
[15] A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever. Zero-shot text-to-image generation. In M. Meila and T. Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 8821–8831. PMLR, 2021.
[16] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer. High-resolution image synthesis with latent diffusion models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 10674–10685. IEEE, 2022.
[17] C. Saharia, W. Chan, H. Chang, C. A. Lee, J. Ho, T. Salimans, D. J. Fleet, and M. Norouzi. Palette: Image-to-image diffusion models. In M. Nandigjav, N. J. Mitra, and A. Hertzmann, editors, SIGGRAPH '22: Special Interest Group on Computer Graphics and Interactive Techniques Conference, Vancouver, BC, Canada, August 7-11, 2022, pages 15:1–15:10. ACM, 2022.
[18] C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. L. Denton, S. K. S. Ghasemipour, R. G. Lopes, B. K. Ayan, T. Salimans, J. Ho, D. J. Fleet, and M. Norouzi. Photorealistic text-to-image diffusion models with deep language understanding. In NeurIPS, 2022.
[19] J. Sohl-Dickstein, E. A. Weiss, N. Maheswaranathan, and S. Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In F. R. Bach and D. M. Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, volume 37 of JMLR Workshop and Conference Proceedings, pages 2256–2265. JMLR.org, 2015.
[20] J. Song, C. Meng, and S. Ermon. Denoising diffusion implicit models. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.
[21] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole. Score-based generative modeling through stochastic differential equations. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.
[22] A. van den Oord, O. Vinyals, and K. Kavukcuoglu. Neural discrete representation learning. In I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pages 6306–6315, 2017.
[23] P. von Platen, S. Patil, A. Lozhkov, P. Cuenca, N. Lambert, K. Rasul, M. Davaadorj, and T. Wolf. Diffusers: State-of-the-art diffusion models. https://github.com/huggingface/diffusers, 2022.
[24] N. Yu, V. Skripniuk, S. Abdelnabi, and M. Fritz. Artificial fingerprinting for generative models: Rooting deepfake attribution in training data. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 14428–14437. IEEE, 2021.
[25] N. Yu, V. Skripniuk, D. Chen, L. S. Davis, and M. Fritz. Responsible disclosure of generative models using scalable fingerprinting. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022.
[26] L. Zhang and M. Agrawala. Adding conditional control to text-to-image diffusion models. CoRR, abs/2302.05543, 2023.
[27] Y. Zhao, T. Pang, C. Du, X. Yang, N. Cheung, and M. Lin. A recipe for watermarking diffusion models. CoRR, abs/2303.10137, 2023.
[28] J. Zhu, R. Kaplan, J. Johnson, and L. Fei-Fei. Hidden: Hiding data with deep networks. In V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss, editors, Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XV, volume 11219 of Lecture Notes in Computer Science, pages 682–697. Springer, 2018. | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88415 | - |
dc.description.abstract | 近期許多研究人員深入探索複雜的潛在擴散模型領域,這些模型能依據多模態條件生成逼真的圖片,這種特性賦予了它們強大的潛力,適用於各種應用。同時,這類模型也牽涉到迫在眉睫的倫理困境,主要圍繞於如何負責任地部署這項技術。惡意使用者製造的圖像有可能造成重大的社會問題,需要嚴格的控制措施以減輕可能的危害。根源浮水印技術是解決此問題可行的方法,這項技術允許我們追蹤那些惡意創建的圖像來源,從而利用法律向為非作歹者咎責。本篇論文基於現有的方法加以改善,著重於提升根源浮水印對於圖像失真的抵抗能力。通過對現有方法模組的正規化和在訓練期間加入圖像噪聲,本篇論文加強了根源浮水印的強韌性,同時保持生成圖像的視覺品質。研究結果顯示,在圖片縮小為原圖的10%後,浮水印解碼之位元準確率相較現有方法高出24%,此結果展現本篇論文的有效性,也在潛在擴散模型之根源浮水印領域中取得重要的進展。 | zh_TW |
dc.description.abstract | Recently, many researchers have dived into the intricate realm of latent diffusion models, renowned for their competence in generating images reflecting a multimodal range of conditions. This property endows them with enormous potential, rendering them fit for a host of applications across various disciplines. Simultaneously, the power of such models introduces pressing ethical dilemmas, primarily revolving around the responsible deployment of this technology. The indiscriminate use of such models has the potential to cause significant harm, emphasizing the need for rigorous control measures to mitigate potential misuse. Rooting watermarks are introduced as a countermeasure to address these ethical concerns. This technology allows maliciously created images to be traced back to their originators. By establishing this link, we can assign responsibility for any misuse, facilitating the enforcement of legal sanctions and deterring abuse. In this work, we focus on enhancing the resilience of the rooted watermark to common distortions of the generated images. We expand upon an existing method, refining it through the normalization of its whitening module and the strategic injection of noise during training. This approach significantly enhances the model's robustness while maintaining the perceptual quality of the generated images. Our quantitative analysis attests to the efficacy of our approach: bitwise decoding accuracy improves by up to 24% over the existing method, even when the images are resized down to 10% of their original size. This outcome not only showcases the effectiveness of our method but also represents a significant stride forward in the ongoing journey toward watermarking latent diffusion models. | en |
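The two ingredients the abstract highlights, a whitening/normalization step over the watermark extractor's outputs and a bitwise-accuracy metric, can be illustrated with a minimal NumPy sketch. This is not the thesis's code: the function names `whiten` and `bitwise_accuracy` are hypothetical, ZCA whitening is only one plausible instantiation of a whitening module (the thesis's own definition is in Section 3.5), and the sign-threshold decoding is an assumption.

```python
import numpy as np

def whiten(features: np.ndarray) -> np.ndarray:
    """ZCA-whiten a batch of extractor outputs so the output dimensions
    are decorrelated with unit variance. `features`: (n_samples, n_dims).
    Illustrative only; the thesis's whitening module may differ."""
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(features) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # W = E diag(1/sqrt(lambda)) E^T; epsilon guards near-zero eigenvalues
    w = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + 1e-8)) @ eigvecs.T
    return centered @ w

def bitwise_accuracy(decoded: np.ndarray, message: np.ndarray) -> float:
    """Fraction of watermark bits recovered correctly: the sign of each
    decoder output is compared against the ground-truth bit in {0, 1}."""
    bits = (decoded > 0).astype(int)
    return float((bits == message).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Correlated stand-in features: after whitening, covariance ~ identity
    feats = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 8))
    wf = whiten(feats)
    print(np.round(wf.T @ wf / (len(wf) - 1), 3))  # close to np.eye(8)
    # A 4-bit message decoded with one flipped sign -> 3/4 bits correct
    print(bitwise_accuracy(np.array([[0.5, -0.2, 0.1, -0.3]]),
                           np.array([[1, 0, 1, 1]])))
```

ZCA (rather than plain PCA) whitening is used here because it keeps each whitened dimension aligned with the original one, which is convenient when each dimension carries one watermark bit.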
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-15T16:11:41Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2023-08-15T16:11:41Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i
Acknowledgements ii
摘要 iii
Abstract iv
Contents vi
List of Figures viii
List of Tables x
Chapter 1 Introduction 1
Chapter 2 Related Works 3
2.1 DNN-based watermarking 3
2.2 Rooting watermark 4
2.3 Diffusion Models 5
2.4 Diffusion Models-related rooting watermark 6
Chapter 3 Proposed Method 7
3.1 Overview 7
3.2 Pre-training the watermark embedder/extractor 8
3.3 Fine-tuning the VAE decoder 11
3.4 Noise layer 12
3.5 Whitening module 13
3.6 Training procedure 15
Chapter 4 Experiments 18
4.1 Setting 18
4.2 Watermark embedder/extractor 18
4.2.1 Noise layer 19
4.2.2 Whitening layer 19
4.2.3 Fine-tuning VAE decoder 19
4.2.4 Testing 20
4.3 Metric 20
4.4 Adding noise layer during fine-tuning 22
4.4.1 Robustness 22
4.4.2 Imperceptibility 23
4.5 Whitening layer 25
4.5.1 Re-weighting 26
4.5.2 Bit accuracy imbalance 28
Chapter 5 Conclusions 30
References 32 | - |
dc.language.iso | en | - |
dc.title | 加強潛在擴散模型中根源浮水印的強韌性 | zh_TW |
dc.title | Enhancing the Robustness of Rooting Watermarks in Latent Diffusion Models | en |
dc.type | Thesis | - |
dc.date.schoolyear | 111-2 | - |
dc.description.degree | 碩士 | - |
dc.contributor.oralexamcommittee | 許超雲;陳文進 | zh_TW |
dc.contributor.oralexamcommittee | Chao-Yun Hsu;Wen-Chin Chen | en |
dc.subject.keyword | 強韌浮水印,潛在擴散模型,圖像取證,生成模型,變分自動編碼器 | zh_TW |
dc.subject.keyword | Robust watermarking, Latent diffusion model, Image forensics, Generative model, Variational autoencoders | en |
dc.relation.page | 37 | - |
dc.identifier.doi | 10.6342/NTU202302023 | - |
dc.rights.note | 同意授權(全球公開) | - |
dc.date.accepted | 2023-07-28 | - |
dc.contributor.author-college | 電機資訊學院 | - |
dc.contributor.author-dept | 資訊工程學系 | - |
Appears in Collections: | 資訊工程學系
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-111-2.pdf | 32.89 MB | Adobe PDF | View/Open |