Please use this Handle URI to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97136
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 陳炳宇 | zh_TW
dc.contributor.advisor | Bing-Yu Chen | en
dc.contributor.author | 陳宏昇 | zh_TW
dc.contributor.author | Hong-Sheng Chen | en
dc.date.accessioned | 2025-02-27T16:21:40Z | -
dc.date.available | 2025-02-28 | -
dc.date.copyright | 2025-02-27 | -
dc.date.issued | 2025 | -
dc.date.submitted | 2025-02-04 | -
dc.identifier.citation | [1] WikiArt: Visual Art Encyclopedia. https://www.wikiart.org/. Accessed: 2025-01-27.
[2] J. An, S. Huang, Y. Song, D. Dou, W. Liu, and J. Luo. Artflow: Unbiased image style transfer via reversible neural flows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 862–871, Nashville, TN, USA (Held Online), 2021. IEEE/CVF.
[3] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. In D. Precup and Y. W. Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 214–223, Sydney, NSW, Australia, 06–11 Aug 2017. PMLR.
[4] M. Caron, I. Misra, J. Mairal, P. Goyal, P. Bojanowski, and A. Joulin. Unsupervised learning of visual features by contrasting cluster assignments. Advances in neural information processing systems, 33:9912–9924, 2020.
[5] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597–1607, Virtual Event, 2020. PMLR.
[6] X. Chen and K. He. Exploring simple siamese representation learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15750–15758, 2021.
[7] S. Chopra, R. Hadsell, and Y. LeCun. Dimensionality reduction by learning an invariant mapping. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), volume 2, pages 1735–1742, San Diego, CA, USA, 2005. IEEE.
[8] M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, and A. Vedaldi. Describing textures: A metric learning approach to texture description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 360–368, Columbus, OH, USA, 2014. IEEE.
[9] Y. Deng, F. Tang, W. Dong, H. Huang, C. Ma, and C. Xu. Arbitrary video style transfer via multi-channel correlation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35 (2), pages 1210–1217, 2021.
[10] Y. Deng, F. Tang, W. Dong, C. Ma, X. Pan, L. Wang, and C. Xu. Stytr2: Image style transfer with transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11326–11336, New Orleans, LA, USA, 2022. IEEE/CVF.
[11] V. Dumoulin, J. Shlens, and M. Kudlur. A learned representation for artistic style. arXiv preprint arXiv:1610.07629, 2016.
[12] L. A. Gatys. A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576, 2015.
[13] L. A. Gatys, A. S. Ecker, M. Bethge, A. Hertzmann, and E. Shechtman. Controlling perceptual factors in neural style transfer. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3985–3993, Honolulu, HI, USA, 2017. IEEE.
[14] G. Ghiasi, H. Lee, M. Kudlur, V. Dumoulin, and J. Shlens. Exploring the structure of a real-time, arbitrary neural artistic stylization network. arXiv preprint arXiv:1705.06830, 2017.
[15] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[16] J.-B. Grill, F. Strub, F. Altché, C. Tallec, P. Richemond, E. Buchatskaya, C. Doersch, B. Avila Pires, Z. Guo, M. Gheshlaghi Azar, et al. Bootstrap your own latent: A new approach to self-supervised learning. Advances in neural information processing systems, 33:21271–21284, 2020.
[17] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick. Momentum contrast for unsupervised visual representation learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9729–9738, 2020.
[18] R. D. Hjelm, A. Fedorov, S. Lavoie-Marchildon, K. Grewal, P. Bachman, A. Trischler, and Y. Bengio. Learning deep representations by mutual information estimation and maximization. In International Conference on Learning Representations, New Orleans, LA, USA, 2019. ICLR. Presented as an oral presentation at ICLR 2019.
[19] X. Huang and S. Belongie. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE international conference on computer vision, pages 1501–1510, 2017.
[20] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1125–1134, Honolulu, HI, USA, 2017. IEEE.
[21] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14, pages 694–711, Amsterdam, The Netherlands, 2016. Springer.
[22] C. Li and M. Wand. Combining markov random fields and convolutional neural networks for image synthesis. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2479–2486, Las Vegas, NV, USA, 2016. IEEE.
[23] Y. Li, N. Wang, J. Liu, and X. Hou. Demystifying neural style transfer. arXiv preprint arXiv:1701.01036, 2017.
[24] M.-Y. Liu, T. Breuel, and J. Kautz. Unsupervised image-to-image translation networks. In Advances in Neural Information Processing Systems, pages 700–708, 2017.
[25] S. Liu, T. Lin, D. He, F. Li, M. Wang, X. Li, Z. Sun, Q. Li, and E. Ding. Adaattn: Revisit attention mechanism in arbitrary neural style transfer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6649–6658, Nashville, TN, USA (Held Online), 2021. IEEE.
[26] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. Paul Smolley. Least squares generative adversarial networks. In IEEE International Conference on Computer Vision (ICCV), pages 2813–2821, 2017.
[27] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.
[28] D. Y. Park and K. H. Lee. Arbitrary style transfer with style-attentional networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5880–5888, Long Beach, CA, USA, 2019. IEEE/CVF.
[29] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In International Conference on Learning Representations (ICLR), 2016.
[30] small yellow duck and W. Kan. Painter by numbers. https://www.kaggle.com/competitions/painter-by-numbers, 2016. Accessed: 2025-01-27.
[31] Y. Tian, D. Krishnan, and P. Isola. Contrastive multiview coding. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XI, pages 776–794, Glasgow, UK, 2020. Springer.
[32] A. van den Oord, Y. Li, and O. Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
[33] L. Van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.
[34] Z. Wang and Z.-S. Liu. Stylemamba: State space model for efficient text-driven image style transfer. arXiv preprint arXiv:2405.05027, 2024.
[35] Z. Wu, Y. Xiong, S. X. Yu, and D. Lin. Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3733–3742, Salt Lake City, UT, USA, 2018. IEEE.
[36] Y. Zhang, F. Tang, W. Dong, H. Huang, C. Ma, T.-Y. Lee, and C. Xu. Domain enhanced arbitrary image style transfer via contrastive learning. In ACM SIGGRAPH 2022 conference proceedings, pages 1–8, Vancouver, BC, Canada, 2022. Association for Computing Machinery.
[37] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE International Conference on Computer Vision (ICCV), pages 2223–2232, 2017.
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97136 | -
dc.description.abstract | 我們在本研究中提出了一個新穎的影像風格轉換方法,稱為「對比式估計自適應正規化」(Adaptive Normalization with Contrastive Estimation, AdaNCE)。它整合了編碼器—解碼器架構、對比式學習、生成對抗網路與循環一致性,以在多元場景下實現穩定且高品質的藝術風格轉換。在此模型架構中,我們首先從預訓練的 VGG19 網路提取影像的多層特徵,再透過 AdaIN 機制對內容與風格統計量進行調配,達到風格轉換的效果。此外,提取出的 AdaIN 風格統計量可經由全連接層組成的類神經網路投影至 2048 維向量,並結合對比式學習來建構風格潛在空間,使不同風格能被有效區隔,同時保證相同風格的聚合度。

接著,為了檢驗本論文所提架構的效能,我們在 WikiArt、COCO、PBN 與 DTD 等多個資料集上進行了感知與量化實驗,衡量標準包括風格損失(style loss)、內容損失(content loss)、運算效能與視覺品質。實驗結果印證了本方法在萃取與保持風格特徵方面的穩定性與泛化能力。AdaNCE 以對比式學習作為模型優化與風格潛在空間建構的主要手段,並輔以對抗式網路與循環一致性,在速度與表現穩定度之間取得理想的平衡。此外,我們也透過消融實驗驗證各模組的貢獻:結果證實 AdaIN、GAN 與對比式學習三個組成部分對最終結果皆至關重要,缺少任何一個模組,風格轉換效果都會明顯變差。

此外,我們設計了使用者實驗,邀請國立臺灣大學的 77 位大學生進行主觀美感評估,內容包含辨別合成影像與真實藝術創作,以及在多模型比較情境下對合成品質的偏好評比。實驗結果顯示:AdaNCE 不僅能在多數輸入下產生逼真的風格轉換作品,也兼具適度的運算速度與廣泛的風格泛化能力。綜上所述,本研究提出的 AdaNCE 方法在維持內容結構與表現藝術風格細節方面展現了顯著優勢,具有應用於多種風格轉換任務的潛力。
zh_TW
dc.description.abstract | In this work, we propose a novel image style transfer approach: Adaptive Normalization with Contrastive Estimation (AdaNCE). To achieve stable, high-quality artistic style transfer across diverse scenarios, it integrates an encoder-decoder architecture, contrastive learning, generative adversarial networks, and cycle consistency. The process begins by extracting multi-layer features with a pre-trained VGG19 network; AdaIN then adjusts the channel-wise statistics of the content features to match those of the style features, producing the stylized result. The extracted AdaIN style statistics are further projected into a 2048-dimensional vector by a fully connected network. This projection builds a latent style space in which contrastive learning can effectively separate different styles while ensuring cohesion within the same style.

Experiments on several datasets (WikiArt, COCO, PBN, and DTD) were conducted to verify the effectiveness and generalization of the proposed method, covering style loss, content loss, computational efficiency, and visual quality. The results confirm the robustness of the method in extracting and preserving style features. AdaNCE uses contrastive learning as the main driver of model optimization and latent style space construction, aided by GANs and cycle consistency, and strikes a favorable balance between speed and stability. Ablation experiments show that AdaIN, the GAN, and contrastive learning are all indispensable: removing any one of them significantly degrades performance.

Furthermore, a user study was conducted with 77 university students from National Taiwan University. Participants performed subjective aesthetic evaluations, including distinguishing synthesized images from real artworks and indicating their preferences among outputs synthesized by different models. The results show that AdaNCE not only produces realistic style transfer outputs in most cases but also offers moderate computational speed and broad style generalization.

In summary, AdaNCE demonstrates significant advantages in preserving content structure while expressing detailed artistic styles, and it has the potential to be applied to a variety of style transfer tasks.
en
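The following is a minimal PyTorch sketch intended only to make the mechanism described in the abstract concrete: AdaIN-style statistic matching, a fully connected projection of the AdaIN style statistics to a 2048-dimensional style code, and an InfoNCE-style contrastive loss over those codes. Only the 2048-dimensional projection comes from the abstract; the names (adain, StyleProjector, info_nce), the layer sizes, the temperature, and the specific contrastive formulation are illustrative assumptions, not the thesis implementation.

# Illustrative sketch; not the thesis code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def adain(content_feat: torch.Tensor, style_feat: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Align the channel-wise mean/std of content features to those of style features."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean


class StyleProjector(nn.Module):
    """Fully connected head mapping AdaIN style statistics (per-channel mean and std)
    to a 2048-d style code used for contrastive learning."""

    def __init__(self, num_channels: int, out_dim: int = 2048):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * num_channels, out_dim),  # input: concatenated mean and std
            nn.ReLU(inplace=True),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, style_feat: torch.Tensor) -> torch.Tensor:
        mean = style_feat.mean(dim=(2, 3))
        std = style_feat.std(dim=(2, 3))
        stats = torch.cat([mean, std], dim=1)
        return F.normalize(self.mlp(stats), dim=1)  # unit-norm style codes


def info_nce(codes_a: torch.Tensor, codes_b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Contrastive loss: matching rows of codes_a and codes_b (two views of the same style)
    are positives; all other pairs in the batch are negatives."""
    logits = codes_a @ codes_b.t() / temperature
    targets = torch.arange(codes_a.size(0), device=codes_a.device)
    return F.cross_entropy(logits, targets)

In a full training pipeline, two views of each style image would be encoded by the VGG19 encoder, projected by StyleProjector, and passed to info_nce, so that codes from the same style are pulled together in the latent style space and codes from different styles are pushed apart.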
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-02-27T16:21:39Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2025-02-27T16:21:40Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Verification Letter from the Oral Examination Committee ... i
Acknowledgements ... iii
摘要 ... v
Abstract ... vii
Contents ... ix
List of Figures ... xi
List of Tables ... xiii
Denotation ... xv
Chapter 1: Introduction ... 1
Chapter 2: Related Work ... 5
2.1 Contrastive Learning ... 5
2.2 Generative Adversarial Networks ... 6
2.3 Image Style Transfer ... 7
Chapter 3: Methodology ... 11
3.1 Model Architecture ... 11
3.1.1 Encoder-AdaIN-Decoder ... 12
3.1.2 Generative Adversarial Loss ... 13
3.1.3 Cycle Consistency Loss ... 14
3.1.4 Contrastive Learning ... 15
3.2 Final Objective ... 17
Chapter 4: Experimental Results and Discussion ... 19
4.1 Datasets ... 19
4.2 Training Parameters ... 20
4.3 Experiment I: Loss Curve Comparison ... 20
4.4 Experiment II: Ablation Study ... 21
4.5 Experiment III: Latent Space Visualization ... 23
4.6 Experiment IV: Qualitative and Quantitative Evaluations ... 24
4.6.1 Qualitative Evaluation ... 25
4.6.2 Quantitative Evaluation (User Study) ... 26
Chapter 5: Conclusion ... 29
References ... 31
Appendix A — Qualitative Evaluation ... 37
dc.language.iso | en | -
dc.title | 透過對比式學習建構潛在風格空間之任意圖像風格轉換模型 | zh_TW
dc.title | Latent Style Space Construction via Contrastive Learning for Arbitrary Image Style Transfer | en
dc.type | Thesis | -
dc.date.schoolyear | 113-1 | -
dc.description.degree | 碩士 | -
dc.contributor.oralexamcommittee | 朱宏國;王昱舜 | zh_TW
dc.contributor.oralexamcommittee | Hung-Kuo Chu;Yu-Shuen Wang | en
dc.subject.keyword | 藝術風格遷移,即時推斷,對比式估計自適應正規化,編碼器—解碼器架構,對比式學習,生成對抗網路,風格潛在空間 | zh_TW
dc.subject.keyword | Style Transfer, Real-time Inference, Adaptive Normalization with Contrastive Estimation (AdaNCE), Encoder-Decoder, Contrastive Learning, Generative Adversarial Network, Style Latent Space | en
dc.relation.page | 40 | -
dc.identifier.doi | 10.6342/NTU202500065 | -
dc.rights.note | 同意授權(全球公開) | -
dc.date.accepted | 2025-02-04 | -
dc.contributor.author-college | 電機資訊學院 | -
dc.contributor.author-dept | 資訊網路與多媒體研究所 | -
dc.date.embargo-lift | 2025-02-28 | -
Appears in Collections: 資訊網路與多媒體研究所

Files in This Item:
File | Size | Format
ntu-113-1.pdf | 23.67 MB | Adobe PDF