Please use this Handle URI to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97136
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 陳炳宇 | zh_TW
dc.contributor.advisor | Bing-Yu Chen | en
dc.contributor.author | 陳宏昇 | zh_TW
dc.contributor.author | Hong-Sheng Chen | en
dc.date.accessioned | 2025-02-27T16:21:40Z | -
dc.date.available | 2025-02-28 | -
dc.date.copyright | 2025-02-27 | -
dc.date.issued | 2025 | -
dc.date.submitted | 2025-02-04 | -
dc.identifier.citation | [1] WikiArt: Visual Art Encyclopedia. https://www.wikiart.org/. Accessed: 2025-01-27.
[2] J. An, S. Huang, Y. Song, D. Dou, W. Liu, and J. Luo. Artflow: Unbiased image style transfer via reversible neural flows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 862–871, Nashville, TN, USA (Held Online), 2021. IEEE/CVF.
[3] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. In D. Precup and Y. W. Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 214–223, Sydney, NSW, Australia, 06–11 Aug 2017. PMLR.
[4] M. Caron, I. Misra, J. Mairal, P. Goyal, P. Bojanowski, and A. Joulin. Unsupervised learning of visual features by contrasting cluster assignments. Advances in neural information processing systems, 33:9912–9924, 2020.
[5] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597–1607, Virtual Event, 2020. PMLR.
[6] X. Chen and K. He. Exploring simple siamese representation learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15750–15758, 2021.
[7] S. Chopra, R. Hadsell, and Y. LeCun. Dimensionality reduction by learning an invariant mapping. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), volume 2, pages 1735–1742, San Diego, CA, USA, 2005. IEEE.
[8] M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, and A. Vedaldi. Describing textures: A metric learning approach to texture description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 360–368, Columbus, OH, USA, 2014. IEEE.
[9] Y. Deng, F. Tang, W. Dong, H. Huang, C. Ma, and C. Xu. Arbitrary video style transfer via multi-channel correlation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35 (2), pages 1210–1217, 2021.
[10] Y. Deng, F. Tang, W. Dong, C. Ma, X. Pan, L. Wang, and C. Xu. Stytr2: Image style transfer with transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11326–11336, New Orleans, LA, USA, 2022. IEEE/CVF.
[11] V. Dumoulin, J. Shlens, and M. Kudlur. A learned representation for artistic style. arXiv preprint arXiv:1610.07629, 2016.
[12] L. A. Gatys. A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576, 2015.
[13] L. A. Gatys, A. S. Ecker, M. Bethge, A. Hertzmann, and E. Shechtman. Controlling perceptual factors in neural style transfer. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3985–3993, Honolulu, HI, USA, 2017. IEEE.
[14] G. Ghiasi, H. Lee, M. Kudlur, V. Dumoulin, and J. Shlens. Exploring the structure of a real-time, arbitrary neural artistic stylization network. arXiv preprint arXiv:1705.06830, 2017.
[15] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[16] J.-B. Grill, F. Strub, F. Altché, C. Tallec, P. Richemond, E. Buchatskaya, C. Doersch, B. Avila Pires, Z. Guo, M. Gheshlaghi Azar, et al. Bootstrap your own latent: A new approach to self-supervised learning. Advances in neural information processing systems, 33:21271–21284, 2020.
[17] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick. Momentum contrast for unsupervised visual representation learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9729–9738, 2020.
[18] R. D. Hjelm, A. Fedorov, S. Lavoie-Marchildon, K. Grewal, P. Bachman, A. Trischler, and Y. Bengio. Learning deep representations by mutual information estimation and maximization. In International Conference on Learning Representations, New Orleans, LA, USA, 2019. ICLR. Presented as an oral presentation at ICLR 2019.
[19] X. Huang and S. Belongie. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE international conference on computer vision, pages 1501–1510, 2017.
[20] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1125–1134, Honolulu, HI, USA, 2017. IEEE.
[21] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14, pages 694–711, Amsterdam, The Netherlands, 2016. Springer.
[22] C. Li and M. Wand. Combining markov random fields and convolutional neural networks for image synthesis. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2479–2486, Las Vegas, NV, USA, 2016. IEEE.
[23] Y. Li, N. Wang, J. Liu, and X. Hou. Demystifying neural style transfer. arXiv preprint arXiv:1701.01036, 2017.
[24] M.-Y. Liu, T. Breuel, and J. Kautz. Unsupervised image-to-image translation networks. In Advances in Neural Information Processing Systems, pages 700–708, 2017.
[25] S. Liu, T. Lin, D. He, F. Li, M. Wang, X. Li, Z. Sun, Q. Li, and E. Ding. Adaattn: Revisit attention mechanism in arbitrary neural style transfer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6649–6658, Nashville, TN, USA (Held Online), 2021. IEEE.
[26] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. Paul Smolley. Least squares generative adversarial networks. In IEEE International Conference on Computer Vision (ICCV), pages 2813–2821, 2017.
[27] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.
[28] D. Y. Park and K. H. Lee. Arbitrary style transfer with style-attentional networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5880–5888, Long Beach, CA, USA, 2019. IEEE/CVF.
[29] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In International Conference on Learning Representations (ICLR), 2016.
[30] small yellow duck and W. Kan. Painter by numbers. https://www.kaggle.com/competitions/painter-by-numbers, 2016. Accessed: 2025-01-27.
[31] Y. Tian, D. Krishnan, and P. Isola. Contrastive multiview coding. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XI, pages 776–794, Glasgow, UK, 2020. Springer.
[32] A. van den Oord, Y. Li, and O. Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
[33] L. Van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.
[34] Z. Wang and Z.-S. Liu. Stylemamba: State space model for efficient text-driven image style transfer. arXiv preprint arXiv:2405.05027, 2024.
[35] Z. Wu, Y. Xiong, S. X. Yu, and D. Lin. Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3733–3742, Salt Lake City, UT, USA, 2018. IEEE.
[36] Y. Zhang, F. Tang, W. Dong, H. Huang, C. Ma, T.-Y. Lee, and C. Xu. Domain enhanced arbitrary image style transfer via contrastive learning. In ACM SIGGRAPH 2022 conference proceedings, pages 1–8, Vancouver, BC, Canada, 2022. Association for Computing Machinery.
[37] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE International Conference on Computer Vision (ICCV), pages 2223–2232, 2017.
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97136 | -
dc.description.abstract | 我們在本研究中提出了一個新穎的影像風格轉換方法,稱為「對比式估計自適應正規化」(Adaptive Normalization with Contrastive Estimation, AdaNCE)。它整合了編碼器—解碼器架構、對比式學習、生成對抗網路與循環一致性,以在多元場景下實現穩定且高品質的藝術風格轉換。在此模型架構中,我們首先從預訓練的 VGG19 網路提取影像的多層特徵,再透過 AdaIN 機制對內容與風格統計量進行調配,達到風格轉換的效果。此外,提取出的 AdaIN 風格統計量可經由全連接層組成的類神經網路投影至 2048 維向量,並結合對比式學習來建構風格潛在空間,使不同風格能被有效區隔,同時保證相同風格的聚合度。

接著,為了檢驗本論文所提架構的效能,我們在 WikiArt、COCO、PBN 與 DTD 等多個資料集上進行了感知與量化實驗,衡量標準包括風格損失(style loss)、內容損失(content loss)、運算效能與視覺品質。實驗結果印證了本方法在萃取與保持風格特徵方面的穩定性與泛化能力。AdaNCE 以對比式學習作為模型優化與風格潛在空間建構的主要手段,並輔以對抗式網路與循環一致性,在速度與表現穩定度之間取得理想的平衡。此外,我們也透過消融實驗驗證各模組的貢獻:結果證實 AdaIN、GAN 與對比式學習三個組成部分對最終結果皆至關重要,缺少任何一個模組,風格轉換效果都會明顯變差。

此外,我們設計了使用者實驗,邀請國立臺灣大學的 77 位大學生進行主觀美感評估,內容包含辨別合成影像與真實藝術創作,以及在多模型比較情境下對合成品質的偏好評比。實驗結果顯示:AdaNCE 不僅能在多數輸入下產生逼真的風格轉換作品,也兼具適度的運算速度與廣泛的風格泛化能力。綜上所述,本研究提出的 AdaNCE 方法在維持內容結構與表現藝術風格細節方面展現了顯著優勢,具有應用於多種風格轉換任務的潛力。
zh_TW
dc.description.abstract | In this work, we propose a novel image style transfer approach: Adaptive Normalization with Contrastive Estimation (AdaNCE). To achieve stable, high-quality artistic style transfer across diverse scenarios, it integrates an encoder-decoder architecture, contrastive learning, generative adversarial networks, and cycle consistency. The process begins by extracting multi-layer features with a pre-trained VGG19 network; AdaIN then adjusts the channel-wise statistics of the content features to match those of the style features, producing the stylized result. The extracted AdaIN style statistics are further projected into a 2048-dimensional vector by a fully connected network. This projection builds a latent style space in which contrastive learning can effectively separate different styles while ensuring cohesion within the same style.

Experiments on several datasets (WikiArt, COCO, PBN, and DTD) were conducted to verify the effectiveness and generalization of the proposed method, covering style loss, content loss, computational efficiency, and visual quality. The results confirm the robustness of the method in extracting and preserving style features. AdaNCE uses contrastive learning as the main driver of model optimization and latent style space construction, aided by GANs and cycle consistency, and strikes a favorable balance between speed and stability. Ablation experiments show that AdaIN, the GAN, and contrastive learning are all indispensable: removing any one of them significantly degrades performance.

Furthermore, a user study was conducted with 77 university students from National Taiwan University. Participants performed subjective aesthetic evaluations, including distinguishing synthesized images from real artworks and indicating their preferences among outputs synthesized by different models. The results show that AdaNCE not only produces realistic style transfer outputs in most cases but also offers moderate computational speed and broad style generalization.

In summary, AdaNCE demonstrates significant advantages in preserving content structure while expressing detailed artistic styles, and it has the potential to be applied to a variety of style transfer tasks.
en
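The following is a minimal PyTorch sketch intended only to make the mechanism described in the abstract concrete: AdaIN-style statistic matching, a fully connected projection of the AdaIN style statistics to a 2048-dimensional style code, and an InfoNCE-style contrastive loss over those codes. Only the 2048-dimensional projection comes from the abstract; the names (adain, StyleProjector, info_nce), the layer sizes, the temperature, and the specific contrastive formulation are illustrative assumptions, not the thesis implementation.

# Illustrative sketch; not the thesis code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def adain(content_feat: torch.Tensor, style_feat: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Align the channel-wise mean/std of content features to those of style features."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean


class StyleProjector(nn.Module):
    """Fully connected head mapping AdaIN style statistics (per-channel mean and std)
    to a 2048-d style code used for contrastive learning."""

    def __init__(self, num_channels: int, out_dim: int = 2048):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * num_channels, out_dim),  # input: concatenated mean and std
            nn.ReLU(inplace=True),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, style_feat: torch.Tensor) -> torch.Tensor:
        mean = style_feat.mean(dim=(2, 3))
        std = style_feat.std(dim=(2, 3))
        stats = torch.cat([mean, std], dim=1)
        return F.normalize(self.mlp(stats), dim=1)  # unit-norm style codes


def info_nce(codes_a: torch.Tensor, codes_b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Contrastive loss: matching rows of codes_a and codes_b (two views of the same style)
    are positives; all other pairs in the batch are negatives."""
    logits = codes_a @ codes_b.t() / temperature
    targets = torch.arange(codes_a.size(0), device=codes_a.device)
    return F.cross_entropy(logits, targets)

In a full training pipeline, two views of each style image would be encoded by the VGG19 encoder, projected by StyleProjector, and passed to info_nce, so that codes from the same style are pulled together in the latent style space and codes from different styles are pushed apart.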
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-02-27T16:21:39Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2025-02-27T16:21:40Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Verification Letter from the Oral Examination Committee ... i
Acknowledgements ... iii
摘要 ... v
Abstract ... vii
Contents ... ix
List of Figures ... xi
List of Tables ... xiii
Denotation ... xv
Chapter 1: Introduction ... 1
Chapter 2: Related Work ... 5
2.1 Contrastive Learning ... 5
2.2 Generative Adversarial Networks ... 6
2.3 Image Style Transfer ... 7
Chapter 3: Methodology ... 11
3.1 Model Architecture ... 11
3.1.1 Encoder-AdaIN-Decoder ... 12
3.1.2 Generative Adversarial Loss ... 13
3.1.3 Cycle Consistency Loss ... 14
3.1.4 Contrastive Learning ... 15
3.2 Final Objective ... 17
Chapter 4: Experimental Results and Discussion ... 19
4.1 Datasets ... 19
4.2 Training Parameters ... 20
4.3 Experiment I: Loss Curve Comparison ... 20
4.4 Experiment II: Ablation Study ... 21
4.5 Experiment III: Latent Space Visualization ... 23
4.6 Experiment IV: Qualitative and Quantitative Evaluations ... 24
4.6.1 Qualitative Evaluation ... 25
4.6.2 Quantitative Evaluation (User Study) ... 26
Chapter 5: Conclusion ... 29
References ... 31
Appendix A — Qualitative Evaluation ... 37
dc.language.iso | en | -
dc.title | 透過對比式學習建構潛在風格空間之任意圖像風格轉換模型 | zh_TW
dc.title | Latent Style Space Construction via Contrastive Learning for Arbitrary Image Style Transfer | en
dc.type | Thesis | -
dc.date.schoolyear | 113-1 | -
dc.description.degree | 碩士 | -
dc.contributor.oralexamcommittee | 朱宏國;王昱舜 | zh_TW
dc.contributor.oralexamcommittee | Hung-Kuo Chu;Yu-Shuen Wang | en
dc.subject.keyword | 藝術風格遷移,即時推斷,對比式估計自適應正規化,編碼器—解碼器架構,對比式學習,生成對抗網路,風格潛在空間 | zh_TW
dc.subject.keyword | Style Transfer, Real-time Inference, Adaptive Normalization with Contrastive Estimation (AdaNCE), Encoder-Decoder, Contrastive Learning, Generative Adversarial Network, Style Latent Space | en
dc.relation.page | 40 | -
dc.identifier.doi | 10.6342/NTU202500065 | -
dc.rights.note | 同意授權(全球公開) | -
dc.date.accepted | 2025-02-04 | -
dc.contributor.author-college | 電機資訊學院 | -
dc.contributor.author-dept | 資訊網路與多媒體研究所 | -
dc.date.embargo-lift | 2025-02-28 | -
Appears in Collections: 資訊網路與多媒體研究所

Files in This Item:
File | Size | Format
ntu-113-1.pdf | 23.67 MB | Adobe PDF