NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21323
Full metadata record (each row: DC field: value [language]):
dc.contributor.advisor: 莊永裕 (Yung-Yu Chuang)
dc.contributor.author: Hsin-Yu Chang [en]
dc.contributor.author: 張心愉 [zh_TW]
dc.date.accessioned: 2021-06-08T03:31:06Z
dc.date.copyright: 2019-08-19
dc.date.issued: 2019
dc.date.submitted: 2019-08-13
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21323
dc.description.abstract: 在不同領域間的非監督式圖片轉換在電腦視覺中是個很重要的問題。這問題最大的挑戰是要保留原圖的內容(content preserving)同時又要能做到風格轉換(style translation)。我們定義了這兩個問題的矛盾點(trade-off)並且定義了我們所要保留的內容(content)是什麼。現有的方法都會受到這個矛盾點(trade-off)的限制。舉例來說,在貓轉狗的問題上,主要的問題是貓跟狗的領域間差異很大,所以現有的方法就會無法生出一個合理的結果。我們提出了一個模型去保留屬於各自領域的內容(domain-specific content feature)來解決這個困難的問題,並且證明了我們提出的方法比起其他現有表現最佳的方法有更優異的表現。 [zh_TW]
dc.description.abstract: Unsupervised image-to-image translation across domains is an important problem in computer vision. The most challenging part of this work is to preserve content information while performing style translation. We define the content information that must be preserved in cross-domain translation and explore the trade-off between content preservation and style transfer. Current methods suffer from this trade-off and fail to strike a balance between content and style. In the cat-to-dog translation task, for instance, the gap between the two domains is so large that current methods fail to generate reasonable results. We propose a content-preserving model that learns domain-specific content features to control this trade-off, and demonstrate that the proposed method achieves superior performance to state-of-the-art methods. [en]
dc.description.provenance: Made available in DSpace on 2021-06-08T03:31:06Z (GMT). No. of bitstreams: 1. ntu-108-R06922062-1.pdf: 4478690 bytes, checksum: ff43a6ce4afd5fad651ca45a3848e1ec (MD5). Previous issue date: 2019. [en]
dc.description.tableofcontents:
誌謝
摘要
Abstract
1 Introduction
2 Related Work
  2.1 Generative Adversarial Networks
  2.2 Image-to-Image Translation
  2.3 Style Transfer
3 Proposed Method
  3.1 Model Overview
  3.2 Domain-specific Content Feature
  3.3 Example-guided Translation
  3.4 Other Loss Functions
4 Implementation
  4.1 Encoder
    4.1.1 Content encoder
    4.1.2 Style encoder
    4.1.3 Mapping Layer
  4.2 Generator
  4.3 Training Details
5 Experiments
  5.1 Datasets
  5.2 Compared Methods
  5.3 Evaluation Metrics
  5.4 Comparisons with State-of-the-Arts
    5.4.1 Qualitative comparison
    5.4.2 Quantitative comparison
  5.5 Representation Disentanglement
    5.5.1 Interpolation in latent space
    5.5.2 Example-guided generation
6 Conclusion
Bibliography
dc.language.iso: en
dc.title: 基於跨領域映射達成非監督式內容保留風格導向的圖片轉換 [zh_TW]
dc.title: Unsupervised Content-Preserving Example-Guided Image-to-Image Translation via Domain Mapping [en]
dc.type: Thesis
dc.date.schoolyear: 107-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 林彥宇 (Yen-Yu Lin), 王鈺強 (Yu-Chiang Wang), 陳駿丞 (Jun-Cheng Chen)
dc.subject.keyword: 圖片轉換, 內容保留, 風格導向 [zh_TW]
dc.subject.keyword: Image-to-Image translation, Content-preserving, Example-guided translation [en]
dc.relation.page: 33
dc.identifier.doi: 10.6342/NTU201903078
dc.rights.note: 未授權 (not authorized for public access)
dc.date.accepted: 2019-08-13
dc.contributor.author-college: 電機資訊學院 [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 [zh_TW]
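
The abstract above describes a disentangled translation design: a content encoder that keeps domain-specific content, a style encoder that extracts the target appearance from an example image, and a decoder that recombines the two for example-guided translation. The thesis's actual architecture is not reproduced in this record, so the following is only a minimal PyTorch sketch of that general idea; all module names, layer sizes, and the AdaIN-style mixing step are assumptions, not the author's implementation.

```python
# Hypothetical sketch of a content/style-disentangled translator.
# Nothing here is taken from the thesis; shapes, names, and the
# AdaIN-style mixing step are generic assumptions about this model family.
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    """Extracts a spatial content code; instance norm strips style statistics."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=1, padding=3),
            nn.InstanceNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.InstanceNorm2d(128), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.net(x)

class StyleEncoder(nn.Module):
    """Pools the example image down to a small style vector."""
    def __init__(self, style_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=1, padding=3), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(64, style_dim, 1),
        )
    def forward(self, x):
        return self.net(x).flatten(1)  # (N, style_dim)

class Decoder(nn.Module):
    """Re-styles the content code with statistics predicted from the style vector."""
    def __init__(self, style_dim=8):
        super().__init__()
        self.affine = nn.Linear(style_dim, 2 * 128)  # per-channel scale/shift
        self.net = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(128, 64, 5, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 7, padding=3), nn.Tanh(),
        )
    def forward(self, content, style):
        scale, shift = self.affine(style).chunk(2, dim=1)
        mu = content.mean(dim=(2, 3), keepdim=True)
        sigma = content.std(dim=(2, 3), keepdim=True) + 1e-5
        normed = (content - mu) / sigma  # AdaIN-style re-normalization
        restyled = normed * (1 + scale[..., None, None]) + shift[..., None, None]
        return self.net(restyled)

# Example-guided translation: content from x, style from the reference y.
enc_c, enc_s, dec = ContentEncoder(), StyleEncoder(), Decoder()
x = torch.randn(1, 3, 128, 128)   # source image (e.g., a cat)
y = torch.randn(1, 3, 128, 128)   # style example (e.g., a dog)
out = dec(enc_c(x), enc_s(y))     # translated image, same content as x
print(out.shape)                  # torch.Size([1, 3, 128, 128])
```

The instance normalization in the content encoder discards per-image appearance statistics, which is why the content code can then be re-styled with scale/shift parameters predicted from the example image; this is the standard mechanism behind the trade-off between content preservation and style transfer that the abstract discusses.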
Appears in collections: 資訊工程學系

Files in this item:
File: ntu-108-1.pdf (not authorized for public access)
Size: 4.37 MB
Format: Adobe PDF
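
The provenance record above publishes an MD5 checksum (ff43a6ce4afd5fad651ca45a3848e1ec) for the archived bitstream. For readers who obtain a copy of the PDF, here is a minimal Python sketch for checking a local download against that checksum; the local filename is an assumption taken from the provenance entry.

```python
# Verify a downloaded bitstream against the MD5 checksum published in
# the provenance record above. The local path below is a placeholder.
import hashlib

EXPECTED_MD5 = "ff43a6ce4afd5fad651ca45a3848e1ec"

def md5_of(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so large PDFs never load fully into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    digest = md5_of("ntu-108-R06922062-1.pdf")  # assumed local filename
    print("OK" if digest == EXPECTED_MD5 else f"MISMATCH: {digest}")
```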