Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80437

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 王鈺強 | zh_TW |
| dc.contributor.advisor | Yu-Chiang Frank Wang | en |
| dc.contributor.author | 范萬泉 | zh_TW |
| dc.contributor.author | Wan-Cyuan Fan | en |
| dc.date.accessioned | 2022-11-24T03:06:39Z | - |
| dc.date.available | 2023-11-10 | - |
| dc.date.copyright | 2022-02-21 | - |
| dc.date.issued | 2021 | - |
| dc.date.submitted | 2002-01-01 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/80437 | - |
| dc.description.abstract | 在本論文中,我們研究了沒有目標圖像監督下的文本引導圖像編輯問題。在僅觀察輸入圖像、使用者給定指令和對應圖像之物件類別標籤的情況下,我們提出了一種迴圈式編輯GAN (cManiGAN) 來解決這一具有挑戰性的任務。首先,通過引入一個圖像-文本跨模態解釋器,用相應的指令對輸出圖像進行比對驗證,我們能夠為訓練圖像生成器提供單詞級的訓練反饋。此外,迴圈式編輯一致性的假設進一步用於圖像處理,它結合了『撤消』指令,用於處理後的輸出以還原輸入圖像,能夠在像素級別提供額外的監督。我們在 CLEVR 以及 COCO 數據集上進行了廣泛的實驗。雖然後者由於其多樣化的視覺和語義信息而特別具有挑戰性,但我們在兩個數據集上的實驗結果證實了我們提出的方法的有效性和普遍性。 | zh_TW |
| dc.description.abstract | In this thesis, we study the problem of text-guided image manipulation without ground-truth image supervision. With only the input image, the user-given instruction, and object category labels observed, we propose a Cyclic-Manipulation GAN (cManiGAN) to tackle this challenging task. By introducing an image-text cross-modal interpreter that verifies the output image against the corresponding instruction, we are able to provide word-level feedback for training the image generator. Moreover, operational cycle-consistency is further utilized for image manipulation, which synthesizes the "undo" instruction for recovering the input image from the manipulated output, offering additional supervision at the pixel level. We conduct extensive experiments on the CLEVR and COCO datasets. While the latter is particularly challenging due to its diverse visual and semantic information, our experimental results on both datasets confirm the effectiveness and generalizability of the proposed method. | en |
| dc.description.provenance | Made available in DSpace on 2022-11-24T03:06:39Z (GMT). No. of bitstreams: 1 U0001-2012202116591800.pdf: 2754587 bytes, checksum: 9df81387c888675f3d38f3f72dc6daf9 (MD5) Previous issue date: 2021 | en |
| dc.description.tableofcontents | 中文摘要 i
Abstract iii
List of Figures vii
List of Tables ix
1 Introduction 1
2 Related Work 5
2.1 Object-centric text-guided image manipulation 5
2.2 Scene-level text-guided image manipulation 5
2.3 Image editing with unpaired data 6
3 Methodology 7
3.1 Notations and algorithmic overview 7
3.2 Learning Image Editor for cManiGAN 8
3.3 Cross-Modal Semantic-Level Interpreter 11
3.4 Reasoning of Text-Guided Image Manipulation via Cross-Modal Cycle Consistency 13
4 Experiment 15
4.1 Datasets 15
4.2 Qualitative Evaluation 16
4.3 Quantitative Evaluation 17
4.3.1 Quantitative comparisons 18
4.4 Real-World Target-Free Images 19
4.5 Ablation studies 20
Conclusion 23
Reference 25 | - |
| dc.language.iso | zh_TW | - |
| dc.subject | 電腦視覺 | zh_TW |
| dc.subject | 影像操縱 | zh_TW |
| dc.subject | 文本至影像編輯 | zh_TW |
| dc.subject | image manipulation | en |
| dc.subject | computer vision | en |
| dc.subject | text-guided image manipulation | en |
| dc.title | 非目標式之由文本控制的影像操縱技術 | zh_TW |
| dc.title | Target-free Text-guided Image Manipulation | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 110-2 | - |
| dc.description.degree | Master's | - |
| dc.contributor.coadvisor | 陳祝嵩;邱維辰 | zh_TW |
| dc.contributor.coadvisor | Chu-Song Chen;Wei-Chen Walon Chiu | en |
| dc.contributor.oralexamcommittee | | zh_TW |
| dc.subject.keyword | 電腦視覺,影像操縱,文本至影像編輯 | zh_TW |
| dc.subject.keyword | computer vision, text-guided image manipulation, image manipulation | en |
| dc.relation.page | 29 | - |
| dc.identifier.doi | 10.6342/NTU202104547 | - |
| dc.rights.note | Authorized (access restricted to campus) | - |
| dc.date.accepted | 2022-01-22 | - |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Graduate Institute of Communication Engineering | - |
Appears in Collections: Graduate Institute of Communication Engineering
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-110-2.pdf (access restricted to NTU campus IP addresses; off-campus users should connect via the VPN service) | 2.69 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
