NTU Theses and Dissertations Repository › College of Electrical Engineering and Computer Science (電機資訊學院) › Graduate Institute of Networking and Multimedia (資訊網路與多媒體研究所)
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/55128
Full metadata record
dc.contributor.advisor: 徐宏民 (Winston Hsu)
dc.contributor.author: Ya-Liang Chang [en]
dc.contributor.author: 張雅量 [zh_TW]
dc.date.accessioned: 2021-06-16T03:48:15Z
dc.date.available: 2020-08-21
dc.date.copyright: 2020-08-21
dc.date.issued: 2020
dc.date.submitted: 2020-08-04
dc.identifier.citation:
[1] S. Abu-El-Haija, N. Kothari, J. Lee, P. Natsev, G. Toderici, B. Varadarajan, and S. Vijayanarasimhan. Youtube-8m: A large-scale video classification benchmark. arXiv preprint arXiv:1609.08675, 2016.
[2] C. Barnes, E. Shechtman, A. Finkelstein, and D. B. Goldman. Patchmatch: A randomized correspondence algorithm for structural image editing. In ACM Transactions on Graphics (ToG), volume 28, page 24. ACM, 2009.
[3] M. Bertalmio, A. L. Bertozzi, and G. Sapiro. Navier-stokes, fluid dynamics, and image and video inpainting. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 1, pages I–I. IEEE, 2001.
[4] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester. Image inpainting. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques, pages 417–424. ACM Press/Addison-Wesley Publishing Co., 2000.
[5] R. Bornard, E. Lecan, L. Laborelli, and J.-H. Chenot. Missing data correction in still images and image sequences. In Proceedings of the tenth ACM international conference on Multimedia, pages 355–361. ACM, 2002.
[6] J. Carreira and A. Zisserman. Quo vadis, action recognition? a new model and the kinetics dataset. In proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6299–6308, 2017.
[7] Y.-L. Chang, Z. Y. Liu, K.-Y. Lee, and W. Hsu. Learnable gated temporal shift module for deep video inpainting. arXiv preprint arXiv:1907.01131, 2019.
[8] I. Drori, D. Cohen-Or, and H. Yeshurun. Fragment-based image completion. In ACM Transactions on graphics (TOG), volume 22, pages 303–312. ACM, 2003.
[9] L. A. Gatys, A. S. Ecker, and M. Bethge. A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576, 2015.
[10] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
[11] M. Granados, J. Tompkin, K. Kim, O. Grau, J. Kautz, and C. Theobalt. How not to be seen—object removal from videos of crowded scenes. In Computer Graphics Forum, volume 31, pages 219–228. Wiley Online Library, 2012.
[12] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in Neural Information Processing Systems, pages 6626–6637, 2017.
[13] J.-B. Huang, S. B. Kang, N. Ahuja, and J. Kopf. Temporally coherent completion of dynamic video. ACM Transactions on Graphics (TOG), 35(6):196, 2016.
[14] S. Iizuka, E. Simo-Serra, and H. Ishikawa. Globally and locally consistent image completion. ACM Transactions on Graphics (ToG), 36(4):107, 2017.
[15] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In European conference on computer vision, pages 694–711. Springer, 2016.
[16] D. Kim, S. Woo, J.-Y. Lee, and I. S. Kweon. Deep video inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5792–5801, 2019.
[17] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4681–4690, 2017.
[18] J. Lin, C. Gan, and S. Han. Temporal shift module for efficient video understanding. arXiv preprint arXiv:1811.08383, 2018.
[19] G. Liu, F. A. Reda, K. J. Shih, T.-C. Wang, A. Tao, and B. Catanzaro. Image inpainting for irregular holes using partial convolutions. arXiv preprint arXiv:1804.07723, 2018.
[20] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.
[21] K. Nazeri, E. Ng, T. Joseph, F. Qureshi, and M. Ebrahimi. Edgeconnect: Generative image inpainting with adversarial edge learning. 2019.
[22] A. Newson, A. Almansa, M. Fradet, Y. Gousseau, and P. Pérez. Video inpainting of complex scenes. SIAM Journal on Imaging Sciences, 7(4):1993–2019, 2014.
[23] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2536–2544, 2016.
[24] E. Real, J. Shlens, S. Mazzocchi, X. Pan, and V. Vanhoucke. Youtubeboundingboxes: A large high-precision human-annotated data set for object detection in video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5296–5305, 2017.
[25] A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner. Faceforensics: A large-scale video dataset for forgery detection in human faces. arXiv preprint arXiv:1803.09179, 2018.
[26] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. Imagenet large scale visual recognition challenge. International journal of computer vision, 115(3):211–252, 2015.
[27] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[28] C. Wang, H. Huang, X. Han, and J. Wang. Video inpainting by jointly learning temporal structure and spatial details. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, 2019.
[29] T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, G. Liu, A. Tao, J. Kautz, and B. Catanzaro. Video-to-video synthesis. arXiv preprint arXiv:1808.06601, 2018.
[30] Y. Wexler, E. Shechtman, and M. Irani. Space-time completion of video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3):463–476, 2007.
[31] J. Xie, L. Xu, and E. Chen. Image denoising and inpainting with deep neural networks. In Advances in neural information processing systems, pages 341–349, 2012.
[32] N. Xu, L. Yang, Y. Fan, J. Yang, D. Yue, Y. Liang, B. Price, S. Cohen, and T. Huang. Youtube-vos: Sequence-to-sequence video object segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 585–601, 2018.
[33] Z. Yan, X. Li, M. Li, W. Zuo, and S. Shan. Shift-net: Image inpainting via deep feature rearrangement. arXiv preprint arXiv:1801.09392, 2018.
[34] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang. Free-form image inpainting with gated convolution. arXiv preprint arXiv:1806.03589, 2018.
[35] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang. Generative image inpainting with contextual attention. arXiv preprint, 2018.
[36] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang. The unreasonable effectiveness of deep features as a perceptual metric. arXiv preprint, 2018.
[37] B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/55128
dc.description.abstract: 任意形狀的影片修復是一項很有挑戰性的任務,可以廣泛用於影片編輯(如字幕移除、物體移除)。現有基於補丁修補的方法無法處理非重複性結構(如人臉),而將圖像修復模型直接應用於影片則會導致影片前後不一致。本篇論文提出一種基於深度學習的任意形狀影片修復模型,以三維門控卷積來應對任意形狀遮罩的不確定性,並加上時序性補丁對抗式生成網路(Temporal PatchGAN)來增強影片的一致性。此外,我們收集影片並設計一種任意形狀遮罩生成的演算法,以建立任意形狀影片修復(FVI)資料集,用於訓練和評估影片修復模型。在FaceForensics和FVI資料集上進行的實驗結果顯示,我們的方法優於現有的方法。相關程式碼、結果影片和FVI資料集都在Github上開源 https://github.com/amjltc295/Free-Form-Video-Inpainting [zh_TW]
dc.description.abstract: Free-form video inpainting is a very challenging task that can be widely used for video editing, such as text removal. Existing patch-based methods cannot handle non-repetitive structures such as faces, while directly applying image-based inpainting models to videos results in temporal inconsistency (see videos at http://bit.ly/2Fu1n6b). In this paper, we introduce a deep-learning-based free-form video inpainting model, with proposed 3D gated convolutions to tackle the uncertainty of free-form masks and a novel Temporal PatchGAN loss to enhance temporal consistency. In addition, we collect videos and design a free-form mask generation algorithm to build the free-form video inpainting (FVI) dataset for training and evaluating video inpainting models. We demonstrate the benefits of these components, and experiments on both the FaceForensics and our FVI datasets suggest that our method is superior to existing ones. Related source code, full-resolution result videos, and the FVI dataset can be found on GitHub: https://github.com/amjltc295/Free-Form-Video-Inpainting [en]
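The gating idea behind the 3D gated convolutions mentioned in the abstract can be sketched as follows. This is a minimal single-channel NumPy illustration, not the thesis implementation: a real layer would use learned multi-channel kernels with biases, padding, and an activation on the feature branch, but the core mechanism is the same, with one branch producing features and a sigmoid-gated branch learning to suppress activations that draw on masked (invalid) voxels.

```python
import numpy as np

def gated_conv3d(x, w_feat, w_gate):
    """Naive single-channel 3D gated convolution (valid padding, stride 1).

    x:      input video tensor, shape (T, H, W)
    w_feat: feature-branch kernel, shape (kt, kh, kw)
    w_gate: gating-branch kernel, shape (kt, kh, kw)

    Returns feature * sigmoid(gate) at every spatio-temporal position.
    """
    kt, kh, kw = w_feat.shape
    T, H, W = x.shape
    out = np.empty((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = x[t:t + kt, i:i + kh, j:j + kw]
                feat = np.sum(patch * w_feat)          # feature branch
                gate = np.sum(patch * w_gate)          # gating branch
                out[t, i, j] = feat / (1.0 + np.exp(-gate))  # feat * sigmoid(gate)
    return out

rng = np.random.default_rng(0)
video = rng.standard_normal((5, 8, 8))   # 5 frames of 8x8 pixels
w_f = rng.standard_normal((3, 3, 3))     # 3x3x3 spatio-temporal kernels
w_g = rng.standard_normal((3, 3, 3))
y = gated_conv3d(video, w_f, w_g)
print(y.shape)  # (3, 6, 6)
```

Unlike a plain 3D convolution, the per-position sigmoid gate lets the network output near-zero responses wherever the receptive field overlaps a free-form hole, which is why gating helps with arbitrarily shaped masks.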
dc.description.provenance: Made available in DSpace on 2021-06-16T03:48:15Z (GMT). No. of bitstreams: 1. U0001-3107202008502400.pdf: 1841482 bytes, checksum: 18a04e7a49d164af59a899627b27501c (MD5). Previous issue date: 2020 [en]
dc.description.tableofcontents:
誌謝 (Acknowledgements) ii
摘要 (Chinese Abstract) iii
Abstract iv
1 Introduction 1
2 Related Work 4
2.1 Image Inpainting 4
2.2 Free-form Image Inpainting 4
2.3 Video Inpainting 5
3 Proposed Approach 7
3.1 Video Inpainting Generator 8
3.2 Spatial-temporally Aware 3D Gated Conv 8
3.3 Loss Functions 9
3.3.1 Masked l1 Loss 9
3.3.2 Perceptual Loss 9
3.3.3 Style Loss 10
3.3.4 Temporal PatchGAN Loss 10
3.4 Free-form Video Masks Generation 11
4 Experimental Results 14
4.1 Datasets 14
4.1.1 FaceForensics 14
4.1.2 Free-form Video Inpainting (FVI) Dataset 14
4.2 Evaluation Metrics 15
4.3 Quantitative Results 15
4.4 Qualitative Results 17
4.5 User Study 20
4.6 Ablation Study 20
4.7 Extension to Video Super-Resolution 21
5 Discussion and Future Work 23
6 Conclusion 25
Bibliography 26
dc.language.iso: en
dc.subject: 深度學習 [zh_TW]
dc.subject: 影片修復 [zh_TW]
dc.subject: 對抗式生成網路 [zh_TW]
dc.subject: 電腦視覺 [zh_TW]
dc.subject: Video Inpainting [en]
dc.subject: Deep Learning [en]
dc.subject: Computer Vision [en]
dc.subject: GAN [en]
dc.title: 以三維門卷積與時序性補丁對抗式生成網路之任意形狀影片修復 [zh_TW]
dc.title: Free-form Video Inpainting with 3D Gated Convolution and Temporal PatchGAN [en]
dc.type: Thesis
dc.date.schoolyear: 108-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 葉梅珍 (Mei-Chen Yeh), 陳永昇 (Yong-Sheng Chen)
dc.subject.keyword: 影片修復, 深度學習, 電腦視覺, 對抗式生成網路 [zh_TW]
dc.subject.keyword: Video Inpainting, Deep Learning, Computer Vision, GAN [en]
dc.relation.page: 30
dc.identifier.doi: 10.6342/NTU202002144
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2020-08-04
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia) [zh_TW]
Appears in Collections: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)

Files in This Item:
U0001-3107202008502400.pdf (Restricted Access), 1.8 MB, Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
