DSpace

The institutional repository, built on DSpace, is dedicated to preserving digital materials of all kinds (e.g., text, images, and PDFs) and making them easily accessible.

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85673
Full metadata record (DC field: value [language]):
dc.contributor.advisor: 李明穗 (Ming-Sui Lee)
dc.contributor.author: Wei-Lun Huang [en]
dc.contributor.author: 黃偉綸 [zh_TW]
dc.date.accessioned: 2023-03-19T23:21:11Z
dc.date.copyright: 2022-09-29
dc.date.issued: 2022
dc.date.submitted: 2022-09-26
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85673
dc.description.abstract: Research on video matting has focused mainly on temporal coherence and has improved substantially with neural networks; however, providing trimaps is a further, implicit challenge. Matting typically relies on user-annotated trimaps to estimate alpha values, yet annotating a trimap for every frame of a video is prohibitively costly for ordinary users. Recent studies successfully leverage video object segmentation methods to propagate a few given trimaps through the input video, but the results are unstable. We therefore present FTP-VM (Fast Trimap Propagation - Video Matting), a more powerful and faster end-to-end video matting model with built-in trimap propagation. FTP-VM performs video matting from only a few given trimaps and runs faster while preserving competitive accuracy: it processes a 1024x576 video at 40 FPS on an NVIDIA RTX 2080Ti GPU, whereas previous methods run at 5 FPS. To achieve this speedup, FTP-VM combines trimap propagation and video matting in a single model, replacing the extra backbone used in memory matching with our lightweight trimap fusion module. Furthermore, a segmentation consistency loss is adapted from automotive semantic segmentation to trimap segmentation and combined with a recurrent neural network (RNN) to improve temporal coherence. FTP-VM performs competitively on both composited and real videos and can run in real time, enabling interactive applications. [en]
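As a concrete illustration of the pipeline the abstract describes, the following is a minimal PyTorch sketch, not the authors' released code: one user-annotated memory frame is encoded once, its trimap is injected by a lightweight fusion step in place of a second backbone, each query frame reads from that memory via attention (space-time memory matching), and a recurrent cell carries temporal state so trimap and alpha are decoded together. All layer sizes and module choices here are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FTPVMSketch(nn.Module):
    """Illustrative stand-in: memory matching + trimap fusion + recurrent decoding."""
    def __init__(self, ch=64):
        super().__init__()
        self.encoder = nn.Conv2d(3, ch, 3, stride=4, padding=1)  # stand-in image backbone
        self.trimap_fuse = nn.Conv2d(ch + 3, ch, 1)              # lightweight trimap fusion
        self.key_proj = nn.Conv2d(ch, ch // 2, 1)                # keys for memory matching
        self.gru = nn.GRUCell(ch, ch)                            # recurrent temporal state
        self.head = nn.Conv2d(ch, 4, 1)                          # 3 trimap logits + 1 alpha

    def forward(self, memory_frame, memory_trimap, query_frames):
        # Encode the single annotated memory frame and fuse its trimap directly
        # into the features (instead of running a second backbone on the trimap).
        m = self.encoder(memory_frame)
        tri = F.interpolate(memory_trimap, size=m.shape[-2:])
        m = self.trimap_fuse(torch.cat([m, tri], dim=1))
        mem_key = self.key_proj(m).flatten(2)                    # B x C/2 x HW
        mem_val = m.flatten(2)                                   # B x C   x HW

        state, outputs = None, []
        for frame in query_frames:  # query frames are assumed to share one resolution
            q = self.encoder(frame)
            qry_key = self.key_proj(q).flatten(2)
            # Space-time memory read: softmax affinity over memory locations.
            affinity = torch.softmax(mem_key.transpose(1, 2) @ qry_key, dim=1)
            read = (mem_val @ affinity).view_as(q)
            b, c, h, w = q.shape
            flat = (q + read).permute(0, 2, 3, 1).reshape(-1, c)
            state = self.gru(flat, state)                        # carries temporal coherence
            out = self.head(state.view(b, h, w, c).permute(0, 3, 1, 2))
            outputs.append((out[:, :3], torch.sigmoid(out[:, 3:])))  # (trimap logits, alpha)
        return outputs

# Dummy run: one memory frame with an all-"unknown" trimap, eight query frames.
model = FTPVMSketch()
mem = torch.randn(1, 3, 256, 256)
tri = torch.zeros(1, 3, 256, 256)
tri[:, 1] = 1.0
preds = model(mem, tri, [torch.randn(1, 3, 256, 256) for _ in range(8)])

In the actual model the encoder would be a full backbone, the memory read STM-style attention, and the recurrent unit convolutional, but the data flow is the same: fuse the trimap once, match every query frame against that memory, and decode trimap and alpha jointly.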
dc.description.provenance: Made available in DSpace on 2023-03-19T23:21:11Z (GMT). No. of bitstreams: 1. U0001-2109202201055500.pdf: 50913647 bytes, checksum: a6cc839370aeac479bd19197adf33f12 (MD5). Previous issue date: 2022. [en]
dc.description.tableofcontents:
Acknowledgements ... i
Abstract (Chinese) ... ii
Abstract ... iii
Contents ... v
List of Figures ... vii
List of Tables ... ix
Chapter 1  Introduction ... 1
Chapter 2  Related Work ... 6
  2.1 Image Matting ... 6
  2.2 Video Matting ... 7
  2.3 Video Object Segmentation ... 9
Chapter 3  Method ... 10
  3.1 Encoder ... 11
  3.2 Trimap Fusion Module ... 11
  3.3 Bottleneck Fusion Module ... 13
  3.4 Decoder ... 15
  3.5 Loss Function ... 16
    3.5.1 Segmentation Loss ... 16
    3.5.2 Matting Loss ... 17
Chapter 4  Experiment ... 19
  4.1 Dataset, Metric and Implementation Detail ... 19
    4.1.1 Training Dataset ... 19
    4.1.2 Evaluation Dataset ... 20
    4.1.3 Evaluation Metric ... 20
    4.1.4 Implementation Detail ... 21
  4.2 Comparative Result ... 21
    4.2.1 Quantitative and Qualitative Result ... 22
    4.2.2 Memory Feeding Period ... 26
  4.3 Ablation Study ... 27
    4.3.1 Ablation on Network Design ... 27
    4.3.2 Ablation on Trimap Fusion Module ... 28
    4.3.3 Ablation on Bottleneck Fusion Module ... 29
    4.3.4 Ablation on Segmentation Consistency Loss ... 29
    4.3.5 Different Memory Trimap Width ... 30
  4.4 High-Resolution Video ... 32
  4.5 Video in the Wild ... 34
Chapter 5  Conclusion ... 41
References ... 42
dc.language.iso: en
dc.subject: 三元圖傳播 (Trimap Propagation) [zh_TW]
dc.subject: 去背 (Matting) [zh_TW]
dc.subject: 影片去背 (Video Matting) [zh_TW]
dc.subject: 三元圖 (Trimap) [zh_TW]
dc.subject: Video Matting [en]
dc.subject: Matting [en]
dc.subject: Trimap [en]
dc.subject: Trimap Propagation [en]
dc.title: 融合三元圖傳播之端到端影片去背 (End-to-end Video Matting with Trimap Propagation) [zh_TW]
dc.title: End-to-end Video Matting with Trimap Propagation [en]
dc.type: Thesis
dc.date.schoolyear: 110-2
dc.description.degree: Master's (碩士)
dc.contributor.oralexamcommittee: 胡敏君 (Min-Chun Hu); 葉梅珍 (Mei-Chen Yeh)
dc.subject.keyword: 去背 (Matting), 影片去背 (Video Matting), 三元圖 (Trimap), 三元圖傳播 (Trimap Propagation) [zh_TW]
dc.subject.keyword: Matting, Video Matting, Trimap, Trimap Propagation [en]
dc.relation.page: 49
dc.identifier.doi: 10.6342/NTU202203693
dc.rights.note: Authorization granted (open access worldwide)
dc.date.accepted: 2022-09-27
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia) [zh_TW]
dc.date.embargo-lift: 2022-09-29
Appears in collections: Graduate Institute of Networking and Multimedia (資訊網路與多媒體研究所)

Files in this item:
File                          Size      Format
U0001-2109202201055500.pdf    49.72 MB  Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
