Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85673
| Title: | End-to-end Video Matting with Trimap Propagation (融合三元圖傳播之端到端影片去背) |
| Author: | Wei-Lun Huang (黃偉綸) |
| Advisor: | Ming-Sui Lee (李明穗) |
| Keywords: | Matting, Video Matting, Trimap, Trimap Propagation |
| Year of publication: | 2022 |
| Degree: | Master's |
| Abstract: | Research on video matting mainly focuses on temporal coherence and has improved greatly with neural networks. However, providing trimaps is another implicit challenge in video matting. Matting often relies on user-annotated trimaps to estimate alpha values, but annotating a trimap for every frame of a video is a huge cost for ordinary users. Recent studies successfully leverage video object segmentation methods to propagate a few given trimaps through the input video, but the results are unstable. We therefore present a more powerful and faster end-to-end video matting model equipped with trimap propagation, FTP-VM (Fast Trimap Propagation - Video Matting). FTP-VM performs video matting given only a few trimaps, and runs faster while preserving competitive performance. It processes a 1024x576 video at 40 FPS on an NVIDIA RTX 2080Ti GPU, whereas previous methods operate at 5 FPS. To gain this speed, FTP-VM combines trimap propagation and video matting in one model, and replaces the additional backbone in memory matching with our lightweight trimap fusion module. Furthermore, the segmentation consistency loss is adapted from automotive semantic segmentation to fit trimap segmentation, and is combined with an RNN (Recurrent Neural Network) to improve temporal coherence. FTP-VM performs competitively on both composited and real videos and can run in real time, enabling interactive applications. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85673 |
| DOI: | 10.6342/NTU202203693 |
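| Full-text authorization: | Authorized (open access worldwide) |
| Full-text release date: | 2022-09-29 |
| Appears in collections: | Graduate Institute of Networking and Multimedia |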
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| U0001-2109202201055500.pdf | 49.72 MB | Adobe PDF | View/Open |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
