Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79112
Title: | Deep Video Frame Interpolation using Cyclic Frame Generation |
Author: | Yi-Tung Liao (廖苡彤) |
Advisor: | Yung-Yu Chuang (莊永裕) |
Co-advisor: | Yen-Yu Lin (林彥宇) |
Keywords: | Video frame interpolation, cycle consistency, edge-guided training |
Publication Year: | 2018 |
Degree: | Master's |
Abstract: | This thesis addresses three observations about video frame interpolation with convolutional neural networks (CNNs). First, an interpolated frame is more reliable if it can itself be used to interpolate further high-quality frames. Second, optical flow should be linear within a short time interval. Third, analysis of previous work shows that interpolation quality drops in regions with large gradients. We tackle these issues by introducing a new loss term, the cycle consistency loss, together with two extensions: the motion linearity loss and edge-guided training. The cycle consistency loss makes fuller use of the training data, both improving the quality of interpolated frames and maintaining performance when less training data is available than in previous works. The motion linearity loss and edge-guided training target regions with large motion and large gradients, respectively. These loss terms and the training method can be applied to any end-to-end trainable network for video frame interpolation; in this thesis we adopt the Deep Voxel Flow [15] model to demonstrate the effect of each component. We conducted experiments on three datasets, achieving state-of-the-art results on two of them: the UCF101 benchmark and the high-quality video "See You Again". On the Middlebury benchmark, we achieve the best performance on real-scene videos, which matters most for generalizing frame interpolation methods. |
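As a rough illustration of the cycle consistency idea described in the abstract, the sketch below computes an L1 cycle consistency loss: two intermediate frames are generated, then re-interpolated, and the result is compared against the original middle frame. A toy averaging function stands in for the thesis's trainable CNN interpolator; the function names and the averaging stand-in are hypothetical, not taken from the thesis.

```python
import numpy as np

def interpolate(frame_a, frame_b):
    # Toy stand-in for a trainable CNN interpolator: plain averaging.
    return 0.5 * (frame_a + frame_b)

def cycle_consistency_loss(i0, i1, i2, interp=interpolate):
    """L1 cycle consistency loss over three consecutive frames i0, i1, i2.

    Interpolate between (i0, i1) and (i1, i2), then re-interpolate from
    the two generated frames; the reconstruction should land back on i1.
    """
    mid_a = interp(i0, i1)                # generated frame at t = 0.5
    mid_b = interp(i1, i2)                # generated frame at t = 1.5
    reconstructed = interp(mid_a, mid_b)  # should reconstruct t = 1.0
    return np.abs(reconstructed - i1).mean()
```

With perfectly linear motion (e.g. pixel values 0, 1, 2 across the three frames) the loss is zero, matching the abstract's second observation that optical flow should be linear within a short time interval; deviations from linearity or poor intermediate frames raise the loss.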
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/79112 |
DOI: | 10.6342/NTU201802624 |
Full-text License: | Paid authorization |
Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
File | Size | Format |
---|---|---|
ntu-107-R05922033-1.pdf (currently not available for public access) | 7.94 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.