NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95329
Title: 利用大量人工失真效果評估增強影像之視覺品質
DEGRAVE: Learning from Synthetic Degradation for Assessing Perceptual Quality of Video Enhancement
Authors: 費俊昱
Chun-Yu Fei
Advisor: 廖世偉
Shih-Wei Liao
Keywords: Video quality assessment, Synthetic distortion, Siamese network, Video enhancement
Publication Year: 2024
Degree: Master's
Abstract: User-generated content video quality assessment (UGC-VQA) aims to predict the perceptual quality of user-generated videos without a reference. Most current work focuses on general user-generated videos with unknown, authentic distortions, and both hand-crafted and deep-learning methods have achieved strong performance there. Nevertheless, these models perform inconsistently when evaluating the perceptual quality of UGC videos with enhancement effects, leaving this part of the UGC-VQA task unsolved. In this work, we propose a model-agnostic two-stage training strategy: a pre-training stage that trains a dual-encoder architecture on a large video quality assessment dataset augmented with abundant synthetic distortions, and a fine-tuning stage that trains a lightweight fusion network to predict the perceptual quality of enhanced videos. We further demonstrate that our solution extends to the more unconstrained setting of general UGC-VQA datasets.
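To make the two-stage strategy concrete, here is a minimal sketch in PyTorch. Everything in it is illustrative and assumed rather than taken from the thesis: the module names (DualEncoder, FusionHead), the feature sizes, the toy data, and the MSE objective stand in for the actual architecture and training details.

```python
import torch
import torch.nn as nn

IN_DIM, HID = 64, 128  # illustrative feature sizes, not from the thesis

class DualEncoder(nn.Module):
    """Stage-1 backbone: two parallel encoders whose features are concatenated."""
    def __init__(self):
        super().__init__()
        self.enc_a = nn.Sequential(nn.Linear(IN_DIM, HID), nn.ReLU())
        self.enc_b = nn.Sequential(nn.Linear(IN_DIM, HID), nn.ReLU())

    def forward(self, x):
        return torch.cat([self.enc_a(x), self.enc_b(x)], dim=-1)

class FusionHead(nn.Module):
    """Stage-2 lightweight head: weights the backbone features and regresses a score."""
    def __init__(self, dim=2 * HID):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.score = nn.Linear(dim, 1)

    def forward(self, f):
        return self.score(self.gate(f) * f).squeeze(-1)  # weighted fusion -> score

backbone, head = DualEncoder(), FusionHead()
# (Stage 1 would pre-train `backbone` on pseudo-labeled synthetic distortions.)

for p in backbone.parameters():      # stage 2: freeze the pre-trained backbone
    p.requires_grad = False

# Toy stand-ins for downstream clips (as pooled features) and their MOS labels.
finetune_loader = [(torch.randn(8, IN_DIM), torch.rand(8)) for _ in range(4)]
opt = torch.optim.Adam(head.parameters(), lr=1e-4)

for feats, mos in finetune_loader:   # fine-tune only the lightweight head
    loss = nn.functional.mse_loss(head(backbone(feats)), mos)
    opt.zero_grad(); loss.backward(); opt.step()
```

The essential point is the division of labor: stage 1 trains the backbone on pseudo-labeled synthetic distortions, while stage 2 freezes it and optimizes only the small fusion head, which keeps downstream adaptation cheap.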

To capture the synthetic effects that accompany enhanced videos, we present a learning-by-degrading approach with a data-amplification method that quantifies the impact of various distortion types on the perceptual quality of videos. Specifically, we impose multiple UGC-related degradations on an existing video dataset to enlarge it and leverage a well-trained MLLM to produce pseudo-scores for the newly generated distorted data, which serve as pre-training labels. We then build a Siamese network that learns the degradations from pairwise inputs of the same distortion type. After pre-training, the backbone weights are frozen to reduce computational complexity, and during fine-tuning on downstream data we train only a lightweight global weighted-fusion network, which strengthens the model's perception of the overall input and weights the backbone's output features to produce the final predicted score. We demonstrate the proposed framework's effectiveness and weaknesses on the largest video enhancement dataset, covering various categories of enhancement approaches, and we suggest future work to improve distortion-based assessment of enhanced video quality.
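As a hedged illustration of the Siamese pre-training step, the sketch below feeds two degraded versions of the same content through one shared encoder and trains it with a margin-ranking objective, so the more lightly distorted clip scores higher. The toy noise degradation, the pair construction, and the ranking loss are assumptions for illustration only; in the thesis, pseudo-scores come from a well-trained MLLM and the degradations are UGC-related, not plain noise.

```python
import torch
import torch.nn as nn

# One shared encoder applied to both branches (this weight sharing is what
# makes the setup Siamese). Sizes and depth are illustrative.
encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))

def degrade(x, severity):
    """Toy synthetic degradation: additive noise scaled by severity.
    Real UGC degradations (blur, compression artifacts, etc.) would replace this."""
    return x + severity * torch.randn_like(x)

rank_loss = nn.MarginRankingLoss(margin=0.1)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)

for _ in range(100):
    clean = torch.randn(8, 64)                    # stand-in for video features
    mild, severe = degrade(clean, 0.1), degrade(clean, 0.5)
    # Pseudo-scores from the MLLM would order each pair; here severity does.
    s_mild = encoder(mild).squeeze(-1)
    s_severe = encoder(severe).squeeze(-1)
    target = torch.ones_like(s_mild)              # mild copy should score higher
    loss = rank_loss(s_mild, s_severe, target)
    opt.zero_grad(); loss.backward(); opt.step()
```

Pairing two severities of the same distortion type gives the encoder a relative signal that is robust to noise in any single pseudo-score, which is the usual motivation for ranking-style objectives in quality assessment.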
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95329
DOI: 10.6342/NTU202403836
Fulltext Rights: License granted (open access worldwide)
Appears in Collections: Department of Computer Science and Information Engineering (資訊工程學系)

Files in This Item:
File           Size      Format
ntu-112-2.pdf  11.31 MB  Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
