NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97370
Title: 基於自適應採樣和混合專家模型的短影音品質分析系統
EXPERT-VQA: Ensemble Expert Prediction with Adaptive Frame Selection for Short-Form Video Quality Assessment
Authors: 黃致豪
Chih-Hao Huang
Advisor: 廖世偉
Shih-Wei Liao
Keyword: Video Quality Assessment, User-Generated Short-form Video, Context-Aware Frame Selection, Mixture of Experts, Adaptive Quality Calibration, Video Understanding
Publication Year: 2025
Degree: Master's
Abstract: User-generated content video quality assessment (UGC-VQA) evaluates the visual quality of videos that users shoot and upload to social platforms. With the growing popularity of these platforms in recent years, the volume of user-generated video has risen sharply and the problem has become increasingly important. Because user-shot videos often exhibit unstable picture quality, varying compression settings, and a wide range of creative effects, accurately quantifying and predicting the viewing experience is key to maintaining both the standard of platform content and the quality perceived by viewers.
However, with the rise of short-form video, effects such as rapid cuts, special filters, and jump cuts have become far more common. Because short-form videos are presented differently from traditional long-form videos, viewers experience them differently, and conventional UGC-VQA methods face new challenges when assessing their quality. For example, fixed frame-sampling strategies often fail to capture key transitions, or mistake creative filters for quality distortions, so existing models tend to underestimate the quality of short-form videos. Based on these observations, we propose EXPERT-VQA to address these shortcomings. First, we adopt an adaptive frame-sampling strategy (APT-FS) to capture the most representative segments. Next, we fuse several pre-trained expert models and add a lightweight gating network that dynamically weights each expert's contribution. Finally, a quality-score calibration module systematically corrects the bias between the quality that short-form viewers perceive and the scores that existing models predict. Experiments confirm that the framework outperforms existing methods on both correlation and error metrics, and that it handles short-form videos with frequent transitions or strong stylization particularly well. Our main contributions are: (1) an adaptive key-frame sampling strategy that remedies the shortcomings of fixed-rate sampling, (2) a multi-expert fusion that combines multiple facets of quality assessment, and (3) a calibration module that corrects the negative bias existing models typically show on short-form video. Together, these methods bring video quality assessment for short-form content closer to real viewer experience and yield scores that better reflect perceived quality.
User-generated content video quality assessment (UGC-VQA) tackles the task of evaluating videos that users record and share on social media. As online platforms expand dramatically, the number and variety of these videos have increased significantly. This growth makes it critical to accurately measure viewer experience, even when faced with challenges such as inconsistent video quality, different compression techniques, and a range of creative visual effects. Traditional UGC-VQA methods, originally developed for longer videos, often fall short on short-form content. Such videos typically feature rapid edits, abrupt transitions, and distinctive stylistic filters, which can lead to a consistent underestimation of the quality perceived by viewers.
In response, this thesis introduces EXPERT-VQA, a novel framework specifically designed for short-form video quality assessment. Our approach tackles the problem through three key innovations. First, we propose the Adaptive and Perceptual Transition Frame Selection (APT-FS) method to dynamically identify and select frames that capture the most significant visual changes. This method overcomes the limitations of fixed-rate sampling. Second, we integrate multiple pre-trained VQA models, each excelling in different quality aspects, by employing a lightweight learnable gating network that fuses their predictions while preserving their individual strengths. Finally, we employ a calibration module to correct for the systematic bias observed in existing models. This correction ensures that the final quality score aligns more closely with actual viewer perceptions.
Experimental evaluation on the YouTube SFV+HDR dataset demonstrates that EXPERT-VQA achieves superior performance, yielding higher correlation with human opinions and lower prediction errors compared to current state-of-the-art methods. Ablation studies further confirm that the APT-FS module, multi-expert fusion, and calibration process each contribute significantly to the overall improvements.
In conclusion, this work provides a basis for assessing short-form video quality. The results show that adaptive frame selection, expert fusion, and calibration help narrow the gap between algorithmic predictions and human ratings, and that EXPERT-VQA may serve as a flexible and effective framework for video quality evaluation on social media as well as a guide for future research in video quality assessment.
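The record does not include source code, but the pipeline summarized in the abstract (transition-aware frame selection, gated fusion of pre-trained expert scores, and score calibration) can be illustrated with a minimal sketch in Python. Everything below, including function and class names, tensor shapes, the frame-difference metric, and the affine form of the calibration, is an assumption made for illustration and not the thesis's actual implementation.

# Illustrative sketch only: adaptive transition-aware frame selection,
# gated fusion of frozen expert scores, and an affine calibration step.
# All names, shapes, and metrics here are assumptions, not the thesis code.

import torch
import torch.nn as nn


def select_transition_frames(frames: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Pick the k frames that differ most from their predecessor.

    frames: (T, C, H, W) float tensor; returns k frame indices in temporal order.
    """
    # Per-frame change score: mean absolute difference to the previous frame.
    change = (frames[1:] - frames[:-1]).abs().mean(dim=(1, 2, 3))  # (T-1,)
    change = torch.cat([torch.zeros(1), change])                   # frame 0 has no predecessor
    idx = torch.topk(change, k).indices
    return torch.sort(idx).values


class GatedExpertFusion(nn.Module):
    """Weight frozen experts' scores with a small gating MLP, then calibrate."""

    def __init__(self, num_experts: int, feature_dim: int):
        super().__init__()
        # Lightweight gate: video-level features -> one weight per expert.
        self.gate = nn.Sequential(
            nn.Linear(feature_dim, 64), nn.ReLU(), nn.Linear(64, num_experts)
        )
        # Learned affine correction for the systematic under-estimation
        # reported on short-form content.
        self.scale = nn.Parameter(torch.ones(1))
        self.shift = nn.Parameter(torch.zeros(1))

    def forward(self, expert_scores: torch.Tensor, video_feats: torch.Tensor) -> torch.Tensor:
        # expert_scores: (B, num_experts) raw predictions from frozen experts
        # video_feats:   (B, feature_dim) pooled features of the sampled frames
        weights = torch.softmax(self.gate(video_feats), dim=-1)    # (B, num_experts)
        fused = (weights * expert_scores).sum(dim=-1)              # (B,)
        return self.scale * fused + self.shift                     # calibrated score


if __name__ == "__main__":
    video = torch.rand(120, 3, 64, 64)              # 120 synthetic frames
    print(select_transition_frames(video, k=8))     # indices of sampled frames
    model = GatedExpertFusion(num_experts=3, feature_dim=128)
    scores = torch.tensor([[3.1, 2.8, 3.5]])        # one video, three experts
    feats = torch.rand(1, 128)                      # pooled frame features
    print(model(scores, feats))                     # calibrated quality score

In this sketch only the gate and the two calibration parameters would be trained against human opinion scores while the expert models stay frozen; the thesis may differ in these details.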
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97370
DOI: 10.6342/NTU202500939
Fulltext Rights: Not authorized
Embargo Lift Date: N/A
Appears in Collections: Graduate Institute of Networking and Multimedia (資訊網路與多媒體研究所)

Files in This Item:
File: ntu-113-2.pdf (Restricted Access)
Size: 9.93 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
