Pali-VQA: A PaliGemma 2-Based Multi-Level Blind Video Quality Assessment Model

Chia-Lun Liang (梁家綸)

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98821

Title:	Pali-VQA: A PaliGemma 2-Based Multi-Level Blind Video Quality Assessment Model
Author:	Chia-Lun Liang (梁家綸)
Advisor:	Shih-Wei Liao (廖世偉)
Keywords:	Video Quality Assessment, Large Multimodal Model, PaliGemma, Ordinal Regression
Year of publication:	2025
Degree:	Master's
Abstract:	The rapid growth of user-generated video content calls for robust, automated methods for Video Quality Assessment (VQA). While Large Multimodal Models (LMMs) have advanced Blind VQA (BVQA), current approaches typically frame quality prediction as a coarse, five-level classification task, which limits their ability to discern fine-grained differences in video quality. In this paper, we introduce Pali-VQA, an efficient LMM-based BVQA model that addresses this limitation with a multi-level rating framework. Built on the PaliGemma 2 backbone, Pali-VQA reframes BVQA as a fine-grained classification problem with up to 18 distinct rating levels. We fine-tune with Low-Rank Adaptation (LoRA) and apply an ordinal regression label smoothing technique that regularizes the model while preserving the inherent ordinal information among rating levels. Despite being fine-tuned with LoRA on only a single dataset, Pali-VQA achieves competitive performance in experiments on four in-the-wild VQA benchmarks, matching or outperforming models that are larger, fully fine-tuned, or ensembled. Moreover, when ensembled with FAST-VQA, a non-LMM Deep Neural Network (DNN) BVQA model, Pali-VQA outperforms all previous top models on three of the four datasets. Our findings show that increasing the granularity of the rating levels significantly improves predictive performance, offering a more economical and effective path to LMM-based video quality assessment.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98821
DOI:	10.6342/NTU202501694
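Full-text authorization:	Authorized (open access worldwide)
Electronic full-text release date:	2025-08-20
Appears in collections:	Graduate Institute of Networking and Multimedia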
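
The abstract describes two ideas that can be illustrated in a few lines: quantizing a continuous quality score into up to 18 rating levels, and smoothing the classification target so that neighboring levels receive more probability mass than distant ones. The abstract does not give the exact formulation, so the following is only a minimal sketch under stated assumptions. The score range of 1 to 5, the uniform-width bins, the exponential decay kernel, and the temperature `tau` are illustrative choices rather than the thesis's method, and all function names are hypothetical.

```python
import math


def score_to_level(score, lo=1.0, hi=5.0, num_levels=18):
    """Map a continuous score in [lo, hi] to an integer level in 0..num_levels-1.

    Assumption: uniform-width bins over an assumed 1-5 score range.
    """
    clipped = min(max(score, lo), hi)
    frac = (clipped - lo) / (hi - lo)
    return min(int(frac * num_levels), num_levels - 1)


def ordinal_soft_target(level, num_levels=18, tau=1.0):
    """Build a soft label whose mass decays with distance from the true level.

    Assumption: an exponential kernel exp(-|k - level| / tau); the thesis may
    use a different smoothing scheme.
    """
    weights = [math.exp(-abs(k - level) / tau) for k in range(num_levels)]
    total = sum(weights)
    return [w / total for w in weights]


def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a soft target and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


def expected_score(probs, lo=1.0, hi=5.0):
    """Turn a distribution over levels back into a continuous score.

    Each level is represented by the midpoint of its (assumed uniform) bin.
    """
    width = (hi - lo) / len(probs)
    return sum(p * (lo + (k + 0.5) * width) for k, p in enumerate(probs))
```

With a soft target of this kind, a prediction that lands one level away from the true level incurs a smaller loss than one that lands several levels away, which is the ordinal behavior that a plain one-hot target does not provide.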

Files in this item:

File	Size	Format
ntu-113-2.pdf	3.26 MB	Adobe PDF


Unless their copyright terms are specifically indicated, all items in this system are protected by copyright, with all rights reserved.
