Pali-VQA: A PaliGemma 2-Based Multi-Level Blind Video Quality Assessment Model

梁家綸; Chia-Lun Liang

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98821

Title:	Pali-VQA: A PaliGemma 2-Based Multi-Level Blind Video Quality Assessment Model (original title: Pali-VQA：基於 PaliGemma 2 的多級別無參考影片品質評估模型)
Authors:	梁家綸 Chia-Lun Liang
Advisor:	廖世偉 Shih-Wei Liao
Keyword:	Video Quality Assessment, Large Multimodal Model, PaliGemma, Ordinal Regression
Publication Year :	2025
Degree:	Master's
Abstract:	The rapid growth of user-generated video content makes robust, automated Video Quality Assessment (VQA) essential. Although Large Multimodal Models (LMMs) have advanced Blind VQA (BVQA), current approaches typically frame quality prediction as a coarse five-level classification task, which limits their ability to discern fine-grained differences in video quality. In this paper, we introduce Pali-VQA, an efficient LMM-based BVQA model that addresses this limitation with a multi-level rating framework. Built on the PaliGemma 2 backbone, Pali-VQA reframes BVQA as a fine-grained classification problem with up to 18 distinct rating levels. We fine-tune the model with Low-Rank Adaptation (LoRA) and apply an ordinal regression label smoothing technique that regularizes the model while preserving the inherent ordering among rating levels. Although it is fine-tuned with LoRA on only a single dataset, Pali-VQA achieves competitive performance on four in-the-wild VQA benchmarks, matching or outperforming models that are larger, fully fine-tuned, or ensembled. Moreover, when ensembled with FAST-VQA, a non-LMM Deep Neural Network (DNN) BVQA model, Pali-VQA outperforms all previous top models on three of the four datasets. Our findings show that increasing the number of rating levels significantly improves predictive performance, offering a more economical and effective path to LMM-based video quality assessment.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98821
DOI:	10.6342/NTU202501694
Fulltext Rights:	Authorization granted (open access worldwide)
Embargo Lift Date:	2025-08-20
Appears in Collections:	Graduate Institute of Networking and Multimedia
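
Illustrative note on the abstract: the record does not give the formula behind the ordinal label smoothing or the mapping from rating levels to a score, so the following is only a minimal sketch under stated assumptions. It assumes a soft target whose mass decays exponentially with distance from the true level (one common way to keep ordinal information), 18 levels, and evenly spaced level values on a 1 to 5 scale. The function names and the `sharpness` parameter are hypothetical and are not taken from the thesis.

```python
import math


def ordinal_smoothed_target(true_level, num_levels=18, sharpness=1.0):
    """Soft target over rating levels (assumed form, not the thesis's formula).

    Probability mass decays exponentially with the distance |k - true_level|,
    so neighbouring levels receive more mass than distant ones.
    """
    if not 0 <= true_level < num_levels:
        raise ValueError("true_level must lie in [0, num_levels)")
    weights = [math.exp(-sharpness * abs(k - true_level)) for k in range(num_levels)]
    total = sum(weights)
    return [w / total for w in weights]


def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a soft target and predicted level probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


def expected_score(probs, low=1.0, high=5.0):
    """Probability-weighted mean of evenly spaced level values (assumed 1-5 scale)."""
    step = (high - low) / (len(probs) - 1)
    return sum(p * (low + i * step) for i, p in enumerate(probs))


if __name__ == "__main__":
    target = ordinal_smoothed_target(true_level=9)
    near_miss = ordinal_smoothed_target(true_level=10)  # prediction one level off
    far_miss = ordinal_smoothed_target(true_level=0)    # prediction nine levels off
    # With an ordinal soft target, the near miss is penalised less than the far miss.
    print(cross_entropy(target, near_miss) < cross_entropy(target, far_miss))
    print(round(expected_score(target), 3))
```

With a plain one-hot target, both misses above would be treated as simply wrong; the distance-aware target is what lets a finer grid of levels carry ordering information during training.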

Files in This Item:

File	Size	Format
ntu-113-2.pdf	3.26 MB	Adobe PDF
