  1. NTU Theses and Dissertations Repository
  2. College of Electrical Engineering and Computer Science
  3. Graduate Institute of Networking and Multimedia
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98821
Title: Pali-VQA: A PaliGemma 2-Based Multi-Level Blind Video Quality Assessment Model
Authors: 梁家綸
Chia-Lun Liang
Advisor: 廖世偉
Shih-Wei Liao
Keyword: Video Quality Assessment, Large Multimodal Model, PaliGemma, Ordinal Regression
Publication Year: 2025
Degree: Master's
Abstract: The rapid growth of user-generated video content calls for robust, automated methods for Video Quality Assessment (VQA). Although Large Multimodal Models (LMMs) have advanced Blind VQA (BVQA), current approaches typically frame quality prediction as a coarse five-level classification task, which limits their ability to distinguish fine-grained differences in video quality. In this work, we introduce Pali-VQA, an efficient LMM-based BVQA model that addresses this limitation with a multi-level rating framework. Built on the PaliGemma 2 backbone, Pali-VQA reframes BVQA as a fine-grained classification problem with up to 18 distinct rating levels. We fine-tune the model with Low-Rank Adaptation (LoRA) and apply an ordinal regression label smoothing technique that regularizes the model while preserving the inherent ordering of the rating levels. Although it is fine-tuned with LoRA on only a single dataset, Pali-VQA achieves competitive performance on four in-the-wild VQA benchmarks, matching or outperforming models that are larger, fully fine-tuned, or ensembled. Moreover, when ensembled with FAST-VQA, a non-LMM Deep Neural Network (DNN) BVQA model, Pali-VQA outperforms all previous top models on three of the four datasets. Our findings show that increasing the granularity of the rating levels significantly improves predictive performance, offering a more efficient and effective path to LMM-based video quality assessment.
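To make the idea of multi-level ratings with ordinal label smoothing more concrete, the following is a minimal sketch, not the thesis's actual formulation. It assumes a quality score on a 1 to 5 scale, equal-width rating bins, and an exponential decay of target mass with distance from the true level. The function names, the `temperature` parameter, and the score range are all hypothetical choices made for illustration.

```python
import math

def score_to_level(score, num_levels=18, lo=1.0, hi=5.0):
    """Quantize a continuous score in [lo, hi] into one of num_levels bins.

    Assumption: equal-width bins; the thesis may define its levels differently.
    """
    frac = (score - lo) / (hi - lo)
    return min(num_levels - 1, max(0, int(frac * num_levels)))

def ordinal_soft_labels(true_level, num_levels=18, temperature=1.0):
    """Soft target that puts most mass on the true level and less on far levels.

    Assumption: mass decays as exp(-|k - true_level| / temperature), one common
    way to keep ordinal information in a smoothed classification target.
    """
    logits = [-abs(k - true_level) / temperature for k in range(num_levels)]
    peak = max(logits)
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]

def cross_entropy(soft_target, predicted_probs, eps=1e-12):
    """Cross-entropy between a soft target and predicted level probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(soft_target, predicted_probs))

def expected_score(probs, lo=1.0, hi=5.0):
    """Map a distribution over levels back to a score using bin centers."""
    step = (hi - lo) / len(probs)
    return sum(p * (lo + (k + 0.5) * step) for k, p in enumerate(probs))

level = score_to_level(3.0)            # middle of the assumed 1-5 scale
target = ordinal_soft_labels(level)    # smoothed target over 18 levels
```

Unlike uniform label smoothing, which spreads the same small mass over every wrong class, a target of this shape penalizes a prediction two levels away less than one ten levels away, which is the ordinal property the abstract refers to. With more levels, the bin centers used by `expected_score` lie closer together, so the recovered score can be finer-grained than with five levels.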
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98821
DOI: 10.6342/NTU202501694
Fulltext Rights: Authorized (open access worldwide)
Embargo Lift Date: 2025-08-20
Appears in Collections: Graduate Institute of Networking and Multimedia

Files in This Item:
File           Size     Format
ntu-113-2.pdf  3.26 MB  Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
