Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97370

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 廖世偉 | zh_TW |
| dc.contributor.advisor | Shih-Wei Liao | en |
| dc.contributor.author | 黃致豪 | zh_TW |
| dc.contributor.author | Chih-Hao Huang | en |
| dc.date.accessioned | 2025-05-22T16:05:36Z | - |
| dc.date.available | 2025-05-23 | - |
| dc.date.copyright | 2025-05-22 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-05-19 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97370 | - |
| dc.description.abstract | 用戶生成內容影像品質評估(UGC-VQA)主要針對社群平台上用戶自行拍攝並上傳的影片進行品質評估。近年來隨著社群平台的盛行,用戶生成內容影片的數量急劇增加,此問題也變得愈發重要。由於用戶拍攝的影片常含有不穩定的畫面品質、不同的壓縮設定以及多樣化的創意特效,如何準確地量化和預測用戶的觀影體驗,成為維持社群平台影片內容水準與用戶觀影品質的關鍵。
然而,隨著短影音的興起,快速剪輯、特殊濾鏡和跳接等效果更為常見。短影音與傳統長影音在表現手法上的差異,使用戶對短影音的觀影體驗與長影音有所落差,導致傳統的 UGC-VQA 方法在短影音品質評估上面臨新挑戰。例如,固定的幀取樣策略往往無法取樣到關鍵轉場,或是錯把富含創意的特殊濾鏡當成品質失真,導致現有模型在評估短影音時產生品質低估的問題。基於這些觀察,我們提出 EXPERT-VQA 來解決既有方法的這些挑戰:首先,我們採用自適應幀取樣策略(APT-FS),有效擷取最具代表性的片段;接著,我們融合多個已訓練好的專家模型,並加入一個輕量化的閘門網路來動態決定各專家貢獻的權重;最後,藉由品質分數校正模組,針對短影音使用者對影片品質的期待與既有模型預測間的落差進行系統性偏誤修正。實驗結果證實此框架不論在相關性或誤差指標上均優於現有方法,特別能處理具有頻繁轉場或強烈風格化的短影音。我們的主要貢獻在於:(1)提出自適應幀取樣策略,補足過去固定頻率幀取樣策略的不足;(2)利用多專家模型融合多重品質評估面向;(3)透過校正模組解決既有模型在短影音上典型的負向偏差問題。這些方法讓影像品質評估在短影音上更貼近真實使用者的觀影體驗,得到更符合實際的影像品質評估分數。 | zh_TW |
| dc.description.abstract | User-generated content video quality assessment (UGC-VQA) tackles the task of evaluating videos that users record and share on social media. As online platforms expand dramatically, the number and variety of these videos have increased significantly. This growth makes it critical to accurately measure viewer experience, even when faced with challenges such as inconsistent video quality, different compression techniques, and a range of creative visual effects. Traditional UGC-VQA methods, originally developed for longer videos, often fall short on short-form content. Such videos typically feature rapid edits, abrupt transitions, and distinctive stylistic filters, which can lead to a consistent underestimation of the quality perceived by viewers.
In response, this thesis introduces EXPERT-VQA, a novel framework specifically designed for short-form video quality assessment. Our approach tackles the problem through three key innovations. First, we propose the Adaptive and Perceptual Transition Frame Selection (APT-FS) method to dynamically identify and select frames that capture the most significant visual changes. This method overcomes the limitations of fixed-rate sampling. Second, we integrate multiple pre-trained VQA models, each excelling in different quality aspects, by employing a lightweight learnable gating network that fuses their predictions while preserving their individual strengths. Finally, we employ a calibration module to correct for the systematic bias observed in existing models. This correction ensures that the final quality score aligns more closely with actual viewer perceptions. Experimental evaluation on the YouTube SFV+HDR dataset demonstrates that EXPERT-VQA achieves superior performance, yielding higher correlation with human opinions and lower prediction errors compared to current state-of-the-art methods. Ablation studies further confirm that the APT-FS module, multi-expert fusion, and calibration process each contribute significantly to the overall improvements. In conclusion, this work provides a basis for assessing short-form video quality. The results show that adaptive frame selection, expert fusion, and calibration help reduce the difference between algorithmic predictions and human ratings. EXPERT-VQA may be used as a flexible and effective framework for video quality evaluation on social media. This work may also help guide future research in video quality assessment. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-05-22T16:05:36Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-05-22T16:05:36Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i; Acknowledgements ii; 摘要 iv; Abstract vi; Contents viii; List of Figures xi; List of Tables xiii; Chapter 1 Introduction 1; 1.1 Introduction 1; Chapter 2 Related Work 7; 2.1 User-Generated Content and Its Evolving Challenges in VQA 7; 2.2 Deep Learning and Data-Driven BVQA Approaches 8; 2.2.1 CNN-Based Models 9; 2.2.2 RNN-Based Models 10; 2.2.3 Transformer-Based Models 11; 2.2.4 Multimodal and LLM-Based VQA Approaches 12; 2.3 Spatial and Temporal Sampling in VQA 12; Chapter 3 Methodology 14; 3.1 Problem Formulation 14; 3.2 Overall Framework 15; 3.3 Visual Understanding with CLIP 16; 3.4 Adaptive and Perceptual Transition Frame Selection (APT-FS) 18; 3.4.1 Frame Selection Problem 19; 3.4.2 Perceptual Scoring 19; 3.4.3 Intuition and Implementation 20; 3.5 Multi-Expert Integration 22; 3.6 Quality Expectation Calibration 25; 3.7 Training Strategy 25; Chapter 4 Evaluation 28; 4.1 Experimental Setup 28; 4.1.1 Dataset 28; 4.1.2 Model Configuration 30; 4.1.3 APT-FS Configuration 31; 4.1.4 Implementation Environment 32; 4.1.5 Training Configuration 32; 4.2 Evaluation Metrics 34; 4.3 Performance Comparison 36; 4.3.1 Overall Performance Comparison 36; 4.3.2 Inference Time Analysis 39; 4.3.3 Category-wise Analysis 40; 4.3.4 Score Distribution Analysis 43; 4.4 Ablation Studies 44; 4.4.1 Frame Selection Strategy 44; 4.4.2 Expert Model Configuration 46; 4.4.3 Quality Calibration Impact 47; 4.5 Case Studies 48; Chapter 5 Conclusion 52; References 53 | - |
| dc.language.iso | en | - |
| dc.subject | 用戶生成短影音 | zh_TW |
| dc.subject | 影像理解 | zh_TW |
| dc.subject | 自適應品質校準 | zh_TW |
| dc.subject | 混合專家模型 | zh_TW |
| dc.subject | 內容感知幀數選取 | zh_TW |
| dc.subject | 影像品質評估 | zh_TW |
| dc.subject | Mixture of Experts | en |
| dc.subject | Context-Aware Frame Selection | en |
| dc.subject | User-Generated Short-form Video | en |
| dc.subject | Video Quality Assessment | en |
| dc.subject | Video Understanding | en |
| dc.subject | Adaptive Quality Calibration | en |
| dc.title | 基於自適應採樣和混合專家模型的短影音品質分析系統 | zh_TW |
| dc.title | EXPERT-VQA: Ensemble Expert Prediction with Adaptive Frame Selection for Short-Form Video Quality Assessment | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 葉春超;陳縕儂;陳尚澤 | zh_TW |
| dc.contributor.oralexamcommittee | Chun-Chao Yeh;Yun-Nung Chen;Shang-Tse Chen | en |
| dc.subject.keyword | 影像品質評估,用戶生成短影音,內容感知幀數選取,混合專家模型,自適應品質校準,影像理解 | zh_TW |
| dc.subject.keyword | Video Quality Assessment,User-Generated Short-form Video,Context-Aware Frame Selection,Mixture of Experts,Adaptive Quality Calibration,Video Understanding | en |
| dc.relation.page | 59 | - |
| dc.identifier.doi | 10.6342/NTU202500939 | - |
| dc.rights.note | 未授權 | - |
| dc.date.accepted | 2025-05-19 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊網路與多媒體研究所 | - |
| dc.date.embargo-lift | N/A | - |
| Appears in Collections: | 資訊網路與多媒體研究所 |
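The abstracts above describe a three-stage pipeline: adaptive transition-aware frame selection (APT-FS), gated fusion of several pre-trained expert VQA models, and a calibration step that corrects systematic under-estimation. The following minimal Python sketch is illustrative only: the frame-difference heuristic, the fixed gate logits, and the affine calibration constants are assumptions made for the example, not the thesis's actual APT-FS scoring, gating network, or calibration module.

```python
# Illustrative sketch of the EXPERT-VQA pipeline stages described in the abstract.
# All function names, parameters, and constants here are hypothetical.
import numpy as np

def select_transition_frames(frames: np.ndarray, k: int = 8) -> np.ndarray:
    """Pick the k frames with the largest change from their predecessor.

    `frames` is assumed to be a (T, H, W, C) array of decoded frames; a real
    APT-FS implementation would use a perceptual score rather than this crude
    mean-absolute-difference proxy.
    """
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0)).mean(axis=(1, 2, 3))
    change = np.concatenate([[0.0], diffs])      # frame 0 has no predecessor
    idx = np.sort(np.argsort(change)[-k:])       # keep selected frames in temporal order
    return frames[idx]

def gated_fusion(expert_scores: np.ndarray, gate_logits: np.ndarray) -> float:
    """Fuse per-expert quality scores with softmax gating weights.

    `expert_scores` holds one predicted quality score per pre-trained expert;
    `gate_logits` would come from a small learnable gating network in practice.
    """
    weights = np.exp(gate_logits - gate_logits.max())
    weights /= weights.sum()
    return float(np.dot(weights, expert_scores))

def calibrate(score: float, scale: float = 1.1, shift: float = 0.3) -> float:
    """Toy affine calibration correcting a systematic under-estimation bias."""
    return scale * score + shift

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.integers(0, 255, size=(120, 64, 64, 3), dtype=np.uint8)
    key_frames = select_transition_frames(video, k=8)
    # Pretend three experts scored the selected frames (values are made up).
    raw = gated_fusion(np.array([3.1, 3.6, 2.9]), np.array([0.2, 1.0, -0.5]))
    print(key_frames.shape, round(calibrate(raw), 3))
```

The sketch only shows how the three stages could compose; in the thesis the experts, gate, and calibration are learned components evaluated on the YouTube SFV+HDR dataset.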
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-113-2.pdf (restricted; not authorized for public access) | 9.93 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.
