  1. NTU Theses and Dissertations Repository
  2. College of Electrical Engineering and Computer Science (電機資訊學院)
  3. Graduate Institute of Communication Engineering (電信工程學研究所)
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98913
Title: 提示學習與選擇於弱監督視覺分析 (Prompt Learning and Selection for Weakly-Supervised Visual Analysis)
Authors: 林棋祥 (Ci-Siang Lin)
Advisor: 王鈺強 (Yu-Chiang Frank Wang)
Keywords: artificial intelligence, deep learning, computer vision, image, video
Publication Year: 2025
Degree: Doctoral (博士)
Abstract: With the rapid development of deep learning, several foundation models have been proposed to address fundamental vision and language tasks, and prompt learning has become a prevalent fine-tuning technique for adapting foundation models to downstream tasks. This dissertation aims to advance prompt learning and selection techniques for advanced visual analysis, including interpretable fine-grained recognition (Chapter 1), image semantic segmentation (Chapter 2), and referring video segmentation (Chapter 3). In Chapter 1, we achieve interpretable fine-grained recognition by learning a set of visual prompts that perform attention through a vision transformer and derive discriminative prototypes. In Chapter 2, we improve image semantic segmentation by learning textual background prompts from the CLIP model. Finally, in Chapter 3, our model learns to select the spatial-temporal prompts corresponding to a text query, addressing referring video segmentation based on SAM. Thanks to the rich knowledge embedded in these foundation models, all of the above tasks can be accomplished in a weakly-supervised manner, alleviating expensive annotation costs.
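The abstract's common thread is that only a small set of prompt parameters is trained while the foundation model stays frozen. As a rough, self-contained illustration of this general idea (a toy sketch, not the dissertation's actual models): learnable prompt embeddings are prepended to an input's token sequence, and gradient descent updates only those prompts while the "foundation model" (here, mean-pooling plus a fixed linear head) is left untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_tokens, n_prompts, n_classes = 8, 5, 2, 3

# Frozen "foundation model": a fixed linear head over mean-pooled tokens.
W = rng.normal(size=(n_classes, d))  # never updated

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(prompts, x):
    """Prepend prompts to the token sequence, mean-pool, apply frozen head."""
    tokens = np.vstack([prompts, x])
    pooled = tokens.mean(axis=0)
    return softmax(W @ pooled)

prompts = rng.normal(scale=0.1, size=(n_prompts, d))  # the ONLY trainable params
x = rng.normal(size=(n_tokens, d))  # token embeddings for one training input
y = 1                               # its (possibly weak) label

lr, n_total = 0.5, n_prompts + n_tokens
losses = []
for _ in range(200):
    p = forward(prompts, x)
    losses.append(-np.log(p[y]))          # cross-entropy loss
    dlogits = p.copy()
    dlogits[y] -= 1.0                     # d(cross-entropy)/d(logits)
    dpooled = W.T @ dlogits               # back through the frozen head
    # Each prompt row contributes 1/n_total to the mean-pooled vector.
    prompts -= lr * np.tile(dpooled / n_total, (n_prompts, 1))

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Since the loss is convex in the pooled vector and the pooled vector is affine in the prompts, the loss decreases steadily even though the head `W` is never modified; the same division of labor (frozen backbone, trainable prompts) underlies the visual, textual, and spatial-temporal prompts discussed above.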
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98913
DOI: 10.6342/NTU202504082
Fulltext Rights: 未授權 (not authorized for public access)
Embargo lift date: N/A
Appears in Collections: Graduate Institute of Communication Engineering (電信工程學研究所)

Files in This Item:
File: ntu-113-2.pdf, 13.28 MB, Adobe PDF (Restricted Access)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Contact Information
No. 1, Sec. 4, Roosevelt Rd., Da'an Dist., Taipei 10617, Taiwan (R.O.C.)
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
© NTU Library All Rights Reserved