Enhancing Real-Time Semantic Segmentation with Textual Knowledge of Pre-Trained Vision-Language Model: A Lightweight Approach

林家毅; Chia-Yi Lin

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88414

Title:	Enhancing Real-Time Semantic Segmentation with Textual Knowledge of Pre-Trained Vision-Language Model: A Lightweight Approach
Author:	林家毅 Chia-Yi Lin
Advisor:	吳家麟 Ja-Ling Wu
Keywords:	Semantic Segmentation, Real-Time, Pre-Trained Vision-Language Model, Textual Knowledge, Prompt Tuning, Two-Stage Training
Publication Year:	2023
Degree:	Master's
Abstract:	In this work, we present a lightweight method that enhances real-time semantic segmentation models by leveraging pre-trained vision-language models. Our approach incorporates the CLIP text encoder, which provides rich semantic embeddings for text labels, and effectively transfers this textual knowledge to the segmentation model. The proposed framework integrates image and text embeddings so that visual and textual information are aligned. In addition, we introduce learnable prompt embeddings to capture class-specific information and strengthen the model's semantic understanding. To ensure effective learning, we devise a two-stage training procedure: in the first stage, the segmentation backbone learns from fixed text embeddings; in the second stage, the prompt embeddings are optimized. Extensive experiments and ablation studies demonstrate that our method significantly improves the performance of the real-time semantic segmentation model.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88414
DOI:	10.6342/NTU202302018
Full-Text License:	Not authorized
Appears in Collections:	Department of Computer Science and Information Engineering
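
The abstract says image and text embeddings are integrated, but it does not specify the operator. Below is a minimal sketch that assumes a common CLIP-style choice applied per pixel. Each pixel embedding is compared with each class's text embedding by cosine similarity. The result is divided by a temperature (0.07 here, an assumed value borrowed from CLIP's initial logit scale) and normalized with softmax. The function names, the toy 2-D embeddings and the class names are hypothetical. In the thesis's setting, the text embeddings would come from the CLIP text encoder, possibly fed with the learnable prompt embeddings. This is an illustration, not the thesis's implementation.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pixel_text_scores(pixel_feats, text_embeds, temperature=0.07):
    """Hypothetical pixel-text matching: per-pixel class probabilities.

    pixel_feats: H x W grid of D-dim pixel embeddings, assumed already
                 projected into the same space as the text embeddings.
    text_embeds: K class embeddings, each D-dim (stand-ins for CLIP
                 text-encoder outputs; the temperature is an assumption).
    Returns an H x W grid of length-K probability lists.
    """
    texts = [l2_normalize(t) for t in text_embeds]
    scores = []
    for row in pixel_feats:
        out_row = []
        for feat in row:
            f = l2_normalize(feat)
            # Cosine similarity to every class embedding, sharpened by the temperature.
            logits = [sum(a * b for a, b in zip(f, t)) / temperature for t in texts]
            out_row.append(softmax(logits))
        scores.append(out_row)
    return scores

def predict_labels(scores):
    """Pick the highest-probability class index for each pixel."""
    return [[max(range(len(p)), key=p.__getitem__) for p in row] for row in scores]

# Toy example: 3 hypothetical classes with 2-D embeddings (real CLIP embeddings are much wider).
text_embeds = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # e.g. "road", "sky", "car"
pixel_feats = [[[0.9, 0.1], [0.1, 0.8]],
               [[-0.7, 0.2], [0.6, 0.5]]]
print(predict_labels(pixel_text_scores(pixel_feats, text_embeds)))  # [[0, 1], [2, 0]]
```

Normalizing both sides makes the score depend only on direction in the shared embedding space. This is what lets the fixed or prompt-tuned text embeddings act as class prototypes for the segmentation head.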

Files in This Item:

File	Size	Format
ntu-111-2.pdf (not authorized for public access)	5.72 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.