Visual Understanding with Knowledge Transfer (知識遷移於視覺理解)

楊福恩; Fu-En Yang

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88270

Title:	Visual Understanding with Knowledge Transfer (知識遷移於視覺理解)
Author:	Fu-En Yang (楊福恩)
Advisor:	Yu-Chiang Frank Wang (王鈺強)
Keywords:	Deep Learning, Computer Vision, Transfer Learning, Style Transfer, Zero-Shot Learning, Domain Generalization, Federated Learning
Publication year:	2023
Degree:	Doctoral
Abstract:	Recent progress in deep learning owes much to large-scale, curated datasets. However, these datasets typically rest on the assumption that training and test data share the same distribution. This assumption often fails in real-world scenarios, particularly in computer vision, where discrepancies in image domains or semantic categories are common. Because of these distribution gaps, deep neural networks trained on one distribution often perform poorly on data from another. In this thesis, we aim to advance transfer learning to enable the transfer of knowledge across distinct data domains or semantic classes. Specifically, we first address knowledge transfer for image styles, proposing a feature disentanglement framework that facilitates multi-domain and multi-modal style transfer. Next, we examine knowledge transfer for semantic categories, focusing on the challenging task of zero-shot image recognition by leveraging intra-class variations. To enable the trained model to handle data that falls outside the source distribution, we propose an Adversarial Teacher-Student Representation Learning framework for domain generalization. Finally, we turn to a decentralized learning paradigm that accommodates the privacy-preserving requirements of certain applications, such as healthcare. For this setting, we devise a client-specific prompt generation framework that allows efficient, personalized federated learning. Experimental analysis and results confirm the effectiveness of the methods presented in this thesis.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88270
DOI:	10.6342/NTU202301924
Full-text authorization:	Authorized (open access worldwide)
Appears in collections:	Graduate Institute of Communication Engineering

Files in this item:

File	Size	Format
ntu-111-2.pdf	22.12 MB	Adobe PDF

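Documents in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.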