Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89020

| Title: | Diverse and Fidelity Image Synthesis for Unsupervised Image-to-Image Translation with Limited Data |
| Author: | Yang-Sheng Lin (林揚昇) |
| Advisor: | Yung-jen Hsu (許永真) |
| Keywords: | Unsupervised Image-to-Image Translation, Multiple Domain Image-to-Image Translation, Generative Adversarial Network, Data-Efficient Generative Adversarial Network, Image Generation with Limited Data, Masked Autoencoder |
| Publication Year: | 2023 |
| Degree: | Master's |
| Abstract: | Unsupervised Image-to-Image Translation (Unsupervised I2I) has become a significant area of interest and has recently seen substantial advances, owing to its wide range of applications and its reduced need for data annotation. In scenarios with limited data, however, ensuring training stability and generating diverse, realistic images remain difficult open problems. To address these challenges, we propose two simple, plug-and-play methods: the Masked AutoEncoder Generative Adversarial Network (MAE-GAN) and the Style Embedding Adaptive Normalization (SEAN) block (both sketched in code below). MAE-GAN is a pre-training method for Unsupervised I2I tasks that integrates the architectures and strengths of MAE and GAN; it also learns domain-specific style information during pre-training, improving training stability and image quality in downstream tasks. The SEAN block is a novel normalization block that leverages a large-scale pre-trained feature extractor and learns a separate style feature space for each domain at each layer of the model. It also offers a trade-off between diversity and fidelity, enabling the generation of more diverse or more realistic images. Our method achieves strong results on the less common and challenging COncrete DEfect BRidge IMage dataset (CODEBRIM). Furthermore, trained on just 10% of the Animal Faces-HQ dataset (AFHQ), our methods achieve image quality on par with models trained on the full dataset while attaining greater image diversity, demonstrating their real-world applicability and potential. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89020 |
| DOI: | 10.6342/NTU202303360 |
| Full-Text Access: | Authorized (open access worldwide) |
| Appears in Collections: | Department of Computer Science and Information Engineering |
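
The abstract describes MAE-GAN only at a high level: masked-autoencoder reconstruction combined with an adversarial objective as a pre-training stage for I2I generators. The sketch below illustrates one plausible reading of that combination; the masking scheme, network shapes, loss weights, and every name here (`random_mask`, `pretrain_step`, `adv_weight`, and so on) are assumptions for illustration, not the thesis implementation.

```python
# Hedged sketch of an MAE-GAN-style pre-training step: reconstruct masked
# patches (MAE) while a discriminator pushes reconstructions toward realism
# (GAN). All sizes, names, and weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def random_mask(images, patch=16, mask_ratio=0.75):
    """Zero out a random subset of non-overlapping patches (MAE-style masking).
    Returns the masked images and a mask that is 1 on visible pixels."""
    b, c, h, w = images.shape
    ph, pw = h // patch, w // patch
    keep = (torch.rand(b, ph * pw, device=images.device) > mask_ratio).float()
    mask = F.interpolate(keep.view(b, 1, ph, pw), size=(h, w), mode="nearest")
    return images * mask, mask

class Autoencoder(nn.Module):
    """Toy conv encoder-decoder standing in for the MAE-style generator."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh())
    def forward(self, x):
        return self.dec(self.enc(x))

class Discriminator(nn.Module):
    """Toy patch-style discriminator."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, 1, 1))
    def forward(self, x):
        return self.net(x)

def pretrain_step(gen, disc, g_opt, d_opt, images, adv_weight=0.1):
    masked, mask = random_mask(images)
    recon = gen(masked)
    # Discriminator update: real images vs. detached reconstructions.
    d_loss = (F.softplus(-disc(images)).mean()
              + F.softplus(disc(recon.detach())).mean())
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    # Generator update: fill in the masked regions + fool the discriminator.
    rec_loss = F.l1_loss(recon * (1 - mask), images * (1 - mask))
    g_loss = rec_loss + adv_weight * F.softplus(-disc(recon)).mean()
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()

# Usage with random stand-in data in [-1, 1]:
gen, disc = Autoencoder(), Discriminator()
g_opt = torch.optim.Adam(gen.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(disc.parameters(), lr=2e-4)
batch = torch.rand(4, 3, 64, 64) * 2 - 1
print(pretrain_step(gen, disc, g_opt, d_opt, batch))
```

The low adversarial weight reflects an assumption that reconstruction should dominate pre-training; the balance actually used in the thesis is not stated in this record.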
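The SEAN block is likewise described only in outline: a per-domain, per-layer style feature space and a dial between diversity and fidelity. Below is a minimal sketch of such a normalization block, assuming an AdaIN-style scale-and-shift modulation driven by a learned per-domain embedding. The abstract's large-scale pre-trained feature extractor is replaced here by a plain embedding table for brevity, and the noise-based `diversity` knob is purely an assumption about how the diversity/fidelity choice might be exposed.

```python
# Hedged sketch of a "Style Embedding Adaptive Normalization" block, inferred
# solely from the abstract: instance-normalize features, then modulate them
# with a scale and shift derived from a learned per-domain style embedding.
# The embedding table, the diversity switch, and all sizes are assumptions.
import torch
import torch.nn as nn

class SEANBlock(nn.Module):
    def __init__(self, channels, num_domains, style_dim=64):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        # One learned style embedding per domain, per layer instance.
        self.style = nn.Embedding(num_domains, style_dim)
        self.to_gamma = nn.Linear(style_dim, channels)
        self.to_beta = nn.Linear(style_dim, channels)

    def forward(self, x, domain, diversity=0.0):
        """diversity=0.0 favors fidelity (the learned per-domain style);
        larger values perturb the embedding to trade fidelity for variety."""
        s = self.style(domain)                       # (B, style_dim)
        if diversity > 0:
            s = s + diversity * torch.randn_like(s)  # assumed diversity knob
        gamma = self.to_gamma(s).unsqueeze(-1).unsqueeze(-1)
        beta = self.to_beta(s).unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * self.norm(x) + beta

# Usage: one block per generator layer, indexed by the target domain.
block = SEANBlock(channels=128, num_domains=3)
feats = torch.randn(2, 128, 32, 32)
target = torch.tensor([0, 2])                  # target-domain indices
out_fidelity = block(feats, target)            # fidelity-oriented output
out_diverse = block(feats, target, diversity=0.5)  # diversity-oriented output
```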
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-111-2.pdf | 19.67 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
