Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90176
Title: | Contextual Image Synthesis with Multimodal Guidance |
Author: | Tsung-Han Kuo (郭宗翰) |
Advisor: | Tei-Wei Kuo (郭大維) |
Co-Advisor: | Jingtong Hu (胡京通) |
Keywords: | Automatic Labeling, Contextual Image Synthesis, Multimodal Guidance |
Publication Year: | 2023 |
Degree: | Ph.D. |
Abstract: | With the take-off of deep learning in 2012, computer-vision-based AI systems have significantly improved recognition and detection capabilities. However, AI systems still rely heavily on labeled training data, and the considerable cost of data labeling can hinder their advancement. To address the problem of insufficient training data in supervised learning, this dissertation explores automatic data labeling and data synthesis approaches. For automatic labeling, we leverage deep learning techniques to label video frames that may require further examination, based on the correlation between suspicious colonic mucosal features and the corresponding colonoscopy screening speed. For data synthesis, we propose a bottom-up training method for a larger generative network and evaluate its effectiveness on facial aging image synthesis. Specifically, the network, named BiTrackGAN, is a translation pipeline of two cascaded CycleGAN blocks that synthesizes facial aging and rejuvenation images. Bottom-up training introduces an ideal intermediate state between the two CycleGAN blocks, namely the constraint mechanism. The results show that, through this constraint mechanism, BiTrackGAN synthesizes smoother and more natural facial aging and rejuvenation images while also improving synthesis diversity. Furthermore, we propose a classifier named KFS that evaluates cross-age facial feature similarity and use it to guide a diffusion model in synthesizing contextual facial aging images. Technically, the facial-feature similarity evaluated by KFS serves as a regularization term in the similarity loss against a reference image, so that contextual aged facial images synthesized under multimodal diffusion guidance adhere more closely to the contextual text prompt and appear more coherent. To the best of our knowledge, this is the first work to propose and study contextual facial aging synthesis. We believe this novel technique will apply broadly to facial aging and to other growth- or aging-related image synthesis problems, including tumor image synthesis in the medical field. In summary, our proposed methods can obtain or synthesize up-to-date, realistic, and diverse data for model training, rather than merely large quantities of data. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90176 |
DOI: | 10.6342/NTU202303822 |
Full-Text Access: | Not authorized |
Appears in Collections: | Graduate Institute of Networking and Multimedia |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-111-2.pdf (currently not authorized for public access) | 21.79 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
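As a rough illustration of the guidance scheme described in the abstract, the sketch below combines a text-guidance score with a KFS-style cross-age similarity regularizer computed against a reference image's feature embedding. This is a minimal sketch under stated assumptions: the names (`guidance_loss`, `kfs_weight`, the embedding inputs) are illustrative, and the dissertation's actual KFS classifier and diffusion objective are not reproduced in this record.

```python
import numpy as np

def guidance_loss(text_score, image_embed, ref_embed, kfs_weight=0.1):
    """Hypothetical combined guidance loss for contextual facial aging.

    text_score  -- loss measuring agreement with the contextual text prompt
    image_embed -- feature embedding of the synthesized image (1-D array)
    ref_embed   -- feature embedding of the reference image (1-D array)
    kfs_weight  -- weight of the similarity regularization term (assumed)
    """
    # Cosine similarity between the synthesized-image and reference embeddings,
    # standing in for the cross-age facial feature similarity a KFS-like
    # classifier would evaluate.
    cos = float(np.dot(image_embed, ref_embed) /
                (np.linalg.norm(image_embed) * np.linalg.norm(ref_embed)))
    sim_reg = 1.0 - cos  # penalize drift away from the reference identity
    return text_score + kfs_weight * sim_reg
```

The design intent is that the regularizer pulls the diffusion sample toward the reference face's identity features while the text term keeps it faithful to the contextual prompt; the relative weight trades one off against the other.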