Self-Adaptive Data Augmentation Framework for Graph Contrastive Learning (圖對比學習的自適應資料擴增架構)

柯冠宇; Kuan-Yu Ko

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93566

Title:	圖對比學習的自適應資料擴增架構 Self-Adaptive Data Augmentation Framework for Graph Contrastive Learning
Author:	柯冠宇 Kuan-Yu Ko
Advisor:	郭斯彥 Sy-Yen Kuo
Keywords:	Machine learning, self-supervised learning, graph neural networks, graph contrastive learning, data augmentation
Year of publication:	2024
Degree:	Master's
Abstract:	Graph contrastive learning (GCL) has become a prominent self-supervised learning method. Its effectiveness often hinges on generating positive samples through data augmentation. Applying data augmentation to graphs, however, is not intuitive: an inappropriate augmentation method may destroy the graph structure and lead to poor model performance. Developing a data augmentation method that preserves the semantics of the graph, or alternatively a GCL method that needs no data augmentation, is therefore a significant challenge in this domain. In this paper, we propose a novel framework that places no restriction on the data augmentation method and is self-adaptive. It excludes data whose graph structure has been destroyed and builds a new dataset consisting of the original dataset together with the augmented data that preserved their semantics. Specifically, we feed a batch of original data and a batch of augmented data into a trained model, compute the L2 norm between the representations of the two batches, and collect the graphs with the smallest L2 norm. This is motivated by the observation that, for a trained model, the representations of same-class graphs whose structure has not been destroyed by augmentation lie close together in the latent space. We then train a new model on the refined dataset. The results show that this model not only outperforms a model trained on the original dataset but also achieves accuracy close to or better than state-of-the-art methods.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93566
DOI:	10.6342/NTU202401171
Full-text authorization:	Not authorized
Appears in collections:	Graduate Institute of Electronics Engineering
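
The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: it assumes that "the L2 norm between the representations" means the Euclidean norm of the difference between each original graph's embedding and the embedding of its augmented counterpart, and that a fixed fraction of the closest augmented graphs is kept. The function names, the `keep_ratio` parameter, and the toy embeddings are all hypothetical.

```python
import math


def l2_distance(u, v):
    """Euclidean (L2) norm of the difference between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def select_preserved(z_orig, z_aug, keep_ratio=0.5):
    """Pick augmented graphs whose embeddings stay closest to the originals.

    z_orig, z_aug: lists of embedding vectors produced by an already trained
    encoder for the original and the augmented batch (assumed to be paired
    by position). keep_ratio is a hypothetical knob, not from the thesis.
    Returns (sorted indices of kept augmented graphs, all distances).
    """
    if len(z_orig) != len(z_aug):
        raise ValueError("original and augmented batches must have equal size")
    dists = [l2_distance(u, v) for u, v in zip(z_orig, z_aug)]
    n_keep = max(1, round(keep_ratio * len(dists)))
    # Rank graphs by distance and keep the n_keep smallest.
    ranked = sorted(range(len(dists)), key=lambda i: dists[i])
    return sorted(ranked[:n_keep]), dists


# Toy embeddings standing in for encoder outputs (made-up numbers).
z_orig = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
z_aug = [[0.0, 0.1], [4.0, 5.0], [2.2, 2.0], [3.0, -3.0]]

kept, dists = select_preserved(z_orig, z_aug, keep_ratio=0.5)
print(kept)  # indices of augmented graphs to add to the refined dataset
```

In this toy batch, the augmented graphs at positions 1 and 3 drift far from their originals in embedding space and are dropped; the refined dataset would then be the original graphs plus the kept augmented ones.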

Files in this item:

File	Size	Format
ntu-112-2.pdf (not authorized for public access)	633.67 kB	Adobe PDF

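Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.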