Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90567

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 林澤 | zh_TW |
| dc.contributor.advisor | Che Lin | en |
| dc.contributor.author | 黃奕翔 | zh_TW |
| dc.contributor.author | Yi-Hsiang Huang | en |
| dc.date.accessioned | 2023-10-03T16:39:55Z | - |
| dc.date.available | 2023-11-09 | - |
| dc.date.copyright | 2023-10-03 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-08-07 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90567 | - |
| dc.description.abstract | 圖神經網路(Graph neural networks; GNNs)在各個領域展現出卓越的性能,包括在電子商務中檢測垃圾使用者和評論,以及解決社群網路中的分類問題。然而,與計算機視覺(Computer vision)或自然語言處理(Natural language processing)等領域相比,公開圖資料集的稀缺性對於實現GNN模型的突破性研究創新構成了重要障礙。在異質資訊網路(Heterogeneous information networks; HIN)中,這個問題更加突出。隨著GNN模型解釋性日益受到關注,為HIN提供公平比較基準的資料集需求變得日益迫切。
為了解決這個問題,我們的研究提出了從零開始生成可解釋人工智慧之異質訊息網路(Synthetic heterogeneous information networks; SynHIN),一種從零開始生成人造異質資訊網路的新方法。使用真實世界資料集作為基礎,SynHIN識別圖數據集中的模體(motif),並總結目標圖的統計資訊,從而創建一個人造的異質資訊網路。 我們的方法使用群內合併(In-cluster merge)和群外合併(Out-cluster merge)模組,基於motif生成人造的HIN。首先,在群內合併階段,我們生成具有相同標籤的motif,並將它們合併成一個單獨的群(Cluster)。這個過程會多次重複,生成不同的群。隨後,我們進行群外合併,生成一個完整的異質資訊網路。合併後,我們採用修剪(Pruning)模組,以確保合成之人造圖與目標真實世界資料集相似,符合其統計特性。 SynHIN生成了一個適用於節點分類任務的人造異質資訊網路資料集,最初的motif用於解釋性的正確解答。SynHIN框架具有高度的適應性,可以根據不同的資料集和motif進行調整,以滿足用戶的需求。我們解決了異質資訊網路資料集的稀缺性問題,同時解決了異質資訊網路缺乏具有解釋性正確解答之資料集問題,成為評估異質圖神經網路解釋模型的工具。本研究提出了首個生成帶有motif作為解釋性正確解答的人造異質訊息網路的方法,目的在於評估HGNN解釋性模型之效能。此外,我們提供了一個可用於未來研究的異質圖解釋性模型的基準資料集。我們的研究為HGNN領域的可解釋性人工智慧建立了一個新的評估基準,為該領域未來的發展奠定了堅實的基礎。 | zh_TW |
| dc.description.abstract | Graph Neural Networks (GNNs) have demonstrated exceptional performance in various domains, including detecting spam users and reviews in e-commerce and tackling classification problems in social networks. However, compared to fields such as computer vision or natural language processing, the scarcity of public graph datasets presents a significant hurdle for realizing breakthrough research innovations in GNN models. This challenge is even more pronounced in the case of heterogeneous information networks (HINs). As the interpretation of GNN models has gained recent attention, the need for datasets that provide a fair comparison baseline for HINs has become increasingly urgent.
| dc.description.abstract | To address this need, our research proposes SynHIN, a novel approach for generating synthetic heterogeneous information networks from scratch. Leveraging real-world datasets as references, SynHIN identifies motifs within the graph dataset and summarizes the target graph statistics to create a synthetic heterogeneous information network. Our approach utilizes in-cluster and out-cluster merge modules to construct the synthetic HIN from motif clusters. In the in-cluster merge phase, we generate motifs sharing the same label and merge them into a single cluster; repeating this process yields multiple clusters. Subsequently, we perform an out-cluster merge to create a comprehensive heterogeneous graph. After merging, we apply pruning to ensure that the synthetic graph closely matches the statistical properties of the target real-world dataset. SynHIN generates a synthetic heterogeneous graph dataset suitable for node classification tasks, with the initial motifs serving as ground-truth explanations. The SynHIN framework is highly adaptable and can be adjusted to different datasets and motifs to meet user requirements. It addresses both the scarcity of heterogeneous graph datasets and the lack of motif ground truths in heterogeneous graphs, making it a valuable tool for evaluating interpreters of heterogeneous graph neural networks. This research introduces the first methodology for generating synthetic heterogeneous information networks with motif ground truths tailored for evaluating HGNN interpreter models. Additionally, we provide a benchmark dataset for future research on heterogeneous graph explainer models. Our work establishes a new standard for explainable AI in the field of HGNNs, laying a solid foundation for further advancements. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-10-03T16:39:55Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-10-03T16:39:55Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 口試委員審定書 (Oral Examination Committee Approval); 誌謝 (Acknowledgements); 摘要 (Chinese Abstract); Abstract; Contents; List of Figures; List of Tables. Chapter 1 Introduction: 1.1 Motivation; 1.2 Related Work (1.2.1 Synthetic Graph Generation; 1.2.2 Graphlet and Motif; 1.2.3 Meta-Paths and Meta-Graphs; 1.2.4 Explainers for Graph Neural Networks). Chapter 2 Methodology: 2.1 Model Overview of SynHIN (2.1.1 Motif Extraction; 2.1.2 Subgraph Building; 2.1.3 Merge; 2.1.4 In-Cluster Merge; 2.1.5 Out-Cluster Merge; 2.1.6 Pruning; 2.1.7 Node Features Generation). Chapter 3 Experiment and Result: 3.1 Real-World Datasets (3.1.1 IMDB; 3.1.2 ACM; 3.1.3 DBLP); 3.2 Experiment Settings (3.2.1 Evaluation Metrics; 3.2.2 Experiment Environment; 3.2.3 IMDB Motif Design; 3.2.4 Parameter Settings; 3.2.5 Classification Model); 3.3 Result. Chapter 4 Discussion: 4.1 Classification Model Performance; 4.2 Fidelity (4.2.1 Feature Center Distance; 4.2.2 In/Out Merge Threshold; 4.2.3 Number of Motifs); 4.3 Applying SynHIN to Other Datasets (4.3.1 Motif Design; 4.3.2 Classification Model Performance; 4.3.3 Fidelity). Chapter 5 Conclusions. Bibliography. | - |
| dc.language.iso | en | - |
| dc.subject | 異質網路模體 | zh_TW |
| dc.subject | 圖神經網路 | zh_TW |
| dc.subject | 人造圖 | zh_TW |
| dc.subject | 圖學習基準 | zh_TW |
| dc.subject | 可解釋性人工智慧 | zh_TW |
| dc.subject | 異質資訊網路 | zh_TW |
| dc.subject | synthetic graphs | en |
| dc.subject | graph neural networks | en |
| dc.subject | heterogeneous network motifs | en |
| dc.subject | heterogeneous information networks | en |
| dc.subject | explainable artificial intelligence | en |
| dc.subject | graph learning benchmarks | en |
| dc.title | 從零開始生成可解釋人工智慧之異質訊息網路 | zh_TW |
| dc.title | Generating synthetic heterogeneous information network from scratch for explainable AI | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 王釧茹;王志宇;李宏毅 | zh_TW |
| dc.contributor.oralexamcommittee | Chuan-Ju Wang;Chih-Yu Wang;Hung-Yi Lee | en |
| dc.subject.keyword | 圖神經網路,人造圖,圖學習基準,可解釋性人工智慧,異質資訊網路,異質網路模體 | zh_TW |
| dc.subject.keyword | graph neural networks, synthetic graphs, graph learning benchmarks, explainable artificial intelligence, heterogeneous information networks, heterogeneous network motifs | en |
| dc.relation.page | 69 | - |
| dc.identifier.doi | 10.6342/NTU202301987 | - |
| dc.rights.note | 同意授權(限校園內公開) | - |
| dc.date.accepted | 2023-08-08 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 電信工程學研究所 | - |
| dc.date.embargo-lift | 2028-08-01 | - |
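The generation pipeline summarized in the abstract above (motif replication, in-cluster merge, out-cluster merge, pruning, and node-feature generation) can be sketched in code. The following is a minimal illustrative toy, not the thesis implementation: every function name, node type, and parameter (`make_motif`, `in_cluster_merge`, `max_degree`, the paper/author schema, the Gaussian feature centers) is a hypothetical stand-in for the actual SynHIN modules.

```python
# Toy SynHIN-style pipeline sketch (hypothetical; not the thesis code).
import random
import networkx as nx
import numpy as np

random.seed(0)
rng = np.random.default_rng(0)

def make_motif(label, motif_id):
    """A tiny typed motif: one 'paper' node linked to two 'author' nodes."""
    g = nx.Graph()
    p = f"{motif_id}-P"
    g.add_node(p, ntype="paper", label=label, motif=motif_id)
    for i in range(2):
        a = f"{motif_id}-A{i}"
        g.add_node(a, ntype="author", label=label, motif=motif_id)
        g.add_edge(p, a)
    return g

def in_cluster_merge(label, n_motifs):
    """Replicate same-label motifs and wire them into one cluster."""
    cluster = nx.Graph()
    papers = []
    for m in range(n_motifs):
        g = make_motif(label, f"L{label}M{m}")
        cluster = nx.union(cluster, g)
        papers.append(f"L{label}M{m}-P")
    # Chain consecutive motifs so the cluster is a single component.
    for u, v in zip(papers, papers[1:]):
        cluster.add_edge(u, v)
    return cluster

def out_cluster_merge(clusters, n_bridges=1):
    """Join clusters into one HIN via a few sparse inter-cluster edges."""
    hin = nx.Graph()
    for c in clusters:
        hin = nx.union(hin, c)
    for c1, c2 in zip(clusters, clusters[1:]):
        for _ in range(n_bridges):
            hin.add_edge(random.choice(list(c1.nodes)),
                         random.choice(list(c2.nodes)))
    return hin

def prune(hin, max_degree=4):
    """Trim edges at high-degree nodes to match target statistics."""
    for n in list(hin.nodes):
        extra = hin.degree(n) - max_degree
        if extra > 0:
            # Never remove intra-motif edges: they are the ground truth.
            removable = [(n, m) for m in hin.neighbors(n)
                         if hin.nodes[n]["motif"] != hin.nodes[m]["motif"]]
            for e in removable[:extra]:
                hin.remove_edge(*e)
    return hin

def add_features(hin, dim=8, spread=0.1):
    """Gaussian node features around a shared per-label center."""
    centers = {}
    for n, d in hin.nodes(data=True):
        if d["label"] not in centers:
            centers[d["label"]] = rng.normal(size=dim)
        d["x"] = centers[d["label"]] + rng.normal(scale=spread, size=dim)
    return hin

clusters = [in_cluster_merge(label, n_motifs=3) for label in (0, 1)]
hin = add_features(prune(out_cluster_merge(clusters)))
print(hin.number_of_nodes())  # 2 labels x 3 motifs x 3 nodes = 18
```

The one design point mirrored from the abstract is that pruning only removes edges outside the seed motifs, so the motifs survive intact as ground-truth explanations for a downstream HGNN interpreter to recover.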
Appears in Collections: 電信工程學研究所
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-111-2.pdf (Restricted Access) | 4.83 MB | Adobe PDF | View/Open |
