Improving Detection of Spam Reviews via Relational Graph Neural Networks with Potential Labels

洪贊濱; Tsan-Pin Hung

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90968

Full metadata record

DC field	Value	Language
dc.contributor.advisor	謝宏昀	zh_TW
dc.contributor.advisor	Hung-Yun Hsieh	en
dc.contributor.author	洪贊濱	zh_TW
dc.contributor.author	Tsan-Pin Hung	en
dc.date.accessioned	2023-10-24T16:32:20Z	-
dc.date.available	2024-08-14	-
dc.date.copyright	2023-10-24	-
dc.date.issued	2023	-
dc.date.submitted	2023-08-11	-
dc.identifier.citation	F. Shi, Y. Cao, Y. Shang, Y. Zhou, C. Zhou, and J. Wu, "H2-FDetector: A GNN-based fraud detector with homophilic and heterophilic connections," in Proceedings of the ACM Web Conference 2022, 2022, pp. 1486–1494.
J. Pitman, "Local consumer review survey 2022," 2022. Available online: https://www.brightlocal.com/research/local-consumer-review-survey/?SSAID=314743&SSCID=81k6_t41ah
N. Hussain, H. Turab Mirza, G. Rasool, I. Hussain, and M. Kaleem, "Spam review detection techniques: A systematic literature review," Applied Sciences, vol. 9, no. 5, p. 987, 2019.
S. K. Maurya, D. Singh, and A. K. Maurya, "Deceptive opinion spam detection approaches: A literature survey," Applied Intelligence, vol. 53, no. 2, pp. 2189–2234, 2023.
S. Rayana and L. Akoglu, "Collective opinion spam detection: Bridging review networks and metadata," in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 985–994.
F. Abri, L. F. Gutierrez, A. S. Namin, K. S. Jones, and D. R. Sears, "Fake reviews detection through analysis of linguistic features," arXiv preprint arXiv:2010.04260, 2020.
Y. Dou, Z. Liu, L. Sun, Y. Deng, H. Peng, and P. S. Yu, "Enhancing graph neural network-based fraud detectors against camouflaged fraudsters," in Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM'20), 2020.
A. Mukherjee, V. Venkataraman, B. Liu, and N. Glance, "What Yelp fake review filter might be doing?" in Proceedings of the International AAAI Conference on Web and Social Media, vol. 7, no. 1, 2013.
N. Jindal and B. Liu, "Opinion spam and analysis," in Proceedings of the 2008 International Conference on Web Search and Data Mining, 2008, pp. 219–230.
I. Gunes, C. Kaleli, A. Bilge, and H. Polat, "Shilling attacks against recommender systems: A comprehensive survey," Artificial Intelligence Review, vol. 42, no. 4, pp. 767–799, 2014.
C. Yuan, W. Zhou, Q. Ma, S. Lv, J. Han, and S. Hu, "Learning review representations from user and product level information for spam detection," 2019.
G. Wang, S. Xie, B. Liu, and S. Y. Philip, "Review graph based online store review spammer detection," in 2011 IEEE 11th International Conference on Data Mining. IEEE, 2011, pp. 1242–1247.
J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, "Neural message passing for quantum chemistry," in International Conference on Machine Learning. PMLR, 2017, pp. 1263–1272.
W. L. Hamilton, "Graph representation learning," Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 14, no. 3, p. 51, 2020.
M. Wang, D. Zheng, Z. Ye, Q. Gan, M. Li, X. Song, J. Zhou, C. Ma, L. Yu, Y. Gai, T. Xiao, T. He, G. Karypis, J. Li, and Z. Zhang, "Deep graph library: A graph-centric, highly-performant package for graph neural networks," arXiv preprint arXiv:1909.01315, 2019.
Y. Liu, X. Ao, Z. Qin, J. Chi, J. Feng, H. Yang, and Q. He, "Pick and choose: A GNN-based imbalanced learning approach for fraud detection," in Proceedings of the Web Conference 2021, 2021, pp. 3168–3177.
A. Li, Z. Qin, R. Liu, Y. Yang, and D. Li, "Spam review detection with graph convolutional networks," in Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 2703–2711.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
A. Barushka and P. Hajek, "Review spam detection using word embeddings and deep neural networks," in IFIP International Conference on Artificial Intelligence Applications and Innovations. Springer, 2019, pp. 340–350.
N. Reimers and I. Gurevych, "Sentence-BERT: Sentence embeddings using Siamese BERT-networks," in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Nov. 2019. Available online: https://arxiv.org/abs/1908.10084
S. Shehnepoor, R. Togneri, W. Liu, and M. Bennamoun, "HIN-RNN: A graph representation learning neural network for fraudster group detection with no handcrafted features," IEEE Transactions on Neural Networks and Learning Systems, pp. 1–14, 2021. Available online: https://doi.org/10.1109/tnnls.2021.3123876
R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec, "Graph convolutional neural networks for web-scale recommender systems," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 974–983.
M. Schlichtkrull, T. N. Kipf, P. Bloem, R. v. d. Berg, I. Titov, and M. Welling, "Modeling relational data with graph convolutional networks," arXiv preprint arXiv:1703.06103, 2017.
S.-J. Ji, Q. Zhang, J. Li, D. K. Chiu, S. Xu, L. Yi, and M. Gong, "A burst-based unsupervised method for detecting review spammer groups," Information Sciences, vol. 536, pp. 454–469, 2020.
Z. Wang, S. Gu, and X. Xu, "GSLDA: LDA-based group spamming detection in product reviews," Applied Intelligence, vol. 48, pp. 3094–3107, 2018.
A. Grover and J. Leskovec, "node2vec: Scalable feature learning for networks," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 855–864.
W. Hamilton, Z. Ying, and J. Leskovec, "Inductive representation learning on large graphs," Advances in Neural Information Processing Systems, vol. 30, 2017.
T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980–2988.
F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Jun. 2015. Available online: https://doi.org/10.1109/cvpr.2015.7298682
Y. Wu and Y. Liu, "Robust truncated hinge loss support vector machines," Journal of the American Statistical Association, vol. 102, no. 479, pp. 974–983, 2007.
J. Ni, J. Li, and J. McAuley, "Justifying recommendations using distantly-labeled reviews and fine-grained aspects," in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 188–197.
S. Zhang, H. Yin, T. Chen, Q. V. N. Hung, Z. Huang, and L. Cui, "GCN-based user representation learning for unifying robust recommendation and fraudster detection," 2020.	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90968	-
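dc.description.abstract	In spam review detection, graph-based methods have attracted wide attention because they can capture the interactions among reviews. However, because graph neural networks (GNNs) repeatedly aggregate neighbor information, they suffer from over-smoothing, which can make the node representations of benign and spam reviews converge. Earlier work tried to reduce this effect by considering both homophilic and heterophilic connections and aggregating heterophilic connections in reverse. However, because it still applies the same aggregation function to neighbors with different labels and assumes that all benign review representations should be close to one another, as should all spam review representations, it does not effectively avoid over-smoothing. In addition, updating the representations of all nodes at once demands excessive memory as the data grow, so subgraph aggregation becomes indispensable in practice. However, previous methods did not take the graph topology into account when sampling neighbors to build subgraphs, and therefore could not effectively capture information from closely interacting neighbors. To solve these problems, we propose a graph neural network model for spam review detection based on potential relations. The model samples neighbors according to topological similarity in the graph to generate subgraphs for stochastic training. Before aggregating neighbor information, it first uses a classifier to identify potentially benign and potentially spam review neighbors. It then applies a hierarchical aggregation strategy: potentially benign and potentially spam reviews are treated as two distinct relations and aggregated separately, and the information from the two groups of neighbors is then combined for the next layer of aggregation. We also design a novel triplet loss that makes the representation of a review target more similar to the representations of benign reviews than to those of spam review nodes, reducing the impact of over-smoothing in a way that better matches real-world observations. Our experimental results demonstrate the effectiveness of the method. On the YelpNYC dataset with random splitting, our method's average AUC is 6% higher than the primary baseline model and 1.5% higher than the secondary baseline model, reaching 0.84; with chronological splitting, our average AUC is 5.5% and 6.5% higher than the primary and secondary baseline models, respectively. The method also outperforms the baselines on the other datasets, and its AUC exceeds that of both baselines in every experiment.	zh_TW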
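The potential-label aggregation described in the abstract (classify neighbors first, aggregate each potential-label relation separately, then combine) can be illustrated with a small stdlib-only sketch. It is a minimal sketch under stated assumptions, not the thesis's implementation: the logistic neighbor classifier, the 0.5 threshold, mean aggregation (standing in for the relational graph attention listed in Section 4.4.2), concatenation as the combination step, and every function name here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(w: Matrix, v: Sequence[float]) -> Vector:
    # Multiply a (rows x dim) weight matrix by a dim-length vector.
    return [dot(row, v) for row in w]


def mean_vector(vectors: List[Vector], dim: int) -> Vector:
    # Element-wise mean; an empty relation contributes a zero vector.
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def split_by_potential_label(
    neighbors: List[Vector], clf_weights: Vector, threshold: float = 0.5
) -> Tuple[List[Vector], List[Vector]]:
    # Hypothetical potential-label classifier: a logistic score per neighbor.
    benign, spam = [], []
    for h in neighbors:
        if sigmoid(dot(clf_weights, h)) >= threshold:
            spam.append(h)
        else:
            benign.append(h)
    return benign, spam


def relational_aggregate(
    h_self: Vector,
    neighbors: List[Vector],
    clf_weights: Vector,
    w_benign: Matrix,
    w_spam: Matrix,
) -> Vector:
    dim = len(h_self)
    benign, spam = split_by_potential_label(neighbors, clf_weights)
    # Stage 1: aggregate each potential-label relation with its own weights.
    msg_benign = matvec(w_benign, mean_vector(benign, dim))
    msg_spam = matvec(w_spam, mean_vector(spam, dim))
    # Stage 2: combine both relation messages with the node's own state
    # (concatenation here) to form the input to the next layer.
    return list(h_self) + msg_benign + msg_spam
```

For example, with identity weight matrices and `clf_weights = [-1.0, 1.0]`, the neighbor `[2.0, 0.0]` is treated as potentially benign while `[0.0, 2.0]` and `[0.0, 4.0]` are treated as potentially spam, so `relational_aggregate([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0], [0.0, 4.0]], [-1.0, 1.0], identity, identity)` returns `[1.0, 0.0, 2.0, 0.0, 0.0, 3.0]`. Keeping the two relations in separate slots, rather than averaging all neighbors together, is what stops the spam-neighbor signal from being smoothed into the benign one.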
dc.description.abstract	Graph-based spam review detection has attracted attention because it can capture the interactions between reviews. However, it suffers from over-smoothing: the repeated aggregation of neighborhood information makes benign and spam reviews hard to distinguish. Existing studies consider both homophilic and heterophilic connections, but they apply the same aggregation function to all neighbors and presume that all benign review representations should be similar to one another, as should all spam review representations, so they do not prevent over-smoothing effectively. Additionally, updating all node representations at once becomes infeasible as the data grow because of memory constraints, which makes subgraph aggregation necessary. However, prior approaches did not consider the topological structure of the graph when constructing subgraphs, making it difficult to capture information from closely interacting neighbors. To address these issues, we present a GNN model for spam review detection based on potential labels. Our model samples subgraphs according to the graph topology and uses a hierarchical aggregation strategy that treats potentially benign and potentially spam reviews as two different relations. We also design a novel triplet loss function that makes the representation of a review's target more similar to benign review representations than to spam review representations, mitigating over-smoothing. Our experimental results demonstrate the effectiveness of the method. On the YelpNYC dataset under random splitting, our approach outperformed the primary and secondary baseline models by 6% and 1.5%, respectively, in average AUC, achieving 0.84; under chronological splitting, our average AUC was 5.5% and 6.5% higher than the primary and secondary baseline models, respectively, achieving 0.68. Our method also achieved superior results on the other datasets, and its AUC exceeded that of the baselines in every experiment.	en
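The triplet objective in the abstract can be sketched as a margin hinge on similarities, with the review target as the anchor, a benign review as the positive and a spam review as the negative. This is a minimal sketch under assumptions: cosine similarity, the default margin of 0.5, batch averaging and all function names are hypothetical, and the thesis's actual formulation (Section 4.5.1, used together with the focal loss of Section 4.5.2) may differ.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def review_triplet_loss(
    target: Sequence[float],
    benign_review: Sequence[float],
    spam_review: Sequence[float],
    margin: float = 0.5,
) -> float:
    # Hinge on similarities: the loss is zero once cos(target, benign)
    # exceeds cos(target, spam) by at least `margin`.
    return max(0.0, margin - cosine(target, benign_review) + cosine(target, spam_review))


def batch_triplet_loss(
    triplets: List[Tuple[Sequence[float], Sequence[float], Sequence[float]]],
    margin: float = 0.5,
) -> float:
    # Average the per-triplet loss over a batch of (target, benign, spam) triples.
    if not triplets:
        return 0.0
    return sum(review_triplet_loss(t, b, s, margin) for t, b, s in triplets) / len(triplets)
```

With `target = [1.0, 0.0]`, a benign review at `[1.0, 0.0]` and a spam review at `[0.0, 1.0]`, the loss is `0.0`; swapping the two reviews gives `1.5`, so the objective penalizes a target whose representation sits closer to its spam reviews than to its benign ones. Comparing similarities to the target, rather than pulling all benign reviews into one cluster, matches the abstract's point that benign reviews need not all look alike.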
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-10-24T16:32:20Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2023-10-24T16:32:20Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	ABSTRACT
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
CHAPTER 2 BACKGROUND AND RELATED WORK
2.1 Spam Review
2.2 Graph-based Spam Review Detection
2.2.1 Graph Neural Network
2.2.2 Stochastic Training on Graphs
2.2.3 Neighbor Sampler
2.3 Related Work
2.3.1 GAS
2.3.2 H2-FDetector
2.4 Summary
CHAPTER 3 SYSTEM MODEL
3.1 Dataset Description
3.2 Comment Graph Construction
3.2.1 Review Context Representation
3.2.2 Edge Representation
3.2.3 Node Representation
3.3 Graph Sampling
3.4 Heterogeneous Graph Convolutional Network
3.4.1 Aggregation Stage
3.4.2 Combination Stage
3.4.3 Summary of the HGNN Model
3.5 Summary
CHAPTER 4 METHODOLOGY
4.1 Motivation
4.2 Model Architecture
4.3 Topology Aware Graph Sampling
4.4 PL-RGNN Model
4.4.1 Potential Label Identification
4.4.2 Relational Graph Attention Aggregation
4.5 Optimization
4.5.1 Triplet Loss
4.5.2 Focal Loss
CHAPTER 5 PERFORMANCE EVALUATION
5.1 Experiment Setup
5.2 Evaluate Method
5.2.1 Metrics
5.2.2 Visualization
5.3 Average Performance Analysis
5.4 Model Trade-off Analysis
5.5 Embedding Visualization
5.6 Performance Comparison of Different Embedding Methods with Baselines
5.7 Ablation Study
5.8 Performance Comparison on Amazon dataset
5.9 Summary
CHAPTER 6 CONCLUSION AND FUTURE WORK
REFERENCES	-
dc.language.iso	en	-
dc.subject	Relational graph neural networks	zh_TW
dc.subject	Spam review detection	zh_TW
dc.subject	Relational Neural Networks	en
dc.subject	spam review detection	en
dc.title	Improving Detection of Spam Reviews via Relational Graph Neural Networks with Potential Labels	zh_TW
dc.title	Improving Detection of Spam Reviews via Relational Graph Neural Networks with Potential Labels	en
dc.type	Thesis	-
dc.date.schoolyear	111-2	-
dc.description.degree	Master's	-
dc.contributor.coadvisor	王志宇	zh_TW
dc.contributor.coadvisor	Chih-Yu Wang	en
dc.contributor.oralexamcommittee	黃瀚萱;蔡銘峰	zh_TW
dc.contributor.oralexamcommittee	Hen-Hsen Huang;Ming-Feng Tsai	en
dc.subject.keyword	Relational graph neural networks, spam review detection	zh_TW
dc.subject.keyword	Relational Neural Networks, spam review detection	en
dc.relation.page	60	-
dc.identifier.doi	10.6342/NTU202304108	-
dc.rights.note	Authorization granted (open access worldwide)	-
dc.date.accepted	2023-08-13	-
dc.contributor.author-college	College of Electrical Engineering and Computer Science	-
dc.contributor.author-dept	Data Science Degree Program	-
dc.date.embargo-lift	2024-08-14	-
Appears in Collections:	Data Science Degree Program

Files in This Item:

File	Size	Format
ntu-111-2.pdf	2.99 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.