Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90968
Title: | 以帶潛在標籤的關係圖神經網絡改進垃圾評論之檢測 Improving Detection of Spam Reviews via Relational Graph Neural Networks with Potential Labels |
Author: | 洪贊濱 Tsan-Pin Hung |
Advisor: | 謝宏昀 Hung-Yun Hsieh |
Co-advisor: | 王志宇 Chih-Yu Wang |
Keywords: | 關係圖神經網路, 垃圾評論檢測, Relational Neural Networks, spam review detection |
Publication year: | 2023 |
Degree: | Master's |
Abstract: | Graph-based methods have attracted wide attention in spam review detection because they capture the interactions among reviews. However, the way graph neural networks (GNNs) repeatedly aggregate neighbor information leads to over-smoothing, so the node representations of benign and spam reviews may converge and become hard to tell apart. Earlier studies tried to reduce this effect by considering both homogeneous and heterogeneous connections and by aggregating heterogeneous connections in reverse. They nevertheless apply the same aggregation function to neighbors with different labels and assume that all benign review representations should be close to one another, and likewise all spam review representations, so over-smoothing is not effectively avoided. In addition, updating all node representations at once demands too much memory as the data grows, which makes subgraph aggregation indispensable in practice. Prior approaches, however, did not consider the topological structure of the graph when sampling neighbors to construct subgraphs, and therefore could not effectively capture information from closely interacting neighbors. To address these issues, we propose a GNN model for spam review detection based on potential labels. The model samples subgraphs according to the topological structural similarity of the graph and trains on them stochastically. Before aggregating neighbor information, it uses a classifier to identify potentially benign and potentially spam neighboring reviews. It then applies a hierarchical aggregation strategy that treats potentially benign and potentially spam reviews as two different relations, aggregates each separately, and combines the information from the two groups for the next layer of aggregation.
We also design a novel triplet loss function that makes the similarity between the representation of a benign review and the representation of the review target higher than the similarity with spam review nodes, which mitigates over-smoothing and better matches real-world observations. Our experimental results demonstrate the effectiveness of the method. On the YelpNYC dataset under random splitting, our approach outperformed the primary and secondary baseline models by 6% and 1.5% respectively in average AUC, reaching 0.84; under chronological splitting, its AUC was on average 5.5% and 6.5% higher than the primary and secondary baseline models respectively, reaching 0.68. Our method also outperformed the baseline models on the other datasets, and its AUC exceeded that of both baselines in every experimental run. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90968 |
DOI: | 10.6342/NTU202304108 |
Full-text license: | Authorized (open access worldwide) |
Appears in collections: | Data Science Degree Program |
Files in this item:
File | Size | Format | |
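The abstract describes two mechanisms that a small example may make more concrete: aggregating potentially benign and potentially spam neighbors as two separate relations, and a triplet loss that keeps a benign review closer to the review target than a spam review is. The following is a minimal sketch only, not the thesis implementation. The function names, the mean aggregator, the concatenation step, the 0.5 score threshold, the cosine similarity, and the margin value are all assumptions introduced for illustration, and the choice of the review target as the triplet anchor is one possible reading of the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean(vectors, dim):
    """Element-wise mean of a list of vectors; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def aggregate_by_potential_label(neighbors, spam_scores, threshold=0.5):
    """Aggregate neighbors as two relations split by a classifier's spam score.

    `neighbors` must be a non-empty list of vectors. `spam_scores` holds one
    hypothetical classifier output per neighbor. The two group means are
    concatenated (an assumed way of combining them).
    """
    dim = len(neighbors[0])
    benign = [h for h, s in zip(neighbors, spam_scores) if s < threshold]
    spam = [h for h, s in zip(neighbors, spam_scores) if s >= threshold]
    return mean(benign, dim) + mean(spam, dim)  # list concatenation


def triplet_loss(target, benign, spam, margin=0.5):
    """Hinge loss that is zero once sim(target, benign) beats sim(target, spam) by `margin`."""
    return max(0.0, margin - cosine(target, benign) + cosine(target, spam))
```

For instance, `triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])` is zero because the benign review already matches the target far better than the spam review does, while swapping the last two arguments yields a positive penalty. In a real model these operations would run on learned tensors inside a GNN layer rather than on plain lists.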
---|---|---|---|
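ntu-111-2.pdf | 2.99 MB | Adobe PDF | View/Open |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.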