Account Identity Based Uncertain Graph Algorithm for Fraud Detection

Hsing-Yu Shih; 施星宇

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86033

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	林宗男(Tsung-Nan Lin)
dc.contributor.author	Hsing-Yu Shih	en
dc.contributor.author	施星宇	zh_TW
dc.date.accessioned	2023-03-19T23:33:44Z	-
dc.date.copyright	2022-09-19
dc.date.issued	2022
dc.date.submitted	2022-09-16
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86033	-
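dc.description.abstract	Online financial services have become increasingly influential in recent years. However, criminal groups may take control of online financial service accounts to commit fraud. Because these malicious accounts are often controlled by the same criminal group, they exhibit relations that differ from those among ordinary accounts, which allows them to be clustered into account groups; uncovering these suspicious account groups therefore makes it possible to detect malicious accounts. To uncover relations among accounts and cluster them, previous studies have relied on a single identity feature to determine whether accounts are controlled by the same person or group. However, a single identity feature may be shared by coincidence, which introduces noise into the relations. In contrast, building account relations from multiple identity features provides more evidence for inferring that a relation exists and therefore yields less noise. In this thesis, we propose the AI-URG algorithm to detect suspicious groups of anomalous accounts in online banking services. AI-URG builds account identity relations with uncertain graph techniques. Because a single identity feature provides insufficient evidence that two accounts are related, and because the mapping between identity features and real-world identity entities remains uncertain, we propose multi-factor identity modeling, which represents, as an uncertain graph, the probability that two accounts are related because they are operated by the same group of people. To detect suspicious account groups in this uncertain graph of account relations, we propose DeepURGE, which learns feature vectors for the accounts in the relation graph and uses them to find highly related account groups. Moreover, not every highly related group obtained by clustering is suspicious, so we design a filtering strategy, based on domain knowledge of malicious account detection, to decide whether each account group is suspicious. We validate the effectiveness of AI-URG on a real-world online banking dataset. The results show that AI-URG detects 78.2% of the labeled malicious accounts in the dataset and, compared with other methods, achieves a higher F1 score (58.0%) and higher precision (46.0%).	zh_TW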
dc.description.abstract	Online banking has become increasingly important. Unfortunately, criminal groups may control accounts to conduct fraud. Because malicious accounts are often held by the same criminal group, they exhibit relations that ordinary accounts do not and form observable communities. Previous works relate accounts through a single identity factor to find such suspicious communities, but a single factor can be shared by coincidence; multi-factor identity is less susceptible to this noise. In this work, we propose AI-URG, which detects suspicious account groups using an account identity uncertain graph. Because single-factor identity provides insufficient evidence and the binding between accounts and identities is uncertain, we use multi-factor identity modeling to represent identity-level relations between accounts as an uncertain graph. To detect suspicious account groups in this uncertain graph, we propose DeepURGE, which produces account representations that are used to find account communities. Since some communities are benign, we decide whether each one is suspicious with a strategy based on domain knowledge. We evaluate AI-URG on a real-world online banking dataset. The results show that it detects 78.2% of the labeled suspicious accounts and outperforms alternatives with a higher F1 score (58.0%) and precision (46.0%).	en
dc.description.provenance	Made available in DSpace on 2023-03-19T23:33:44Z (GMT). No. of bitstreams: 1 U0001-1409202217303200.pdf: 5198906 bytes, checksum: dc2594a7e282bd39d36a03665a99c0be (MD5) Previous issue date: 2022	en
dc.description.tableofcontents	Oral Examination Committee Approval (p. i); Acknowledgements (p. iii); Chinese Abstract (p. v); Abstract (p. vii); 1 Introduction (p. 1); 2 Background (p. 5); 2.1 Online service malicious accounts detection (p. 5); 2.2 Web tracking (p. 6); 2.3 IP Dynamics (p. 7); 2.4 Uncertain Graph (p. 8); 3 Method (p. 11); 3.1 Account Identity Uncertain Graph (p. 11); 3.1.1 Framework of Multi-factor Identity Modeling (p. 12); 3.1.2 Coincident Identifier Probability Function of IP address (p. 15); 3.2 Suspicious Account Groups Detecting (p. 19); 3.2.1 Feature Learning Based Uncertain Graph Embedding: DeepURGE (p. 20); 3.2.2 Detecting Suspicious Account Group with Clusters (p. 21); 4 Results and Discussions (p. 25); 4.1 Dataset (p. 25); 4.2 Approach (p. 28); 4.3 Results (p. 29); 4.3.1 Comparison of AI-URG and Alternatives (p. 29); 4.3.2 IP Duration Modeling and Comparison Between Connection Types (p. 32); 4.3.3 Find Communities with DeepURGE (p. 32); 4.4 Ablation Studies (p. 33); 4.4.1 Comparison of AI-URG and its Variants (p. 33); 4.4.2 Comparison of pruning edges with different probability thresholds (p. 34); 4.4.3 Comparison of selecting account groups with different group sizes (p. 35); 4.4.4 Comparison of different clustering approaches in AI-URG (p. 36); 4.5 Discussion and Future Work (p. 37); 5 Conclusion (p. 43); 5.1 Conclusion (p. 43); 5.2 Acknowledgment (p. 43); A Simulation of multi-factor account identity uncertain graph (p. 45); B Proof of effectiveness of multi-factor identity to detect account groups (p. 49); Bibliography (p. 51)
dc.language.iso	en
dc.title	Account Identity Based Uncertain Graph Algorithm for Fraud Detection	zh_TW
dc.title	AI-URG: Account Identity Based Uncertain Graph Algorithm for Fraud Detection	en
dc.type	Thesis
dc.date.schoolyear	110-2
dc.description.degree	Master's
dc.contributor.author-orcid	0000-0003-4111-3768
dc.contributor.oralexamcommittee	鄧惟中(Wei-Chung Teng),陳俊良(Jiann-Liang Chen),沈上翔(Shan-Hsiang Shen)
dc.subject.keyword	fraud detection, uncertain graph, account relation network, identity tracking, node embedding	zh_TW
dc.subject.keyword	fraud detection, uncertain graph, account identity relation, identity tracking, node embedding	en
dc.relation.page	55
dc.identifier.doi	10.6342/NTU202203410
dc.rights.note	Authorized (open access worldwide)
dc.date.accepted	2022-09-19
dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Communication Engineering	zh_TW
dc.date.embargo-lift	2022-09-19	-
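The abstract describes multi-factor identity modeling only at a high level. Below is a minimal Python sketch of one way shared identifiers could be turned into uncertain-graph edge probabilities; it is an illustration, not the thesis's formulation. It assumes that (1) each identifier two accounts share comes with a probability that the sharing is coincidental, (2) identifiers are independent, so the edge probability is a noisy-OR (the accounts are related unless every shared identifier is a coincidence), and (3) IP coincidence grows with the time gap between the two sightings under exponentially distributed IP assignment durations, with a fixed floor for NAT sharing. The names `ip_coincidence_prob`, `edge_probability`, `build_uncertain_graph`, `MEAN_IP_DURATION_H`, `NAT_FLOOR`, and all numeric values are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical mean IP assignment durations (hours) per connection type.
# Placeholder values for illustration; not taken from the thesis.
MEAN_IP_DURATION_H = {"fixed": 24.0 * 7, "cellular": 2.0}
# Hypothetical probability that two unrelated users share an IP at the same
# moment (for example, behind carrier-grade NAT).
NAT_FLOOR = 0.05


def ip_coincidence_prob(gap_hours: float, conn_type: str) -> float:
    """Probability that an IP seen on two accounts `gap_hours` apart is a
    coincidence, assuming exponentially distributed assignment durations."""
    tau = MEAN_IP_DURATION_H[conn_type]
    still_same_holder = math.exp(-gap_hours / tau)
    return 1.0 - (1.0 - NAT_FLOOR) * still_same_holder


def edge_probability(coincidence_probs) -> float:
    """Noisy-OR over independent identifiers: the accounts are related
    unless every shared identifier is a coincidence."""
    all_coincidental = 1.0
    for c in coincidence_probs:
        all_coincidental *= c
    return 1.0 - all_coincidental


def build_uncertain_graph(observations, threshold: float = 0.5) -> dict:
    """observations: (account_a, account_b, coincidence_prob) triples, one per
    identifier the two accounts share. Returns {(a, b): p} with a < b,
    keeping only edges whose probability p is at least `threshold`."""
    evidence = defaultdict(list)
    for a, b, c in observations:
        key = (a, b) if a < b else (b, a)  # undirected edge key
        evidence[key].append(c)
    graph = {}
    for key, cs in evidence.items():
        p = edge_probability(cs)
        if p >= threshold:  # prune low-probability edges
            graph[key] = p
    return graph


if __name__ == "__main__":
    obs = [
        ("A", "B", ip_coincidence_prob(1.0, "cellular")),   # same IP, 1 h apart
        ("B", "A", 0.3),                                    # hypothetical shared fingerprint
        ("C", "B", ip_coincidence_prob(20.0, "cellular")),  # same IP, 20 h apart
    ]
    for edge, p in build_uncertain_graph(obs).items():
        print(edge, round(p, 3))
```

In this toy example, the IP match alone would give accounts A and B an edge probability of about 0.58, and the additional fingerprint match raises it to about 0.87. The pair seen on the same cellular IP 20 hours apart falls far below the threshold and is pruned. The `threshold` argument mirrors the idea of pruning low-probability edges, but its default value here is arbitrary.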
Appears in Collections:	Graduate Institute of Communication Engineering

Files in This Item:

File	Size	Format
U0001-1409202217303200.pdf	5.08 MB	Adobe PDF


Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.