Who's Calling? Malicious Call Detection Using Supervised Learning Algorithms (監督式學習演算法之惡意來電偵測)

Ting-Ni Chen; 陳亭霓

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57810

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	雷欽隆(Chin-Laung Lei)
dc.contributor.author	Ting-Ni Chen	en
dc.contributor.author	陳亭霓	zh_TW
dc.date.accessioned	2021-06-16T07:05:00Z	-
dc.date.available	2019-07-29
dc.date.copyright	2014-07-29
dc.date.issued	2014
dc.date.submitted	2014-07-10
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57810	-
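dc.description.abstract	With the rapid advance of technology, mobile phones have become indispensable to most people's daily lives. The widespread availability of communication technology has also given rise to crimes such as telephone fraud, in which users may have personal data stolen or even suffer financial losses. As mobile phone user interfaces have improved, most users rely on the contact book and caller ID to identify callers. However, if the calling number is not recorded in the contact book, the user has no way of judging whether the call is malicious. Current methods of identifying malicious calls use blacklists built from malicious numbers reported by users, so their accuracy depends on the reporters and on how the blacklist is maintained. This thesis uses supervised learning algorithms to classify malicious and normal users based on their past behavior. Based on the assumption that normal and malicious users behave differently, it attempts to let a computer automatically determine whether the owner of a number is a malicious user, and analyzes the accuracy of this classifier. Experimental results show that this approach can discover malicious callers earlier than traditional methods, making automatic identification and detection of malicious calls possible.	zh_TW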
dc.description.abstract	With the rapid advancement of technology, mobile phones have gained popularity and become indispensable. The growth of telecommunication has also given rise to malicious calling behaviors, through which users may suffer identity theft or even financial losses. As user interfaces on mobile phones have improved, most users rely on caller IDs linked to the contact book to identify callers. However, it is often difficult to tell whether an unknown ID is malicious without additional information. Current malicious caller identification relies on blacklists built from user reports. Detecting malicious callers in this fashion proves difficult and inefficient because user reports are inconsistent and unreliable. Since malicious and benign call patterns may differ, the aim of this study is to automatically predict whether an unknown ID is malicious by observing its past call history. In this study, we collected phone call histories in two different countries and applied machine learning algorithms to detect whether an unknown ID is benign or malicious. We evaluated the performance of different classifiers and compared the experimental results with the conventional blacklist approach. Empirical results suggest that the proposed method is effective and can be a viable approach to detecting malicious calls.	en
dc.description.provenance	Made available in DSpace on 2021-06-16T07:05:00Z (GMT). No. of bitstreams: 1 ntu-103-R01921078-1.pdf: 1098995 bytes, checksum: b86158968cd3964385481c1d71faff06 (MD5) Previous issue date: 2014	en
dc.description.tableofcontents	Contents: Acknowledgements; Abstract (Chinese); Abstract; 1 Introduction; 1.1 Motivation; 1.2 Contribution; 1.3 Organization; 2 Related Work; 2.1 Fraud and telemarketing detection; 2.2 Financial fraud detection; 3 Machine Learning and Classification; 3.1 Machine Learning; 3.1.1 Supervised Learning; 3.1.2 Classification; 3.2 Supervised classification algorithms; 3.3 Support Vector Machine; 3.4 Evaluation of Data Results; 3.4.1 Cross Validation; 3.4.2 Confusion Matrix; 3.4.3 Receiver Operating Characteristics Curve (ROC Curve); 4 Method; 4.1 Problem Definition; 4.2 Data Collection; 4.3 Preprocessing; 4.4 Experiments; 4.4.1 Feature Extraction; 4.4.2 Experiment 1; 4.4.3 Experiment 2; 4.4.4 Experiment 3; 4.4.5 Experiment 4; 5 Result and Analysis; 5.1 Experiment 1; 5.1.1 Accuracy; 5.1.2 False Positive and False Negative Rate; 5.1.3 ROC Curves; 5.1.4 Rationality of Feature Weight; 5.1.5 Expected Features; 5.1.6 Robustness of Features; 5.2 Experiment 2; 5.2.1 Accuracy; 5.2.2 False positive and false negative rate; 5.2.3 ROC Curves; 5.3 Experiment 3; 5.4 Experiment 4; 6 Conclusions and Future Work; A Recorded Attributes; B Extracted Features; Bibliography
dc.language.iso	en
dc.title	監督式學習演算法之惡意來電偵測	zh_TW
dc.title	Who's Calling? Malicious Call Detection Using Supervised Learning Algorithms	en
dc.type	Thesis
dc.date.schoolyear	102-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	王勝德(Sheng-De Wang),陳銘憲(Ming-Syan Chen),陳昇瑋(Sheng-Wei Chen)
dc.subject.keyword	machine learning,supervised learning,classification,cross-validation	zh_TW
dc.subject.keyword	machine learning,supervised learning,classification,Support Vector Machine (SVM),cross-validation	en
dc.relation.page	54
dc.rights.note	Paid license
dc.date.accepted	2014-07-10
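dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Electrical Engineering	zh_TW
Appears in Collections:	Department of Electrical Engineering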

Files in This Item:

File	Size	Format
ntu-103-1.pdf (not currently authorized for public access)	1.07 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.