PhishBox: An approach for phishing validation and detection (一種用於釣魚網站驗證與偵測之方法)

Jhen-hao Li; 李振皓

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67453

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	王勝德(Sheng-De Wang)
dc.contributor.author	Jhen-hao Li	en
dc.contributor.author	李振皓	zh_TW
dc.date.accessioned	2021-06-17T01:32:53Z	-
dc.date.available	2017-08-14
dc.date.copyright	2017-08-14
dc.date.issued	2017
dc.date.submitted	2017-08-02
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67453	-
dc.description.abstract	在本文中，我們提出一個名為Phishbox的方法，能有效收集釣魚網站資料，並產生用於釣魚驗證與偵測之模型。提出的方法將釣魚網站的收集、驗證與偵測整合成一個工具，可以即時監控PhishTank黑名單上的釣魚網站。由於釣魚網站的生命週期較短，我們提出了兩階段的偵測模型來確保偵測效能。首先，我們設計一個組合式模型來驗證釣魚網站，並應用主動學習降低人工標籤的成本，結果顯示，我們的組合式驗證模型擁有良好的效能，可以達到95%的準確度和3.9%的假陽性率。接著，驗證後的釣魚網站將用於訓練偵測模型。與原始數據相比，釣魚偵測的假陽性率平均下降了43.7%。實際參與PhishTank上的驗證投票，結果顯示兩階段的偵測模型能有效地驗證釣魚網站。最後，我們發現黑名單之中包含大量無效資料。比起PhishTank的定期更新機制，我們的偵測器在一周後能移除約五倍以上的無效網站。	zh_TW
dc.description.abstract	In this thesis, we propose PhishBox, an approach for effectively collecting phishing data and generating models for phishing validation and detection. The approach integrates phishing website collection, validation, and detection into a single online tool that monitors the PhishTank blacklist and validates and detects phishing websites in real time. Because phishing websites are short-lived, the approach uses a two-stage detection model to ensure performance. First, we design an ensemble model to validate the phishing data and apply active learning to reduce the cost of manual labeling. The results show that the ensemble validation model achieves 95% accuracy with a 3.9% false-positive rate. Next, the validated phishing data are used to train a detection model. Compared with the original dataset, the false-positive rate of phishing detection drops by 43.7% on average. We also took part in the voting procedure on PhishTank, and the results show that the two-stage model verifies phishing websites effectively. Finally, by monitoring the blacklist, we found that it contains a large amount of invalid data. In our experiment, after one week our detector removed about five times more invalid websites than PhishTank's regular update mechanism.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T01:32:53Z (GMT). No. of bitstreams: 1 ntu-106-R04921041-1.pdf: 1808935 bytes, checksum: 3e09dcc56b5289cb02475529e76d5337 (MD5) Previous issue date: 2017	en
dc.description.tableofcontents	Chinese Abstract (中文摘要); ABSTRACT; CONTENTS; LIST OF FIGURES; LIST OF TABLES; Chapter 1 Introduction (1.1 Background, 1.2 Motivation, 1.3 Approach, 1.4 Contribution, 1.5 Thesis organization); Chapter 2 Related work (2.1 Infrastructure of phishing data collection and analysis, 2.2 The phishing detection and prevention technology); Chapter 3 Architecture (3.1 ETL module, 3.2 Voting and monitoring module, 3.3 Visualization); Chapter 4 Classification models (4.1 Phishing validation model, 4.2 Active learning, 4.3 Phishing detection model); Chapter 5 Experiments (5.1 Environment and dataset, 5.2 Evaluation metrics, 5.3 Phishing validation result, 5.4 Phishing detection result, 5.5 Voting result, 5.6 Monitoring result, 5.7 Comparing with the Google Safe Browsing); Chapter 6 Discussion; Chapter 7 Conclusion; REFERENCE
dc.language.iso	en
dc.title	一種用於釣魚網站驗證與偵測之方法	zh_TW
dc.title	PhishBox: An approach for phishing validation and detection	en
dc.type	Thesis
dc.date.schoolyear	105-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	雷欽隆(Chin-Laung Lei),蕭旭君(Hsu-Chun Hsiao),鄧惟中(Wei-Chung Teng)
dc.subject.keyword	網絡釣魚驗證, 網絡釣魚檢測, 機器學習, 主動學習	zh_TW
dc.subject.keyword	phishing validation, phishing detection, machine learning, active learning	en
dc.relation.page	33
dc.identifier.doi	10.6342/NTU201702478
dc.rights.note	Paid authorization (有償授權)
dc.date.accepted	2017-08-03
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	電機工程學研究所	zh_TW
Appears in Collections:	Department of Electrical Engineering (電機工程學系)

Files in this item:

File	Size	Format
ntu-106-1.pdf (not currently authorized for public access)	1.77 MB	Adobe PDF


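All items in the system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.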