Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85223
| Title: | SenseInput: An Image-Based Sensitive Input Detection Scheme for Phishing Website Detection |
| Author: | Shih-Chun Lin (林士鈞) |
| Advisor: | Tsung-Nan Lin (林宗男) |
| Keywords: | Phishing Detection, Computer Vision, Object Detection, Machine Learning |
| Publication year: | 2022 |
| Degree: | Master's |
| Abstract: | Phishing continues to threaten the World Wide Web as phishing websites evolve. Many previous works focus on extracting useful features from the essential components of phishing websites. One such component is the sensitive input, a field that asks for sensitive information. Because web designs vary widely, detecting whether a page contains sensitive inputs is not trivial. Some previous works provide rule-based approaches that use HTML code to detect login forms, which contain sensitive inputs. However, newer phishing websites modify their HTML to evade these rules, which lowers detection accuracy. To overcome this limitation, we propose SenseInput, a scheme that combines several deep learning models to detect sensitive inputs and sensitive information from webpage screenshots, since phishing websites must ultimately present sensitive inputs in their visual content. SenseInput achieves a 96.94% F1-score for sensitive input detection on our collected dataset and 96.73% on a public dataset, Phishpedia Phish30K. Next, we use 22 features for phishing detection, including seven proposed statistical features and two sensitive input features. Experiments show that our approach achieves F1-scores of 98.48% on our validation dataset and 95.87% on the Phishpedia dataset, outperforming previous approaches on both. Finally, we investigate the influence of the sensitive input features. The results show that they are more effective than login forms detected by rule-based methods, and that they can reduce the impact of bias between different datasets. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85223 |
| DOI: | 10.6342/NTU202201802 |
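| Full-text authorization: | Authorized (on-campus access only) |
| Full-text release date: | 2022-08-05 |
| Appears in collections: | Graduate Institute of Communication Engineering |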
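
The abstract notes that rule-based detectors inspect HTML to find login forms, and that phishing pages can alter their HTML to evade such rules. The following is a minimal sketch of that weakness, not the thesis's method or any specific prior work: the rule (`type="password"`), the class name `SensitiveInputRule`, the function `has_sensitive_input`, and both sample pages are assumptions made up for illustration.

```python
from html.parser import HTMLParser


class SensitiveInputRule(HTMLParser):
    """Hypothetical rule: a page has a sensitive input if it has <input type="password">."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None for bare attributes.
        if tag == "input":
            input_type = (dict(attrs).get("type") or "").lower()
            if input_type == "password":
                self.found = True


def has_sensitive_input(html: str) -> bool:
    """Apply the hypothetical rule to an HTML string."""
    parser = SensitiveInputRule()
    parser.feed(html)
    parser.close()
    return parser.found


# An ordinary login form: the rule fires.
plain = '<form><input type="text" name="user"><input type="password" name="pw"></form>'

# Illustrative evasion: a text field styled to mask its characters looks like a
# password box to the user, but the HTML-based rule no longer matches.
evasive = (
    '<form><input type="text" name="user">'
    '<input type="text" name="pw" style="-webkit-text-security: disc"></form>'
)

print(has_sensitive_input(plain))
print(has_sensitive_input(evasive))
```

Both pages would render a masked field in browsers that support the styling, which is the motivation the abstract gives for detecting sensitive inputs from the rendered screenshot rather than from the markup.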
Files in this item:
| File | Size | Format |
|---|---|---|
| U0001-2707202222074100.pdf (access restricted to NTU campus IP addresses; from off campus, use the VPN remote-access service) | 1.59 MB | Adobe PDF |
All items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
