Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/29807

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳信希 | |
| dc.contributor.author | Wei-Lin Tseng | en |
| dc.contributor.author | 曾威箖 | zh_TW |
| dc.date.accessioned | 2021-06-13T01:19:40Z | - |
| dc.date.available | 2013-12-09 | |
| dc.date.copyright | 2011-12-09 | |
| dc.date.issued | 2011 | |
| dc.date.submitted | 2011-08-03 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/29807 | - |
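| dc.description.abstract | The Internet and web services are growing rapidly. Most web page category prediction relies on users' keyword queries at portals and search engines and on their clicks on pages related to the query results, mining the relation between the two to predict user intent. Analyzing users' access data on websites can improve the accuracy of the results a search engine returns, improve search engine performance through web caching and pre-fetching of clicked pages, and support query-related page recommendation systems and personalized site ranking systems. It can also be applied to product recommendation in commercial advertising and to information filtering. Predicting user intent is therefore an important issue and a challenge.
Most studies analyze user intent and browsing behavior by observing query keywords and clicks on the related result pages. This thesis instead observes users' web access logs and the category records of the visited pages, and infers intent by predicting the category of the page a user will click next. Two models are implemented to predict web page categories: a Top-Level Domain Model and a Hidden Markov Model. Based on these two models, we propose a Mixture Model that combines the Hidden Markov Model with the Top-Level Domain Model of the browsed URLs and uses the association between domains for optimization. Experiments show that (1) for certain top-level domains, information in the URL itself does improve the accuracy of web page category prediction; (2) categories predicted from context-aware information about the user's browsing behavior are more accurate; and (3) the more preceding access records are observed, the higher the accuracy (comparing HMM 1-gram, HMM 2-gram, HMM 3-gram, and HMM 4-gram). | zh_TW |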
| dc.description.abstract | Web activity and web services are growing rapidly. In recent years, most work on predicting user intent has relied on the relation between query keywords and the result pages clicked on search engines or portals. Analyzing users' access data and activity on websites can help web service providers improve the accuracy of the result pages returned for a query, improve website performance by caching result pages and pre-fetching web pages, improve web page recommendation and personalized page ranking systems, and support product recommendation in commercial advertising as well as information filtering applications. Capturing the context of a user's previous browsing behavior to predict user intent is therefore an important issue and a challenge.
Most studies focus on the user's query keywords and on the relation between those keywords and the pages clicked next in the result page. We implement two models: a Top-Level Domain (TLD) model trained on URL-based features, and a Hidden Markov Model (HMM) trained on context-aware category sequences from users' browsing URLs. We then propose a mixture model that combines the TLD model and the HMM to predict the category of the user's next accessed page. We also apply the proposed context-aware web page category prediction model to two filtering applications: objectionable web content filtering and web security threat prevention. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-13T01:19:40Z (GMT). No. of bitstreams: 1 ntu-100-P95922006-1.pdf: 1069616 bytes, checksum: ba793f4c20cba1465c1f7b6e4b0a06cc (MD5) Previous issue date: 2011 | en |
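| dc.description.tableofcontents | Abstract (Chinese); Abstract; Acknowledgements; Table of Contents; List of Figures; List of Tables; Chapter 1 Introduction; 1.1 Research Background; 1.2 Research Motivation and Objectives; 1.3 Thesis Organization; Chapter 2 Related Work; 2.1 User Access Patterns; 2.2 User Intent; Chapter 3 Web Page Category Prediction Models; 3.1 User Access Trails; 3.2 Categories of Web Pages Browsed by Users; 3.3 Top-Level Domain Model; 3.3.1 Top-Level Domains; 3.3.2 Training Phase of the Top-Level Domain Model; 3.3.3 Testing Phase of the Top-Level Domain Model; 3.4 Hidden Markov Model; 3.4.1 Hidden Markov Model; 3.4.2 Training Phase of the Hidden Markov Model; 3.4.3 Testing Phase of the Hidden Markov Model; 3.5 Mixture Model; 3.5.1 Training Phase of the Mixture Model; 3.5.2 Testing Phase of the Mixture Model; Chapter 4 Experimental Datasets; 4.1 Data Format; 4.2 Related Statistics; 4.3 Training and Test Datasets; 4.3.1 Training Dataset; 4.3.2 Test Dataset; Chapter 5 Experiments and Performance Evaluation; 5.1 Experimental Dataset Settings; 5.2 Evaluation Criteria for Experimental Results; 5.3 Experimental Results for the Naive Bayes Classifier, Hidden Markov Model, and Top-Level Domain Model; 5.4 Experimental Results for the Mixture Model; Chapter 6 Model Applications; 6.1 Objectionable Web Content Filtering; 6.2 Web Security Threat Prevention; Chapter 7 Conclusions and Future Work; 7.1 Conclusions; 7.2 Future Work; References | |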
| dc.language.iso | zh-TW | |
| dc.subject | Web Page Category Prediction | zh_TW |
| dc.subject | User Intent | zh_TW |
| dc.subject | Access Log | zh_TW |
| dc.subject | Information Retrieval | zh_TW |
| dc.subject | Click Behavior | zh_TW |
| dc.subject | User Intent | en |
| dc.subject | User Click Behavior | en |
| dc.subject | User Browsing Log | en |
| dc.subject | Web Page Category Prediction | en |
| dc.title | 以使用者瀏覽行為的情境感知學習於網頁類別預測 | zh_TW |
| dc.title | Learning User Browsing Behaviors for Context-Aware Web Page Category Prediction | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 99-2 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 鄭卜任,盧文祥 | |
| dc.subject.keyword | User Intent, Web Page Category Prediction, Access Log, Information Retrieval, Click Behavior | zh_TW |
| dc.subject.keyword | User Intent, Web Page Category Prediction, User Browsing Log, User Click Behavior | en |
| dc.relation.page | 47 | |
| dc.rights.note | Paid license | |
| dc.date.accepted | 2011-08-03 | |
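| dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |
| Appears in Collections: | Department of Computer Science and Information Engineering | |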
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-100-1.pdf (not authorized for public access) | 1.04 MB | Adobe PDF | |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
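
The abstract describes combining a TLD-based model with a sequence model over the categories of previously visited pages. The record does not give the thesis's formulas, so the following is only a rough sketch of the general idea under stated assumptions: a plain order-n Markov chain over observed categories stands in for the HMM, the two estimates are combined by linear interpolation, and add-one smoothing is used. The names `MixturePredictor`, `tld_of`, `order`, and `weight` are hypothetical and do not come from the thesis.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def tld_of(url):
    """Return the last dot-separated label of the host name, e.g. 'edu'."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1]


class MixturePredictor:
    """Illustrative mix of a category n-gram model and a TLD model.

    Assumption: a simple Markov chain over observed categories replaces
    the HMM, and `weight` linearly interpolates the two probabilities.
    """

    def __init__(self, order=2, weight=0.5):
        self.order = order            # number of previous categories used
        self.weight = weight          # share given to the sequence model
        self.context_counts = defaultdict(Counter)
        self.tld_counts = defaultdict(Counter)
        self.categories = set()

    def train(self, sessions):
        """sessions: list of sessions, each a list of (url, category)."""
        for session in sessions:
            cats = [cat for _, cat in session]
            for i, (url, cat) in enumerate(session):
                self.categories.add(cat)
                self.tld_counts[tld_of(url)][cat] += 1
                context = tuple(cats[max(0, i - self.order):i])
                self.context_counts[context][cat] += 1

    def _prob(self, counter, cat):
        # Add-one smoothing so unseen contexts fall back to uniform.
        total = sum(counter.values()) + len(self.categories)
        return (counter[cat] + 1) / total

    def predict(self, previous_categories, next_url):
        """Return the highest-scoring category for the next page."""
        start = max(0, len(previous_categories) - self.order)
        context = tuple(previous_categories[start:])
        ctx = self.context_counts.get(context, Counter())
        tld = self.tld_counts.get(tld_of(next_url), Counter())

        def score(cat):
            return (self.weight * self._prob(ctx, cat)
                    + (1 - self.weight) * self._prob(tld, cat))

        # Sort first so ties are broken deterministically by name.
        return max(sorted(self.categories), key=score)


# Toy browsing sessions; URLs and category labels are made up.
sessions = [
    [("http://news.example.com/a", "news"),
     ("http://shop.example.com/b", "shopping"),
     ("http://www.example.edu/c", "education")],
    [("http://news.example.org/x", "news"),
     ("http://store.example.com/y", "shopping"),
     ("http://lib.example.edu/z", "education")],
]
model = MixturePredictor(order=2, weight=0.5)
model.train(sessions)
prediction = model.predict(["news", "shopping"], "http://www.other.edu/")
```

Raising `order` corresponds loosely to the 1-gram through 4-gram comparison mentioned in the abstract. Setting `weight` to 1.0 or 0.0 reduces the sketch to the sequence-only or TLD-only estimate.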
