Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21379
Title: | A Cross-Device Linking Mechanism Based on Extracting Browsing Behavioral Features from History Logs (利用歷史紀錄萃取瀏覽行為特徵建立跨裝置識別機制) |
Author: | Yu-Xuan Chen 陳玉璇 |
Advisor: | Yeong-Sung Lin 林永松 |
Keywords: | cross-device, tracking, latent semantic indexing, word embedding, supervised learning |
Publication year: | 2019 |
Degree: | Master's |
Abstract: | With the rapid development and diversification of technology, people often own and use several electronic devices at the same time, such as personal computers, tablets, and smartphones, and routinely search and browse the web from all of them. Using multiple devices gives e-commerce more channels, but it also makes consumer-behavior analysis more complex, for two reasons: a single purchasing process may now be spread across different devices instead of taking place on one terminal, and, for privacy reasons, devices are anonymous on the network rather than tied to a real identity. As a result, the browsing activity collected from devices cannot be reliably attributed to their common owner. For precision marketing and other personalized applications, identifying the true relationships among devices is therefore an essential step: correctly linking the browsing logs that one person generates on several devices reconstructs that user's complete browsing history.
This research uses the dataset of the CIKM Cup 2016 challenge. Each device is represented by features extracted from its browsing logs. Instead of comparing every pair of devices, a latent semantic indexing (LSI) representation combined with supervised learning selects a set of candidate devices for each target device, which reduces the computational cost. Unsupervised word embedding then converts the text data into word vectors, which, together with vectors generated from other engineered features, serve as input to a supervised classifier. The classifier estimates the probability that any two devices in a candidate set belong to the same user. Together, these steps form a cross-device linking mechanism based on the preferences revealed in browsing logs. |
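The two-stage pipeline in the abstract (candidate filtering in an LSI space, then a pairwise classifier that outputs a same-user probability) can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation and rests on several assumptions: each device is a hypothetical bag of tokens taken from its browsing log; candidates are chosen by plain cosine nearest neighbours in LSI space rather than by the supervised filtering step the abstract mentions; a token-overlap (Jaccard) score stands in for the word-embedding vectors; and a hand-rolled logistic regression stands in for whatever supervised classifier the thesis uses. All function names and the toy data are hypothetical.

```python
import numpy as np


def tfidf_matrix(docs):
    """Dense TF-IDF matrix (devices x vocabulary) from per-device token lists."""
    vocab = sorted({tok for doc in docs for tok in doc})
    col = {tok: i for i, tok in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for tok in doc:
            tf[row, col[tok]] += 1.0
    df = (tf > 0).sum(axis=0)
    idf = np.log((1 + len(docs)) / (1 + df)) + 1.0  # smoothed IDF
    return tf * idf


def lsi_embed(x, k):
    """LSI: project onto the top-k singular directions, then L2-normalise rows."""
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    z = u[:, :k] * s[:k]
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    return z / np.where(norms == 0, 1.0, norms)


def candidate_devices(z, top_n):
    """For each device, indices of the top_n most cosine-similar other devices."""
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)  # a device is never its own candidate
    return np.argsort(-sim, axis=1)[:, :top_n]


def pair_features(z, docs, i, j):
    """Hypothetical pair features: LSI cosine similarity and token Jaccard overlap."""
    a, b = set(docs[i]), set(docs[j])
    jaccard = len(a & b) / len(a | b) if a | b else 0.0
    return np.array([float(z[i] @ z[j]), jaccard])


def train_logistic(feats, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean logistic loss (bias in the last weight)."""
    x = np.hstack([feats, np.ones((len(feats), 1))])
    w = np.zeros(x.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-x @ w))
        w -= lr * x.T @ (p - labels) / len(labels)
    return w


def same_user_probability(w, feat):
    """Estimated probability that the two devices behind `feat` share one owner."""
    return float(1.0 / (1.0 + np.exp(-(np.append(feat, 1.0) @ w))))


# Toy usage with made-up browsing tokens for four devices owned by two users.
docs = [
    ["news", "sports", "football"],            # device 0, user A
    ["sports", "football", "news"],            # device 1, user A
    ["cooking", "recipes", "baking"],          # device 2, user B
    ["recipes", "baking", "cooking", "news"],  # device 3, user B
]
z = lsi_embed(tfidf_matrix(docs), k=2)
cands = candidate_devices(z, top_n=1)  # stage 1: candidate filtering

pairs = [(0, 1), (2, 3), (0, 2), (0, 3), (1, 2), (1, 3)]
labels = np.array([1, 1, 0, 0, 0, 0], dtype=float)
feats = np.array([pair_features(z, docs, i, j) for i, j in pairs])
w = train_logistic(feats, labels)  # stage 2: pairwise classifier

print("top candidate per device:", cands[:, 0])
print("P(same user | 0, 1):", same_user_probability(w, pair_features(z, docs, 0, 1)))
```

The point of the first stage is that the classifier only scores each device against its `top_n` candidates, so the number of scored pairs grows roughly linearly with the number of devices instead of quadratically as in exhaustive pairwise matching.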
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21379 |
DOI: | 10.6342/NTU201902936 |
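Full-text authorization: | Not authorized |
Appears in department: | Department of Information Management |
Files in this item:
File | Size | Format |
---|---|---|
ntu-108-1.pdf (currently not authorized for public access) | 2.19 MB | Adobe PDF |
Unless their copyright terms are stated otherwise, all items in this system are protected by copyright, with all rights reserved.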