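Combining Probabilistic Model with Graph-Based Random Walk to Improve Search Quality through Exploiting Time-Sensitive Query Information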

Po-Tzu Chang; 張博詞

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/25041

Title:	Combining probabilistic model with graph-based random walk to improve search quality through exploiting time-sensitive query information (original title: 將機率模型以及圖形隨機漫步理論應用在時序資料以改良網頁搜尋品質)
Author:	Po-Tzu Chang (張博詞)
Advisor:	Shou-De Lin (林守德)
Keywords:	time-sensitive queries, search engine reranking
Publication Year:	2011
Degree:	Master's
Abstract:	Search engines let users find information by entering a query and returning related results. The intent behind a query, however, may vary from one time period to another, so time-related information should be taken into consideration when a search engine returns results for time-sensitive queries. In this thesis, we present new re-ranking methods based on time information to improve search result quality. We aim to improve re-ranking in two situations: 1. Existed Queries dataset: URLs clicked by queries have sufficient time click information in the training data. 2. Rare Queries dataset: URLs clicked by queries have no click information in the training data; this setting also includes a dataset of bad search results. We propose SVM Regression with time-related features to re-rank the search results of each query according to the number of clicks in each time period. For the Existed Queries dataset, we propose features generated from three methodologies: (a) Probabilistic Prior, (b) a Probabilistic Model using a Language Model and KL-divergence, and (c) a PageRank approach based on time clicks. For the Rare Queries dataset, where click information is unavailable, we propose features based on (a) clicks extracted from related queries and (b) time-based PageRank. A combination of these features is then passed to SVM Regression for prediction. Our experiments on the AOL query log show that the proposed approach gains a 10.28% improvement over the original ranking on the Existed Queries dataset. On the Rare Queries dataset, SVM Regression gains a 1.14% improvement on Existed queries and a 12.9% improvement on Non-Existed queries.
Finally, we analyze the improvement contributed by each method and discuss the pros and cons of these methods.
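The abstract names a language model combined with KL-divergence as one feature source but does not give the formulation. The following is only a minimal sketch of the general idea, assuming each time period is modeled as an additively smoothed distribution over the URLs clicked for a query. The function names, the smoothing constant `alpha`, and the sample click logs are all hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def click_distribution(clicked_urls, vocabulary, alpha=1.0):
    """Smoothed probability of each URL in `vocabulary` being clicked.

    Additive smoothing (alpha) is an assumption made for this sketch so that
    URLs unseen in one period still get non-zero probability.
    """
    counts = Counter(clicked_urls)
    total = sum(counts[url] for url in vocabulary) + alpha * len(vocabulary)
    return {url: (counts[url] + alpha) / total for url in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same keys."""
    return sum(p[u] * math.log(p[u] / q[u]) for u in p if p[u] > 0)


# Hypothetical click logs for one query in two different time periods.
clicks_period_a = ["a.com", "a.com", "b.com"]
clicks_period_b = ["b.com", "b.com", "b.com", "c.com"]

vocab = sorted(set(clicks_period_a) | set(clicks_period_b))
p = click_distribution(clicks_period_a, vocab)
q = click_distribution(clicks_period_b, vocab)

# A larger value suggests the clicked URLs for this query shift between periods.
divergence = kl_divergence(p, q)
print(f"KL(period A || period B) = {divergence:.4f}")
```

A value like this could serve as one time-related feature among several passed to a regressor; how the thesis actually builds and combines its features is not specified in this record.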
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/25041
Full-Text Authorization:	Not authorized
Appears in Collections:	Department of Computer Science and Information Engineering

Files in This Item:

File	Size	Format
ntu-100-1.pdf (currently not authorized for public access)	1.12 MB	Adobe PDF

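Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.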
