An Unsupervised Ranking Consistency Approach based on Knowledge Base and Search Results

Jian-De Jiang; 江建德

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50302

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	鄭卜壬(Pu-Jen Cheng)
dc.contributor.author	Jian-De Jiang	en
dc.contributor.author	江建德	zh_TW
dc.date.accessioned	2021-06-15T12:35:37Z	-
dc.date.available	2021-08-03
dc.date.copyright	2016-08-03
dc.date.issued	2016
dc.date.submitted	2016-08-01
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/50302	-
dc.description.abstract	對於網頁搜尋系統如知名搜尋引擎Google, Yahoo!與Bing，相關性排序是一個最重要的問題。相關性排序的傳統方法採用對於查詢分別進行最佳化的方式來增進效能。之前曾有一篇論文提出一個根據查詢意圖的相似性使用兩階段監督式學習，並藉由提升排序一致性來改善相關性排序。然而在該篇論文中有兩個問題需要被提出來解決。第一，他們使用學習排序需要使用大量的查詢紀錄，而如此大量的查詢紀錄只有成熟的搜尋引擎才會擁有，剛開始發展或發展中的搜尋系統必須仰賴非監督式方法來提升相關性排序。第二，該篇論文使用知識庫中的實體來代表查詢意圖。但由於查詢通常含有一些特定的資訊，所以實體並無法完全的表達查詢意圖。舉例來說:``Kobe Bryant family'表達的意圖是想了解Kobe Bryant的家人而非Kobe Bryant本人。在這篇論文當中，我們提出一個藉由搜尋結果與知識庫的兩階段非監督式方法來改善排序一致性與相關性排序，解決不成熟的搜尋系統沒有查詢紀錄的問題。第一階段從搜尋結果擷取排序一致性的分數，並於第二階段藉由衡量獨特性與一致性的方式重新排序搜尋結果。此外，我們在查詢意圖加入查詢模板可以讓我們更清楚的解析查詢意圖。就我們所知，我們的論文是第一個使用非監督式排序一致性方法來改善相關性排序。最後，我們使用Freebase與Yahoo!的搜尋結果當作實驗資料庫並證實我們的方法，結果顯示出我們成功藉由非監督式方法改善了排序一致性與相關性排序的效能。	zh_TW
dc.description.abstract	Relevance ranking is the most important problem in web search systems such as Google, Yahoo!, and Bing. Most conventional approaches focus on optimizing the ranking model for each query separately. One previous work proposed a two-stage supervised approach that improves relevance ranking by enhancing ranking consistency across queries with similar search intents. However, that work has two crucial problems. First, it uses pairwise learning to rank to learn consistency, and this method relies on large-scale query logs, which only a few mature web search systems have. Most developing search engines need to improve their performance without query logs. Second, it models query intents using entities in a knowledge base. However, entities cannot completely represent query intents, because queries often ask for specific information; for example, the query "Kobe Bryant family" expresses an intent about Kobe Bryant's family rather than about Kobe Bryant himself. In this work, we propose a two-phase unsupervised approach that improves ranking consistency using a knowledge base and search results. The first phase extracts consistency from search results, and the second phase re-ranks search results by leveraging consistency and uniqueness. Furthermore, we add query templates to capture query intents more completely. To the best of our knowledge, our work is the first unsupervised method that uses ranking consistency to improve relevance ranking. We conducted extensive experiments using Freebase and search results from the Yahoo! search engine, and the results demonstrate that our approach significantly improves ranking consistency and relevance ranking.	en
dc.description.provenance	Made available in DSpace on 2021-06-15T12:35:37Z (GMT). No. of bitstreams: 1 ntu-105-R03922057-1.pdf: 3511895 bytes, checksum: 5a79875bcca3a596b6ee271659d769f8 (MD5) Previous issue date: 2016	en
dc.description.tableofcontents	Contents: Acknowledgements; Abstract (Chinese); Abstract; 1 Introduction; 2 Related Work; 2.1 Ranking Consistency; 2.2 Federated web Search; 2.3 URL Patterns of Web pages; 2.4 Search Intents behind Queries; 3 Problem Definition; 3.1 Notations of Given Data; 3.2 Problem Definition; 3.3 Flow of Approach and Notations; 4 Consistency Rank; 4.1 Similar Query Intents; 4.1.1 Named Entity Recognizing Query; 4.1.2 Query Intent Template; 4.1.3 Similar Query Intent Set; 4.2 Topical Clusters; 4.2.1 URL Pattern; 4.2.2 URL Sub-pattern; 4.3 Consistency Rank Extraction; 5 Re-rank Model; 5.1 Merging Ranks; 5.2 Rank Score Function; 5.2.1 Ranking Properties; 5.2.2 Reciprocal Rank Function; 5.3 Leveraging Unique and Consistency; 5.3.1 Model Formulation; 5.3.2 Multiple Parameters; 5.3.3 Optimization Based on Unique and Consistency; 5.4 Re-Ranking Features; 5.4.1 Consistency Features; 5.4.2 Unique Features; 6 Experiments; 6.1 Datasets and Experimental Settings; 6.2 Experimental Results; 6.2.1 Evaluation of Ranking Consistency; 6.2.2 Evaluation of Unsupervised Approach; 6.2.3 Evaluation with Different Parameters; 6.2.4 Re-ranking Feature Weights; 7 Conclusions and Future Work; 7.1 Conclusions; 7.2 Future Work; Bibliography
dc.language.iso	en
dc.subject	知識庫	zh_TW
dc.subject	網頁搜尋	zh_TW
dc.subject	排序一致性	zh_TW
dc.subject	查詢意圖	zh_TW
dc.subject	非監督式方法	zh_TW
dc.subject	主題分群	zh_TW
dc.subject	查詢意圖模板	zh_TW
dc.subject	Web Search	en
dc.subject	Ranking Consistency	en
dc.subject	Query Intent	en
dc.subject	Unsupervised Approach	en
dc.subject	Knowledge Base	en
dc.subject	Topical Cluster	en
dc.subject	Query Intent Template	en
dc.title	以非監督式方法利用知識庫與搜尋結果提升網頁搜尋排序一致性	zh_TW
dc.title	An Unsupervised Ranking Consistency Approach based on Knowledge Base and Search Results	en
dc.type	Thesis
dc.date.schoolyear	104-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	陳信希(Hsin-Hsi Chen),林守德(Shou-De Lin),蔡宗翰(Tzong-Han Tsai),陳柏琳(Ber-Lin Chen)
dc.subject.keyword	網頁搜尋,排序一致性,查詢意圖,非監督式方法,知識庫,主題分群,查詢意圖模板	zh_TW
dc.subject.keyword	Web Search, Ranking Consistency, Query Intent, Unsupervised Approach, Knowledge Base, Topical Cluster, Query Intent Template	en
dc.relation.page	42
dc.identifier.doi	10.6342/NTU201600837
dc.rights.note	Paid authorization
dc.date.accepted	2016-08-01
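dc.contributor.author-college	College of Electrical Engineering and Computer Science	en
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering	en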
Appears in Collections:	Department of Computer Science and Information Engineering

Files in This Item:

File	Size	Format
ntu-105-1.pdf (not authorized for public access)	3.43 MB	Adobe PDF


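Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.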