Aggregating Multi-Resources to Improve the Diversity of Search Engine Result Pages

Chung-Lun Chiang; 江忠倫

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/18438

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	鄭卜壬
dc.contributor.author	Chung-Lun Chiang	en
dc.contributor.author	江忠倫	zh_TW
dc.date.accessioned	2021-06-08T01:05:13Z	-
dc.date.copyright	2014-08-25
dc.date.issued	2014
dc.date.submitted	2014-08-20
dc.identifier.citation	[1] Mikhail Bautin and Steven Skiena. Concordance-based entity-oriented search. Web Intelli. and Agent Sys., 7(4), December 2009. [2] Zheng Xu, Xiangfeng Luo, Jie Yu, and Weimin Xu. Mining web search engines for query suggestion. Concurr. Comput. : Pract. Exper., 23(10), July 2011. [3] Markus Strohmaier, Mark Kroll, and Christian Korner. Intentional query suggestion: Making user goals more explicit during search. In Proc. of the 2009 Workshop on Web Search Click Data, 2009. [4] Adam L. Berger and Vibhu O. Mittal. Ocelot: a system for summarizing web pages. In Proc. of SIGIR, 2000. [5] Ramakrishna Varadarajan. A system for query-specific document summarization. In Proc. of CIKM, 2006. [6] Anastasios Tombros and Mark Sanderson. Advantages of query biased summaries in information retrieval. In Proc. of SIGIR, 1998. [7] Michael White, Tanya Korelsky, Claire Cardie, Vincent Ng, David Pierce, and Kiri Wagstaff. Multidocument summarization via information extraction. In Proc. of HLT, 2001. [8] Youngjoong Ko, Hongkuk An, and Jungyun Seo. Pseudo-relevance feedback and statistical query expansion for web snippet generation. Inf. Process. Lett., 109(1), 2008. [9] Dragomir R. Radev, Weiguo Fan, and Zhu Zhang. Webinessence: a personalized web-based multi-document summarization and recommendation system. In NAACL 2001 Workshop on Automatic Summarization, 2001. [10] Pinaki Bhaskar and Sivaji Bandyopadhyay. A query focused multi document automatic summarization. In Proc. of PACLIC, 2010. [11] Hua-Jun Zeng, Qi-Cai He, Zheng Chen, Wei-Ying Ma, and Jinwen Ma. Learning to cluster web search results. In Proc. of SIGIR, 2004. [12] Xuanhui Wang and ChengXiang Zhai. Learn from web search logs to organize search results. In Proc. of SIGIR, 2007. [13] Marti A. Hearst. Clustering versus faceted categories for information exploration. Commun. ACM, 49(4), 2006. [14] Ingmar Weber, Antti Ukkonen, and Aris Gionis. Answers, not links: extracting tips from yahoo! answers to address how-to web queries. In Proc. of WSDM, 2012. [15] Soon Ae Chun, Tian Tian, and James Geller. Enhancing the famous people ontology by mining a social network. In Proc. of the 2nd International Workshop on Semantic Search over the Web, pages 5:1–5:4, 2012. [16] Google search autocomplete - search help. https://support.google.com/websearch/answer/106230?hl=en. Accessed: 2014-07-27. [17] Julian Kupiec, Jan Pedersen, and Francine Chen. A trainable document summarizer. In Proc. of SIGIR, 1995. [18] Daniel M. McDonald and Hsinchun Chen. Summary in context: Searching versus browsing. ACM Trans. Inf. Syst., 24(1), January 2006. [19] Dragomir R. Radev, Hongyan Jing, Małgorzata Styś, and Daniel Tam. Centroid-based summarization of multiple documents. Inf. Process. Manage., 40(6), November 2004. [20] Gunes Erkan and Dragomir R. Radev. Lexrank: graph-based lexical centrality as salience in text summarization. J. Artif. Int. Res., 22(1), December 2004. [21] Emilia Stoica, Marti A Hearst, and Megan Richardson. Automating creation of hierarchical faceted metadata structures. In Proc. of NAACL HLT, 2007. [22] Wisam Dakka and Panagiotis G. Ipeirotis. Automatic extraction of useful facet hierarchies from text databases. In Proc. of ICDE, 2008. [23] Ori Ben-Yitzhak, Nadav Golbandi, Nadav Har’El, Ronny Lempel, Andreas Neumann, Shila Ofek-Koifman, Dafna Sheinwald, Eugene Shekita, Benjamin Sznajder, and Sivan Yogev. Beyond basic faceted search. In Proc. of WSDM, 2008. [24] Zhicheng Dou, Sha Hu, Yulong Luo, Ruihua Song, and Ji-Rong Wen. Finding dimensions for queries. In Proc. of CIKM, 2011. [25] Xu Ling, Qiaozhu Mei, ChengXiang Zhai, and Bruce Schatz. Mining multi-faceted overviews of arbitrary topics in a text collection. In Proc. of SIGKDD, 2008. [26] Wei Song, Qing Yu, Zhiheng Xu, Ting Liu, Sheng Li, and Ji-Rong Wen. Multi-aspect query summarization by composite query. In Proc. of SIGIR, 2012. [27] Christina Sauper and Regina Barzilay. Automatically generating wikipedia articles: a structure-aware approach. In Proc. of ACL and IJCNLP of the AFNLP, 2009. [28] Chung-Lun Chiang, Shih-Ying Chen, and Pu-Jen Cheng. Summarizing search results with community question answering. In Proc. of Web Intelligence, 2014. [29] Chien-Chung Huang, Kuan-Ming Lin, and Lee-Feng Chien. Automatic training corpora acquisition through web mining. In Proc. of Web Intelligence, 2005. [30] Ryan McDonald. A study of global inference algorithms in multi-document summarization. In Proc. of ECIR, 2007. [31] Jun-Ping Ng, Praveen Bysani, Ziheng Lin, Min-Yen Kan, and Chew-Lim Tan. Exploiting category-specific information for multi-document summarization. In Proc. of COLING, 2012. [32] Dan Gillick and Benoit Favre. A scalable global model for summarization. In Proc. of ILP Workshop, 2009. [33] Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. In Proc. of ACL workshop, 2004.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/18438	-
dc.description.abstract	在先前對搜尋引擎結果頁面產生片段資訊(snippet)的方法著重於針對單一搜尋結果之優化，主要考量搜尋詞彙相關性及上下文的資訊含量。在此篇論文中，我們欲在單一搜尋結果頁面中的多個搜尋結果分別產生多個片段資訊，並且將此多個片段資訊視為該搜尋詞的總覽。首先我們自問答社群網頁系統、線上百科全書、搜尋引擎推薦詞中分別抽取不同類別搜尋詞之屬性詞與前後文，並藉此資訊以產生片段資訊。在產生片段資訊時，將考量句子是否與搜尋詞相關、句子是否與該類別相關、以及句子是否含有先前抽取出來之屬性詞。在系統的第二階段，我們利用整數線性規畫找出一組最佳的句子組合，作為我們的系統輸出－多個片段資訊。除此之外，我們將結合該搜尋詞的擴充推薦搜尋之結果頁面，以補強原先未找出之屬性詞以增加每個搜尋詞之多元性。實驗資料來源為Wikipedia、Yahoo! Answers及Google Search Autocomplete，在結果中可看出我們提出產生片段資訊之方法可行並且優於其他的摘要方法，最終有效地增加搜尋引擎結果頁面之多樣性。	zh_TW
dc.description.abstract	Previous work on snippet generation has focused mainly on producing one snippet for each individual search result. This work instead generates snippets that together form a comprehensive overview of an entity query (e.g., flu) on a search-result page. Our approach first extracts the attributes (e.g., symptom and diagnosis) of categories (e.g., disease) from multiple resources: a community-based question-answering (CQA) website, an online encyclopedia, and query suggestions from a commercial search engine. We then generate the snippets based on how central a sentence is to the query and to its category, and on how well it diversifies the attributes drawn from these resources. Integer Linear Programming (ILP) is used to find the optimal sentence set. After finding the initial set of sentences, we further improve the result by aggregating the search-result pages (SERPs) of the query's suggestion words. Experiments are conducted on Wikipedia, Yahoo! Answers, and Google Search Autocomplete. The results demonstrate the effectiveness of our approach compared with an existing commercial search engine and several summarization baselines.	en
dc.description.provenance	Made available in DSpace on 2021-06-08T01:05:13Z (GMT). No. of bitstreams: 1 ntu-103-R01922007-1.pdf: 3269558 bytes, checksum: 081788d87784f3f7ab0069ab47e94669 (MD5) Previous issue date: 2014	en
dc.description.tableofcontents	Oral Examination Committee Certification (p. i); Chinese Abstract (p. ii); Abstract (p. iii); Contents (p. iv); List of Figures (p. vi); List of Tables (p. vii); 1 Introduction (p. 1); 2 Related Work (p. 6); 2.1 Document Summarization (p. 6); 2.2 Facet Generation and Summarization (p. 7); 2.3 Previous Work with Shih-Ying Chen (p. 8); 3 Overview of Snippet Generation Approach (p. 9); 4 Off-line Knowledge Extraction (p. 12); 4.1 CQA Attribute Extraction (p. 12); 4.2 Wikipedia Attribute Extraction (p. 14); 4.3 Search Suggestion Attribute Extraction (p. 15); 4.4 Building Category Vector (p. 16); 5 On-line Sentence Selection (p. 19); 5.1 Scoring Functions (p. 19); 5.2 Weight Learning (p. 21); 5.3 Integer Linear Programming Model (ILP) (p. 21); 5.4 Sentence Compression (p. 23); 5.5 Post Enhancement by Suggestions (p. 23); 6 Experiments (p. 25); 6.1 Datasets (p. 25); 6.2 Baselines (p. 25); 6.3 Evaluation Metrics (p. 26); 6.4 Performance Comparison (p. 27); 6.4.1 ROUGE Performance (p. 27); 6.4.2 Snippet Examples (p. 28); 6.5 Parameter Settings (p. 32); 6.5.1 The Impact of Weights in S_important (p. 32); 6.5.2 Number of CQA Attributes (p. 32); 6.5.3 Limit for Same Attributes (p. 32); 6.5.4 SoftN Analysis (p. 33); 7 Conclusions and Future Work (p. 35); Bibliography (p. 36)
dc.language.iso	en
dc.title	整合多方資源以優化搜尋引擎結果頁面之多元性	zh_TW
dc.title	Aggregating Multi-Resources to Improve the Diversity of Search Engine Result Pages	en
dc.type	Thesis
dc.date.schoolyear	102-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	盧文祥,蔡宗翰,邱志義
dc.subject.keyword	搜尋結果總結,片段資訊產生,搜尋詞概要	zh_TW
dc.subject.keyword	Search-result summarization, snippet generation, query overview	en
dc.relation.page	39
dc.rights.note	Not authorized
dc.date.accepted	2014-08-20
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	資訊工程學研究所	zh_TW
Appears in Collections:	Department of Computer Science and Information Engineering

Files in This Item:

File	Size	Format
ntu-103-1.pdf (not currently authorized for public access)	3.19 MB	Adobe PDF

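Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.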