具擴展性的高效能分散式圖資料處理系統

Tsung-Han Li; 李宗翰

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61115

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	劉邦鋒(Pangfeng Liu)
dc.contributor.author	Tsung-Han Li	en
dc.contributor.author	李宗翰	zh_TW
dc.date.accessioned	2021-06-16T10:47:22Z	-
dc.date.available	2016-08-16
dc.date.copyright	2013-08-16
dc.date.issued	2013
dc.date.submitted	2013-08-12
dc.identifier.citation	[1] Hbase. http://hbase.apache.org/. [2] Jeﬀrey Dean and Sanjay Ghemawat. Mapreduce: simpliﬁed data processing on large clusters. In Proceedings of the 6th conference on Symposium on Opearting Systems Design and Implementation. [3] Leslie G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8):103–111, August 1990. [4] Storm. http://storm-project.net/. [5] Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 international conference on Management of data. [6] Hama. http://hama.apache.org/. [7] Giraph. http://giraph.apache.org/. [8] Wei Dong, Charikar Moses, and Kai Li. Eﬃcient k-nearest neighbor graph con- struction for generic similarity measures. In Proceedings of the 20th international conference on World wide web. [9] Kyle H. Ambert and Aaron M. Cohen. k-information gain scaled nearest neigh- bors: A novel approach to classifying protein-protein interaction-related documents. IEEE/ACM Trans. Comput. Biol. Bioinformatics. [10] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, 1999. [11] G. Karypis and V. Kumar. Multilevel algorithms for multi-constraint graph parti- tioning. In Supercomputing, 1998. SC98. IEEE/ACM Conference on. [12] Li-Yung Ho, Jan-Jan Wu, and Pangfeng Liu. Distributed graph database for large- scale social computing. In Cloud Computing (CLOUD), 2012 IEEE 5th International Conference on, 2012. [13] Wei Chen, Yifei Yuan, and Li Zhang. Scalable inﬂuence maximization in social net- works under the linear threshold model. In Proceedings of the 2010 IEEE International Conference on Data Mining. [14] David Kempe, Jon Kleinberg, and ′ Eva Tardos. Maximizing the spread of inﬂuence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. [15] Xiaohong Wang, Aaron Smalter, Jun Huan, and Gerald H. Lushington. G-hash: towards fast kernel-based similarity search in large graph databases. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT ’09, pages 472–480, New York, NY, USA, 2009. ACM. [16] Luke Wang, Yanghua Xiao, and Haixun Wang. How to partition a billion-node graph. Technical report, Microsoft Research Asia, 2013. [17] Michele Coscia, Giulio Rossetti, Fosca Giannotti, and Dino Pedreschi. Demon: a local-ﬁrst discovery method for overlapping communities. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’12, pages 615–623, New York, NY, USA, 2012. ACM. [18] Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. Measurement and Analysis of Online Social Networks. In Proceedings of the 5th ACM/Usenix Internet Measurement Conference (IMC’07).
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61115	-
dc.description.abstract	We introduce Kylin, an eﬃcient and scalable graph data processing system. Kylin is based on Bulk Synchronous Parallel (BSP) model to process graph data. Although there have been some BSP-based graph processing systems, Kylin is diﬀerent from these systems in two-fold. First, Kylin cooperates with HBase to achieve scalable data manipulation. Second, We propose three techniques to optimize the performance of Kylin. The proposed techniques are pull messaging, lazy vertex loading and vertex-weighted partitioning. We demonstrate Kylin outperforms other BSP-based systems, i.e. Hama and Giraph, in the experiments.	en
dc.description.provenance	Made available in DSpace on 2021-06-16T10:47:22Z (GMT). No. of bitstreams: 1 ntu-102-R00922074-1.pdf: 536483 bytes, checksum: 4cc5c1aaade490bd6fb6f82fd7ca8ba0 (MD5) Previous issue date: 2013	en
dc.description.tableofcontents	Contents Certiﬁcation I Acknowledgement II Chinese Abstract III Abstract IV 1 Introduction 1 2 Related Works 4 3 Architecture 6 3.1 Partition Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.2 Query Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.3 Data Graph and Data Loader . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.4 Query Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 4 Optimizations 9 4.1 Pull Messaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4.2 Lazy Vertex Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 4.3 Vertex-weighted Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . 11 5 Implementation 14 5.1 Data Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 5.2 Communication and Synchronization . . . . . . . . . . . . . . . . . . . . . . 15 5.3 Vertex Program Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 6 Benchmark Suite 17 6.1 Max Value Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 6.2 Single Source Shortest Path . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 6.3 Bipartite Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 6.4 Inﬂuence Spreading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 6.5 PageRank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 6.6 N-steps neighbors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 6.7 Label propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 7 Experiments 20 7.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 7.2 Pull model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 7.3 Lazy Vertex Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 7.4 Partitioning strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 7.5 Overall performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 8 Conclusion 25 Bibliography 26
dc.language.iso	en
dc.title	具擴展性的高效能分散式圖資料處理系統	zh_TW
dc.title	Kylin: An Eﬃcient and Scalable Graph Data Processing System	en
dc.type	Thesis
dc.date.schoolyear	101-2
dc.description.degree	碩士
dc.contributor.oralexamcommittee	吳真貞(Jan-Jan Wu),洪士灝(Shih-Hao Hung)
dc.subject.keyword	圖資料處理,圖資料切割,負載平衡,動態載入,	zh_TW
dc.subject.keyword	graph data processing,graph data partition,load balancing,pull messaging,dynamic loading,	en
dc.relation.page	27
dc.rights.note	有償授權
dc.date.accepted	2013-08-12
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	資訊工程學研究所	zh_TW
顯示於系所單位：	資訊工程學系

文件中的檔案：

檔案	大小	格式
ntu-102-1.pdf 目前未授權公開取用	523.91 kB	Adobe PDF

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。