請用此 Handle URI 來引用此文件:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61115
完整後設資料紀錄
DC 欄位 | 值 | 語言 |
---|---|---|
dc.contributor.advisor | 劉邦鋒(Pangfeng Liu) | |
dc.contributor.author | Tsung-Han Li | en |
dc.contributor.author | 李宗翰 | zh_TW |
dc.date.accessioned | 2021-06-16T10:47:22Z | - |
dc.date.available | 2016-08-16 | |
dc.date.copyright | 2013-08-16 | |
dc.date.issued | 2013 | |
dc.date.submitted | 2013-08-12 | |
dc.identifier.citation | [1] Hbase. http://hbase.apache.org/.
[2] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: simplified data processing on large clusters. In Proceedings of the 6th conference on Symposium on Opearting Systems Design and Implementation. [3] Leslie G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8):103–111, August 1990. [4] Storm. http://storm-project.net/. [5] Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 international conference on Management of data. [6] Hama. http://hama.apache.org/. [7] Giraph. http://giraph.apache.org/. [8] Wei Dong, Charikar Moses, and Kai Li. Efficient k-nearest neighbor graph con- struction for generic similarity measures. In Proceedings of the 20th international conference on World wide web. [9] Kyle H. Ambert and Aaron M. Cohen. k-information gain scaled nearest neigh- bors: A novel approach to classifying protein-protein interaction-related documents. IEEE/ACM Trans. Comput. Biol. Bioinformatics. [10] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, 1999. [11] G. Karypis and V. Kumar. Multilevel algorithms for multi-constraint graph parti- tioning. In Supercomputing, 1998. SC98. IEEE/ACM Conference on. [12] Li-Yung Ho, Jan-Jan Wu, and Pangfeng Liu. Distributed graph database for large- scale social computing. In Cloud Computing (CLOUD), 2012 IEEE 5th International Conference on, 2012. [13] Wei Chen, Yifei Yuan, and Li Zhang. Scalable influence maximization in social net- works under the linear threshold model. In Proceedings of the 2010 IEEE International Conference on Data Mining. [14] David Kempe, Jon Kleinberg, and ′ Eva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. [15] Xiaohong Wang, Aaron Smalter, Jun Huan, and Gerald H. Lushington. G-hash: towards fast kernel-based similarity search in large graph databases. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT ’09, pages 472–480, New York, NY, USA, 2009. ACM. [16] Luke Wang, Yanghua Xiao, and Haixun Wang. How to partition a billion-node graph. Technical report, Microsoft Research Asia, 2013. [17] Michele Coscia, Giulio Rossetti, Fosca Giannotti, and Dino Pedreschi. Demon: a local-first discovery method for overlapping communities. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’12, pages 615–623, New York, NY, USA, 2012. ACM. [18] Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. Measurement and Analysis of Online Social Networks. In Proceedings of the 5th ACM/Usenix Internet Measurement Conference (IMC’07). | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61115 | - |
dc.description.abstract | We introduce Kylin, an efficient and scalable graph data processing system. Kylin is based
on Bulk Synchronous Parallel (BSP) model to process graph data. Although there have been some BSP-based graph processing systems, Kylin is different from these systems in two-fold. First, Kylin cooperates with HBase to achieve scalable data manipulation. Second, We propose three techniques to optimize the performance of Kylin. The proposed techniques are pull messaging, lazy vertex loading and vertex-weighted partitioning. We demonstrate Kylin outperforms other BSP-based systems, i.e. Hama and Giraph, in the experiments. | en |
dc.description.provenance | Made available in DSpace on 2021-06-16T10:47:22Z (GMT). No. of bitstreams: 1 ntu-102-R00922074-1.pdf: 536483 bytes, checksum: 4cc5c1aaade490bd6fb6f82fd7ca8ba0 (MD5) Previous issue date: 2013 | en |
dc.description.tableofcontents | Contents
Certification I Acknowledgement II Chinese Abstract III Abstract IV 1 Introduction 1 2 Related Works 4 3 Architecture 6 3.1 Partition Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.2 Query Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.3 Data Graph and Data Loader . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.4 Query Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 4 Optimizations 9 4.1 Pull Messaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4.2 Lazy Vertex Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 4.3 Vertex-weighted Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . 11 5 Implementation 14 5.1 Data Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 5.2 Communication and Synchronization . . . . . . . . . . . . . . . . . . . . . . 15 5.3 Vertex Program Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 6 Benchmark Suite 17 6.1 Max Value Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 6.2 Single Source Shortest Path . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 6.3 Bipartite Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 6.4 Influence Spreading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 6.5 PageRank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 6.6 N-steps neighbors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 6.7 Label propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 7 Experiments 20 7.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 7.2 Pull model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 7.3 Lazy Vertex Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 7.4 Partitioning strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 7.5 Overall performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 8 Conclusion 25 Bibliography 26 | |
dc.language.iso | en | |
dc.title | 具擴展性的高效能分散式圖資料處理系統 | zh_TW |
dc.title | Kylin: An Efficient and Scalable Graph Data Processing
System | en |
dc.type | Thesis | |
dc.date.schoolyear | 101-2 | |
dc.description.degree | 碩士 | |
dc.contributor.oralexamcommittee | 吳真貞(Jan-Jan Wu),洪士灝(Shih-Hao Hung) | |
dc.subject.keyword | 圖資料處理,圖資料切割,負載平衡,動態載入, | zh_TW |
dc.subject.keyword | graph data processing,graph data partition,load balancing,pull messaging,dynamic loading, | en |
dc.relation.page | 27 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2013-08-12 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
顯示於系所單位: | 資訊工程學系 |
文件中的檔案:
檔案 | 大小 | 格式 | |
---|---|---|---|
ntu-102-1.pdf 目前未授權公開取用 | 523.91 kB | Adobe PDF |
系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。