以架構自覺之社群圖管理達到有效率地更新及取得資料

Wei-Hao Lin; 林煒晧

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61050

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	周承復
dc.contributor.author	Wei-Hao Lin	en
dc.contributor.author	林煒晧	zh_TW
dc.date.accessioned	2021-06-16T10:43:32Z	-
dc.date.available	2014-02-26
dc.date.copyright	2014-02-26
dc.date.issued	2013
dc.date.submitted	2013-08-13
dc.identifier.citation	[1] Mohammad Al-Fares, Alexander Loukissas, and Amin Vahdat. A scalable, commodity data center network architecture. In Proceedings of the ACM SIGCOMM 2008 conference on Data communication, SIGCOMM ’08, pages 63–74, New York, NY, USA, 2008. ACM. [2] Facebook reports first quarter 2013 results. http://investor.fb.com/releasedetail. cfm?ReleaseID=761090. [3] Celebrating #twitter7. https://blog.twitter.com/2013/celebrating-twitter7. [4] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy H. Katz, Andrew Konwinski, Gunho Lee, David A. Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. Above the clouds: A berkeley view of cloud computing. Technical Report UCB/EECS-2009-28, EECS Department, University of California, Berkeley, Feb 2009. [5] M. R. Garey, D. S. Johnson, and L. Stockmeyer. Some simplified np-complete problems. In Proceedings of the sixth annual ACM symposium on Theory of computing, STOC ’74, pages 47–63, New York, NY, USA, 1974. ACM. [6] Konstantin Andreev and Harald Racke. Balanced graph partitioning. In Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, SPAA ’04, pages 120–124, New York, NY, USA, 2004. ACM. [7] FabrA-cio Benevenuto, Tiago Rodrigues, Meeyoung Cha, and VirgA-lio A. F. Almeida. Characterizing user behavior in online social networks. In Anja Feldmann and Laurent Mathy, editors, Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement 2009, Chicago, Illinois, USA, November 4-6, 2009, pages 49–62. ACM, 2009. [8] Fabian Schneider, Anja Feldmann, Balachander Krishnamurthy, and Walter Willinger. Understanding online social network usage from a network perspective. In Internet Measurement Conference, pages 35–48. ACM, 2009. [9] Josep M. Pujol, Vijay Erramilli, Georgos Siganos, Xiaoyuan Yang, Nikos Laoutaris, Parminder Chhabra, and Pablo Rodriguez. The little engine(s) that could: scaling online social networks. In Proceedings of the ACM SIGCOMM 2010 conference, SIGCOMM ’10, pages 375–386, New York, NY, USA, 2010. ACM. [10] Sabrina Gaito, Matteo Zignani, Gian Paolo Rossi, Alessandra Sala, Xiaohan Zhao, Haitao Zheng, and Ben Y. Zhao. On the bursty evolution of online social networks. In Proceedings of the First ACM International Workshop on Hot Topics on Interdisciplinary Social Networks Research, HotSocial ’12, pages 1–8, New York, NY, USA, 2012. ACM. [11] Apache cassandra. http://cassandra.apache.org/. [12] Mongodb. http://www.mongodb.org/. [13] Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, SIGMOD ’10, pages 135–146, New York, NY, USA, 2010. ACM. [14] George Karypis and Vipin Kumar. Multilevel graph partitioning schemes. In Proc. 24th Intern. Conf. Par. Proc., III, pages 113–122. CRC Press, 1995. [15] George Karypis and Vipin Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput., 20(1):359–392, December 1998. [16] George Karypis and Vipin Kumar. Multilevel k-way partitioning scheme for irregular graphs. Journal of Parallel and Distributed Computing, 48:96–129, 1998. [17] Irene Moulitsas and George Karypis. Architecture aware partitioning algorithms. In Proceedings of the 8th international conference on Algorithms and Architectures for Parallel Processing, ICA3PP ’08, pages 42–53, Berlin, Heidelberg, 2008. Springer- Verlag. [18] Bin Shao, Haixun Wang, and Yanghua Xiao. Managing and mining large graphs: systems and implementations. In SIGMOD Conference, pages 589–592, 2012. [19] Josep M. Pujol, Vijay Erramilli, and Pablo Rodriguez. Divide and conquer: Partitioning online social networks. Technical report, 2009. [20] Jayanta Mondal and Amol Deshpande. Managing large dynamic graphs efficiently. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD ’12, pages 145–156, New York, NY, USA, 2012. ACM. [21] Adam Silberstein, Jeff Terrace, Brian F. Cooper, and Raghu Ramakrishnan. Feeding frenzy: selectively materializing users’ event feeds. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, SIGMOD ’10, pages 831–842, New York, NY, USA, 2010. ACM. [22] Adam Silberstein, Ashwin Machanavajjhala, and Raghu Ramakrishnan. Feed following: the big data challenge in social applications. In Databases and Social Networks, DBSocial ’11, pages 1–6, New York, NY, USA, 2011. ACM. [23] Sudarshan Kadambi, Jianjun Chen, Brian F. Cooper, David Lomax, Raghu Ramakrishnan, Adam Silberstein, Erwin Tam, and Hector Garcia-Molina. Where in the world is my data? PVLDB, 4(11):1040–1050, 2011.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61050	-
dc.description.abstract	對於線上社群網路系統而言，如何有效率的管理及查詢大量的圖架構資料是一個日漸重要的問題，特別是要支援大量更新及取得資料的使用者要求。為了解決這個問題，系統會將社群圖資料切割分散到各個機器內，以便平行處理使用者的要求；而在這個狀況之下，處理大量要求的所需總時間以及處理單個要求的回應時間都被處理最慢的機器所限制住。然而，在過去的研究中大多都致力於降低在處理要求時所需的跨機器傳輸量，而非分散負載量以達到消除瓶頸及分攤時間成本。另外在雲端運算的幫助下，提供服務者可低成本地以動態租用不同等級之機器來逐漸升級硬體環境，所以我們在分散負載量時也應該注意到異質性的硬體環境。在本篇論文中，我們提出理論性的分析以及提出我們的系統設計，其中包含三個在運行時提供動態調整的啟發式的方法，用以降低系統處理大量使用者要求所需的時間。首先，我們提出圖分割的方法以決定使用者的主資料及要求將由哪一台機器負責。第二，我們提出副本策略輔助系統做較佳的副本決定。最後，我們提出遠端讀取方針來輔助系統從多個遠端讀取候選機器中選取較佳的讀取目標。我們也運用了模擬器來評估我們的方法，而結果顯示我們的方法顯著地降低了系統處理大量要求的所需時間。	zh_TW
dc.description.abstract	There is an increasing need for the systems of Online Social Networks (OSNs) to manage and query large volumes of graph-structured data efficiently, especially to support the massive {em update} and {em fetch} queries raised by users. To cope with this, the system partitions the social graph data across a set of machines and processes the user queries in parallel; therefore, the total time spent for processing all queries and the response time of queries are bounded by the machine which takes longest time. However, although there is much work on reducing the total inter-machine traffic during processing queries, up to date there is little work on distributing the loads to eliminate the bottlenecks and amortize the time cost. Further, since it is cost-effective to scale-up the hardware environment by reserving different classes of machines dynamically with the aid of cloud computing, we should also give consideration to the resulting heterogeneous environment when distributing the loads. In this paper, we provide theoretic analysis and propose the design of social graph data management system including three heuristic techniques to minimize the low total time spent for processing massive user queries at runtime. First, we propose a graph partitioning method to make the decision that which machine is in charge of the data and queries of each user. Second, we propose a replication strategy that dictate how replication decisions should be made. Finally, we propose a remote read policy that choose one from all remote replicas to read. We evaluate our method on simulator, and show that our system reduces the time spent significantly.	en
dc.description.provenance	Made available in DSpace on 2021-06-16T10:43:32Z (GMT). No. of bitstreams: 1 ntu-102-R00944035-1.pdf: 1367012 bytes, checksum: c4570c8030877cee5562f9cfe548ff9a (MD5) Previous issue date: 2013	en
dc.description.tableofcontents	口試委員會審定書i 誌謝ii 摘要iii Abstract iv 1 Introduction 1 2 Related Work 5 3 System Analysis 8 3.1 Query Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.1.1 Update Query . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.1.2 Fetch Query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.2 Techniques of Social Graph Data Manager . . . . . . . . . . . . . . . . . 10 3.2.1 Graph Partition . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.2.2 Data Replication Strategy . . . . . . . . . . . . . . . . . . . . . 11 3.2.3 Remote Read Policy . . . . . . . . . . . . . . . . . . . . . . . . 11 3.3 Resource Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 3.3.1 Physical Links . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 3.3.2 Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 4 Problem Statement 15 4.1 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 4.2 Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 4.3 Objective Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 4.3.1 Arrival Rates for Links . . . . . . . . . . . . . . . . . . . . . . . 19 4.3.2 Arrival Rates for Machines . . . . . . . . . . . . . . . . . . . . . 20 4.3.3 Objective Function . . . . . . . . . . . . . . . . . . . . . . . . . 20 5 Heuristic Method 24 5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 5.2 During Fetch Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 5.3 After Update Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 5.3.1 Replication Strategy . . . . . . . . . . . . . . . . . . . . . . . . 26 5.3.2 Graph Partition . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 6 Evaluation 30 6.1 Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 6.2 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 6.3 Algorithms for comparison . . . . . . . . . . . . . . . . . . . . . . . . . 31 6.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 6.4.1 Varying the Link Bandwidth . . . . . . . . . . . . . . . . . . . . 31 6.4.2 Varying the Machine Deployment . . . . . . . . . . . . . . . . . 35 6.4.3 Varying the Number of Machines . . . . . . . . . . . . . . . . . 39 7 Conclusion 41 Bibliography 42
dc.language.iso	en
dc.subject	圖資料庫	zh_TW
dc.subject	要求處理	zh_TW
dc.subject	平行資料庫	zh_TW
dc.subject	線上社群網路	zh_TW
dc.subject	資料中心	zh_TW
dc.subject	online social network	en
dc.subject	graph database	en
dc.subject	datacenter	en
dc.subject	parallel databases and query processing	en
dc.title	以架構自覺之社群圖管理達到有效率地更新及取得資料	zh_TW
dc.title	Architecture Aware Social Graph Management in Datacenters for Updating and Fetching Data Efficiently	en
dc.type	Thesis
dc.date.schoolyear	101-2
dc.description.degree	碩士
dc.contributor.oralexamcommittee	蔡子傑,逄愛君,廖婉君,吳曉光
dc.subject.keyword	線上社群網路,圖資料庫,資料中心,平行資料庫,要求處理,	zh_TW
dc.subject.keyword	online social network,graph database,datacenter,parallel databases and query processing,	en
dc.relation.page	46
dc.rights.note	有償授權
dc.date.accepted	2013-08-13
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	資訊網路與多媒體研究所	zh_TW
顯示於系所單位：	資訊網路與多媒體研究所

文件中的檔案：

檔案	大小	格式
ntu-102-1.pdf 未授權公開取用	1.33 MB	Adobe PDF

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。