Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51422
Title: A Cost-Effective System for Real-Time Big Data Processing (一個具有成本效益的即時巨量資料處理系統)
Author: Linjiun Tsai (蔡林峻)
Advisor: Wanjiun Liao (廖婉君)
Keywords: Cloud Computing, Big Data, Resource Optimization, Memory Management, Performance-Cost Trade-off, Performance Guarantee, Network Optimization
Publication Year: 2016
Degree: Doctoral
Abstract: The emerging Big Data paradigm has attracted attention from a wide variety of industry sectors, including healthcare, finance, retail, and manufacturing. To process massive heterogeneous data in a near real-time manner, Big Data applications should be run on dedicated server clusters that aggregate huge computing power, memory, and storage through fast, unimpeded, and reliable network infrastructures. Implementing such high-performance cluster computing is typically not economical for companies that have only occasional demand for Big Data processing.
Cloud computing is considered a viable solution for reducing the operating costs of Big Data applications due to its on-demand, pay-per-use, and scalable nature. The shared nature of cloud data centers, however, may make application performance unpredictable. The strict network requirements and extremely large memory demands of Big Data clusters also make it difficult to optimize the allocation of cloud resources. These difficulties translate into a higher hosting cost per application. This dissertation proposes a solution to these problems that allows more concurrent Big Data applications to be deployed in cloud data centers in the most resource-efficient way while meeting their real-time requirements. To this end, we present 1) the first resource allocation framework that guarantees network performance for each Big Data cluster in multi-tenant clouds, 2) the first machine learning model that predicts the most efficient memory size for each Big Data cluster according to given upper bounds on performance penalties, and 3) an adaptive resource consolidation mechanism that strikes a balance between the number of required servers and the overhead of dynamic server consolidation for each cluster. The resource allocation framework takes advantage of the symmetry of the fat-tree network structure to enable data center networks to be efficiently partitioned into mutually exclusive and collectively exhaustive star networks, each allocated to a Big Data cluster.
The framework provides several promising properties: 1) every cluster is isolated from the others; 2) the topology of every cluster is non-blocking for arbitrary traffic patterns; 3) each cluster is formed with the minimum number of links; 4) the per-hop distance between any two servers in a cluster is equal; 5) the network topology allocated to each cluster is guaranteed to remain logically unchanged during and after reallocation; 6) for fault-tolerant allocation, the minimum number of backup links connects backup and active servers; 7) the data center network can be elastically trimmed and expanded while all the properties above are maintained. Building on these properties, a cost-bounded resource reallocation mechanism is also proposed, making nearly full use of cloud resources in polynomial time. The model for predicting the optimal memory size is designed to capture the memory management behavior of Java virtual machines as well as the dynamic changes in memory consumption on distributed compute nodes. In experiments on a physical Spark cluster with 128 cores and 1 TB of memory, the model shows good prediction accuracy and saves a significant amount of memory when operating Big Data applications that demand up to hundreds of gigabytes of working memory.
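The partitioning framework itself is specific to the dissertation, but the fat-tree symmetry it exploits is standard. As illustrative background only, the following sketch computes the dimensions of a conventional k-ary fat-tree (k-port switches, as in the Al-Fares design); treating the set of hosts under a single edge switch as the smallest star-shaped, equal-distance allocation unit is an assumption made here for illustration, not the dissertation's algorithm:

```python
def fat_tree_stats(k):
    """Dimensions of a standard k-ary fat-tree built from k-port switches.

    In this topology there are k pods; each pod holds k/2 edge switches
    and k/2 aggregation switches, and each edge switch serves k/2 hosts,
    giving k^3/4 hosts in total.
    """
    assert k % 2 == 0, "k-ary fat-trees require an even port count"
    half = k // 2
    return {
        "pods": k,
        "core_switches": half * half,       # (k/2)^2 core switches
        "edge_switches": k * half,          # k pods * k/2 edge switches
        "agg_switches": k * half,           # k pods * k/2 aggregation switches
        "hosts": k * half * half,           # k^3 / 4
        # Hypothetical smallest star unit: hosts sharing one edge switch
        # are mutually one switch-hop apart, hence equidistant.
        "hosts_per_edge_switch": half,
    }

print(fat_tree_stats(8))
```

For k = 8 this yields 128 hosts in 8 pods, matching the scale at which such symmetric, equal-hop partitions become useful.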
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51422
Full-text license: Paid access (有償授權)
Appears in collections: Department of Electrical Engineering
Files in this item:
File | Size | Format
---|---|---
ntu-105-1.pdf (currently not authorized for public access) | 2.38 MB | Adobe PDF
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.