Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51422
Title: A Cost-Effective System for Real-Time Big Data Processing (一個具有成本效益的即時巨量資料處理系統)
Authors: Linjiun Tsai (蔡林峻)
Advisor: Wanjiun Liao (廖婉君)
Keyword: Cloud Computing, Big Data, Resource Optimization, Memory Management, Performance-Cost Trade-off, Performance Guarantee, Network Optimization
Publication Year: 2016
Degree: Doctoral
Abstract: The emerging Big Data paradigm has attracted attention from a wide variety of industry sectors, including healthcare, finance, retail, and manufacturing. To process massive heterogeneous data in near real time, Big Data applications should run on dedicated server clusters that aggregate large amounts of computing power, memory, and storage over fast, unimpeded, and reliable network infrastructure. Building such high-performance clusters is typically not economical for companies that have only occasional demand for Big Data processing.
Cloud computing is considered a viable way to reduce operating costs for Big Data applications because of its on-demand, pay-per-use, and scalable nature. The shared nature of cloud data centers, however, can make application performance unpredictable. The strict network requirements and extremely large memory demands of Big Data clusters also make it difficult to optimize the allocation of cloud resources. These difficulties translate into a higher hosting cost per application.
This dissertation proposes a solution to these problems that allows more concurrent Big Data applications to be deployed in cloud data centers in the most resource-efficient way while meeting their real-time requirements. To this end, we present 1) the first resource allocation framework that guarantees network performance for each Big Data cluster in multi-tenant clouds, 2) the first machine learning model that predicts the most efficient memory size for each Big Data cluster according to given upper bounds on performance penalties, and 3) an adaptive resource consolidation mechanism that strikes a balance between the number of required servers and the overhead of dynamic server consolidation for each cluster.
The resource allocation framework takes advantage of the symmetry of the fat-tree network structure to partition data center networks efficiently into mutually exclusive and collectively exhaustive star networks, each allocated to one Big Data cluster. It provides several desirable properties: 1) every cluster is isolated from the others; 2) the topology of every cluster is non-blocking for arbitrary traffic patterns; 3) each cluster is formed with the minimum number of links; 4) the hop distance between any two servers in a cluster is the same; 5) the network topology allocated to each cluster is guaranteed to remain logically unchanged during and after reallocation; 6) for fault-tolerant allocation, the number of backup links connecting backup and active servers is the minimum; 7) the data center network can be elastically trimmed and expanded while maintaining all of the properties above. Building on these properties, the dissertation also proposes a cost-bounded resource reallocation mechanism that makes nearly full use of cloud resources in polynomial time.
The model for predicting the optimal memory size is designed to capture the memory management behavior of Java virtual machines as well as the dynamic changes in memory consumption on distributed compute nodes. In experiments on a physical Spark cluster with 128 cores and 1 TB of memory, the model shows good prediction accuracy and saves a significant amount of memory for Big Data applications that demand up to hundreds of gigabytes of working memory.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51422
Fulltext Rights: Paid license
Appears in Collections: Department of Electrical Engineering

Files in This Item:
File: ntu-105-1.pdf (Restricted Access)
Size: 2.38 MB
Format: Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
