Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/56509
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 蘇雅韻 (Ya-Yunn Su)
dc.contributor.author | Chi-Ou Chen | en
dc.contributor.author | 陳紀甌 | zh_TW
dc.date.accessioned | 2021-06-16T05:32:10Z | -
dc.date.available | 2014-08-21
dc.date.copyright | 2014-08-21
dc.date.issued | 2014
dc.date.submitted | 2014-08-13
dc.identifier.citation | [1] Apache Hadoop. http://hadoop.apache.org/.
[2] Apache Hadoop Rumen. http://hadoop.apache.org/docs/r1.2.1/rumen.html.
[3] scikit-learn: Machine learning in Python. http://scikit-learn.org/stable/.
[4] Planning guide: Getting started with big data. Intel IT Center, January 2013.
[5] S. Babu. Towards automatic optimization of MapReduce programs. In SoCC, 2010.
[6] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Communications of the ACM, 2008.
[7] H. Herodotou and S. Babu. Profiling, what-if analysis, and cost-based optimization of MapReduce programs. In Proc. of the VLDB Endowment, 2011.
[8] H. Herodotou. Hadoop performance models. Technical Report CS-2011-05, Duke University, 2011.
[9] H. Herodotou, H. Lim, G. Luo, N. Borisov, L. Dong, F. B. Cetin, and S. Babu. Starfish: A self-tuning system for big data analytics. In 5th Conference on Innovative Data Systems Research, 2011.
[10] S. Huang. The HiBench benchmark suite: Characterization of the MapReduce-based data analysis. In Data Engineering Workshops (ICDEW), 2010 IEEE 26th International Conference on, 2010.
[11] J. Lin and C. Dyer. Data-intensive text processing with MapReduce. In Synthesis Lectures on Human Language Technologies, 2010.
[12] O. O'Malley. Terabyte sort on Apache Hadoop. Yahoo, available online at: http://sortbenchmark.org/Yahoo-Hadoop.pdf, 2008.
[13] A. Rabkin and R. Katz. How Hadoop clusters break. In IEEE Software, 30(4), 2013.
[14] C. Shalizi. Lecture 10: Regression trees. http://www.stat.cmu.edu/~cshalizi/350-2006/lecture-10.pdf, October 2006.
[15] T. Ye, H. T. Kaur, and S. Kalyanaraman. A recursive random search algorithm for large-scale network parameter configuration. In ACM SIGMETRICS, 2003.
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/56509 | -
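dc.description.abstract | With the rise of big data analytics, systems that support such large-scale data processing, such as distributed systems, are receiving growing attention. As these systems are built on ever-larger machine clusters, system administrators must devote more effort to managing them. Besides keeping the system stable enough to support a wide variety of data analysis applications, administrators also need to optimize it so that performance improves effectively, raising system utilization and reducing the running time of these applications. However, tuning system parameters on a large-scale cluster is complex: administrators must handle the interactions among machines and must also understand the computational characteristics of each application in order to tune the parameters accordingly. Existing parameter tuning methods suffer from drawbacks such as low usability and a restricted set of tunable parameters. Building on these existing methods, this study uses machine learning to address these problems and remove these limitations, further improving system performance. | zh_TW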
dc.description.abstract | Big data has emerged in recent years, and systems that can support such large-scale data analysis are receiving more attention. Distributed systems such as Hadoop are the most commonly used for this analysis. However, managing the whole system becomes increasingly difficult for system administrators as the cluster scales out. Administrators must maintain the system so that applications execute stably. They also need to optimize the system to improve performance, increase system utilization, and reduce application execution latency. Configuration is the most important issue in system optimization. Tuning configuration parameters involves many complicated issues: it requires understanding the interactions between physical machines and the behavior of each application. The current methods, rule-based and cost-based optimization, have drawbacks such as infeasibility and a limited configuration parameter space. Our work exploits machine learning to address these problems and improve performance. | en
dc.description.provenance | Made available in DSpace on 2021-06-16T05:32:10Z (GMT). No. of bitstreams: 1
ntu-103-R01922108-1.pdf: 1684770 bytes, checksum: f930d2a6e472a3ab02a324a9664fbed1 (MD5)
Previous issue date: 2014 | en
dc.description.tableofcontents | Abstract (Chinese)
Abstract
1 Introduction
1.1 Misconfiguration in Hadoop
1.2 Contribution
2 Configuration Tuning
2.1 Rule-based Optimization in Hadoop: Vaidya
2.2 Cost-based Optimization
2.2.1 Limitation of Configuration Space
2.2.2 Limitation of Portability
3 Design Concept
3.1 Configuration Parameters Space
3.2 Machine Learning-Based Predictor
3.3 RRS Optimizer
4 Implementation
4.1 ML Predictor and RRS Optimizer
5 Evaluation
5.1 Importance of Configuration Parameters
5.2 Accuracy of ML Predictor
5.3 Improvement from Machine Learning-based Optimization
6 Conclusion and Future Work
Bibliography
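dc.language.iso | en
dc.subject | 巨量資料 (big data) | zh_TW
dc.subject | 分散式系統 (distributed system) | zh_TW
dc.subject | 機器學習 (machine learning) | zh_TW
dc.subject | 全局優化 (global optimization) | zh_TW
dc.subject | 隨機抽樣 (random sampling) | zh_TW
dc.subject | random sampling | en
dc.subject | big data | en
dc.subject | distributed system | en
dc.subject | machine learning | en
dc.subject | global optimization | en
dc.title | 以機器學習改善Hadoop系統優化 (Improving Hadoop System Optimization with Machine Learning) | zh_TW
dc.title | Configuration Tuning on Hadoop System Based on Machine Learning | en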
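dc.type | Thesis
dc.date.schoolyear | 102-2
dc.description.degree | Master's
dc.contributor.oralexamcommittee | 廖世偉 (Shih-Wei Liao), 林守德 (Shou-De Lin)
dc.subject.keyword | 巨量資料, 分散式系統, 機器學習, 全局優化, 隨機抽樣 (big data, distributed system, machine learning, global optimization, random sampling) | zh_TW
dc.subject.keyword | big data, distributed system, machine learning, global optimization, random sampling | en
dc.relation.page | 28
dc.rights.note | Paid license (有償授權)
dc.date.accepted | 2014-08-13
dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | zh_TW
dc.contributor.author-dept | 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) | zh_TW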
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File | Size | Format
ntu-103-1.pdf (restricted; not authorized for public access) | 1.65 MB | Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
