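Guarantee and Improvement in Performance of Big Data Processing over Heterogeneous Cloud Data Centers (異質雲端資料中心之巨量資料處理效能保證與改進)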

Chien-Hung Chen; 陳建宏

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71255

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	郭斯彥(Sy-Yen Kuo)
dc.contributor.author	Chien-Hung Chen	en
dc.contributor.author	陳建宏	zh_TW
dc.date.accessioned	2021-06-17T05:01:14Z	-
dc.date.available	2023-08-01
dc.date.copyright	2018-08-01
dc.date.issued	2018
dc.date.submitted	2018-07-25
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71255	-
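dc.description.abstract	With the development of the Internet of Things (IoT), more and more devices send data to the cloud for analysis. To give IoT services fast responses, multiple cloud data centers are deployed at different geographical locations, bringing the cloud closer to IoT devices. As a result, a big data processing application running in the cloud must read large amounts of data from remote servers in multiple data centers, and its execution time is dominated by the data read time of each of its subtasks. Long data read times severely degrade big data processing performance. This thesis studies the performance of big data processing across multiple heterogeneous cloud data centers and covers three research topics. First, it studies the task assignment problem for big data processing applications, taking data access cost and data skew into account. The network topology of the cloud data centers and the data storage locations are used to compute the data access cost of each task; assigning each task to a server with a lower data access cost reduces an application's data read time. In addition, when a large number of tasks are waiting in the cloud, the system needs a long computation time to pick the server with the minimum data access cost for every task. To address this, the thesis proposes a greedy algorithm and a heuristic algorithm that reduce both the data access cost and the algorithm computation time as much as possible. Second, it studies how to schedule multiple big data processing applications that have execution-time constraints (deadlines). Existing scheduling methods do not consider that compute nodes differ in performance or that task execution times can change dynamically with system load. The thesis proposes a new scheduling algorithm that transforms the deadline-constrained big data scheduling problem into the well-known minimum-weight bipartite matching problem and obtains the optimal solution. When the system cannot satisfy the deadlines of all jobs, it also minimizes the number of jobs that miss their deadlines. Third, in the area of in-memory techniques, it proposes a new data prefetching mechanism that uses job scheduling information to load data into memory in advance or evict data from memory, reducing data read time and reclaiming scarce memory. Finally, experiments with simulators and a real big data system confirm the feasibility and effectiveness of the proposed methods.	zh_TW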
dc.description.abstract	With the rapid growth of the Internet of Things (IoT), more and more devices transmit data to the cloud for analysis. To respond quickly to IoT services, multiple cloud data centers are geographically distributed so that they sit close to the IoT devices. A data processing application running on the cloud needs to access a large amount of data stored on remote servers across these data centers. Because the execution time depends mainly on the network access latency of each task, long data access latency degrades the performance of data processing. This thesis investigates the performance of big data processing in heterogeneous cloud data centers and covers three main topics. First, the task assignment problem is investigated with data access cost and data skew taken into account. The network topology of the cloud data centers and the data locations are used to formulate the data access cost of each task. By assigning tasks to servers with lower data access costs, the data access time of a data processing application can be decreased. Moreover, when a large number of tasks are waiting to be assigned, it is difficult to quickly select, from multiple data centers, the servers with the lowest data access costs for all waiting tasks. To solve the task assignment problem, a greedy algorithm and a heuristic are proposed to reduce both the total data access cost and the algorithm's computation time. Second, deadline-constrained job scheduling is investigated. Existing deadline-constrained job schedulers do not consider two problems: heterogeneous node performance and dynamically changing task execution times. In this thesis, bipartite graph modelling is used to build a new scheduler. It obtains the optimal solution of the deadline-constrained scheduling problem by transforming the problem into the well-known minimum-weight bipartite matching problem. The proposed scheduler is able to shorten the data access time of jobs. If the total available computing resources of the system cannot satisfy the deadline requirements of all jobs, it also minimizes the number of jobs that violate their deadlines. The third topic investigates in-memory techniques. Scheduling-aware data prefetching and eviction mechanisms are proposed that prefetch data into memory and release memory resources based on the scheduling information. Finally, both simulations and real experiments are performed to demonstrate the effectiveness of the proposed approaches.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T05:01:14Z (GMT). No. of bitstreams: 1 ntu-107-D01921025-1.pdf: 5179858 bytes, checksum: 44c3ab54c263dfae8d414f0877e41428 (MD5) Previous issue date: 2018	en
dc.description.tableofcontents	
Oral Examination Committee Certification
Acknowledgements
Chinese Abstract
Abstract
1 Introduction
2 Related Work
3 Preliminaries
  3.1 System Model
  3.2 Definition of Task Assignment Problem
  3.3 Definition of Deadline-Constrained Scheduling Problem
  3.4 Definition of Data Prefetching and Eviction Problem
4 Proposed Approaches
  4.1 Proposed Task Assignments
    4.1.1 Inspiration of the Central Based Task Assignment
    4.1.2 Optimal Solution and Scalability Issue
    4.1.3 Consideration of Data Skew
  4.2 Proposed Scheduling Schemes
    4.2.1 Deadline Partition
    4.2.2 Weighted Bipartite Graph Formation
    4.2.3 Scheduling Problem Transformation
  4.3 Scheduling-Aware Data Prefetching
    4.3.1 Optimal Solution
    4.3.2 Heuristic Algorithm
    4.3.3 Enhancements to Heuristic Algorithm
5 Performance Evaluation
  5.1 Evaluation for Proposed Task Assignments
    5.1.1 Simulation Environments
    5.1.2 Simulation Results
    5.1.3 Real-world Testbed Experimental Results
  5.2 Evaluation for Proposed Schedulers
    5.2.1 Simulation Environments
    5.2.2 Simulation Results
    5.2.3 Testbed Results
  5.3 Evaluation for Proposed Data Prefetching Mechanisms
    5.3.1 Experimental Environments
    5.3.2 Experimental Results
    5.3.3 Evaluation of Algorithm Computational Time
6 Conclusion
References
dc.language.iso	en
dc.title	異質雲端資料中心之巨量資料處理效能保證與改進	zh_TW
dc.title	Guarantee and Improvement in Performance of Big Data Processing over Heterogeneous Cloud Data Centers	en
dc.type	Thesis
dc.date.schoolyear	106-2
dc.description.degree	Ph.D.
dc.contributor.oralexamcommittee	黃彥男(Yen-Nun Huang),顏嗣鈞(Hsu-Chun Yen),雷欽隆(Chin-Laung Lei),王勝德(Sheng-De Wang),王國禎(Kuo-Chen Wang)
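dc.subject.keyword	Cloud Computing, Multiple Data Centers, Big Data Processing, Task Assignment, Data Skew, Big Data Scheduling, Heterogeneity, In-memory Techniques, Data Prefetching, Optimization Algorithm, Heuristic Algorithm	zh_TW
dc.subject.keyword	Cloud Computing, Multiple Data Centers, Big Data Processing, Task Assignment, Data Skew, Job Scheduling, Heterogeneity, In-memory Techniques, Data Prefetching, Optimal Solution, Heuristic Algorithm	en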
dc.relation.page	107
dc.identifier.doi	10.6342/NTU201801913
dc.rights.note	Paid license
dc.date.accepted	2018-07-25
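dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Electrical Engineering	zh_TW
Appears in Collections:	Department of Electrical Engineering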
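
The abstract describes turning deadline-constrained scheduling into a minimum-weight bipartite matching problem. The following is a minimal sketch of that idea, not the thesis's actual formulation: the weight design (estimated run time plus a fixed penalty when a task would miss its deadline) and the names `min_weight_matching`, `schedule`, and `MISS_PENALTY` are assumptions made for illustration. The matching itself uses the standard Hungarian algorithm.

```python
INF = float("inf")
MISS_PENALTY = 10_000  # assumption: exceeds any plausible sum of run times


def min_weight_matching(cost):
    """Hungarian algorithm for a rows x cols cost matrix (rows <= cols).

    Returns a list whose entry i is the column matched to row i.
    """
    n, m = len(cost), len(cost[0])
    u = [0] * (n + 1)    # row potentials (1-based)
    v = [0] * (m + 1)    # column potentials (1-based)
    p = [0] * (m + 1)    # p[j]: row matched to column j, 0 if unmatched
    way = [0] * (m + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path that was found.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def schedule(run_time, deadlines):
    """Match tasks (rows) to node slots (columns).

    run_time[i][j] is the estimated run time of task i on node j
    (hypothetical input). Returns (assignment, missed_deadline_count).
    """
    weights = [
        [t + (MISS_PENALTY if t > deadlines[i] else 0) for t in row]
        for i, row in enumerate(run_time)
    ]
    assignment = min_weight_matching(weights)
    missed = sum(run_time[i][j] > deadlines[i] for i, j in enumerate(assignment))
    return assignment, missed


if __name__ == "__main__":
    # Three tasks on three nodes of different speeds (illustrative numbers).
    run_time = [[10, 5, 2], [20, 10, 4], [30, 15, 6]]
    deadlines = [12, 10, 8]
    print(schedule(run_time, deadlines))
```

Because the penalty outweighs any sum of run times in this sketch, the matching first minimizes the number of missed deadlines and only then the total run time, which mirrors the behavior the abstract claims when resources cannot satisfy every deadline.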

Files in This Item:

File	Size	Format
ntu-107-1.pdf (not currently authorized for public access)	5.06 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.