Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88997
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 蔡志宏 | zh_TW |
dc.contributor.advisor | Zsehong Tsai | en |
dc.contributor.author | 張安浩 | zh_TW |
dc.contributor.author | An-How Chang | en |
dc.date.accessioned | 2023-08-16T16:42:03Z | - |
dc.date.available | 2023-11-09 | - |
dc.date.copyright | 2023-08-16 | - |
dc.date.issued | 2023 | - |
dc.date.submitted | 2023-08-08 | - |
dc.identifier.citation | [1] D. Sabella, A. Vaillant, P. Kuure, U. Rauschenbach, and F. Giust, “Mobile-edge computing architecture: The role of MEC in the Internet of Things,” IEEE Consumer Electronics Magazine, pp. 84–91, Oct. 2016.
[2] Y. C. Hu, M. Patel, D. Sabella, N. Sprecher, and V. Young, “Mobile edge computing—a key technology towards 5G,” ETSI White Paper, pp. 1–16, Nov. 2015.
[3] Q. V. Pham, F. Fang, V. N. Ha, M. J. Piran, M. Le, et al., “A survey of multi-access edge computing in 5G and beyond: Fundamentals, technology integration, and state-of-the-art,” IEEE Access, pp. 116974–117017, Aug. 2020.
[4] N. C. Luong, D. T. Hoang, S. Gong, D. Niyato, P. Wang, Y.-C. Liang, and D. I. Kim, “Applications of deep reinforcement learning in communications and networking: A survey,” IEEE Communications Surveys and Tutorials, vol. 21, no. 4, pp. 3133–3174, 2019.
[5] V. Mnih, K. Kavukcuoglu, D. Silver, et al., “Playing Atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.
[6] W. Xiong, T. Hoang, and W. Y. Wang, “DeepPath: A reinforcement learning method for knowledge graph reasoning,” arXiv preprint arXiv:1707.06690, 2017.
[7] K. Li, T. Zhang, and R. Wang, “Deep reinforcement learning for multiobjective optimization,” IEEE Transactions on Cybernetics, vol. 51, no. 6, pp. 3103–3114, 2020.
[8] J. Li, H. Gao, T. Lv, and Y. Lu, “Deep reinforcement learning based computation offloading and resource allocation for MEC,” in Proc. IEEE Wireless Communications and Networking Conference (WCNC), pp. 1–6, 2018. [Online]. Available: https://ieeexplore.ieee.org/document/8377343
[9] H. Zhou, K. Jiang, X. Liu, X. Li, and V. C. Leung, “Deep reinforcement learning for energy-efficient computation offloading in mobile-edge computing,” IEEE Internet of Things Journal, vol. 9, no. 2, pp. 1517–1530, Jan. 2021.
[10] K. Cheng, Y. Teng, W. Sun, A. Liu, and X. Wang, “Energy-efficient joint offloading and wireless resource allocation strategy in multi-MEC server systems,” in Proc. IEEE ICC, pp. 1–6, May 2018.
[11] L. Huang, S. Bi, and Y.-J. A. Zhang, “Deep reinforcement learning for online computation offloading in wireless powered mobile-edge computing networks,” IEEE Trans. Mobile Comput., vol. 19, no. 11, pp. 2581–2593, Nov. 2020.
[12] J. Zhou and X. Zhang, “Fairness-aware task offloading and resource allocation in cooperative mobile-edge computing,” IEEE Internet of Things Journal, vol. 9, no. 5, pp. 3812–3824, Mar. 2022.
[13] Z. Shu, J. Wan, J. Lin, S. Wang, D. Li, S. Rho, and C. Yang, “Traffic engineering in software-defined networking: Measurement and management,” IEEE Access, vol. 4, pp. 3246–3256, 2016.
[14] J. Liu, Y. Mao, J. Zhang, and K. B. Letaief, “Delay-optimal computation task scheduling for mobile-edge computing systems,” in Proc. 2016 IEEE International Symposium on Information Theory (ISIT), pp. 1451–1455, 2016. | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88997 | - |
dc.description.abstract | As user products and user demands continue to advance, users of phones, tablets, watches, and other mobile devices increasingly require large amounts of computation to obtain higher-quality results, and a device's own hardware resources may no longer satisfy tasks with tight completion-time requirements. To overcome these hardware limits at the edge, Mobile Edge Computing (MEC) has attracted growing attention, letting nearby edge servers share the computational load of edge devices. This work builds a system containing multiple MEC servers, simulates an environment in which multiple users issue computation requests, and proposes a centralized allocation model based on Deep Reinforcement Learning. By optimizing the objective of maximizing the number of completed tasks, complemented by a fairness index for examining allocation choices, the model achieves fair and well-optimized allocation. The experiments are further extended to environments containing tasks with widely differing data sizes, showing that the model still makes good, fair allocation decisions and thereby validating the reliability of the decision model. In addition, we test several weighted reward functions and observe the model's behavior under different settings. In summary, the experiments show that the proposed deep reinforcement learning model provides fair resource allocation even when task data sizes differ widely. | zh_TW |
dc.description.abstract | As the demand for computation grows rapidly in both quantity and quality, mobile-edge computing (MEC) has received increasing attention as a computing mechanism placed near mobile users. With the additional computing power of nearby edge servers, edge users can handle tasks with massive and real-time computation requirements. In this thesis, we consider a multi-MEC system with multiple users, where each user generates computational tasks according to a Poisson process. The thesis proposes a deep reinforcement learning-based allocation model that allocates the tasks gathered from all users with a centralized decision algorithm. We formulate the optimization goal as maximizing the number of tasks finished within their deadlines while taking task fairness into consideration. To handle continuous task flows, we formulate a multi-round allocation scheme and simulate its operation in a realistic, complex system. For validation, we ran several simulations on networks with large task-size differences to test the model's performance and compare it with random allocation. In addition, we tested various reward functions, including a basic reward function and a set of weighted reward functions, to observe their performance under different setups. In conclusion, the simulation results show that the DRL model is effective in providing fair task allocations across various task-size distributions. | en |
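The abstract above describes Poisson task arrivals, a centralized DRL allocator, and a fairness-aware objective. The following is a minimal sketch of that simulation setting, not the thesis's actual implementation: `NUM_USERS`, `ARRIVAL_RATE`, `NUM_ROUNDS`, and the 0.8 service probability are invented stand-ins, and the random policy is only a placeholder for the trained DRL allocator.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_USERS = 5        # hypothetical number of mobile users
ARRIVAL_RATE = 2.0   # hypothetical Poisson rate (tasks per round per user)
NUM_ROUNDS = 100     # hypothetical number of allocation rounds

def jain_fairness(x):
    """Jain's fairness index: 1.0 means a perfectly even outcome."""
    x = np.asarray(x, dtype=float)
    total = x.sum()
    if total == 0:
        return 1.0
    return total ** 2 / (len(x) * (x ** 2).sum())

completed = np.zeros(NUM_USERS)  # tasks finished within deadline, per user
generated = np.zeros(NUM_USERS)  # tasks generated, per user

for _ in range(NUM_ROUNDS):
    # Each user generates tasks according to an independent Poisson process.
    arrivals = rng.poisson(ARRIVAL_RATE, size=NUM_USERS)
    generated += arrivals
    # Placeholder policy: a random stand-in for the trained DRL allocator,
    # which would instead assign tasks to MEC servers to maximize
    # on-time completions under the fairness-aware objective.
    served = rng.binomial(arrivals, 0.8)
    completed += served

rates = np.divide(completed, generated,
                  out=np.zeros(NUM_USERS), where=generated > 0)
print("per-user completion rates:", rates)
print("Jain fairness index:", jain_fairness(rates))
```

Jain's index is used here as a generic fairness measure; the thesis's own metrics (e.g., the task fairness ratio evaluated in Chapter 4) may be defined differently.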
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-16T16:42:03Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2023-08-16T16:42:03Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Oral Examination Committee Certification i
Acknowledgements iii
Abstract (Chinese) v
Abstract (English) vii
Contents ix
List of Figures xiii
List of Tables xv
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Background and Literature Review 4
1.3 Research Objective 6
Chapter 2 System Architecture 9
2.1 Environment 9
2.2 User & Task 10
2.2.1 Users with Different Task Size Requirements 10
2.2.2 Task Arrival Pattern 11
2.2.3 Task Structure Ai 11
2.3 At Base Station 12
2.3.1 Local Computation vs. Remote Computation 12
2.3.2 Single Round to Multi-Round 12
2.3.3 Local Offload Threshold ηi 13
Chapter 3 The Allocation Model 15
3.1 Local Computing Mode 15
3.2 MEC Computing Mode 16
3.3 Deep Reinforcement Learning Model 17
3.3.1 State, Action, and Reward Definition 18
3.3.2 Q-learning Method, Deep Q-learning 21
3.3.3 The Public Round and The Mini Round 25
Chapter 4 Performance Evaluation 29
4.1 Multi-Round Simulation General Performance Analysis 29
4.1.1 Environment System Setup 29
4.1.1.1 Data Size Type 30
4.1.1.2 MEC CPU Frequency 31
4.1.2 Completion Rate over Different Arrival Rates 31
4.1.2.1 Hard Completion Rate and Soft Completion Rate 31
4.1.3 Utilization per Mini Round Performance 33
4.1.4 Fairness Analysis 35
4.1.5 Model Training 37
4.2 Performance Analysis of Multi-Round Simulation over Different Data Sizes 38
4.2.1 Completion Rate over Task Size Type 39
4.2.2 Task Fairness Ratio over Data Size Type 40
4.3 Large Data Size Difference with Weighted Reward 42
4.3.1 Weighted Reward Function 43
4.3.2 Fairness Index Evaluation 47
Chapter 5 Conclusion and Future Work 51
5.1 Conclusion 51
5.2 Future Work 52
References 55
Denotation 59 | - |
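Sections 3.3.1 and 4.3.1 in the table of contents above concern the reward design, contrasting a basic reward with weighted variants. The thesis's exact weighting is not reproduced here; the sketch below is only one plausible form, in which the inverse-size weighting and the exponent `alpha` are assumptions introduced for illustration.

```python
import numpy as np

def basic_reward(finished):
    """Basic reward: count of tasks completed before their deadlines."""
    return float(np.sum(finished))

def weighted_reward(finished, task_sizes, alpha=1.0):
    """Illustrative size-weighted reward: tasks are weighted by the inverse
    of their relative data size so users submitting small tasks are not
    ignored when the allocator chases raw completion counts. The exponent
    alpha (an assumption of this sketch) tunes how aggressive that is."""
    sizes = np.asarray(task_sizes, dtype=float)
    weights = (sizes / sizes.mean()) ** (-alpha)
    return float(np.sum(weights * np.asarray(finished)))

# Example: three tasks with very different data sizes; only the first two
# finish within their deadlines (hypothetical numbers).
done = np.array([1, 1, 0])
sizes_mb = np.array([0.5, 8.0, 16.0])
print(basic_reward(done))               # 2.0
print(weighted_reward(done, sizes_mb))  # the small task dominates the score
```

The design intent such a weighting would serve matches the thesis's stated goal: keeping allocations fair when task data sizes differ widely.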
dc.language.iso | en | - |
dc.title | 以深度強化學習進行公平考量之行動邊緣運算資源分配最佳化 | zh_TW |
dc.title | Fairness-Aware Deep Reinforcement Learning for Task Offloading and Resource Allocation in Mobile Edge Computing | en |
dc.type | Thesis | - |
dc.date.schoolyear | 111-2 | - |
dc.description.degree | Master | - |
dc.contributor.oralexamcommittee | 林風;馮輝文;鐘耀梁 | zh_TW |
dc.contributor.oralexamcommittee | Phone Lin;Huei-Wen Ferng;Yao-Liang Chung | en |
dc.subject.keyword | Computation Offloading Optimization,Cloud Resource Allocation,Decision Algorithm,Reinforcement Learning,User Fairness | zh_TW |
dc.subject.keyword | Computation Offloading,Resource Allocation,Reinforcement Learning,Fairness | en |
dc.relation.page | 61 | - |
dc.identifier.doi | 10.6342/NTU202302089 | - |
dc.rights.note | Authorization granted (open access worldwide) | - |
dc.date.accepted | 2023-08-10 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Graduate Institute of Communication Engineering | - |
Appears in Collections: | Graduate Institute of Communication Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-111-2.pdf | 1.69 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.