Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96725

Full metadata record
| DC 欄位 | 值 | 語言 |
|---|---|---|
| dc.contributor.advisor | 楊佳玲 | zh_TW |
| dc.contributor.advisor | Chia-Lin Yang | en |
| dc.contributor.author | 黃彥豪 | zh_TW |
| dc.contributor.author | Yen-Hao Huang | en |
| dc.date.accessioned | 2025-02-21T16:16:21Z | - |
| dc.date.available | 2025-02-22 | - |
| dc.date.copyright | 2025-02-21 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-12-24 | - |
| dc.identifier.citation | [1] M. Liu, W. Ji, Z. Wang, and X. Pu, “A memory access scheduling method for multi-core processor,” in 2009 Second International Workshop on Computer Science and Engineering, vol. 1, pp. 367–371, 2009.
[2] H. David, C. Fallin, E. Gorbatov, U. R. Hanebutte, and O. Mutlu, “Memory power management via dynamic voltage/frequency scaling,” in Proceedings of the 8th ACM International Conference on Autonomic Computing, ICAC ’11, (New York, NY, USA), pp. 31–40, Association for Computing Machinery, 2011.
[3] R. Buyya, S. Ilager, and P. Arroba, “Energy-efficiency and sustainability in new generation cloud computing: A vision and directions for integrated management of data centre resources and workloads,” Software: Practice and Experience, vol. 54, no. 1, pp. 24–38, 2024.
[4] L. Steiner, M. Jung, F. S. Prado, K. Bykov, and N. Wehn, “DRAMSys4.0: An open-source simulation framework for in-depth DRAM analyses,” International Journal of Parallel Programming, vol. 50, pp. 217–242, Apr. 2022.
[5] M. V. Natale, M. Jung, K. Kraft, F. Lauer, J. Feldmann, C. Sudarshan, C. Weis, S. Krumke, and N. Wehn, “Efficient generation of application specific memory controllers,” in Proceedings of the 6th International Symposium on Memory Systems, MEMSYS ’20, (New York, NY, USA), pp. 233–247, Association for Computing Machinery, 2020.
[6] M. Ghasempour, A. Jaleel, J. D. Garside, and M. Luján, “HAPPY: Hybrid address-based page policy in DRAMs,” in Proceedings of the Second International Symposium on Memory Systems, MEMSYS ’16, (New York, NY, USA), pp. 311–321, Association for Computing Machinery, 2016.
[7] S. Krishnan, A. Yazdanbakhsh, S. Prakash, J. Jabbour, I. Uchendu, S. Ghosh, B. Boroujerdian, D. Richins, D. Tripathy, A. Faust, and V. Janapa Reddi, “ArchGym: An open-source gymnasium for machine learning assisted architecture design,” in Proceedings of the 50th Annual International Symposium on Computer Architecture, ISCA ’23, (New York, NY, USA), Association for Computing Machinery, 2023.
[8] L. Steiner, G. Delazeri, I. Prando da Silva, M. Jung, and N. Wehn, “Automatic DRAM subsystem configuration with irace,” in Proceedings of the DroneSE and RAPIDO: System Engineering for Constrained Embedded Systems, RAPIDO ’23, (New York, NY, USA), pp. 66–72, Association for Computing Machinery, 2023.
[9] R. V. W. Putra, M. A. Hanif, and M. Shafique, “DRMap: A generic DRAM data mapping policy for energy-efficient processing of convolutional neural networks,” in Proceedings of the 57th ACM/IEEE Design Automation Conference, DAC ’20, IEEE Press, 2020.
[10] M. Jung, D. M. Mathew, C. Weis, N. Wehn, I. Heinrich, M. V. Natale, and S. O. Krumke, “ConGen: An application specific DRAM memory controller generator,” in Proceedings of the Second International Symposium on Memory Systems, MEMSYS ’16, (New York, NY, USA), pp. 257–267, Association for Computing Machinery, 2016.
[11] M. Ghasempour, A. Jaleel, J. D. Garside, and M. Luján, “DReAM: Dynamic re-arrangement of address mapping to improve the performance of DRAMs,” in Proceedings of the Second International Symposium on Memory Systems, MEMSYS ’16, (New York, NY, USA), pp. 362–373, Association for Computing Machinery, 2016.
[12] X. Lin, L. Sun, F. Tu, L. Liu, X. Li, S. Wei, and S. Yin, “ADROIT: An adaptive dynamic refresh optimization framework for DRAM energy saving in DNN training,” in Proceedings of the 58th ACM/IEEE Design Automation Conference, DAC ’21, pp. 751–756, 2021.
[13] X. Li, Z. Yuan, Y. Guan, G. Sun, T. Zhang, R. Wei, and D. Niu, “Flatfish: A reinforcement learning approach for application-aware address mapping,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 41, no. 11, pp. 4758–4770, 2022.
[14] B. Maity, B. Donyanavard, A. Surhonne, A. Rahmani, A. Herkersdorf, and N. Dutt, “SEAMS: Self-optimizing runtime manager for approximate memory hierarchies,” ACM Transactions on Embedded Computing Systems, vol. 20, no. 5, pp. 1–26, 2021.
[15] E. Ipek, O. Mutlu, J. F. Martínez, and R. Caruana, “Self-optimizing memory controllers: A reinforcement learning approach,” in Proceedings of the 35th Annual International Symposium on Computer Architecture, ISCA ’08, pp. 39–50, 2008.
[16] K. Khetarpal, M. Riemer, I. Rish, and D. Precup, “Towards continual reinforcement learning: A review and perspectives,” Journal of Artificial Intelligence Research, vol. 75, pp. 1401–1476, Dec. 2022.
[17] M. Wolczyk, B. Cupiał, M. Zając, R. Pascanu, Ł. Kuciński, and P. Miłoś, “On the role of forgetting in fine-tuning reinforcement learning models,” in Workshop on Reincarnating Reinforcement Learning at ICLR 2023, 2023.
[18] M. Wołczyk, M. Zając, R. Pascanu, L. Kuciński, and P. Miłoś, “Continual World: A robotic benchmark for continual reinforcement learning,” in Proceedings of the 35th Advances in Neural Information Processing Systems, NeurIPS ’21, (Red Hook, NY, USA), Curran Associates Inc., 2024.
[19] S. Kanev, J. P. Darago, K. Hazelwood, P. Ranganathan, T. Moseley, G.-Y. Wei, and D. Brooks, “Profiling a warehouse-scale computer,” in Proceedings of the 42nd Annual International Symposium on Computer Architecture, ISCA ’15, pp. 158–169, 2015.
[20] N. Khan, I. Yaqoob, I. A. T. Hashem, Z. Inayat, W. K. Mahmoud Ali, M. Alam, M. Shiraz, and A. Gani, “Big data: survey, technologies, opportunities, and challenges,” The Scientific World Journal, vol. 2014, no. 1, p. 712826, 2014.
[21] M. Wołczyk, M. Zając, R. Pascanu, L. Kuciński, and P. Miłoś, “Disentangling transfer in continual reinforcement learning,” in Proceedings of the 36th Advances in Neural Information Processing Systems, NeurIPS ’24, (Red Hook, NY, USA), Curran Associates Inc., 2024.
[22] A. Chaudhry, M. Rohrbach, M. Elhoseiny, T. Ajanthan, P. Dokania, P. Torr, and M. Ranzato, “Continual learning with tiny episodic memories,” in Workshop on Multi-Task and Lifelong Reinforcement Learning, 2019.
[23] G. Dulac-Arnold, N. Levine, D. J. Mankowitz, J. Li, C. Paduraru, S. Gowal, and T. Hester, “Challenges of real-world reinforcement learning: definitions, benchmarks and analysis,” Machine Learning, vol. 110, pp. 2419–2468, Sep. 2021.
[24] Y. Yu, “Towards sample efficient reinforcement learning,” in Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI ’18, pp. 5739–5743, AAAI Press, 2018.
[25] C. S. D. Witt, T. Gupta, D. Makoviichuk, V. Makoviychuk, P. H. S. Torr, M. Sun, and S. Whiteson, “Is independent learning all you need in the StarCraft multi-agent challenge?,” arXiv preprint arXiv:2011.09533, 2020.
[26] K. Lee, S. Subramanian, and M. Crowley, “Investigation of independent reinforcement learning algorithms in multi-agent environments,” Frontiers in Artificial Intelligence, vol. 5, Sep. 2022.
[27] ASUS ROG, “MemTweakIt.” [Online]. Available: https://rog.asus.com/tag/memtweakit/. [Accessed: 2024-10-27].
[28] Intel Technology, “Processor E7 v2 2800/4800/8800 product family.” [Online]. Available: https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e7-v2-datasheet-vol-2.pdf. [Accessed: 2024-12-19].
[29] M. Hassan and H. Patel, “MCXplore: Automating the validation process of DRAM memory controller designs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 5, pp. 1050–1063, 2018.
[30] Micron Technology, “4Gb: x4, x8, x16 DDR4 SDRAM.” [Online]. Available: https://tw.micron.com/products/memory/dram-components/ddr4-sdram/part-catalog/part-detail/mt40a512m8sa-062e-it-f. [Accessed: 2024-11-20].
[31] S. Gronauer and K. Diepold, “Multi-agent deep reinforcement learning: a survey,” Artificial Intelligence Review, vol. 55, no. 2, pp. 895–943, 2022.
[32] A. Wong, T. Bäck, A. V. Kononova, and A. Plaat, “Deep multiagent reinforcement learning: challenges and directions,” Artificial Intelligence Review, vol. 56, pp. 5023–5056, Jun. 2023.
[33] L. Wang, X. Zhang, H. Su, and J. Zhu, “A comprehensive survey of continual learning: Theory, method and application,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 8, pp. 5362–5383, 2024.
[34] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, and R. Hadsell, “Overcoming catastrophic forgetting in neural networks,” Proceedings of the National Academy of Sciences, vol. 114, pp. 3521–3526, Mar. 2017.
[35] H. Ritter, A. Botev, and D. Barber, “Online structured Laplace approximations for overcoming catastrophic forgetting,” in Proceedings of the 32nd Advances in Neural Information Processing Systems, NeurIPS ’18, (Red Hook, NY, USA), pp. 3742–3752, Curran Associates Inc., 2018.
[36] H. Miao, Y. Zhao, C. Guo, B. Yang, K. Zheng, F. Huang, J. Xie, and C. S. Jensen, “A unified replay-based continuous learning framework for spatio-temporal prediction on streaming data,” in Proceedings of the 40th International Conference on Data Engineering, ICDE ’24, (Los Alamitos, CA, USA), pp. 1050–1062, IEEE Computer Society, 2024.
[37] M. Delange, R. Aljundi, M. Masana, S. Parisot, X. Jia, A. Leonardis, G. Slabaugh, and T. Tuytelaars, “A continual learning survey: Defying forgetting in classification tasks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2021.
[38] A. Mallya and S. Lazebnik, “PackNet: Adding multiple tasks to a single network by iterative pruning,” in Proceedings of the 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR ’18, pp. 7765–7773, 2018.
[39] A. Chaudhry, M. Ranzato, M. Rohrbach, and M. Elhoseiny, “Efficient lifelong learning with A-GEM,” arXiv preprint arXiv:1812.00420, 2018.
[40] X. Chen, J. Wang, and K. Xie, “TrafficStream: A streaming traffic flow forecasting framework based on graph neural networks and continual learning,” in Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (Z.-H. Zhou, ed.), IJCAI ’21, pp. 3620–3626, International Joint Conferences on Artificial Intelligence Organization, Aug. 2021. Main Track.
[41] D. Lee, Y. Kim, G. Pekhimenko, S. M. Khan, V. Seshadri, K. K.-W. Chang, and O. Mutlu, “Adaptive-latency DRAM: Optimizing DRAM timing for the common-case,” in Proceedings of the 21st International Symposium on High Performance Computer Architecture, HPCA ’15, pp. 489–501, 2015.
[42] A. Kanervisto, C. V. Scheller, and V. Hautamäki, “Action space shaping in deep reinforcement learning,” in Proceedings of the IEEE Conference on Games, COG ’20, pp. 479–486, 2020.
[43] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
[44] J. S. Vitter, “Random sampling with a reservoir,” ACM Transactions on Mathematical Software, vol. 11, pp. 37–57, Mar. 1985.
[45] CMU-SAFARI Research Group, “Ramulator-PIM.” [Online]. Available: https://github.com/CMU-SAFARI/ramulator-pim. [Accessed: 2024-12-19].
[46] K. Chandrasekar, C. Weis, Y. Li, B. Akesson, N. Wehn, and K. Goossens, “DRAMPower: Open-source DRAM power & energy estimation tool,” URL: http://www.drampower.info, vol. 22, 2012.
[47] J. Bucek, K.-D. Lange, and J. v. Kistowski, “SPEC CPU2017: Next-generation compute benchmark,” in Proceedings of the 9th ACM/SPEC International Conference on Performance Engineering, ICPE ’18, (New York, NY, USA), pp. 41–42, Association for Computing Machinery, 2018.
[48] A. A. Nair and L. K. John, “Simulation points for SPEC CPU 2006,” in Proceedings of the 27th IEEE International Conference on Computer Design, ICCD ’08, pp. 397–403, 2008.
[49] C. Bienia, S. Kumar, J. P. Singh, and K. Li, “The PARSEC benchmark suite: characterization and architectural implications,” in Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, PACT ’08, (New York, NY, USA), pp. 72–81, Association for Computing Machinery, 2008.
[50] AMD, “STREAM benchmark.” AMD Developer. [Online]. Available: https://www.amd.com/en/developer/zen-software-studio/applications/spack/stream-benchmark.html. [Accessed: 2024-06-19].
[51] T. Gonçalves, A. Beck, and A. Lorenzon, “Balancing performance and aging in cloud environments,” in Proceedings of the 14th International Conference on Cloud Computing and Services Science, pp. 216–223, INSTICC, SciTePress, 2024 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96725 | - |
| dc.description.abstract | 部署貼合用戶任務的最佳記憶體控制器策略,可提供低延遲以及低耗能的記憶體資料傳輸。然而,傳統的方法並未全面解決限制效能的幾個問題,包括:(1) 對動態環境缺乏適應性,(2) 無法擴展至多重策略,(3) 缺乏知識保留能力。本研究的目標是設計一個線上調整記憶體控制器的工具,能夠實現:(1) 適應性,根據當前用戶任務動態調整策略;(2) 可擴展性,支援多重策略管理;(3) 連續性,在經歷一系列新任務後保留知識。為此,本研究提出一套自動化工具,透過連續強化學習訓練的代理人來調整控制器設定,以具備應對上述問題的特性。在多樣的實驗中,本工具的效能接近帕累托最佳解,較製造商基線平均提升 35%。此外,本研究引入連續學習來保留知識,當工作變化更加頻繁時,其效能相比於未採用連續學習的策略提高了多達 33%。 | zh_TW |
| dc.description.abstract | Deploying optimal memory controller policies can significantly improve energy efficiency. However, previous methods face challenges such as limited adaptivity to dynamic environments, poor scalability for multiple policies, and lack of knowledge retention. To address these, this work proposes an online tuning tool that: (1) dynamically adapts policies to current tasks, (2) scales to manage multiple policies, and (3) retains knowledge. It employs continual reinforcement learning to tackle these issues effectively. Experiments on diverse workloads show that this tool achieves performance near the Pareto-optimal solution, with an average improvement of 35% over the manufacturer’s baseline, and improves performance by up to 33% compared to non-continual learning approaches under frequent task changes. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-02-21T16:16:21Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-02-21T16:16:21Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements (ii)
摘要 (iii) Abstract (iv) Contents (v) List of Figures (vii) List of Tables (viii) Chapter 1 Introduction (1) Chapter 2 Background & Motivation (6) 2.1 Memory Controller (6) 2.1.1 Adaptivity to Dynamic Environment (6) 2.1.2 Multiple Optimization Objectives (8) 2.2 Multi-agent Reinforcement Learning (MARL) (9) 2.2.1 Independent Reinforcement Learning (9) 2.3 Continual Learning (10) 2.3.1 Retraining Frequency (11) Chapter 3 Related Works (13) Chapter 4 Problem Formulation (17) Chapter 5 Implementation and Design (20) 5.1 Reinforcement Learning (RL) Stage (21) 5.1.1 Agent Network (22) 5.1.2 Learning (23) 5.2 Continual Learning (CL) Stage (25) Chapter 6 Evaluation (28) 6.1 Setup (28) 6.2 Performance Analysis (30) 6.3 Knowledge Retention (32) 6.4 Learning Efficiency (33) 6.5 Overhead Analysis (34) Chapter 7 Conclusion (38) References (39) | - |
| dc.language.iso | en | - |
| dc.subject | 強化學習 | zh_TW |
| dc.subject | 連續學習 | zh_TW |
| dc.subject | 記憶體控制器 | zh_TW |
| dc.subject | 線上調整 | zh_TW |
| dc.subject | 效能工具 | zh_TW |
| dc.subject | Continual learning | en |
| dc.subject | Performance tool | en |
| dc.subject | Online tuning | en |
| dc.subject | Memory controller | en |
| dc.subject | Reinforcement learning | en |
| dc.title | 基於連續強化學習線上調整動態隨機存取記憶體控制器 | zh_TW |
| dc.title | Online Tuning of DRAM Controllers Using Continual Reinforcement Learning | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-1 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 張原豪;劉宗德 | zh_TW |
| dc.contributor.oralexamcommittee | Yuan-Hao Chang;Tsung-Te Liu | en |
| dc.subject.keyword | 效能工具,線上調整,記憶體控制器,強化學習,連續學習 | zh_TW |
| dc.subject.keyword | Performance tool,Online tuning,Memory controller,Reinforcement learning,Continual learning | en |
| dc.relation.page | 47 | - |
| dc.identifier.doi | 10.6342/NTU202404649 | - |
| dc.rights.note | 未授權 | - |
| dc.date.accepted | 2024-12-25 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊網路與多媒體研究所 | - |
| dc.date.embargo-lift | N/A | - |
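The abstract above credits continual learning for the tool's knowledge retention under frequent task changes, and the thesis cites Vitter's reservoir sampling [44], a standard way to keep a bounded, unbiased replay memory of past experience. As an illustration only (the class and method names below are hypothetical, not from the thesis), a replay buffer based on reservoir sampling can be sketched as:

```python
import random


class ReservoirReplayBuffer:
    """Bounded replay memory using Vitter's reservoir sampling (Algorithm R).

    After n items have been seen, each one is retained with probability
    capacity / n, so the buffer stays a uniform sample of the whole stream —
    one common mechanism for retaining old-task experience in continual RL.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.buffer = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        """Offer one experience (e.g. a transition tuple) to the reservoir."""
        self.n_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(item)
        else:
            # Keep the new item with probability capacity / n_seen,
            # evicting a uniformly chosen resident.
            j = self.rng.randrange(self.n_seen)
            if j < self.capacity:
                self.buffer[j] = item

    def sample(self, k):
        """Draw up to k stored experiences for a replay update."""
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))
```

Under this sketch, a continual learner would mix `sample(k)` batches of old-task transitions into updates on the current task, which is the general idea behind replay-based retention methods such as the episodic-memory approaches the thesis cites [22, 44].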
| Appears in Collections: | 資訊網路與多媒體研究所 | |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-1.pdf (restricted access) | 3.56 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.