NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98123
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 李綱 [zh_TW]
dc.contributor.advisor: Kang Li [en]
dc.contributor.author: 姚智元 [zh_TW]
dc.contributor.author: Chih-Yuan Yao [en]
dc.date.accessioned: 2025-07-29T16:07:54Z
dc.date.available: 2025-07-30
dc.date.copyright: 2025-07-28
dc.date.issued: 2025
dc.date.submitted: 2025-07-23
dc.identifier.citationD. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
B. M. Baker and M. Ayechew. A genetic algorithm for the vehicle routing problem. Computers & Operations Research, 30(5):787–800, 2003.
R. Baldacci, A. Mingozzi, and R. Roberti. New route relaxation and pricing strategies for the vehicle routing problem. Operations Research, 59:1269–1283, 2011.
G. Barbarosoglu and D. Ozgur. A tabu search algorithm for the vehicle routing problem. Computers & Operations Research, 26(3):255–270, 1999.
I. Bello, H. Pham, Q. V. Le, M. Norouzi, and S. Bengio. Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940, 2016.
G. Bono, J. S. Dibangoye, O. Simonin, L. Matignon, and F. Pereyron. Solving multi-agent routing problems using deep attention mechanisms. IEEE Transactions on Intelligent Transportation Systems, 22(12):7804–7813, 2020.
G. Clarke and J. W. Wright. Scheduling of vehicles from a central depot to a number of delivery points. Operations Research, 12(4):568–581, 1964.
G. B. Dantzig and J. H. Ramser. The truck dispatching problem. Management Science, 6:80–91, 1959.
V. Furnon and L. Perron. OR-Tools routing library.
B. E. Gillett and L. R. Miller. A heuristic algorithm for the vehicle-dispatch problem. Operations Research, 22(2):340–349, 1974.
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
A. Hottung and K. Tierney. Neural large neighborhood search for the capacitated vehicle routing problem. In ECAI 2020, pages 443–450. IOS Press, 2020.
Y. Hu, Y. Yao, and W. S. Lee. A reinforcement learning approach for optimizing multiple traveling salesman problems over graphs. Knowledge-Based Systems, 204:106244, 2020.
E. Khalil, H. Dai, Y. Zhang, B. Dilkina, and L. Song. Learning combinatorial optimization algorithms over graphs. Advances in Neural Information Processing Systems, 30, 2017.
T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
W. Kool, H. van Hoof, and M. Welling. Attention, learn to solve routing problems! In International Conference on Learning Representations, 2018.
Y.-D. Kwon, J. Choo, B. Kim, I. Yoon, Y. Gwon, and S. Min. POMO: Policy optimization with multiple optima for reinforcement learning. Advances in Neural Information Processing Systems, 33:21188–21198, 2020.
G. Laporte. The vehicle routing problem: An overview of exact and approximate algorithms. European Journal of Operational Research, 59:345–358, 1992.
B. Li, G. Wu, Y. He, M. Fan, and W. Pedrycz. An overview and experimental study of learning-based optimization algorithms for the vehicle routing problem. IEEE/CAA Journal of Automatica Sinica, 9(7):1115–1138, 2022.
J. Li, Y. Ma, R. Gao, Z. Cao, A. Lim, W. Song, and J. Zhang. Deep reinforcement learning for solving the heterogeneous capacitated vehicle routing problem. IEEE Transactions on Cybernetics, 52(12):13572–13585, 2022.
I. Loshchilov and F. Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
C. Morris, M. Ritzert, M. Fey, W. L. Hamilton, J. E. Lenssen, G. Rattan, and M. Grohe. Weisfeiler and Leman go neural: Higher-order graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4602–4609, 2019.
M. Nazari, A. Oroojlooy, L. Snyder, and M. Takác. Reinforcement learning for solving the vehicle routing problem. Advances in Neural Information Processing Systems, 31, 2018.
M. Tang, W. Zhuang, B. Li, H. Liu, Z. Song, and G. Yin. Energy-optimal routing for electric vehicles using deep reinforcement learning with transformer. Applied Energy, 350:121711, 2023.
P. Toth and D. Vigo. The vehicle routing problem. SIAM, 2002.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
O. Vinyals, M. Fortunato, and N. Jaitly. Pointer networks. Advances in Neural Information Processing Systems, 28, 2015.
R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256, 1992.
A. Wren and A. Holliday. Computer scheduling of vehicles from one or more depots to a number of delivery points. Journal of the Operational Research Society, 23(3):333–344, 1972.
L. Xin, W. Song, Z. Cao, and J. Zhang. Multi-decoder attention model with embedding glimpse for solving vehicle routing problems. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 12042–12049, 2021.
K. Xu, C. Li, Y. Tian, T. Sonobe, K.-i. Kawarabayashi, and S. Jegelka. Representation learning on graphs with jumping knowledge networks. In International Conference on Machine Learning, pages 5453–5462. PMLR, 2018.
Y. Xu, M. Fang, L. Chen, G. Xu, Y. Du, and C. Zhang. Reinforcement learning with multiple relational attention for solving vehicle routing problems. IEEE Transactions on Cybernetics, 52(10):11107–11120, 2021.
C. Zhang, Y. Wu, Y. Ma, W. Song, Z. Le, Z. Cao, and J. Zhang. A review on learning to solve combinatorial optimisation problems in manufacturing. IET Collaborative Intelligent Manufacturing, 5(1):e12072, 2023.
K. Zhang, F. He, Z. Zhang, X. Lin, and M. Li. Multi-vehicle routing problems with soft time windows: A multi-agent reinforcement learning approach. Transportation Research Part C: Emerging Technologies, 121:102861, 2020.
Z. Zhang, G. Qi, and W. Guan. Coordinated multi-agent hierarchical deep reinforcement learning to solve multi-trip vehicle routing problems with soft time windows. IET Intelligent Transport Systems, 17(10):2034–2051, 2023.
李明峰. Deep reinforcement learning for multi-vehicle route planning: A scalable framework based on state encoding and ordered construction. Master's thesis, National Taiwan University, Taipei, 2023.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98123
dc.description.abstract: This study models the vehicle routing problem as a Markov decision process (MDP) and trains the model with multi-agent reinforcement learning (MARL) using a shared policy and a centralized reward. To keep the model from over-relying on the initial environment state, features of the current environment are extracted as input at every decision step. At execution time, all vehicles decide in parallel within the same time step, each choosing its next target node. To handle multiple vehicles competing for the same node, a masking mechanism is introduced to eliminate conflicting actions, and an improved training-baseline design is further incorporated to strengthen training and inference.
Numerical experiments are conducted on the heterogeneous vehicle routing problem (HVRP). The model is first trained at a scale of 50 nodes and 8 vehicles, and then trained with mixed batches over three scales in total, adding instances with 25 nodes and 4 vehicles and with 75 nodes and 12 vehicles. Trading performance for computational efficiency, the experiments show that the model additionally trained with mixed batches performs better: at the 25-node, 4-vehicle scale a 3.4% performance loss yields roughly a 45.5% reduction in computation time; at the 50-node, 8-vehicle scale a 6.79% loss yields roughly 54.3%; and at the 75-node, 12-vehicle scale a 12.97% loss yields roughly 61%, with better performance on smaller problems. [zh_TW]
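The mixed-batch training described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's actual pipeline: it assumes each training step draws one (nodes, vehicles) scale uniformly at random and that every instance in a batch shares that scale; the names `SCALES` and `mixed_batches` and the random-coordinate instance generator are hypothetical stand-ins.

```python
import random

# Hypothetical constant: the three (num_nodes, num_vehicles) scales
# mentioned in the abstract.
SCALES = [(25, 4), (50, 8), (75, 12)]

def mixed_batches(num_steps, batch_size=4, seed=0):
    """Yield training batches whose problem scale varies step to step.

    Each step samples one scale from SCALES; all instances in the
    batch share it, so per-batch tensors stay rectangular. Instances
    here are just random customer coordinates in the unit square --
    a stand-in for real HVRP instance generation.
    """
    rng = random.Random(seed)
    for _ in range(num_steps):
        nodes, vehicles = rng.choice(SCALES)
        batch = [
            [(rng.random(), rng.random()) for _ in range(nodes)]
            for _ in range(batch_size)
        ]
        yield (nodes, vehicles), batch

# Example: inspect the scales drawn for a few steps.
for (nodes, vehicles), batch in mixed_batches(num_steps=3):
    print(nodes, vehicles, len(batch), len(batch[0]))
```

Keeping one scale per batch keeps tensors rectangular, while varying the scale across steps exposes a single shared policy to all three problem sizes.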
dc.description.abstract: This study formulates the vehicle routing problem within a Markov decision process (MDP) framework and leverages multi-agent reinforcement learning (MARL) with a shared policy and centralized rewards for model training. To prevent the model from relying on the initial environment state, features of the current environment are extracted at each decision step. During inference, all vehicles act in parallel at each time step, each selecting its next target node. To address conflicts arising from multiple vehicles competing for the same node, a masking mechanism is incorporated to suppress invalid actions. Furthermore, an enhanced training baseline is designed to improve training and inference results.
Numerical experiments are conducted on the heterogeneous vehicle routing problem (HVRP). The model is initially trained on instances with 50 nodes and 8 vehicles, and subsequently refined using mix-batch training across three problem scales: 25 nodes with 4 vehicles, 50 nodes with 8 vehicles, and 75 nodes with 12 vehicles. The experiments show that the model trained with mixed batches outperforms the model trained without them. By trading performance for computational efficiency, the approach achieves roughly a 45.5% reduction in computation time for a 3.4% performance loss at the 25-node, 4-vehicle scale; a 54.3% reduction for a 6.79% loss at the 50-node, 8-vehicle scale; and a 61% reduction for a 12.97% loss at the 75-node, 12-vehicle scale. The method also performs better on smaller-scale problems. [en]
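A conflict-masking step of the kind the abstract describes could be realized as in the following sketch. It is an illustrative assumption, not the thesis's implementation: conflicts within a time step are resolved by a fixed vehicle priority order, node 0 is taken to be a depot that vehicles may share, and `step_with_masking` is a hypothetical name.

```python
def step_with_masking(logits, visited, depot=0):
    """Select one next node per vehicle within a single time step.

    logits  -- one list of scores per vehicle, indexed by node
    visited -- boolean list; True marks nodes no longer available
    Conflicts are removed by masking: once a vehicle claims a
    customer node, that node is masked for the remaining vehicles.
    """
    mask = list(visited)
    actions = []
    for scores in logits:  # vehicles decide in a fixed priority order
        best = max(
            (j for j in range(len(scores)) if not mask[j]),
            key=lambda j: scores[j],
        )
        actions.append(best)
        if best != depot:  # the depot stays available to every vehicle
            mask[best] = True
    return actions

# Two vehicles both score node 1 highest; masking forces the
# lower-priority vehicle onto node 2 instead.
print(step_with_masking([[0.0, 5.0, 1.0], [0.0, 4.0, 3.0]],
                        [False, False, False]))  # → [1, 2]
```

In a learned model the mask would typically be applied to the logits before the softmax, so conflicting actions receive zero probability rather than being filtered after sampling.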
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-07-29T16:07:54Z. No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2025-07-29T16:07:54Z (GMT). No. of bitstreams: 0 [en]
dc.description.tableofcontents:
Acknowledgements i
Abstract (Chinese) ii
Abstract iii
Contents v
List of Figures viii
List of Tables x
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Motivation and Objectives 2
1.3 Contributions 3
Chapter 2 Literature Review 4
2.1 Vehicle Routing Problem 4
2.1.1 Definition and Mathematical Model 4
2.1.2 Solution Methods 6
2.1.3 Learning-Based End-to-End Methods 7
2.2 Multi-Agent Reinforcement Learning 10
2.2.1 Reinforcement Learning 10
2.2.2 Policy Gradient with Parameter Sharing among Agents 11
2.3 Neural Network Architectures 13
2.3.1 Attention Mechanism 13
2.3.2 Pointer Network 15
2.3.3 Graph Neural Network (GNN) 16
Chapter 3 Research Framework and Methods 19
3.1 Modeling the Vehicle Routing Problem 19
3.1.1 Planning Framework 20
3.1.2 Markov Decision Process 21
3.1.3 State Representation and Masking Mechanism 23
3.2 End-to-End Model Architecture 25
3.2.1 Graph Feature Extraction Module 26
3.2.2 Fleet Feature Extraction Module 28
3.2.3 Pointer Output Module 29
3.3 Multi-Agent Reinforcement Learning Training 32
3.3.1 RL Reward Function and Sequential Decoding 32
3.3.2 RL with Baseline 33
3.3.3 Training Data Generation 35
Chapter 4 Experimental Results and Analysis 37
4.1 Experimental Setup 37
4.1.1 Platform and Training Hyperparameters 37
4.1.2 OR-Tools 38
4.2 Experiments and Result Analysis 39
Chapter 5 Conclusions and Future Work 41
5.1 Conclusions 41
5.2 Future Work 41
References 43
dc.language.iso: zh_TW
dc.subject: 車輛路徑問題 [zh_TW]
dc.subject: 多智能體系統 [zh_TW]
dc.subject: 強化學習 [zh_TW]
dc.subject: Multi-Agent Systems [en]
dc.subject: Vehicle Routing Problem [en]
dc.subject: Reinforcement Learning [en]
dc.title: 以多智能體強化學習解決異質車輛路徑問題 [zh_TW]
dc.title: Solving Heterogeneous Vehicle Routing Problem by Using Multi-Agent Reinforcement Learning [en]
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 蕭得聖; 林峻永 [zh_TW]
dc.contributor.oralexamcommittee: Te-Sheng Hsiao; Chun-Yeon Lin [en]
dc.subject.keyword: 車輛路徑問題, 強化學習, 多智能體系統 [zh_TW]
dc.subject.keyword: Vehicle Routing Problem, Reinforcement Learning, Multi-Agent Systems [en]
dc.relation.page: 47
dc.identifier.doi: 10.6342/NTU202501716
dc.rights.note: Access granted (restricted to on-campus access)
dc.date.accepted: 2025-07-24
dc.contributor.author-college: College of Engineering
dc.contributor.author-dept: Department of Mechanical Engineering
dc.date.embargo-lift: 2025-07-30
Appears in Collections: Department of Mechanical Engineering

Files in This Item:
File: ntu-113-2.pdf (restricted to NTU campus IPs; use the library VPN service from off campus)
Size: 4.69 MB
Format: Adobe PDF