Solving Heterogeneous Vehicle Routing Problem by Using Multi-Agent Reinforcement Learning

姚智元; Chih-Yuan Yao

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98123

Title:	Solving Heterogeneous Vehicle Routing Problem by Using Multi-Agent Reinforcement Learning (以多智能體強化學習解決異質車輛路徑問題)
Author:	Chih-Yuan Yao (姚智元)
Advisor:	Kang Li (李綱)
Keywords:	Vehicle Routing Problem, Reinforcement Learning, Multi-Agent Systems
Year of publication:	2025
Degree:	Master's
Abstract:	This study formulates the vehicle routing problem as a Markov Decision Process (MDP) and trains the model with Multi-Agent Reinforcement Learning (MARL), using a shared policy and a centralized reward. To keep the model from relying too heavily on the initial environment state, features of the current environment are extracted at every decision step and used as input. At execution time, all vehicles decide in parallel within the same time step, each selecting its next target node. To handle multiple vehicles competing for the same node, a masking mechanism removes the conflicting actions, and it is combined with an improved training baseline to improve both training and inference results.
Numerical experiments are conducted on the Heterogeneous Vehicle Routing Problem (HVRP). The model is first trained on instances with 50 nodes and 8 vehicles, and then trained further with mix-batch training, in which instances with 25 nodes and 4 vehicles and with 75 nodes and 12 vehicles are mixed in, giving three problem scales in total. The experiments show that the model with the additional mix-batch training performs better than the model without it. The approach trades performance for computational efficiency: on the 25-node, 4-vehicle scale, a 3.4% performance loss is exchanged for a reduction of about 45.5% in computation time; on the 50-node, 8-vehicle scale, a 6.79% performance loss for a reduction of about 54.3%; and on the 75-node, 12-vehicle scale, a 12.97% performance loss for a reduction of about 61%. The method also performs better on smaller-scale problems.
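The abstract does not say how the masking mechanism is implemented. As a rough illustration only, the sketch below shows one possible way to resolve conflicts within a single time step: vehicles are processed in order, and any node that is already visited or already claimed by an earlier vehicle in the same step is masked out. The function name `select_actions`, the `scores` layout, the greedy choice, and the convention that node 0 is a shared depot are all assumptions made for this example and are not taken from the thesis.

```python
from typing import List, Set

DEPOT = 0  # assumption: node 0 is the depot and may be shared by vehicles


def select_actions(scores: List[List[float]], visited: Set[int]) -> List[int]:
    """Pick one target node per vehicle for a single time step.

    scores[v][n] is a hypothetical policy score of vehicle v for node n.
    Customer nodes that are already visited, or already claimed by an
    earlier vehicle in this step, are masked out. A vehicle falls back to
    the depot only when every customer node is masked for it.
    """
    claimed: Set[int] = set()
    actions: List[int] = []
    for vehicle_scores in scores:
        best_node, best_score = DEPOT, float("-inf")
        for node, score in enumerate(vehicle_scores):
            if node == DEPOT:
                continue  # the depot is handled as the fallback
            masked = node in visited or node in claimed
            if not masked and score > best_score:
                best_node, best_score = node, score
        if best_node != DEPOT:
            claimed.add(best_node)  # later vehicles can no longer pick it
        actions.append(best_node)
    return actions


scores = [
    [0.0, 0.9, 0.5, 0.1],  # vehicle 0 prefers node 1
    [0.0, 0.8, 0.6, 0.2],  # vehicle 1 also prefers node 1
]
actions = select_actions(scores, visited={3})
print(actions)
```

Here both vehicles prefer node 1, so the second vehicle is redirected to its best unmasked alternative. A learned policy would typically apply the same mask to its logits before sampling rather than choosing greedily.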
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98123
DOI:	10.6342/NTU202501716
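Full-text authorization:	Authorized (campus access only)
Electronic full-text release date:	2025-07-30
Appears in department:	Department of Mechanical Engineering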

Files in this item:

File	Size	Format
ntu-113-2.pdf (access restricted to NTU campus IP addresses; off campus, please use the VPN remote connection service)	4.69 MB	Adobe PDF

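Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.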
