深度強化學習於多車路徑規劃: 基於狀態編碼與有序建構的可擴展框架

李明峰; Ming-Feng Li

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89867

標題:	深度強化學習於多車路徑規劃: 基於狀態編碼與有序建構的可擴展框架 Deep Reinforcement Learning for Multi-Vehicle Routing : An Adaptable Framework with State Encoding and Sequential Construction
作者:	李明峰 Ming-Feng Li
指導教授:	李綱 Kang Li
關鍵字:	強化學習,車輛路徑問題,智慧運輸系統,馬爾可夫決策流程,策略網絡, Reinforcement Learning,Vehicle Routing Problem,Intelligent Transportation System,Markov Decision Process,Policy Network,
出版年 :	2023
學位:	碩士
摘要:	本研究提出了一套基於深度強化學習的多車路徑規劃模型，使用馬爾可夫決策流程對路徑規劃任務進行建模。透過將決策過程每一步的狀態觀測視為彼此獨立，並為每個狀態進行特徵抽取的方式，來處理實際路徑應用中馬爾可夫狀態轉移可能帶有不確定性的問題。這使得模型能夠及時的依實際應用中的動態變化來修正規劃結果。另一方面本研究創新地使用了串接式的方案有序地完成多部車輛的路徑建構，搭配遮罩機制使得本架構易於擴展至各類不同的路徑應用，並且提出相應的改進Reinforce with DSE Baseline演算法進行模型的訓練與推論。在數值模擬實驗中，本研究以多旅行商(MTSP)、異構車隊路徑(HVRP)以及動態需求與隨機交通時間路徑(DSVRP)三個不同的路徑應用進行實驗，使用了訓練在小規模約50客戶(節點)的訓練資料上後，直接泛化應用至各個測試資料集。在MTSP的公開資料集mTSPlib中以0.48秒的平均計算時間達到5.5\%的性能間隙，對比Google的路徑最佳化算法OR-Tools在平均4.84秒達到的4.579\% ;在HVRP的Golden-Taillard資料集上以12.57\% / 0.642秒與9.87\% / 2.334秒對比OR-Tools中最佳的9.56\% / 10.005秒在效率上取得明顯的優勢 ; 最後在DSVRP的模擬環境測試中表現出降低路徑成本優先的決策傾向，與對照算法相比在需求滿足率損失9.3\%的情況下節省了14.7\%的運輸成本。 This study proposes a deep reinforcement learning-based multi-vehicle routing model , the routing task is modeled using Markov Decision Process (MDP). By treating the state observation at each step of the decision process as independent and performing feature extraction for each state to address the uncertainty in state transitions that may occur in practical routing applications ,this allows the model to adjust the routing results promptly according to dynamic change in the real-world application. On the other hand, this study innovatively uses a concatenation scheme to sequentially construct routes for multiple vehicles, along with a masking mechanism, making the framework easily adaptable to various routing applications. Then , an improved reinforce with baseline algorithm is introduced for model training and inference. In numerical simulation experiments, the proposed model is tested on three different routing problem: Multiple Traveling Salesmen Problem (MTSP), Heterogeneous Vehicle Routing Problem (HVRP), and the Dynamic and Stochastic Vehicle Routing Problem (DSVRP). The model is trained on a small-scale dataset of approximately 50 customers (nodes) and then directly applied to various test datasets. The proposed model achieves a performance gap of 5.5\% in the MTSP case using the mTSPlib public dataset, with an average computation time of 0.48 seconds. When compared to Google's OR-Tools route optimization algorithm, which achieves a performance gap of 4.579\% with an average computation time of 4.84 seconds. In the HVRP case, using the Golden-Taillard dataset, the proposed model outperforms OR-Tools in terms of efficiency, with a performance gap of 12.57\% / 0.642 seconds and 9.87\% / 2.334 seconds compared to the best result of OR-Tools at 9.56\% / 10.005 seconds. Lastly, in the DSVRP simulation environment, the proposed model demonstrates a preference for cost-reducing route decisions. Compared to the benchmark algorithm, it achieves a 14.7\% reduction in transportation costs while incurring a 9.3\% loss in demand fulfill rate.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89867
DOI:	10.6342/NTU202303050
全文授權:	同意授權(限校園內公開)
電子全文公開日期:	2028-08-07
顯示於系所單位：	機械工程學系

文件中的檔案：

檔案	大小	格式
ntu-111-2.pdf 未授權公開取用	15.5 MB	Adobe PDF	檢視/開啟

顯示文件完整紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。

DSpace

機構典藏 DSpace 系統致力於保存各式數位資料（如：文字、圖片、PDF）並使其易於取用。