Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89867

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 李綱 | zh_TW |
| dc.contributor.advisor | Kang Li | en |
| dc.contributor.author | 李明峰 | zh_TW |
| dc.contributor.author | Ming-Feng Li | en |
| dc.date.accessioned | 2023-09-22T16:27:48Z | - |
| dc.date.available | 2023-11-09 | - |
| dc.date.copyright | 2023-09-22 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-08-09 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89867 | - |
| dc.description.abstract | 本研究提出了一套基於深度強化學習的多車路徑規劃模型,使用馬爾可夫決策流程對路徑規劃任務進行建模。透過將決策過程每一步的狀態觀測視為彼此獨立,並為每個狀態進行特徵抽取的方式,來處理實際路徑應用中馬爾可夫狀態轉移可能帶有不確定性的問題。這使得模型能夠及時的依實際應用中的動態變化來修正規劃結果。另一方面本研究創新地使用了串接式的方案有序地完成多部車輛的路徑建構,搭配遮罩機制使得本架構易於擴展至各類不同的路徑應用,並且提出相應的改進Reinforce with DSE Baseline演算法進行模型的訓練與推論。
在數值模擬實驗中,本研究以多旅行商(MTSP)、異構車隊路徑(HVRP)以及動態需求與隨機交通時間路徑(DSVRP)三個不同的路徑應用進行實驗;模型在小規模(約50個客戶節點)的訓練資料上訓練後,直接泛化應用至各個測試資料集。在MTSP的公開資料集mTSPlib中以0.48秒的平均計算時間達到5.5\%的性能間隙,對比Google的路徑最佳化算法OR-Tools在平均4.84秒達到的4.579\%;在HVRP的Golden-Taillard資料集上以12.57\% / 0.642秒與9.87\% / 2.334秒對比OR-Tools中最佳的9.56\% / 10.005秒,在效率上取得明顯的優勢;最後在DSVRP的模擬環境測試中表現出降低路徑成本優先的決策傾向,與對照算法相比在需求滿足率損失9.3\%的情況下節省了14.7\%的運輸成本。 | zh_TW |
| dc.description.abstract | This study proposes a deep reinforcement learning-based multi-vehicle routing model in which the routing task is formulated as a Markov Decision Process (MDP). By treating the state observation at each step of the decision process as independent and extracting features for each state, the model addresses the uncertainty that may arise in state transitions in practical routing applications, allowing it to adjust planned routes promptly in response to dynamic changes in real-world settings. In addition, this study introduces a concatenation scheme that constructs routes for multiple vehicles in sequence; combined with a masking mechanism, this makes the framework easily adaptable to a variety of routing applications. An improved REINFORCE-with-baseline algorithm is then introduced for model training and inference.
In numerical simulation experiments, the proposed model is tested on three different routing problems: the Multiple Traveling Salesmen Problem (MTSP), the Heterogeneous Vehicle Routing Problem (HVRP), and the Dynamic and Stochastic Vehicle Routing Problem (DSVRP). The model is trained on small-scale instances of approximately 50 customers (nodes) and then applied directly to the various test datasets. On the public mTSPlib dataset for the MTSP, the model achieves a performance gap of 5.5\% with an average computation time of 0.48 seconds, compared with Google's OR-Tools route optimization library, which achieves a 4.579\% gap in 4.84 seconds on average. On the Golden-Taillard HVRP dataset, the model clearly outperforms OR-Tools in efficiency, reaching gaps of 12.57\% in 0.642 seconds and 9.87\% in 2.334 seconds against OR-Tools' best result of 9.56\% in 10.005 seconds. Finally, in the DSVRP simulation environment, the model exhibits a preference for cost-reducing routing decisions: compared with the benchmark algorithm, it reduces transportation costs by 14.7\% at the expense of a 9.3\% loss in demand fulfillment rate. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-22T16:27:48Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-09-22T16:27:48Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Oral Defense Committee Certification
Acknowledgements
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Research Contributions
Chapter 2 Literature Review
2.1 Introduction: Neural-Network Approaches to Combinatorial Optimization
2.2 Machine Learning for Vehicle Routing Problems
2.2.1 Single-Agent Traveling Salesman and Vehicle Routing Problems
2.2.2 Multi-Vehicle Routing Problems
2.2.3 Heterogeneous-Fleet and Dynamic/Uncertain Variants
2.3 Reinforcement Learning and Policy Gradient
2.3.1 Reinforcement Learning
2.3.2 Policy Gradient
2.4 Neural Network Architectures
2.4.1 Attention Mechanism
2.4.2 Pointer Network
2.4.3 Graph Neural Network
Chapter 3 Research Framework and Methods
3.1 Routing Model and Markov Decision Process
3.1.1 Mathematical Model of the Multi-Vehicle Routing Problem
3.1.2 Markov Model of Route Planning
3.2 Concatenated Multi-Vehicle Routing Model with Independent Decision States
3.2.1 Routing Framework Based on Independent Decision States
3.2.2 Concatenation-Based Multi-Vehicle Route Construction Scheme
3.2.3 Complete State Representation of the Routing Model
3.2.4 Masking Mechanism
3.3 Policy Network Architecture
3.3.1 Overview of the Neural Network Architecture
3.3.2 Road-Network Graph Embedding Module
3.3.3 Fleet Embedding Extraction and Multi-Head Attention
3.3.4 Feature Aggregation Module and Pointer-Network Output Layer
3.4 Training the Policy Network with Reinforcement Learning
3.4.1 Training Data Generation and Training Framework
3.4.2 Improved REINFORCE-with-Baseline Algorithm Based on Equivalent States
Chapter 4 Experimental Results and Analysis
4.1 Experimental Setup
4.1.1 Parameter Settings
4.1.2 The OR-Tools Route Optimization Library
4.1.3 Inference-Stage Search
4.1.4 Distance-Matrix Transformations
4.2 Results and Analysis
4.2.1 Experiment 1: Multiple Traveling Salesman Problem (MTSP)
4.2.2 Experiment 2: Heterogeneous Vehicle Routing Problem (HVRP)
4.2.3 Experiment 3: Dynamic Request & Stochastic Traffic Time Routing Problem (DSVRP)
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References | - |
| dc.language.iso | zh_TW | - |
| dc.subject | 馬爾可夫決策流程 | zh_TW |
| dc.subject | 強化學習 | zh_TW |
| dc.subject | 車輛路徑問題 | zh_TW |
| dc.subject | 智慧運輸系統 | zh_TW |
| dc.subject | 策略網絡 | zh_TW |
| dc.subject | Intelligent Transportation System | en |
| dc.subject | Reinforcement Learning | en |
| dc.subject | Policy Network | en |
| dc.subject | Markov Decision Process | en |
| dc.subject | Vehicle Routing Problem | en |
| dc.title | 深度強化學習於多車路徑規劃: 基於狀態編碼與有序建構的可擴展框架 | zh_TW |
| dc.title | Deep Reinforcement Learning for Multi-Vehicle Routing: An Adaptable Framework with State Encoding and Sequential Construction | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-2 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 蘇偉儁;詹魁元;詹景堯 | zh_TW |
| dc.contributor.oralexamcommittee | Wei-Jiun Su;Kuei-Yuan Chan;Ching-Yao Chan | en |
| dc.subject.keyword | 強化學習,車輛路徑問題,智慧運輸系統,馬爾可夫決策流程,策略網絡 | zh_TW |
| dc.subject.keyword | Reinforcement Learning, Vehicle Routing Problem, Intelligent Transportation System, Markov Decision Process, Policy Network | en |
| dc.relation.page | 108 | - |
| dc.identifier.doi | 10.6342/NTU202303050 | - |
| dc.rights.note | Authorization granted (on-campus access only) | - |
| dc.date.accepted | 2023-08-10 | - |
| dc.contributor.author-college | College of Engineering | - |
| dc.contributor.author-dept | Department of Mechanical Engineering | - |
| dc.date.embargo-lift | 2028-08-07 | - |
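The abstracts above describe a concatenation scheme that builds routes for the fleet one vehicle at a time, with a masking mechanism hiding infeasible choices at each decoding step. The thesis text itself is embargoed, so the following is only an illustrative sketch of that idea under simple assumptions: a greedy nearest-feasible rule stands in for the learned policy network, and all function and variable names here are hypothetical, not taken from the thesis.

```python
import math

def build_routes(coords, demands, capacities, depot=0):
    """Illustrative greedy multi-vehicle route construction.

    Vehicles build routes one after another (concatenation scheme);
    a mask hides nodes that are already visited or whose demand
    exceeds the current vehicle's remaining capacity.
    """
    visited = {depot}
    routes = []
    for cap in capacities:          # vehicles are decoded in sequence
        route, pos, remaining = [depot], depot, cap
        while True:
            # Mask: feasible nodes are unvisited and fit remaining capacity.
            feasible = [i for i in range(len(coords))
                        if i not in visited and demands[i] <= remaining]
            if not feasible:
                break
            # Greedy stand-in for the policy: pick the nearest feasible node.
            nxt = min(feasible, key=lambda i: math.dist(coords[pos], coords[i]))
            route.append(nxt)
            visited.add(nxt)
            remaining -= demands[nxt]
            pos = nxt
        route.append(depot)         # each route returns to the depot
        routes.append(route)
    return routes
```

In the thesis the next node would be selected by the trained policy network rather than a distance heuristic; the sketch only shows how sequential construction and masking interact.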
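The record also mentions training with an improved "Reinforce with DSE Baseline" algorithm but does not explain the DSE baseline itself. The sketch below therefore shows only the generic REINFORCE-with-baseline update direction it builds on, using a batch-mean cost baseline as a stand-in; the function name and interface are hypothetical.

```python
def reinforce_with_baseline_grad(log_prob_grads, costs):
    """Illustrative REINFORCE-with-baseline update direction.

    Given per-sample gradients of log pi(route) and per-sample route
    costs, subtract a baseline (here: the batch-mean cost) to reduce
    variance, then average (cost - baseline) * grad(log pi) over the
    batch. Descending along this direction lowers expected route cost.
    """
    baseline = sum(costs) / len(costs)
    n = len(costs)
    dim = len(log_prob_grads[0])
    grad = [0.0] * dim
    for g, c in zip(log_prob_grads, costs):
        adv = c - baseline              # advantage relative to the baseline
        for k in range(dim):
            grad[k] += adv * g[k] / n
    return grad
```

With a learned network the per-sample gradients would come from backpropagation through the policy; the thesis's DSE baseline presumably replaces the batch mean with an equivalent-state estimate, which this sketch does not attempt to reproduce.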
| Appears in Collections: | Department of Mechanical Engineering | |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-111-2.pdf (restricted, not publicly available) | 15.5 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
