Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85400

| Title: | Coordinating Freeway Mainline Metering and Ramp Metering Strategies by Multi-Agent Deep Reinforcement Learning—A Case Study of Freeway No.5 |
| Author: | Yu-Chun Chen 陳又均 |
| Advisor: | Tien-Pen Hsu (許添本) |
| Keywords: | Multi-agent deep reinforcement learning, Mainline metering, Ramp metering |
| Publication Year: | 2022 |
| Degree: | Master |
| Abstract: | Freeway No.5 is an important road connecting Taipei and Yilan, yet since it opened to traffic, congestion caused by weekend and national-holiday travel has occurred repeatedly, reducing the freeway's efficiency and imposing substantial social costs. The Freeway Bureau has introduced several management measures in response, including mainline metering, ramp metering, opening the shoulder as a bus-only lane, and high-occupancy vehicle regulation. The Bureau currently sets the mainline and ramp metering rates from the queue spillback length with a dynamic lookup-table method, and activates the metering facilities only after the queue has spilled back to the mainline metering point; this study argues that such a strategy fails to achieve early control. Reinforcement learning has developed rapidly in recent years, with many published advances in both training techniques and applications. This study therefore adopts a multi-agent deep reinforcement learning method that combines double Q-learning, prioritized experience replay, a dueling network architecture, multi-step reinforcement learning, and distributional reinforcement learning (minimal illustrative sketches of several of these components follow this record) to construct a joint mainline and ramp metering strategy, aiming to let the cooperating agents regulate traffic in the merging area downstream of the metering points and inside the Xueshan Tunnel, thereby mitigating or avoiding congestion in the tunnel. The VISSIM microscopic traffic simulation software was used as the simulation platform to build a partial network of Freeway No.5 and train the model. The mainline and ramp metering agents observe flow and speed data from the network's vehicle detectors together with the other agent's fingerprint, and select the best action to execute in real time; the reward is defined as maximizing the flow at the bottleneck inside the Xueshan Tunnel. The joint metering model converged after 450 training episodes, and its performance was then compared against the current control scheme. Relative to the current scheme, travel time from the Toucheng ramp exit to the end of the Xueshan Tunnel falls by about 5% and travel speed rises by about 4.3%; the network-wide average speed increases by about 2.7% for passenger cars and about 2.37% for buses; traffic conditions inside the Xueshan Tunnel and speeds at the tunnel bottleneck also improve. An analysis of the selected metering rates shows that the mainline agent tends to choose higher metering rates, while the ramp agent chooses low metering rates more often, so the ramp queue grows longer than under the current scheme and may cause more severe spillback onto local roads. The study also examines why the model selects particular metering rates and finds that it adjusts the next time step's metering rate according to the speed at the tunnel bottleneck, showing that the proposed multi-agent deep reinforcement learning model does learn the relationship between reward and action through training. Finally, conclusions are drawn from the above analysis and suggestions for follow-up research are offered. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85400 |
| DOI: | 10.6342/NTU202201605 |
| Full-Text Permission: | Authorized (open access worldwide) |
| Electronic Full-Text Release Date: | 2022-07-26 |
| Appears in Collections: | Department of Civil Engineering |
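
The abstract names several Rainbow-style DQN components. Below is a minimal, self-contained sketch of two of them, a dueling Q-network and a double Q-learning target computed over an n-step return. This is not the thesis code; the layer sizes, discount factor, and state/action dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling architecture: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state value V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantages A(s, a)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        h = self.trunk(s)
        a = self.advantage(h)
        return self.value(h) + a - a.mean(dim=1, keepdim=True)

def double_q_nstep_target(online: DuelingQNet, target: DuelingQNet,
                          n_step_reward: torch.Tensor, next_state: torch.Tensor,
                          done: torch.Tensor, gamma: float = 0.99,
                          n: int = 3) -> torch.Tensor:
    """Double Q-learning target over an n-step return: the online network
    selects argmax_a, the target network evaluates the chosen action."""
    with torch.no_grad():
        best_action = online(next_state).argmax(dim=1, keepdim=True)
        q_next = target(next_state).gather(1, best_action).squeeze(1)
        # n_step_reward is the accumulated discounted sum of the next n rewards.
        return n_step_reward + (gamma ** n) * (1.0 - done) * q_next
```

In a two-agent setup such as the one described (one mainline and one ramp agent), each agent would hold its own online/target pair, with the target network periodically synchronized from the online network.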
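
Prioritized experience replay, another component the abstract lists, can be sketched as a proportional-priority buffer. The sum-tree of the original formulation is replaced here by a plain array scan for brevity, and the alpha/beta values are illustrative, not the thesis's settings.

```python
import numpy as np

class PrioritizedReplay:
    """Proportional prioritized replay: P(i) ~ (|td_error_i| + eps)^alpha."""
    def __init__(self, capacity: int, alpha: float = 0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.prios, self.pos = [], np.zeros(capacity), 0

    def push(self, transition, td_error: float = 1.0) -> None:
        prio = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition  # overwrite oldest slot
        self.prios[self.pos] = prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4):
        p = self.prios[:len(self.data)]
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the non-uniform sampling bias.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights.astype(np.float32)

    def update_priorities(self, idx, td_errors) -> None:
        self.prios[idx] = (np.abs(td_errors) + 1e-6) ** self.alpha
```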
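
Finally, a hedged sketch of how each agent's observation and reward could be assembled as the abstract describes: detector flow and speed plus the other agent's fingerprint (here its last action and the episode index, a common fingerprinting choice for stabilizing multi-agent replay), with the reward taken as the flow at the tunnel bottleneck. The detector IDs and the `read_detector` helper are hypothetical placeholders, not the thesis's code or VISSIM's actual API.

```python
import numpy as np

DETECTORS = ["D1", "D2", "D3"]       # hypothetical detector IDs near the merge
BOTTLENECK = "XueshanTunnel_BN"      # hypothetical tunnel bottleneck detector

def read_detector(det_id: str) -> tuple[float, float]:
    """Placeholder: return (flow veh/h, speed km/h) for one detector.
    In the thesis this data would come from VISSIM during simulation."""
    return 1800.0, 60.0  # dummy values so the sketch runs

def build_observation(other_agent_last_action: int, episode: int) -> np.ndarray:
    feats = []
    for det in DETECTORS:
        flow, speed = read_detector(det)
        feats += [flow / 2400.0, speed / 100.0]  # normalize to roughly [0, 1]
    # Fingerprint of the other agent disambiguates stale replay samples.
    feats += [float(other_agent_last_action), float(episode)]
    return np.asarray(feats, dtype=np.float32)

def reward() -> float:
    """Reward: flow measured at the tunnel bottleneck (to be maximized)."""
    flow, _ = read_detector(BOTTLENECK)
    return flow / 2400.0  # scaling choice is an assumption
```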
Files in this item:
| File | Size | Format |
|---|---|---|
| U0001-2107202214213300.pdf | 4.03 MB | Adobe PDF |
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
