Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90121
Title: | 應用深度Q網路之動態派工與預防保養方法 Deep Q Network for Dynamic Dispatching and Preventive Maintenance |
Author: | 張宇翔 Yu-Hsiang Chang |
Advisor: | 吳政鴻 Cheng-Hung Wu |
Keywords: | Reinforcement learning, Deep Q network, Dynamic dispatching, Preventive maintenance, Reducing complexity |
Publication year: | 2023 |
Degree: | Master's |
Abstract: | This research is a preliminary study of dynamic dispatching and maintenance in large-scale production systems. It develops a reinforcement learning model for the multi-product single-machine problem. The model updates its decisions using the rewards obtained as the agent repeatedly interacts with the environment. The time axis is discretized through normalization, and each state is simulated directly within the Deep Q Network (DQN) in an event-driven manner to predict near-optimal dynamic dispatching decisions. This approach largely avoids the curse of dimensionality common in dynamic programming and reduces solution time, addressing the difficulty of applying dynamic programming in practice to large-scale production systems.
The problem studied is a multi-product single-machine production system under uncertainty. The objective is to minimize the waiting (holding) cost. The uncertainties considered are machine health status, demand (product arrival) rate, processing rate, and maintenance/repair time. Results are presented by comparing the DQN method with the C-u rule and the DDPM method. The experiments show that the DQN method outperforms the C-u rule in both completion time and total output. It does not match the DDPM method, but the difference is small, and the solution time is greatly reduced. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90121 |
DOI: | 10.6342/NTU202303306 |
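Full-text authorization: | Authorized (access restricted to campus) |
Electronic full-text release date: | 2028-08-02 |
Appears in collections: | Graduate Institute of Industrial Engineering |
Files in this item:
File | Size | Format | |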
---|---|---|---|
ntu-111-2.pdf (not currently authorized for public access) | 3.01 MB | Adobe PDF | View/Open |
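Unless their copyright terms are specifically indicated otherwise, all items in the system are protected by copyright, with all rights reserved.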