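Application of Deep Reinforcement Learning to Chemical Process Control: Stirred-Tank Heating Process and Ethylene-Vinyl Acetate Polymerization System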

吳槃昕; Pan-Hsin Wu

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91957

Title:	Application of Deep Reinforcement Learning to Chemical Process Control: Stirred-Tank Heating Process and Ethylene-Vinyl Acetate Polymerization System
Authors:	吳槃昕 Pan-Hsin Wu
Advisor:	吳哲夫 Jeffrey D. Ward
Keywords:	Process control, Reinforcement learning, Actor-critic algorithm, Polymerization process
Publication Year:	2023
Degree:	Master's
Abstract:	Reinforcement learning (RL) is a machine-learning framework in which an agent collects experience by interacting with its environment, improves itself from that experience, and eventually acquires a control policy. In this study, we use the deep deterministic policy gradient (DDPG) method, one of the algorithms of deep reinforcement learning (DRL), to train DRL controllers for single-input single-output (SISO) systems. Using a stirred-tank heating process modeled by a first-order transfer function as the first example, we examine how the basic DRL settings, including the state definition and the reward function, affect control behavior. We also show that the DRL controller can be updated online, allowing it to adapt quickly when the process changes. In the second part, the settings identified in the first part are applied to train a DRL controller for a nonlinear continuous ethylene-vinyl acetate (EVA) polymerization process. Grade transition in the EVA polymerization process is a challenging control problem because the process is highly nonlinear and the melt index (MI), the key quality control variable, spans an extremely wide range, from single digits to nearly one thousand. The EVA polymerization process model used in this work is a data-driven model based on a gated recurrent unit (GRU), developed in a previous study from real industrial plant data.
The control results for the stirred-tank heating process and the EVA polymerization system are presented, and the performance of the DRL controller is compared with that of a PI controller. The results show that the DRL controller outperforms the PI controller in most cases, suggesting that it is a strong candidate for such challenging nonlinear control problems.
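
The stirred-tank heating example can be sketched as a first-order process, G(s) = K/(τs + 1), simulated step by step so that any controller (a PI baseline here, or a trained DDPG actor) plugs into the same loop and is scored by the same reward. This is a minimal sketch under stated assumptions: the gain, time constant, time step, PI tuning, and the negative-absolute-error reward are illustrative choices, and the names `FirstOrderHeatingTank`, `PIController`, `reward`, and `run_episode` are hypothetical, not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class FirstOrderHeatingTank:
    """Hypothetical stirred-tank heating model with G(s) = K / (tau*s + 1).

    y is the outlet-temperature deviation and u the heater-input deviation.
    All parameter values are illustrative assumptions, not thesis values.
    """
    K: float = 2.0     # process gain (assumed)
    tau: float = 10.0  # time constant (assumed)
    dt: float = 0.1    # integration step (assumed)
    y: float = 0.0     # start at steady state

    def step(self, u: float) -> float:
        # Explicit Euler step of tau * dy/dt = -y + K * u
        self.y += self.dt * (-self.y + self.K * u) / self.tau
        return self.y


def reward(setpoint: float, y: float) -> float:
    """One possible reward (an assumption): negative absolute tracking error."""
    return -abs(setpoint - y)


@dataclass
class PIController:
    """Discrete PI controller, u = Kc * (e + (1/tau_i) * integral of e dt)."""
    Kc: float
    tau_i: float
    dt: float
    integral: float = 0.0

    def act(self, setpoint: float, y: float) -> float:
        e = setpoint - y
        self.integral += e * self.dt  # rectangular integration of the error
        return self.Kc * (e + self.integral / self.tau_i)


def run_episode(controller, setpoint: float = 1.0, n_steps: int = 1000):
    """Closed-loop simulation; returns final output and accumulated reward.

    The 'state' seen by the controller is (setpoint, y), an assumed choice.
    """
    plant = FirstOrderHeatingTank()
    y, total_reward = plant.y, 0.0
    for _ in range(n_steps):
        u = controller.act(setpoint, y)
        y = plant.step(u)
        total_reward += reward(setpoint, y)
    return y, total_reward


if __name__ == "__main__":
    # tau_i = tau cancels the process pole; Kc = 1 is an arbitrary tuning.
    y_final, total = run_episode(PIController(Kc=1.0, tau_i=10.0, dt=0.1))
    print(f"final y = {y_final:.4f}, cumulative reward = {total:.2f}")
```

Because the loop only calls `act(setpoint, y)`, a trained DDPG actor could in principle replace `PIController` and be compared using the same reward; the actual state definition and reward function studied in the thesis are not given in this record.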
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91957
DOI:	10.6342/NTU202300989
Fulltext Rights:	Not authorized
Appears in Collections:	Department of Chemical Engineering

Files in This Item:

File	Size	Format
ntu-112-2.pdf (Restricted Access)	9.68 MB	Adobe PDF
