Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92260
Title: | Towards Generalizable and Interpretable Reinforcement Learning |
Author: | Guan-Ting Liu |
Advisor: | Pu-Jen Cheng |
Keywords: | Machine Learning, Reinforcement Learning, Generalization, Interpretability, Programmatic Reinforcement Learning |
Publication Year: | 2024 |
Degree: | Doctoral |
Abstract: | Deep Reinforcement Learning (DRL) is a critical field within contemporary machine learning, with widespread applications in robotics, autonomous vehicles, financial trading, strategic games, and generative artificial intelligence. Because DRL learns to achieve specific objectives through interaction with an environment, effectively learning and extracting representations that generalize across diverse input states remains a persistent challenge. Beyond learning policies efficiently and effectively, reinforcement learning research also focuses on producing policies that both generalize and remain comprehensible to humans. The interpretability and verifiability of agent systems are paramount in domains such as autonomous vehicles, financial trading, and healthcare, yet validating and explaining policies learned through reinforcement learning is a significant challenge. This thesis addresses these issues by leveraging the similarity between distinct input states and domain-specific languages to enhance, respectively, the generalizability and interpretability of reinforcement learning. To improve generalizability, we compare the features of different states and compute their similarity, then evaluate the extent and potential of the resulting improvement across multiple environments. To improve interpretability, we compose programs to generate policies describing fine-grained behaviors beyond those in existing datasets; this approach outperforms conventional methods in specific domains and can foster more adaptive reinforcement learning policies. Another promising direction combines programmatic reinforcement learning with finite-state machines to express complex behaviors more effectively and solve long-horizon tasks. The methods proposed in this thesis achieve strong results across many different testing environments and can be combined with other regularization or interpretability-enhancing techniques to further improve overall generalizability and interpretability. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92260 |
DOI: | 10.6342/NTU202400365 |
Full-text License: | Authorized (campus access only) |
Appears in Collections: | Graduate Institute of Networking and Multimedia |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-112-1.pdf (currently not authorized for public access) | 17.57 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.