Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98342
| Title: | 強化學習之反學習 Towards Unlearning in Reinforcement Learning |
| Author: | 楊凱恩 Kai-En Yang |
| Advisor: | 孫紹華 Shao-Hua Sun |
| Keywords: | reinforcement learning, machine unlearning, machine learning, deep learning, state unlearning |
| Publication Year: | 2025 |
| Degree: | Master's |
| Abstract: | The growing reliance on user data has drawn attention to the emerging field of machine unlearning, which aims to selectively remove the influence of specific data (or groups of data) from machine learning models without full re-training. While machine unlearning has been studied extensively for classification and generative models, its application to reinforcement learning remains largely unexplored. Reinforcement learning poses unique challenges because of its sequential decision-making nature. In this paper, we address these challenges by defining unlearning in reinforcement learning as the removal of information about transitions at specific states, so that the part of the environment related to those states becomes unexplored from the agent's perspective. We propose a formal mathematical framework for exact unlearning, refine the re-training strategy, and introduce an efficient unlearning algorithm that incorporates Gaussian noise into both value-based and policy-based methods. Experiments across discrete and continuous state spaces demonstrate effective unlearning. The proposed algorithm consistently matches the gold-standard baseline of re-training while requiring significantly less training time. When applied to primacy bias, it also outperforms an existing baseline, supporting its broader practical applicability. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98342 |
| DOI: | 10.6342/NTU202502402 |
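| Full-Text Authorization: | Not authorized |
| Electronic Full-Text Release Date: | N/A |
| Appears in Collections: | Graduate Institute of Communication Engineering |
Files in This Item:
| File | Size | Format | |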
|---|---|---|---|
| ntu-113-2.pdf (not authorized for public access) | 23.31 MB | Adobe PDF | |
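All items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.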
