NTU Theses and Dissertations Repository / 電機資訊學院 (College of Electrical Engineering and Computer Science) / 電信工程學研究所 (Graduate Institute of Communication Engineering)
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98342
Title: 強化學習之反學習 (Towards Unlearning in Reinforcement Learning)
Authors: 楊凱恩 (Kai-En Yang)
Advisor: 孫紹華 (Shao-Hua Sun)
Keywords: reinforcement learning, machine unlearning, machine learning, deep learning, state unlearning
Publication Year: 2025
Degree: Master's (碩士)
Abstract: The growing reliance on user data has brought attention to the emerging field of machine unlearning, which focuses on selectively removing the influence of specific data (or groups of data) from machine learning models without requiring full re-training. While machine unlearning has been extensively studied in classification and generative models, its application to reinforcement learning remains largely unexplored. Reinforcement learning poses unique challenges due to its sequential decision-making nature. This thesis addresses these challenges by defining unlearning in reinforcement learning as the removal of information about transitions at specific states, rendering the environment related to those states unexplored for the agent. It proposes a formal mathematical framework for exact unlearning, refines the re-training strategy, and introduces an efficient unlearning algorithm that incorporates Gaussian noise into both value-based and policy-based methods. Experimental results across discrete and continuous state spaces demonstrate effective unlearning performance. The proposed algorithm consistently matches the golden baseline of re-training while requiring less training time. Applications to the primacy bias further illustrate superior performance compared to an existing baseline, validating its broader practical applicability.
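The full text is restricted, so the algorithm's details are not available here. Purely as a loose, hypothetical illustration of the idea the abstract describes (resetting an agent's knowledge of chosen states with Gaussian noise so they appear unexplored), a tabular Q-learning sketch might look like the following; the table sizes, `unlearn` helper, and noise scale `sigma` are all assumptions, not the thesis's actual method:

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 6, 3
q_init = np.zeros((n_states, n_actions))  # initial table = "unexplored"

# Stand-in for training: pretend the agent has learned nonzero values everywhere.
q_trained = q_init + rng.normal(5.0, 0.1, size=(n_states, n_actions))

def unlearn(q_table, states, sigma=0.01):
    """Reset the rows for `states` to their initial values plus small
    Gaussian noise, so those states look unexplored to the agent
    rather than being an exact (detectable) re-initialization."""
    q_new = q_table.copy()
    for s in states:
        q_new[s] = q_init[s] + rng.normal(0.0, sigma, size=n_actions)
    return q_new

forgotten = [2, 4]
q_unlearned = unlearn(q_trained, forgotten)
```

After this sketch runs, the forgotten rows sit near their initialization while all other rows keep their learned values, which is the behavior the abstract attributes to effective state unlearning.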
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98342
DOI: 10.6342/NTU202502402
Fulltext Rights: Not authorized (未授權)
Embargo lift date: N/A
Appears in Collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in This Item:
File: ntu-113-2.pdf | Size: 23.31 MB | Format: Adobe PDF | Access: Restricted


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
