  1. NTU Theses and Dissertations Repository
  2. College of Engineering
  3. Department of Engineering Science and Ocean Engineering
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99788
Title: 基於強化學習實現動態功率限制之顯示卡熱保護機制
A Reinforcement Learning Based Dynamic Power Limiting Thermal Protection Mechanism for GPU
Author: Yu-Han Chiu (邱昱翰)
Advisor: Jia-Han Li (李佳翰)
Keywords: Reinforcement Learning, Dynamic Power Capping, GPU Thermal Management, Cooling Failure Resilience
Publication Year: 2025
Degree: Master's
Abstract:
As GPU compute density in data centers continues to climb, a sudden failure of the cooling system can push chip temperatures beyond safe limits within seconds, triggering firmware throttling and accelerating hardware degradation. This study proposes a real-time GPU dynamic power-limiting framework tailored to second-scale power–temperature dynamics. The framework replaces a conventional anti-windup PID controller with a reinforcement-learning (RL) agent based on Proximal Policy Optimization (PPO) and refines both the observation and action spaces to maximize safety and generalization.
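The abstract does not specify the exact form of the anti-windup PID baseline. A minimal sketch of one common variant (back-calculation anti-windup), with assumed gains, a 2 s control period matching the RL action rate, and the 100–275 W limit range reported in the experiments, might look like:

```python
class AntiWindupPID:
    """PID power-limit controller with back-calculation anti-windup.

    Output is an absolute power-limit in watts, biased at the GPU's
    maximum cap so a cool GPU runs unthrottled; when the output
    saturates at the hardware bounds, the saturation excess is fed
    back into the integrator so it cannot wind up.
    """

    def __init__(self, kp, ki, kd, out_min, out_max, kaw=1.0, dt=2.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self.kaw, self.dt = kaw, dt            # anti-windup gain, period (s)
        self.bias = out_max                    # full power when no error
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement          # negative when too hot
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        unsat = (self.bias + self.kp * error
                 + self.ki * self.integral + self.kd * derivative)
        sat = min(max(unsat, self.out_min), self.out_max)
        # Back-calculation: bleed the integrator by the clipped excess.
        self.integral += (error + self.kaw * (sat - unsat)) * self.dt
        return sat
```

For example, with `pid = AntiWindupPID(kp=2.0, ki=0.5, kd=0.1, out_min=100.0, out_max=275.0)`, calling `pid.step(setpoint=80.0, measurement=gpu_temp)` every 2 s yields the next power-limit; all numeric values here are illustrative, not the thesis's tuning.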
Methodologically, we (1) confine the observation space to four highly relevant features (temperature flag, power-limit, instantaneous power draw, and previous power-limit increment), deliberately excluding fan-speed signals that cause policy overfitting; (2) define the action space as a single continuous power-limit adjustment (±50 W every 2 s); (3) design a reward function that combines high-priority over-temperature penalties with secondary power rewards to enforce a "safety-first" policy; and (4) apply Generalized Advantage Estimation to mitigate the multi-second delay between power changes and thermal feedback.
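The observation, action, and reward design in points (1)–(3), together with the advantage estimator in point (4), can be sketched as a toy environment. The first-order thermal model, reward weights, safety threshold, and GAE coefficients below are illustrative assumptions, not the thesis's actual values:

```python
T_MAX = 83.0                   # assumed safety ceiling (deg C)
P_MIN, P_MAX = 100.0, 275.0    # RTX 4070 SUPER power-limit range (W)

def clip(x, lo, hi):
    return max(lo, min(hi, x))

class GpuPowerEnv:
    """Toy GPU thermal environment with first-order heat dynamics.
    Observation: [over-temp flag, power-limit, instantaneous power,
    previous power-limit increment]; fan speed is deliberately excluded."""

    def __init__(self, dt=2.0, tau=20.0, k_heat=0.25, t_ambient=30.0):
        self.dt, self.tau = dt, tau
        self.k_heat, self.t_ambient = k_heat, t_ambient
        self.reset()

    def reset(self):
        self.temp, self.limit, self.prev_delta = 60.0, P_MAX, 0.0
        return self._obs()

    def _obs(self):
        power = self.limit   # assume a fully loaded GPU draws at its cap
        return [float(self.temp > T_MAX), self.limit, power, self.prev_delta]

    def step(self, action):
        # Continuous power-limit increment, clipped to +/-50 W per 2 s step.
        delta = clip(action, -50.0, 50.0)
        self.limit = clip(self.limit + delta, P_MIN, P_MAX)
        self.prev_delta = delta
        # First-order relaxation toward the power-driven equilibrium temp.
        t_eq = self.t_ambient + self.k_heat * self.limit
        self.temp += (t_eq - self.temp) * self.dt / self.tau
        # Tiered reward: over-temperature penalty dominates the power term.
        over = max(0.0, self.temp - T_MAX)
        reward = -10.0 * over + self.limit / P_MAX
        return self._obs(), reward, False, {}

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one finite trajectory."""
    adv, last = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        next_v = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_v - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv
```

Exponentially discounting the one-step TD errors (the `gae` loop) is what smooths credit assignment across the multi-second lag between a power-limit change and its thermal effect.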
Experiments are conducted on a single NVIDIA RTX 4070 SUPER (power-limit adjustable from 100 W to 275 W). In cross-fan-configuration tests, the RL policy, trained without fan-speed inputs, adaptively raises or lowers power based solely on actual temperature, avoiding the hard-coded "fan-speed → power" mapping that undermines generalization. Under a cooling-failure scenario, the RL controller shortens rise time and settling time by 79% and 60%, respectively, compared with the anti-windup PID, while halving power oscillations from ±20 W to ±10 W. When cooling is restored, the RL agent raises the power-limit to its maximum in 30 s, 23% faster than the PID, without any temperature violations. Further validation on three high-load workloads (LLaMA-7B LoRA fine-tuning, ResNet CIFAR-100 training, and Seq2Seq translation, spanning the PyTorch and TensorFlow frameworks) shows the policy maintains core temperature below the 85 °C firmware throttle threshold at 98–100% GPU utilization with stable power output.
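Rise time and settling time, the two recovery metrics compared above, can be computed from a logged temperature trace roughly as follows; the 10–90% rise definition, the ±2 °C settling band, and the 2 s sample period are assumptions for illustration, not the thesis's stated definitions:

```python
def rise_time(trace, start, target, dt=2.0):
    """Classic 10-90% rise time: seconds for the response to travel
    from 10% to 90% of the way from `start` to `target`."""
    span = target - start
    lo = hi = None
    for i, x in enumerate(trace):
        progress = (x - start) / span     # 0 at start, 1 at target
        if lo is None and progress >= 0.1:
            lo = i
        if progress >= 0.9:
            hi = i
            break
    if lo is None or hi is None:
        return None                       # target band never reached
    return (hi - lo) * dt

def settling_time(trace, target, band=2.0, dt=2.0):
    """Seconds until the trace enters and stays within +/-band of target."""
    last_violation = -1
    for i, x in enumerate(trace):
        if abs(x - target) > band:
            last_violation = i
    idx = last_violation + 1
    return idx * dt if idx < len(trace) else None
```

For a cooling trace such as `[95, 90, 86, 83, 81.5, 80.5, 80.2, 80.1, 80.0, 80.0]` sampled every 2 s with target 80 °C, these return a 6 s rise time and an 8 s settling time.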
Overall, the proposed RL architecture delivers faster thermal suppression and smoother power control than a traditional PID across both cooling-failure and extreme-environment scenarios. It significantly lowers over-temperature risk and power noise while remaining task- and framework-agnostic, providing a purely software-based solution for GPU thermal-power co-management that enhances the resilience and energy efficiency of high-density AI training servers.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99788
DOI: 10.6342/NTU202501824
Full-text license: Authorized (access restricted to campus)
Electronic full-text release date: 2030-07-14
Appears in Collections: Department of Engineering Science and Ocean Engineering

Files in this item:
ntu-113-2.pdf (Adobe PDF, 5.86 MB; restricted access)


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.

Contact Information
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
© NTU Library All Rights Reserved