NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90166
Title: Non-stationary Extreme Bandits: An Optimization Case Study (非穩態下之極值多臂吃角子老虎機於數值優化領域之探討)
Author: Po-Ju Wu (吳柏儒)
Advisor: Tian-Li Yu (于天立)
Keywords: multi-armed bandits, order statistics, non-stationarity, extreme value, reinforcement learning, machine learning, real-valued optimization, Monte-Carlo tree search, hyperparameter optimization, covariance matrix adaptation evolution strategy
Publication Year: 2023
Degree: Master's
Abstract: In engineering, scientific, and financial domains, we frequently encounter the challenge of making decisions with limited resources. This challenge can be framed as the multi-armed bandit problem, a fundamental concept in reinforcement learning. Existing research primarily focuses on optimizing the expected reward in stationary scenarios, but many real-world problems are non-stationary or call for maximizing the extreme (highest attainable) reward rather than the expected reward. To address this, we developed an algorithm that leverages order statistics together with an adaptive distribution model to optimize resource allocation in pursuit of extreme rewards in non-stationary environments. We applied our algorithm to three problems, namely real-valued optimization, Monte-Carlo tree search, and hyperparameter optimization for deep learning models, and compared it with classical multi-armed bandit algorithms.

In real-valued optimization, we used the covariance matrix adaptation evolution strategy (CMA-ES) as the optimizer and tested our algorithm on the multimodal benchmark functions from CEC2005, where it showed an advantage under limited sampling budgets. For Monte-Carlo tree search, we designed a reward function and ran experiments to assess the robustness of our algorithm in various scenarios; it showed a statistically significant advantage when the range of the reward function was unconstrained. For hyperparameter optimization, we devised a framework that incorporates a state-of-the-art architecture and, on the computer vision training problems we tested, achieved higher test-set accuracy than the original version.
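
The record page does not reproduce the thesis algorithm itself, so the following is only a rough Python sketch of the general problem setting: an extreme-value bandit that scores each arm by the largest reward observed in a recent sliding window (one common way to cope with non-stationarity) plus a UCB-style exploration bonus. The class name, window size, and bonus form are assumptions for illustration and do not represent the order-statistics method described in the abstract.

import math
import random
from collections import deque

class SlidingWindowMaxBandit:
    # Illustrative (hypothetical) extreme-value bandit policy:
    # each arm is scored by the maximum reward in a recent sliding
    # window plus a UCB-style exploration bonus. This is a generic
    # sketch of the problem setting, not the thesis algorithm.
    def __init__(self, n_arms, window=50, c=1.0):
        self.windows = [deque(maxlen=window) for _ in range(n_arms)]
        self.c = c      # exploration strength (assumed default)
        self.t = 0      # total pulls so far

    def select(self):
        self.t += 1
        scores = []
        for w in self.windows:
            if not w:                      # pull each arm once before scoring
                return len(scores)
            bonus = self.c * math.sqrt(math.log(self.t) / len(w))
            scores.append(max(w) + bonus)  # sample maximum is an order statistic
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        self.windows[arm].append(reward)

# Toy usage: two arms whose reward distributions drift over time.
bandit = SlidingWindowMaxBandit(n_arms=2)
for step in range(200):
    arm = bandit.select()
    drift = step / 200.0
    reward = random.gauss(0.0, 1.0 + drift) if arm == 0 else random.gauss(drift, 0.5)
    bandit.update(arm, reward)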
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90166
DOI: 10.6342/NTU202303980
Full-text authorization: Granted (open access worldwide)
Appears in collections: Department of Electrical Engineering (電機工程學系)

Files in this item:
File            Size     Format
ntu-111-2.pdf   8.72 MB  Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
