Skip navigation

DSpace

機構典藏 DSpace 系統致力於保存各式數位資料(如:文字、圖片、PDF)並使其易於取用。

點此認識 DSpace
DSpace logo
English
中文
  • 瀏覽論文
    • 校院系所
    • 出版年
    • 作者
    • 標題
    • 關鍵字
    • 指導教授
  • 搜尋 TDR
  • 授權 Q&A
    • 我的頁面
    • 接受 E-mail 通知
    • 編輯個人資料
  1. NTU Theses and Dissertations Repository
  2. 電機資訊學院
  3. 電機工程學系
請用此 Handle URI 來引用此文件: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91627
完整後設資料紀錄
DC 欄位值語言
dc.contributor.advisor張時中zh_TW
dc.contributor.advisorShi-Chung Changen
dc.contributor.author蕭詠隆zh_TW
dc.contributor.authorYung-Lung Xiaoen
dc.date.accessioned2024-02-20T16:16:39Z-
dc.date.available2024-02-21-
dc.date.copyright2024-02-20-
dc.date.issued2023-
dc.date.submitted2024-01-30-
dc.identifier.citation[BeP22] Dimitris Bertsimas, and Alex Paskov, “World-class interpretable poker”, Machine Learning, vol 111, pp. 3063-3083, June 2022
[BoT92] George E.P. Box and George C. Tiao, “Bayesian inference in statistical analysis”,April 1992 [Online].Available:https://onlinelibrary.wiley.com/doi/book/10.1002/9781118033197
[BrS19] Noam Brown, and Tuomas Sandholm, “Solving imperfect-information games via discounted regret minimization”, Proceedings of the AAAI Conference on Artificial Intelligence, July 2019
[BrS19] Noam Brown and Tuomas Sandholm, “Superhuman AI for multiplayer poker”, Science, vol 365, Issue 6456, pp. 885-890, July 2019
[FoD16] Lewis Forder and Benjamin James Dyson, “Behavioural and neural modulation of win-stay but not lose-shift strategies as a function of outcome value in Rock, Paper, Scissors”, Scientific Reports, Vol 6, pp. 1-8, September 2016
[Git23] Github:Rock, Paper, Scissors AI trained through self-play via counterfactual regret minimization algorithm. [Online].Available:https://github.com/cookm346/cfr_rps_r
[HHB11] Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown, “Sequential model-based optimization for general algorithm configuration”, International Conference on Learning and Intelligent Optimization, pp. 507-523, January 2011
[HSH05] Bret Hoehn, Finnegan Southey, Robert C. Holte ,and Valeriy Bulitko, “Effective short-term opponent exploitation in simplified poker”, AAAI Conference on Artificial Intelligence, July 2005
[Hub07] Douglas Hubbard, “How to measure anything: finding the value of intangibles in business”, John Wiley & Sons, 2007
[Huj07] 胡均立.賽局理論.[Online].Available:https://jinlihu.tripod.com/Chinese.htm
[KaT79] Daniel Kahneman and Amos Tversky, “Prospect theory: an analysis of decision under risk”, Econometrica, Vol 2, pp.263-292, March 1979
[Kuh50] Kuhn H.W, “Simplified two-person poker”, Contributions to the Theory of Games, 1950
[LQR08] Alessandro Lazaric, Mario Quaresimale,and Marcello Restelli, “On the usefulness of opponent modeling: the Kuhn Poker case study”, 7th International Joint Conference on Autonomous Agents and Multiagent Systems, May 2008
[LWJ21] Huale Li, Xuan Wang, Fengwei Jia, Yifan Li,and Qian Chen, “A survey of nash equilibrium strategy solving based on CFR”, Archives of Computational Methods in Engineering, vol 28, pp. 2749-2760, June 2021
[NeL13] Todd Neller and Marc Lanctot, “An introduction to counterfactual regret minimization”,[Online].Available:http://modelai.gettysburg.edu/2013/cfr/index.html#objectives
[OsR94] Martin J. Osborne and Ariel Rubinstein, “A course in game theory”, ,July 1994
[Scc21] 張時中.訊號控制與決策.[Online].Available: http://recipe.ee.ntu.edu.tw/C&D/index.htm
[SHH09] Finnegan Southey, Bret Hoehn, Robert C. Holte, “Effective short-term opponent exploitation in simplified poker”, Machine Learning, Vol 74, pp. 159-189, February 2009
[Wik23] Wiki:MBA智庫百科.二項分佈[Online].Available:https://wiki.mbalib.com/zh-tw/%E4%BA%8C%E9%A1%B9%E5%88%86%E5%B8%83
[Wik23] Wiki:零和賽局定義.[Online].Available:https://zh.wikipedia.org/zh-tw/%E9%9B%B6%E5%92%8C%E5%8D%9A%E5%BC%88
[Wik23] Wiki:無限重複賽局定義.[Online].Available:https://zh.wikipedia.org/zh-tw/%E9%87%8D%E5%A4%8D%E5%8D%9A%E5%BC%88
[Wtp23] 吳庭斌(2023).統計學.[Online].Available:http://web.ntpu.edu.tw/~wtp/course.html
[XuB18] CUI Jia-Xu and YANG Bo, “Survey on bayesian optimization methodology and applications”, Journal of Software, Vol 29, Issue 10, pp. 3068-3090, June 2018
[Yue23] 余清祥(2023).統計計算與模擬.[Online].Available:http://csyue.nccu.edu.tw/ch/index.htm
[ZJB07] Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione, “Regret minimization in games with incomplete information”, Advances in Neural Information Processing Systems 20, December 2007
[ZMG21] Hanshu Zhang, Frederic Moisan, and Cleotilde Gonzalez, “Rock-paper-scissors play: beyond the win-stay/lose-change strategy games”, Games ,Vol 12, June 2021
-
dc.identifier.urihttp://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91627-
dc.description.abstract零和重複賽局在國際政治、世界金融、軍事領域都有著廣泛應用的實際案例,例如冷戰時期的軍備競賽、各國央行的貨幣戰爭和兩國對峙的軍事偵察等等。其中最常見的情況主要分為兩種,一種是有如剪刀石頭布,雙方同時進行決策的靜態賽局;另一種則是有如庫恩撲克,行動有優先次序,且後行動者能夠觀察到先行動者的動態賽局。研究此賽局可以幫助在無法完全了解對方所有的資訊下,透過反覆觀察和學習來辨識對手的策略,並透過辨識出來的結果來調整自身的策略,用來取得在賽局中較高的收益。
此類賽局的策略主要分為兩種模式,分別為靜態均衡策略與動態調整策略。基於賽局理論的基礎,假設雙方皆為理性並彼此都知曉時,靜態均衡策略為雙方在對方策略不變時的最佳策略,許多用來計算零和重複賽局的靜態均衡策略演算法相繼出現,其中以反事實遺憾最小化演算法(CFR)和以其為基底之改良較被廣為人知。然而,現實生活中經常會因為種種不管是技術或心態層面的影響,使得遇到的對手,並非如賽局理論下的假設為理性的使用靜態均衡策略。在此情況下,一味地使用靜態均衡策略並不一定能夠帶來最高的期望收益。相對使用靜態均衡策略而言,動態調整策略可以利用與對手進行賽局時得到的資訊,計算對手偏離均衡策略的多寡,並用來更新原本的策略,進而達到增加預期收益的效果。若能將兩種策略相互結合並善加利用各自的優點,對於提升預期收益會有一定程度的幫助。
所以本論文探討在面對未知型態對手的情況下,應該如何進行觀察對手是否為理性地使用均衡策略,並同時以靜態均衡策略作為預設策略與對手進行賽局,直到取得一定可信度的判斷對手偏離均衡後,再切換成動態調整策略,根據對手偏離均衡策略的多寡進行自身策略的調整,接著進行下一回合的賽局,並利用此回合賽局的資訊重新調整對於對手偏離均衡策略的多寡,再次進行自身策略的調整,如此可以達到增加預期收益的效果。
本論文以剪刀石頭布和庫恩撲克(Kuhn Poker)作為研究案例,將零和重複賽局區分成靜態賽局和動態賽局來設計兩階段策略調整並呈現,且在較為複雜的庫恩撲克動態賽局中,在原有的型態辨識方法中添加加權係數之後,可以有效利用部分對手底牌未亮出的資訊,進而達到辨識對手型態並提升整體預期收益的效果。本論文研究在面對未知型態對手的情況下,應該如何進行觀察,並決定整體策略的架構流程與切換時機,用來達到提升預期收益的目的。本論文研究聚焦於剪刀石頭布的零和重複靜態賽局策略和庫恩撲克的零和重複動態賽局策略,針對兩階段策略調整的主要研究問題(P)以及相應的挑戰(C)和本論文提出的解決方法(M)如下:
P1. 有效整合靜態均衡策略與動態調整策略的策略優化設計問題:要如何透過兩階段策略調整的方式,將兩種策略進行有效整合之後,可保留各自的優點並改善使用單一策略的缺點。
C1. 要判斷在何種時機使用何種策略型態進行賽局,並在適當的情況下進行不同的策略,過程中包含對手型態的辨識以及切換時機的掌握是為一項挑戰。
M1. 兩階段策略調整之架構與流程設計:賽局初期先維持靜態均衡策略,再利用賽局中取得的資訊,決定切換成動態調整策略的時機。透過此架構流程能夠在未確認對手為非理性前保持均衡,也能在資訊充足的情況下,進行對手型態辨識並以此獲取較高的預期收益。
P2. 靜態均衡與動態調整策略的切換時機問題:要如何利用每回合賽局進行後的結果,蒐集有用的資訊配合假設檢定,快速判斷對手是否偏離均衡策略來決定切換兩階段策略的時間點。
C2. 要使用何種假設檢定進行判斷,並能兼顧準確以及迅速的前提下,能夠利用有效的資訊進行對手的辨識並搭配假設檢定決定策略切換的時機是為一項挑戰。
M2. 使用卡方適合度檢定作為依據,去判斷對手是否使用均衡策略進行賽局。若檢定出來的卡方值未達顯著,則接受對手使用均衡策略的假設;反之判斷對手並非使用均衡策略,本論文使用之策略即會從原本的靜態均衡策略,切換成動態調整策略進行往後賽局。
P3. 動態調整策略階段對手資訊掌握以及策略調整問題:在靜態或動態賽局中,當判斷對手並非使用均衡策略,要如何觀察對手偏離均衡策略的多寡,並以此為依據,進行自身的動態調整策略用來提升長期遊戲中的期望收益。
C3. 在切換成動態調整策略之後,由於在庫恩撲克賽局時對手資訊並非每一回合皆可獲得,例如對手底牌,較難準確預估對手策略偏移程度,因此要如何判斷對手的策略並以此為基準,再加上賽局中得到的資訊作為調整方向,是為一項挑戰。
M3. 在剪刀石頭布賽局時,使用貝氏推論(Bayesian Inference)去計算對手策略的偏移量,然而在庫恩撲克賽局時,對手資訊並非每回合可取得,可將原先貝氏推論中,事件發生的次數利用條件機率推導出加權係數來掌握對手未知資訊。再利用序列式模型貝氏優化(SMBO)的方法去決定自身策略的偏移量,用來獲取更高的預期收益。
本論文的發現和貢獻如下:
1. 創新設計兩階段策略調整的方法架構,透過決定切換策略的時機,整合利用靜態均衡策略和動態調整策略的優點,來提升剪刀石頭布與庫恩撲克賽局的預期收益。
2. 選取並應用卡方適合度檢定來決定切換策略時機。在90%信心水準下,剪刀石頭布靜態賽局由於策略形式單純為出拳的機率,辨識對手是否使均衡策略及偏離均衡的程度較易檢定出,平均能比庫恩撲克動態賽局提早20回合辨識出對手型態並開始切換策略。
3. 設計兩階段策略調整作法,優於完全依賴對手資訊進行動態調整。在需辨識對手型態的回合中,庫恩撲克賽局由於對手底牌資訊並非每回合可得知,故確認對手型態之前進行動態調整較易產生誤判,因此使用兩階段策略調整平均可提升7%預期收益;反之剪刀石頭布靜態賽局由於對手資訊較易取得,故與直接進行動態策略相比沒有取得較大優勢。
4. 設計兩階段策略調整作法,優於完全使用CFR演算法得出的均衡策略。根據貝氏推論並設計加權係數進行動態調整策略,在進行500次庫恩撲克賽局後,平均可提升40%預期收益。
zh_TW
dc.description.abstractZero-sum repeated games are widely used in real-world applications in international politics, global finance, and the military, such as the arms race during the Cold War, the currency wars between countries, and military reconnaissance between two countries. The most common cases are mainly divided into two types: one is a static game where both parties make decisions at the same time, such as rock-paper-scissors; the other is a dynamic game where actions have priority, such as Kuhn poker. The study of this game can help to identify the opponent's strategy through repeated observation and learning, and adjust their own strategy based on the identified results, in order to obtain a higher return in the game.
The strategies of this type of game are mainly divided into two modes: static equilibrium strategies and dynamic adjustment strategies. Based on the foundation of game theory, assuming that both sides are rational and know each other, the static equilibrium strategy is the best strategy for both sides when the opponent's strategy does not change. Many algorithms have been developed to calculate static equilibrium strategies for zero-sum repeated games, among which the counterfactual regret minimization algorithm (CFR) and its modifications are more well-known. However, in real life, due to various factors, whether technical or psychological, the opponents we encounter may not be as rational as the assumption under game theory to use static equilibrium strategies. In this case, simply using static equilibrium strategies may not necessarily lead to the highest expected return. Compared to using static equilibrium strategies, dynamic adjustment strategies can use the information obtained during the game with the opponent to calculate how much the opponent deviates from the equilibrium strategy, and use it to update the original strategy, thereby achieving the effect of increasing expected return. If the two strategies can be combined and each other's advantages can be well utilized, it will be helpful to improve the expected return to a certain extent.
Therefore, this paper explores how to observe the opponent in the case of an unknown opponent, and uses the static equilibrium strategy as the default strategy to play the game with the opponent until the opponent is judged to deviate from the equilibrium with a certain degree of credibility, and then switch to the dynamic adjustment strategy. According to the opponent's deviation from the equilibrium strategy, we adjust our own strategy. Then, the next round of the game is played, and the information of this round of the game is used to re-adjust the opponent's deviation from the equilibrium strategy, and then we adjust our own strategy again. This can achieve the effect of increasing the expected return.
This paper uses rock-paper-scissors and Kuhn poker as case studies to divide zero-sum repeated games into static games and dynamic games to design two-stage strategy adjustment. In the more complex Kuhn poker dynamic game, after adding weight coefficients to the original type recognition method, it can effectively use some of the opponent's hidden card information to achieve the effect of identifying the opponent's type and improving the overall expected return. This paper studies how to observe in the case of an unknown opponent, and determine the overall strategy framework, flow, and switching time to achieve the purpose of increasing expected return. This paper focuses on the zero-sum repeated static game strategy of rock-paper-scissors and the zero-sum repeated dynamic game strategy of Kuhn poker. The main research problems (P) of the two-stage strategy adjustment, the corresponding challenges (C), and the solutions (M) proposed by this paper are as follows:
P1. The problem of optimizing strategies through effective integration of static equilibrium strategies and dynamic adjustment strategies: How to integrate the two strategies effectively through a two-stage strategy adjustment process, preserving their respective advantages and addressing the shortcomings of using a single strategy.
C1. Determining the appropriate timing to employ different types of strategies in a game, recognizing opponent types, and mastering the timing of strategy switching pose a challenge.
M1. Framework and process design for two-stage strategy adjustment: Maintain a static equilibrium strategy in the early stages of the game and decide when to switch to a dynamic adjustment strategy based on information obtained during the game. This framework allows maintaining equilibrium before confirming opponent irrationality and enables opponent type recognition for higher expected return when sufficient information is available.
P2. The timing problem of switching between static equilibrium and dynamic adjustment strategies: How to collect useful information after each round of the game, conduct hypothesis testing to quickly determine if opponents deviate from equilibrium strategies, and decide the optimal time to switch between the two-stage strategies.
C2. Choosing a hypothesis test that balances accuracy and speed and effectively using information for opponent recognition and strategy switching is a challenge.
M2. Use the chi-square goodness-of-fit test to determine if opponents are using equilibrium strategies in the game. If the calculated chi-square value is not significant, accept the hypothesis that opponents are using equilibrium strategies. Otherwise, switch from the original static equilibrium strategy to the dynamic adjustment strategy for subsequent rounds.
P3. Opponent information acquisition and strategy adjustment in the dynamic adjustment strategy stage: When determining that opponents are not using equilibrium strategies, how to observe the degree of deviation from equilibrium and adjust one's own strategy based on this observation to enhance long-term expected return in static or dynamic games.
C3. After switching to the dynamic adjustment strategy, obtaining opponent information in each round, such as their hand in poker, poses a challenge in accurately estimating opponent strategy deviation. Determining opponent strategy and using acquired information as an adjustment direction is challenging.
M3. In rock-paper-scissors, use Bayesian Inference to calculate opponent strategy deviation. In Kuhn poker, where opponent information is not available every round, modify Bayesian Inference by incorporating weighted coefficients derived from conditional probabilities to handle unknown opponent information. Use Sequential Model-Based Optimization (SMBO) to decide the degree of strategy deviation for higher expected return.
The findings and contributions of this thesis are summarized as follows:
1. Innovative design of a two-stage strategy adjustment framework, which enhances expected returns in games such as rock-paper-scissors, and Kuhn Poker by integrating the advantages of static equilibrium strategies and dynamic adjustment strategies through the timely decision of strategy switching.
2. Selection and application of the chi-square goodness-of-fit test to determine the timing of strategy switching. At a 90% confidence level, the simplicity of the strategy format in rock-paper-scissors, where strategies are based on the probabilities of moves, makes it easier to detect whether opponents are using equilibrium strategies and the degree of deviation from equilibrium. On average, recognition of opponent types and the initiation of strategy switching can occur 20 rounds earlier compared to Kuhn Poker's dynamic game.
3. Design of a two-stage strategy adjustment approach, outperforming reliance solely on opponent information for dynamic adjustments. In rounds requiring opponent type identification, Kuhn Poker's dynamic game, given the lack of information about opponents' cards every round, is prone to misjudgments before confirming opponent types. Thus, using a two-stage strategy adjustment method can, on average, improve expected returns by 7%. Conversely, in the static game of rock-paper-scissors, where opponent information is readily available, there is no significant advantage compared to directly implementing dynamic strategies.
4. Design of a two-stage strategy adjustment approach, superior to equilibrium strategies obtained solely through Counterfactual Regret Minimization (CFR) algorithm. Utilizing Bayesian Inference and designing weighted coefficients for dynamic strategy adjustments, after 500 rounds of Kuhn Poker games, the average expected returns can be increased by 40%.
en
dc.description.provenanceSubmitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-02-20T16:16:39Z
No. of bitstreams: 0
en
dc.description.provenanceMade available in DSpace on 2024-02-20T16:16:39Z (GMT). No. of bitstreams: 0en
dc.description.tableofcontents致謝 i
摘要 ii
Abstract vi
圖片清單 xiv
表格清單 xvi
第一章 論文介紹 1
1.1 研究動機 1
1.2 文獻探討 1
1.3 研究範圍 2
1.4 論文組織架構 2
第二章 剪刀石頭布和庫恩撲克中動態策略調整問題定義 3
2.1 零和重複賽局簡介 3
2.1.1 賽局與零和重複賽局理論概述 3
2.1.2 剪刀石頭布規則 7
2.1.3 庫恩撲克規則 7
2.1.4 靜態賽局 10
2.1.5 動態賽局 10
2.2 撲克賽局策略研究之現況 10
2.2.1 靜態均衡策略 11
2.2.2 動態調整策略 12
2.3 對手型態辨識簡介 12
2.3.1 假設檢定介紹 13
2.3.2 貝氏推論應用 14
2.4 均衡與動態兩階段策略調整之方法架構設計 16
2.5 剪刀石頭布和庫恩撲克中動態策略調整問題定義 18
2.5.1 剪刀石頭布和庫恩撲克中動態策略調整之問題定義 18
2.5.2 剪刀石頭布和庫恩撲克中動態策略調整之挑戰 19
第三章 靜態與動態策略之應用 20
3.1 重複賽局數學模型定義 20
3.1.1 兩人剪刀石頭布賽局模型 20
3.1.2 庫恩撲克賽局模型 22
3.2 靜態均衡策略:CFR演算法 25
3.2.1 CFR演算法名詞介紹 25
3.2.2 CFR演算法流程架構 27
3.2.3 CFR演算法運算結果 33
3.3 動態調整策略:貝氏推論應用和SMBO形式策略調整 40
3.3.1 貝氏推論於對手策略推測之基礎 41
3.3.2 貝式推論演算法架構 44
3.3.3 貝式推論演算法測試結果 46
3.3.4 SMBO形式策略調整 48
3.3.5 SMBO形式策略調整之應用 49
3.4 總結 52
第四章 型態辨識與時機切換之設計 53
4.1 策略時機切換之設計 53
4.1.1 切換設計之重要性 53
4.1.2 卡方檢定應用於時機切換之方法 53
4.1.3 卡方適合度檢定架構 55
4.1.4 卡方適合度檢定之判斷時間 57
4.2 兩階段策略調整之整合與應用 60
4.2.1 策略整合之整體流程架構 60
4.2.2 策略整合之應用結果 61
4.3 應用於庫恩撲克之不足 64
第五章 貝氏推論於庫恩撲克賽局應用之改良 65
5.1 貝氏推論應用之改良 65
5.1.1 峰值估計之加權係數設計 65
5.1.2 加權係數方法之驗證 69
5.2 庫恩撲克賽局模擬結果 70
5.3總結 72
第六章 結論與未來研究方向 76
6.1 結論 76
6.2 未來研究方向 76
參考文獻 78
-
dc.language.isozh_TW-
dc.subject撲克zh_TW
dc.subject重複賽局zh_TW
dc.subject動態策略調整zh_TW
dc.subject貝氏推論zh_TW
dc.subject假設檢定zh_TW
dc.subjectHypothesis Testingen
dc.subjectBayesian Inferenceen
dc.subjectDynamic Strategy Adjustmenten
dc.subjectPokeren
dc.subjectRepeated Gameen
dc.title重複賽局之兩階段策略調整與切換時機設計:剪刀石頭布和庫恩撲克案例zh_TW
dc.titleTwo-Stage Strategy Adjustment and Timing Switching Design in Repeated Games with Imperfect Information: Case Studies of RPS and Kuhn Pokeren
dc.typeThesis-
dc.date.schoolyear112-1-
dc.description.degree碩士-
dc.contributor.oralexamcommittee連豊力;蔡崇聖;洪一薰;楊奕農zh_TW
dc.contributor.oralexamcommitteeFeng-Li Lian;Tsung-Sheng Tsai;I-Hsuan Hong;Yi-Nung Yangen
dc.subject.keyword重複賽局,撲克,假設檢定,貝氏推論,動態策略調整,zh_TW
dc.subject.keywordRepeated Game,Poker,Hypothesis Testing,Bayesian Inference,Dynamic Strategy Adjustment,en
dc.relation.page80-
dc.identifier.doi10.6342/NTU202400369-
dc.rights.note未授權-
dc.date.accepted2024-01-30-
dc.contributor.author-college電機資訊學院-
dc.contributor.author-dept電機工程學系-
顯示於系所單位:電機工程學系

文件中的檔案:
檔案 大小格式 
ntu-112-1.pdf
  未授權公開取用
3.85 MBAdobe PDF
顯示文件簡單紀錄


系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。

社群連結
聯絡資訊
10617臺北市大安區羅斯福路四段1號
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
意見箱
相關連結
館藏目錄
國內圖書館整合查詢 MetaCat
臺大學術典藏 NTU Scholars
臺大圖書館數位典藏館
本站聲明
© NTU Library All Rights Reserved