重複賽局之兩階段策略調整與切換時機設計:剪刀石頭布和庫恩撲克案例

蕭詠隆; Yung-Lung Xiao

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91627

標題:	重複賽局之兩階段策略調整與切換時機設計:剪刀石頭布和庫恩撲克案例 Two-Stage Strategy Adjustment and Timing Switching Design in Repeated Games with Imperfect Information: Case Studies of RPS and Kuhn Poker
作者:	蕭詠隆 Yung-Lung Xiao
指導教授:	張時中 Shi-Chung Chang
關鍵字:	重複賽局,撲克,假設檢定,貝氏推論,動態策略調整, Repeated Game,Poker,Hypothesis Testing,Bayesian Inference,Dynamic Strategy Adjustment,
出版年 :	2023
學位:	碩士
摘要:	零和重複賽局在國際政治、世界金融、軍事領域都有著廣泛應用的實際案例，例如冷戰時期的軍備競賽、各國央行的貨幣戰爭和兩國對峙的軍事偵察等等。其中最常見的情況主要分為兩種，一種是有如剪刀石頭布，雙方同時進行決策的靜態賽局；另一種則是有如庫恩撲克，行動有優先次序，且後行動者能夠觀察到先行動者的動態賽局。研究此賽局可以幫助在無法完全了解對方所有的資訊下，透過反覆觀察和學習來辨識對手的策略，並透過辨識出來的結果來調整自身的策略，用來取得在賽局中較高的收益。此類賽局的策略主要分為兩種模式，分別為靜態均衡策略與動態調整策略。基於賽局理論的基礎，假設雙方皆為理性並彼此都知曉時，靜態均衡策略為雙方在對方策略不變時的最佳策略，許多用來計算零和重複賽局的靜態均衡策略演算法相繼出現，其中以反事實遺憾最小化演算法(CFR)和以其為基底之改良較被廣為人知。然而，現實生活中經常會因為種種不管是技術或心態層面的影響，使得遇到的對手，並非如賽局理論下的假設為理性的使用靜態均衡策略。在此情況下，一味地使用靜態均衡策略並不一定能夠帶來最高的期望收益。相對使用靜態均衡策略而言，動態調整策略可以利用與對手進行賽局時得到的資訊，計算對手偏離均衡策略的多寡，並用來更新原本的策略，進而達到增加預期收益的效果。若能將兩種策略相互結合並善加利用各自的優點，對於提升預期收益會有一定程度的幫助。所以本論文探討在面對未知型態對手的情況下，應該如何進行觀察對手是否為理性地使用均衡策略，並同時以靜態均衡策略作為預設策略與對手進行賽局，直到取得一定可信度的判斷對手偏離均衡後，再切換成動態調整策略，根據對手偏離均衡策略的多寡進行自身策略的調整，接著進行下一回合的賽局，並利用此回合賽局的資訊重新調整對於對手偏離均衡策略的多寡，再次進行自身策略的調整，如此可以達到增加預期收益的效果。本論文以剪刀石頭布和庫恩撲克(Kuhn Poker)作為研究案例，將零和重複賽局區分成靜態賽局和動態賽局來設計兩階段策略調整並呈現，且在較為複雜的庫恩撲克動態賽局中，在原有的型態辨識方法中添加加權係數之後，可以有效利用部分對手底牌未亮出的資訊，進而達到辨識對手型態並提升整體預期收益的效果。本論文研究在面對未知型態對手的情況下，應該如何進行觀察，並決定整體策略的架構流程與切換時機，用來達到提升預期收益的目的。本論文研究聚焦於剪刀石頭布的零和重複靜態賽局策略和庫恩撲克的零和重複動態賽局策略，針對兩階段策略調整的主要研究問題(P)以及相應的挑戰(C)和本論文提出的解決方法(M)如下： P1. 有效整合靜態均衡策略與動態調整策略的策略優化設計問題：要如何透過兩階段策略調整的方式，將兩種策略進行有效整合之後，可保留各自的優點並改善使用單一策略的缺點。 C1. 要判斷在何種時機使用何種策略型態進行賽局，並在適當的情況下進行不同的策略，過程中包含對手型態的辨識以及切換時機的掌握是為一項挑戰。 M1. 兩階段策略調整之架構與流程設計:賽局初期先維持靜態均衡策略，再利用賽局中取得的資訊，決定切換成動態調整策略的時機。透過此架構流程能夠在未確認對手為非理性前保持均衡，也能在資訊充足的情況下，進行對手型態辨識並以此獲取較高的預期收益。 P2. 靜態均衡與動態調整策略的切換時機問題：要如何利用每回合賽局進行後的結果，蒐集有用的資訊配合假設檢定，快速判斷對手是否偏離均衡策略來決定切換兩階段策略的時間點。 C2. 要使用何種假設檢定進行判斷，並能兼顧準確以及迅速的前提下，能夠利用有效的資訊進行對手的辨識並搭配假設檢定決定策略切換的時機是為一項挑戰。 M2. 使用卡方適合度檢定作為依據，去判斷對手是否使用均衡策略進行賽局。若檢定出來的卡方值未達顯著，則接受對手使用均衡策略的假設；反之判斷對手並非使用均衡策略，本論文使用之策略即會從原本的靜態均衡策略，切換成動態調整策略進行往後賽局。 P3. 動態調整策略階段對手資訊掌握以及策略調整問題：在靜態或動態賽局中，當判斷對手並非使用均衡策略，要如何觀察對手偏離均衡策略的多寡，並以此為依據，進行自身的動態調整策略用來提升長期遊戲中的期望收益。 C3. 在切換成動態調整策略之後，由於在庫恩撲克賽局時對手資訊並非每一回合皆可獲得，例如對手底牌，較難準確預估對手策略偏移程度，因此要如何判斷對手的策略並以此為基準，再加上賽局中得到的資訊作為調整方向，是為一項挑戰。 M3. 在剪刀石頭布賽局時，使用貝氏推論(Bayesian Inference)去計算對手策略的偏移量，然而在庫恩撲克賽局時，對手資訊並非每回合可取得，可將原先貝氏推論中，事件發生的次數利用條件機率推導出加權係數來掌握對手未知資訊。再利用序列式模型貝氏優化(SMBO)的方法去決定自身策略的偏移量，用來獲取更高的預期收益。本論文的發現和貢獻如下： 1. 創新設計兩階段策略調整的方法架構，透過決定切換策略的時機，整合利用靜態均衡策略和動態調整策略的優點，來提升剪刀石頭布與庫恩撲克賽局的預期收益。 2. 選取並應用卡方適合度檢定來決定切換策略時機。在90%信心水準下，剪刀石頭布靜態賽局由於策略形式單純為出拳的機率，辨識對手是否使均衡策略及偏離均衡的程度較易檢定出，平均能比庫恩撲克動態賽局提早20回合辨識出對手型態並開始切換策略。 3. 設計兩階段策略調整作法，優於完全依賴對手資訊進行動態調整。在需辨識對手型態的回合中，庫恩撲克賽局由於對手底牌資訊並非每回合可得知，故確認對手型態之前進行動態調整較易產生誤判，因此使用兩階段策略調整平均可提升7%預期收益；反之剪刀石頭布靜態賽局由於對手資訊較易取得，故與直接進行動態策略相比沒有取得較大優勢。 4. 設計兩階段策略調整作法，優於完全使用CFR演算法得出的均衡策略。根據貝氏推論並設計加權係數進行動態調整策略，在進行500次庫恩撲克賽局後，平均可提升40%預期收益。 Zero-sum repeated games are widely used in real-world applications in international politics, global finance, and the military, such as the arms race during the Cold War, the currency wars between countries, and military reconnaissance between two countries. The most common cases are mainly divided into two types: one is a static game where both parties make decisions at the same time, such as rock-paper-scissors; the other is a dynamic game where actions have priority, such as Kuhn poker. The study of this game can help to identify the opponent's strategy through repeated observation and learning, and adjust their own strategy based on the identified results, in order to obtain a higher return in the game. The strategies of this type of game are mainly divided into two modes: static equilibrium strategies and dynamic adjustment strategies. Based on the foundation of game theory, assuming that both sides are rational and know each other, the static equilibrium strategy is the best strategy for both sides when the opponent's strategy does not change. Many algorithms have been developed to calculate static equilibrium strategies for zero-sum repeated games, among which the counterfactual regret minimization algorithm (CFR) and its modifications are more well-known. However, in real life, due to various factors, whether technical or psychological, the opponents we encounter may not be as rational as the assumption under game theory to use static equilibrium strategies. In this case, simply using static equilibrium strategies may not necessarily lead to the highest expected return. Compared to using static equilibrium strategies, dynamic adjustment strategies can use the information obtained during the game with the opponent to calculate how much the opponent deviates from the equilibrium strategy, and use it to update the original strategy, thereby achieving the effect of increasing expected return. If the two strategies can be combined and each other's advantages can be well utilized, it will be helpful to improve the expected return to a certain extent. Therefore, this paper explores how to observe the opponent in the case of an unknown opponent, and uses the static equilibrium strategy as the default strategy to play the game with the opponent until the opponent is judged to deviate from the equilibrium with a certain degree of credibility, and then switch to the dynamic adjustment strategy. According to the opponent's deviation from the equilibrium strategy, we adjust our own strategy. Then, the next round of the game is played, and the information of this round of the game is used to re-adjust the opponent's deviation from the equilibrium strategy, and then we adjust our own strategy again. This can achieve the effect of increasing the expected return. This paper uses rock-paper-scissors and Kuhn poker as case studies to divide zero-sum repeated games into static games and dynamic games to design two-stage strategy adjustment. In the more complex Kuhn poker dynamic game, after adding weight coefficients to the original type recognition method, it can effectively use some of the opponent's hidden card information to achieve the effect of identifying the opponent's type and improving the overall expected return. This paper studies how to observe in the case of an unknown opponent, and determine the overall strategy framework, flow, and switching time to achieve the purpose of increasing expected return. This paper focuses on the zero-sum repeated static game strategy of rock-paper-scissors and the zero-sum repeated dynamic game strategy of Kuhn poker. The main research problems (P) of the two-stage strategy adjustment, the corresponding challenges (C), and the solutions (M) proposed by this paper are as follows: P1. The problem of optimizing strategies through effective integration of static equilibrium strategies and dynamic adjustment strategies: How to integrate the two strategies effectively through a two-stage strategy adjustment process, preserving their respective advantages and addressing the shortcomings of using a single strategy. C1. Determining the appropriate timing to employ different types of strategies in a game, recognizing opponent types, and mastering the timing of strategy switching pose a challenge. M1. Framework and process design for two-stage strategy adjustment: Maintain a static equilibrium strategy in the early stages of the game and decide when to switch to a dynamic adjustment strategy based on information obtained during the game. This framework allows maintaining equilibrium before confirming opponent irrationality and enables opponent type recognition for higher expected return when sufficient information is available. P2. The timing problem of switching between static equilibrium and dynamic adjustment strategies: How to collect useful information after each round of the game, conduct hypothesis testing to quickly determine if opponents deviate from equilibrium strategies, and decide the optimal time to switch between the two-stage strategies. C2. Choosing a hypothesis test that balances accuracy and speed and effectively using information for opponent recognition and strategy switching is a challenge. M2. Use the chi-square goodness-of-fit test to determine if opponents are using equilibrium strategies in the game. If the calculated chi-square value is not significant, accept the hypothesis that opponents are using equilibrium strategies. Otherwise, switch from the original static equilibrium strategy to the dynamic adjustment strategy for subsequent rounds. P3. Opponent information acquisition and strategy adjustment in the dynamic adjustment strategy stage: When determining that opponents are not using equilibrium strategies, how to observe the degree of deviation from equilibrium and adjust one's own strategy based on this observation to enhance long-term expected return in static or dynamic games. C3. After switching to the dynamic adjustment strategy, obtaining opponent information in each round, such as their hand in poker, poses a challenge in accurately estimating opponent strategy deviation. Determining opponent strategy and using acquired information as an adjustment direction is challenging. M3. In rock-paper-scissors, use Bayesian Inference to calculate opponent strategy deviation. In Kuhn poker, where opponent information is not available every round, modify Bayesian Inference by incorporating weighted coefficients derived from conditional probabilities to handle unknown opponent information. Use Sequential Model-Based Optimization (SMBO) to decide the degree of strategy deviation for higher expected return. The findings and contributions of this thesis are summarized as follows: 1. Innovative design of a two-stage strategy adjustment framework, which enhances expected returns in games such as rock-paper-scissors, and Kuhn Poker by integrating the advantages of static equilibrium strategies and dynamic adjustment strategies through the timely decision of strategy switching. 2. Selection and application of the chi-square goodness-of-fit test to determine the timing of strategy switching. At a 90% confidence level, the simplicity of the strategy format in rock-paper-scissors, where strategies are based on the probabilities of moves, makes it easier to detect whether opponents are using equilibrium strategies and the degree of deviation from equilibrium. On average, recognition of opponent types and the initiation of strategy switching can occur 20 rounds earlier compared to Kuhn Poker's dynamic game. 3. Design of a two-stage strategy adjustment approach, outperforming reliance solely on opponent information for dynamic adjustments. In rounds requiring opponent type identification, Kuhn Poker's dynamic game, given the lack of information about opponents' cards every round, is prone to misjudgments before confirming opponent types. Thus, using a two-stage strategy adjustment method can, on average, improve expected returns by 7%. Conversely, in the static game of rock-paper-scissors, where opponent information is readily available, there is no significant advantage compared to directly implementing dynamic strategies. 4. Design of a two-stage strategy adjustment approach, superior to equilibrium strategies obtained solely through Counterfactual Regret Minimization (CFR) algorithm. Utilizing Bayesian Inference and designing weighted coefficients for dynamic strategy adjustments, after 500 rounds of Kuhn Poker games, the average expected returns can be increased by 40%.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91627
DOI:	10.6342/NTU202400369
全文授權:	未授權
顯示於系所單位：	電機工程學系

文件中的檔案：

檔案	大小	格式
ntu-112-1.pdf 未授權公開取用	3.85 MB	Adobe PDF

顯示文件完整紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。

DSpace

機構典藏 DSpace 系統致力於保存各式數位資料（如：文字、圖片、PDF）並使其易於取用。