mcts下載 - mcts原始碼下載

使用 Python 程式碼的蒙特卡羅樹搜尋 (MCTS) 演算法教學。應用MCTS為簡單遊戲創建AI。

在本教程中，我們將解釋蒙特卡羅樹搜尋演算法和程式碼的每個部分。最近我們應用MCTS來開發我們的遊戲。

程式碼是通用的，僅假設您熟悉基本的 Python。我們已經針對我們的遊戲進行了解釋。如果你想將它用於你的專案或遊戲，你將不得不稍微修改我下面提到的功能。

以下是該遊戲的 Playstore 連結：Sudo Tic Tac Toe。

我們的遊戲規則（模式1）如下：

遊戲像數獨一樣在 9 x 9 的網格上進行。
這個大的 9 × 9 網格被分成 9 個較小的 3 × 3 網格（局部板）。
遊戲的目標是贏得 9 個本地棋盤中的任何一個。
你的舉動決定了人工智慧必須在哪個本地棋盤上採取行動，反之亦然。
例如，您在本地 5 號棋盤的位置 1 下棋。
普通井字遊戲的規則適用於當地棋盤。

為什麼選擇MCTS？

正如您所看到的，這款遊戲具有非常高的分支因子。對於第一步，整個棋盤都是空的。所以有81個空位。第一回合有 81 種可能的動作。對於第二回合，應用規則 4，它有 8 或 9 種可能的移動。

對於前 2 個動作，這會產生 81*9 = 729 種可能的組合。因此，隨著遊戲的進行，可能的組合數量會增加，從而產生高分支因子。對於我們遊戲的兩種模式來說，分支因子都非常高。對於具有如此高分支因子的遊戲，不可能應用極小極大演算法。 MCTS 演算法適用於此類遊戲。

另外，正如您在玩遊戲時所看到的那樣，人工智慧採取行動所需的時間大約只有一秒鐘。因此 MCTS 運行速度很快。 MCTS 已應用於這兩種遊戲模式。

下面我們用Python來示範MCTS程式碼。首先我們需要導入 numpy 和 defaultdict。

 import numpy as np
from collections import defaultdict

定義 MCTS 類，如下所示。

 class MonteCarloTreeSearchNode ():
    def __init__ ( self , state , parent = None , parent_action = None ):
        self . state = state
        self . parent = parent
        self . parent_action = parent_action
        self . children = []
        self . _number_of_visits = 0
        self . _results = defaultdict ( int )
        self . _results [ 1 ] = 0
        self . _results [ - 1 ] = 0
        self . _untried_actions = None
        self . _untried_actions = self . untried_actions ()
        return

構造函數用於初始化以下變數。

state ：對於我們的遊戲，它代表棋盤狀態。通常棋盤狀態由陣列表示。對於普通的 Tic Tac Toe，它是一個 3 x 3 的陣列。
Parent ：對根節點來說它是 None ，對其他節點來說它等於它的衍生節點。正如您從遊戲中看到的，第一回合它是“無”。
Children ：它包含目前節點的所有可能的操作。對於我們遊戲的第二回合，這是 9 或 8，取決於您的移動位置。
Parent_action ：對於根節點無，對於其他節點，它等於其父節點執行的操作。
_number_of_visits : 目前節點被存取的次數
結果：這是一本字典
_untried_actions ：表示所有可能操作的列表
action ：必須執行的移動。

類別由以下成員函數組成。除main()函數外，以下所有函數都是成員函數。

 def untried_actions ( self ):

    self . _untried_actions = self . state . get_legal_actions ()
    return self . _untried_actions

傳回給定狀態下未嘗試的操作的清單。我們遊戲的第一回合有 81 種可能的動作。第二回合是 8 或 9。

 def q ( self ):
    wins = self . _results [ 1 ]
    loses = self . _results [ - 1 ]
    return wins - loses

返回勝利 - 損失的差值

 def n ( self ):
    return self . _number_of_visits

傳回每個節點被造訪的次數。

 def expand ( self ):
	
    action = self . _untried_actions . pop ()
    next_state = self . state . move ( action )
    child_node = MonteCarloTreeSearchNode (
		next_state , parent = self , parent_action = action )

    self . children . append ( child_node )
    return child_node

根據所執行的操作，從目前狀態產生下一個狀態。在此步驟中，所有可能的狀態都會附加到 Children 陣列並傳回 child_node。目前狀態可能出現的狀態都會被追加到children數組中，並傳回與該狀態對應的child_node。

 def is_terminal_node ( self ):
    return self . state . is_game_over ()

這用於檢查當前節點是否為終端。當遊戲結束時到達終端節點。

 def rollout ( self ):
    current_rollout_state = self . state
    
    while not current_rollout_state . is_game_over ():
        
        possible_moves = current_rollout_state . get_legal_actions ()
        
        action = self . rollout_policy ( possible_moves )
        current_rollout_state = current_rollout_state . move ( action )
    return current_rollout_state . game_result ()

從目前狀態開始，模擬整個遊戲，直到出現遊戲結果。傳回該遊戲結果。例如，如果結果獲勝，則結果為 1。若平手則為 0。如果整個遊戲是隨機模擬的，即在每個回合從一組可能的移動中隨機選擇移動，則稱為輕遊戲。

 def backpropagate ( self , result ):
    self . _number_of_visits += 1.
    self . _results [ result ] += 1.
    if self . parent :
        self . parent . backpropagate ( result )

在此步驟中，節點的所有統計資料都會更新。直到到達父節點，每個節點的造訪次數加1。

 def is_fully_expanded ( self ):
    return len ( self . _untried_actions ) == 0

所有動作都從_untried_actions中一一彈出。當它變空時，即大小為零時，它就完全擴展了。

 def best_child ( self , c_param = 0.1 ):
    
    choices_weights = [( c . q () / c . n ()) + c_param * np . sqrt (( 2 * np . log ( self . n ()) / c . n ())) for c in self . children ]
    return self . children [ np . argmax ( choices_weights )]

一旦完全展開，函數就會從子數組中選擇最好的子數組。公式中的第一項對應於開發，第二項對應於探索。

 def rollout_policy ( self , possible_moves ):
    
    return possible_moves [ np . random . randint ( len ( possible_moves ))]

從可能的移動中隨機選擇一個移動。這是隨機播放的範例。

 def _tree_policy ( self ):

    current_node = self
    while not current_node . is_terminal_node ():
        
        if not current_node . is_fully_expanded ():
            return current_node . expand ()
        else :
            current_node = current_node . best_child ()
    return current_node

選擇要運行推出的節點。

 def best_action ( self ):
    simulation_no = 100
	
	
    for i in range ( simulation_no ):
		
        v = self . _tree_policy ()
        reward = v . rollout ()
        v . backpropagate ( reward )
	
    return self . best_child ( c_param = 0. )

這是最佳動作函數，它會傳回與最佳可能移動相對應的節點。上面的程式碼完成了展開、模擬和反向傳播的步驟。

 def get_legal_actions ( self ): 
    '''
    Modify according to your game or
    needs. Constructs a list of all
    possible states from current state.
    Returns a list.
    '''

 def is_game_over ( self ):
    '''
    Modify according to your game or 
    needs. It is the game over condition
    and depends on your game. Returns
    true or false
    '''

 def game_result ( self ):
    '''
    Modify according to your game or 
    needs. Returns 1 or 0 or -1 depending
    on your state corresponding to win,
    tie or a loss.
    '''

 def move ( self , action ):
    '''
    Modify according to your game or 
    needs. Changes the state of your 
    board with a new value. For a normal
    Tic Tac Toe game, it can be a 3 by 3
    array with all the elements of array
    being 0 initially. 0 means the board 
    position is empty. If you place x in
    row 2 column 3, then it would be some 
    thing like board[2][3] = 1, where 1
    represents that x is placed. Returns 
    the new state after making a move.
    '''

 def main ():
    root = MonteCarloTreeSearchNode ( state , None , action )
    selected_node = root . best_action ()
    return

這是 main() 函數。初始化根節點，呼叫best_action函式取得最佳節點。這不是該類別的成員函數。所有其他函數都是該類別的成員函數。

MCTS 包含 4 個步驟：

精選

這個想法是不斷選擇最好的子節點，直到我們到達樹的葉節點。選擇此類子節點的一個好方法是使用 UCT（應用於樹的置信上限）公式：

	wi/ni + c*sqrt(t)/ni

wi = 第 i 步棋後獲勝的次數
ni = 第 i 次移動後的模擬次數
c = 探索參數（理論等於√2）
t = 父節點的模擬總數

擴張：

當它無法再應用 UCT 來尋找後繼節點時，它會透過附加葉節點中的所有可能狀態來擴展博弈樹。

模擬：

擴展後，演算法任意選擇一個子節點，並從所選節點開始模擬整個遊戲，直到達到遊戲的結果狀態。如果在播放過程中隨機選取節點，則稱為輕播放。您也可以透過編寫品質啟發式或評估函數來選擇重度播放。

反向傳播：

一旦演算法到達遊戲結束，它就會評估狀態以確定哪個玩家獲勝。它向上遍歷到根並增加所有訪問節點的訪問分數。如果該位置的玩家贏得了比賽，它還會更新每個節點的獲勝分數。

設計你的遊戲：

如果您打算製作自己的遊戲，您將不得不考慮以下問題。

您將如何表示您的遊戲狀態？想想我們遊戲的初始狀態。
您的遊戲的最終遊戲條件是什麼？將其與我們遊戲的最終遊戲條件進行比較。
您將如何在遊戲中獲得法律訴訟？嘗試為我們遊戲的第一步採取法律行動。

須藤井字棋

如果您有任何疑問或建議，請隨時透過 [email protected] 與我們聯繫

展開

mcts

使用 Python 程式碼的蒙特卡羅樹搜尋 (MCTS) 演算法教學。應用MCTS為簡單遊戲創建AI。

為什麼選擇MCTS？

構造函數用於初始化以下變數。

精選

擴張：

模擬：

反向傳播：

設計你的遊戲：

waymo open dataset

SmartTube

Sunamu

MySchedule.py

viptools for eslam

VITAident

chat.petals.dev

GPT Prompt Templates

GPTyped

waymo open dataset

SmartTube

Sunamu

waymo open dataset

wp functions

termwind