Exploration of Possibilities - Part 2

Make AIs for board games

Part 2: Improvements and More Blabla!

We left off with the construction of a basic search tree structure using a possibility-based search with a balance of exploration, score, and chance (MCTS: Monte Carlo Tree Search). In this second part, I will introduce the improvements I found for the basic method.

Our little program works quite well as it is, but when I programmed it for the first time, I noticed some points for improvement.

Let’s remind ourselves of our main program:

Node root(nullptr, gameState);
do {
    // Search for the best leaf
    Node* child = root;
    while(child->get_move_count() != 0) {
        if(child->children.size() < child->state->child->get_move_count()) {
            // Node can still be expanded
            child = child->expand_children();
            break;
        }
        child = child->get_best_child_UCT();
    }

    if(child->get_move_count() != 0) {
        // Can expand
        double score = child->rollout();
        child->backpropagate(score);
    }
} while (we have time);

IGame_State* bestMove = root->get_best_child_UCB(); // Use UCB

Branch Pruning

To start, we never eliminate branches when they are completely explored.

Imagine using our little program for TicTacToe: 9 squares, so 9 moves in total (maximum tree depth… 9, Bravo!). If we give our little program a second, it will quickly explore all possibilities; the tree will be fully explored in a fraction of time, and then we will spin our wheels, going down the tree (if(child->get_move_count() != 0) will always be false; we have exhausted the possibilities).

If we have a second to kill, why not. While we’re at it, let’s limit the search time to 15 minutes for the coffee break, and let’s not talk about it anymore. In this extreme case, we will get very high scores and visit counts, all towards the same path (at least the choice of the path is clear!).

On my side, I would prefer to use the remaining time to do something else. So, let’s set up a system to close a Node when it can no longer have children.

Node Closure

We add the is_fully_explored method to our Node class:

bool Node::is_fully_explored() {
    // true if this node is fully explored
    return _children.size() == _state->get_move_count();
}

This method will simply tell us that all possibilities of this Node have been explored. However, it does not indicate anything about the children of our Node, which could well be not explored at all.

The obvious solution is to check each child recursively, but that wouldn’t be very efficient. Instead, we will sacrifice a bit of memory and add a _isClosed member to our Node class.

_isClosed will be updated in the backpropagation step (this is where we close the nodes; we have to go up the tree anyway, so we might as well take advantage of it). First, add _isClosed = _state.get_move_count() == 0; in the Node constructor. Nodes without children will be the first to be indicated as closed (also add a bool Node::is_closed() method for later use).

We will also add a _closedChildrenCount member to keep track of the number of closed children for this node (add _closedChildrenCount = 0 in the constructor).

During the backpropagation, we will propagate the closure information and update the parent, which will be responsible for checking if it is indeed closed:

void Node::backpropagate (score) {
    // Increment the number of visits and score
    _nombreDeVisites += 1;
    _score += score;

    // Check if we still have open children
    if(_isClosed) {
        _closedChildrenCount += 1;  // one of our children declared itself closed

        if(this->is_fully_explored() and _closedChildrenCount == this->children.size()) {
            // All children explored and closed
            if(dynamic_cast<Node*>(_parent) != nullptr) {
                // Force the parent to check if it needs to close
                _parent->_isClosed = true;

                // Propagate to the parent if it exists
                _parent->backpropagate(score);
            }

            return; // prevent a recheck of parent validity
        }
        else {
            // a child is closed, but some remain open for exploration
            _isClosed = false;
        }
    }

    if(dynamic_cast<Node*>(_parent) != nullptr) {
        // Propagate to the parent if it exists
        _parent->backpropagate(score);
    }
}

Here, we aim to propagate the closure information by setting the parent’s _isClosed variable to true when we close a node. The parent will itself set _isClosed back to false if it still has open children.

All that’s left is to add one more condition to the elapsed time to exit the solution search loop when all possibilities have been explored. It’s quite simple; just test if the _isClosed member of the root Node is true (} while (we have time and not root->is_closed());).

This simple modification allows for a more sensible use of resources and saves processing time for other tasks (or making more coffee).

Elimination of Closed Nodes

Don’t forget to eliminate nodes marked as closed in the get_best_child_UCT function, otherwise, our branch pruning is pointless.

Rollout - Expand

During my tests, I noticed that one of the steps that consumes the most time is the game state part managed by a class derived from IGame_State. In itself, there’s not much we can do; a game state can be very computationally heavy, and optimizing it is not really within the scope of this blog post.

Here, we will try to call this function less often. Our Rollout step (random unrolling of moves until a final state) performs most of these calls, and we practically do nothing with them.

The idea is to expand the search tree at the same time we perform the rollout phase to calculate our moves only once and save maximum processing time.

Expand

To keep a certain flexibility of use, I will keep the existing function and create a new one:

void Node::rollout_expand() {
    Node* currentNode = this;

    while(currentNode->_state->get_move_count() > 0) {
        // Create a new node
        Node* nextNode = new Node(currentNode, currentNode->_state->do_move(
            // index to choose
        ));

        // Add the node as a child of this one
        currentNode->_children.push_back(nextNode);

        // Start over!
        currentNode = nextNode;
    }

    // Propagate the score of the final state
    this->backpropagate(
        currentNode->_state->get_score()
    );
}

A function not very different, but it saves us a lot of computation time.

A few problems, however:

  • The backpropagated score will be the same for each node added during the rollout phase.
  • We get a tree with very long branches, whose base develops gradually.

Restore Scores to Their Place

Each new node in the rollout-expand phase will have a fixed number of visits set to 0. We will simply modify our backpropagate function and add a condition on the propagated score:

// Increment the score, as before
_score += score;

if(_nombreDeVisites == 0) {
    // Add the score of this node to the total score to be backpropagated
    score += _state->get_score();
}

// Increment the visits
_nombreDeVisites += 1;

We get scores quite close to our pre rollout-expand situation. There will still be a bias, however: with this type of search, we do not reach the same scores because the nodes created are eliminated from the exploration during the rollout phase, and their score naturally only rises once.

Exploration in Depth

This one is a bit more complex. Let’s start by grouping the search for a leaf in a function.

Node* find_leaf() {
    Node* child = root;
    while(child->get_move_count() != 0) {
        if(child->children.size() < child->state->child->get_move_count()) {
            // Node can still be expanded
            return child->expand_children();
        }
        child = child->get_best_child_UCT();
    }
    return nullptr; // Reached the end of the tree without finding a leaf to expand; this should not happen with our branch pruning.
}

Node root(nullptr, gameState);

do {
    // Search for the best leaf
    Node* child = find_leaf();

    if(child == nullptr)
        break;  // Error: exit loop

    if(child->get_move_count() != 0) {
        // Can expand
        double score = child->rollout();
        child->backpropagate(score);
    }
} while (we have time and not root->is_closed());

IGame_State* bestMove = root->get_best_child_UCB(); // Use UCB

Jumping Nodes for Enhanced Exploration

We can then modify this function to add a “jump” option if a child node has unexplored children, and it’s a win! For the jump, I make a random draw between 0 and the number of unexplored children, and skip the current node if the draw is > the number of explored children:

Node* find_leaf() {
    Node* child = root;
    while(child->get_move_count() != 0) {
        if(child->children.size() < child->state->child->get_move_count()) {
            // Node can still be expanded
            int draw = randInt(0, child->_state->get_move_count());

            if(draw < child->children.size())
                return child->expand_children();
            // Continue to search
        }
        child = child->get_best_child_UCT();
    }
    return nullptr; // Reached the end of the tree without finding a leaf to expand; this should not happen with our branch pruning.
}

We obtain a quite natural metric that favors jumping nodes when we have already explored the tree well.

Graph?

//todo!

-> hash

-> all states in a map<hash, state>

-> multiple parents Backpropagation is done from each parent, so the score increases significantly for each node in the graph.


See also