Part 2: Improvements and More Blabla!
We left off with the construction of a basic search tree structure using a possibility-based search with a balance of exploration, score, and chance (MCTS: Monte Carlo Tree Search). In this second part, I will introduce the improvements I found for the basic method.
Our little program works quite well as it is, but when I programmed it for the first time, I noticed some points for improvement.
Let’s remind ourselves of our main program:
Node root(nullptr, gameState);
do {
// Search for the best leaf
Node* child = root;
while(child->get_move_count() != 0) {
if(child->children.size() < child->state->child->get_move_count()) {
// Node can still be expanded
child = child->expand_children();
break;
}
child = child->get_best_child_UCT();
}
if(child->get_move_count() != 0) {
// Can expand
double score = child->rollout();
child->backpropagate(score);
}
} while (we have time);
IGame_State* bestMove = root->get_best_child_UCB(); // Use UCB
Branch Pruning
To start, we never eliminate branches when they are completely explored.
Imagine using our little program for TicTacToe: 9 squares, so 9 moves in total (maximum tree depth… 9, Bravo!). If we give our little program a second, it will quickly explore all possibilities; the tree will be fully explored in a fraction of time, and then we will spin our wheels, going down the tree (if(child->get_move_count() != 0)
will always be false
; we have exhausted the possibilities).
If we have a second to kill, why not. While we’re at it, let’s limit the search time to 15 minutes for the coffee break, and let’s not talk about it anymore. In this extreme case, we will get very high scores and visit counts, all towards the same path (at least the choice of the path is clear!).
On my side, I would prefer to use the remaining time to do something else. So, let’s set up a system to close a Node when it can no longer have children.
Node Closure
We add the is_fully_explored
method to our Node class:
bool Node::is_fully_explored() {
// true if this node is fully explored
return _children.size() == _state->get_move_count();
}
This method will simply tell us that all possibilities of this Node have been explored. However, it does not indicate anything about the children of our Node, which could well be not explored at all.
The obvious solution is to check each child recursively, but that wouldn’t be very efficient. Instead, we will sacrifice a bit of memory and add a _isClosed
member to our Node class.
_isClosed
will be updated in the backpropagation step (this is where we close the nodes; we have to go up the tree anyway, so we might as well take advantage of it). First, add _isClosed = _state.get_move_count() == 0;
in the Node constructor. Nodes without children will be the first to be indicated as closed (also add a bool Node::is_closed()
method for later use).
We will also add a _closedChildrenCount
member to keep track of the number of closed children for this node (add _closedChildrenCount = 0
in the constructor).
During the backpropagation, we will propagate the closure information and update the parent, which will be responsible for checking if it is indeed closed:
void Node::backpropagate (score) {
// Increment the number of visits and score
_nombreDeVisites += 1;
_score += score;
// Check if we still have open children
if(_isClosed) {
_closedChildrenCount += 1; // one of our children declared itself closed
if(this->is_fully_explored() and _closedChildrenCount == this->children.size()) {
// All children explored and closed
if(dynamic_cast<Node*>(_parent) != nullptr) {
// Force the parent to check if it needs to close
_parent->_isClosed = true;
// Propagate to the parent if it exists
_parent->backpropagate(score);
}
return; // prevent a recheck of parent validity
}
else {
// a child is closed, but some remain open for exploration
_isClosed = false;
}
}
if(dynamic_cast<Node*>(_parent) != nullptr) {
// Propagate to the parent if it exists
_parent->backpropagate(score);
}
}
Here, we aim to propagate the closure information by setting the parent’s _isClosed
variable to true
when we close a node. The parent will itself set _isClosed
back to false
if it still has open children.
All that’s left is to add one more condition to the elapsed time to exit the solution search loop when all possibilities have been explored. It’s quite simple; just test if the _isClosed
member of the root Node is true
(} while (we have time and not root->is_closed());
).
This simple modification allows for a more sensible use of resources and saves processing time for other tasks (or making more coffee).
Elimination of Closed Nodes
Don’t forget to eliminate nodes marked as closed in the get_best_child_UCT
function, otherwise, our branch pruning is pointless.
Rollout - Expand
During my tests, I noticed that one of the steps that consumes the most time is the game state part managed by a class derived from IGame_State
. In itself, there’s not much we can do; a game state can be very computationally heavy, and optimizing it is not really within the scope of this blog post.
Here, we will try to call this function less often. Our Rollout step (random unrolling of moves until a final state) performs most of these calls, and we practically do nothing with them.
The idea is to expand the search tree at the same time we perform the rollout phase to calculate our moves only once and save maximum processing time.
Expand
To keep a certain flexibility of use, I will keep the existing function and create a new one:
void Node::rollout_expand() {
Node* currentNode = this;
while(currentNode->_state->get_move_count() > 0) {
// Create a new node
Node* nextNode = new Node(currentNode, currentNode->_state->do_move(
// index to choose
));
// Add the node as a child of this one
currentNode->_children.push_back(nextNode);
// Start over!
currentNode = nextNode;
}
// Propagate the score of the final state
this->backpropagate(
currentNode->_state->get_score()
);
}
A function not very different, but it saves us a lot of computation time.
A few problems, however:
- The backpropagated score will be the same for each node added during the rollout phase.
- We get a tree with very long branches, whose base develops gradually.
Restore Scores to Their Place
Each new node in the rollout-expand phase will have a fixed number of visits set to 0. We will simply modify our backpropagate function and add a condition on the propagated score:
// Increment the score, as before
_score += score;
if(_nombreDeVisites == 0) {
// Add the score of this node to the total score to be backpropagated
score += _state->get_score();
}
// Increment the visits
_nombreDeVisites += 1;
We get scores quite close to our pre rollout-expand situation. There will still be a bias, however: with this type of search, we do not reach the same scores because the nodes created are eliminated from the exploration during the rollout phase, and their score naturally only rises once.
Exploration in Depth
This one is a bit more complex. Let’s start by grouping the search for a leaf in a function.
Node* find_leaf() {
Node* child = root;
while(child->get_move_count() != 0) {
if(child->children.size() < child->state->child->get_move_count()) {
// Node can still be expanded
return child->expand_children();
}
child = child->get_best_child_UCT();
}
return nullptr; // Reached the end of the tree without finding a leaf to expand; this should not happen with our branch pruning.
}
Node root(nullptr, gameState);
do {
// Search for the best leaf
Node* child = find_leaf();
if(child == nullptr)
break; // Error: exit loop
if(child->get_move_count() != 0) {
// Can expand
double score = child->rollout();
child->backpropagate(score);
}
} while (we have time and not root->is_closed());
IGame_State* bestMove = root->get_best_child_UCB(); // Use UCB
Jumping Nodes for Enhanced Exploration
We can then modify this function to add a “jump” option if a child node has unexplored children, and it’s a win! For the jump, I make a random draw between 0 and the number of unexplored children, and skip the current node if the draw is > the number of explored children:
Node* find_leaf() {
Node* child = root;
while(child->get_move_count() != 0) {
if(child->children.size() < child->state->child->get_move_count()) {
// Node can still be expanded
int draw = randInt(0, child->_state->get_move_count());
if(draw < child->children.size())
return child->expand_children();
// Continue to search
}
child = child->get_best_child_UCT();
}
return nullptr; // Reached the end of the tree without finding a leaf to expand; this should not happen with our branch pruning.
}
We obtain a quite natural metric that favors jumping nodes when we have already explored the tree well.
Graph?
//todo!
-> hash
-> all states in a map<hash, state>
-> multiple parents Backpropagation is done from each parent, so the score increases significantly for each node in the graph.