Modified Tangled With Fixed Token Positions

In an earlier post about Elo ratings, I noticed that very strong MCTS agents were getting a high percentage of draws against weak MCTS agents. I have an intuition that the reason for this is that regardless of how edges are selected on a game graph, what really matters is placing your token on the right vertex near the end of the game.

To see this, note that even if I play randomly up until the end of the game, all I need to do is evaluate the spin-spin correlation function for the game board at the end of the game and then place my token on the vertex with the highest influence. The only way this strategy fails is if my opponent already owns the highest influence vertex, but I think the best way to do that for the opponent is to do the exact same thing — play randomly until the endgame and then choose the best vertex.

This is not what I want for the game! I would like what moves you make throughout the game to matter — like they do in chess and Go.

An idea I had was to modify the game to fix the token locations for the players, removing the freedom to choose where to place them. If we do this, then player actions are only edge coloring throughout the game, and intuitively it makes it more likely that action selections should matter throughout the game. Another possibility would be a hybrid where you could place your token anywhere, but only up to some point in the game (say halfway through). In this case you can’t just pick the best location at that point, as subsequent edge selections would presumably change things.

Let’s implement the simpler version with fixed token locations and see how it works.

Implementing The Fixed Token Variant

We don’t have to change much! I think there are only two changes we need to make.

  1. The first is to ensure that the initial state of the game has two selected vertices. This requires that for any game board graph, we specify which two vertices are owned by which player.

  2. The second is to ensure that game board symmetries are restricted to ones that preserve the fixed token locations.

That’s it! Oh and of course we need to decide which vertex is red’s and which is blue. Here are some of the small/tiny graphs I have been experimenting with, and which vertices I’ve chosen for the two players.

Here are my choices for red/blue vertices for the ‘fixed vertex variant’ of Tangled. The graph number is my internal label for the graphs — they don’t make sense except they are numbered in the order I added them. The names of the graphs are a combination of usual names and ones I made up. The (V, E) is the number of vertices and edges in each graph. Anything bigger than the cube graph is too big to enumerate all the terminal states. These bigger graphs are important but at this stage I’m focusing on the smaller ones just so I understand what’s going on.

Enumerating Unique Terminal States

When vertices are fixed, the number of terminal states is just 3ᴱ (each edge can be independently one of three colors), and the number of unique terminal states is upper bounded by this (the number can be reduced due to symmetries in the graph where some of these are equivalent). I calculated how many unique terminal states each has when the red and blue vertices are fixed. The results are shown below. These are all manageable in terms of enumerating the unique terminal states and adjudicating all of them, even for the Schrodinger Equation (in practice the Moser Spindle and Cube would require a bit more compute to do this than I have on my desktop, but it could be done with a reasonable cluster).

Elo Results For MCTS Agents For the Fixed Vertex Variant on These Graphs

I ran a set of games for each of these graphs against a baseline MCTS10 agent (MCTS using 10 rollouts). Each set had the same number of games, with the baseline agent playing red half the time and blue half the time. The agents I tested were (a) random, (b) MCTS100, and (c) MCTS1000. The results are shown below. While it’s difficult to draw any conclusions from this because the graphs are still so small, it looks like the Elo scores of the better MCTS agents go up as the graphs get bigger, and the number of draws goes down. This is what I’d expect. As the complexity of play goes up we want the spread in Elo scores for different agents to increase — better agents should preferentially beat worse agents, and not draw with them.

For the Moser Spindle: MCTS1000 red and MCTS10 blue: 485 / 0 / 11 and MCTS 1000 blue and MCTS10 red: 441 / 0 / 55 — a small advantage to playing red (makes sense as red has one more move (6 vs 5) for this graph). For the Cube Graph where each player takes 6 moves: MCTS1000 red and MCTS10 blue: 439 / 0 / 57 and MCTS1000 blue and MCTS10 red: 443 / 0 / 53 — looks like no advantage to either.

Testing How Increasing Rollouts Changes Elo for Moser Spindle and Cube Graphs

I’m going to pick these two to initially deep dive into because they are small enough to have manageable numbers of unique terminal states but they might be large enough to be interesting for agents. I want to check how MCTS2000 does against MCTS1000. I only ran ~ 100 games this time but this should be enough for this quick test. Here’s the result:

Cube — all draws — 0/0/96; Moser Spindle MCTS2000 red and MCTS1000 blue: 7 / 0 / 41; MCTS2000 blue and MCTS1000 red: 0 / 0 / 48 — so this reinforces that there’s an advantage on the Moser Spindle to playing red (Elo of MCTS2000 is 1518). Also it looks like somewhere around MCTS1000 is close to optimal for Cube, and I suspect MCTS2000 is close to optimal for the Spindle.

Previous
Previous

Analysis of Adjudication of Terminal States for Small Graphs

Next
Next

Some Thoughts on Quantum Mechanics and Consciousness