The $10,000 AlphaQ Up Challenge
Last week, I put the tangled-game.com site live. This initial site had five agents you could play against. One was a random agent, two were pure Monte Carlo Tree Search (MCTS) agents with differing numbers of rollouts (Murray and Melissa), and two were AlphaZero agents of differing strengths (Andy and Amara).
In the lead-up to posting these agents, I did some limited testing and found somewhat confusing results (see my previous post). Amara, while being the best of my agents, was still losing quite a bit more than I would have expected. But I posted the agents anyway.
A challenger bot, murr2k, found a way to consistently beat Amara. Murr2k’s designer, Murray Kopit, used a very smart attack vector. He found that Amara performed well after certain initial moves but lost consistently after others. This was a clue as to why Amara might not have been doing so well. For some reason, during training the agent was not adequately exploring certain initial states and their consequences.
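To make the idea of a per-opening weakness concrete, here is a minimal sketch of how game results could be grouped by first move. It is not murr2k's method or the site's code: the record format (an opening label plus a win flag) and the function name are hypothetical, and draws are simply counted as non-wins.

```python
from collections import defaultdict


def win_rate_by_opening(games):
    """Return {opening: fraction of games the agent won}.

    `games` is an iterable of (opening_move, agent_won) pairs.
    This record format is an assumption made for illustration.
    """
    tally = defaultdict(lambda: [0, 0])  # opening -> [wins, games played]
    for opening, agent_won in games:
        tally[opening][1] += 1
        if agent_won:
            tally[opening][0] += 1
    return {opening: wins / played for opening, (wins, played) in tally.items()}


def weak_openings(games, threshold=0.5):
    """List openings where the agent's win rate falls below `threshold`."""
    rates = win_rate_by_opening(games)
    return sorted(o for o, rate in rates.items() if rate < threshold)
```

An agent that looks fine on average can still show one or two openings with a win rate far below the rest, which is the kind of pattern that points to under-explored initial states.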
The way my AlphaZero code learns, there is supposed to be a certain kind of noise applied to moves at the roots of search trees (called Dirichlet noise). When I looked closely at the code I (well … actually it was Claude … :-) ) found a bug in how this noise was being applied which could plausibly account for the weakness that was being exploited in Amara.
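For readers who have not seen it before, here is a minimal sketch of how root Dirichlet noise is commonly mixed into the network's move priors, following the formula from the AlphaZero paper: each prior becomes (1 - epsilon) * p + epsilon * noise. This is not the code from the site and does not reproduce the bug; the function names and the values alpha=0.3 and epsilon=0.25 are illustrative assumptions.

```python
import random


def sample_dirichlet(alpha, n, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over n outcomes.

    Uses the standard construction: normalize n independent Gamma(alpha, 1) draws.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    if total == 0.0:  # guard against numerical underflow for very small alpha
        return [1.0 / n] * n
    return [d / total for d in draws]


def add_root_noise(priors, alpha=0.3, epsilon=0.25, rng=None):
    """Mix Dirichlet noise into the root node's move priors.

    `priors` should sum to 1 over the legal moves at the root.
    alpha and epsilon are illustrative defaults, not the author's settings.
    """
    rng = rng or random.Random()
    noise = sample_dirichlet(alpha, len(priors), rng)
    return [(1.0 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]
```

The point of the noise is that even a move the network assigns a tiny prior will occasionally get a large boost at the root, so self-play keeps visiting it. If the noise is applied incorrectly, some opening moves may rarely be explored during training.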
I fixed the bug and retrained an agent with a bigger network. I call this agent AlphaQ Up. In my initial tests, the weakness to particular initial moves appears to be gone, and the agent appears as strong as I would expect. Yay! Although to know for sure, we need people to try to beat it.
Because building bots takes time and understanding a new game like Tangled is not easy, I wanted to give people some motivation to try to beat it (without using a quantum computer!). I am not sure that AlphaQ Up doesn’t have some exploitable weakness that I just haven’t found yet.
$10,000 is enough to be worth trying, and I can afford to lose that much, so that’s the bounty. Also, you get a trophy!
Godspeed, good sirs and dames. If you have any questions or comments, or think you’ve claimed the prize, leave a comment here!