What can software developers learn about machine learning from playing a game? Plenty, it turns out.
For the past two years, Two Sigma has hosted a public artificial intelligence programming challenge called Halite. In this game, players build bots using the coding language of their choice to battle on a two-dimensional virtual board.
The first version of the game, launched in 2016, involved moving pieces around a grid, while the second iteration involved moving ships through space to “mine” planets. Two Sigma hosts the game environment, providing starter kits in a range of languages, so that players can get up and running quickly, as well as tools for players to build their own starter kits in even more languages. Halite II ended with a total of 21 different languages supported via starter kit.
While planning for the launch of Halite II, the development team at Two Sigma could not ignore how helpful (not to mention interesting) it would be to provide better machine learning support. In 2016 an ML bot reached 12th place in the competition, but this kind of bot did not emerge until the end of the competition. The team wanted to launch 2017’s game with more starter-kit support for machine learning right out of the gate.
The key difference between the “standard” AI bots used in Halite and machine learning bots is that the former are “rules-based” bots, meaning a human has written all the logic for the bots to obey. For example, a player might write code that instructs the bot to try to dock on a planet only if it is empty. But for an ML bot, a player lets the bot determine its own logic.
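To make the contrast concrete, here is a minimal Python sketch. The planet dictionary, its "owner" and "docked_ships" fields, and the `model` callable are hypothetical stand-ins chosen for illustration; they are not the Halite starter kit's API.

```python
def rules_based_should_dock(planet):
    """Human-written rule: dock only if nobody owns the planet."""
    return planet["owner"] is None


def learned_should_dock(planet, model, threshold=0.5):
    """Learned rule: a trained model scores the planet.

    `model` is assumed to be any callable returning a score in [0, 1].
    The human only chooses the threshold, not the logic behind the score.
    """
    return model(planet) >= threshold


empty_planet = {"owner": None, "docked_ships": 0}

# Stand-in for a trained model; a real one would be learned from game data.
def toy_model(planet):
    return 0.9 if planet["docked_ships"] == 0 else 0.2


print(rules_based_should_dock(empty_planet))
print(learned_should_dock(empty_planet, toy_model))
```

In the first function the condition is fixed by the programmer; in the second, changing the bot's behavior means retraining the model rather than editing the rule.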
The Halite team decided it needed machine learning expertise on board to build the ML starter kit. So, they met with members of Two Sigma’s Deep Learning team to evaluate how to proceed. You can see the ultimate result, the ML starter bot, play on the Halite leaderboard as tscommander-ML.
Through the development process, the Halite team learned some important lessons on how to approach machine learning problems, and it bumped up against limitations and challenges that are still common in this space, in games and in other contexts alike.
What follows is a list of best practices the team picked up while developing its first ML bot for Halite, lessons it plans to put to further use in the next iteration of the game--and perhaps even beyond.
1. The Most Powerful Approach is Not Always the Best
With all the excitement around AlphaGo and AlphaGo Zero, it was natural to consider trying to apply reinforcement learning, particularly through self-play, to teach a bot to play Halite. Though reinforcement learning would have been a very interesting research project, the team recognized that this approach would require vast amounts of computational power for training.
For someone starting out in ML (or, for example, for a startup team with a high degree of uncertainty in a project), choosing tools with high resource requirements isn’t always the best way to start, and other, lighter-weight tools can do well, too. So, the developers chose supervised learning, which requires less computing power and could also leverage previously completed games as training data. As a minimum viable product, supervised learning proved to be sufficient.
2. Start with a Small Problem
When thinking about a supervised learning approach to Halite, the team started by selecting a specific question for the ML bot to answer. Because Halite is a complex game with a branching factor billions of times greater than that of Go, narrowing the scope allowed the ML algorithm to focus on a relatively simple problem without having to grapple with the whole complexity of Halite.
This, the team observed, is a valuable tactic when approaching problems with the potential to become very complicated, very quickly. For example, would it make sense to start training an autonomous vehicle in the middle of Times Square? No, of course not. It would be much more sensible to start in the countryside or somewhere similar, where the car could first master driving down a straight, empty road. So, the team reduced the bot’s task to a basic question about where in the game its agents should go. The algorithm would learn the answer by looking at a large number of historical games to see what winning bots had done.
3. Find the Right Amount of Data to Start
Once the team had set up the neural net and starter bot, they needed to start training it. The Halite team keeps all the games ever played in cloud storage, so the developers created a training function that takes a set of games and trains the bot with them. The function parses the game data, checks for bots worth imitating, develops winning strategies, then saves the output for the bot to use when it plays the game.
It would have been too arduous to try to analyze every single historical game (9.6M by the end of the competition), so the team started with a pack of 1,000. This number seemed adequate to begin with because it was a large enough sample to start to see patterns, but not so big as to be unwieldy.
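The first two steps of such a training function (drawing a manageable pack of games, then keeping only the moves of bots worth imitating) might look roughly like the sketch below. The game schema ("players", "rank", "moves"), the function names, and the `min_wins` cutoff are all assumptions made for illustration; the actual replay format and the team's selection criteria are not described here.

```python
import random


def sample_game_ids(all_game_ids, pack_size=1000, seed=0):
    """Draw a reproducible random pack of games instead of using the full archive."""
    rng = random.Random(seed)
    ids = list(all_game_ids)
    return rng.sample(ids, min(pack_size, len(ids)))


def winner_of(game):
    """Name of the first-place player (lowest rank number) in a parsed game."""
    return min(game["players"], key=lambda p: p["rank"])["name"]


def build_training_examples(games, min_wins=2):
    """Collect (state, action) pairs from games won by frequently winning bots.

    A bot counts as "worth imitating" here if it won at least `min_wins`
    of the supplied games (a hypothetical criterion).
    """
    wins = {}
    for game in games:
        name = winner_of(game)
        wins[name] = wins.get(name, 0) + 1
    worth_imitating = {name for name, count in wins.items() if count >= min_wins}

    examples = []
    for game in games:
        name = winner_of(game)
        if name in worth_imitating:
            # Only the winner's own moves from this game are kept.
            examples.extend(game["moves"][name])
    return examples
```

Fixing the random seed keeps the pack stable between runs, which makes it easier to tell whether a change in bot strength came from the model or from a different sample of games.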
4. Make Sure the Model Evaluates the Right Variables
As the team designed the neural network, they wanted to make certain it was exposed to a fair representation of the Halite game. Halite is played on a map, similar to a board in a board game, and these maps can come in a range of layouts. The developers made sure the neural net would not try to assume a specific map or number of planets, or over-value the specific layout fed to it if the layout were held constant in the training data. It was critical that the algorithm didn’t overfit to the training data, which is always a challenge when dealing with neural nets.
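One common way to avoid baking a particular map into a model is to describe each planet with features scaled by the map's size, score every planet with the same function, and normalize across however many planets exist. The sketch below illustrates that idea only; the feature choices, field names, and `score_fn` are hypothetical and are not taken from the Halite starter bot.

```python
import math


def planet_features(planet, map_width, map_height):
    """Describe one planet with values that do not depend on absolute map size."""
    diagonal = math.hypot(map_width, map_height)
    return [
        planet["x"] / map_width,        # horizontal position as a fraction of width
        planet["y"] / map_height,       # vertical position as a fraction of height
        planet["radius"] / diagonal,    # size relative to the map diagonal
    ]


def planet_preferences(planets, map_width, map_height, score_fn):
    """Score each planet independently, then softmax over all planets.

    Because the same `score_fn` is applied per planet, the function works
    for any number of planets and never assumes a fixed layout.
    """
    scores = [score_fn(planet_features(p, map_width, map_height)) for p in planets]
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]
```

With this shape, a map twice as large with planets scaled accordingly yields the same features, so the model cannot latch onto raw coordinates that happened to repeat in the training data.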
5. Evaluate an ML Bot’s Performance Compared to Nothing
One of the Halite team’s most interesting observations from having built the ML bot was that it could be difficult to know whether or not ML was helping the model get smarter. The team was very excited to see one of its earliest bots do reasonably well against the non-ML competitors, but when the developers replaced their neural network with another set of logic (in this case, random numbers), they found that the bot continued to do just as well with or without machine learning. This happened because the heuristics surrounding the neural network that translated its output into actions in the game were powerful enough that even the relatively bad outputs from the net would still work reasonably well. To use the autonomous car analogy, imagine a car with a cruise control system that stops if the car ahead stops. This system would do pretty well (for a while!) on a straight highway, even if the car had no other self-driving systems.
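A check of this kind can be expressed as a small ablation test: run the bot once with the trained network and once with random numbers in its place, then ask whether the difference in win rate is larger than noise. The sketch below uses a simple normal-approximation comparison of two win rates; that statistical choice, the function names, and the `z` threshold are assumptions for illustration, not the team's stated method.

```python
import math


def win_rate(outcomes):
    """Fraction of games won, where each outcome is 1 (win) or 0 (loss)."""
    return sum(outcomes) / len(outcomes)


def ablation_gap(model_outcomes, random_outcomes):
    """Return (gap, standard_error) for model win rate minus random-baseline win rate."""
    p_model, n_model = win_rate(model_outcomes), len(model_outcomes)
    p_rand, n_rand = win_rate(random_outcomes), len(random_outcomes)
    std_err = math.sqrt(
        p_model * (1 - p_model) / n_model + p_rand * (1 - p_rand) / n_rand
    )
    return p_model - p_rand, std_err


def model_adds_value(model_outcomes, random_outcomes, z=2.0):
    """True only if the model beats the random baseline by more than z standard errors."""
    gap, std_err = ablation_gap(model_outcomes, random_outcomes)
    return gap > z * std_err
```

For example, 60 wins out of 100 with the network versus 58 out of 100 with random numbers would not pass this check, which is the situation the team describes: the surrounding heuristics, not the network, were doing the work.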
6. Compare Algorithms’ Performance with Each Other’s
This realization motivated a different evaluation metric for ML going forward: instead of simply comparing the ML bot to other bots on the board, the team also started comparing ML bots to each other, gauging the relative strengths of bots with different amounts of training (including none) to make sure that the ML was actually adding some value.
7. Catch Edge Cases
While running the ML bot against itself, the team noticed that every now and then (~1 out of 10 games) the bot with less training would win. This happened because of the randomness of the Halite map: certain situations led to strange behaviors. To make the test clearer, the developers decided to choose one specific map, in order to avoid the edge cases (at least while initiating bot training). You can see more about how they did ML-vs.-ML training here (the “compare” target).
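A comparison harness along these lines can be sketched as a round-robin between checkpoints on one fixed map. Everything named here is hypothetical: `play_match` stands in for whatever actually runs a Halite game, and it is assumed to return 0 if the first bot wins and 1 if the second does. This is not the starter kit's "compare" implementation.

```python
from itertools import combinations


def compare_bots(bots, play_match, map_seed, games_per_pair=10):
    """Play every pair of bots on the same map and count wins per bot.

    bots:        dict mapping a label (e.g. "1k_games") to a bot object
    play_match:  callable(bot_a, bot_b, map_seed) -> 0 if bot_a wins, else 1
    map_seed:    held constant so map randomness does not blur the comparison
    """
    wins = {label: 0 for label in bots}
    for label_a, label_b in combinations(bots, 2):
        for _ in range(games_per_pair):
            result = play_match(bots[label_a], bots[label_b], map_seed)
            winner = label_a if result == 0 else label_b
            wins[winner] += 1
    return wins
```

Including an untrained bot among the labels gives a floor for the comparison: if a trained checkpoint does not clearly beat it on the fixed map, more training data is unlikely to be the missing ingredient.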
The Halite team is excited to see its starter bot in the top portion of the competition; as of this writing it is ranked 640 out of the 5800 players on the board. What’s even more exciting is to see that more than 100 different players are using ML in their bot strategies, and one player using ML has achieved 41st place. As players’ ML approaches become more sophisticated, their bots seem likely to become even more competitive in future versions of the game.