DeepMind’s MuZero AI can master games without knowing the rules

The holy grail of AI has always been to enable computers to learn the way humans do. The most powerful AIs today, however, still rely on being given certain known rules, like the rules for a game of chess or Go. Human learning is often messy and inferential: we learn the rules of life as we go. DeepMind has long been trying to create AIs that learn this way, using games as their environment and test suite. Google’s sister company focusing on AI research has just revealed its latest achievement, MuZero, an AI that can master a game without being given the rules beforehand.

DeepMind’s previous AIs like AlphaGo have been widely covered in the media for beating human champions in their respective games. Impressive as they may have been, they were still a few steps shy of the ultimate goal. AlphaGo, in particular, had the advantage of knowing not only the rules of Go but also domain knowledge and data from human players. Its successors, AlphaGo Zero and AlphaZero, did without the human data but could still bank on having the rule book to learn from.

While these AIs excelled in games with complex strategies but simple visuals, they fell short when applied to more visually complex games where the rules are not so easy to infer. That’s where the new MuZero AI comes in. DeepMind used a selection of Atari games, like Ms. Pac-Man, to test out its approach.

AI researchers have mostly relied on two strategies to tackle the learning problem. The first is lookahead search, which relies on being given the rules or knowledge of a game. The second is model-based planning, which learns by building an accurate model of the environment, but at the expense of becoming overly complex. MuZero’s advantage is that it models only the parts of the environment that are important for making decisions, like knowing that an umbrella will help keep you dry in the rain rather than modeling the movement of every raindrop.
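
To make the idea of planning over a learned model more concrete, here is a minimal Python sketch. It is a toy illustration only, not DeepMind’s implementation: the corridor environment, the table-based model, and every name in it (hidden_step, learn_model, lookahead, plan) are assumptions made up for this example. The planner never reads the environment’s rules; it only sees sampled transitions, keeps the two things it needs for decisions (next state and reward), and runs a short lookahead over that learned model.

```python
import random

# Hypothetical toy environment: a corridor of 6 cells. Stepping into the
# last cell pays a reward of 1. The planner never inspects this function;
# it only observes the transitions that the function produces.
N_CELLS = 6
ACTIONS = (-1, +1)  # move left or move right


def hidden_step(state, action):
    """The 'rules' of the toy game, hidden from the planner."""
    nxt = min(max(state + action, 0), N_CELLS - 1)
    reward = 1.0 if nxt == N_CELLS - 1 and state != N_CELLS - 1 else 0.0
    return nxt, reward


def learn_model(num_samples=500, seed=0):
    """Learn a table {(state, action): (next_state, reward)} from experience."""
    rng = random.Random(seed)
    model = {}
    for _ in range(num_samples):
        s = rng.randrange(N_CELLS)
        a = rng.choice(ACTIONS)
        model[(s, a)] = hidden_step(s, a)  # record what was observed
    return model


def lookahead(model, state, depth, discount=0.9):
    """Best discounted return within `depth` steps, according to the model."""
    if depth == 0:
        return 0.0
    best = 0.0
    for a in ACTIONS:
        if (state, a) not in model:
            continue  # never observed, so the model cannot predict it
        nxt, r = model[(state, a)]
        best = max(best, r + discount * lookahead(model, nxt, depth - 1, discount))
    return best


def plan(model, state, depth=5, discount=0.9):
    """Pick the action whose predicted return is highest."""
    def score(a):
        if (state, a) not in model:
            return float("-inf")
        nxt, r = model[(state, a)]
        return r + discount * lookahead(model, nxt, depth - 1, discount)
    return max(ACTIONS, key=score)


if __name__ == "__main__":
    model = learn_model()
    state = 0
    for _ in range(5):
        action = plan(model, state)
        state, reward = hidden_step(state, action)
        print(state, reward)
```

The depth argument plays the role of the limited number of steps the planner is allowed to look ahead. A real system would replace the lookup table with learned predictors and a more sophisticated search.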

DeepMind was impressed by the efficiency and speed with which MuZero was able to master games, even when given only a limited number of steps to plan ahead. It hopes that this new method of AI learning can be applied to messy real-world environments where the rules aren’t laid down in a well-defined manner.
