An algorithm learns to play Stratego like an expert human

2022-12-02T11:23:13.646Z

A program from DeepMind, a Google research company, manages to beat humans at a game much more complex than chess or poker, opening new paths for science

The British company DeepMind, owned by Google since 2014, has managed to develop an algorithm capable of playing Stratego, a popular board game, like an expert human.

As detailed by a team of company researchers in an article published today in the journal Science, DeepNash (as the tool has been named) has ranked among the top three players on Gravon, a portal specializing in online Stratego matches.

This is a milestone due to the high complexity of the game, which combines elements of strategy, intuition (players do not have all the necessary information to draw up perfect plans) and even bluffing.

The study authors believe that the algorithm could have applications in areas such as automatic traffic optimization.

Marketed by Jumbo since the 1960s, though invented before World War I, Stratego was one of the few iconic board games not yet mastered by artificial intelligence.

This strategy game poses a double challenge: it requires long-term strategic thinking, as in chess, but players must also manage imperfect information, as in poker, because the opponent's pieces start face down and are revealed only as the game progresses.

This combination makes it a more complex game than Go, the thousand-year-old Asian game whose board allows the stones to be arranged in more combinations than there are atoms in the universe.

It also makes winning demand more cunning than in poker.

Game simulators have historically served as a good thermometer to measure the effectiveness of computer programs.

They offer a controlled environment with precise rules in which the tools can develop their abilities and where success is easy to measure: it is enough to see whether they win the game or not.

It is a perfect test bed to study how humans and machines develop and execute winning strategies.

Hence, DeepMind has set its sights on Stratego, a major challenge for the machine given the lack of information that it must manage during the game.

In Stratego there are 12 types of pieces with different attributes.

Each player places their 40 pieces on the board, but does not know how their opponent has placed theirs. (Image: DeepMind)

DeepMind has a long history in this field, having developed cutting-edge tools that outperform humans in complex, long-term strategy games with perfect information, such as Go (with AlphaGo), but also in imperfect-information video games, such as StarCraft (with AlphaStar).

Until now no one had managed to develop a tool capable of playing Stratego at the same level as an expert human.

It is no coincidence: the game has 10⁵³⁵ possible states, far more than Texas Hold'em poker, a much-studied game of imperfect information (each player knows only the cards in their own hand and those on the table) with 10¹⁶⁴ states, or Go, the ancient Asian game, which has 10³⁶⁰.

In addition, any move made on the first turn involves reasoning over 10⁶⁶ possible pairs of piece configurations.

In poker it's 10⁶.
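The 10⁶⁶ figure can be sanity-checked with a short calculation. The sketch below counts the distinct ways one player can lay out 40 pieces and squares that number to cover both players. The per-type piece counts are an assumption taken from the classic 40-piece set (the article gives only the totals of 12 types and 40 pieces), and the names `PIECE_COUNTS` and `deployments` are hypothetical.

```python
import math

# Assumed piece counts per player (classic 40-piece set; not stated in the article).
PIECE_COUNTS = {
    "flag": 1, "bomb": 6, "spy": 1, "scout": 8, "miner": 5,
    "sergeant": 4, "lieutenant": 4, "captain": 4, "major": 3,
    "colonel": 2, "general": 1, "marshal": 1,
}

def deployments(piece_counts):
    """Distinct ways to fill one player's squares (a multinomial coefficient)."""
    total = sum(piece_counts.values())
    ways = math.factorial(total)
    # Pieces of the same type are interchangeable, so divide out their orderings.
    for count in piece_counts.values():
        ways //= math.factorial(count)
    return ways

one_side = deployments(PIECE_COUNTS)
both_sides = one_side ** 2  # each player deploys independently

print(f"one player:   about 10^{math.log10(one_side):.1f}")
print(f"both players: about 10^{math.log10(both_sides):.1f}")
```

Under these assumed counts, a single player has on the order of 10³³ possible deployments, so the pair of hidden setups is on the order of 10⁶⁶, in line with the figure quoted above.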

Perfect-information games don't have that problem, because the pieces are in plain view.

Together, these two sources of complexity make it impossible to reuse previous research to build a program that plays Stratego.

For this reason, the DeepMind team has developed a reinforcement learning algorithm that applies theoretical models based on the Nash equilibrium, a concept named after John Nash, the famous American mathematician specializing in game theory.
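To give a feel for what a Nash equilibrium is, here is a minimal sketch for the smallest possible case: a two-player zero-sum game with two options per side. It is not DeepNash's method, which the article does not detail; it is the textbook closed-form solution, and the function name `solve_2x2_zero_sum` is hypothetical.

```python
from fractions import Fraction

def solve_2x2_zero_sum(a, b, c, d):
    """Equilibrium of a zero-sum game whose row-player payoffs are [[a, b], [c, d]].

    Returns (p, q, value): p is the probability that the row player picks row 1,
    q the probability that the column player picks column 1.
    """
    a, b, c, d = (Fraction(x) for x in (a, b, c, d))
    row_mins = [min(a, b), min(c, d)]
    col_maxes = [max(a, c), max(b, d)]
    maximin, minimax = max(row_mins), min(col_maxes)
    if maximin == minimax:
        # Saddle point: a pure strategy is already optimal for both sides.
        p = Fraction(1) if row_mins[0] == maximin else Fraction(0)
        q = Fraction(1) if col_maxes[0] == minimax else Fraction(0)
        return p, q, maximin
    # No saddle point: each side mixes so that the opponent is indifferent.
    denom = a - b - c + d
    p = (d - c) / denom
    q = (d - b) / denom
    value = (a * d - b * c) / denom
    return p, q, value

# Matching pennies: any predictable choice can be exploited.
p, q, value = solve_2x2_zero_sum(1, -1, -1, 1)
print(p, q, value)
```

In matching pennies both players end up mixing their two options evenly and neither can gain by deviating, which is the defining property of the equilibrium. Stratego is vastly larger, so nothing this direct is feasible there.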

The tool does not try to predict the opponent's possible moves, which is the usual approach in game-playing programs, because the tree of possibilities at the start of a game is almost endless; instead, it establishes its own strategy and then adapts it as the game unfolds.

"Our paper shows how DeepNash can be applied in situations of uncertainty and successfully balance their actions to help solve complex problems," explains Julien Perolat, lead author of the study.

The scientist and his colleagues believe that R-NaD, the algorithm behind DeepNash, may be useful for developing new artificial intelligence applications that involve interaction with many human beings with different objectives, which leaves the system short of information about what is going to happen.

Large-scale optimization of traffic management to reduce travel times and associated gas emissions appears to be a good application, Perolat and colleagues write in Science.

In this play, the machine bluffed the human player, passing off a scout as a marshal and managing to track down the spy, a key piece. (Image: DeepMind)

How to play Stratego

Stratego is enjoying a second youth thanks to the internet.

The popular board game has now made its way to forums like Gravon, where players from around the world square off against each other in tense online matches.

In Stratego, two players face each other, each with 40 pieces with different attributes on their side of the board.

The objective is to capture the opponent's flag or leave the opponent without mobile pieces.

To do this, the players take turns moving their mobile pieces, which come in ten types, corresponding to military ranks and specialists such as miners, scouts or spies.

Every time a token comes into contact with another of the opponent, both are exposed.

The one that wins, either because it is of higher rank or because of its special abilities, stays on the board; the loser is removed from the game.
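The encounter rule just described can be sketched in a few lines of Python. The article says only that the higher rank or a special ability wins; the specific exceptions below (a spy defeats the marshal when it attacks, a miner defuses a bomb, equal ranks are both removed) are assumptions based on the classic rules, and every name in the block is hypothetical.

```python
# Ranks of the ten mobile piece types, 10 being the marshal (assumed classic ranking).
RANKS = {
    "spy": 1, "scout": 2, "miner": 3, "sergeant": 4, "lieutenant": 5,
    "captain": 6, "major": 7, "colonel": 8, "general": 9, "marshal": 10,
}

def resolve_attack(attacker, defender):
    """Return 'attacker', 'defender', 'tie' (both removed) or 'flag_captured'."""
    if defender == "flag":
        return "flag_captured"
    if defender == "bomb":
        # Assumed special ability: only a miner survives hitting a bomb.
        return "attacker" if attacker == "miner" else "defender"
    if attacker == "spy" and defender == "marshal":
        # Assumed special ability: the spy wins only when it strikes first.
        return "attacker"
    if RANKS[attacker] > RANKS[defender]:
        return "attacker"
    if RANKS[attacker] < RANKS[defender]:
        return "defender"
    return "tie"

print(resolve_attack("spy", "marshal"))   # spy strikes first
print(resolve_attack("marshal", "spy"))   # marshal strikes first
```

The asymmetry between those two calls is what makes the spy a key piece: it is the weakest piece on the board in every encounter except the one where it attacks the marshal.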

The DeepNash algorithm is capable of developing unpredictable strategies and executing equivalent moves in an apparently random manner.

All of this is aimed at confusing the opponent so that they cannot draw conclusions about the machine's style of play.

In one of the games reviewed in the article, for example, it sacrificed two important pieces to locate the opponent's highest-ranking pieces.

That left it at a material disadvantage, but the algorithm estimated that knowing the location of the opponent's best pieces gave it a 70% chance of success.

In the end it won that game.

On another occasion, it bluffed, chasing a high-ranking piece with a very low-ranking one, which convinced the opponent that it was playing with the 10 (marshal) and led them to take out the spy (S).

“DeepNash's level of play surprised me.

I had never seen a machine capable of playing Stratego like an experienced human.

After playing against DeepNash myself, I was not surprised that it later went on to place in the top 3 of the Gravon rankings.

I think it would do very well if it were allowed to participate in the World Championships,” says Vincent de Boer, co-author of the Science article and a former Stratego world champion.


Source: elparis
