Article from WeChat public account: QbitAI (ID: QbitAI). Original title: "Crushing 99.8% of human opponents, Grandmaster level with all three races! StarCraft AI lands in Nature, with its technology fully disclosed for the first time". Authors: Qian Ming, Yu Yang, Li Zi. Title image from Oriental IC.

Only 0.2% of StarCraft II players have not been crushed by the AI.

This is the latest report card handed in by AlphaStar, which had been playing anonymously on the ladder.

At the same time, DeepMind fully disclosed AlphaStar's current capabilities and its full set of techniques in Nature:

AlphaStar has surpassed 99.8% of human players and reached Grandmaster level with all three races: Protoss, Terran, and Zerg.

In the paper, we also found a special training setup:

Not every agent plays to win

In its blog post, DeepMind said the AlphaStar published in Nature has four major updates:

First, constraints: the AI's view of the game is now the same as a human's, and its action frequency is more tightly restricted.

Second, it can now play 1v1 as Terran, Protoss, or Zerg, with a separate neural network for each race.

Third, league training is fully automated, starting from supervised-learning agents rather than from agents that have already been through reinforcement learning.

Fourth, the Battle.net results: AlphaStar reached Grandmaster level with all three races, played on the same maps as human players, and replays of all its games are available.

As for the AI's learning process, DeepMind emphasizes a special choice of training objectives:

Not every agent aims to maximize its win rate.

During self-play, an agent easily falls into a specific strategy that works only in specific situations, so its performance becomes unstable when it faces a complex game environment.

So the team borrowed a training method from human players: targeted practice against other players. Through its own play, one agent can expose another agent's flaws and thus help it develop the skills it needs.

This yields agents with different objectives. The first kind, the main agents, aim to win. The second kind are responsible for probing the main agents' weaknesses and helping them grow stronger, rather than improving their own win rate. DeepMind calls the second kind "exploiters"; we'll simply call them sparring partners.

The various complex strategies that AlphaStar has learned are cultivated in this process.

For example, Blue is a main agent, responsible for winning, and Red is a sparring partner helping it grow. Red discovered a cannon rush tactic that Blue could not withstand:

Then, a new main agent (Green) learned how to successfully defend against Red's cannon rush:

At the same time, Green could also defeat the earlier main agent Blue, through economic advantage, unit composition, and control:

Later, another new sparring partner (Brown) found Green's new weakness and defeated it with cloaked Dark Templar:

Round after round, AlphaStar grows stronger and stronger.
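To make the league idea concrete, here is a minimal Python sketch of one training round with main agents and exploiters. Everything here (`Agent`, `play_match`, the update steps) is a hypothetical stub for illustration, not DeepMind's actual implementation:

```python
import random

class Agent:
    """A league member with trainable parameters (stubbed out here)."""
    def __init__(self, name):
        self.name = name

def play_match(a, b):
    """Stub: play one game and return True if `a` wins."""
    return random.random() < 0.5

def league_round(main_agents, exploiters, frozen_pool):
    for main in main_agents:
        # Main agents train against the whole league: current agents
        # plus frozen past checkpoints, so old strategies stay covered.
        opponent = random.choice(main_agents + exploiters + frozen_pool)
        play_match(main, opponent)
        # ...update `main` toward maximizing its overall win rate (omitted)

    for exploiter in exploiters:
        # Exploiters only target main agents, hunting for weaknesses;
        # their own league-wide win rate does not matter.
        target = random.choice(main_agents)
        play_match(exploiter, target)
        # ...update `exploiter` toward beating `target` only (omitted)

    # Periodically freeze copies so defeated strategies remain as opponents.
    frozen_pool.extend(Agent(m.name + "-frozen") for m in main_agents)
```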

As for the details of the algorithm, this time they have been fully revealed.

AlphaStar's technology, disclosed in full

Many real-life AI applications involve multiple agents competing and coordinating in complex environments.

Research on the real-time strategy (RTS) game StarCraft is a small goal along the way to solving this big problem.

In other words, the StarCraft challenge is really a challenge for multi-agent reinforcement learning algorithms.

AlphaStar learns to play StarCraft by relying on a deep neural network, which receives input data from the raw game interface and outputs a series of instructions that make up an action in the game.

AlphaStar observes the game through an overview map and a list of units.

Before acting, the agent outputs the type of action to issue (for example, build), which units the action applies to, what the target is, and when it will issue its next action.

Actions are sent to the game through a monitoring layer that limits the action rate.
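As a rough illustration of this structured output and the rate-limiting layer (our own hypothetical sketch, not DeepMind's code), each decision can be viewed as a small record, with a monitor refusing actions that arrive too fast:

```python
from dataclasses import dataclass
from typing import List, Tuple, Optional

@dataclass
class GameAction:
    kind: str                # action type, e.g. "build", "move", "attack"
    actor_ids: List[int]     # which units the action applies to
    target: Tuple[int, int]  # a map coordinate (or a target unit id)
    delay: int               # game steps until the agent acts again

class RateMonitor:
    """Monitoring layer: enforces a minimum gap between actions."""
    def __init__(self, min_gap_steps: int):
        self.min_gap = min_gap_steps
        self.last_step = -min_gap_steps

    def forward(self, action: GameAction, step: int) -> Optional[GameAction]:
        if step - self.last_step < self.min_gap:
            return None  # too soon; the action is held back
        self.last_step = step
        return action
```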

Training is done through supervised learning and reinforcement learning.

At the beginning, training used supervised learning, with data drawn from anonymized human game replays released by Blizzard.

This data let AlphaStar learn the macro and micro strategies of the game by imitating the operations of human StarCraft players.

The initial agent could already defeat the game's built-in Elite AI in 95% of games, roughly the level of a human Gold-league player.

This early agent then served as the seed for reinforcement learning.
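A minimal sketch of this supervised stage (behavioral cloning), assuming a PyTorch-style policy network; `policy`, `optimizer`, and the replay batch are placeholders we made up for illustration:

```python
import torch.nn.functional as F

def supervised_step(policy, optimizer, batch):
    """One imitation step: push the policy to assign high probability
    to the action a human actually took in each game state."""
    observations, human_actions = batch   # drawn from Blizzard replay data
    logits = policy(observations)         # scores over possible actions
    loss = F.cross_entropy(logits, human_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```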

Based on it, a continuous league was created: the agents in the league play against one another, which ensures the robustness of league training.

Moreover, each agent's learning objective adapts to the changing environment.

The neural network weights of each agent also change continuously as reinforcement learning proceeds, and these ever-changing weights are the basis on which the learning objectives evolve.

The weight-update rule is a new off-policy reinforcement learning algorithm that incorporates experience replay, self-imitation learning (Self-Imitation Learning), policy distillation (Policy Distillation), and other techniques.
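The exact update rule in the paper is considerably more involved, but the following hedged PyTorch-style sketch conveys the flavor: an off-policy policy-gradient term computed from replayed experience, plus a distillation term that keeps the policy close to the supervised, human-like one. All names and the weighting are our own illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def off_policy_update(policy, teacher_policy, replay_batch, optimizer,
                      distill_weight=0.1):
    # Replayed experience generated by an *older* version of the network.
    obs, actions, returns, behavior_logp = replay_batch

    logp = F.log_softmax(policy(obs), dim=-1)
    logp_taken = logp.gather(1, actions.unsqueeze(1)).squeeze(1)

    # Truncated importance weight corrects for the data being off-policy.
    rho = torch.exp(logp_taken - behavior_logp).clamp(max=1.0)

    # Policy-gradient term; keeping only positive-advantage samples here
    # would give it a self-imitation flavor (reinforce what worked).
    pg_loss = -(rho * returns * logp_taken).mean()

    # Policy distillation: stay close to the human-like teacher policy.
    with torch.no_grad():
        teacher_logp = F.log_softmax(teacher_policy(obs), dim=-1)
    distill_loss = F.kl_div(logp, teacher_logp, log_target=True,
                            reduction="batchmean")

    loss = pg_loss + distill_weight * distill_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```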

Fifteen years in the making: AI conquers StarCraft

StarCraft is one of the most challenging real-time strategy (RTS) games: players must coordinate short-term and long-term goals while handling unexpected situations. It has long been a "touchstone" for AI research.

Because it is an imperfect-information game, the challenge is formidable, and researchers have had to spend a great deal of time overcoming its problems.

DeepMind said on Twitter that AlphaStar's current results rest on 15 years of research on the StarCraft series.

But DeepMind's own work became widely known only in the last two years.

In 2017, the year after AlphaGo defeated Lee Sedol, DeepMind teamed up with Blizzard to release an open-source toolset called PySC2. Building on it, and combined with engineering and algorithmic breakthroughs, research on StarCraft accelerated further.
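For reference, a minimal PySC2 loop looks roughly like this (exact parameters vary across pysc2 versions; this follows the pattern from the library's documentation):

```python
from pysc2.env import sc2_env
from pysc2.lib import actions, features

def run_noop_episode():
    """Connect to StarCraft II via PySC2 and issue no-ops for one episode."""
    with sc2_env.SC2Env(
            map_name="Simple64",
            players=[sc2_env.Agent(sc2_env.Race.terran),
                     sc2_env.Bot(sc2_env.Race.zerg, sc2_env.Difficulty.easy)],
            agent_interface_format=features.AgentInterfaceFormat(
                feature_dimensions=features.Dimensions(screen=84, minimap=64)),
            step_mul=8,  # game steps per agent action
    ) as env:
        timesteps = env.reset()
        while not timesteps[0].last():
            noop = actions.FunctionCall(actions.FUNCTIONS.no_op.id, [])
            timesteps = env.step([noop])
```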

Since then, many researchers have studied StarCraft extensively, including Yu Yang's team at Nanjing University, Tencent AI Lab, and UC Berkeley.

In January of this year, AlphaStar had its own AlphaGo moment.

In matches against professional StarCraft II players, AlphaStar dominated with a combined score of 10-1. Human pro LiquidMaNa held out for only 5 minutes and 36 seconds against it before typing GG.

After losing, the all-round pro TLO lamented how hard it was to play against AlphaStar: it was not like playing against a human, and he felt powerless.

Half a year later, AlphaStar evolved once again.

DeepMind constrained its APM (actions per minute) and camera view to be consistent with human players, achieved full control of Protoss, Terran, and Zerg, and unlocked many maps.

At the same time, DeepMind announced a new development: AlphaStar would log onto Battle.net for anonymous ladder matches.

Now, with the release of the latest paper, AlphaStar's latest combat strength has also been announced: it has surpassed 99.8% of players and earned Grandmaster rank with all three races.

But the strongest humans really do have the strength to go head-to-head with the AI even now.

However, there is perhaps only one master in the world who dares to talk like that.

Links

Nature paper: https://doi.org/10.1038/s41586-019-1724-z

Preprint: https://storage.googleapis.com/deepmind-media/research/alphastar/AlphaStar_unformatted.pdf

Blog post: https://deepmind.com/blog/article/AlphaStar-Grandmaster-level-in-StarCraft-II-using-multi-agent-reinforcement-learning

Competition video