Google's AlphaZero Beats The Best Chess-Playing Software Program

I bet everyone has heard the news about the world's next chess champion - the "alien" AlphaZero. No, it isn't exactly an alien; it's software, but it does not work like typical software. It uses techniques loosely modeled on how the human brain works, and with its computational power it plays far better than any human.

AlphaZero originated at DeepMind - the famous branch of Google known for its research in advancing AI.

Google’s DeepMind
DeepMind Technologies Limited is a British artificial intelligence research company which was founded by Demis Hassabis, Shane Legg, and Mustafa Suleyman in 2010. The start-up was later acquired by Google in 2014. 
The company conducts pioneering AI research, developing programs that learn to solve complex problems by observing their environment - a technique known as machine learning.
DeepMind made a breakthrough last year with its AlphaGo program, which mastered the famous game Go. Go is an ancient and complex game of strategy and intuition in which two players place black and white stones on a 19-by-19 grid. The number of possible positions is astronomically large, and many believed an AI program could not play the game successfully. Last year, however, AlphaGo defeated world champion Lee Sedol.
AlphaGo was effective because it had been trained on millions of moves made by past masters and was able to predict its own chances of winning, adjusting its strategy accordingly. The program practiced by analyzing data from 100,000 professional human games and played against itself some 30 million times.

AlphaZero is a generalized (and improved) version of AlphaGo. Its creators at DeepMind recently published an academic paper on arXiv, which has not yet been peer reviewed. The paper describes a game-playing program that was able to master Go, chess and shogi (Japanese chess) within 24 hours. According to the paper’s authors,
“Starting from random play and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi as well as Go, and convincingly defeated a world champion program in each case.”
In a series of 100 games played against the reigning computer chess engine Stockfish 8, the AlphaZero system did not lose a single game. AlphaZero won 25 games playing white, which has the first-move advantage, and another three playing black. The remaining 72 games were drawn. Even more impressive, AlphaZero achieved this feat after only four hours of self-training. It also defeated Elmo, the world’s best shogi program, after learning for just two hours, and beat its predecessor AlphaGo after learning for eight hours.
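To put the Stockfish result in numbers, here is a small sketch that tallies the match using standard chess scoring (1 point for a win, half a point for a draw). The variable names are only illustrative; the win and game counts are the ones reported above.

```python
# Figures reported for the 100-game match against Stockfish 8.
games = 100
wins_as_white = 25
wins_as_black = 3
losses = 0

wins = wins_as_white + wins_as_black
draws = games - wins - losses

# Standard chess scoring: 1 point per win, 0.5 per draw, 0 per loss.
score = wins * 1.0 + draws * 0.5

print(f"wins={wins}, draws={draws}, score={score} out of {games}")
```

Under that scoring, AlphaZero took 64 of the 100 available points without a single loss.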
Due to the arcane nature of AI, researchers are always wary of another AI winter. Results such as these provide positive affirmations that we're headed in the right direction in our research.
DeepMind co-founder and CEO Demis Hassabis presented further details of the system at the recent Neural Information Processing Systems (NIPS) AI conference in California. According to Hassabis, “It doesn’t play like a human and it doesn’t play like a program. It plays in a third, almost alien way.”

Hassabis speculates that because AlphaZero teaches itself, it is not bound to assigning fixed values to individual pieces or to minimizing losses the way human players tend to when playing chess.

Reinforcement Learning

The first computer program to defeat a reigning world chess champion in a match was IBM's Deep Blue supercomputer, which beat Garry Kasparov in May 1997. AlphaZero differs from such predecessors in its machine-learning based approach.
AlphaZero was able to acquire 1,400 years of human chess knowledge in an amazingly short amount of time. AlphaZero uses a reinforcement learning algorithm, a neural net, and only the pieces on the board for input.

Reinforcement learning refers to a class of machine learning algorithms in which an AI agent learns the course of action that best achieves its goal. The technique uses a system of rewards and punishments, similar to how kids are taught right from wrong.
If the agent performs an action which takes it towards the goal, it is rewarded. If the action takes it further away from the goal, the agent is punished. 
Let's say that in a simplistic environment a good action is worth +1 point and a bad action -1 point. When the agent reaches the goal, we add up all the points it gathered along the way. If the agent tried two different ways to achieve its goal, one with a cumulative result of +12 and another with +15, which one do you think the agent will adopt?
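The +12 versus +15 comparison can be written out in a few lines. This is only a sketch: the two reward sequences are made up so that they add up to the totals in the example, and the names `path_a`, `path_b` and `total_reward` are hypothetical.

```python
# Hypothetical reward sequences for two ways of reaching the same goal.
# +1 marks a good action, -1 a bad one.
path_a = [+1] * 14 + [-1] * 2   # 14 good actions, 2 bad ones
path_b = [+1] * 16 + [-1] * 1   # 16 good actions, 1 bad one

def total_reward(rewards):
    """Add up the points the agent collected along one path."""
    return sum(rewards)

candidates = {"path_a": path_a, "path_b": path_b}

# The agent prefers whichever path earned the larger cumulative reward.
best = max(candidates, key=lambda name: total_reward(candidates[name]))

print(best, total_reward(candidates[best]))
```

The agent adopts the second path, since its cumulative reward of +15 beats +12.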
This is how AlphaZero learned about good moves and bad moves without any prior knowledge of chess except the rules. It learned to master chess through trial and error, playing against itself and improving with each game.
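To make "learning by playing against itself" concrete, here is a toy sketch. It is not AlphaZero's algorithm: there is no neural network, the game is a tiny version of Nim (players alternate taking 1 to 3 stones, and whoever takes the last stone wins), and move values are kept in a plain lookup table. All function names and parameters are assumptions made for this illustration.

```python
import random

def legal_moves(stones):
    """A player may take 1, 2, or 3 stones, but never more than remain."""
    return range(1, min(3, stones) + 1)

def train_by_self_play(max_stones=10, episodes=20000, epsilon=0.3, seed=0):
    """Learn move values for toy Nim knowing only the rules.

    q[(stones, take)] estimates the outcome for the player about to move:
    +1 means the move leads to a win, -1 means it leads to a loss.
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(1, max_stones + 1) for a in legal_moves(s)}

    for _ in range(episodes):
        stones = rng.randint(1, max_stones)  # random start so every position is seen
        while stones > 0:
            moves = list(legal_moves(stones))
            if rng.random() < epsilon:
                take = rng.choice(moves)                         # trial: explore
            else:
                take = max(moves, key=lambda a: q[(stones, a)])  # use what was learned
            remaining = stones - take
            if remaining == 0:
                target = 1.0  # taking the last stone wins the game
            else:
                # The same agent plays the other side next, so the
                # opponent's best outcome is the mover's worst.
                target = -max(q[(remaining, a)] for a in legal_moves(remaining))
            q[(stones, take)] = target  # the game is deterministic, so overwrite
            stones = remaining
    return q

def best_move(q, stones):
    """Return the move the table currently rates highest."""
    return max(legal_moves(stones), key=lambda a: q[(stones, a)])

q = train_by_self_play()
print([best_move(q, s) for s in (5, 6, 7)])
```

Nobody tells the agent the winning strategy, which is to leave the opponent a multiple of four stones. The preference for those moves emerges from the rewards collected over many self-played games.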

According to the authors of the paper, AlphaZero learned opening moves in chess and gradually began to discard some moves in favor of others as it improved.

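AlphaZero also evaluates far fewer positions than a traditional engine such as Stockfish. In the paper's words:

“AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations - arguably a more “human-like” approach to search.”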
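One way to picture this selective focus is a move-selection rule in the style of PUCT, the kind of formula used in Monte Carlo tree search guided by a neural network. The sketch below is a simplification under stated assumptions: the prior probabilities and move values are invented numbers standing in for network output, the values stay fixed instead of being updated from search results, and `c_puct=1.5` is an arbitrary constant.

```python
import math

def select_move(priors, visit_counts, mean_values, c_puct=1.5):
    """Pick the move with the highest value-plus-exploration score."""
    total_visits = sum(visit_counts.values())

    def score(move):
        # Moves the network likes (high prior) get a larger exploration
        # bonus, and the bonus shrinks as a move is visited more often.
        bonus = c_puct * priors[move] * math.sqrt(total_visits) / (1 + visit_counts[move])
        return mean_values[move] + bonus

    return max(priors, key=score)

def allocate_simulations(priors, mean_values, simulations=100):
    """Spend a fixed search budget, one simulation at a time."""
    visit_counts = {move: 0 for move in priors}
    for _ in range(simulations):
        move = select_move(priors, visit_counts, mean_values)
        visit_counts[move] += 1
    return visit_counts

# Made-up network output for three candidate opening moves.
priors = {"e4": 0.60, "d4": 0.30, "a3": 0.10}
mean_values = {"e4": 0.10, "d4": 0.05, "a3": -0.20}

visits = allocate_simulations(priors, mean_values)
print(visits)
```

Most of the budget goes to the moves the network rates as promising, and an unpromising move like a3 receives only a handful of visits. A brute-force search would examine all three moves in equal depth.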

In this manner it learned chess on its own, akin to how humans learn. One more advantage you may have noticed: since it didn't require any prior game knowledge or special techniques beyond the game rules, it can be trained to learn games other than chess.
And yes, this is what the AlphaZero team did - they trained the agent for two other games - Go and shogi, and AlphaZero emerged victorious in both. 

AlphaZero is a generalized AI agent that can learn from very little information, and I hope most of you can picture what that makes possible. Present it with a problem, give it some basic rules and see what solution it suggests. This is one of the major applications of AI - solving complex problems in new ways, which could lead to solutions not previously considered.

The AI techniques used in AlphaZero have exciting implications, mainly because of their ability to learn from so little information. They could be applied to a number of areas such as medical diagnosis, weather and disaster prediction, and better management in organisations and government - the possibilities are endless.
According to Hassabis, the program is so powerful because it is “no longer constrained by the limits of human knowledge.” As an example, Hassabis believes such techniques could be applied to defeating Alzheimer’s disease, finding in a matter of weeks a cure that could take humans hundreds of years to discover. Hassabis states that “Ultimately we want to harness algorithmic breakthroughs like this to help solve all sorts of pressing real world problems.”
AI programs may be able to drive forward human understanding of what is possible and positively impact people's lives. It is fascinating to see how far AI research has come and to speculate how much further we can still go.
Love what you read? Share this article among your friends and comment your thoughts below. We'd love to hear from you! If you'd like to read more such articles, follow The Daily Programmer on Twitter @programmerdaily and receive fresh, well-researched content delivered to your feed.