In February 2015, DeepMind published a paper describing an algorithm, dubbed Deep Q-Network (DQN), able to beat humans at several Atari 2600 games.

Motivations

I believe there are three reasons that make the DQN algorithm fascinating:

  • Unlike the typical AI of your favorite video game, DeepMind’s DQN does not base its gaming strategy on hidden game variables that a human player has no access to. On the contrary, the information available to the DQN is limited to the visual images of the game, the score, and the number of different actions that can be performed in the game. To be clear on this last point, the DQN is only told how many actions exist, not what those actions correspond to. So basically the DQN has the same information a human player would: it sees the screen, it knows how well it is performing from the score, and it has a controller in hand with a big red button and a joystick that can go in several directions.
  • The DQN is a Reinforcement Learning (RL) algorithm. No one told the algorithm how to play a given game; it figured it out by itself! And it learns in a way that should sound familiar: by trial and error. RL is not something new, but it is still pretty cool!
  • Finally, and this is probably one of the major points, the exact same algorithm with the exact same initial parameters was able to learn to play 49 different Atari games, achieving super-human level on 29 of them. This means the algorithm can adapt to different situations and learn how best to behave in each of them.

Still not convinced this is an achievement? Try to beat DeepMind’s DQN at a single game by scoring an average of 400 at Breakout, or have a look at the DQN playing!

(I did score an 89 that I am relatively proud of.)

Let's dive in

An algorithm must be seen to be believed.

(Donald Knuth)

So I was intrigued by this result, and it felt like the best way to properly understand it was to try to reproduce it. I ended up writing a Python implementation that you can find on my GitHub account. While exploring the web, I came across OpenAI, which has produced a Python package called Gym. As their website puts it: “gym open-source project provides a simple interface to a growing collection of reinforcement learning tasks.” This includes Atari games and many others. This package makes your life super easy.
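To give you an idea of how simple that interface is, here is a minimal sketch (using the classic Gym API of the time, where `reset()` returns the observation and `step()` returns a 4-tuple) that plays CartPole-v0 with purely random actions:

```python
import gym

# Create the CartPole environment.
env = gym.make('CartPole-v0')

for episode in range(5):
    observation = env.reset()  # random initial conditions every episode
    score = 0.0
    done = False
    while not done:
        action = env.action_space.sample()  # pick a random action (0: left, 1: right)
        observation, reward, done, info = env.step(action)  # advance the game one step
        score += reward
    print('Episode {} finished with score {:.0f}'.format(episode, score))

env.close()
```

Swap the random `action` for the output of a learning algorithm and you already have the skeleton of an agent.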

While developing the code, it soon became clear that training a working DQN to play Atari games directly on my machine (a four-year-old MacBook Pro) was going to take several days, if not weeks (this is due to the high dimensionality of the problem). I therefore decided to focus on a simpler game that takes much less time for the DQN to master and that you should be able to train on your computer in less than an hour: CartPole-v0. The goal of the game is very simple: you need to balance a pole on a cart. To do so you have 2 possible actions: moving the cart to the left or moving the cart to the right. Of course, the initial conditions of the game are random, so you always have to adapt to the new situation you are facing. If the pole tilts by more than about 12 degrees or if the cart exits the window, you lose. If you keep the pole balanced for a certain number of steps (200 for CartPole-v0), you win.
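To make the recipe concrete, here is a heavily simplified sketch of the kind of training loop involved. This is not the code from my repository: it assumes Keras for the Q-network, the hyper-parameter values are purely illustrative, and it omits refinements from the paper such as the separate target network (and of course the convolutional layers used for Atari, which CartPole's 4-number state doesn't need).

```python
import random
from collections import deque

import gym
import numpy as np
from tensorflow import keras

# Illustrative hyper-parameters; the real code may use different values.
GAMMA = 0.99          # discount factor for future rewards
EPSILON_MIN = 0.05    # floor for the exploration rate
EPSILON_DECAY = 0.995
BATCH_SIZE = 32
MEMORY_SIZE = 10000

env = gym.make('CartPole-v0')
n_actions = env.action_space.n               # 2: push left or push right
state_dim = env.observation_space.shape[0]   # 4: position, velocity, angle, angular velocity

# A small fully connected Q-network: state in, one Q-value per action out.
model = keras.Sequential([
    keras.layers.Dense(24, activation='relu', input_shape=(state_dim,)),
    keras.layers.Dense(24, activation='relu'),
    keras.layers.Dense(n_actions, activation='linear'),
])
model.compile(optimizer=keras.optimizers.Adam(1e-3), loss='mse')

memory = deque(maxlen=MEMORY_SIZE)  # replay memory of (s, a, r, s', done) transitions
epsilon = 1.0
state = env.reset()

for step in range(34000):  # each iteration is one "learning step"
    # Epsilon-greedy: explore at random, otherwise exploit the Q-network.
    if random.random() < epsilon:
        action = env.action_space.sample()
    else:
        action = int(np.argmax(model.predict(state[None, :], verbose=0)[0]))

    next_state, reward, done, _ = env.step(action)
    memory.append((state, action, reward, next_state, done))
    state = env.reset() if done else next_state
    epsilon = max(EPSILON_MIN, epsilon * EPSILON_DECAY)

    # Learn from a random minibatch of past transitions (experience replay).
    if len(memory) >= BATCH_SIZE:
        batch = random.sample(memory, BATCH_SIZE)
        states = np.array([s for s, _, _, _, _ in batch])
        next_states = np.array([s2 for _, _, _, s2, _ in batch])
        targets = model.predict(states, verbose=0)
        next_q = model.predict(next_states, verbose=0)
        for i, (_, a, r, _, d) in enumerate(batch):
            # Bellman target: reward plus discounted best Q-value of the next state.
            targets[i][a] = r if d else r + GAMMA * np.max(next_q[i])
        model.fit(states, targets, epochs=1, verbose=0)
```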

The following GIFs show the DQN in action at different stages of its training (a learning step corresponds to one iteration of the main for-loop in the code, like the one sketched above).

[GIF: deep Q-network, untrained]

Game played by an untrained DQN. Score: 46.
The DQN basically loses right away.

[GIF: deep Q-network after 10000 learning steps]

Game played by a DQN trained on 10000 learning steps. Score: 263.
The DQN understood that it has to balance the pole.

[GIF: deep Q-network after 34000 learning steps]

Game played by a DQN trained on 34000 learning steps. Score: 690. The DQN is getting good at balancing the pole, but in every game the cart drifts to the left or right. It seems the DQN has not yet learned that it loses when the cart exits the window.

Final words

If you are interested in DQNs, I strongly encourage you to read the original paper, which is clearly written (thank you, DeepMind). Also feel free to clone the code from my GitHub account. Following the readme, it should be straightforward to install and run!

Let me know what’s on your mind: leave a comment or send me a message.
