What Is Reinforcement Learning and How Exactly Does it Work?

Devin Pickell  |  June 4, 2019

Machine learning has led to faster, smarter automation and prediction in nearly every major industry today.

In our previous guide, we compared the two most common types of machine learning today – supervised and unsupervised learning. These disciplines are used to build models that either meet desired outputs with extreme accuracy or to find naturally occurring patterns in large datasets.

But in this guide, we’re going to break down perhaps the most disruptive type of machine learning, referred to as reinforcement learning, and look at why it’s so complex.

What is reinforcement learning?

Before we dive into the technicalities of reinforcement learning, it’s important to understand the purpose of its name.

In psychology, reinforcement is applied to strengthen future behaviors using stimulation and motivation. In everyday terms, this is done through a reward for desired behavior or a penalty for undesired behavior. Strictly speaking, psychologists call a penalty "punishment" and reserve "negative reinforcement" for removing something unpleasant, but reinforcement learning borrows the simpler reward-and-penalty idea.

For example, if a child gets an A+ on their test, they may be rewarded with an ice cream cone. Conversely, if they fail their test, they may be penalized with no television time. Both outcomes will likely shape future behavior.

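With this context, we can now begin to define reinforcement learning: it is a type of machine learning in which an agent learns by interacting with an environment, taking actions, and receiving rewards or penalties, with the goal of maximizing its total reward over time.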

Now that we have a basic idea of reinforcement learning, it’s time to understand more of how it works.

How does reinforcement learning work?

Reinforcement learning can be somewhat difficult to grasp, so we’re going to discuss how it works in one of its more popular applications – gaming.

Reinforcement in gaming

Many of us, at some point in our lives, have played games. They may have been simple 8-bit games on Atari, board games like chess and checkers, PC games like Runescape and WoW, or console games like Call of Duty and Halo.

Regardless of which game you played, you likely didn’t start playing it with great success.

Building your skill set in a game took time and practice. You were motivated to earn rewards for good actions and to avoid penalties for bad ones. This concept transcends all games, regardless of how simple or complex they are, and it’s the same concept applied to reinforcement learning.

Teaching machines to play games

While the human brain naturally recognizes the purpose of a game, it’s much more difficult for machines. You could apply supervised learning, but this requires training data from previous human players. Because our skill set will eventually plateau, an agent that only imitates us is unlikely to get “better” than a human.

In reinforcement learning, there is no labeled training dataset and no target output value. The agent is allowed to compete, fail, and learn from its mistakes based on reward values and penalty values. Let’s use the very simple game of Pong as an example, thanks to Andrej Karpathy on GitHub.
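
To make the reward-and-penalty loop concrete before turning to Pong, here is a minimal sketch on a made-up environment. Everything in it is an assumption for illustration: the five-cell corridor, the `step` and `train` functions, and the choice of tabular Q-learning as the learning rule. It is not how the Pong agent is built; it only shows an agent improving from rewards alone, with no training dataset.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells. The agent starts in
# the middle; reaching the right end gives +1, the left end gives -1.
N_CELLS = 5
ACTIONS = (-1, +1)  # step left, step right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = state + action
    if next_state == N_CELLS - 1:
        return next_state, 1.0, True   # reward
    if next_state == 0:
        return next_state, -1.0, True  # penalty
    return next_state, 0.0, False


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: learn a value for every (state, action) pair."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    for _ in range(episodes):
        state, done = N_CELLS // 2, False
        while not done:
            # Mostly exploit what has been learned, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Nudge the estimate toward the reward actually observed.
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


q_values = train()
# The learned policy: the preferred action in each non-terminal cell.
policy = {s: max(ACTIONS, key=lambda a: q_values[(s, a)])
          for s in range(1, N_CELLS - 1)}
print(policy)
```

The agent is never told that "right" is the goal. It only sees a +1 or -1 at the end of each attempt and gradually prefers the actions that led to the reward.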


The purpose of Pong is to ricochet the ball with your paddle so it ends up behind the opponent. Initially, the agent won’t understand this and will fail numerous times, but eventually, it’ll make a correct move and be positively reinforced to repeat the move.

After many, many games of Pong, the agent, in this case a 2-layer neural network, should have a general understanding of the probability that moving UP will succeed versus the probability that moving DOWN will succeed. Here’s what that may look like:

[Image: Pong neural network]
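
In code, a 2-layer network that turns a game frame into a probability of moving UP could be sketched roughly as follows. This is an illustrative assumption, not Karpathy's actual implementation: the tiny layer sizes, the random weights, the made-up input features, and the names `init_weights`, `prob_up`, and `choose_action` are all hypothetical.

```python
import math
import random

# Hypothetical sizes: a real Pong agent would feed in thousands of pixel
# values; four inputs keep the sketch readable.
N_INPUTS, N_HIDDEN = 4, 3


def init_weights(seed=0):
    """Random starting weights for both layers."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.5) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
    w2 = [rng.gauss(0, 0.5) for _ in range(N_HIDDEN)]
    return w1, w2


def prob_up(x, w1, w2):
    """Forward pass: input -> hidden layer (ReLU) -> probability of UP."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    logit = sum(w * h for w, h in zip(w2, hidden))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid squashes to (0, 1)


def choose_action(x, w1, w2, rng):
    """Sample UP with probability p, otherwise DOWN."""
    return "UP" if rng.random() < prob_up(x, w1, w2) else "DOWN"


w1, w2 = init_weights()
frame = [0.0, 1.0, 0.5, -0.5]  # made-up input features
p = prob_up(frame, w1, w2)
print(f"P(UP) = {p:.3f}, P(DOWN) = {1 - p:.3f}")
print(choose_action(frame, w1, w2, random.Random(1)))
```

Sampling the action, rather than always taking the more likely one, is what lets the agent occasionally try something new and discover better moves.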

These actions are reinforced until the reward is maximized. In terms of Pong, this means achieving a perfect 21-0 score every time.
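
What "reinforced" means numerically can be shown with a one-parameter sketch in the spirit of a policy-gradient (REINFORCE-style) update. The setup is a deliberate simplification and an assumption: the policy has a single weight `theta`, and the hypothetical rule that UP always wins the point while DOWN always loses it stands in for a real game.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_policy(steps=2000, learning_rate=0.1, seed=0):
    """Learn a one-parameter policy: P(UP) = sigmoid(theta)."""
    rng = random.Random(seed)
    theta = 0.0  # start undecided: P(UP) = 0.5
    for _ in range(steps):
        p_up = sigmoid(theta)
        went_up = rng.random() < p_up
        # Hypothetical rule for this sketch: UP wins the point, DOWN loses it.
        reward = 1.0 if went_up else -1.0
        # Gradient of the log-probability of the action taken, w.r.t. theta.
        grad_log_prob = (1.0 - p_up) if went_up else -p_up
        # Rewarded actions become more likely, penalized ones less likely.
        theta += learning_rate * reward * grad_log_prob
    return sigmoid(theta)


final_p_up = train_policy()
print(f"P(UP) after training: {final_p_up:.3f}")
```

Each reward pushes the probability of the rewarded action up a little, and each penalty pushes the penalized action down, so over many rounds the policy drifts toward the move that wins.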

Advanced reinforcement learning in gaming

Okay, so a human player could probably reach a skill level where they win every game of Pong, so what about a more complex game?

One of the most popular examples of advanced reinforcement learning in gaming today is AlphaGo, a deep learning computer program from DeepMind. Its final version, AlphaGo Zero, effectively became the world’s best Go player after 40 days of training.

Go is an ancient Chinese strategy board game with over 20 million players worldwide. I won’t get into the nuances of the game, but just know it’s complex and similar to chess.

[Image: Ancient Chinese board game Go]

Here is the timeline of how AlphaGo became a worldwide phenom:

  • AlphaGo, like any learning agent, started off with zero knowledge of the game.
  • It was then fed the basic structure and strategy of the game using thousands of examples from both amateur and professional players.
  • In three days, it achieved a high skill level, and the testers began playing the program against itself.
  • This led to constant iteration, reinforcement, and pairing with search algorithms.
  • AlphaGo soon evolved through different, more advanced versions of itself: Fan, Lee, Master, and ultimately Zero.
  • AlphaGo Master defeated the top-ranked human player, Ke Jie, 3 games to 0 in 2017. AlphaGo Zero then played AlphaGo Lee, the version that had beaten 18-time world champion Lee Sedol, and won 100 games to 0.

[Image: AlphaGo reinforcement learning]

In just 40 days, AlphaGo Zero not only became the strongest Go player in the world, it also achieved an Elo rating above 5,000, which is essentially a superhuman level.
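
To get a feel for what an Elo gap means, the standard Elo expected-score formula can be sketched in a few lines. The two ratings below are hypothetical round numbers chosen for illustration, and ratings from different rating pools are not strictly comparable, so treat the result as a rough intuition rather than a measured win rate.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo formula: expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Hypothetical ratings, for illustration only.
strong_player, weaker_player = 5000, 3600

print(f"Equal ratings:  {expected_score(1500, 1500):.3f}")
print(f"400-point gap:  {expected_score(1900, 1500):.3f}")
print(f"1400-point gap: {expected_score(strong_player, weaker_player):.4f}")
```

Because the scale is logarithmic, every additional 400 points multiplies the odds in the stronger player's favor by ten, which is why a gap of more than a thousand points leaves the weaker side with almost no chance.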

Challenges to reinforcement learning

In just a few examples, we’ve seen how incredibly powerful reinforcement learning can be, but this isn’t without some challenges.

The first and most obvious challenge is that reinforcement learning takes place in a delayed-return environment, where the reward for an action may only arrive many steps later. This basically means the more advanced the task, the longer it’ll take the agent to learn and achieve maximum rewards.

Pong may have taken an hour or so, but AlphaGo Zero took 40 days and millions upon millions of games. Imagine how many iterations it takes to roll out systems outside of the gaming world, such as in robotics, science, and economics.
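
One common way to handle delayed returns is to pass a late reward backwards to the earlier moves that led to it, shrinking it a little at each step. The sketch below assumes a hypothetical five-move episode and an illustrative discount factor `gamma`; neither value comes from the systems discussed above.

```python
def discounted_returns(rewards, gamma=0.99):
    """Spread each reward backwards over the moves that led up to it."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical episode: four moves with no feedback, then a point is won.
episode = [0.0, 0.0, 0.0, 0.0, 1.0]
print(discounted_returns(episode, gamma=0.5))
# → [0.0625, 0.125, 0.25, 0.5, 1.0]
```

The earliest move receives only a small share of the credit. The longer the chain of moves before a reward, the weaker the learning signal for early decisions, which is one reason complex tasks need so many more games.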

Some other challenges to reinforcement learning are:

  • Adding more than one specified reward or rewards that aren’t explicit.
  • The actions of agents being delayed due to learning algorithms.
  • A lack of safety constraints.

What’s next for reinforcement learning?

Reinforcement learning has the potential for more groundbreaking discoveries and innovations, but what do some of these innovations look like?

With further research in reinforcement learning and deep learning methods, envision highly intelligent stock trading, completely automated factories, advanced self-driving vehicles, smart prosthetics, and so much more.

If machine learning interests you, check out our article on some of the more clever machine learning examples today. We sourced our content from five business leaders who leverage machine learning in their products/services.

Devin Pickell

Devin is a Content Marketing Specialist at G2 Crowd writing about data, analytics, and digital marketing. Prior to G2, he helped scale early-stage startups out of Chicago's booming tech scene. Outside of work, he enjoys watching his beloved Cubs, playing baseball, and gaming.