From the course: Introduction to Artificial Intelligence

Reinforcement learning

- Online music is close to a $30 billion industry. And if you think about it, it's kind of a strange business. A lot of times you can buy the same songs from Apple Music, Spotify, or Tidal. So why would you want to sign up for one service over the other? For most people, this has to do with the power of their recommendation system.

A lot of these systems started off by using unsupervised machine learning. They would recommend songs the same way that online retailers tell you which items are frequently bought together. But the best music libraries don't just want you to buy things together; many of them want you to discover something new. So for that, you have to use a different form of machine learning.

This form of machine learning is called reinforcement learning. These are machine learning algorithms that use rewards as a way to give the system an incentive to find new patterns. A few years ago, Google used reinforcement learning to teach artificial intelligence systems how to play video games. Their AI systems beat expert players at Atari games such as Pong, and even at more modern video games. But reinforcement learning can do much more than just play video games. These systems improve over time when you set a series of goals and rewards for them.

Think of it this way: Spotify's Discover Weekly compares your favorite songs with a bunch of related songs. The machine learning algorithm tracks every time you click and play a song. It also keeps track of how long you listen. The data scientists design the algorithms so that every time you click a related song, the system gets a tiny digital reward. It's almost like money for the machine learning system. It gets a reward coin when you click on the recommended song. Then the longer you listen, the more the reward increases, so it gets another reward coin for every minute that you listen to the song.

These algorithms often use something called Q-learning. This type of reinforcement learning helps the system work with more sophisticated rewards.
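To make the reward coins concrete, here is a minimal sketch in Python of the scheme just described: one coin for the click, plus one coin for each full minute of listening. The function name `listening_reward` and its parameters are hypothetical, chosen only for illustration; a real recommendation service would define its rewards in its own way.

```python
def listening_reward(clicked: bool, minutes_listened: float) -> int:
    """Hypothetical reward: 1 coin for a click, plus 1 coin per full minute listened."""
    if not clicked:
        # No click means the recommendation earned nothing.
        return 0
    # One coin for the click itself, then one more for each completed minute.
    return 1 + int(minutes_listened)


# A listener clicks the recommended song and stays for three and a half minutes.
coins = listening_reward(clicked=True, minutes_listened=3.5)
print(coins)  # 4
```

Counting only full minutes is an assumption of this sketch; the same idea works if partial minutes earn partial coins.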
In Q-learning, there are set environments or states, and there are possible actions that the system can take in response to those states. In Q-learning, you want the machine to improve the quality of the outcome, which is represented by the letter Q. In this case, you'd start with a Q of zero. Then you'd have the machine learn the actions that improve its conditions. The value of Q would go up each time you clicked a song. You can almost think of Q as the system's bank account. It can earn a bunch of coins and then see the balance grow.

Once the reward system is set up, it'll look for patterns to try to increase its Q-learning bank account. So if it sees a pattern where people who like one song might listen to another, it'll exploit that and make sure to put that song on your Discover Weekly. Reinforcement learning systems work best when your organization wants to do more than just cluster items that are frequently bought together. With reinforcement learning, your organization can build a system that thinks creatively about what your customer can discover.
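The bank account idea can be sketched as a small table of Q values in Python. This is only an illustration under stated assumptions: the state is the song you just played, the action is the song the system recommends next, and the learning rate `ALPHA`, the discount factor `GAMMA`, the song names, and the helper names `make_q_table`, `q_update`, and `best_action` are all hypothetical. The update line is the standard tabular Q-learning rule; it is not a description of how any particular music service works.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value for this sketch)
GAMMA = 0.9  # discount factor for future rewards (assumed value)


def make_q_table():
    # Every (state, action) pair starts with a Q of zero.
    return defaultdict(lambda: defaultdict(float))


def q_update(q, state, action, reward, next_state, alpha=ALPHA, gamma=GAMMA):
    """Apply one tabular Q-learning update and return the new Q value."""
    # Best value the system currently expects from the next state (0 if unseen).
    best_next = max(q[next_state].values(), default=0.0)
    target = reward + gamma * best_next
    # Move the old estimate part of the way toward the target.
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


def best_action(q, state, actions):
    # Exploit what has been learned: pick the action with the highest Q.
    return max(actions, key=lambda a: q[state][a])


q = make_q_table()
# After "song_a", recommending "song_b" earned 4 coins (a click plus three minutes).
print(q_update(q, "song_a", "song_b", reward=4, next_state="song_b"))  # 2.0
# The same recommendation pays off again, so its Q value keeps growing.
print(q_update(q, "song_a", "song_b", reward=4, next_state="song_b"))  # 3.0
# Recommending "song_c" was ignored, so it earned nothing.
print(q_update(q, "song_a", "song_c", reward=0, next_state="song_c"))  # 0.0
print(best_action(q, "song_a", ["song_b", "song_c"]))  # song_b
```

Because "song_b" has the higher Q value after "song_a", it is the one this sketch would put on the playlist.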
