Machine learning has been a hot topic for years now and has led to technological breakthroughs in many industries.
Modern advancements in fraud protection, product recommendations in e-retail, transportation efficiency, and improved medical diagnoses are a few of the many ways machine learning software is making a difference in our lives today.
However, there is actually more than one type of machine learning, along with a variety of algorithms and specific ways to apply them. In this guide, we’ll break down two of the most common types – supervised and unsupervised learning – and discuss their differences using some fun visuals and real-world situations.
Supervised vs unsupervised learning
Before diving into the nitty-gritty of how supervised and unsupervised learning work, let’s first compare the two.
Supervised learning:
Requires “training data,” or a sample dataset that will be used to train a model. This data must be labeled to provide context when it comes time for learning. We’ll explain the importance of training data later in this guide.
Requires desired output values. This helps the model reach what is believed to be the “correct” answer.
Requires a learning algorithm to map input values to the desired output values.
Requires validation data, which must also be labeled. Validation data is passed through the model to test its accuracy.
Unsupervised learning:
Requires input data with no particular output.
Uses no data labels, so the model gets no labeled examples for context.
Requires a learning algorithm to find naturally occurring patterns in the data.
And that’s really it when it comes to unsupervised learning. You can see it’s much less structured, which lets it find hidden patterns within the data, whereas in supervised learning, we want the model to meet the desired expectations with high accuracy.
How supervised learning works
When it comes to supervised learning, there is a “ground truth,” which basically means we already know what the output values should be.
Ground truths are real-world assumptions of what we know. For example, dogs are dogs and cats are cats. This may be an oversimplification, but it’s important to note because we were taught this at some point in our lives, and machines will need to be taught as well. That is the purpose of supervised learning.
In our example, we’ll see how a machine can be trained to discern dogs from cats.
Below, we can see how labels are applied to training data to provide some context for the machine learning algorithm.
The machine now has a basic “idea” of what a cat is based on the data and labels it was provided. Now, it’s time to validate the model to see how accurate it is.
Keep in mind, the machine doesn’t exactly know it’s divvying up cats from dogs; it just knows what it has learned thus far. This is why training data and proper labels are so important. If data is noisy or incorrectly labeled, it can affect the quality of machine learning.
Generally, the more time and effort put into supervised learning, the more accurate the results tend to be. It’s unlikely the model will be spot-on the first time through, so it’s up to the person behind the model to keep refining it.
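To make the train-then-validate loop a bit more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the two features (weight and ear length), the numbers themselves, and the choice of a simple “nearest centroid” method, which just averages the examples for each label and picks the closest average. It’s not how a real image classifier works, but it shows the same steps: learn from labeled training data, then check accuracy on labeled validation data.

```python
from math import dist

# Hypothetical labeled training data: (weight_kg, ear_length_cm) -> label
training_data = [
    ((4.0, 6.5), "cat"),
    ((3.5, 6.0), "cat"),
    ((5.0, 7.0), "cat"),
    ((20.0, 11.0), "dog"),
    ((30.0, 12.5), "dog"),
    ((25.0, 10.0), "dog"),
]

# Hypothetical labeled validation data the model has not seen before
validation_data = [
    ((4.5, 6.8), "cat"),
    ((28.0, 11.5), "dog"),
    ((6.0, 7.5), "cat"),
    ((18.0, 9.5), "dog"),
]


def train(examples):
    """Learn one centroid (average feature vector) per label."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the features."""
    return min(model, key=lambda label: dist(model[label], features))


def accuracy(model, examples):
    """Fraction of labeled examples the model gets right."""
    correct = sum(
        predict(model, features) == label for features, label in examples
    )
    return correct / len(examples)


model = train(training_data)
validation_accuracy = accuracy(model, validation_data)
print(f"Validation accuracy: {validation_accuracy:.0%}")
```

If the validation accuracy came back low, that would be the cue to go back and refine: add more training examples, fix bad labels, or try a different method.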
Common methods of supervised learning
There are many methods of supervised learning, but two of the most commonly used today are classification and regression. The example we used above, discerning dogs from cats, is considered classification.
Classification: The target variable consists of categories. For example, spam filters classify emails as spam or not spam based on learned elements of each. This particular case is binary, meaning each email is assigned one of two values, 0 or 1.
Regression: The target variable is continuous. For example, predicting the fluctuating price of a house or measuring the impact of a diet can be considered regression.
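Since the dogs-and-cats example covers classification, here is a small sketch of the regression side, using the house price idea. The sizes and prices are made up (and deliberately fall on a perfect straight line so the result is easy to check), and fitting a single straight line with ordinary least squares is just one simple way to do regression, assumed here for illustration.

```python
# Hypothetical training data: house size (square meters) -> price (thousands)
sizes = [50.0, 80.0, 100.0, 120.0, 150.0]
prices = [150.0, 240.0, 300.0, 360.0, 450.0]


def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sizes, prices)

# Predict a continuous value for a size that was not in the training data
predicted_price = slope * 90.0 + intercept
print(f"Predicted price for 90 square meters: {predicted_price:.1f}")
```

Notice the output is a number on a continuous scale rather than a category, which is the key difference from classification.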
Now, onto unsupervised learning.
How unsupervised learning works
Contrary to supervised learning, there is no such ground truth or “right answer” when it comes to unsupervised learning. Instead, the data is allowed to be in its raw, unlabeled state so the learning algorithm can attempt to find hidden patterns. This is the purpose of unsupervised learning.
In our example, we’ll see how a machine can learn to find patterns in unlabeled data.
From the example above, we can see that even without labels, the algorithm was able to sort the data based on the structures it identified. Though this may be obvious when it comes to land, water, and air animals, it could be much less obvious when dealing with massive datasets.
It’s worth noting that a lack of labels means it could be difficult to compare the performance of different unsupervised models.
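For a rough feel of how an algorithm can group data without any labels, here is a minimal sketch of k-means, one well-known clustering method. The points are invented for illustration, and the way the starting centroids are chosen (evenly spaced through the list) is a simplifying assumption to keep the result repeatable; real implementations usually start from random points.

```python
from math import dist

# Hypothetical unlabeled data: two made-up numeric features per animal
points = [
    (1.0, 1.2), (1.5, 0.8), (0.8, 1.0),
    (5.0, 5.2), (5.5, 4.8), (4.8, 5.1),
    (9.0, 1.0), (9.5, 1.4), (8.8, 0.7),
]


def k_means(data, k, iterations=10):
    """Group data into k clusters; returns one cluster index per point."""
    # Assumption: start from k points spaced evenly through the list
    step = len(data) // k
    centroids = [data[i * step] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid
        assignments = [
            min(range(k), key=lambda c: dist(point, centroids[c]))
            for point in data
        ]
        # Move each centroid to the mean of the points assigned to it
        for c in range(k):
            members = [p for p, a in zip(data, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(column) / len(members) for column in zip(*members)
                )
    return assignments


clusters = k_means(points, k=3)
print(clusters)
```

The cluster numbers that come out are arbitrary group IDs, not labels. The algorithm never learns that a group means “land” or “water”; it only finds that certain points sit close together.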
Common methods of unsupervised learning
There are also many methods of unsupervised learning, but two of the most commonly used today are clustering and anomaly detection.
Clustering: Takes a set of data points and divides it into groups based on similarities within the data. Clustering is particularly good for grouping. However, when deep segmentation is needed, such as when dealing with customers, it may not be the optimal method.
Anomaly detection: Many datasets will have an outlier or two. In large groups, outliers may be less significant, but in circumstances of fraud detection or equipment maintenance, detecting unusual activity can be very useful.
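Anomaly detection can be sketched in a few lines as well. The example below flags values that sit unusually far from the average, measured in standard deviations (often called a z-score). The transaction amounts and the cutoff of 2.0 are assumptions chosen for illustration, and this is only one simple approach among many.

```python
from statistics import mean, stdev

# Hypothetical transaction amounts; one is far larger than the rest
amounts = [42.0, 38.5, 45.0, 40.0, 41.5, 39.0, 43.0, 950.0, 44.0, 37.5]


def find_outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    center = mean(values)
    spread = stdev(values)
    return [v for v in values if abs(v - center) / spread > threshold]


outliers = find_outliers(amounts)
print(outliers)
```

One caveat with this sketch: the outlier itself pulls the average and standard deviation toward it, so on small datasets an extreme value can partly hide itself. Again, no labels are involved; nothing told the code which transaction was “fraud.”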
Time to learn more
Now that you’re able to point out the differences between supervised and unsupervised learning, it’s time to discover some more advanced types of machine learning.
Learn how reinforcement learning works, and how it was applied to beat one of the world's all-time greatest gamers in just 40 days.
Devin is a Content Marketing Specialist at G2 Crowd writing about data, analytics, and digital marketing. Prior to G2, he helped scale early-stage startups out of Chicago's booming tech scene. Outside of work, he enjoys watching his beloved Cubs, playing baseball, and gaming. (he/him/his)