Machine learning has been a hot topic for years now and has led to technological breakthroughs in many industries.
Modern advancements in fraud protection, product recommendations in e-retail, transportation efficiency, and improved medical diagnoses are a few of the many ways machine learning software is making a difference in our lives today.
However, there is actually more than one type of machine learning, along with a variety of algorithms and specific ways to apply them. In this guide, we’ll break down two of the most common types – supervised and unsupervised learning – and discuss their differences using some fun visuals and real-world situations.
Before diving into the nitty gritty of how supervised and unsupervised learning works, let’s first compare and contrast their differences.
And that’s really the gist of it. Unsupervised learning is much less structured, so it can find hidden patterns within the data, whereas with supervised learning, we want the model to meet our desired expectations with high accuracy.
When it comes to supervised learning, there is a “ground truth,” which basically means we already know what the output values should be.
Ground truths are real-world facts we take as known. For example, dogs are dogs and cats are cats. This may be an oversimplification, but it’s important to note because we were taught this at some point in our lives, and machines need to be taught as well. That is the purpose of supervised learning.
In our example, we’ll see how a machine can be trained to discern dogs from cats.
Below, we can see how labels are applied to training data to provide some context for the machine learning algorithm.
The machine now has a basic “idea” of what a cat is based on the data and labels it was provided. Now, it’s time to validate the model to see how accurate it is.
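To make validation concrete, here’s a toy sketch in Python. The animals and predictions below are invented for illustration; the idea is simply to compare the model’s predictions against the known labels (the ground truth) and compute an accuracy score.

```python
# A minimal sketch of model validation: compare predictions against
# known labels ("ground truth") and compute accuracy.
# These labels and predictions are made up for illustration.

ground_truth = ["cat", "cat", "dog", "dog", "cat", "dog"]
predictions  = ["cat", "dog", "dog", "dog", "cat", "dog"]

correct = sum(1 for truth, pred in zip(ground_truth, predictions) if truth == pred)
accuracy = correct / len(ground_truth)
print(f"Validation accuracy: {accuracy:.0%}")  # 5 of 6 predictions match
```

If the accuracy falls short of expectations, that’s the signal to go back, improve the training data or labels, and try again.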
Keep in mind, the machine doesn’t exactly know it’s divvying up cats from dogs; it just knows what it has learned thus far. This is why training data and proper labels are so important. If the data is noisy or incorrectly labeled, it can hurt the quality of the model.
The more time and effort put into supervised learning, the more accurate the results will be. It’s unlikely the model will be spot-on the first time through, so it’s up to the person behind the model to keep refining it.
There are many methods of supervised learning, but two of the most commonly used today are classification and regression. The example above, discerning dogs from cats, is classification.
Classification: The target variable consists of discrete categories. For example, a spam filter classifies emails based on learned characteristics of spam. This is binary classification, meaning each email is assigned one of two values: spam (1) or not spam (0).
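As a rough sketch of what a binary classifier does, here’s a deliberately simplified spam check in Python. The keyword list stands in for patterns a real model would learn from labeled training emails; both the keywords and the emails are assumptions for the example.

```python
# A toy binary classifier: output 1 (spam) or 0 (not spam) based on
# keywords that stand in for patterns learned from labeled examples.
# The keyword set and sample emails are invented for illustration.

SPAM_WORDS = {"winner", "free", "prize", "urgent"}

def classify(email: str) -> int:
    words = set(email.lower().split())
    # Binary output: 1 = spam, 0 = not spam
    return 1 if words & SPAM_WORDS else 0

print(classify("You are a winner claim your free prize"))  # 1 (spam)
print(classify("Meeting moved to 3pm tomorrow"))           # 0 (not spam)
```

A real spam filter learns its signals from data rather than a hand-written list, but the shape of the output is the same: one of two classes.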
Regression: The target variable is continuous. For example, predicting the fluctuating price of a house or measuring the impact of a diet are regression problems.
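Here’s a minimal regression sketch: fitting a line to house sizes and prices with ordinary least squares, then predicting a continuous price for a new house. The data points are invented for illustration.

```python
# A minimal regression sketch: fit a line to (size, price) pairs with
# ordinary least squares, then predict a continuous value.
# The data points below are invented for illustration.

sizes  = [100.0, 150.0, 200.0, 250.0]   # square meters
prices = [200.0, 290.0, 410.0, 500.0]   # thousands of dollars

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices)) \
        / sum((x - mean_x) ** 2 for x in sizes)
intercept = mean_y - slope * mean_x

predicted = slope * 175 + intercept
print(f"Predicted price for 175 m^2: {predicted:.1f}k")
```

Unlike classification, the output isn’t one of a fixed set of categories; it can land anywhere along a continuous range.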
Now, onto unsupervised learning.
Contrary to supervised learning, there is no such ground truth or “right answer” when it comes to unsupervised learning. Instead, the data is allowed to be in its raw, unlabeled state so the learning algorithm can attempt to find hidden patterns. This is the purpose of unsupervised learning.
In our example, we’ll see how a machine can learn to find patterns in unlabeled data.
From the example above, we can see that even without labels, the algorithm was able to sort the data based on the structures it identified. Though this may be obvious when it comes to land, water, and air animals, it could be much less obvious when dealing with massive datasets.
It’s worth noting that a lack of labels means it could be difficult to compare the performance of different unsupervised models.
There are also many methods of unsupervised learning, but two of the most commonly used today are clustering and anomaly detection.
Clustering: Takes a set of data points and divides it into groups based on similarities within the data. Clustering is particularly good for grouping; however, when deep segmentation is needed – like when dealing with customers – it may not be the optimal method.
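To show what clustering looks like in practice, here’s a toy k-means pass written in plain Python (a real project would use a library). The points, starting centers, and choice of two clusters are all assumptions for the example.

```python
# A toy clustering sketch: k-means on 2-D points in plain Python.
# The points, starting centers, and k=2 are assumptions for illustration.

def kmeans(points, centers, steps=10):
    for _ in range(steps):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: (p[0] - centers[i][0]) ** 2
                              + (p[1] - centers[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers)  # the two centers settle over the two groups of points
```

Notice that no labels were provided; the algorithm separated the two groups purely from the structure of the data.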
Anomaly detection: Many datasets will have an outlier or two. In large groups, outliers may be less significant, but in circumstances of fraud detection or equipment maintenance, detecting unusual activity can be very useful.
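A simple way to sketch anomaly detection is to flag values that sit far from the mean, say, more than two standard deviations away. The “transactions” below are invented for illustration; real fraud detection is far more involved.

```python
# A minimal anomaly-detection sketch: flag values more than two
# standard deviations from the mean. The transaction amounts are
# invented for illustration.

import statistics

transactions = [20, 25, 22, 19, 24, 21, 23, 500]  # 500 looks suspicious

mean = statistics.mean(transactions)
stdev = statistics.pstdev(transactions)

anomalies = [t for t in transactions if abs(t - mean) > 2 * stdev]
print(anomalies)
```

As with clustering, nothing was labeled in advance; the unusual value stands out purely because it deviates from the pattern in the rest of the data.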
Now that you’re able to point out the differences between supervised and unsupervised learning, it’s time to discover some more advanced types of machine learning.
Learn how reinforcement learning works, and how it was applied to beat one of the world's all-time greatest gamers in just 40 days.
Devin is a former Content Marketing Specialist at G2, who wrote about data, analytics, and digital marketing. Prior to G2, he helped scale early-stage startups out of Chicago's booming tech scene. Outside of work, he enjoys watching his beloved Cubs, playing baseball, and gaming. (he/him/his)