
Zero-Shot Learning: Unlocking New Possibilities in AI

December 17, 2024


Zero-shot learning brings out the intelligence in artificial intelligence by making it learn concepts without a lesson. 

Unlike traditional supervised learning, which trains models on vast amounts of labeled data pairing inputs with desired outputs, zero-shot learning allows models to generalize to categories they have never seen by drawing on auxiliary information, such as semantic descriptions and attributes, rather than labeled examples of those categories.

Zero-shot learning enables large language models (LLMs) to categorize information successfully without labeled datasets and frequent retraining. Businesses across sectors use these models for various tasks, including but not limited to translation, summarization, answering questions, content generation, and sentiment analysis.

How does zero-shot learning work? 

When humans learn in a zero-shot way, we integrate experience, emotion, context, and deep understanding to generalize information. Artificial intelligence (AI), by contrast, relies strictly on data and patterns, without personal experience or intuition. Zero-shot models bridge that gap by transferring knowledge from classes they have seen to classes they haven't, using shared semantic information.

Key components and techniques of zero-shot learning

  • Semantic embeddings and visual mappings: ZSL creates a shared space where known and unknown classes are represented. To do so, techniques like word embeddings (e.g., Word2Vec, GloVe) or visual features are used. These embeddings capture relationships between words, images, or attributes, allowing the model to predict unseen classes. Additionally, models like DeViSE align visual features with their corresponding semantic meanings.
  • Generative models: Generative models, like generative adversarial networks (GANs) and variational autoencoders (VAEs), create synthetic examples of unseen classes. By learning patterns from the classes it knows, the model can generate realistic data for classes it hasn't seen.
  • Attribute-based classification: Attributes are descriptive features (e.g., "furry," "four-legged") that help the model connect seen and unseen classes. These shared traits act like a bridge, allowing the model to classify new data based on previously learned attributes.
  • Transfer learning: This method speeds up model training and reduces the need for labeled data by applying knowledge from large datasets to new tasks. It uses domain adaptation to adjust knowledge from familiar tasks to new ones by aligning shared features. Alternatively, domain generalization exposes the model to diverse data, allowing it to handle new tasks without extra training.
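The attribute-based idea above can be sketched in a few lines. The animal names, the four attributes, and the "attribute predictor" output below are all invented for illustration; a real system would learn the attribute predictor from images of the seen classes.

```python
import numpy as np

# Attribute order: [furry, four-legged, striped, swims] (invented for illustration)
SEEN_CLASSES = {
    "horse": np.array([1.0, 1.0, 0.0, 0.0]),
    "tiger": np.array([1.0, 1.0, 1.0, 0.0]),
    "seal":  np.array([1.0, 0.0, 0.0, 1.0]),
}

# Never seen in training; only their attribute descriptions are known.
UNSEEN_CLASSES = {
    "zebra":   np.array([1.0, 1.0, 1.0, 0.0]),
    "penguin": np.array([0.0, 0.0, 1.0, 1.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(predicted_attributes, classes):
    # Pick the class whose attribute signature best matches the prediction.
    return max(classes, key=lambda name: cosine(predicted_attributes, classes[name]))

# Pretend an attribute predictor (trained only on seen classes) looked at a photo
# and scored it: very furry, four-legged, strongly striped, barely swims.
image_attrs = np.array([0.9, 0.8, 0.95, 0.05])
print(classify(image_attrs, UNSEEN_CLASSES))  # zebra
```

The shared attribute space is the bridge: "zebra" was never in training, yet its description alone is enough to claim the prediction.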

Here's a video that summarizes how ZSL works.

Source: IBM

Generalized zero-shot learning (GZSL)

Generalized zero-shot learning is a learning method that builds on zero-shot learning. Unlike ZSL, which focuses only on unseen classes, GZSL takes a more practical approach by requiring the model to handle both seen (trained) and unseen (new) classes at the same time.
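One common way to implement this balancing act, sometimes called calibrated stacking, is to subtract a small penalty from seen-class scores so unseen classes aren't drowned out by the model's bias toward familiar categories. The class names, scores, and penalty value below are illustrative, not from a real model.

```python
import numpy as np

def gzsl_predict(scores, seen_mask, gamma=0.3):
    """Pick a class index from seen and unseen classes jointly.

    scores:    one compatibility score per class (illustrative numbers)
    seen_mask: True where the class appeared in training
    gamma:     calibration penalty subtracted from seen-class scores
    """
    adjusted = scores - gamma * seen_mask.astype(float)
    return int(np.argmax(adjusted))

class_names = ["horse", "tiger", "zebra"]   # zebra is the unseen class
scores = np.array([0.55, 0.70, 0.60])
seen = np.array([True, True, False])

print(class_names[gzsl_predict(scores, seen, gamma=0.0)])  # tiger (biased toward seen)
print(class_names[gzsl_predict(scores, seen, gamma=0.3)])  # zebra (calibrated)
```

With no penalty the model defaults to a familiar class; the calibration term gives the unseen class a fair chance without retraining anything.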

Zero-shot vs. few-shot vs. one-shot learning 

Zero-shot, one-shot, and few-shot learning are all techniques that help machine learning models predict new classes with minimal or no labeled data.

Zero-shot learning involves training machine learning models to recognize new classes without any labeled data. Instead of relying on labeled examples, these models utilize their existing knowledge and semantic similarities to make informed predictions. For instance, when identifying a koala it has never seen, a zero-shot learning model might combine what it knows about visually similar animals, such as bears, with descriptive attributes like "gray fur" and "tree-dwelling" to make a reasonable prediction.

In one-shot learning, machine learning algorithms are trained to classify objects using a single example of each class. For example, a one-shot learning scenario in computer vision occurs when a deep learning model is presented with only one image and must quickly determine whether it is similar or different from a reference image. This approach allows models to make generalizations based on minimal data by focusing on similarities to make accurate predictions.

Few-shot learning expands on these principles by training AI models to generalize new data classes based on a few labeled samples per class. By considering a small number of examples, these models can make better, more accurate generalizations by extracting meaningful information from multiple instances. This method provides more training data, allowing the model to understand a data class better.
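The difference between the three settings is easy to see in how an LLM prompt is assembled: zero-shot gives only the instruction, one-shot adds a single worked example, and few-shot adds several. The sentiment-analysis task and reviews below are made up for illustration.

```python
def build_prompt(task, examples, query):
    """Assemble a prompt with 0 (zero-shot), 1 (one-shot), or more (few-shot) examples."""
    parts = [task]
    for text, label in examples:
        parts.append(f"Review: {text}\nSentiment: {label}")
    parts.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(parts)

task = "Classify the sentiment of each review as positive or negative."
query = "The battery died within a week."

zero_shot = build_prompt(task, [], query)
one_shot = build_prompt(task, [("Great sound quality!", "positive")], query)
few_shot = build_prompt(task, [("Great sound quality!", "positive"),
                               ("Arrived broken.", "negative")], query)

print(zero_shot)
```

The model and task never change; only the number of labeled examples in the prompt does, which is exactly the zero/one/few-shot distinction.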

Comparison of zero-shot vs. few-shot vs. one-shot learning. Source: DataCamp

How does zero-shot learning help large language models? 

Zero-shot learning enables large language models, like ChatGPT and Gemini, to perform tasks they haven’t been explicitly trained on. These models can tackle new tasks based on instructions provided through natural language prompting. 

As LLMs are exposed to vast amounts of data, they build rich internal representations of language, concepts, and tasks. This broad knowledge lets them scale and adapt to new functions without retraining each time. 

For example, you can ask an LLM about a niche topic, and it will pull from its broad knowledge base to generate relevant content based on underlying attributes, even if it hasn’t been specifically trained on that topic. 

Applications of zero-shot models

There are many ways to use zero-shot learning to complete AI tasks. Let's look at a few. 

Computer vision 

Similar to the example of recognizing an image of a koala without ever having seen one, zero-shot learning allows AI models to analyze pictures of new objects and identify them correctly. 

Rather than relying on vast training data for each new object, zero-shot learning allows models to understand and categorize new, unseen objects by connecting the information they already know with the new information they encounter. 

Natural language processing (NLP)

NLP is a significant application of zero-shot learning, as it allows models to predict words or phrases they haven’t encountered previously based on semantic similarities with known words. 

This capability is crucial for enterprises using chatbots or virtual assistants since it equips the models to handle new queries and provide quality customer service. 

Suppose a business trains a chatbot to handle questions about refunds and lost packages. If a new customer asks about a stolen package and a refund, the chatbot can use its knowledge of refunds and lost packages to provide a relevant answer.
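As a toy sketch of that scenario, intent routing can be reduced to matching a query against plain-language intent descriptions. The word-overlap scoring below stands in for the learned sentence embeddings a production chatbot would use, and the intent names and queries are invented.

```python
def tokens(text):
    # Crude normalization: lowercase and strip the punctuation used in the demo.
    return set(text.lower().replace(",", "").replace("?", "").split())

def match_intent(query, intent_descriptions):
    """Route a query to the intent whose description overlaps it most.

    Jaccard word overlap stands in for learned sentence embeddings; the
    zero-shot idea, matching against descriptions instead of labeled
    training queries, is the same.
    """
    q = tokens(query)

    def score(intent):
        d = tokens(intent_descriptions[intent])
        return len(q & d) / len(q | d)

    return max(intent_descriptions, key=score)

intents = {
    "refund": "customer wants a refund for their money back",
    "lost_package": "customer reports a lost or missing package delivery",
}

print(match_intent("I never received my package, it may be stolen", intents))  # lost_package
print(match_intent("Can I have my money back?", intents))                      # refund
```

A "stolen package" query was never in any training set, yet it routes to the closest known intent purely through shared vocabulary, which is the behavior the paragraph above describes.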

Medical diagnostics

Zero-shot learning shows excellent potential in medical diagnostics and healthcare. It can help identify diseases or conditions that weren't part of the training data. For instance, a model trained on data for one disease can predict new variants of that disease that were not included during training.

Autonomous vehicles

Autonomous vehicles must accurately perceive their surroundings and make reliable decisions. Zero-shot learning allows these vehicles to handle new obstacles or situations they haven't faced before, promoting safer and more dependable driving.

For example, a vehicle with zero-shot learning can recognize and avoid unexpected hazards like construction zones or debris, even without prior training, improving safety and performance.

Advantages of zero-shot learning 

Zero-shot learning offers some compelling advantages, including the following.

It doesn't require extensive amounts of labeled data 

Traditional supervised learning models require large labeled datasets to perform new tasks and recognize objects. On the other hand, zero-shot learning relies on descriptive attributes and features to identify new classes of information. It makes machine learning models more accessible to those without extensive training datasets or the time to collect and label them. 

Kelwin Fernandes, CEO of NILG.AI, said that the lack of data needed to train the AI models is one of the primary advantages of zero-shot learning. “It facilitates the adoption of AI systems even in scenarios where the target user has no data. For example, even if your company doesn't have any historical data about categorizing customer support tickets, as long as you can provide the names of the categories, it should be able to predict the right category for new tickets.”

It has scalability potential 

Zero-shot learning can scale efficiently to new areas, categories, and concepts without significant model retraining time. Suppose a business uses a model to assist with customer segment development. In that case, teams can share new descriptions for evolving customer segments over time, allowing the AI to iterate and improve to meet these needs.

It's cost-effective for small teams and researchers 

Since zero-shot learning minimizes the dependency on large datasets, it can help teams reduce the costs associated with data collection and annotation. This cost-effectiveness is particularly beneficial for research teams and small businesses that want to leverage AI solutions but lack the funding or resources to compile extensive labeled datasets. 

Limitations of zero-shot learning 

As with all forms of technology, zero-shot learning poses challenges worth considering before using these models. 

It might yield lower accuracy compared to other learning methods

Recall that zero-shot learning relies on descriptive attributes and features to classify new information. While it benefits from not requiring a large labeled dataset, trainers must supply comprehensive descriptions to support accurate predictions. Imprecise information can lead to misclassifications and categorization errors. 

According to Dmytro Shevchenko, a data scientist at Aimprosoft, zero-shot learning isn’t as effective for complex tasks that require context without extensive training, which can lead to accuracy issues. 

“Accurate results usually require training with multiple examples or fine-tuning. I can give an excellent example of medical image classification. ZSL may fail if a model needs to accurately classify medical images into rare diseases because it lacks specific knowledge. In this case, additional training or customization with examples is required,” Shevchenko said.

There are some bias and fairness concerns 

Zero-shot learning models can inherit biases present in the training data or in the auxiliary information they use to classify information. In other words, models can be biased toward the classes they've seen and may force unseen data into the seen classes. 

Researchers Akanksha Paul, Narayanan C. Krishnan, and Prateek Munjal have proposed a new method, Semantically Aligned Bias Reducing (SABR), to reduce bias in zero-shot learning and mitigate these effects. 

It doesn't work well for complex or niche tasks 

Zero-shot learning is best suited for simple tasks that require general knowledge. Models trained using these techniques may struggle with more complex tasks requiring specialized knowledge and domain expertise. In such cases, another training technique with more labeled data and examples may be necessary for the best results. 

Fernandes noted, “Although current models tend to work well in general domain tasks, they become less accurate if you go into very niche applications (e.g., industrial applications), and you may need to train/fine-tune your custom models.”

You get zero shots!

Zero-shot learning represents a significant step towards enabling machines to exhibit more human-like generalization and adaptability, albeit within the constraints of data-driven learning. 

Ultimately, zero-shot learning enables LLMs to handle tasks they weren’t explicitly taught or trained for. They rely on their existing knowledge and understanding of concepts and semantics to conduct simple tasks. 

While zero-shot learning is advantageous for its minimal data requirements, scalability potential, and cost-effectiveness, it isn't well suited to complex tasks and may yield lower accuracy. 

Don't have an in-house team of data scientists and ML developers? Try machine learning as a service (MLaaS) for model development and training.

