Zero-shot learning: How Facebook teaches AI to learn more like people do

You may not have come across the term “zero-shot learning” before unless you’re interested in AI research, but, like all humans, you excel at it.

People typically find it easy to use past experiences to help them identify never-before-seen objects.

Someone who has never set eyes on a flamingo, for instance, would likely have no trouble identifying one, as long as they’d previously heard a description of it (“pink, with long, sticklike legs”) and had seen other birds.

But even the most advanced AI systems struggle to develop this sort of intuitive image understanding, a capability that researchers at the University of Montreal named zero-data learning in 2008 and that is now known as zero-shot learning (ZSL). AI systems typically learn only about the particular things that appear in the thousands or millions of examples that make up their training data. Most machines simply can’t extrapolate to recognize a flamingo, even after looking at a huge number of bird images, unless flamingo pictures were included in their training.

To teach an AI to identify birds it has never seen before — such as a flamingo — Facebook scientists trained the system to extract text-based features from written descriptions and use them to “imagine” what unseen species look like. This blend of text and image understanding is an example of zero-shot learning.

The AI research community, however, is focused on overcoming this reliance on exhaustive training examples so that computers will need less training to accomplish a wide variety of tasks, such as recognizing a flamingo with zero prior examples of that specific kind of bird (hence the name zero-shot learning).

What Facebook is doing in this research field:

Developing machines that learn with humanlike flexibility and efficiency is one of Facebook’s long-term research priorities. Techniques like ZSL help move the field in this direction and could lead to general improvements for a wide range of AI systems, making them faster and easier to train and better able to work with new information.

Recent work conducted by our AI researchers and engineers demonstrates how Facebook is helping to advance the state of the art in ZSL. For example, in a research project called generative adversarial zero-shot learning (GAZSL), the AI learns to recognize many bird species by looking at images. But the system can also read up on new, never-before-seen bird species by analyzing related web articles. Looking at only those text descriptions, without seeing an image of the species, the AI extracts key features, such as the color or shape of the bird’s head, tail, and wings. The system can then “imagine” what the species looks like, generating a synthetic visual model of the bird. The researchers found that, for identifying unseen classes of birds, this approach works better than previous state-of-the-art ZSL models, which lacked this ability to imagine a species from text.

The key to this approach is the use of contextual information instead of rote memorization. The GAZSL system has effectively memorized the look of many species of birds, but not all of them. Recognizing more birds requires the AI to combine that baseline visual understanding of what makes up a bird — wings, feathers, etc. — with an understanding of how text can describe images. This synthesis of image and text understanding doesn’t eliminate the need for training entirely, but it’s an example of how ZSL can reduce training and help systems stay on their figurative toes in the face of unfamiliar data.
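To make the idea concrete, here is a minimal sketch of the "imagine from text, then recognize" pattern described above. It is not the GAZSL model: a simple ridge regression stands in for the generative adversarial network, and every number, species attribute, and function name below is a made-up assumption for illustration. The sketch learns a mapping from text features to visual features using seen species only, uses that mapping to "imagine" a visual prototype for the unseen flamingo, and then classifies an image by finding the nearest prototype.

```python
import numpy as np

# Hypothetical text attributes per species: [is_pink, has_long_legs, is_small].
SEEN_TEXT = {
    "heron":     [0.0, 1.0, 0.0],
    "sparrow":   [0.0, 0.0, 1.0],
    "rosefinch": [1.0, 0.0, 1.0],
}
# Made-up 4-dimensional average "visual" features for the seen species.
SEEN_VISUAL = {
    "heron":     [0.0, 1.0, 0.0, 0.5],
    "sparrow":   [0.0, 0.0, 1.0, 0.0],
    "rosefinch": [1.0, 0.0, 1.0, 0.5],
}


def fit_text_to_visual(text, visual, ridge=1e-3):
    """Ridge regression: find W so that text @ W approximates visual."""
    gram = text.T @ text + ridge * np.eye(text.shape[1])
    return np.linalg.solve(gram, text.T @ visual)


def classify(image_features, prototypes):
    """Return the species whose prototype is closest to the image features."""
    names = list(prototypes)
    stacked = np.array([prototypes[n] for n in names])
    distances = np.linalg.norm(stacked - image_features, axis=1)
    return names[int(np.argmin(distances))]


# Train the text-to-visual mapping on seen species only.
names = list(SEEN_TEXT)
text = np.array([SEEN_TEXT[n] for n in names])
visual = np.array([SEEN_VISUAL[n] for n in names])
W = fit_text_to_visual(text, visual)

# Seen species keep their real prototypes; the flamingo prototype is
# "imagined" purely from its description (pink, long legs, not small).
prototypes = {n: np.array(SEEN_VISUAL[n]) for n in names}
flamingo_text = np.array([1.0, 1.0, 0.0])
prototypes["flamingo"] = flamingo_text @ W

# A made-up feature vector standing in for a real flamingo photo.
flamingo_image = np.array([1.05, 0.97, 0.02, 1.04])
prediction = classify(flamingo_image, prototypes)
print(prediction)
```

No flamingo image is used to fit the mapping, yet the flamingo photo lands closest to the prototype built from text alone. A real system would replace the hand-written attribute vectors with features extracted from articles and the linear map with a learned generator.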

Why it matters:

For AI to reach its full potential, we need more systems that are able to train and act on a wider variety of data, including information that they’ve never encountered. Though ZSL-based research is at an extremely early stage and has yet to be deployed in real-world applications, this approach could provide a way to pursue the long-term goal of more nimble AI through a kind of reverse engineering, applying human-inspired learning methods to machines. It’s even possible for ZSL systems to learn in ways that people can’t. For example, LASER, another Facebook AI research project that uses zero-shot techniques, can understand multiple dialects and languages by leveraging similarities between them that are too subtle for most humans to take advantage of.

As ZSL continues to show results, it could enable near-term benefits, such as better recommendations and more advanced safeguards that automatically flag bad content within categories a system hasn’t seen before. More broadly, this approach represents the possibility of a fundamental upgrade to the way we train and use many types of AI systems. From powering increasingly multilingual technology to training computer vision systems to better understand unfamiliar objects, over the long term, ZSL could help transition AI away from today’s narrowly capable tools and toward the kind of versatility that’s so effortless for humans.