Recognizing blue cheetah-prints & other uncommon attributes
To help shoppers find exactly what they’re looking for, it’s important that product recognition systems excel at recognizing specific product characteristics — also known as attributes. But there are thousands of possibilities, and each one can apply to a range of categories. For example, you can have blue skirts, blue pants, blue cars, or even a blue sky. The most accurate AI systems today learn these attributes from labeled examples, an approach known as supervised learning, but with near-infinite possibilities, this is not scalable. Even just 1,000 objects and 1,000 attributes would mean manually labeling a million combinations. Some combinations also occur far more frequently in data than others. For example, there might be many blue cars, but few blue cheetah-print clothing items.
How can we make our systems work even on rare occurrences?
We built a new model that learns from some attribute-object pairs and adapts to entirely new, uncommon combinations. So, if you train on blue skirts, blue cars, and blue skies, you’d still be able to recognize blue pants even if your model never saw them during training. We built a new compositional framework on top of our previous foundational research that uses deep learning to achieve state-of-the-art image recognition. This approach uses “weakly supervised learning,” where the model learns from hashtags associated with 78M public Instagram images rather than relying entirely on manually labeled examples. Notably, we added a new compositional module that makes it possible to predict combinations of objects and attributes that aren’t in the labeled example set. Each object can be modified with many attributes, increasing the fine-grained space of classes by several orders of magnitude. This means we can scale to millions of images and hundreds of thousands of fine-grained class labels in ways that were not possible before. And we can quickly spin up predictions for new verticals to cover the range of products in our Facebook catalog, or even recognize those blue cheetah-print clothing items should we ever come across them.
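The core idea of a compositional module like this can be sketched in a few lines: instead of training one classifier per (attribute, object) pair, learn separate embeddings for attributes and objects, and a composition function that maps any pair to a classifier on the fly. The names, dimensions, and single linear composition layer below are illustrative stand-ins, not the actual model architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary; in practice these embeddings would be learned during training.
objects = ["skirt", "car", "sky", "pants"]
attributes = ["blue", "red"]
emb_dim, feat_dim = 8, 16

obj_emb = {o: rng.normal(size=emb_dim) for o in objects}
attr_emb = {a: rng.normal(size=emb_dim) for a in attributes}

# Composition network: maps an (attribute, object) pair of embeddings to a
# classifier vector in image-feature space. One linear layer keeps it simple.
W = rng.normal(size=(feat_dim, 2 * emb_dim)) * 0.1

def compose(attr, obj):
    """Build a classifier for a pair, even one never seen during training."""
    pair = np.concatenate([attr_emb[attr], obj_emb[obj]])
    return W @ pair

def score(image_feature, attr, obj):
    """Cosine similarity between an image feature and the composed classifier."""
    c = compose(attr, obj)
    return float(image_feature @ c
                 / (np.linalg.norm(image_feature) * np.linalg.norm(c)))

# "blue pants" may never appear as a labeled pair, but its classifier can
# still be composed from the "blue" and "pants" primitives.
img = rng.normal(size=feat_dim)
blue_pants_score = score(img, "blue", "pants")
```

Because the pair classifier is assembled from shared primitives, every labeled example of “blue” anything improves recognition of “blue” on objects it was never paired with.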
While collecting the training data for these models, we sampled objects and attributes from geographies around the world. This helps us reduce the potential for bias in recognizing concepts like “wedding dress,” which is often white in Western cultures but is likely to be red in South Asian cultures, for instance. As part of our ongoing efforts to improve the algorithmic fairness of the models we build, we trained and evaluated our AI models across subgroups, including 15 countries and four age buckets. By continuously collecting annotations for these subgroups, we can better evaluate and flag cases where a model recognizes some attributes unevenly — for example, identifying the neckline (V-neck, square, crew, etc.) on shirts more accurately for women than for men because there wasn’t enough training data of men wearing V-neck shirts. Although the AI field is just beginning to understand the challenges of fairness in AI, we’re continuously working to understand and improve the way our products work for everyone across the world.
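The subgroup evaluation described above amounts to breaking accuracy out per group rather than reporting a single aggregate number. A minimal sketch, with hypothetical evaluation records (the subgroup keys and attribute labels are made up for illustration):

```python
from collections import defaultdict

def per_group_accuracy(records):
    """Accuracy broken out by subgroup (e.g., country or age bucket),
    to flag attributes the model recognizes less reliably for some groups."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {g: correct[g] / total[g] for g in total}

# Hypothetical records: (subgroup, predicted attribute, true attribute).
records = [
    ("group_a", "v-neck", "v-neck"),
    ("group_a", "crew", "v-neck"),
    ("group_b", "v-neck", "v-neck"),
    ("group_b", "v-neck", "v-neck"),
]
gaps = per_group_accuracy(records)  # → {'group_a': 0.5, 'group_b': 1.0}
```

A large gap between groups, as in this toy example, is the signal to collect more annotations for the underperforming subgroup.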
This model is now live on Marketplace, and as a next step, we’re exploring and deploying these models to strengthen AI-assisted tagging and product matches across our apps. We’re also working on using this technique to make search more flexible, supporting queries like: “Find a scarf with the same pattern and material as this skirt.”