Other researchers have since picked up on our work in GANs, using them to produce stunning high-resolution images. But GANs are notorious for being difficult to tune and for often failing to converge, so FAIR has explored ways to make them more reliable by working to understand adversarial training at the theoretical level. In 2017, we introduced the Wasserstein GAN (WGAN) method, which proposed a way to make the discriminator “smooth,” and therefore better able to tell the generator how to improve its samples. The WGAN was essentially the first GAN whose convergence was robust across a wide range of applications. It removes the need to carefully balance the training of the discriminator and the generator as the system is optimized, which results in significantly more stable learning, particularly for high-resolution image generation tasks.
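To make the idea concrete, here is a minimal sketch of one WGAN training step with weight clipping, the mechanism the original paper used to keep the discriminator (often called a critic) “smooth.” This is generic PyTorch-style code under assumed names; `critic`, `generator`, and the optimizers are placeholders, not FAIR's implementation.

```python
# Minimal WGAN sketch: the critic is kept Lipschitz ("smooth") by clipping its
# weights, so its score gives the generator a useful gradient everywhere.
# `critic` and `generator` are placeholder nn.Module objects, not FAIR's code.
import torch

def critic_step(critic, generator, real, opt_critic, clip=0.01, z_dim=100):
    """One critic update: maximize critic(real) - critic(fake)."""
    z = torch.randn(real.size(0), z_dim)
    fake = generator(z).detach()          # don't backprop into the generator here
    loss = -(critic(real).mean() - critic(fake).mean())
    opt_critic.zero_grad()
    loss.backward()
    opt_critic.step()
    for p in critic.parameters():         # weight clipping enforces smoothness
        p.data.clamp_(-clip, clip)
    return loss.item()

def generator_step(critic, generator, opt_gen, batch_size=64, z_dim=100):
    """One generator update: maximize the critic's score on generated samples."""
    z = torch.randn(batch_size, z_dim)
    loss = -critic(generator(z)).mean()
    opt_gen.zero_grad()
    loss.backward()
    opt_gen.step()
    return loss.item()
```

Because the critic's score stays informative even when the generator is far behind, the critic can be updated many times per generator step without destabilizing training, which is where much of the robustness comes from.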
Since then, FAIR researchers and Facebook engineers have used adversarial training methods for a range of applications, including long-term video prediction and the creation of fashion pieces. But the really interesting part of GANs is what they mean for the future. As a brand-new technique that wasn’t available to us even a few years ago, GANs open up new opportunities to generate data in areas where we have little of it. They will likely be a key tool in our quest to build machines that can learn on their own.
Text classification that scales
Text understanding isn’t a single task but a sprawling matrix of subtasks that organize words, phrases, and entire data sets of language into a format that machines can process. But before much of that work can take place, the text itself has to be classified. Years ago, NLP models such as word2vec required extensive, word-based training, with the model assigning a distinct vector to each word in its training data set. For Facebook, the status quo was simply too slow and too dependent on fully supervised data. We needed text classification that could eventually work with hundreds or even thousands of languages, many of which don’t lend themselves to extensive data sets. And the system needed to scale across the entire range of text-based features and services, as well as our NLP research.
So in 2016 FAIR built fastText, a framework for rapid text classification and learning word representations that takes into account the larger morphology of the words it classifies. In a paper published in 2017, FAIR proposed a model that assigns vectors to “subword units” (e.g., sequences of 3 to 6 characters) rather than only to whole words, allowing the system to create representations for words that didn’t appear in its training data. The end result was a model whose classifications can scale to billions of words, generalizing to novel words it was never trained on while also training significantly faster than typical deep learning classifiers. In some cases, training that had taken several days with previous models was finished in a few seconds with fastText.
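The subword idea can be sketched in a few lines. The code below is illustrative only and assumes padded character n-grams of 3 to 6 characters as described in the published paper; the function names are not part of the fastText API, and the real library also hashes n-grams into a fixed-size table and keeps a whole-word vector when one exists.

```python
# Illustrative sketch of fastText-style subword vectors (not the library's API).
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word, padded with boundary markers < and >."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def word_vector(word, ngram_vectors, dim=100):
    """Average the vectors of a word's character n-grams.

    This yields a usable representation even for words never seen in
    training, as long as some of their n-grams were seen.
    """
    vecs = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```

An unseen word such as “wherever” shares n-grams like “<whe” and “ever>” with words that did appear in training, so it still receives a sensible vector instead of being mapped to a generic unknown-word token.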
FastText has proved to be a vital contribution to the study and application of AI-based language understanding, and it’s now available in 157 languages. The original paper has been cited more than a thousand times in other publications, and fastText remains one of the most commonly used baselines for word embedding systems. Outside of Facebook, fastText has been used for a diverse array of applications, ranging from the familiar, such as suggesting message replies, to the exotic — an “algorithmic theater” production called The Great Outdoors, which used fastText to help select and order public internet comments that would become the script for each performance. The framework is deployed at Facebook to classify text across 19 languages, and it’s used in tandem with DeepText for translation and natural language understanding.
Cutting-edge translation research
Fast, accurate, and flexible translation is a key component of helping people around the world communicate. So, in the early days of FAIR, we set out to find a new approach that would outperform statistical machine translation, which was then the state-of-the-art method. It took three years of work to build a CNN-based neural machine translation (NMT) architecture with the right combination of speed, accuracy, and learning ability. (FAIR published a paper in 2017 detailing this work.) In our experiments, the approach resulted in a 9x increase in speed over RNNs while maintaining state-of-the-art accuracy.
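The speedup comes from replacing the sequential recurrence of an RNN with stacked convolutions, so every position in the source sentence can be processed at once. Below is a simplified, illustrative encoder block in that style, using gated linear units and residual connections; it is a sketch under those assumptions, not the fairseq implementation, and omits attention, embeddings, and positional information.

```python
# Simplified gated convolutional encoder block in the style of convolutional
# seq2seq models (illustrative only; not the fairseq implementation).
import torch.nn as nn
import torch.nn.functional as F

class GatedConvBlock(nn.Module):
    """One convolution with a gated linear unit (GLU) and a residual connection.

    Unlike an RNN step, the convolution sees every source position at once,
    so an entire sentence can be encoded in parallel on a GPU.
    """
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        # Produce 2 * channels so that half the outputs can gate the other half.
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x):                  # x: (batch, channels, source_length)
        out = F.glu(self.conv(x), dim=1)   # gated linear unit halves the channels
        return out + x                     # residual connection

# Stacking blocks widens each position's view of the source sentence,
# replacing the sequential recurrence of an RNN encoder.
encoder = nn.Sequential(*[GatedConvBlock(256) for _ in range(4)])
```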