At Connectivity Lab, our mission is to connect the unconnected and underserved in the world. Ten percent of the world's population lives in areas where connectivity is simply not available; connecting these often remote and rural areas will require the development of new wireless communication technologies and platforms.
Defining the specifications of the technologies that we are developing first requires accurate information about how people are distributed in these areas. For example, short-range access networks such as Wi-Fi hotspots are suitable for people living close together, while cellular technologies are better for regions where people live farther apart, in isolated houses. Additionally, knowing how communities are located in relation to one another is important for planning backhaul networks (the links to the internet backbone). Villages lined up along a river or road could be connected by a string of terrestrial point-to-point links, while scattered settlements might require an aerial backhaul solution such as unmanned aerial vehicles or satellites.
Whatever technological solution will eventually be used to connect these people, accurate knowledge about the population distribution is at the core of its development. Creating a data set with high spatial resolution for some of the countries that could benefit from better internet connectivity is a large undertaking. Aggregate population counts on the spatial scale of provinces or districts are known from population censuses but are insufficient on their own, as these areas vary in geographical size and the counts do not provide insight into population distributions on a granular level.
We solved this challenge by applying techniques from computer vision to DigitalGlobe high-resolution satellite imagery. We identified human-built structures, such as buildings or other infrastructure, and used those locations as a proxy for where people live. We then combined our results with existing census counts and created a population data set with 5-meter resolution for 20 countries. While recognizing structures in aerial imagery is a popular task in computer vision, scaling it to a global level came with additional difficulty. Aside from processing billions of images, finding buildings with high fidelity in rural areas is really a needle-in-a-haystack problem: Typically, more than 99 percent of the landmass we analyze does not contain any human-made structure, which makes it hard for machine learning algorithms to learn from such an unbalanced data set.
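To make the idea of combining detected structures with census counts concrete, here is a minimal sketch. It assumes the simplest possible rule, which is not necessarily the one we used: the head count of one census unit is split evenly across the grid cells where a structure was detected. The function name `distribute_population`, the toy mask, and the uniform fallback for units without detections are all hypothetical and chosen for illustration only.

```python
def distribute_population(census_count, building_mask):
    """Spread one census unit's head count over its grid cells.

    building_mask is a non-empty 2D list of 0/1 flags, with 1 marking a
    cell where a structure was detected. Returns a 2D list of floats of
    the same shape whose entries sum to census_count.
    """
    built_cells = sum(sum(row) for row in building_mask)
    if built_cells == 0:
        # Assumption: with no detections, fall back to a uniform spread.
        n_cells = sum(len(row) for row in building_mask)
        return [[census_count / n_cells for _ in row] for row in building_mask]
    per_cell = census_count / built_cells
    # Cells without a detected structure receive zero population.
    return [[per_cell * flag for flag in row] for row in building_mask]


# Toy census unit: 1,200 people, four cells with detected structures.
mask = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
grid = distribute_population(1200, mask)
```

In this toy case each of the four built-up cells is assigned a quarter of the census count and every other cell is assigned zero, so the district total is preserved while the spatial detail comes from the imagery.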
For the computer vision analysis, we used a combination of three image-processing steps. First, we used a conventional image-processing procedure to preselect candidate areas that potentially contained human-made structures, discarding images with vast bodies of desert, forest, and water. Next, we invoked Facebook's image-recognition engine, which is based on a deep convolutional neural network that provides a fixed-dimensional feature embedding for all images, and found that, with minor modifications, we could use the engine trained on ordinary photos to efficiently detect whether a satellite image contained a building. Finally, we developed a weakly supervised neural network with an architecture tailored for this particular problem. By using a binary labeling scheme (the image does/does not contain a building), the neural network learned “what” and “where” simultaneously. It succeeded in identifying outlines of buildings and highlighted those for which it had high confidence while suppressing areas not likely to contain human-made structures.
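The sketch below illustrates how a single binary label can yield both “what” and “where.” It assumes one common weak-supervision design, which may differ from our tailored architecture: the network produces a per-cell score map, and the image-level prediction is the maximum of that map. Training against the image label then pushes high scores toward the cells that actually contain a building. The score values here are hand-written for illustration rather than produced by a trained network, and the names `image_score` and `localize` are hypothetical.

```python
import math


def sigmoid(x):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def image_score(score_map):
    """'What': probability that the image contains a building.

    Uses global max pooling, so the image is positive if any cell is.
    """
    return sigmoid(max(max(row) for row in score_map))


def localize(score_map, threshold=0.5):
    """'Where': flag the cells whose own probability clears the threshold."""
    return [[1 if sigmoid(s) >= threshold else 0 for s in row]
            for row in score_map]


# Hand-written scores for a 3x3 tile: two adjacent cells look like a building.
score_map = [
    [-4.0, -3.5, -4.2],
    [-3.9, 2.5, 1.8],
    [-4.1, -3.7, -4.0],
]
prob = image_score(score_map)
building_cells = localize(score_map)
```

Only the image-level value `prob` would be compared with the binary label during training; the per-cell map that `localize` thresholds comes along as a by-product, which is what makes the supervision “weak.”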