How the Open Compute Project revolutionized the open hardware community

Ten years ago, data centers (and the hardware inside them) were treated as proprietary information. It was unheard of for companies to invest in homegrown hardware. Most companies bought their servers and racks and other components from vendors, and they came the way they came, encased in metal and unable to be upgraded or customized. If something didn’t work for your system, you went and bought another closed box.

Meanwhile, at Meta (then called Facebook), hundreds of millions of people were using our services. We needed to rethink our infrastructure to support the increasing number of people who used the Facebook app to share photos, videos, and messages with friends and family around the world. That meant expanding beyond some servers in a leased space. We needed an actual data center. We were interested in building innovation, efficiency, and speed into our infrastructure DNA to enable operational simplicity as we continued to grow. The only way to do this was to build our own hardware.

We knew we wanted this new hardware to be energy efficient but also able to scale. Meta engineers spent two years rethinking every piece of hardware that goes into a data center, no matter how big or small. They came up with a new server design to meet this goal, but they also needed to innovate to develop the technology that supported it. They designed new energy-efficient power supplies and racks, and an evaporative cooling system to keep the equipment from overheating.

In 2011, as we inched closer toward celebrating one billion people on the Facebook app, we opened our first data center, in Prineville, Oregon. Thanks to the new designs and hardware advancements, Prineville was 38 percent more energy efficient to build than our previous facilities. Our design required less electricity, and the state-of-the-art cooling system kept the hardware from overheating, a common concern in data centers of the past. In the cooler months, heat from the servers was harnessed and used to warm office buildings. Because we optimized for power efficiency throughout, we delivered an industry first: a data center with a power usage effectiveness (PUE) of 1.07.

PUE is the ratio of the total power entering a data center to the power used to run the computing infrastructure within it, so the lower the PUE, the larger the share of electricity that goes to actual computing. When we started designing the Prineville data center, the prevailing industry average was 1.9, meaning that roughly half of the energy was lost to overhead such as cooling and power conversion. In contrast, our Prineville data center operated with no more than 7 percent of energy wasted.
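To make the PUE arithmetic above concrete, here is a minimal Python sketch. The function names and the 1,000 kW IT load are illustrative assumptions, not figures from Meta's facilities; only the PUE values 1.9 and 1.07 come from the text.

```python
def overhead_fraction(pue: float) -> float:
    """Share of total facility power that does not reach the IT equipment.

    PUE = total facility power / IT equipment power, so the overhead
    share of the total is (PUE - 1) / PUE.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return (pue - 1.0) / pue


def facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Total power the facility draws to deliver a given IT load."""
    return it_load_kw * pue


if __name__ == "__main__":
    # Hypothetical IT load, chosen only to make the comparison readable.
    it_load_kw = 1000.0
    for label, pue in [("industry average", 1.9), ("Prineville", 1.07)]:
        share = overhead_fraction(pue) * 100
        total = facility_power_kw(it_load_kw, pue)
        print(f"{label}: PUE {pue} -> {share:.1f}% overhead, {total:.0f} kW total")
```

Under these assumptions, a PUE of 1.9 puts about 47 percent of incoming power into overhead, while 1.07 puts about 6.5 percent there. That matches the "roughly half" and "no more than 7 percent" figures quoted above.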

But we didn’t want to be one more closed box. We’ve always been active members of the open source community, where we’ve seen that sharing our work and collaborating with others has helped us, and the industry as a whole, innovate faster. We wanted to take that concept and apply it to hardware.

We shared the Prineville data center designs with the public and, along with Andy Bechtolsheim, Goldman Sachs, Intel, and Rackspace, launched the Open Compute Project (OCP) and invited other members to collaborate and build on our progress. Those designs became our first contribution to the OCP. Today, most large data centers have achieved PUE below 1.2.

For the past 10 years, OCP has transformed the hardware industry in much the same way that the open source community has transformed the software space. We now have members across a variety of industries, including technology, telecommunications, and retail. As we’ve grown, OCP has also expanded the types of projects it takes on. Today, we have projects that include cloud storage, network disaggregation, infrastructure software, timing appliances, and even disaggregation of chips to create an open chiplet-based ecosystem.

Building infrastructure to serve billions of people

Over the past decade, OCP has had a significant impact on the industry’s transition from operating as closed, siloed systems to becoming disaggregated, open systems. That means we can easily replace hardware or software when better technology becomes available — enabling us to improve efficiency in computing, storage, and networking. This has also allowed others to build data centers following the blueprints contributed to the OCP.

This seismic shift that began with the launch of OCP in 2011 accelerated in 2013 when we launched the Networking Project, which created a disaggregated network switch that made it easier to scale data center technologies and modify the software that runs on them. Our networking hardware followed a similar trajectory. We transformed proprietary devices into open systems and decoupled networking hardware and software, a move that marked one of OCP’s biggest impacts on the data center industry.

From there, we kept building. In 2014, we announced Wedge, a top-of-rack switch, and FBOSS, the Linux-based software that runs on it. Wedge was the first open and disaggregated top-of-rack switch in the industry. With it, we gained a greater level of visibility, automation, and control in network operations while freeing up our engineers to focus more on bringing new capabilities to the network.

We continued to build on Wedge and FBOSS to develop Backpack and Minipack, modular platforms that enabled us to modify any part of a system without hardware and software interruptions. We also launched Yosemite, an open source modular chassis, which included processor modules that provided our infrastructure with capacity on demand.

Our partnerships within the industry grew along with the OCP community. One example was our work with Microsoft on designs for the server racks that populate data centers.

This effort culminated in our contribution of the Open Rack frame to OCP in 2019. Together, we designed a server rack that efficiently distributed power, stayed cool, and was interoperable to enable efficiency and operational flexibility.

Over time, our workloads continued to evolve to serve the needs of billions of people around the globe. To support artificial intelligence (AI) and machine learning demands, we again looked at OCP to explore a standard form for AI accelerator modules to increase flexibility around hardware accelerators that supported deep learning. In 2019, we shared our designs via the release of the OCP Accelerator Module to the OCP community.

Last year, we announced a new timekeeping service based on Network Time Protocol to improve accuracy from 10 milliseconds to 100 microseconds. Continuing our philosophy of open collaboration, we recently launched OCP’s Time Appliances Project to provide even more precise timing. Our first contribution here is a new time appliance, including the Time Card, a card that can turn almost any server into a time appliance. Meta continues to share and build standard and open ecosystems in various areas of data center infrastructure via Open Compute Project to promote greater collaboration and faster innovation in the field.

The future of OCP

Over the last decade, we’ve helped create an open source hardware community and reshaped networking infrastructure, management, and power usage. But we are not resting on what we have achieved thus far. At this year’s OCP Summit, we announced two new milestones for our data centers: We’ve partnered with Arista, Broadcom, and Cisco to develop next-generation data center network hardware, and we’ve migrated our network hardware to the OCP-standard Switch Abstraction Interface (SAI) API, which will allow our FBOSS team to move more quickly and reliably with our vendors and manufacturers.

As we look at the next decade, we want to build an open and standardized approach to solving even more of the big problems that our industry is facing collectively, including:

Creating open chiplet-based solutions: The Moore’s law era of transistor doubling has tapered off, and the industry is looking at chiplets to continue achieving the performance gains of the past. We are focused on building an open and standard die-to-die interface that anyone in the industry can use to develop innovative solutions that large data center operators, including Meta, can easily integrate within their data centers. This work is being pursued in the Open Domain Specific Architecture workgroup.

Reaching net zero emissions by 2030: We are dedicating the next decade to sustainability progress that will allow the entire industry to achieve its goals. Earlier this year, Meta launched the Sustainability and Circularity workgroup within OCP, which aims to tackle the challenge of building circular data centers with zero carbon emissions.

With OCP, we are collaborating to address challenges that are far too large and interconnected for one company to solve alone. Come join us in driving these OCP initiatives.

Written by:
Mark Roenigk, Director, Infrastructure
Omar Baldonado, Director, Engineering
Dharmesh Jani, Open Hardware Ecosystem Lead
