AI Research SuperCluster is Meta’s New AI Supercomputer

Meta has just announced that they have created an AI supercomputer which they believe will become the world's fastest.

Developing the next generation of advanced AI will require powerful new computers capable of quintillions of operations per second. Today, Meta is announcing that we’ve designed and built the AI Research SuperCluster (RSC) — which we believe is among the fastest AI supercomputers running today and will be the fastest AI supercomputer in the world when it’s fully built out in mid-2022. Our researchers have already started using RSC to train large models in natural language processing (NLP) and computer vision for research, with the aim of one day training models with trillions of parameters.

What is under the hood of this massive machine?

AI supercomputers are built by combining multiple GPUs into compute nodes, which are then connected by a high-performance network fabric to allow fast communication between those GPUs. RSC today comprises a total of 760 NVIDIA DGX A100 systems as its compute nodes, for a total of 6,080 GPUs — with each A100 GPU being more powerful than the V100 used in our previous system. The GPUs communicate via an NVIDIA Quantum 200 Gb/s InfiniBand two-level Clos fabric that has no oversubscription. RSC’s storage tier has 175 petabytes of Pure Storage FlashArray, 46 petabytes of cache storage in Penguin Computing Altus systems, and 10 petabytes of Pure Storage FlashBlade.

It’s interesting to see that Meta opted for components from vendors vs. building their own. Hyperscalers typically build their own hardware to maximize efficiencies. I am certainly going to keep an eye on this project.

View original