Aurora, one of the first three U.S. exascale supercomputers, has not had a straightforward path to installation and operation. The system has been repeatedly reconceptualized and rescheduled over the years, with delivery slipping from 2021 to 2022, then from 2022 to 2023. Now, at long last, there seems to be a light at the end of the tunnel: Intel has announced that all of Aurora’s 10,624 compute blades have been installed at Argonne National Laboratory.
The HPE-built Aurora supercomputer will deliver over two peak exaflops of computing power, an increase over the target it carried when it was slated for an earlier delivery date. (For comparison, the already-operational Frontier system at Oak Ridge National Laboratory currently has a peak rating around 1.68 exaflops.) Powering the exascale supercomputer: 10,624 nodes, each equipped with six Intel Data Center GPU Max series GPUs (née Ponte Vecchio) and two Intel Xeon CPU Max series CPUs (née Sapphire Rapids with HBM).
![](https://www.hpcwire.com/wp-content/uploads/2020/10/Aurora_System_v3_09-16-20_7560x3744_environment-3.jpg)
Intel also seemingly confirmed plans to debut Aurora on the November Top500 list, saying that “later this year, Aurora is expected to … achieve a theoretical peak performance of more than two exaflops … when it enters the Top500 list.” This will make for an exciting SC23 conference in Denver; the world’s only other publicly confirmed exascale system, Frontier, debuted across the pond at ISC 2022 in Hamburg, Germany.
At various points in Aurora’s timeline, each of the new Intel components served as a major bottleneck, with the company repeatedly pushing back delivery over the years. Perhaps most dramatically, Aurora was first outfitted with the non-HBM variant of the Sapphire Rapids CPU, after which those CPUs were removed and replaced with the HBM-equipped “Max series” variant.
Despite the tumultuous process, Intel’s release is unambiguous: the supercomputer, they say, is fully equipped with its 10,624 compute blades, comprising 63,744 Intel GPU Max series processors and 21,248 Intel Xeon CPU Max series processors. Those blades are spread across 166 racks of 64 blades each, which together occupy an area around the size of two basketball courts. Intel stressed the difficulty of the process in its statement, calling the installation “a delicate operation, with each 70-pound blade requiring specialized machinery to be vertically integrated into Aurora’s refrigerator-sized racks.”
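The stated totals follow directly from the per-blade configuration (six GPUs and two CPUs per blade, 64 blades per rack); a quick back-of-the-envelope check of the arithmetic:

```python
# Sanity check of Aurora's component counts, using the figures
# from Intel's announcement.
blades = 10_624          # total compute blades
gpus_per_blade = 6       # Intel Data Center GPU Max series per blade
cpus_per_blade = 2       # Intel Xeon CPU Max series per blade
blades_per_rack = 64

print(blades * gpus_per_blade)    # -> 63744 GPUs
print(blades * cpus_per_blade)    # -> 21248 CPUs
print(blades // blades_per_rack)  # -> 166 racks
```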
![](https://www.hpcwire.com/wp-content/uploads/2023/06/Intel-Argonne-installation-3.jpg-600x338.jpg)
“Aurora is the first deployment of Intel’s Max series GPU, the biggest Xeon Max CPU-based system, and the largest GPU cluster in the world,” said Jeff McVeigh, corporate vice president and general manager of the Super Compute Group at Intel. “We’re proud to be part of this historic system and excited for the ground-breaking AI, science, and engineering Aurora will enable.”
(The first part of that statement may not be strictly true, depending on how pedantic you want to be: Argonne’s testbed system for Aurora, Sunspot, has been running the same tech for some months now.)
Speaking of Sunspot, Intel also confirmed that users will soon begin to migrate their workloads from the testbed system to the full Aurora system. Work on Aurora will span countless domains, but Intel stressed three that seem to be dominating many supercomputer announcements of late: climate modeling, drug discovery and generative AI. Argonne, of course, has a strong background in applying supercomputing to climate and life sciences problems. At ISC 2023, the lab announced plans to use Aurora for generative AI tasks: specifically, a large language model for scientific computing called AuroraGPT, backed by a trillion parameters and trained on scientific data of various kinds.
“While we work toward acceptance testing, we’re going to be using Aurora to train some large-scale open source generative AI models for science,” said Rick Stevens, associate lab director at Argonne. “Aurora, with over 60,000 Intel Max GPUs, a very fast I/O system, and an all solid-state mass storage system, is the perfect environment to train these models.”
There remains a long road ahead for Aurora: “working toward acceptance testing” entails many additional steps, and acceptance testing itself can take a very long time for a system of this scale; Frontier wasn’t accepted until around seven months after it debuted.
That said: we can’t wait to see Aurora in action.
Header image: a member of the installation team delivers the last blade on a specialized trolley. Image courtesy of Argonne National Laboratory.