Last year we introduced Amazon EC2 UltraClusters of P4d instances, which put more than 4,000 NVIDIA A100 GPUs on a petabit-scale, non-blocking network, and we made them available to anyone with a model to train and a problem to solve. The feedback from customers has been great, and it reinforced our drive to keep pushing the boundaries of our imagination as we work with them to solve hard problems.
In recent months, we’ve been busy making our Elastic Fabric Adapter (EFA) a mainstream technology in newer EC2 instance families, so customers can truly optimize compute environments for their HPC codes by choosing from instance families with widely different characteristics. This led us to launch the new Intel Ice Lake-based M6i and C6i instances – the C6i offers up to 15% better price-performance than C5, and comes with EFA support out of the box. We also launched the Habana-based DL1 instance, which offers up to 40% better price-performance for training deep learning models, and which also supports EFA.
EFA has been an important enabler for us, and along with the Nitro System, is accelerating the creation of this broad selection of instance families. It’s also let us balance our efforts between seeking performance from hardware advances, and productivity driven by software improvements. We’re conscious that HPC is a tool used by humans, and the productivity of those humans is the real measure of success.
All of this brings us to today’s announcements, from each of these two important areas.
Introducing Hpc6a
While existing customers have loved our range of HPC offerings, we know they’re also focused on lowering costs, and cost matters more for some workloads than for others.
So today we’re excited to announce the upcoming availability of a new HPC-optimized EC2 Hpc6a instance, with the best price-performance for running compute-intensive HPC workloads in Amazon EC2. The Hpc6a uses AMD’s 3rd generation EPYC (Milan) processors, and offers up to 65% better price-performance than comparable x86-based compute-optimized instances. And of course, it comes with 100 Gb/s EFA to run MPI applications at scale.
There’s been a lot of engineering work in Hpc6a to satisfy a broad range of HPC workloads. We’ll have more on that, including detailed specs, pricing, and regional availability, at launch.
NICE EnginFrame
We’re also happy to announce the upcoming availability of NICE EnginFrame with support for hybrid environments. EnginFrame customers will be able to manage their HPC workflows across both on-premises and cloud environments through a single, unified interface.
Customers have been telling us they want to maximize the returns from their existing investments in on-premises systems. For some time, EnginFrame has been helping to make HPC systems easier to use and has become a powerful productivity lever for many scientific and engineering organizations, whether they’re running on-premises or in the cloud.
As one part of this approach, EnginFrame integrates tightly with NICE DCV, our high-performance remote display protocol. DCV provides customers with a secure way to deliver remote desktops and application streaming from any server (on-premises or in the cloud) to any device, over varying network conditions. Companies as diverse as Volkswagen and Netflix use DCV to power their workforces, and it’s a powerful reminder that making HPC easier to use is crucial to applying it to a greater range of problems.
But we wanted the elasticity of the cloud to be tightly woven into EnginFrame, and we think EnginFrame’s productivity benefits should flow through to infrastructure provisioning, too – not just the application pathways. So we’ve been hard at work this year enhancing EnginFrame’s cloud capabilities, building on the new cluster API layer we launched with AWS ParallelCluster 3 in September.
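To give a sense of how these pieces fit together, here’s a minimal sketch of a ParallelCluster 3 cluster configuration with EFA enabled on a compute queue – the subnet IDs, key name, instance counts, and instance choices below are placeholders for illustration, not a recommended setup:

```yaml
# Minimal AWS ParallelCluster 3 configuration sketch (values are placeholders)
Region: us-east-1
Image:
  Os: alinux2
HeadNode:
  InstanceType: c5.xlarge
  Networking:
    SubnetId: subnet-01234567890abcdef   # placeholder subnet
  Ssh:
    KeyName: my-key                      # placeholder key pair
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: compute
      ComputeResources:
        - Name: c6i-efa
          InstanceType: c6i.32xlarge     # an EFA-capable instance size
          MinCount: 0                    # scale from zero...
          MaxCount: 16                   # ...up to a placeholder cap
          Efa:
            Enabled: true                # turn on EFA for MPI traffic
      Networking:
        SubnetIds:
          - subnet-01234567890abcdef     # placeholder subnet
```

With a file like this, a cluster can be created from the pcluster CLI (for example, `pcluster create-cluster --cluster-name demo --cluster-configuration cluster.yaml`), which drives the same cluster API layer mentioned above.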
Read the full blog to learn more about the upcoming Hpc6a instance and EnginFrame hybrid support.
Reminder: You can learn a lot from AWS HPC engineers by subscribing to the HPC Tech Short YouTube channel, and by following the AWS HPC Blog.