High-performance computing (HPC) has played a major role in advancing scientific research for decades, using extremely large datasets and sophisticated modeling that mimics the physical world. The ability to complement the power and capabilities of HPC with artificial intelligence (AI) is now advancing rapidly, accelerating innovation and delivering faster outcomes.
I had the opportunity to talk with Radhika Rao, Senior Director of Data Center GPU Product Management at Intel, to discuss this AI transformation and its impact on the HPC landscape.
Q: With the rise of AI, what do HPC leaders and developers need to consider around AI? Why now?
Rao: In the last year alone, we’ve seen explosive growth in the use of AI across all industries. In HPC, we have reached a pivotal moment where we’re seeing a true convergence of HPC and AI. We have been talking about this convergence for a long time, but now we can see it happening. AI is helping advance models and codes in physics, weather, manufacturing, and many other areas.
What makes this so relevant now is that AI has become mainstream due to the popularity and wide-scale use of large language models such as ChatGPT and of generative AI. This trend makes it more important to view HPC and AI as a converged space to drive advances in science.
Q: What are the evolving requirements HPC leaders should consider when looking to invest in the next-gen environments for accelerating HPC and AI?
Rao: HPC workloads have traditionally had a very specific CPU-to-GPU ratio and compute profile. In the last couple of years, we’ve seen models change and become far more dynamic with increasing compute and scale requirements. As a result, the need for architecture flexibility to run these data-intensive workloads across heterogeneous environments has become critical, as well as the need for increased memory bandwidth and memory capacity. Another area to consider is sustainability requirements with respect to power and environmental impact. When building out large clusters to solve such problems as climate change and sustainability, you don’t want to be part of the problem. We must consider the sustainability footprint of the data center and new technology investments to ensure they are not creating a negative impact on the environment (see June article on Top Considerations for HPC, AI and Sustainability (hpcwire.com)).
Q: How has Intel’s portfolio advanced to address this HPC and AI convergence?
Rao: CPUs, and particularly Intel x86 processors, have been the backbone of HPC systems for decades. Now we are seeing powerful AI capabilities infused into every aspect of compute, including the HPC space. Intel’s CPUs are complemented by a variety of built-in and discrete accelerators and GPUs. For example, Intel® Advanced Matrix Extensions (AMX), built into the 4th Gen Intel® Xeon® Scalable processors, deliver 10x higher inference and training performance.1 The Intel® Data Center GPU Max Series delivers up to a 2x performance gain on HPC and AI workloads over the competition.2 Recent MLPerf AI inference results spotlight the Intel® Gaudi®2 accelerator as the only viable alternative on the market for dedicated AI compute needs. Additionally, Intel is the only vendor to have submitted public CPU results on 4th Gen Intel Xeon and Intel Xeon Max Series processors with industry-standard, deep-learning ecosystem software.
Our portfolio is supported by a full suite of AI and HPC software development tools. Developers have traditionally been required to use proprietary software to code and run AI and HPC models specific to each platform. With a new suite of open-source software, such as the Intel oneAPI toolkit, developers now have freedom of choice. They can program once and then run the code on different hardware, even shifting the underlying hardware mix over time to suit the needs of a particular workload. The oneAPI programming model supports Intel’s full hardware portfolio, as well as solutions from competitors.
Q: What are some examples of how Intel is working across the ecosystem on this HPC and AI convergence?
Rao: Technology adoption is the key to converging HPC and AI into one system to advance scientific research. One example is the work we are doing with the Aurora exascale supercomputer at the Argonne Leadership Computing Facility (ALCF), a Department of Energy Office of Science user facility at Argonne National Laboratory, together with Hewlett Packard Enterprise. Aurora, built on Intel® Max Series CPUs and GPUs, will offer researchers high computing speed and artificial intelligence capabilities to enable science that is not possible today. Earlier this year, Intel and Argonne National Laboratory announced the full Aurora specifications and efforts (with partners) to bring the power of generative AI and large language models (LLMs) to science and society.
Beyond Aurora, there is much work being done to bring HPC and AI together. Several of our software partners are using oneAPI on Intel hardware to bring AI into areas that are very specific to HPC use cases. One example is Ansys, which is combining the power of the Intel Max Series GPUs and 4th Gen Intel Xeon processors to add AI capabilities to its applications. We are also deeply engaged with the AI and HPC software ecosystem, optimizing popular developer tools like PyTorch and TensorFlow.
Q: What’s one piece of advice you have for HPC leaders looking to invest in AI?
Rao: When adding AI capabilities to an HPC environment, the last thing anyone wants is to incur added costs or delays because complex codes must be ported from one programming model to another. Intel has made significant investments in both the hardware and software needed to run, scale, and protect investments in modern HPC centers. The convergence of HPC and AI makes it even more important to adopt open standards, like oneAPI, so researchers can focus on delivering scientific breakthroughs faster and with greater precision.
One of the newest ways HPC technologists and developers can build, test and optimize AI and HPC applications is on the newly launched Intel® Developer Cloud. The Intel Developer Cloud provides developers access to the latest Intel HPC and AI technologies, including Intel Gaudi2 processors for deep learning, and the latest Intel hardware platforms, such as the 5th Gen Intel® Xeon® Scalable processors and Intel® Data Center GPU Max Series 1100 and 1550.
Learn more about how Intel’s HPC and AI portfolio is helping customers achieve outstanding results for demanding workloads and the complex problems they solve here.
1See [A16] and [A17] at intel.com/processorclaims: 4th Gen Intel® Xeon® Scalable processors. Results may vary.
2Visit intel.com/performanceindex (Events: Supercomputing 22) for workloads and configurations. Results may vary.