Preparing for Aurora: Porting a Computational Chemistry Code to Exascale Architectures

July 8, 2021

July 8, 2021 — As part of a series aimed at sharing best practices in preparing applications for the Aurora supercomputer, ALCF is highlighting researchers’ efforts to optimize codes to run efficiently on graphics processing units.

Bringing computational chemistry into the exascale era

The NWChemEx project, when realized, has the potential to accelerate the development of next-generation batteries, drive the design of new functional materials, and advance the simulation of combustive chemical processes, in addition to addressing a wealth of other pressing challenges at the forefront of molecular modeling.

As the original NWChem code is some quarter-century old, the NWChemEx developers decided to rewrite the application from the ground up, with the ultimate goal of providing the framework for a next-generation molecular modeling package capable of enabling chemistry research on a variety of leading-edge high-performance computing (HPC) systems. Prominent among these systems will be the forthcoming Aurora supercomputer, an exascale Intel-HPE machine to be housed at the Argonne Leadership Computing Facility (ALCF), a U.S. Department of Energy (DOE) Office of Science User Facility located at Argonne National Laboratory.

Support from sponsors including DOE’s Exascale Computing Project (ECP) and the ALCF’s Aurora Early Science Program (ESP) provided the opportunity to restructure core functionality—including the elimination of longstanding bottlenecks associated with the generally successful NWChem code—concurrent with the production of sophisticated physics models intended to leverage the computing power promised by the exascale era. In accordance with this strategy, the developers have adopted the Aurora-supported DPC++ programming model.

From a design point of view, the development team gives equal weight and consideration to physics models, architecture, and software structure in order to fully harness large-scale HPC systems. To this end, NWChemEx incorporates numerous modern C++ software-engineering techniques, while GPU compatibility and support have been planned since the project’s initial stages, orienting the code to the demands of exascale by design.

In order to overcome prior communication-related bottlenecks, the developers have localized communication to the greatest possible extent.

Maximal flexibility

The developers are pursuing multiple approaches to achieve broad compatibility. At the core of their work is NVIDIA-based development using the CUDA model. Part of the impetus for this was access to GPUs consonant with the team’s experience, maximizing their chances for an efficient and effective development process while accelerating the path to milestone successes.

Today the NWChemEx project encompasses programming models such as CUDA, HIP, and DPC++ in order to target various hardware accelerators. Moreover, the portability of DPC++ potentially positions it as a programming model for future architectures. DPC++ gives developers explicit control over memory management and the scheduling of data transfers between host and device. The NWChemEx project uses the newly introduced Unified Shared Memory (USM) feature from the SYCL 2020 standard, which enables developers to work with pointers rather than the traditional buffers and accessors. Work is in progress to transition existing DPC++ code to other SYCL 2020 features.
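A minimal sketch of the USM style described above is shown here, assuming a hypothetical double-precision array and the standard SYCL 2020 runtime; it is illustrative only and not taken from the NWChemEx source.

#include <sycl/sycl.hpp>
#include <vector>

int main() {
    sycl::queue q;  // default device selection

    constexpr size_t n = 1 << 20;
    std::vector<double> host_data(n, 1.0);

    // USM device allocation: a raw pointer rather than a buffer/accessor pair.
    double* dev = sycl::malloc_device<double>(n, q);

    // Explicit, developer-controlled transfer from host to device.
    q.memcpy(dev, host_data.data(), n * sizeof(double)).wait();

    // Device kernel operating directly on the USM pointer.
    q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
        dev[i[0]] *= 2.0;
    }).wait();

    // Explicit transfer back to the host.
    q.memcpy(host_data.data(), dev, n * sizeof(double)).wait();

    sycl::free(dev, q);
    return 0;
}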

Tracking code performance

The transition to GPUs has fundamentally altered the ways in which the developers think about structuring data. Combined with their decision to rewrite the code from the ground up, this has enabled greater creativity and more opportunities to optimize NWChemEx’s ability to run seamlessly on both CPUs and GPUs. While GPUs have opened up new scales of computing power, they still have limits and finite memory; this invites a division of labor.

To help localize communication and thereby reduce related bottlenecks, NWChemEx is being designed such that CPUs handle communication protocols as well as other non-intensive components (that is, algorithms dominated by conditional logic). Anything else that is “embarrassingly parallel” or computationally expensive is to be processed on the GPUs.
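A minimal sketch of this division of labor follows, assuming a hypothetical screen_tile routine and SYCL unified shared memory; the cheap conditional screening stays on the host while the bulk arithmetic is submitted to the device. It is not drawn from the NWChemEx source.

#include <sycl/sycl.hpp>
#include <vector>

// Hypothetical, cheap, branch-heavy decision kept on the CPU.
bool screen_tile(double estimate, double threshold) {
    return estimate > threshold;
}

int main() {
    sycl::queue q;
    constexpr size_t tile = 1 << 16;
    std::vector<double> estimates = {1e-3, 5.0, 2.0, 1e-9};

    double* work = sycl::malloc_shared<double>(tile, q);

    for (double est : estimates) {
        // Host side: conditional logic (and, in a real code, communication).
        if (!screen_tile(est, 1e-6)) continue;

        // Device side: the embarrassingly parallel, compute-heavy part.
        q.parallel_for(sycl::range<1>{tile}, [=](sycl::id<1> i) {
            work[i[0]] = est * static_cast<double>(i[0]);
        }).wait();
    }

    sycl::free(work, q);
    return 0;
}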

In order to understand the degree to which the application is utilizing experimental hardware, the developers implement a multitiered analysis for tracking code performance.

As a first step, the developers regularly perform roofline analysis to determine the disposition and dependencies of their algorithms: Are they compute-bound? memory-bound? both?
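For reference, the standard roofline model bounds attainable performance by a kernel’s arithmetic intensity together with the machine’s peak compute rate and memory bandwidth; the symbols below are generic and not taken from the team’s own analysis.

\[
P_{\mathrm{attainable}} = \min\bigl(P_{\mathrm{peak}},\; I \cdot B_{\mathrm{mem}}\bigr),
\qquad
I = \frac{\text{floating-point operations}}{\text{bytes moved}}
\]

A kernel whose intensity \(I\) falls below the machine balance \(P_{\mathrm{peak}}/B_{\mathrm{mem}}\) is memory-bound; above it, compute-bound.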

Second, the developers run the computations on the relevant experimental hardware and compare the results against theoretical performance capabilities, which identifies precisely how efficiently the processors are being utilized. Finally, the developers conduct a postmortem analysis to pinpoint the origin of errors and establish how much improvement can theoretically be expected.
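One simple way to express that utilization, assuming the generic roofline quantities introduced above, is the ratio of measured to attainable performance:

\[
\eta = \frac{P_{\mathrm{measured}}}{P_{\mathrm{attainable}}}
\]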

Intel’s compatibility tool

For Intel hardware, the developers employ Intel’s DPC++ Compatibility Tool to translate existing optimized CUDA code to DPC++. The tool is sophisticated enough to reliably determine the appropriate syntax when translating abstractions from CUDA to SYCL, greatly reducing the developers’ burden. After translation, the developers fine-tune the DPC++ code to remove any redundancies or inelegancies introduced by automation.

The most crucial aspect of the Compatibility Tool is that it translates entire projects, not just individual source files or specific functions, on a timescale ranging from minutes to hours depending on complexity.

This two-step process of automated translation followed by manual fine-tuning generates, from legacy CUDA code, performant DPC++ code that specifically targets Intel architectures.
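As a purely illustrative sketch of the kind of mapping involved, the toy kernel below shows a CUDA original alongside a hand-written DPC++ equivalent; it is an assumption for illustration, not actual Compatibility Tool output or NWChemEx code.

// CUDA original (toy example):
//
//   __global__ void scale(double* x, double a, int n) {
//       int i = blockIdx.x * blockDim.x + threadIdx.x;
//       if (i < n) x[i] *= a;
//   }
//   // launched as: scale<<<blocks, threads>>>(d_x, 2.0, n);

// Hand-written DPC++ equivalent of the same operation:

#include <sycl/sycl.hpp>

void scale(sycl::queue& q, double* x, double a, int n) {
    q.parallel_for(sycl::range<1>{static_cast<size_t>(n)},
                   [=](sycl::id<1> i) {
        x[i[0]] *= a;  // thread-index arithmetic becomes a SYCL work-item id
    });
}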

NWChemEx was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative.

Additional support for this project was provided by the National Energy Research Scientific Computing Center (NERSC) Exascale Science Applications Program (NESAP) and the Oak Ridge Leadership Computing Facility’s (OLCF) Center for Accelerated Application Readiness (CAAR), as well as by NVIDIA, Intel, and HPE. NERSC and the OLCF are U.S. Department of Energy Office of Science User Facilities.



Source: Nils Heinonen, Argonne Leadership Computing Facility
