ALPINE/zfp Addresses Analysis, Visualization, and Data Reduction Needs for Exascale Science Applications

Oct. 10, 2023 — With the advent of the exascale supercomputing era, computational scientists can run simulations at higher resolutions, add more detailed physical phenomena, increase the size of the physical problems, and couple multiple codes spanning both physical and temporal scales. These exascale simulations generate ever-increasing amounts of data. The Data and Visualization efforts in the US Department of Energy’s (DOE’s) Exascale Computing Project (ECP) provide an ecosystem of capabilities for data management, analysis, lossy compression, and visualization that enables scientists to extract insight from these simulations while minimizing the amount of data that must be written to long-term storage (Figure 1).

Figure 1. The ECP Data and Visualization products form an integrated workflow.

Big data is the conceptual link between the separate ALPINE and zfp projects that comprise the joint ALPINE/zfp ECP effort. The ALPINE project focuses on both post hoc and in situ infrastructures. The in situ approach delivers visualization, data analysis, and data reduction capabilities while the simulation is running. This approach takes the human out of the loop and can potentially shift much of the analysis and visualization work from post hoc to in situ. Over the course of the ECP, ALPINE has also supported the development of a range of analysis algorithms, which often have data reduction as a goal or a side benefit. The zfp project addresses the mismatch between compute and I/O capabilities through floating-point compression algorithms.
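The difference between the two approaches can be sketched with a toy example. The code below is only an illustration under stated assumptions: `step`, `summarize`, and `run` are hypothetical names and are not part of ALPINE, Ascent, or Catalyst. It runs the same small simulation twice, once keeping every full field for later (post hoc) analysis and once reducing each field to three statistics while the simulation runs (in situ).

```python
# Toy illustration only: `step`, `summarize`, and `run` are hypothetical names,
# not part of ALPINE, Ascent, Catalyst, or any other ECP product.

def step(field):
    """Advance a 1D field by one explicit diffusion step; end values stay fixed."""
    new = field[:]
    for i in range(1, len(field) - 1):
        new[i] = field[i] + 0.25 * (field[i - 1] - 2.0 * field[i] + field[i + 1])
    return new


def summarize(field):
    """Reduce a full field to three numbers: minimum, maximum, and mean."""
    return (min(field), max(field), sum(field) / len(field))


def run(n_cells, n_steps, in_situ):
    """Run the toy simulation and return what would be written to storage."""
    field = [0.0] * n_cells
    field[n_cells // 2] = 1.0  # initial spike in the middle of the domain
    stored = []
    for _ in range(n_steps):
        field = step(field)
        # In situ: analyze now and keep only the reduced result.
        # Post hoc: keep the whole field and analyze it later.
        stored.append(summarize(field) if in_situ else field[:])
    return stored


post_hoc = run(n_cells=64, n_steps=10, in_situ=False)
in_situ = run(n_cells=64, n_steps=10, in_situ=True)

values_post_hoc = sum(len(f) for f in post_hoc)  # 10 steps * 64 cells
values_in_situ = sum(len(s) for s in in_situ)    # 10 steps * 3 statistics
print(values_post_hoc, values_in_situ)
```

In this sketch both runs yield the same statistics, but the in situ run stores 30 values where the post hoc run stores 640. The trade-off is that the in situ run can no longer answer questions that were not anticipated before the simulation started.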

James Ahrens (Figure 2), project lead of ALPINE, L3 for the ECP Data and Visualization portfolio, and staff scientist at Los Alamos National Laboratory (LANL), noted, “The purpose of ALPINE is to provide insight from massive data through general yet exascale-capable visualization and analysis algorithms.” Peter Lindstrom, computer scientist and project leader in the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory (LLNL) and lead developer of zfp, noted that “zfp reflects a change in mindset in compression. The idea is to throw out the least significant floating point data bits via lossy compression acting in accord with user-defined error bounds. Further, let the application directly use the compressed data by making the algorithms both fast and amenable to hardware acceleration.”

Figure 2. James Ahrens (left) and Peter Lindstrom (right).
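Lindstrom's description of discarding low-order bits within a user-defined error bound can be illustrated with a much simpler technique than the one zfp uses. The sketch below is plain uniform scalar quantization, not zfp's algorithm or API, and `power_of_two_step`, `quantize`, and `reconstruct` are hypothetical helper names. It assumes the user supplies an absolute error tolerance and rounds each value to a multiple of a power-of-two step, so that the reconstruction error never exceeds that tolerance.

```python
import math

# Illustrative sketch only: this is uniform scalar quantization, NOT zfp's
# algorithm or API. All function names here are hypothetical.

def power_of_two_step(tolerance):
    """Return the largest power of two that is <= 2 * tolerance."""
    # frexp gives 2 * tolerance = mantissa * 2**exponent with 0.5 <= mantissa < 1,
    # so 2**(exponent - 1) is the largest power of two not exceeding 2 * tolerance.
    _, exponent = math.frexp(2.0 * tolerance)
    return math.ldexp(1.0, exponent - 1)


def quantize(values, tolerance):
    """Map each float to a small integer code; low-order detail is discarded."""
    step = power_of_two_step(tolerance)
    return [round(v / step) for v in values], step


def reconstruct(codes, step):
    """Recover approximate floats from the integer codes."""
    return [c * step for c in codes]


tolerance = 1e-3  # user-defined absolute error bound (assumed for this example)
values = [math.sin(0.1 * i) for i in range(100)]

codes, step = quantize(values, tolerance)
approx = reconstruct(codes, step)
max_error = max(abs(v - a) for v, a in zip(values, approx))

print(step, max_error <= tolerance)
```

Because the step is a power of two, the scaling in both directions is exact in binary floating point, and each reconstructed value differs from the original by at most half a step. The integer codes span a far smaller range than the original 64-bit values and could be stored in fewer bits. That storage saving is the only point of contact with the quote, and the sketch does not provide the throughput or random access that zfp is designed for.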

These two projects, with ECP funding, address key software development needs for exascale science applications:

ALPINE
- Deliver exascale visualization and analysis algorithms that will be critical for ECP applications as the dominant analysis paradigm shifts from post hoc (postprocessing) to in situ (processing data in a code as it is generated).
- Deliver exascale-capable infrastructure for the development of in situ algorithms and deployment into existing applications, libraries, and tools.
zfp
- Deliver lossy compression through zfp—an open-source library for compressed floating point arrays that supports very high throughput read and write random access.
- Engage with ECP science applications to integrate zfp capabilities into their software, including variable-rate CUDA compression support, a HIP backend, support for 4D arrays, and new C and Python APIs for interacting with zfp’s C++ compressed-array classes.

The ECP Software Technology (ST) focus area is designed to support a diverse ecosystem of software products in a cohesive and collaborative software stack that emphasizes interoperability and sustainability for high-performance computing (HPC) and national security applications. Both ALPINE and zfp capabilities are part of this ECP ST focus area.

These components are available through the Extreme-scale Scientific Software Stack (E4S), so they can be deployed as binaries or built in a consistent system environment on most computer systems, from laptops to supercomputers.[1] Furthermore, the E4S software distribution is tested regularly on a variety of platforms, from Linux clusters to leadership platforms, to ensure performance and correct operation.[2]

ALPINE infrastructures include ParaView and VisIt for post hoc visualization and Catalyst and Ascent for in situ use cases. ParaView, Catalyst, and VisIt represent long-term investments by DOE, whereas Ascent is a new lightweight infrastructure developed under the ECP. Ascent supports a diverse set of simulations on many-core architectures. It provides a streamlined interface, has minimal external dependencies, and minimizes the resource impact on host simulations.

These infrastructures integrate with co-design libraries, I/O capabilities, and compression capabilities, whereas zfp is made available through I/O technologies and other ECP libraries. External groups have adopted E4S software components from scratch, relying only on documentation, tutorials, and their own internal resources. The uptake of Ascent demonstrates the success of the ALPINE development effort under the ECP. Such successes also indicate that Ascent can be used successfully by projects with short delivery times and limited budgets, as well as by projects in which collaborative efforts are discouraged or forbidden.

Source: ECP