Oct. 14, 2024 — The 2024 Gordon Bell Prize (GBP) attracted a number of outstanding submissions, representing the accomplishments of teams from around the world, from which the Association for Computing Machinery (ACM) GBP Award Committee selected six finalists. Chester Gordon Bell is the namesake of the prestigious award that honors outstanding achievements in HPC each year.
The Gordon Bell Prize emphasizes innovation in applying HPC to applications in science, engineering, and large-scale data analytics, typically demonstrated using state-of-the-art technologies and the leading supercomputing platforms.
The award committee may take into account the importance of the problem tackled and the likely impact of the methods and optimizations introduced. It may make an award based on peak performance or on special achievements in scalability and time-to-solution on important science and engineering problems.
“With its focus on the innovative use of HPC to achieve important scientific results, the Gordon Bell Prize rewards outstanding accomplishments in computational science,” Barbara Chapman, Chair of the ACM Gordon Bell Prize committee, said. “By disseminating new algorithms, computational techniques, and programming methodologies, it also helps shape this field. It is an enormous privilege to participate in the selection of the winner, and indeed of all the finalists, since each of them has made an extraordinary contribution to science.”
The accomplishments described by the Gordon Bell finalists for 2024 represent the increasingly diverse landscape of HPC technologies and include exciting new applications of HPC along with creative approaches that help address computationally challenging problems on a scale not tackled before.
These efforts have improved the state of the art with respect to the use of HPC in problems from materials science, biochemistry, protein design, genomic studies, and LLM behavioral studies. To achieve this, the finalists have built and deployed end-to-end workflows, innovated in the use of adaptive mixed precision, designed new algorithms and greatly improved existing algorithms and techniques, and demonstrated the benefits for HPC of a platform designed for AI workloads.
The hardware systems used to perform this work include multiple notable world-class computers: Alps (CSCS, Switzerland), Aurora (ANL, USA), Cerebras WSE (Cerebras HQ, USA), Frontier (ORNL, USA), Leonardo (EuroHPC/Cineca, Italy), new Sunway System (Wuxi, China), PDX (NVIDIA, USA), Perlmutter (NERSC, USA), Quartz (LLNL, USA), and Summit (ORNL, USA).
This Year’s Six Finalists
The following briefly describes the work performed by this year’s Gordon Bell Prize finalists.
MProt-DPO: Breaking the ExaFLOPS Barrier for Multimodal Protein Design Workflows with Direct Preference Optimization
This novel work presents a scalable, multimodal workflow for protein design that trains an LLM to generate protein sequences, computationally evaluates the generated sequences, and then exploits them to fine-tune the model. Direct Preference Optimization steers the LLM toward the generation of preferred sequences, and enhanced workflow technology enables its efficient execution. A 3.5B and a 7B model demonstrate scalability and exceptional mixed precision performance of the full workflow on Alps, Aurora, Frontier, Leonardo and PDX.
Authors: Gautham Dharuman, Kyle Hippe, Alexander Brace, Sam Foreman, Väinö Hatanpää, Varuni K. Sastry, Huihuo Zheng, Logan Ward, Servesh Muralidharan, Archit Vasan, Bharat Kale, Carla M. Mann, Heng Ma, Yun-Hsuan Cheng, Yuliana Zamora, Shengchao Liu, Chaowei Xiao, Murali Emani, Tom Gibbs, Mahidhar Tatineni, Deepak Canchi, Jerome Mitchell, Koichi Yamada, Maria Garzaran, Michael E. Papka, Ian Foster, Rick Stevens, Anima Anandkumar, Venkatram Vishwanath, Arvind Ramanathan.
Affiliations: Argonne National Laboratory, California Institute of Technology, CINECA, NVIDIA Corp., University of California at Berkeley.
Pushing the Limit of Quantum Mechanical Simulation to the Raman Spectrum of a Protein of 100 Million Atoms
This team has designed and implemented a quantum-fragmentation-based algorithm to produce the QF-RAMAN code. By exploiting chemical locality, it is able to handle a biological system containing 100 million atoms, consisting of fragments of the SARS-CoV-2 Spike glycoprotein in water. The protein’s Raman spectrum is compared with experimental data. The performance data indicate nearly linear weak and strong scaling on two architecturally distinct supercomputers: the 24,000 GPUs of ORISE and the 37,440,000 cores of the new Sunway supercomputer.
Authors: Honghui Shang, Ying Liu, Zhikun Wu, Zhenchuan Chen, Jinfeng Liu, Meiyue Shao, Yingzhou Li, Bowen Kan, Huimin Cui, Xiaobing Feng, Yunquan Zhang, Donald G. Truhlar, Hong An, Xiao He, Jinlong Yang.
Affiliations: University of Science and Technology of China, Chinese Academy of Sciences, China Pharmaceutical University, University of Minnesota, East China Normal University.
Toward Capturing Genetic Epistasis From Multivariate Genome-Wide Association Studies Using Mixed-Precision Kernel Ridge Regression
This team developed an output accuracy-preserving method for exploiting low-precision data types in matrix computations and used it to create a highly efficient Cholesky-based solver. Their tile-centric adaptive-precision matrix operations and task-based execution enabled the largest-ever Genome-Wide Association Study (GWAS), covering 305K patients from a real data set and using a multivariate approach to identify genetic risk factors. The solver's outstanding scaling and very high mixed-precision performance were demonstrated on 8,100 GPUs of Alps, 36,100 GPUs of Frontier, 4,096 GPUs of Leonardo, and 18,432 GPUs of Summit.
Authors: Hatem Ltaief, Rabab Alomairy, Qinglei Cao, Jie Ren, Lotfi Slim, Thorsten Kurth, Benedikt Dorschner, Salim Bougouffa, Rached Abdelkhalak, David E. Keyes.
Affiliations: KAUST, Massachusetts Institute of Technology, Saint Louis University, NVIDIA Corp.
Breaking the Molecular Dynamics Timescale Barrier Using a Wafer-Scale System
This team has created an Embedded Atom Method (EAM)-based molecular dynamics code that exploits the ultra-fast communication and high memory bandwidth afforded by the 850,000-core Cerebras Wafer-Scale Engine. It attains perfect weak scaling across the full system for grain boundary problems involving copper, tungsten and tantalum atoms, and can extend to multiple wafers. For problems of up to 800,000 atoms, it calculates significantly more timesteps per second than EAM in LAMMPS on Quartz and Frontier, directly benefiting the modeling of phenomena that emerge at long timescales.
Authors: Kylee Santos, Stan Moore, Tomas Oppelstrup, Amirali Sharifian, Ilya Sharapov, Aidan Thompson, Delyan Z. Kalchev, Danny Perez, Robert Schreiber, Scott Pakin, Edgar A. Leon, James H. Laros III, Michael James, Sivasankaran Rajamanickam.
Affiliations: Cerebras Systems, Sandia National Laboratories, Lawrence Livermore National Laboratory, Los Alamos National Laboratory.
Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers
This work presents AxoNN, a scalable, portable, open-source framework for training and fine-tuning large language models (LLMs), and describes optimizations applied by the team that enable it to handle LLMs with hundreds of billions to trillions of parameters. They provide results of a study on the potential for the memorization of training data by large LLMs, using AxoNN on models with up to 405 billion parameters on Frontier. Their evaluations show the exceptional scaling and performance attained when training GPT-style transformer models with up to 640 billion parameters on Alps, Frontier and Perlmutter.
Authors: Siddharth Singh, Prajwal Singhania, Aditya Ranjan, John Kirchenbauer, Jonas Geiping, Yuxin Wen, Neel Jain, Abhimanyu Hans, Manli Shu, Aditya Tomar, Tom Goldstein, Abhinav Bhatele.
Affiliations: University of Maryland, Max Planck Institute for Intelligent Systems, University of California, Berkeley.
Breaking the Million-Electron and 1 EFLOP/s Barriers: Biomolecular-Scale Ab Initio Molecular Dynamics Using MP2 Potentials
This work describes a novel approach for accurately simulating complex biochemical phenomena via biomolecular-scale Ab Initio Molecular Dynamics simulations at the quantum molecular wave function level. Multiple algorithmic innovations were employed to overcome the computational challenges posed by the use of second-order Møller-Plesset perturbation theory. Evaluated using biomolecules with up to 2,043,328 electrons, the code exhibits high parallel efficiencies on the full Perlmutter and Frontier systems and sustained an unprecedented 59% of FP64 peak performance on Frontier.
Authors: Ryan Stocks, Jorge L. Galvez Vallejo, Fiona C. Y. Yu, Calum Snowdon, Elise Palethorpe, Jakub Kurzak, Dmytro Bykov, Giuseppe M. J. Barca.
Affiliations: University of Melbourne, Australian National University, AMD Inc., Oak Ridge National Laboratory.
Source: Barbara Chapman, SC24