Supercomputing 2023 (SC 2023), held last November, drew record attendance, and the direction of high-performance computing was a hot topic on the show floor. Expect more of that debate at ISC High Performance 2024, which takes place next week (May 12-16) in Hamburg, Germany.
The questions revolve around systems that merge conventional HPC and AI computing. Power efficiency will also be an important part of the conversation.
At SC 2023, students showed the promise of HPC in research posters, with innovative ideas for tackling scientific problems with computing resources. They also showed an understanding of how to mix AI with conventional high-precision computing.
Scientists are also actively writing code. An emerging workforce of research managers, who connect researchers with IT staff to allocate computing resources, is taking on a larger role in research.
The conference is a showcase for European progress in high-performance computing, but the question of China, a supercomputing powerhouse, will hang over the show.
Top500
ISC will kick off with the Top500 list of the fastest supercomputers, which will be released in a few days.
The most recent list had the 1-exaflop Frontier (based on AMD’s GPUs) at Oak Ridge Leadership Computing Facility retaining the top spot. In second place was a new entrant – Aurora (based on Intel’s GPUs) – at Argonne National Laboratory.
Aurora was only partially benchmarked for that list and could take the top spot on the upcoming one. Microsoft’s Eagle system in the Azure cloud was a surprise entry at number 3.
Whether more cloud providers will put their high-performance AI servers on the Top500 remains an open question.
An Amazon executive at SC 2023 told HPCwire that running the HPL benchmark specifically for a Top500 submission was attractive but complicated and costly, and that the company would keep its HPC systems dedicated to commercial needs.
Green500
Top500 officials at SC 2023 said there was growing interest in the Green500 list, which ranks large-scale systems by power efficiency. Presentations on power efficiency attracted many attendees.
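The Green500 ranks systems by energy efficiency: sustained HPL performance (Rmax) divided by the power drawn during the run, reported in gigaflops per watt. A minimal Python sketch of that arithmetic follows; the two systems and their figures are invented for illustration and are not real list entries.

```python
def gflops_per_watt(rmax_flops: float, power_watts: float) -> float:
    """Sustained performance per watt, in GFLOPS/W (the unit the Green500 reports)."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return rmax_flops / power_watts / 1e9


# Hypothetical systems: (sustained HPL FLOP/s, average power in watts).
# These figures are made up for illustration only.
systems = {
    "system_a": (1e18, 20e6),   # 1 exaflop/s at 20 MW
    "system_b": (2e17, 2.5e6),  # 200 petaflop/s at 2.5 MW
}

# Rank by efficiency, most efficient first.
ranked = sorted(
    ((name, gflops_per_watt(rmax, watts)) for name, (rmax, watts) in systems.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, eff in ranked:
    print(f"{name}: {eff:.1f} GFLOPS/W")
```

In this made-up case, the smaller system_b (80 GFLOPS/W) outranks the much faster system_a (50 GFLOPS/W), which is how a machine can lead the Green500 while sitting well down the Top500.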
A study released in March showed that the power efficiency of HPC systems has improved over the last decade alongside gains in performance.
Interest in green computing goes beyond climate change, the study’s authors noted.
“In 2008, energy efficiency became a concern in HPC. The monetary cost of running HPC systems exceeded the cost of purchasing and maintaining them. This change highlighted the importance of improving the energy efficiency of the upcoming supercomputers,” the researchers said.
The researchers suggested that HPC is on the right track with heterogeneous systems that pair CPUs with GPUs and other accelerators.
“Heterogeneous systems are potentially more energy efficient but require more human expertise to be fully exploited,” the researchers said.
The Green500 could become an important benchmark going forward. Cooling systems had a major presence at SC 2023, and that trend could continue on the ISC 2024 floor. A Green500 birds-of-a-feather (BoF) session will be held on May 15.
I/O and Storage
There’s still debate about the I/O and bandwidth requirements of AI workloads, and some startups are addressing them. Lustre is the established default high-performance file system, but alternatives are emerging.
DAOS (Distributed Asynchronous Object Storage), which has existed since 2012, received major backing from Google late last year and is gaining ground as a high-performance storage system designed for modern computing.
Other file systems and storage technologies are also gaining traction, such as Weka’s WekaFS, which can connect to Lustre, and Vast Data’s unconventional integrated storage system, Universal Storage.
Exascale and AI bring new storage and memory requirements to HPC, and discussions are likely to center on file systems, scalability, integration, compatibility, and cost.
Hardware and Chips
Nvidia wasn’t present at SC 2023, but it couldn’t be ignored. System integrators and partners did the talking for it.
Intel, AMD, and Nvidia are holding press conferences ahead of the show, so expect some news.
European officials discussed the region’s hybrid exascale systems at the recent EuroHPC Summit, a topic that fits into the larger conversation about system elasticity. Jupiter, Europe’s upcoming first exascale supercomputer, will have a modular design so that AI and quantum accelerators can be added.
Europe’s second exascale system, Jules Verne, will also feature in ISC 2024 conversations. The system will use SiPearl’s Rhea-2 chip and is expected in 2026.
Will China Be There?
The absence of Chinese exascale systems from the November Top500 list put the organizers in a bind, as the list didn’t give a true picture of the fastest systems in the world.
China has drawn a Cold War-style iron curtain around its fastest systems, most likely in response to geopolitical pressure and a focus on self-reliance. Only sporadic updates on its systems have emerged.
The assumption is that China wants to shield its computing infrastructure from the U.S., which has banned exports of advanced AI and computing chips to China. The country is now building chips domestically for tech sovereignty.
The drop in Top500 submissions, especially from China, concerns the organizers because older systems now stay on the list much longer.
The number of systems submitted to the Top500 has declined since 2017, and average performance has also declined. The average age of a Top500 system is now 30 months, double the average of 15 months in 2018-2019.
The Top500 committee has tried to provide some visibility into systems in the country. China submitted a Gordon Bell Prize entry based on a supercomputer that could max out at 1.5 exaflops, and various online reports suggest there could be many more exascale systems in the country.
Chip and Hardware Makers
Intel’s involvement in exascale systems will likely pause with Aurora, which uses the Ponte Vecchio GPU. Intel’s next HPC GPU, Falcon Shores, is expected at the end of 2025. Intel has refocused on customer needs instead of chasing supercomputing systems, and it is pushing its Gaudi AI chips, which are ASICs, as its primary AI offering.
Intel’s absence gives AMD a competitive advantage with its MI300 GPUs, which will power the 2-exaflop El Capitan system at Lawrence Livermore National Laboratory. Nvidia’s Blackwell GPUs, coming next year, are expected to dominate high-performance systems.
The redefinition of HPC will be visible on the floor as more computing moves to AI. Nvidia’s Blackwell GPU announcement also signaled a shift toward mixed precision and away from high-precision computing.
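The trade-off behind mixed precision can be made concrete with a toy example. The Python sketch below is an illustration only, not how any GPU or library implements mixed precision. It sums the value 0.1 ten thousand times three ways: entirely in 16-bit floating point, with 16-bit inputs but a 64-bit accumulator, and entirely in 64-bit. It relies only on the standard library’s struct module, whose "e" format code packs IEEE 754 half-precision values.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def sum_fp16_only(values):
    """Low precision throughout: every partial sum is rounded back to FP16."""
    total = 0.0
    for v in values:
        total = to_fp16(total + to_fp16(v))
    return total


def sum_mixed(values):
    """Mixed precision: inputs stored as FP16, accumulated in FP64."""
    total = 0.0
    for v in values:
        total += to_fp16(v)
    return total


def sum_fp64(values):
    """High precision throughout: inputs and accumulator are FP64."""
    return sum(values)


data = [0.1] * 10_000
print("FP16 only:", sum_fp16_only(data))
print("Mixed    :", sum_mixed(data))
print("FP64     :", sum_fp64(data))
```

The FP16-only sum stalls at 256, because above that value the gap between adjacent half-precision numbers (0.25) is more than twice the increment, so each addition rounds away. The mixed version lands near 999.76, off from 1,000 only because 0.1 itself was stored in FP16. Keeping a higher-precision accumulator is the basic idea that makes low-precision arithmetic usable.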
ISC 2024 will also include a BoF session on the RISC-V ecosystem. This open instruction set architecture is gaining favor in China, Russia, and Europe as an alternative to the x86 and ARM architectures.
The field of supercomputer makers has also narrowed to two players, HPE and Atos. Atos is in financial trouble, and its spinoff, Eviden, is releasing AI products, which may leave HPE as the only competitive player standing.