SC21 Panel on Programming Models – Tackling Data Movement, DSLs, More

By John Russell

January 6, 2022

How will programming future systems differ from current practice? This is an ever-present question in computing. Yet it has, perhaps, never been more pressing given the rise of heterogeneous architectures and diverse hardware, the steady incorporation of AI technology, and the proliferation of new programming languages and models.

At SC21, a distinguished panel tackled this broad question. Higher levels of abstraction, a clearer focus on data movement – not compute functions – and the rise of domain-specific languages as important tools were among the dominant points of discussion, which touched on topics as diverse as programming Cerebras’s wafer-scale chip and FPGAs.

Moderated by Hal Finkel (DOE), the panel included Kathy Yelick (UC Berkeley), Saman Amarasinghe (MIT), Torsten Hoefler (ETH Zürich), Maya Gokhale (LLNL) and Justin Gottschlich (Intel). Capturing the full discussion is too daunting, but each panelist made an opening statement that captures (at least directionally) much of their thinking. Presented here are brief portions (lightly edited) of panelists’ opening remarks.

Kathy Yelick. Image courtesy of Berkeley Lab Computing Sciences.

Yelick, who just assumed her new role as vice chancellor for research at UC Berkeley, kicked off the panel, saying, “[In] scientific computing, in general, I think we should think about how people are programming at [a] much higher level of abstraction than we’re used to. I think if you look at machine learning, and the packages that people have built for machine learning, they’ve really shown that you can, with a lot of work in terms of how you implement some of those underlying algorithms, get very good performance out of those.

“That opens up HPC-type of access to a much broader community of people if they can program at the level of something like TensorFlow. And I’d like people to also think a little bit about systems like Julia and Jupyter notebooks as really the interface to the computers, rather than thinking about programming and languages based on things like C/C++ or Fortran. So really, I’m going to be advocating for a much higher level of abstraction, which is not to say that some of us won’t still be programming at a much lower level.”

Next up was Amarasinghe, who leads the compiler research group in MIT’s Computer Science & Artificial Intelligence Laboratory (CSAIL). A leader in the field of high-performance domain-specific languages, Amarasinghe leads a group that developed Halide, TACO, Simit, and many other domain-specific languages and compilers.

“If you think about domain-specific languages, [it’s] not too much of a stretch – even if you say you are a C programmer, or Fortran programmer or Python programmer – to say nobody writes loops and arrays and low level things in these languages. We all use libraries. All the systems are based on libraries and that means you’re already programming in higher level abstraction with one caveat. These libraries don’t have understanding of how the entire thing is connected together. So, when you call a library function, it’s a standalone thing; it will do what’s asked and return,” he said.

“What a domain-specific language or domain-specific compiler does is, it can figure out the control flow between these library calls, understand how these things get stitched together and use that to begin to optimize performance. This is especially important now and for the future, because memory systems and data movement are becoming a really important issue,” said Amarasinghe.
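To make Amarasinghe's point concrete, here is a minimal sketch in plain Python. It is not drawn from Halide, TACO, or any real library; the function names (`scale`, `add`, `total`, `unfused`, `fused`) are hypothetical and chosen only for illustration. Each standalone "library" call writes a full intermediate list, while a version that sees the whole pipeline, as a domain-specific compiler might, can do the same work in one pass with no intermediates.

```python
# Hypothetical illustration only: these helpers stand in for generic
# library routines and are not the API of any real package.

def scale(xs, a):
    """Library-style call: reads xs and writes a brand-new list."""
    return [a * x for x in xs]

def add(xs, ys):
    """Library-style call: reads two lists and writes a brand-new list."""
    return [x + y for x, y in zip(xs, ys)]

def total(xs):
    """Library-style call: reduces a list to a single number."""
    return sum(xs)

def unfused(xs, ys, a):
    """Three standalone calls; each one only knows about its own inputs.

    Returns the result and the number of intermediate elements written.
    """
    scaled = scale(xs, a)      # intermediate 1: len(xs) elements written
    summed = add(scaled, ys)   # intermediate 2: len(xs) elements written
    return total(summed), len(scaled) + len(summed)

def fused(xs, ys, a):
    """One pass over the inputs, as a compiler that sees all three calls
    together could generate. No intermediate lists are written.
    """
    acc = 0
    for x, y in zip(xs, ys):
        acc += a * x + y
    return acc, 0

xs = list(range(1000))
ys = list(range(1000))

unfused_result, unfused_traffic = unfused(xs, ys, 3)
fused_result, fused_traffic = fused(xs, ys, 3)
print(unfused_result == fused_result, unfused_traffic, fused_traffic)
```

Both versions compute the same value; the difference is the amount of data written and re-read between steps, which is the cost the panelists kept returning to.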

Perhaps the most forceful champion for focusing on data movement in future programming development was Hoefler, who directs the Scalable Parallel Computing Laboratory (SPCL) at ETH Zürich. He argued that counting FLOPS, as is done in ranking the Top500, misses the point in modern computing.

Commenting on the use of new large models such as GPT-3, he said, “Many companies are spending 10s of millions of dollars to train these models, and these are real HPC problems. They are the largest models people have trained [and] very much [what] we care about. We actually analyzed the workload a little bit more in detail. We found that the 99.8 percent of the floating-point operations in this workload is actually comprised of Tensor contractions [and] Tensor contractions are all expressed as matrix multiplication.

“So, this is wonderful, isn’t it? 99.8 percent of this workload is matrix multiplication. But if you actually look at the remaining 0.2 percent of operations in this workload, [it] turns out those are taking about 40 percent of the runtime. [That’s] because these Tensor contractions have been super highly-optimized over the years. The problem now [that] dominates everything else is data movement. We did some optimizations that I don’t want to go into detail about that show that you can actually speed this up quite significantly, and you can save millions of dollars by just looking at data movement,” said Hoefler.
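A quick back-of-the-envelope calculation shows how lopsided those figures are. The sketch below is not Hoefler's analysis; it simply takes the two numbers quoted above (0.2 percent of operations, roughly 40 percent of runtime) at face value and assumes the remaining 99.8 percent of operations account for the remaining 60 percent of runtime. The function names are hypothetical.

```python
def per_op_cost_ratio(op_share_rest, time_share_rest):
    """How much more runtime each non-matmul operation costs, on average,
    than each matmul operation, given the shares quoted in the talk.
    """
    op_share_main = 1.0 - op_share_rest
    time_share_main = 1.0 - time_share_rest
    cost_rest = time_share_rest / op_share_rest   # runtime per unit of ops
    cost_main = time_share_main / op_share_main
    return cost_rest / cost_main

def speedup_bound(time_share_removed):
    """Upper bound on overall speedup if that share of runtime were
    eliminated entirely (an Amdahl's-law style estimate).
    """
    return 1.0 / (1.0 - time_share_removed)

ratio = per_op_cost_ratio(op_share_rest=0.002, time_share_rest=0.40)
bound = speedup_bound(time_share_removed=0.40)
print(f"per-operation cost ratio: about {ratio:.0f}x")
print(f"speedup bound if the 40% vanished: about {bound:.2f}x")
```

Under these assumptions, each non-matmul operation costs on the order of 330 times more runtime than a matmul operation, and removing that overhead entirely would bound the overall speedup at roughly 1.67x. Since the 40 percent figure is approximate, both numbers are rough estimates.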

Gottschlich, who is a principal AI scientist at Intel Labs and the director and founder of the machine programming research group at Intel, noted how Intel’s perspective on programming models has changed.

“When I joined back in 2010, Intel was very much a monolithic computing company, it was just a CPU. As I suspect everyone in the audience knows, we now consider ourselves to be very heterogeneous,” he said. “One of the core challenges we see today is not so much in the compute, but in the data movement. So, I just wanted to quickly acknowledge that I think the data movement, and figuring out how to deal with that, especially as we grow into deeper stochastic systems that tend to be improving their accuracy, as you have more IID data (independent and identically distributed data), that it becomes even more important that we figure out how to handle that data movement problem.

“Back in 2018, we published this paper, actually jointly with Saman (Amarasinghe) and some others, on the three pillars of machine programming. Machine programming is principally this idea that we are going to try to automate the development of software, and a byproduct of that is the automation of development of hardware given that much of hardware is developed through software. The three pillars are intention, invention and adaptation. Intention is principally concerned with trying to identify novel ways or improve the existing ways for programmers to specify their ideas to the machine. So, going back to, I think, both Kathy and Saman’s comments about higher order abstractions, and DSLs. In fact, I fully agree with this. I think that as we move forward, I suspect that to get outstanding performance, we really need to have this separation of intention from invention and adaptation. Once the intention is understood by the machine, then we can start to invent the algorithms and data structures that are necessary to fulfill that intention.”

Last to deliver intro remarks was Gokhale, distinguished member of technical staff at LLNL and an expert in reconfigurable computing and data intensive architectures.

“I feel as if we’re in a fix right now with a fusion of programming models and it’s because of scaling laws, which we all know very well, between the feature size and the power. What we’ve done is build specialized widgets, that do a smaller thing, but do it very well rather than a general-purpose thing. That is a cause of a lot of problems. [It’s] one factor that is leading us to a lot of new ideas in programming models, this idea of specialization and putting heterogeneous pieces together,” said Gokhale.

“To me, the future is system-on-chip (SOC) like environments. So, heterogeneous compute models, data- and/or control-driven, tightly or loosely-coupled. [For example,] if you’ve worked for Apple or worked on cell phones, that SOC environment. I have a background in reconfigurable computing with FPGAs that is the combination of SOC-like environment and higher level programming. It’s a difficult environment to work in, but I see that’s where we’re going. On the other side, I see workflows for programming, [with] model interfacing and mapping. [Often] you think of your favorite DSL; it’s just so elegant and so mathematical. But it has to talk to other pieces of things and how do you make it do that? How do you interoperate? [L]arge HPC workflows have embodied some of those ideas of being able to interface with [DSLs],” she said.

A rich discussion followed the introductory comments. As of this writing, the SC21 video was still posted and accessible to SC21 registrants.

Topics: Applications, Developer Tools, People, Research

Sectors: Academia & Research, Government

Tags: domain-specific languages, HPC, HPC programming, Intel, LBNL, LLNL, MIT CSAIL

HPCwire is a registered trademark of Tabor Communications, Inc. Use of this site is governed by our Terms of Use and Privacy Policy.

Reproduction in whole or in part in any form or medium without express written permission of Tabor Communications, Inc. is prohibited.