STEM-Trek NRG@SC23 Workshop: Inspiring and Enlightening!

By Elizabeth Leake, STEM-Trek Nonprofit

December 18, 2023

STEM-Trek, a nonprofit that supports scholarly travel, mentoring, and advanced skills training for science, technology, engineering, and mathematics, hosted a pre-conference workshop ahead of the annual Supercomputing Conference, SC23, in Denver, Colorado, on November 9-11, 2023. This year’s workshop, NRG@SC23, showcased a range of energy-related research projects that are developing best practices, toolkits, and novel strategies to tackle global grand challenges. Applications were especially encouraged from U.S. National Science Foundation (NSF) ACCESS Campus Champions in NSF EPSCoR (Established Program to Stimulate Competitive Research) states and territories, and from demographics that are underrepresented in Research Computing and Data science (RCD) academics and careers.

STEM-Trek received an HPCwire Workforce Diversity & Inclusion Leadership Award in 2022.

The pre-conference experience is a great way for RCD professionals to meet colleagues worldwide who serve in similar roles. They coalesce as a cohort before entering the full conference, which, due to its size (over 14K attendees), can be overwhelming for newcomers. Their affinity garment (a red jacket this year) helps them find each other in the crowd. 

NRG@SC23 was the fifth such workshop sponsored by STEM-Trek. The first, held before SC15 in Austin, Texas, in collaboration with Executive Director Dan Stanzione at the Texas Advanced Computing Center (TACC), was for systems administrators and research facilitators. The second, “HPC On Common Ground” (OCG@SC16), held in collaboration with Dana Brunson (then Oklahoma State U, now Internet2) and Henry Neeman (U-Oklahoma), focused on food security science. The third, URISC@SC17, provided training on high-speed networks and cybersecurity best practices in coordination with Von Welch, then Principal Investigator (PI) of the NSF Trusted-CI program at Indiana University. COVID-19 then forced everything online, and STEM-Trek hosted two remote “ScienceSlam” competitions in 2020/21. In 2022, we held our first in-person workshop in five years: EarthSci@SC22.

A “jetlag” day for NRG@SC23 international delegates helped them adjust to time zone and elevation changes. We visited the National Center for Atmospheric Research (NCAR) in Boulder upon the invitation of Daniel Howard (NCAR HPC Systems Engineer) and Wenfu Tang (NCAR Project Scientist). Dr. Tang arranged six amazing talks by NCAR scientists covering HPC-enabled research in agriculture, hydrology, air quality, urbanization, and early-warning systems in Africa. 

November 9th field trip to NCAR on the international delegates’ jetlag day, with U.S. hosts from NCAR, STEM-Trek, UMass-Boston, Clemson U, and UHouston-Clear Lake. (Source: STEM-Trek)

 

NSF funding was historically available for U.S. delegates who attended our workshops, while STEM-Trek finds financial support for international participation. Google came through again this year; Amazon Web Services and VAST Data sponsored meals. SC23 General Chair Dorian Arnold donated technical program registrations (with workshops and tutorials for 18 delegates), the room for our meeting, and audio-visual support.

Thirty-eight attended NRG@SC23 – our largest cohort to date. Twenty-seven percent were female, also a record. Funding was tighter than ever – our NSF proposal fell through, but some NSF funds from a 2022 grant submitted with Boise State University, which supported three Idahoans, were rolled over. In October (after tickets were purchased), the South African government enacted cost containment measures that limited the number of delegates who could attend a single conference. Tech companies could only sponsor about half of what they previously donated. Even so, nine delegates were supported (partially or fully) by their institutions, and one was entirely self-funded. It was gratifying that they found the pre-conference experience valuable enough to justify the time and expense of attending, even in a tight budget year. We’ve always emphasized the importance of self-advocacy, and it must be working!

NRG@SC23 Cohort by Occupation (Source: STEM-Trek)

 

About half of the delegates were from the African HPC Ecosystems project led by the South African Centre for HPC (CHPC). This year, others from Nepal and Germany joined. Here’s the breakdown by region: U.S. (25 from 12 states; 6 EPSCoR, three female U.S. Air Force Academy Cadets (future Space Force?), and 18 ACCESS Campus Champions); South Africa (8 from three provinces); Mozambique (2 from their research and education network, MoRENet); Botswana (1); Nepal (1); and Germany (1).
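As a quick sanity check on the regional breakdown, the short Python sketch below tallies the head counts listed above and confirms they add up to the thirty-eight attendees reported earlier. The variable names are illustrative only and are not part of any STEM-Trek tooling.

```python
# Regional head counts copied from the breakdown in the article.
cohort_by_region = {
    "U.S.": 25,
    "South Africa": 8,
    "Mozambique": 2,
    "Botswana": 1,
    "Nepal": 1,
    "Germany": 1,
}

# Sum the regional counts to recover the size of the full cohort.
total_delegates = sum(cohort_by_region.values())
print(f"Total delegates: {total_delegates}")
```

The tally matches the reported cohort size of 38.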

Participant blogs and the full workshop agenda are available on the STEM-Trek website.

More, faster horses!!!

STEM-Trek Director and Founder Elizabeth Leake opened the workshop with a quote by Henry Ford, who said that if he had asked customers what they wanted, they would have said, “faster horses.” She added that the practice of adding more acceleration to high-performance computing (HPC) isn’t sustainable from energy and cost perspectives – it’s time to redirect.  

The IEEE 754 floating-point standard was published in 1985. Since then, Moore’s Law ensured that we doubled computing power roughly every two years until the data deluge broke it. When the pandemic affected supply chains, countries that didn’t make their own chips were disadvantaged – everyone needed technology for education, healthcare, commerce, and more. In response, the U.S. CHIPS and Science Act of 2022 provided $52.7B in research and development (R&D) funding, which influenced U.S. agency program offerings aimed at improving U.S. competitiveness in chip production and supply-chain resilience. The E.U. passed a comparable act that year allocating €43B of public investment. Since then, there has been an estimated $200B in private-sector investment. As the U.S. renewed its commitment to R&D, the White House proposed historic increases for all science-serving agencies totaling $2B over the next decade (NSF, National Aeronautics and Space Administration/NASA, National Institute of Standards and Technology/NIST, Department of Energy, and others).

With some overlap, the data center construction industry is projected to exceed $400B by 2032. In the future, we will likely see small modular reactor (SMR) innovation powering data centers, which are traditionally built in regions with cheap real estate and an abundance of alternative energy. Many who attend our workshops live and work in such places. Idaho National Laboratory leads R&D in SMR nuclear innovation (e.g., Natrium reactors). We tried to engage SMR innovators for our workshop, but most are held to non-disclosure agreements as they pursue Nuclear Regulatory Commission approval – environmental review and licensing can take years. Meanwhile, keep an eye on Bill Gates, TerraPower, and Kemmerer, Wyoming.

More money – unintended consequences

The unprecedented public/private investment in R&D has unintended consequences for academia. University RCD talent that historically trained the workforce is being siphoned away at an alarming rate. Continuing a trend that began in 2018, more big RCD employers waived degree requirements in 2022 to attract candidates. From 2020 to 2022, many students dropped out of or failed to enter universities, opting instead to pursue careers – unwilling to accrue student debt for a virtual experience. Big tech companies reduced their footprint during COVID, so many are no longer wrangling huge office facilities. They’re building manufacturing facilities with AI-driven processes that require less manual labor. Many can now offer skilled RCD talent fully remote employment (for some RCD roles, not all). Remote work is attractive to those who grew accustomed to it during the pandemic, and it reduces their carbon footprint. Commercial entities can offer higher salaries than academia, especially public schools, whose wages are governed by rigid state policies. Remote attendance is difficult for many universities to defend. Iconic architecture and the social experience gained through the traditional post-secondary experience built a strong destination-reliant legacy, although hybrid is now more common and likely here to stay.

The HPCwire Job Bank trendline, captured via the Internet Archive’s Wayback Machine, reflects a 2022 spike that can be attributed to recent changes (Accessed Nov. 26, 2023).

Open-source culture may be affected

Historically, most research funding has been sponsored directly by public investment or philanthropies in the form of grants to PIs, which helped foster a rich open-source culture. Intellectual property derived from commercial R&D, by contrast, is protected. The shift toward more private than public spending could impact those from resource-constrained colleges and universities who can’t afford to purchase licenses. Less data could be FAIR (Findable, Accessible, Interoperable, and Reusable). It’s important that the communities of practice that rely on open-source innovation continue to serve as advocates. To illuminate the importance of preserving the open-source culture, Alex Scammon (Head of Open-Source Development at G-Research) was invited to speak at NRG@SC23.

Maybe we need unicorns instead of horses?

With greater emphasis on generative AI and quantum computing, workflows and information are growing in scale and complexity. Data must be managed more judiciously; software, hardware, and networks are being customized to achieve the highest precision, at the greatest speed, with the fewest bits – often at the edge. Custom computing environments, some employing exotic math, are designed to require much less (and much different) storage that consumes less energy. To address these demands, software, new interconnects, storage innovation, and methods of securing data from end to end are being developed.

IBM released its quantum roadmap on Dec. 4, 2023, which puts the company on track to reach full potential – hardware, theory, and software – in 2033. Unlike classical computers, whose energy use grows with computational capacity, quantum computers, owing to the physics involved, require significantly less power and have the potential to offer far superior results.

Quantum hasn’t escaped the attention of the Colorado School of Mines in Golden, Colorado. At NRG@SC23, Mines Grad Student Sean Feeney presented on quantum software, and Liwen Shih (UHouston-Clear Lake/visiting research faculty at Oak Ridge National Laboratory) emphasized the importance of quantum literacy. Showcasing a computational challenge for the National Renewable Energy Laboratory (NREL), also based in Golden, Judith Vidal (NREL with a Mines joint appointment) presented her work with the NREL Thermal Storage Materials Laboratory, which measures the full range of thermophysical material properties and performs material degradation evaluations – challenges that quantum technologies will greatly enhance in the future.

Processes that are designed for energy efficiency tend to be quieter – a quality that is essential for the Square Kilometer Array (SKA) and other radio astronomy instrumentation being installed in radio-quiet regions of South Africa and Australia. When it’s operational, the SKA will share data (an estimated 11 exabytes daily) with institutions around the globe, including 19 ground-based telescopes supported by the NSF. Such globally distributed data challenges were among the science drivers that justified the NSF’s investment in the South Atlantic Cable System (SACS) via the Americas-Africa Lightpaths Express and Protect (AmLight-Exp) project based at Florida International University’s Center for Internet Augmented Research and Assessment (CIARA). SACS delivers 100G end-to-end connectivity to three continents. AmLight’s Vasilka Chergarova presented at NRG@SC23 about relationships they’re fostering in the U.S., pan-Africa, and Brazil. 
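To give a sense of the scale involved, the Python sketch below converts the estimated 11 exabytes per day into a sustained data rate and compares it with a single 100G link. It is a back-of-the-envelope illustration only, under assumptions that are not from the article: decimal units (1 EB = 10^18 bytes), data spread evenly over 24 hours, and "100G" read as 100 gigabits per second. It does not imply that all SKA data would travel over SACS or any one path.

```python
# Assumptions (not from the article): decimal exabytes, traffic spread
# evenly across a 24-hour day, and "100G" meaning 100 gigabits per second.
EXABYTE_BYTES = 10**18
SECONDS_PER_DAY = 24 * 60 * 60
LINK_BPS = 100e9  # one 100G link, in bits per second


def sustained_rate_bps(exabytes_per_day: float) -> float:
    """Average bit rate needed to move the given daily volume."""
    return exabytes_per_day * EXABYTE_BYTES * 8 / SECONDS_PER_DAY


ska_bps = sustained_rate_bps(11)  # the article's estimated daily volume
print(f"Sustained rate: {ska_bps / 1e12:,.0f} Tb/s")
print(f"Equivalent 100G links: {ska_bps / LINK_BPS:,.0f}")
```

Under these assumptions, the daily volume works out to roughly a petabit per second on average, or on the order of ten thousand 100G links, which helps explain why globally distributed data is treated as a science driver in its own right.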

Composable computing, which underpins instruction-set architecture, is increasingly common. It’s no longer necessary to amass a commercial case for the production of a single-chip design (new fabrication facilities being built with the CHIPS investment will likely feature composable hardware and interconnects). Many composable systems exist in commercial clouds where industrial users can pay as they go. Amazon Web Services (AWS), for example, features 275 EC2 HPC instances with varied architecture, memory, bandwidth requirements, etc. AWS Graviton, a series of 64-bit ARM-based CPUs designed by AWS subsidiary Annapurna Labs, launched in late 2018, and Graviton 4 was released on Nov. 28, 2023 (four versions in five years!).

Commercial cloud’s diversity is appealing but difficult for universities to adopt with a grant-funded financial model. Industry partners who can pay on demand are now more important, and they can also sponsor internships and workshops, like NRG@SC23. While a university could never afford to host as many HPC instances as AWS, NSF ACCESS (Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support) features a diverse portfolio of options that are available to U.S. researchers and their collaborators at no cost. Its composable system, ACES (Accelerating Computing for Emerging Sciences) at Texas A&M University, features CPU, GPU, (Graphcore) IPU, and Field Programmable Gate Array (FPGA) nodes that can be used to design novel instruction-set architecture. ACCESS PI Shelley Knuth presented an overview of the U.S. federated program, and ACES PI Honggao Liu (Texas A&M University) shared ACES highlights during the NRG@SC23 workshop. 

U.S. agencies are making a greater investment in commercial cloud. Academic RCD facilitators must, therefore, make an intentional effort to on-board constituents. Each agency cloud has slightly different rules of engagement and eligibility requirements. NSF, through the ACCESS program, features CloudBank. Additionally, the U.S. National Institutes of Health (NIH) hosts the STRIDES program, the Department of Defense (DOD) has the Joint Warfighting Cloud Capability, and NASA underwrites the Earthdata Cloud.

Because architecture and standards have remained somewhat stable for decades, it will be difficult for mid- and late-career RCD professionals to keep up with the rapid changes; the Academy tends to move at a snail’s pace. RCD training models in the U.S. and pan-Africa have historically re-employed decommissioned clusters. But in today’s landscape, once hardware is out of warranty, it grows obsolete quickly and is more vulnerable to attack since it’s not patched as aggressively. That said, on-prem hardware is extremely useful for hands-on teaching of the basics to prepare future systems, cybersecurity, electrical, and network engineers. Recommissioned hardware is still useful if it’s carefully maintained.

As for talent, intellectual curiosity, tenacity, and creativity are prized as exotic math and software-defined architecture make their debut. Students must be exposed to theoretical and quantum computing – the ability to think outside of the box is critically important if we hope to solve big problems in novel ways. They need access to modern resources, such as NSF ACES and the Isango platform, envisioned as a small, affordable, composable, and portable training sled under warranty and backed by its global community of support. 

Ideally, RCD pros dedicated to teaching and training will remain in academic roles to prepare the workforce. It’ll take a concerted effort to keep up – they should plan to allocate 20 percent of their time toward learning new skills. Adequate funding for scholarly pursuits must be budgeted (conference travel and professional association memberships). Universities must offer more vocational training and ally with industry partners to support internships. 

We appreciate all volunteers who helped with NRG@SC23 planning: Bryan Johnston (South African CHPC), Shannon Beck (USAF Academy), Daniel Howard (NCAR), Wenfu Tang (NCAR), and Kurt Keville (UMass-Boston); a special thanks to Kurt for driving the van to NCAR!

Thank you, sponsors! 

 
