Feb. 14, 2024 — Five students from the University of Illinois Urbana-Champaign (UIUC) took part in a semester-long research experience program to gain practical knowledge on data analytics and statistical research using common computing HPC resources.
Daniel Ries, a Principal Data Scientist at Sandia National Laboratories, led the project, teaching undergraduates in statistics and computer science the skills they will need once entering the workforce.
According to Ries, the research experience had several goals: to use a real-world research problem to teach students how to scope, execute, and refine a project and draw conclusions from it; to apply knowledge gained from coursework to actual research; and, importantly, to create reproducible research on a common computing platform. By obtaining access to the Anvil supercomputer, the students were able to accomplish all of this and more.
“Getting the students on Anvil was not only a benefit in terms of reproducibility, but in terms of what these students will be doing when they either go to grad school or get a full-time job in the data science world,” said Ries. “Most of the work that’s done at a company, at a research lab, in academia—the computing is done on servers. You don’t do computing on your own laptop or your own computer anymore. Just given the scale of models, the scale of data, it’s very common to have to get used to working in a server environment, a Linux environment, things like that, and I don’t think actually any of the students had experience with that. So it actually turned out to be a very good experience for them.”
Learning how to work on a common computing platform such as Anvil may have been initially daunting, but the students quickly proved they were up to the task.
“For the first couple weeks, it might have been a headache for the students,” said Ries. “But it’s the price they pay to actually learn how to do this and something they’re probably going to be doing at a real job later. They’ll be able to say, ‘Hey, I’m maybe not an expert in Linux scripting, but I can work my way around a server.’”
Not only did the students get hands-on HPC experience in an actual research application, but the research itself had practical implications. The group set their sights on a mode of predictive modeling known as “nowcasting.” In nowcasting, researchers predict weather conditions in the near future based on conditions in the very recent past.
For part of this research project, Ries wanted the undergraduate students to build a predictive model that could determine where lightning would strike in the next 15 to 60 minutes, a type of nowcasting that is immensely useful across multiple sectors. Lightning is dangerous. Being able to forecast in real time where it will strike has safety implications (for people outdoors for recreation, construction crews, power line workers, etc.) as well as economic implications, such as where forest fires are likely to start, crop damage, and damage to houses and other buildings. This is precisely the type of work that Ries does in his role at Sandia National Laboratories, which highlights just how important an opportunity like this can be for the students.
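To make the nowcasting setup concrete, here is a minimal Python sketch of how such a prediction problem might be framed. It is not the students' code. It assumes gridded observations arriving at a hypothetical 15-minute cadence, so that four future steps cover the 15-to-60-minute window, and it uses random arrays in place of real satellite and lightning data. The function name `make_nowcast_pairs` and its parameters are illustrative only.

```python
import numpy as np

def make_nowcast_pairs(frames, strikes, n_past=4, n_future=4):
    """Pair each window of recent frames with a strike mask for the next steps.

    frames:  (T, H, W) array, one observed field per time step (assumed 15 min apart)
    strikes: (T, H, W) boolean array, True where lightning was recorded
    """
    inputs, targets = [], []
    for t in range(n_past, frames.shape[0] - n_future + 1):
        # Model input: conditions in the very recent past.
        inputs.append(frames[t - n_past:t])
        # Target: did lightning strike each grid cell at any point in the lead window?
        targets.append(strikes[t:t + n_future].any(axis=0))
    return np.stack(inputs), np.stack(targets)

# Synthetic stand-in data: 12 time steps on an 8x8 grid.
rng = np.random.default_rng(0)
frames = rng.random((12, 8, 8))
strikes = rng.random((12, 8, 8)) > 0.95

X, y = make_nowcast_pairs(frames, strikes)
print(X.shape, y.shape)
```

Each row of `X` holds the most recent observations and the matching entry of `y` marks where lightning occurred afterward. Either a statistical model or a deep learning model could be trained on pairs of this kind.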
“I thought that problem was very interesting, and so I was able to kind of tie it to a problem that the undergraduates could actually work on,” said Ries. “And, so, what they were doing—some of those results—I’ll very much be able to actually use them to further some of the work that I’m doing. So it wasn’t just work, you know, to create a flashy poster.”
The five students worked on three different models, focusing their efforts on the upper Midwest region. Using data collected from multiple sources (lightning information from the National Lightning Detection Network (NLDN) and remote sensing data from the GOES-16 Advanced Baseline Imager), the students were able to develop two traditional statistical models and a third model based on the U-Net deep learning architecture.
The two traditional models, while typically not memory-intensive or computationally expensive, benefited from the use of Anvil due to the sheer size of the data sets. The U-Net model was trained on the Anvil GPUs, saving the team an enormous amount of time (30 to 60 minutes per training run versus a day or more without them). By the end of the semester, the students had successfully developed all three models.
“[The students] definitely created models that had the ability to predict,” said Ries. “I was definitely impressed with what they were able to do. In particular, both of the statistical models that they kind of approached ended up being actually more sophisticated than I anticipated at the beginning of the semester.”
Ries continued: “In terms of the U-Net deep learning model, that was actually very much a long shot. It was something that I had read about—these are being implemented by other research institutes and even commercial entities to try to nowcast lightning and other weather phenomenon. And I had never tried them, so I said, ‘Well, let’s see what these undergraduates can do.’ And they largely took it upon themselves. Other than me explaining it at a high level, they kind of did everything from there, so I was really impressed that they were able to get it working. They were able to transfer U-Net to use in different applications—the code and the data formats; everything—and bring it over to the application we were doing with lightning and the types of data we were working with. They did all of that, and I was really impressed they were able to do that in such a short amount of time.”
Throughout the semester, the students learned the importance of consistency and reproducibility in research and how to take advantage of high-performance computing. They also gained experience presenting research: the five undergraduates participated in a research symposium at U of I and showcased their work in a smaller presentation to the U of I Statistics Department.
Overall, the research experience was a tremendous success that greatly benefited the students, and Ries was thrilled with the result. He hopes to continue this type of research experience in the future and to make it a two-semester program instead of just one.
For more information on the types of research conducted at Sandia, please visit the Sandia National Laboratories website.
To learn more about HPC and how it can help you, please visit the “Why HPC?” page.
Source: Purdue University Rosen Center for Advanced Computing