Cerebras Announces New Capability for Training NLP Models

August 31, 2022

SUNNYVALE, Calif., Aug. 31, 2022 — Cerebras Systems, a pioneer in accelerating artificial intelligence (AI) compute, released yet another industry-first capability today. Customers can now rapidly train Transformer-style natural language AI models with 20x longer sequences than is possible using traditional computer hardware. This new capability is expected to lead to breakthroughs in natural language processing (NLP). By providing vastly more context for the understanding of a given word, phrase or strand of DNA, the long sequence length capability gives NLP models a much finer-grained understanding and better predictive accuracy.

“Earlier this year, the Cerebras CS-2 set the record for training the largest natural language processing (NLP) models of up to 20 billion parameters on a single device,” said Andrew Feldman, CEO and co-founder of Cerebras Systems. “We are now enabling our customers to train with longer sequences on the largest NLP models. This provides previously unobtainable accuracy, unlocking a new world of innovation and possibilities across AI and deep learning.”

Language is context specific. This is why translating word by word with a dictionary fails — without context, the meaning of words is often vague. In language, a word is best understood in the context of the surrounding words, which guide its interpretation. This is true in AI as well. Long sequence lengths enable an NLP model to understand a given word within a larger, broader context.

Imagine hearing the expression “To be or not to be” without context, just using a dictionary. And then imagine understanding it within the context of Act III, Scene 1 of Hamlet. And then imagine if you had broader context and could understand it within the context of the entire play – or better yet, within the context of all Shakespearean literature. As the context within which understanding occurs is broadened, so too is the precision of the understanding. By vastly enlarging the context (the sequence of words within which the target word is understood), Cerebras enables NLP models to demonstrate a more sophisticated understanding of language. Bigger and more sophisticated context improves the accuracy of understanding in AI.

While many industries will benefit from this new capability, Cerebras’ pharmaceutical and life sciences customers are particularly excited about the implications for their drug discovery efforts. DNA is the language of life, and the analysis of DNA has been a particularly powerful application of large language models.

“Machine learning at GSK involves taking complex datasets generated at scale and answering very challenging biological questions,” said Kim Branson, senior vice president and global head of AI and Machine Learning at GSK. “The long sequence length capability enables us to examine a particular gene in the context of tens of thousands of surrounding genes. We know that surrounding genes have an impact on gene expression, but we have never before been able to explore this within AI.”

The proliferation of NLP has been propelled by the exceptional performance of Transformer-style networks such as BERT and GPT. However, these models are extremely computationally intensive. Even when trained on massive clusters of graphics processing units (GPUs), today these models can only process sequences up to about 2,500 tokens in length. Tokens might be words in a document, amino acids in a protein, or base pairs on a chromosome. But an eight-page document could easily exceed 8,000 words, which means that an AI model attempting to summarize a long document would lack a full understanding of the subject matter. The unique Cerebras wafer-scale architecture overcomes this fundamental limitation and enables sequences up to a heretofore impossible 50,000 tokens in length.
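The arithmetic behind this limitation can be sketched in a few lines of Python. This is our own illustration, not Cerebras code, and it assumes a crude one-token-per-word heuristic (real subword tokenizers such as BPE typically produce somewhat more than one token per word):

```python
def count_tokens(text: str) -> int:
    """Crude token count: one token per whitespace-separated word."""
    return len(text.split())

def fits_in_context(text: str, max_tokens: int) -> bool:
    """Does the whole document fit inside the model's context window?"""
    return count_tokens(text) <= max_tokens

# Stand-in for an ~8,000-word, eight-page document.
eight_page_doc = "word " * 8000

print(fits_in_context(eight_page_doc, max_tokens=2500))    # False: must be truncated
print(fits_in_context(eight_page_doc, max_tokens=50000))   # True: fits in one pass
```

Under this sketch, a model limited to roughly 2,500 tokens sees less than a third of the document at once, while a 50,000-token window takes it in whole.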

This innovation unlocks previously unexplored frontiers of deep learning. Even within traditional language processing, there are many examples of tasks in which this type of extended context matters. Recent work has shown that for tasks such as evaluating intensive care unit patient discharge data and analyzing legal documents, seeing the entire document matters for understanding. These documents can be tens of thousands of words long. The potential applications beyond language are even more exciting. For example, research has shown that protein structures are highly dependent on long-range interactions between building blocks, and training models with longer sequence lengths is likely to yield better results. Now that the Cerebras CS-2 system makes long sequence training not only possible, but easy, researchers are sure to uncover many more applications and solve problems previously thought to be intractable.

Training large models with massive data sets and long sequence lengths is an area in which the Cerebras CS-2 system, powered by the Wafer-Scale Engine (WSE-2), excels. The WSE-2 is the largest processor ever built. It is 56 times larger, has 2.55 trillion more transistors, and has 100 times as many compute cores as the largest GPU. This scale means that the WSE-2 has both the memory to hold the largest layers of the largest models and the computational power to process such huge computations quickly. In contrast, similar workloads on GPUs have to be parallelized across hundreds or thousands of nodes to train a model in a reasonable amount of time. This type of GPU infrastructure requires specialized expertise and valuable engineering time to set up. Meanwhile, the Cerebras CS-2 system can perform similar workloads with the push of a button, removing the complexity while accelerating time to insight.
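A back-of-envelope calculation shows why memory is the bottleneck for long sequences. In standard self-attention, each head materializes an n × n score matrix, so activation memory grows quadratically with sequence length n. The sketch below is our own illustration with assumed values (16 heads, fp16 values), not published Cerebras figures:

```python
def attention_scores_bytes(seq_len: int, num_heads: int = 16,
                           bytes_per_value: int = 2) -> int:
    """Memory for the attention score matrices of one layer:
    one seq_len x seq_len matrix per head, stored in fp16."""
    return num_heads * seq_len * seq_len * bytes_per_value

short = attention_scores_bytes(2_500)    # 200 MB per layer
long = attention_scores_bytes(50_000)    # 80 GB per layer
print(long // short)                     # prints 400
```

Under these assumptions, a 20x longer sequence needs 400x the attention-score memory per layer, which is why long-sequence training strains conventional accelerators and favors a processor with very large on-chip memory.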

For more information on Cerebras Systems, please visit the Cerebras blog.

About Cerebras Systems

Cerebras Systems is a team of pioneering computer architects, computer scientists, deep learning researchers, and engineers of all types. We have come together to build a new class of computer system, designed for the singular purpose of accelerating AI and changing the future of AI work forever. Our flagship product, the CS-2 system, powered by the world’s largest processor, the 850,000-core Cerebras WSE-2, enables customers to accelerate their deep learning work by orders of magnitude over graphics processing units.


Source: Cerebras Systems
